DevAgent — Architecture & Usage Guide

DevAgent is an autonomous ticket-to-fix pipeline. It receives Jira tickets via webhook, classifies them, investigates the codebase, and posts structured triage reports back to Jira. It runs as a systemd service on Hetzner (136.243.36.27) under the devagent user.

1. System Overview

DevAgent is an autonomous development system that processes support tickets end-to-end. It uses a two-phase architecture: Phase A classifies the ticket in ~6 seconds, and Phase B performs a deep codebase investigation in ~2–4 minutes. The system spawns the Claude Code CLI as subprocesses, uses Mem0 (Qdrant + Ollama) for a learning loop, and posts structured reports back to Jira.

Component | Technology
Runtime | Node.js 20+, Express
AI Engine | Claude Code CLI (spawned as subprocesses via claude --print)
Memory | Mem0: Qdrant (vector store at :6333) + Ollama (bge-m3 embeddings at :11434)
Ticket System | Jira REST API (webhooks in, comments out)
Subagents | Claude Code agent definitions (.md files): triage, dev-planner, dev-executor
Host | Hetzner AX41 (136.243.36.27), systemd service, devagent user
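
As a rough illustration of the AI Engine row, the sketch below shows one way to spawn the Claude Code CLI from Node.js. The flags (--print, --max-turns, --agent triage) and the 30-minute timeout are taken from this guide. The names buildClaudeArgs and runClaude, the injectable spawnFn option, and sending the prompt on stdin are assumptions made for illustration, not the actual session-manager code.

```javascript
const { spawn } = require("node:child_process");

// CLI arguments per phase, as listed in this guide.
function buildClaudeArgs(phase) {
  if (phase === "A") return ["--print", "--max-turns", "3"];
  if (phase === "B") return ["--print", "--agent", "triage", "--max-turns", "25"];
  throw new Error(`Unknown phase: ${phase}`);
}

// Hypothetical helper: run one phase and resolve with the CLI's stdout.
// `spawnFn` is injectable so the function can be exercised without the CLI.
function runClaude(phase, prompt, options = {}) {
  const { cwd, timeoutMs = 30 * 60 * 1000, spawnFn = spawn } = options;
  return new Promise((resolve, reject) => {
    const child = spawnFn("claude", buildClaudeArgs(phase), { cwd });
    let stdout = "";
    let stderr = "";

    // Kill the subprocess if it exceeds the session timeout.
    const timer = setTimeout(() => {
      child.kill("SIGTERM");
      reject(new Error(`claude timed out after ${timeoutMs} ms`));
    }, timeoutMs);

    child.stdout.on("data", (chunk) => { stdout += chunk; });
    child.stderr.on("data", (chunk) => { stderr += chunk; });
    child.on("error", (err) => { clearTimeout(timer); reject(err); });
    child.on("close", (code) => {
      clearTimeout(timer);
      if (code === 0) resolve(stdout);
      else reject(new Error(`claude exited with code ${code}: ${stderr}`));
    });

    // Assumption: the prompt is piped on stdin rather than passed as an argument.
    child.stdin.end(prompt);
  });
}
```

Keeping the argument list in a separate pure function makes the per-phase limits easy to check without launching a subprocess.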

2. Architecture Diagram

Current two-phase pipeline (live) and planned future phases, in execution order.

Current pipeline (live): two-phase triage
  1. Jira Webhook: POST /webhooks/jira
  2. Express: port 9200
  3. Phase A, Classify: ~6s, max-turns 3
  4. Mem0 Search: prior context
  5. Phase B, Investigate: ~4 min, max-turns 25
  6. Jira Comment: REST API
  7. Mem0 Store: learning loop

Planned fix pipeline: post-approval automation
  1. Phase B Output: triage report
  2. Approval Gate: Slack approve/reject
  3. DevPlanner: subagent
  4. DevExecutor: subagent
  5. Pull Request: GitHub

Shared infrastructure
  Qdrant: localhost:6333, vector store
  Ollama: localhost:11434, bge-m3 embeddings
  Jira REST API: comments, transitions
  Claude Code CLI: spawned as subprocess
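
The live pipeline can be read as a short sequence of asynchronous steps. The sketch below is only an illustration of that ordering: every function in deps is an injected placeholder, and the argument shapes are assumptions rather than the real signatures in src/index.js.

```javascript
// Hypothetical orchestration of the live pipeline. The functions in `deps`
// are injected stand-ins, not the actual DevAgent module API.
async function handleTicket(ticket, deps) {
  // Phase A: fast classification (client, repo, issue type, severity).
  const classification = await deps.classifyTicket(ticket);

  // Mem0 search: prior context scoped to the classified client.
  const memories = await deps.searchMemories(ticket.summary, classification.client_id);

  // Phase B: deep investigation with any memories injected as context.
  // Per this guide, storing new findings in Mem0 is driven by the Phase B
  // prompt itself, so there is no separate store step here.
  const report = await deps.investigateTicket(ticket, classification, memories);

  // Post the structured report back to the originating ticket.
  await deps.postComment(ticket.ticketId, report);

  return { classification, report };
}
```

Injecting the steps keeps the ordering testable with stubs, without Jira, Qdrant, or the CLI being available.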

3. Source Files

Located at /root/repos/datastudios-dev-agent/ on Hetzner.

File | Purpose
src/index.js | Express app and orchestration. Entry point with the classifyTicket(), investigateTicket(), and handleTicket() flow. Also contains test endpoints (/test/classify, /test/triage, /health).
src/triage-ai.js | Prompt builders: buildClassifyPrompt() (Phase A: slim client registry, returns JSON), buildInvestigationPrompt() (Phase B: full prompt with Mem0 context, data assets, migration awareness), buildTriagePrompt() (legacy single-phase).
src/core/session-manager.js | Spawns the claude CLI as subprocesses via child_process.spawn(). Manages session state in /tmp/triage-agent/sessions/. 30-minute timeout. Handles Phase A (claude --print --max-turns 3) and Phase B (claude --print --agent triage --max-turns 25).
src/core/client-registry.js | Loads and caches config/clients.json. Provides client lookup by Jira project key, email domain, or client ID.
src/mem0-client.js | Searches Qdrant directly using Ollama bge-m3 embeddings. searchMemories() queries with a user_id: devagent-{client_id} filter. buildStoreInstruction() appends learning-loop instructions to the triage prompt.
src/jira-updater.js | Posts ADF-formatted comments and transitions issues via the Jira REST API. Used after Phase B to post the investigation report.
src/webhook-listener.js | Express router for Jira webhooks at POST /webhooks/jira. Validates the shared secret header and extracts ticket data from the webhook payload.
config/clients.json | Client registry: maps clients to repos, functions, data assets, and external APIs. See Section 5 for schema details.
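
To make the webhook-listener row more concrete, here is a minimal sketch of the two things it is described as doing: checking a shared secret and extracting ticket fields. The function names are hypothetical, and the guide does not name the secret header, so wiring this into an Express route is left out. The payload paths (issue.key, issue.fields.summary, issue.fields.reporter.emailAddress) follow the standard Jira issue webhook shape.

```javascript
const crypto = require("node:crypto");

// Constant-time comparison of the shared secret. Hashing both values first
// yields equal-length buffers, which crypto.timingSafeEqual requires.
function isValidSecret(provided, expected) {
  if (!expected) return false; // refuse to run with an unset secret
  const digest = (value) =>
    crypto.createHash("sha256").update(String(value ?? "")).digest();
  return crypto.timingSafeEqual(digest(provided), digest(expected));
}

// Pull the fields the pipeline needs out of a Jira issue webhook payload.
// The returned keys mirror the test endpoint payloads in this guide.
function extractTicket(payload) {
  const issue = payload?.issue;
  if (!issue?.key) throw new Error("Webhook payload has no issue key");
  const fields = issue.fields ?? {};
  return {
    ticketId: issue.key,
    summary: fields.summary ?? "",
    description: fields.description ?? "",
    reporter: fields.reporter?.emailAddress ?? null,
  };
}
```

In a route handler, a failed isValidSecret check would typically end the request with a 401 before extractTicket is called.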

4. Subagent Definitions

Located at /root/.claude/agents/ on Hetzner (symlinked from datastudios-ops/agents/).

triage.md (Phase B)

Read-only investigator. Explores the target repo, reads CLAUDE.md, examines code, and produces a structured triage report. Has a Data Investigation Protocol for pipeline issues (reads dbt models, checks OData sources, verifies field population).

Model | claude-sonnet-4-6
Max turns | 25
Tools | Read, Glob, Grep, Bash, Agent
Output | Structured report: TRIAGE, ROOT CAUSE, AFFECTED FILES, PROPOSED FIX, COMPLEXITY, CONFIDENCE

dev-planner.md (Planned)

Read-only planner. Produces implementation plans with affected files, ordered steps, downstream impact analysis, and test requirements. Has a Data Enhancement Protocol for pipeline schema changes.

Model | claude-sonnet-4-6
Tools | Read, Glob, Grep, Bash, Agent, WebFetch
Output | Implementation plan with files, steps, impact, tests

dev-executor.md (Planned)

Write-capable executor. Creates feature branches, implements changes, runs tests, and commits. Uses isolation: worktree for safe, isolated execution.

Model | claude-sonnet-4-6
Tools | Read, Write, Edit, Glob, Grep, Bash, Agent
Isolation | Git worktree (creates feature branch)

5. Client Registry

config/clients.json maps clients to repos, functions, data assets, and external APIs. Phase A uses a slim version (no functions metadata) for fast classification. Phase B gets the full registry.

    {
      "clients": [
        {
          "client_id": "hmr-designs",
          "display_name": "HMR Designs",
          "jira_project": "HD",  // ①
          "email_domains": ["hmrdesigns.com"],
          "repos": [
            {
              "name": "hmr-aws-lambda-functions",
              "path": "/root/repos/hmr-aws-...",
              "tech_stack": ["Python", "AWS Lambda"],
              "key_paths": { "lambdas": "src/lambdas/", "shared": "src/shared/" },
              "functions": [...]  // ②
            }
          ],
          "external_apis": {  // ③
            "nutshell": {
              "base_url": "https://app.nutshell.com/...",
              "auth_type": "basic",
              "investigation_guide": "..."
            }
          },
          "data_assets": { ... }  // ④
        }
      ]
    }

① Jira Project Key

Used to route incoming webhooks to the correct client. Phase A matches the ticket's project key against this field.

② Functions Array

For multi-function repos (e.g., 20 Lambdas in one repo). Each function has name, path, description, and example_requests. Stripped in the slim registry for Phase A speed.

③ External APIs

API credentials and investigation guides for external systems (e.g., Nutshell CRM). Includes auth type, methods, and step-by-step investigation playbooks.

④ Data Assets

Source systems, database schemas, and key tables. Used for data-type tickets. Includes legacy flags and migration targets.
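
The slim registry used by Phase A can be pictured as the full registry with each repo's functions array removed. The sketch below assumes exactly that and nothing more; toSlimRegistry is a hypothetical name, and whether the real slim version drops additional fields is not stated in this guide.

```javascript
// Return a copy of the registry without per-repo `functions` metadata.
// The input object is left untouched.
function toSlimRegistry(registry) {
  return {
    clients: registry.clients.map((client) => ({
      ...client,
      // Rest-destructuring drops `functions` and keeps every other repo field.
      repos: (client.repos ?? []).map(({ functions, ...repo }) => repo),
    })),
  };
}

// Usage with a trimmed-down registry entry.
const fullRegistry = {
  clients: [
    {
      client_id: "hmr-designs",
      jira_project: "HD",
      repos: [
        {
          name: "hmr-aws-lambda-functions",
          tech_stack: ["Python", "AWS Lambda"],
          functions: [{ name: "example-function" }], // illustrative entry
        },
      ],
    },
  ],
};
const slimRegistry = toSlimRegistry(fullRegistry);
```

Because the copy is shallow apart from the repos array, the full registry can still be handed to Phase B unchanged.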

Currently Configured Clients

Client | Jira Key | Repos
datastudios | DAT | lead-generator, article-writer, contactos
creme-collective | CC | creme-analytics (legacy), creme-report-automation, creme-elt
hmr-designs | HD | hmr-aws-lambda-functions (20 Lambdas), hmr-nutshell-integration (8 Lambdas)
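
Client lookup by Jira project key, email domain, or client ID could be built on a few maps over the loaded registry. This is a sketch under that assumption; createClientLookup and its method names are hypothetical and do not reflect the real client-registry.js interface.

```javascript
// Build lookup helpers over a loaded registry object.
function createClientLookup(registry) {
  const byId = new Map();
  const byProject = new Map();
  const byDomain = new Map();

  for (const client of registry.clients) {
    byId.set(client.client_id, client);
    if (client.jira_project) byProject.set(client.jira_project.toUpperCase(), client);
    for (const domain of client.email_domains ?? []) {
      byDomain.set(domain.toLowerCase(), client);
    }
  }

  return {
    byClientId: (id) => byId.get(id) ?? null,
    // Accepts a project key ("HD") or a full ticket key ("HD-92").
    byTicketKey: (key) =>
      byProject.get(String(key).split("-")[0].toUpperCase()) ?? null,
    // Matches on the part of the address after the last "@".
    byEmail: (email) =>
      byDomain.get(String(email).split("@").pop().toLowerCase()) ?? null,
  };
}

const lookup = createClientLookup({
  clients: [
    { client_id: "datastudios", jira_project: "DAT" },
    { client_id: "hmr-designs", jira_project: "HD", email_domains: ["hmrdesigns.com"] },
  ],
});
```

With this shape, lookup.byTicketKey("HD-92") and lookup.byEmail("user@hmrdesigns.com") both resolve to the hmr-designs entry, and unknown keys return null.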

6. How It Works

End-to-end flow from ticket creation to Jira comment.

  1. Ticket Created
    Jira webhook fires to POST /webhooks/jira. The webhook listener validates the shared secret header and extracts ticket ID, summary, description, and reporter email from the payload.
  2. Phase A — Classify
    Runs claude --print --max-turns 3 with a slim client registry (no functions metadata). Returns JSON: {client_id, repo, issue_type, severity, summary}. The issue_type determines which investigation protocol Phase B uses (bug, data, feature, infra).
    Typical duration: ~6 seconds.
  3. Mem0 Search
    Embeds the ticket summary via Ollama bge-m3, searches Qdrant for prior context using user_id: devagent-{client_id} filter. If relevant memories exist, they are injected into the Phase B prompt as additional context.
  4. Phase B — Investigate
    Runs claude --print --agent triage --max-turns 25 with cwd set to the target repo path from Phase A. The triage agent reads CLAUDE.md, explores code with Glob/Grep/Read, and produces a structured report with: triage summary, root cause analysis, affected files, proposed fix, complexity rating, and confidence level. For data-type issues, follows the Data Investigation Protocol (dbt models, OData sources, field population checks).
    Typical duration: ~2–4 minutes.
  5. Post to Jira
    Adds the investigation report as an ADF-formatted comment on the original ticket. The comment includes all sections of the structured report (typically 3,000–5,000 characters).
  6. Learning Loop
    The triage agent's prompt includes instructions to store findings in Mem0 for future reference. Key findings (root causes, patterns, recurring issues) are embedded and stored in Qdrant for retrieval on subsequent tickets.
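
Step 5 above posts the report as an ADF (Atlassian Document Format) comment. The sketch below shows one plausible way to wrap plain report text in a minimal ADF document and send it to the Jira Cloud REST API v3 comment endpoint. The function names, the blank-line paragraph splitting, and the email plus API token basic auth are assumptions; the guide does not describe how jira-updater.js formats the report or authenticates.

```javascript
// Wrap plain text in a minimal ADF document: one paragraph per
// blank-line-separated block, with hardBreak nodes between lines.
function toAdf(reportText) {
  const blocks = reportText
    .split(/\n\s*\n/)
    .map((block) => block.trim())
    .filter(Boolean);

  return {
    version: 1,
    type: "doc",
    content: blocks.map((block) => ({
      type: "paragraph",
      content: block
        .split("\n")
        .map((line) => line.trim())
        .filter(Boolean)
        .flatMap((line, index) =>
          index === 0
            ? [{ type: "text", text: line }]
            : [{ type: "hardBreak" }, { type: "text", text: line }]
        ),
    })),
  };
}

// Hypothetical sender. `baseUrl`, `email`, and `apiToken` are placeholders
// supplied by the caller; global fetch is available in Node.js 18+.
async function postComment({ baseUrl, email, apiToken }, ticketId, reportText) {
  const auth = Buffer.from(`${email}:${apiToken}`).toString("base64");
  const url = `${baseUrl}/rest/api/3/issue/${encodeURIComponent(ticketId)}/comment`;
  const res = await fetch(url, {
    method: "POST",
    headers: { Authorization: `Basic ${auth}`, "Content-Type": "application/json" },
    body: JSON.stringify({ body: toAdf(reportText) }),
  });
  if (!res.ok) throw new Error(`Jira comment failed: HTTP ${res.status}`);
  return res.json();
}
```

A richer conversion could map report section titles to ADF heading nodes, but plain paragraphs are enough to show the document shape.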

7. Infrastructure

All components run on the Hetzner AX41 server.

Component | Details
Service | systemctl status devagent (runs as devagent user, not root)
Port | 9200
Repo path | /root/repos/datastudios-dev-agent/
Session files | /tmp/triage-agent/sessions/
Claude auth | /home/devagent/.claude/.credentials.json (OAuth, refreshed every 4h via cron)
Token refresh | /root/scripts/refresh-claude-token.sh, cron every 4 hours. OAuth endpoint: https://platform.claude.com/v1/oauth/token
AWS access | /home/devagent/.aws/ (hmr-designs profile for CloudWatch)
Mem0 (Qdrant) | localhost:6333
Mem0 (Ollama) | localhost:11434 (bge-m3 embedding model)
Agent definitions | /root/.claude/agents/ (symlinked from datastudios-ops/agents/)
Logs | journalctl -u devagent -f

8. Test Endpoints

Use these endpoints to test classification and triage without triggering real Jira webhooks.

Health Check
    curl http://localhost:9200/health
Returns JSON with status, uptime, and version.

Classify Only
    curl -X POST \
      http://localhost:9200/test/classify \
      -H 'Content-Type: application/json' \
      -d '{"ticketId":"HD-92", "summary":"Nutshell not converting", "description":"..."}'
Returns Phase A classification JSON. ~6 seconds.

Full Triage
    curl -X POST \
      http://localhost:9200/test/triage \
      -H 'Content-Type: application/json' \
      -d '{"ticketId":"HD-92", "summary":"Nutshell not converting", "description":"...", "reporter":"user@hmrdesigns.com"}'
Asynchronous: runs Phase A and Phase B, then posts the result to Jira. Check Jira for the triage comment.
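
When scripting against /test/classify, it can help to validate the classification before relying on it. The sketch below checks for the five keys and four issue types named in Section 6. The name parseClassification, the assumption that every value is a non-empty string, and the tolerance for prose around the JSON are illustrative choices, not documented behavior.

```javascript
const ISSUE_TYPES = ["bug", "data", "feature", "infra"];
const REQUIRED_KEYS = ["client_id", "repo", "issue_type", "severity", "summary"];

// Parse and validate Phase A output. Throws on anything unexpected.
function parseClassification(output) {
  // The model may wrap the JSON in prose, so take the outermost {...} span.
  const start = output.indexOf("{");
  const end = output.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in classifier output");
  }
  const parsed = JSON.parse(output.slice(start, end + 1));

  for (const key of REQUIRED_KEYS) {
    if (typeof parsed[key] !== "string" || parsed[key] === "") {
      throw new Error(`Classification is missing "${key}"`);
    }
  }
  if (!ISSUE_TYPES.includes(parsed.issue_type)) {
    throw new Error(`Unknown issue_type: ${parsed.issue_type}`);
  }
  return parsed;
}
```

Failing loudly here keeps a malformed classification from selecting the wrong repo or investigation protocol in Phase B.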

9. Known Gaps & Next Steps

Roadmap with Jira ticket references. Items are ordered by priority.

Item | Status | Ticket
Two-phase triage (classify → investigate) | Done | DAT-38
Subagent definitions (triage, dev-planner, dev-executor) | Done | DAT-46
HMR test case (Nutshell event creation, HD-92) | Done | HD-92
Creme test case (data investigation, DSS-2) | Done | DSS-2
Architecture & usage guide | In Progress | DAT-47
Repo rename (triage-agent → dev-agent) | To Do | DAT-48
Approval Gate (Slack approve/reject before fix) | To Do | DAT-43
Fix on approval (dev-executor integration) | To Do | DAT-44
HMR COB hours test case | To Do | DAT-42
Meeting-prep DevAgent handoff (Step D9) | To Do | DAT-45
Mem0 historical context (seed with past investigations) | To Do | DAT-41
Email → Jira pipeline (support@datastudios.ai auto-creates tickets) | Not Filed | -
Live data access (AWS CLI, external APIs) from triage agent | Not Filed | -

10. Lessons Learned

Hard-won findings from building and testing DevAgent. Read these before making changes.

Non-root is mandatory
Claude Code blocks --dangerously-skip-permissions when running as root. The service must run as a non-root user (devagent). This required creating the user, copying Claude auth + AWS creds, and updating the systemd unit file.
OAuth tokens expire
The Claude OAuth token expires periodically. A cron job runs /root/scripts/refresh-claude-token.sh every 4 hours to refresh it. The token is stored at /home/devagent/.claude/.credentials.json.
Session file ownership
Session files in /tmp/triage-agent/sessions/ created by a previous user (e.g., root) cause EACCES errors after switching to the devagent user. Clean with rm -rf /tmp/triage-agent after user changes.
Phase A must be fast
Phase A classification must complete in <10 seconds. Use a slim client registry (no functions metadata) to keep the prompt small. Current measured time: ~6 seconds with max-turns: 3.
Phase B max-turns: 25
The original max-turns: 10 was too low for deep investigation: the agent ran out of turns before it had fully explored the codebase. 25 is the sweet spot, completing in ~2–4 minutes.
Code ≠ live data
The triage agent found a real code bug (validation gap) but missed the primary issue (API instability) because it couldn’t query CloudWatch or the Nutshell API. Live data access is the biggest remaining gap.
Multiple investigation layers
In the HD-92 test, three independent investigations (DevAgent, CloudWatch analysis, developer review) each found real issues at different layers. DevAgent is complementary to human debugging, not a replacement.
Nutshell API reference
An LLM-friendly API reference at docs/NUTSHELL_API_REFERENCE.md in the hmr-nutshell-integration repo enables the triage agent to understand external API interactions. Consider adding similar references for other external APIs.

11. Mem0 Memory Lifecycle

How DevAgent stores, retrieves, and expires memories across triage runs.

Memory Types

Type | Metadata | Expiration | Example
Permanent learning | type: "permanent_learning" | Never | Nutshell API intermittently returns non-JSON responses; check CloudWatch first
Known bug | type: "known_bug", jira_ticket: "HD-93" | When Jira ticket is Done/Closed | start_end_valid validation fails silently when Event End Date is not set

user_id Convention

Client | user_id
HMR Designs | devagent-hmr-designs
Creme Collective | devagent-creme-collective
DataStudios | devagent-datastudios
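
The user_id convention maps directly onto a Qdrant payload filter. The sketch below embeds a ticket summary with Ollama's /api/embeddings endpoint and queries Qdrant's points/search endpoint with that filter. The collection name "mem0", the result limit, and the function names are assumptions for illustration; the actual mem0-client.js may differ.

```javascript
const OLLAMA_URL = "http://localhost:11434";
const QDRANT_URL = "http://localhost:6333";
const COLLECTION = "mem0"; // hypothetical collection name

// devagent-{client_id}, per the convention table above.
function userIdFor(clientId) {
  return `devagent-${clientId}`;
}

// Qdrant search body restricted to one client's memories.
function buildSearchBody(vector, clientId, limit = 5) {
  return {
    vector,
    limit,
    with_payload: true,
    filter: { must: [{ key: "user_id", match: { value: userIdFor(clientId) } }] },
  };
}

async function postJson(url, body) {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`${url} failed: HTTP ${res.status}`);
  return res.json();
}

// Embed the summary with bge-m3, then search Qdrant for similar memories.
async function searchMemories(summary, clientId) {
  const { embedding } = await postJson(`${OLLAMA_URL}/api/embeddings`, {
    model: "bge-m3",
    prompt: summary,
  });
  const { result } = await postJson(
    `${QDRANT_URL}/collections/${COLLECTION}/points/search`,
    buildSearchBody(embedding, clientId)
  );
  return result; // array of { id, score, payload }
}
```

Scoping every query by user_id keeps one client's findings from leaking into another client's Phase B prompt.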

Retrieval Rules (Ticket-Linked Memories)

  1. Search Mem0
    After Phase A classification, searchMemories() queries Qdrant with the ticket summary and user_id: devagent-{client_id}.
  2. Check metadata
    For each returned memory, read metadata.type. If permanent_learning, include it directly. If known_bug, proceed to step 3.
  3. Verify Jira status
    Read metadata.jira_ticket and check its status via Jira REST API. If the ticket is Done or Closed, skip the memory or flag it as resolved. If Open or In Progress, inject it into the Phase B prompt normally.
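
The three retrieval steps above amount to a filter over the search results. The sketch below assumes the Jira statuses have already been fetched into a plain object keyed by ticket ID, and it implements the "skip" option for resolved bugs. The name selectMemories and the decision to drop memories with an unknown type or unknown ticket status are assumptions of this sketch.

```javascript
const RESOLVED_STATUSES = new Set(["Done", "Closed"]);

// `memories`: search hits carrying a `metadata` object.
// `statusByTicket`: e.g. { "HD-93": "In Progress" }, fetched beforehand.
function selectMemories(memories, statusByTicket) {
  return memories.filter((memory) => {
    const { type, jira_ticket: ticket } = memory.metadata ?? {};

    // Permanent learnings are always included.
    if (type === "permanent_learning") return true;

    // Known bugs are included only while their ticket is still unresolved.
    if (type === "known_bug") {
      const status = statusByTicket[ticket];
      return status !== undefined && !RESOLVED_STATUSES.has(status);
    }

    // Unknown memory types are dropped in this sketch.
    return false;
  });
}
```

For example, with statuses { "HD-93": "Done" }, a known_bug memory tagged HD-93 is filtered out while any permanent_learning memory is kept.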

Future: Automated Cleanup (Option 4)

When a Jira ticket transitions to Done, a webhook can trigger automatic Mem0 cleanup — updating or deleting memories tagged with that ticket ID. This automates the retrieval-time check above and removes stale context proactively. Not yet implemented.
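
Since this cleanup is not yet implemented, the following is only a sketch of how a Done transition might translate into a Qdrant delete-by-filter request. It assumes the memory metadata (type, jira_ticket) is stored as top-level payload keys, that the collection is named "mem0", and that the webhook payload carries issue.fields.status.name; cleanupRequestFor is a hypothetical name.

```javascript
// Return a description of the Qdrant request to send, or null when the
// event should not trigger cleanup.
function cleanupRequestFor(event, collection = "mem0") {
  const ticket = event?.issue?.key;
  const status = event?.issue?.fields?.status?.name;
  if (!ticket || status !== "Done") return null;

  return {
    method: "POST",
    url: `http://localhost:6333/collections/${collection}/points/delete`,
    body: {
      // Only known_bug memories tied to this ticket are removed;
      // permanent learnings are never matched by this filter.
      filter: {
        must: [
          { key: "type", match: { value: "known_bug" } },
          { key: "jira_ticket", match: { value: ticket } },
        ],
      },
    },
  };
}
```

Returning a request description instead of sending it directly makes the decision logic easy to verify before any memory is actually deleted.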