Most AI coding tools optimize for one agent writing code faster. Voss optimizes for verified parallel engineering: multiple agents, declared roles, independent review, hard budgets, scoped tools, and a replayable audit.
Engineering Manager loop
The Engineering Manager loop is the orchestrator: a constrained tech lead that decomposes, delegates, verifies, and integrates — and asks for you only when it matters.
The EM converts one human idea into bounded work cards with acceptance criteria.
Each card is routed to a declared role from the team roster, with a recorded rationale.
Budget and scope fan out down the session tree. No child can overspend its parent.
Workers run concurrently inside their scope and within WIP limits, wherever it is safe to do so.
Reviewer-A authors the verification bar from the original idea — not the EM's summary.
Reviewer-B judges the diff narrative-blind and can fail idea-divergent work.
Unverified or out-of-scope work is blocked with a reason. Clean work is integrated.
A replayable audit report is produced. Humans sign off only at meaningful moments.
External agents
Voss can supervise agents it launches more tightly than tools it only observes. For adopted terminal agents, the product direction is deliberately honest: give them clear identity, ownership signals, review requests, and handoff conventions, while avoiding claims that Voss can sandbox tools outside its control.
The result is a practical coordination layer for mixed local work: Voss-native runs, external CLI agents, and human review can share the same board, attention model, and audit expectations without collapsing into hidden chat history.
Roles
The default roster ships with architect, backend, frontend, tester, reviewer, skeptic, and docs. Each role carries its own scope, budget, tool subset, and model tier. The EM can only dispatch to declared roles — and voss team check fails the build if a role widens scope or names an unknown capability.
team "default" {
  ceiling {
    budget: 120000 tokens
    scope: ["src/**", "tests/**", "docs/**"]
    latency: 30m
  }
  principles {
    diff: "Smallest diff that solves it"
    evidence: "No claim without evidence"
  }
  role architect {
    model: "strong"
    mode: "plan"
    scope: ["src/**", "docs/**"]
    tools: ["fs", "code", "git"]
    budget: 12000 tokens
  }
  role backend {
    model: "cheap"
    mode: "edit"
    scope: ["src/server/**", "tests/server/**"]
    tools: ["fs", "code", "test", "git"]
    budget: 24000 tokens
  }
  role reviewer {
    model: "strong"
    mode: "plan"
    scope: ["src/**", "tests/**"]
    tools: ["fs", "code", "test", "git"]
    budget: 16000 tokens
  }
}
Board
Work moves across a board with WIP limits and transition gates. Agents cannot mark their own work Done. Every blocked card carries a reason.
Backlog: Raw ideas, not yet scoped.
Planned: EM has authored acceptance criteria + role.
InProgress: Scope and budget allocated; worker running.
InReview: Artifact exists; awaiting independent review.
Blocked: Timeout, budget, scope error, or reviewer block.
Done: Tests pass and independent review passed.
| From | To | Gate |
|---|---|---|
| Backlog | Planned | EM creates acceptance criteria + role assignment |
| Planned | InProgress | Scope and budget allocated |
| InProgress | InReview | Artifact exists |
| InReview | Done | Tests/evals pass AND independent review passes |
| Any | Blocked | Timeout, budget, scope error, reviewer block, or human decision |
| Blocked | Planned | EM rescope or human approval |
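The transition table can be read as a small state machine. The Python sketch below is one way to model it under stated assumptions: Card, move, and all field names are hypothetical, and only the gates that are easy to express as data are checked (the reason on Blocked, and tests plus a non-author review for Done). The remaining gates, such as acceptance criteria and scope allocation, are left out for brevity.

```python
from dataclasses import dataclass

# Allowed moves, copied from the transition table. "Any -> Blocked" is handled in move().
TRANSITIONS = {
    ("Backlog", "Planned"),
    ("Planned", "InProgress"),
    ("InProgress", "InReview"),
    ("InReview", "Done"),
    ("Blocked", "Planned"),
}


@dataclass
class Card:
    """Hypothetical work card; field names are illustrative only."""
    title: str
    state: str = "Backlog"
    author: str = ""            # agent that produced the artifact
    tests_pass: bool = False
    review_passed_by: str = ""  # reviewer identity, empty until a review passes
    blocked_reason: str = ""


def move(card: Card, to: str, reason: str = "") -> None:
    """Apply one transition, or raise ValueError if it is not allowed."""
    if to == "Blocked":
        # Any state may move to Blocked, but never without a reason.
        if not reason:
            raise ValueError("every blocked card must carry a reason")
        card.state, card.blocked_reason = "Blocked", reason
        return
    if (card.state, to) not in TRANSITIONS:
        raise ValueError(f"no transition {card.state} -> {to}")
    if to == "Done":
        if not card.tests_pass:
            raise ValueError("tests/evals have not passed")
        # Agents cannot mark their own work Done: the reviewer must differ from the author.
        if not card.review_passed_by or card.review_passed_by == card.author:
            raise ValueError("Done requires a passing review by someone other than the author")
    card.state, card.blocked_reason = to, ""
```

With this shape, a card authored by "backend" cannot reach Done while review_passed_by is empty or also "backend", which mirrors the rule that agents cannot mark their own work Done.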
Reviewer-A derives the verification bar from your original idea — not the EM's acceptance criteria — and authors the tests, evals, or checklist. Worker agents cannot author their own final gate.
Reviewer-B judges the artifact, diff, and tests narrative-blind, and can fail work when Reviewer-A's verification has drifted from the original idea. Verdicts carry confidence, evidence references, and an inferred domain.
Every agent and subagent is a first-class recorded node with its own budget, scope, status, and artifacts. The invariant is hard: sum(child budgets) + reserve ≤ parent budget.
Budget is treated as a security boundary, not telemetry. Rejected budget-raise attempts are recorded, and failed, killed, or timed-out children still reach a terminal state — so a run reconstructs without reading the chat transcript.
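To make the budget invariant concrete, here is a minimal Python sketch of a session tree node that refuses any allocation that would break sum(child budgets) + reserve ≤ parent budget, and records each refusal. The Session class, its methods, and the example numbers are assumptions for illustration, not Voss's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical session-tree node with a token budget."""
    name: str
    budget: int                  # tokens granted to this node
    reserve: int = 0             # tokens the node keeps for itself
    children: list["Session"] = field(default_factory=list)
    rejected: list[str] = field(default_factory=list)  # audit trail of refused requests

    def allocated(self) -> int:
        """Total budget currently handed to direct children."""
        return sum(child.budget for child in self.children)

    def spawn(self, name: str, budget: int, reserve: int = 0) -> "Session | None":
        """Create a child only if the invariant still holds afterwards."""
        if budget < 0 or self.allocated() + budget + self.reserve > self.budget:
            self.rejected.append(f"spawn {name} with {budget} tokens")
            return None
        child = Session(name, budget, reserve)
        self.children.append(child)
        return child

    def raise_budget(self, child: "Session", extra: int) -> bool:
        """Grant extra tokens to a direct child if the invariant survives; record refusals."""
        is_child = any(c is child for c in self.children)
        if not is_child or extra <= 0 or self.allocated() + extra + self.reserve > self.budget:
            self.rejected.append(f"raise {child.name} by {extra} tokens")
            return False
        child.budget += extra
        return True
```

For example, a root with a budget of 120000 tokens and a hypothetical reserve of 10000 can spawn a 24000-token child, but a further request for 100000 tokens is refused and logged in rejected, because 24000 + 100000 + 10000 exceeds 120000. Since every child applies the same rule to its own children, no descendant can outspend an ancestor.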
The cage
Autonomy is bounded by invariants the runtime enforces — not by trust in the model.