Voss
Orchestration

A controlled AI engineering organization.

Most AI coding tools optimize one agent writing code faster. Voss optimizes verified parallel engineering: multiple agents, declared roles, independent review, hard budgets, scoped tools, and a replayable audit.

Engineering Manager loop

One idea in. Audited work out.

The Engineering Manager loop is the orchestrator: a constrained tech lead that decomposes, delegates, verifies, and integrates — and asks for you only when it matters.

01

Scope into cards

The EM converts one human idea into bounded work cards with acceptance criteria.

02

Assign roles

Each card is routed to a declared role from the team roster, with a recorded rationale.

03

Partition budget

Budget and scope fan out down the session tree. No child can overspend its parent.

04

Execute in parallel

Workers run concurrently inside their scope and within WIP limits, wherever parallel execution is safe.

05

Verify continuously

Reviewer-A authors the verification bar from the original idea — not the EM's summary.

06

Review independently

Reviewer-B judges the diff narrative-blind and can fail idea-divergent work.

07

Block or integrate

Unverified or out-of-scope work is blocked with a reason. Clean work is integrated.

08

Audit and sign off

A replayable audit report is produced. Humans sign off only at meaningful moments.
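Here is a minimal Python sketch of steps 01, 02, and 07. It assumes a work card is a plain record that keeps the original idea verbatim. WorkCard, assign_role, and block_or_integrate are hypothetical names used for illustration, not Voss's API. The role set copies the default roster described under Roles.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical: role names taken from the default roster on this page.
DECLARED_ROLES = {"architect", "backend", "frontend", "tester", "reviewer", "skeptic", "docs"}


@dataclass
class WorkCard:
    idea: str                       # the original human idea, kept verbatim for Reviewer-A
    title: str
    acceptance_criteria: list[str]  # step 01: bounded, checkable criteria
    role: str | None = None
    rationale: str | None = None    # step 02: recorded reason for the routing
    status: str = "Backlog"
    block_reason: str | None = None


def assign_role(card: WorkCard, role: str, rationale: str) -> None:
    """Step 02: route a card to a declared role, recording why."""
    if role not in DECLARED_ROLES:
        raise ValueError(f"unknown role {role!r}: not in the declared roster")
    if not card.acceptance_criteria:
        raise ValueError("a card needs acceptance criteria before it is planned")
    card.role, card.rationale, card.status = role, rationale, "Planned"


def block_or_integrate(card: WorkCard, verified: bool, in_scope: bool) -> str:
    """Step 07: block with a reason, or integrate clean work."""
    if not in_scope:
        card.status, card.block_reason = "Blocked", "artifact touches files outside scope"
    elif not verified:
        card.status, card.block_reason = "Blocked", "verification bar not met"
    else:
        card.status, card.block_reason = "Done", None
    return card.status


card = WorkCard(
    idea="Add rate limiting to the public API",
    title="Rate-limit middleware",
    acceptance_criteria=["returns 429 once the limit is hit", "limit is configurable"],
)
assign_role(card, "backend", "server-side change under src/server")
print(block_or_integrate(card, verified=False, in_scope=True))  # Blocked
```

The card keeps the untouched idea next to the EM's criteria. That lets a later reviewer judge against the idea itself rather than against the EM's restatement of it.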

External agents

Handoffs without false control.

Voss can supervise agents it launches more tightly than tools it only observes. For adopted terminal agents, the product direction is deliberately honest: give them clear identity, ownership signals, review requests, and handoff conventions, while avoiding claims that Voss can sandbox tools outside its control.

The result is a practical coordination layer for mixed local work: Voss-native runs, external CLI agents, and human review can share the same board, attention model, and audit expectations without collapsing into hidden chat history.

Roles

A roster, declared in .voss.

The default roster ships with architect, backend, frontend, tester, reviewer, skeptic, and docs. Each role carries its own scope, budget, tool subset, and model tier. The EM can only dispatch to declared roles — and voss team check fails the build if a role widens scope or names an unknown capability.

team "default" {
  ceiling {
    budget: 120000 tokens
    scope: ["src/**", "tests/**", "docs/**"]
    latency: 30m
  }

  principles {
    diff: "Smallest diff that solves it"
    evidence: "No claim without evidence"
  }

  role architect {
    model: "strong"
    mode:  "plan"
    scope: ["src/**", "docs/**"]
    tools: ["fs", "code", "git"]
    budget: 12000 tokens
  }

  role backend {
    model: "cheap"
    mode:  "edit"
    scope: ["src/server/**", "tests/server/**"]
    tools: ["fs", "code", "test", "git"]
    budget: 24000 tokens
  }

  role reviewer {
    model: "strong"
    mode:  "plan"
    scope: ["src/**", "tests/**"]
    tools: ["fs", "code", "test", "git"]
    budget: 16000 tokens
  }
}
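The kind of check voss team check performs can be approximated in a few lines. This is a hedged Python sketch, not the real command. It assumes the known capabilities are exactly the four tool names in the sample roster. It also approximates glob containment by asking whether some ceiling glob matches the role's glob as a literal string, which is a heuristic and not a complete containment test.

```python
from fnmatch import fnmatchcase

# Assumption: the known capabilities are the tool names used in the sample roster.
KNOWN_TOOLS = {"fs", "code", "test", "git"}


def within_ceiling(pattern: str, ceiling_scope: list[str]) -> bool:
    # Heuristic: a role glob stays inside the ceiling if some ceiling glob
    # matches it as a literal string (fnmatch's "*" also matches "/").
    return any(fnmatchcase(pattern, allowed) for allowed in ceiling_scope)


def team_check(ceiling_scope: list[str], roles: dict[str, dict]) -> list[str]:
    """Return the problems found; an empty list means the roster passes."""
    problems = []
    for name, role in roles.items():
        for pattern in role["scope"]:
            if not within_ceiling(pattern, ceiling_scope):
                problems.append(f"{name}: scope {pattern!r} widens the team ceiling")
        for tool in role["tools"]:
            if tool not in KNOWN_TOOLS:
                problems.append(f"{name}: unknown capability {tool!r}")
    return problems


ceiling = ["src/**", "tests/**", "docs/**"]
roles = {
    "architect": {"scope": ["src/**", "docs/**"], "tools": ["fs", "code", "git"]},
    "backend": {"scope": ["src/server/**", "tests/server/**"], "tools": ["fs", "code", "test", "git"]},
    "reviewer": {"scope": ["src/**", "tests/**"], "tools": ["fs", "code", "test", "git"]},
}
print(team_check(ceiling, roles))  # []
```

With the sample roster, every role passes. A hypothetical role scoped to infra/** that names a shell tool would produce two problems: one for widening scope and one for the unknown capability.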

Board

Orchestration you can watch.

Work moves across a board with WIP limits and transition gates. Agents cannot mark their own work Done. Every blocked card carries a reason.

Backlog

Raw ideas, not yet scoped.

Planned

EM has authored acceptance criteria + role.

InProgress

Scope and budget allocated; worker running.

InReview

Artifact exists; awaiting independent review.

Blocked

Timeout, budget, scope error, or reviewer block.

Done

Tests pass and independent review passed.

From         To           Gate
Backlog      Planned      EM creates acceptance criteria + role assignment
Planned      InProgress   Scope and budget allocated
InProgress   InReview     Artifact exists
InReview     Done         Tests/evals pass AND independent review passes
Any          Blocked      Timeout, budget, scope error, reviewer block, or human decision
Blocked      Planned      EM rescope or human approval
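The transition table reads naturally as a small state machine. The Python sketch below assumes cards are dictionaries of facts; field names such as worker, tests_pass, and review_passed are hypothetical. It enforces the gates above, requires a reason for every block, and refuses a Done transition requested by the card's own worker. WIP limits are left out for brevity.

```python
# Hypothetical encoding of the board; card fields are illustrative names.
GATES = {
    ("Backlog", "Planned"): lambda c: bool(c.get("acceptance_criteria")) and bool(c.get("role")),
    ("Planned", "InProgress"): lambda c: bool(c.get("scope")) and c.get("budget", 0) > 0,
    ("InProgress", "InReview"): lambda c: c.get("artifact") is not None,
    ("InReview", "Done"): lambda c: bool(c.get("tests_pass")) and bool(c.get("review_passed")),
    ("Blocked", "Planned"): lambda c: bool(c.get("rescoped")) or bool(c.get("human_approved")),
}


def transition(card: dict, to: str, actor: str) -> None:
    """Move a card only if the gate for (current state, target) holds."""
    frm = card["state"]
    if to == "Blocked":
        # Any -> Blocked is always allowed, but a blocked card must carry a reason.
        if not card.get("block_reason"):
            raise ValueError("a blocked card must carry a reason")
    else:
        gate = GATES.get((frm, to))
        if gate is None:
            raise ValueError(f"no transition {frm} -> {to}")
        if to == "Done" and actor == card.get("worker"):
            raise PermissionError("agents cannot mark their own work Done")
        if not gate(card):
            raise ValueError(f"gate {frm} -> {to} not satisfied")
    card["state"] = to
```

Transitions absent from the table, such as Backlog straight to InProgress, raise an error instead of silently succeeding.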

Independent review.

Reviewer-A derives the verification bar from your original idea — not the EM's acceptance criteria — and authors the tests, evals, or checklist. Worker agents cannot author their own final gate.

Reviewer-B judges the artifact, diff, and tests narrative-blind, and can fail work when Reviewer-A's verification has drifted from the original idea. Verdicts carry confidence, evidence references, and an inferred domain.
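The sketch below shows how the Done decision might combine the two reviewers. The Verdict fields mirror what the text says verdicts carry: confidence, evidence references, and an inferred domain. Three things are assumptions for illustration: the name done_allowed, the threshold parameter, and the rule that Reviewer-B must differ from both the worker and Reviewer-A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    reviewer: str
    passed: bool
    confidence: float          # assumed to be in [0, 1]
    evidence: tuple[str, ...]  # references to tests, diff hunks, or logs
    domain: str                # inferred domain, e.g. "backend"


def done_allowed(worker: str, gate_author: str, checks_pass: bool,
                 verdict: Verdict, threshold: float) -> bool:
    """Decide whether a card may move to Done under the two-reviewer scheme."""
    if gate_author == worker:
        return False  # workers cannot author their own final gate
    if verdict.reviewer in (worker, gate_author):
        return False  # assumption: Reviewer-B is distinct from both
    # Reviewer-A's checks and Reviewer-B's evidence-backed verdict must both pass.
    return (checks_pass and verdict.passed and bool(verdict.evidence)
            and verdict.confidence >= threshold)
```

A passing verdict with no evidence references, or with confidence below the threshold, is treated here as a failure rather than a pass.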


Session tree + budget fan-out.

Every agent and subagent is a first-class recorded node with its own budget, scope, status, and artifacts. The invariant is hard: sum(child budgets) + reserve ≤ parent budget.

Budget is treated as a security boundary, not telemetry. Rejected budget-raise attempts are recorded, and failed, killed, or timed-out children still reach a terminal state, so a run can be reconstructed without reading the chat transcript.
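The fan-out invariant can be enforced at the moment a child is spawned. This Python sketch assumes budgets are integer token counts and that each node keeps a reserve it never delegates. SessionNode and its methods are hypothetical. As a simplification, every mid-run raise is refused here, but each refusal is still logged.

```python
from dataclasses import dataclass, field

TERMINAL = {"succeeded", "failed", "killed", "timed_out"}


@dataclass
class SessionNode:
    name: str
    budget: int                 # tokens granted by the parent
    reserve: int = 0            # tokens this node keeps back and never delegates
    children: list["SessionNode"] = field(default_factory=list)
    status: str = "running"
    events: list[str] = field(default_factory=list)  # audit log for this node

    def allocated(self) -> int:
        return sum(child.budget for child in self.children) + self.reserve

    def spawn(self, name: str, budget: int, reserve: int = 0) -> "SessionNode":
        # Invariant: sum(child budgets) + reserve <= parent budget.
        if reserve > budget:
            raise ValueError("a child's reserve cannot exceed its own budget")
        remaining = self.budget - self.allocated()
        if budget > remaining:
            self.events.append(f"rejected spawn {name}: asked {budget}, remaining {remaining}")
            raise ValueError("budget cannot be oversold")
        child = SessionNode(name, budget, reserve)
        self.children.append(child)
        return child

    def request_raise(self, extra: int) -> None:
        # Simplification: every mid-run raise is refused, but the attempt is recorded.
        self.events.append(f"rejected budget raise of {extra} tokens")
        raise PermissionError("budget raises are refused mid-run")

    def finish(self, status: str) -> None:
        # Failed, killed, or timed-out children still reach a terminal state.
        if status not in TERMINAL:
            raise ValueError(f"{status!r} is not a terminal state")
        self.status = status
```

Take a 120000-token root holding a 20000-token reserve. Spawning architect (12000) and backend (24000) leaves 64000 tokens. A 70000-token child is then rejected, and the attempt lands in the root's event log.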


The cage

What the orchestrator cannot do.

Autonomy is bounded by invariants the runtime enforces — not by trust in the model.

The EM cannot invent roles outside the declared roster
Workers cannot write outside their assigned scope
Budget cannot be oversold — it is a security boundary
Agents cannot mark their own work Done
Done requires independent reviewer evidence
Ceiling, confidence threshold, and roster are immutable mid-run
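Two of these invariants are easy to express directly. The Python sketch below assumes run settings live in a frozen dataclass, so any mid-run assignment raises FrozenInstanceError. It also assumes scopes are fnmatch-style globs. RunConfig and may_write are hypothetical names. Paths are normalized first so that a ../ segment cannot escape the scope.

```python
import posixpath
from dataclasses import FrozenInstanceError, dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class RunConfig:
    ceiling_tokens: int
    confidence_threshold: float
    roster: tuple[str, ...]  # a tuple, so the roster cannot be edited in place


def may_write(path: str, scope: tuple[str, ...]) -> bool:
    """True only if the normalized path falls inside one of the scope globs."""
    normalized = posixpath.normpath(path)
    if normalized.startswith("/") or normalized == ".." or normalized.startswith("../"):
        return False  # absolute paths and escapes from the workspace are refused
    return any(fnmatchcase(normalized, pattern) for pattern in scope)


config = RunConfig(120000, 0.8, ("architect", "backend", "reviewer"))
try:
    config.confidence_threshold = 0.1  # a mid-run change attempt
except FrozenInstanceError:
    print("threshold is immutable for this run")
```

Normalizing before matching matters. Without it, src/server/../../etc/passwd would match the glob src/server/** even though it resolves outside that directory.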