
router/v1 — Intent Classification + Sink Routing

Status: Draft (design-locked, ready for first implementation) · Stability: v1 will be frozen with the first reference router (router-v1) · Implementations: in-tree only until v1 is frozen.

The router/v1 surface sits between asr/v1 (which produces transcribed text in partially-filled IntentEnvelopes) and the orchestrator (which fans envelopes out to sinks per sink/v1). The router does two jobs:

  1. Classify the envelope's intent (Intent.Kind, Intent.Confidence, optional Intent.Reasoning).
  2. Route the envelope to sinks (Routing.PrimarySink, Routing.AlsoTo, Routing.Suppress).

Optionally, the router also derives envelopes — splitting compound utterances, rewriting transcripts via coreference, emitting session summaries, and dropping noise.

This document is the contract. Routers that conform to it can be loaded by any version of the Vox core that supports router/v1.


Scope

router/v1 covers:

  • The router's input contract (partially-filled IntentEnvelope)
  • The classifier chain (pluggable: rules / local-model / LLM)
  • Per-stream history (shallow context window)
  • Routing rule format (declarative table + per-source overrides)
  • Suppress patterns (kill-switch list)
  • Bundled routing presets
  • Derived envelope rules (splitting, coreference, summaries)
  • Drop rules
  • Provenance stamping for derived envelopes
  • Router interface lifecycle
  • Error semantics
  • Audit hook contract
  • Versioning and stability rules

router/v1 does not cover:

  • Audio capture or transcription (capture/v1, asr/v1)
  • Speaker diarization (segment/v1)
  • Sink delivery (sink/v1)
  • The orchestrator that drives the pipeline (internal)
  • Predicate-based dynamic routing rules (deferred to v1.x — additive)
  • RBAC / authz checks on routing decisions (Enterprise: authz/v1)

Input — what the router consumes

The router receives a partially-filled IntentEnvelope (full schema in sink-v1.md). When asr/v1 finishes transcribing a speech segment, the envelope has:

  • Populated: Identity (EnvelopeID, SessionID, StreamID, ParentID), Time span (StartedAt, EndedAt, Duration), Content (Transcript, Language, Confidence), Speaker (Label, SourceKind, optional Embedding), Provenance (ASRBackend, SegmenterImpl, CapturedAt, Pipeline), optional AudioRef.
  • Empty: Intent block, Routing block.

The router fills Intent and Routing, then emits the completed envelope.

Per-stream history

The router maintains a per-stream sliding window of recent envelopes for coreference and short-term context.

router:
  history_depth: 1                    # default: just the previous envelope
                                      # range: 0 (stateless) to 20
  • Per-stream, not per-session — different streams (your mic vs system audio) keep independent histories.
  • Read-only from the classifier's perspective. The router never mutates past envelopes; if it needs to derive new content from past context, it emits a new envelope with ParentID set.
  • Stateless mode (history_depth: 0) supported for deterministic-replay and diagnostic use cases.

Classifier Chain

The router doesn't classify directly. It dispatches to classifiers in a configurable chain. Each classifier returns {intent_kind, confidence, reasoning} or "abstain".

Three classifier families (v1)

| Classifier | Cost | Latency budget | Determinism | Default in chain? |
|---|---|---|---|---|
| rules | Free | 10 ms | Yes | Yes |
| local-model | Free | 500 ms | Mostly | Yes |
| llm | Tokens | 5 s | No | No (opt-in) |

Chain semantics

  1. Classifiers run in declared order.
  2. Each classifier returns confidence ∈ [0.0, 1.0] or abstains.
  3. First classifier with confidence ≥ min_confidence_threshold wins; chain stops.
  4. If all classifiers abstain or fall below threshold → Intent.Kind = unclassified → routes per routes.unclassified (fallback).
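
The chain semantics above can be sketched as follows. This is a minimal illustration rather than the reference implementation: the names Verdict, run_chain, question_rule, and weak_model are hypothetical, and a classifier is modeled as a plain function that returns None to abstain.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    intent_kind: str
    confidence: float
    reasoning: str = ""


# Assumption: a classifier maps a transcript to a Verdict, or None to abstain.
Classifier = Callable[[str], Optional[Verdict]]


def run_chain(transcript: str, chain: list[Classifier],
              min_confidence_threshold: float = 0.7) -> Verdict:
    for classify in chain:  # declared order
        verdict = classify(transcript)
        if verdict is None:
            continue  # abstained; try the next classifier
        if verdict.confidence >= min_confidence_threshold:
            return verdict  # first confident classifier wins; chain stops
    # Every classifier abstained or fell below the threshold.
    return Verdict("unclassified", 0.0)


def question_rule(text: str) -> Optional[Verdict]:
    if text.endswith("?"):
        return Verdict("question", 0.75, "ends with ?")
    return None


def weak_model(text: str) -> Optional[Verdict]:
    return Verdict("note", 0.4, "low-confidence guess")


result = run_chain("what time is it?", [question_rule, weak_model])
fallback = run_chain("hmm okay", [question_rule, weak_model])
```

In the second call the rule abstains and the model's 0.4 is below the threshold, so the envelope ends up unclassified and would route per routes.unclassified.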

Configuration

router:
  history_depth: 1
  min_confidence_threshold: 0.7

  classifier_chain:
    - type: rules
      patterns_file: ~/.vox/router/rules.yaml
    - type: local-model
      model_path: ~/.vox/router/intent-classifier.onnx
    # llm classifier omitted by default — opt-in
    # - type: llm
    #   sink: llm-anthropic              # reuse an existing LLM sink in "classify" mode
    #   system_prompt_template: ~/.vox/router/classify.tmpl

rules classifier

Lightweight regex / keyword / verb-pattern matching. The file format:

rules:
  - intent: command
    patterns:
      - "^(create|make|add|file|open)\\s+(a |an )?(bd |beads |task |issue |ticket)"
      - "^(remind me to|todo|action item:)"
    confidence: 0.85

  - intent: question
    patterns:
      - "\\?$"
      - "^(what|why|how|when|where|who|can you|could you|would you|do you|does)"
    confidence: 0.75

  - intent: prompt
    patterns:
      - "^(write|generate|draft|compose|tell me)"
    confidence: 0.7

  - intent: note
    patterns:
      - "^(note:|fyi:|just noting|btw)"
    confidence: 0.8

Bundled defaults ship with Vox; users override per repo or per user.
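
A rough sketch of how such a rules file could be evaluated. The contract does not specify case handling or tie-breaking, so this assumes the transcript is lowercased and trimmed before matching and that the first matching rule in file order wins. RULES mirrors the YAML above as a Python literal, and classify_with_rules is a hypothetical name.

```python
import re

# Mirrors the rules.yaml example above (YAML "\\s" becomes regex \s).
RULES = [
    {"intent": "command", "confidence": 0.85, "patterns": [
        r"^(create|make|add|file|open)\s+(a |an )?(bd |beads |task |issue |ticket)",
        r"^(remind me to|todo|action item:)",
    ]},
    {"intent": "question", "confidence": 0.75, "patterns": [
        r"\?$",
        r"^(what|why|how|when|where|who|can you|could you|would you|do you|does)",
    ]},
    {"intent": "prompt", "confidence": 0.7, "patterns": [
        r"^(write|generate|draft|compose|tell me)",
    ]},
    {"intent": "note", "confidence": 0.8, "patterns": [
        r"^(note:|fyi:|just noting|btw)",
    ]},
]


def classify_with_rules(transcript, rules=RULES):
    """Return (intent, confidence) for the first matching rule, or None to abstain."""
    # Assumption: patterns are written for lowercase, trimmed input.
    text = transcript.strip().lower()
    for rule in rules:
        for pattern in rule["patterns"]:
            if re.search(pattern, text):
                return rule["intent"], rule["confidence"]
    return None  # no pattern matched; the chain moves to the next classifier
```

For example, "Create a task for the release notes" matches the first command pattern, while "The weather is nice" matches nothing and the classifier abstains.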

local-model classifier

A small ONNX text classifier (DistilBERT-class), fetchable via vox model download intent-classifier-v1. Not bundled (license + size). Custom models welcome with the same input/output signature:

  • Input: UTF-8 transcript text
  • Output: one of the 9 IntentKind values + a confidence float

llm classifier

Reuses an existing LLM sink in "classify" mode. The router sends a structured prompt; the LLM returns JSON with intent_kind + confidence + reasoning. Reasoning is captured in Intent.Reasoning for audit.

Off by default because every utterance becomes an LLM call, which adds cost and latency and breaks offline operation. Users with the budget and a need for higher classification quality can enable it as the chain's terminal classifier.

Latency budgets

| Classifier | Default budget | On overrun |
|---|---|---|
| rules | 10 ms | Hard cap; abort classifier |
| local-model | 500 ms | Skipped if budget exhausted |
| llm | 5 s | User-tuned; configurable |
| End-to-end router | 600 ms with rules+local-model | router.latency_exceeded telemetry event |

Routing Rule Format

The v1 routing table

Declarative intent_kind → {primary, also_to} with optional by_source override.

router:
  routes:
    prompt:
      primary: llm-anthropic
      also_to: [s3]
    command:
      primary: llm-anthropic
      also_to: [bd, s3]
    todo:
      primary: bd
      also_to: [s3, ox-ledger]
    note:
      primary: ox-ledger
      also_to: [s3]
    question:
      primary: llm-anthropic
      also_to: [s3]
    summary:
      primary: email-smtp
      also_to: [s3, ox-ledger]
    raw_transcript:
      primary: s3
      also_to: [ox-ledger]
    llm_response:
      primary: s3
      also_to: [email-smtp]
    unclassified:
      primary: local-file              # fallback when classifier gives up
      also_to: []

  # Per-source-kind overrides (sparse — only entries that differ from default)
  by_source:
    online:
      summary:
        primary: email-smtp
        also_to: [s3, ox-ledger, bd]    # meetings extract action items into bd
      todo:
        primary: bd
        also_to: [s3, ox-ledger, email-smtp]
    self:
      prompt:
        primary: llm-anthropic
        also_to: []                     # don't archive every dictation prompt
    file:
      raw_transcript:
        primary: local-file
        also_to: []

Lookup order

  1. by_source[envelope.Speaker.SourceKind][envelope.Intent.Kind] — most specific
  2. routes[envelope.Intent.Kind] — default for that intent
  3. routes.unclassified — terminal fallback

Result populates Routing.PrimarySink and Routing.AlsoTo. Sinks then apply their own filter blocks (locked in sink/v1) — both layers must pass for actual delivery.
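
The three-step lookup can be sketched as below. ROUTES and BY_SOURCE are trimmed copies of the table above, resolve_route is a hypothetical name, and the sketch assumes a by_source entry replaces the default entry wholesale rather than merging field by field.

```python
ROUTES = {
    "prompt": {"primary": "llm-anthropic", "also_to": ["s3"]},
    "todo": {"primary": "bd", "also_to": ["s3", "ox-ledger"]},
    "unclassified": {"primary": "local-file", "also_to": []},
}

BY_SOURCE = {
    "self": {"prompt": {"primary": "llm-anthropic", "also_to": []}},
    "online": {"todo": {"primary": "bd", "also_to": ["s3", "ox-ledger", "email-smtp"]}},
}


def resolve_route(source_kind, intent_kind, routes=ROUTES, by_source=BY_SOURCE):
    # 1. Most specific: per-source override for this intent.
    override = by_source.get(source_kind, {}).get(intent_kind)
    if override is not None:
        return override
    # 2. Default route for the intent.
    if intent_kind in routes:
        return routes[intent_kind]
    # 3. Terminal fallback.
    return routes["unclassified"]
```

A prompt from the self source resolves to the override (no archive), a prompt from any other source gets the default, and an intent missing from the table falls through to routes.unclassified.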


Suppress Patterns — The Safety Net

Pattern-based opt-outs that go straight into Routing.Suppress. Additive to any Suppress entries from upstream stages.

router:
  suppress:
    - name: private-marker
      pattern: "(?i)\\bprivate\\b"
      sinks: [s3, ox-ledger, email-smtp]      # mentioned "private" — keep off shared surfaces
    - name: secret-detection
      pattern: "(?i)\\b(password|secret|api[_\\s]?key|token|bearer\\s)\\b"
      sinks: [s3, ox-ledger, email-smtp, llm-anthropic, llm-openai, llm-google]
                                                # secrets stay local-only
    - name: pii-email-fallback
      pattern: "[\\w.+-]+@[\\w-]+\\.[\\w.-]+"
      sinks: [llm-google]                      # example: never send emails to a specific LLM
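
A minimal sketch of how suppress patterns could be folded into Routing.Suppress. It assumes patterns are searched anywhere in the transcript and that results are merged with upstream entries without duplicates; compute_suppress is a hypothetical name and SUPPRESS mirrors the first two entries above.

```python
import re

SUPPRESS = [
    {"name": "private-marker",
     "pattern": r"(?i)\bprivate\b",
     "sinks": ["s3", "ox-ledger", "email-smtp"]},
    {"name": "secret-detection",
     "pattern": r"(?i)\b(password|secret|api[_\s]?key|token|bearer\s)\b",
     "sinks": ["s3", "ox-ledger", "email-smtp",
               "llm-anthropic", "llm-openai", "llm-google"]},
]


def compute_suppress(transcript, upstream_suppress=(), rules=SUPPRESS):
    """Return upstream Suppress entries plus sinks from every matching pattern."""
    suppressed = list(upstream_suppress)  # additive: upstream entries are kept
    for rule in rules:
        if re.search(rule["pattern"], transcript):
            for sink in rule["sinks"]:
                if sink not in suppressed:
                    suppressed.append(sink)
    return suppressed
```

With these rules, "this is private" adds the three shared sinks to whatever upstream already suppressed, and a transcript with no match leaves the list unchanged.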

Defaults

Two patterns ship enabled by default — table-stakes safety:

  • private-marker — "private" keyword → off shared surfaces
  • secret-detection — common secret patterns → no network sinks

Disable via router.suppress.enabled: false (not recommended).


Bundled Presets

Most users won't write a routing table from scratch. They pick a preset and tweak.

router:
  preset: meetings-and-dictation          # bundled preset name
  overrides:                              # applied on top of the preset
    summary:
      also_to: [s3, ox-ledger, my-custom-sink]
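
The example above overrides only also_to, which suggests overrides merge per field onto the preset. The sketch below assumes exactly that (fields not mentioned keep their preset values); the contract does not spell out the merge rule, and apply_overrides is a hypothetical name.

```python
import copy


def apply_overrides(preset_routes, overrides):
    """Return a new routing table with overrides merged per intent, per field."""
    resolved = copy.deepcopy(preset_routes)  # never mutate the bundled preset
    for intent_kind, fields in overrides.items():
        resolved.setdefault(intent_kind, {}).update(fields)
    return resolved


preset = {"summary": {"primary": "email-smtp", "also_to": ["s3", "ox-ledger"]}}
resolved = apply_overrides(
    preset,
    {"summary": {"also_to": ["s3", "ox-ledger", "my-custom-sink"]}},
)
```

Under this assumption the summary route keeps primary: email-smtp from the preset and only its also_to list changes.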

The six bundled presets

| Preset | Primary use case | Key routing characteristic |
|---|---|---|
| meetings-and-dictation (default) | Mixed: meetings + dictation + calls | Balanced; all intent_kinds routed sensibly |
| dictation-only | Voice memos, prompt composition | Self source emphasized; commands → bd; prompts → LLM; minimal archive |
| meeting-capture | Recording meetings (in-person + online) | online + in-person emphasized; summary → email; action items → bd; full archive |
| archive-everything | Compliance / over-record posture | Every envelope to s3 + ox-ledger regardless of intent |
| local-only | No-network / offline / paranoid | All sinks are local-file or bd; no LLM, no email, no S3, no ox-ledger |
| llm-heavy | Power user with budget | LLM in classification chain enabled; LLM primary for most intents; full archive |

Presets live at internal/router/presets/<name>.yaml. vox router presets lists them; vox router preset show <name> prints the resolved config. Copy and edit:

vox router preset show meetings-and-dictation > ~/.vox/router.yaml

Derived Envelopes

The router can emit zero, one, or many envelopes per input. Four derivation modes:

(a) Splitting compound utterances — opt-in

Compound utterances ("send Sarah an email and create a bd issue") become multiple envelopes. The LLM classifier handles segmentation; rules and local-model can't reliably detect compounds.

router:
  splitting:
    enabled: false                    # opt-in; off by default
    classifier: llm                   # LLM is required for reliable splitting
    min_segment_confidence: 0.75

When enabled, each output segment becomes its own envelope:

  • New EnvelopeID for each segment
  • ParentID = <original envelope's EnvelopeID> (audit linkage)
  • Same SessionID, StreamID, StartedAt, EndedAt, Speaker, Provenance as the parent
  • Transcript = <segment text>
  • Intent.Kind = <segment kind>

(b) Coreference resolution

If the current transcript contains a pronominal reference (it, that, this, one, the same, do so, them) AND the previous envelope on the same stream is no older than require_short_gap (30s by default), the router rewrites the transcript.

router:
  coreference:
    enabled: true                     # default ON — basic mode is cheap and useful
    mode: prepend-previous            # prepend-previous | rewrite-llm | off
    pronouns: [it, that, this, one, the same, "do so", "them"]
    max_context_chars: 200
    require_short_gap: 30s

prepend-previous (default): despite the name, the rewrite appends the previous transcript to the current one as parenthetical context.

Original:    "do that for next Tuesday"
Previous:    "remind me to email Sarah"
Rewritten:   "do that for next Tuesday (referring to: 'remind me to email Sarah')"

Dumb but reliable. Transparent (downstream can see what was done). Zero LLM cost.
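
A sketch of prepend-previous under stated assumptions: the gap is measured from the previous envelope's EndedAt to the current StartedAt, pronouns match on word boundaries case-insensitively, and max_context_chars truncates the previous transcript. resolve_coreference is a hypothetical name.

```python
import re
from datetime import datetime, timedelta

PRONOUNS = ["it", "that", "this", "one", "the same", "do so", "them"]


def resolve_coreference(current, previous, current_start, previous_end,
                        pronouns=PRONOUNS, max_context_chars=200,
                        require_short_gap=timedelta(seconds=30)):
    """Return (transcript, applied). applied maps to router.coreference_applied."""
    if previous is None or current_start - previous_end > require_short_gap:
        return current, False  # no history, or the previous envelope is too old
    alternation = "|".join(re.escape(p) for p in pronouns)
    if not re.search(rf"\b({alternation})\b", current, flags=re.IGNORECASE):
        return current, False  # nothing to resolve
    context = previous[:max_context_chars]
    return f"{current} (referring to: '{context}')", True


t0 = datetime(2024, 1, 1, 12, 0, 0)
rewritten, applied = resolve_coreference(
    "do that for next Tuesday", "remind me to email Sarah",
    current_start=t0 + timedelta(seconds=5), previous_end=t0,
)
```

This reproduces the rewrite shown in the example above; with a 60-second gap the transcript is returned unchanged.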

rewrite-llm (opt-in): sends current + previous to an LLM with a "rewrite to be self-contained" prompt. More accurate, costs tokens.

When coreference applies, Custom["router.coreference_applied"] = true is stamped on the envelope.

(c) Summary triggers — opt-in

The router accumulates envelopes per session and emits derived summary envelopes on:

| Trigger | Default | Configurable |
|---|---|---|
| Session end (orchestrator emits session.ended event) | Always when summaries enabled | triggers.on_session_end: bool |
| Idle timeout | 10 minutes | triggers.idle_timeout: <duration> |
| Envelope count threshold | Off | triggers.envelope_threshold: <int> |
| Explicit command (vox session summarize) | Always available | |

Two summary content modes:

| Mode | Behavior | Use case |
|---|---|---|
| concatenate (default) | Transcript = newline-joined envelopes in time order, with speaker labels prefixed | Cheap; works offline; raw transcript IS the summary |
| llm | Route the concatenated transcript through a designated LLM sink in summarize mode; LLM response becomes the summary envelope's transcript | Higher quality; costs tokens; needs LLM sink configured |

router:
  summaries:
    enabled: true
    mode: concatenate                 # concatenate | llm
    llm_sink: llm-anthropic           # required if mode: llm
    llm_prompt_template: ~/.vox/router/summarize.tmpl
    triggers:
      on_session_end: true
      idle_timeout: 10m
      envelope_threshold: 0

The emitted summary envelope:

  • Intent.Kind = summary
  • ParentID = "" (derived from many envelopes — use Custom["router.source_envelope_ids"] to list them)
  • SessionID matches the session being summarized
  • StreamID = "" (multi-stream summary)
  • StartedAt / EndedAt span the entire session
  • Provenance.RouterImpl = "router-v1"
  • Custom["router.summary_mode"] = "concatenate" or "llm"

Routes through the normal routing table (default summary route: email-smtp + s3 + ox-ledger).
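
A sketch of concatenate mode producing the fields listed above. Envelopes are modeled as plain dicts, the "label: text" prefix format is an assumption, and the session span is approximated by the earliest start and latest end among the contributing envelopes. build_concatenate_summary is a hypothetical name and expects at least one envelope.

```python
import uuid


def build_concatenate_summary(session_id, envelopes):
    """envelopes: dicts with envelope_id, started_at, ended_at, speaker_label, transcript."""
    ordered = sorted(envelopes, key=lambda e: e["started_at"])  # time order
    lines = [f'{e["speaker_label"]}: {e["transcript"]}' for e in ordered]
    return {
        "EnvelopeID": str(uuid.uuid4()),
        "ParentID": "",            # many-to-one: sources listed in Custom instead
        "SessionID": session_id,
        "StreamID": "",            # multi-stream summary
        "StartedAt": ordered[0]["started_at"],
        "EndedAt": max(e["ended_at"] for e in ordered),
        "Transcript": "\n".join(lines),
        "Intent": {"Kind": "summary"},
        "Provenance": {"RouterImpl": "router-v1"},
        "Custom": {
            "router.derivation": "summary",
            "router.summary_mode": "concatenate",
            "router.source_envelope_ids": ",".join(e["envelope_id"] for e in ordered),
        },
    }


summary = build_concatenate_summary("sess-1", [
    {"envelope_id": "e2", "started_at": 20, "ended_at": 25,
     "speaker_label": "Sam", "transcript": "sounds good"},
    {"envelope_id": "e1", "started_at": 10, "ended_at": 15,
     "speaker_label": "Jay", "transcript": "ship on Friday"},
])
```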

(d) Drop semantics

Configurable drop_rules — envelopes matching are dropped entirely, not routed anywhere.

router:
  drop_rules:
    - intent_kind: unclassified
      max_confidence: 0.3             # drop ONLY if classifier was REALLY uncertain
    - intent_kind: raw_transcript
      always: true                    # drop all raw transcripts (rare config)
    - max_transcript_chars: 5         # drop anything 5 chars or less
  log_dropped: true                   # log at DEBUG; counter always increments

Default drop rules:

  • unclassified with confidence < 0.3 — filler ("uh", "hmm")
  • Any transcript ≤ 5 chars

Dropped envelopes increment router.dropped counter and emit an audit/v1 event when audit is loaded.
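
One way to evaluate drop_rules, as a sketch. It assumes conditions within a rule are ANDed, rules are ORed, max_confidence is exclusive (matching "confidence < 0.3" above), and transcript length is counted in raw characters. should_drop is a hypothetical name.

```python
DEFAULT_DROP_RULES = [
    {"intent_kind": "unclassified", "max_confidence": 0.3},
    {"max_transcript_chars": 5},
]


def should_drop(intent_kind, confidence, transcript, drop_rules=DEFAULT_DROP_RULES):
    """Return True if any rule matches; every condition in a rule must hold."""
    for rule in drop_rules:
        if "intent_kind" in rule and rule["intent_kind"] != intent_kind:
            continue
        if "max_confidence" in rule and confidence >= rule["max_confidence"]:
            continue
        if ("max_transcript_chars" in rule
                and len(transcript) > rule["max_transcript_chars"]):
            continue
        # `always: true` adds no further condition, so a rule carrying only
        # intent_kind matches every envelope of that kind.
        return True
    return False
```

With the default rules, an unclassified "uh, hmm, well" at confidence 0.1 is dropped, as is any two-character transcript, while a confident note such as "buy milk today" passes through.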


Derived Envelope Provenance

Every router-produced envelope (split, coreference-rewritten, summary) carries uniform provenance stamping:

| Field | Value |
|---|---|
| EnvelopeID | new UUID |
| ParentID | source envelope's EnvelopeID, or "" for many-to-one (summary) |
| Custom["router.source_envelope_ids"] | for summaries: comma-separated list of contributing envelope IDs |
| Custom["router.derivation"] | "split" / "coreference" / "summary" |
| Provenance.RouterImpl | the router's identifier (e.g., "router-v1") |
| Custom["router.summary_mode"] | populated for summary envelopes |
| Custom["router.coreference_applied"] | true if coreference rewrote the transcript |

An auditor can reconstruct: "this summary covers these 47 envelopes from session X; this envelope had its transcript rewritten by coreference against envelope Y."


Router Interface

Router {
  # Identity
  Name()          -> string
  Capabilities()  -> Capabilities

  # Lifecycle
  Open(config)    -> Error
  Close()         -> Error            # drains internal state with timeout

  # Hot path
  Route(ctx, envelope)  -> ([]IntentEnvelope, RouterError?)
    # Returns zero or more output envelopes:
    #   - zero  → envelope dropped (drop_rules matched)
    #   - one   → standard 1:1 classification
    #   - many  → splitting fired (compound utterance)

  # Asynchronous emission (summaries, late LLM classifications)
  Emissions()     -> <-chan IntentEnvelope

  # Session lifecycle awareness (for summary triggers)
  OnSessionEvent(event)               # "session.started" | "session.ended" | "session.idle"

  # Diagnostics
  Stats()         -> Stats            # routed_in, routed_out, classifier_hits,
                                      # latency_p50/p99, drops, errors
  Health()        -> Health
}

Capabilities {
  SupportedClassifiers []string       # "rules" | "local-model" | "llm"
  SupportsSplitting    bool
  SupportsCoreference  bool
  SupportsSummaries    bool
  MaxHistoryDepth      uint32
}

Orchestrator loop (informative)

  1. envelope = asr.NextEnvelope()
  2. outputs, err = router.Route(ctx, envelope)
  3. For each output in outputs: orchestrator sends to sinks per output.Routing
  4. Concurrently, orchestrator drains router.Emissions() for async derived envelopes (summaries that arrive after a session-end trigger)
  5. Orchestrator forwards session lifecycle events (OnSessionEvent) so the router can fire summary triggers
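
The loop can be sketched in a single thread as below. A real orchestrator drains Emissions() concurrently; this simplification drains after the session ends. FakeRouter, run_pipeline, and the deliver callback are hypothetical stand-ins, and envelopes are plain dicts.

```python
import queue


class FakeRouter:
    """Hypothetical stand-in for a router/v1 implementation."""

    def __init__(self):
        self.emissions = queue.Queue()  # plays the role of Emissions()
        self._seen = []

    def route(self, envelope):
        self._seen.append(envelope["Transcript"])
        envelope["Routing"] = {"PrimarySink": "local-file", "AlsoTo": []}
        return [envelope], None  # (outputs, optional informational error)

    def on_session_event(self, event):
        if event == "session.ended" and self._seen:
            self.emissions.put({
                "Transcript": "\n".join(self._seen),
                "Routing": {"PrimarySink": "email-smtp", "AlsoTo": []},
            })


def run_pipeline(asr_envelopes, router, deliver):
    for envelope in asr_envelopes:                   # step 1
        outputs, _err = router.route(envelope)       # step 2; error is never fatal
        for output in outputs:                       # step 3
            deliver(output)
    router.on_session_event("session.ended")         # step 5
    while True:                                      # step 4, simplified
        try:
            deliver(router.emissions.get_nowait())
        except queue.Empty:
            break


delivered = []
run_pipeline([{"Transcript": "a"}, {"Transcript": "b"}], FakeRouter(), delivered.append)
```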

One router per pipeline

router/v1 does NOT support a chain of routers. Composition happens inside the router via the classifier chain. One router instance, configured via classifier chain + routing table + summary settings.

Reasoning:

  • A chain of routers would duplicate speaker resolution, history bookkeeping, and summary accumulation.
  • A single router with internal composition is simpler to debug.
  • Users with exotic needs can write a wrapper router that delegates internally.


Error Model

Typed errors, mirroring capture/v1 + sink/v1:

RouterError {
  Kind     RouterErrorKind
  Stage    string                     # "classifier:rules" | "classifier:llm" |
                                      # "split" | "coreference" | "summary"
  Message  string
  Cause    Error?
}

RouterErrorKind {
  ErrClassifierUnavailable           # specific classifier down
  ErrAllClassifiersFailed            # entire chain abstained or errored
  ErrInvalidEnvelope                 # input envelope failed validation
  ErrInternal                        # bug
  ErrBudgetExceeded                  # latency budget hit
}

Failure handling

| Failure | Action |
|---|---|
| Classifier crashes / panics | Router-side recover; mark classifier unhealthy; take out of rotation for health_recovery_interval (60s default); continue with remaining classifiers |
| All classifiers abstain or below threshold | Intent.Kind = unclassified, Intent.Confidence = 0, route per routes.unclassified (default fallback: local-file) |
| Classifier returns invalid IntentKind | Treat as abstain; log structured warning; classifier stays in rotation; counter increments |
| Splitting LLM call fails | Emit the original envelope unchanged (single output); log warning |
| Coreference LLM rewrite fails | Fall through to prepend-previous (or skip if that's also failing); envelope continues |
| Summary generation (llm mode) fails | Fall back to concatenate mode and emit; orchestrator gets a summary envelope regardless |
| Router itself crashes | Orchestrator-side recover; mid-route envelopes dropped (logged + audit event); router restarted |
| Route() exceeds budget | Return whatever the chain produced so far; emit router.latency_exceeded event; count budget violation |

Key principle: the router never blocks the pipeline. Worst case is Intent.Kind = unclassified routing to fallback — never a hung pipeline.

Route() returns output envelopes AND optionally a RouterError for telemetry. The error is informational, never fatal — for stats and audit only.
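
The "classifier crashes" row can be sketched as a wrapper that converts an exception into an abstain and keeps the classifier out of rotation for health_recovery_interval. GuardedClassifier is a hypothetical name, and the injectable clock exists only to make the behavior easy to exercise.

```python
import time


class GuardedClassifier:
    """Wraps a classifier so a crash becomes an abstain plus a cool-down."""

    def __init__(self, name, fn, health_recovery_interval=60.0, clock=time.monotonic):
        self.name = name
        self.fn = fn
        self.interval = health_recovery_interval
        self.clock = clock
        self.unhealthy_until = float("-inf")  # healthy at start

    def __call__(self, transcript):
        if self.clock() < self.unhealthy_until:
            return None  # out of rotation: behave as an abstain
        try:
            return self.fn(transcript)
        except Exception:
            # Mark unhealthy; the chain continues with the remaining classifiers.
            self.unhealthy_until = self.clock() + self.interval
            return None


now = [0.0]
calls = []


def flaky(transcript):
    calls.append(transcript)
    raise RuntimeError("boom")


guarded = GuardedClassifier("local-model", flaky, clock=lambda: now[0])
first = guarded("a")     # crashes, marked unhealthy until t=60
now[0] = 30.0
second = guarded("b")    # skipped without calling flaky
now[0] = 61.0
third = guarded("c")     # back in rotation, called again
```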


Audit Hook

When audit/v1 is loaded, the router MUST emit a RouterDecisionEvent for every routing decision:

RouterDecisionEvent {
  Timestamp        Timestamp
  EnvelopeID       string
  ParentID         string?
  InputTranscript  string              # transcript as classified (may differ from output if coreference rewrote)
  OutputTranscript string              # final transcript on the output envelope
  Intent {
    Kind        IntentKind
    Confidence  float
    Reasoning   string?
  }
  Routing {
    PrimarySink    string
    AlsoTo         []string
    Suppress       []string
  }
  ClassifierChain []ClassifierStep     # which classifiers fired, in order
  Derivation      string?              # "split" | "coreference" | "summary" | null
  LatencyMS       uint32
}

ClassifierStep {
  Name        string                   # "rules" | "local-model" | "llm-anthropic"
  Confidence  float
  Abstained   bool
  LatencyMS   uint32
  Reasoning   string?
}

Routing decisions are the most consequential trust boundary in Vox — they determine where your voice content ends up. An auditor reconstructing "why did my password show up in an S3 bucket" needs exactly this data.

When audit/v1 is NOT loaded, the router emits the same data as structured logs at DEBUG level (configurable to INFO via router.log_decisions_at: info).


Versioning and Stability

router/v1 is the contract above. Once frozen:

  • Non-breaking changes (allowed in v1.x): adding optional fields to Capabilities / Stats / configuration; adding new classifier types; adding new presets; adding new drop_rules options; adding the deferred predicate-based extra_rules (with CEL or similar) as additive opt-in.
  • Breaking changes (require v2): changing the routing-table lookup semantics; changing the Route() signature; changing how splitting / coreference / summary derivations work in a non-additive way; removing or repurposing any existing field.

The core supports one major version (vN) of router/ at a time, except for a period of overlap during migrations.

v1.x additive features (shipped)

Filler-word removal

After the classifier sets Intent, the router optionally strips common verbal noise tokens ("um", "uh", "like" as discourse marker, "you know", etc.) from the transcript before it reaches sinks. Classification always runs on the raw transcript; filler removal is a post-classification pass.

router:
  filler_removal_enabled: true          # default: true; set false to disable
  filler_removal_words: []              # override the default list (empty = use built-in list)

Default filler list: um, uh, er, ah, hmm, like, you know, sort of, kind of, basically, literally, actually, well, so, right, okay.

When fillers are stripped, three provenance keys are stamped on the envelope:

  • router.filler_removed = true
  • router.filler_count = <int> (number of distinct filler tokens removed)
  • router.original_transcript = <string> (the raw pre-removal transcript)

If the entire utterance consists of fillers (fewer than 3 non-whitespace characters would remain), removal is skipped and the original is preserved.
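
A naive sketch of the removal pass. It uses only an unambiguous subset of the default list (telling "like" as a discourse marker from "like" as a verb needs more than a regex), swallows a trailing comma or period with each filler, and reads "distinct" literally when computing router.filler_count. remove_fillers is a hypothetical name.

```python
import re

# Assumption: a subset of the default list that is safe to strip blindly.
FILLERS = ["you know", "um", "uh", "er", "ah", "hmm"]


def remove_fillers(transcript, fillers=FILLERS):
    """Return (transcript, provenance_keys); provenance is empty if nothing changed."""
    longest_first = sorted(fillers, key=len, reverse=True)
    alternation = "|".join(re.escape(f) for f in longest_first)
    pattern = re.compile(rf"\b({alternation})\b[,.]?\s*", flags=re.IGNORECASE)
    removed = {m.lower() for m in pattern.findall(transcript)}
    cleaned = re.sub(r"\s+", " ", pattern.sub("", transcript)).strip()
    # Skip removal if nothing matched or fewer than 3 non-whitespace chars remain.
    if not removed or len(re.sub(r"\s", "", cleaned)) < 3:
        return transcript, {}
    return cleaned, {
        "router.filler_removed": True,
        "router.filler_count": len(removed),
        "router.original_transcript": transcript,
    }


cleaned, keys = remove_fillers("Um, so I think, uh, we should um ship it")
```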

Snippet templates (voice text-expander)

A pre-classification stage replaces a matched trigger phrase with a configured expansion, then lets normal classification run on the expanded text.

router:
  snippets:
    "vox slack ack": "Got it, will follow up shortly."
    "vox standup status": "Yesterday: shipped X. Today: working on Y. Blockers: none."
    "vox sign off email": "Best, Jay"

Matching is case- and punctuation-insensitive. When a snippet fires, three provenance keys are stamped:

  • router.snippet_expanded = true
  • router.snippet_trigger = <original transcript> (what the user said)
  • router.snippet_match = <normalized key> (the matched trigger)

Snippets default to empty (disabled). Configure via router.snippets in the config map.
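
A sketch of snippet matching, assuming the trigger must match the whole utterance after normalization (lowercase, punctuation stripped, whitespace collapsed). Whether triggers may also match inside a longer utterance is not specified here. normalize and expand_snippet are hypothetical names.

```python
import re

SNIPPETS = {
    "vox slack ack": "Got it, will follow up shortly.",
    "vox sign off email": "Best, Jay",
}


def normalize(text):
    """Lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def expand_snippet(transcript, snippets=SNIPPETS):
    """Return (transcript, provenance_keys); provenance is empty if no snippet fired."""
    table = {normalize(trigger): expansion for trigger, expansion in snippets.items()}
    key = normalize(transcript)
    if key not in table:
        return transcript, {}
    return table[key], {
        "router.snippet_expanded": True,
        "router.snippet_trigger": transcript,  # what the user said
        "router.snippet_match": key,           # the normalized trigger
    }


expanded, keys = expand_snippet("Vox, Slack ack.")
```

Normal classification would then run on the expanded text rather than on the spoken trigger.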

Predicate-based extra_rules (deferred to v1.x)

router:
  extra_rules:                                  # opt-in; evaluated AFTER routes + by_source
    - when: "Intent.Kind == 'command' && contains(Transcript, 'urgent')"
      add_to: [email-smtp]                      # additive — not replacement
    - when: "Speaker.Label == 'cto'"
      add_to: [s3-cto-archive]

Why not in v1:

  • Requires a sandboxed expression engine (likely CEL)
  • Adds a debugging dimension ("why didn't my rule fire?") that hurts predictability
  • Most use cases are already covered by by_source overrides + suppress patterns

Path to v1.x:

  • Add CEL or equivalent as the expression engine
  • Gate behind router.extra_rules.enabled flag
  • Document the expression DSL with examples
  • Ship as additive, with no breaking change to v1's table-based config


Reference Implementation

One built-in router ships in v1: router-v1 (aka default). It implements everything above.

Users can swap it for a custom router via router.implementation: <name>, but in practice the built-in covers the entire design space described above.

Enterprise: router-rbac-audit (planned)

Same router/v1 contract, adds:

  • RBAC checks before allowing certain routes (e.g., "envelopes from speakers without compliance:read cannot route to LLM sinks")
  • Immutable audit storage integration (authz/v1 + enterprise audit store)
  • Per-team / per-user routing policy overrides

Lives in the enterprise repo. Loaded via the same registry mechanism; swap-in is a config change.


Project Principle: Opinionated Defaults, Every Default Configurable

This contract continues the principle from capture/v1 and sink/v1. Every behavior with a defensible default (min_confidence_threshold: 0.7, history_depth: 1, coreference.mode: prepend-previous, summaries.mode: concatenate, the meetings-and-dictation preset, etc.) is exposed as a config knob. Defaults reflect a considered recommendation for the typical voice-to-LLM use case; the knobs exist so specialized workflows can tune them.