sink/v1 — Intent Envelope Output Contract

Status: Draft (design-locked, ready for first implementation) · Stability: v1 will be frozen with the first reference sinks (local-file, llm-anthropic) · Implementations: in-tree only until v1 is frozen.

The sink/v1 surface is the exit of every Vox pipeline. A sink is anywhere a structured intent envelope ends up: a Large Language Model with the user's own API key, an S3-compatible bucket, an email summary, a bd task, a local JSONL file. Sinks share a single base contract; each sink type layers domain-specific behavior on top.

This is the contract that captures the entire purpose of the system. The intent envelope schema and the sink interface together define what everything upstream (capture → segment → asr → router) must produce.


Scope

sink/v1 covers:

  • The intent envelope schema — the unit of data flowing into sinks
  • The base sink interface — lifecycle, write path, completion reporting, error model
  • The multi-sink orchestration model — fan-out semantics, routing, and failure isolation
  • Reference contracts for the six built-in sink families: LLM (BYOK), S3-compatible storage, email (SMTP + transactional), local file, ox-ledger, bd
  • The sink registry — registration mechanism shared between open and enterprise sinks

sink/v1 does not cover:

  • How envelopes are produced (segment/v1, asr/v1, router/v1)
  • Tool execution after an LLM tool-call response (separate tools/v1 surface)
  • Identity / RBAC (enterprise: authz/v1)
  • Audit storage details (audit/v1; sinks emit audit events but don't manage the audit store)

The Intent Envelope

The intent envelope is the load-bearing data structure of Vox. Every upstream stage produces them; every sink consumes them.

IntentEnvelope {
  # Identity
  EnvelopeID    string         # uuid4
  SessionID     string         # matches the capture-side SessionID; groups envelopes by session
  StreamID      string         # the capture stream this envelope came from
  ParentID      string?        # optional; for envelopes derived from another (LLM responses, summaries, re-routes)

  # Time span
  StartedAt     Timestamp      # wall-clock of the first audio sample this envelope covers
  EndedAt       Timestamp      # wall-clock of the last sample
  Duration      Duration       # convenience; derived

  # Content
  Transcript    string         # the words; UTF-8
  Language      string?        # BCP-47 (e.g., "en-US"); optional if not detected
  Confidence    float          # 0.0–1.0; ASR-side confidence

  # Speaker
  Speaker {
    Label       string         # e.g., "self", "remote-1", "lapel-jane"; opaque to sinks
    SourceKind  SourceKind     # "self" | "in-person" | "online" | "file"
    Embedding   []float?       # optional; only present if diarization includes embeddings
  }

  # Intent classification (set by router)
  Intent {
    Kind        IntentKind     # "prompt" | "command" | "todo" | "note" | "question" |
                               # "summary" | "raw_transcript" | "llm_response" | "unclassified"
    Confidence  float
    Reasoning   string?        # optional; classifier's rationale (for audit / debug)
  }

  # Routing metadata (set by router; consumed by orchestrator + sinks)
  Routing {
    PrimarySink   string       # which sink is this primarily for? (e.g., "llm-anthropic")
    AlsoTo        []string     # fan-out sinks (e.g., ["s3", "bd"])
    Suppress      []string     # explicit "do NOT send to" list (kill-switch for sensitive content)
  }

  # Provenance
  Provenance {
    ASRBackend     string      # e.g., "whisper-cpp", "deepgram", or "llm:anthropic:claude-opus-4-7" for derived envelopes
    ASRVersion     string?     # backend version when known
    SegmenterImpl  string      # which segment impl produced this span
    RouterImpl     string      # which router classified it
    CapturedAt     Timestamp   # original audio capture wall-clock (== StartedAt for non-derived envelopes)
    Pipeline       string      # e.g., "vox/0.1.0"
  }

  # Optional payloads
  AudioRef      AudioRef?      # optional; pointer to the audio for this span
  Custom        map<string, any>   # adapter / sink-specific extensions; sinks ignore unknown keys
}

AudioRef {
  Location    string           # "s3://bucket/key", "file:///path", "memory://session/stream/span"
  Encoding    Encoding         # "f32" | "i16" | "wav" | "opus" | "flac"
  SampleRate  uint32?
  Channels    uint8?
  Bytes       []byte?          # populated for "memory://" refs only
}

Design choices baked in

  • Envelope, not "message" or "event" — these things ARE the unit of intent flowing through the system; the name reflects that. They carry transcript + intent + routing + provenance.
  • Routing is part of the envelope. The router stamps routing decisions onto the envelope; the orchestrator and sinks read them. This is the alternative to a pub/sub topology where the router does fan-out itself. The envelope-carries-routing model lets sinks filter ("I only handle intent: todo") without the router needing to know every sink.
  • Provenance is mandatory. When something goes wrong (bad transcript, miscategorized intent), the envelope itself carries who produced what. Critical for debugging and for the "tamper-evident audit log" enterprise feature.
  • AudioRef is optional and decoupled. Sinks that don't need audio (LLM, email summary) never load it. Sinks that do (S3 archive) follow the ref. Keeps envelope size small in the hot path.
  • Custom is the "additive growth" hatch. Sink-specific extensions that don't deserve top-level fields live here. Convention: namespace by sink name (anthropic.*, s3.*) to prevent collisions. Sinks ignore unknown keys.

IntentKind enum

| Kind | Meaning |
|---|---|
| prompt | A direct request to an LLM ("write me a haiku about cats") |
| command | An imperative addressed to the system or a tool ("create a bd issue for this") |
| todo | A self-noted action item ("remember to email Sarah about the deck") |
| note | An observation or thought worth capturing but not actionable |
| question | A question the user wants answered (possibly by an LLM) |
| summary | A summary span generated by an upstream stage |
| raw_transcript | An unclassified transcript chunk; the catch-all when the router has no high-confidence label |
| llm_response | A derived envelope created by an LLM sink containing the response to a prior envelope |
| unclassified | Router couldn't classify with sufficient confidence |

Schema versioning is implicit via the contract version. The sink/v1 contract defines this envelope shape. Future schema changes ship as v1.x (additive — new optional fields, new IntentKind values, new Encoding values) or v2 (breaking — removed or repurposed fields).


Base Sink Interface

Every sink — LLM, S3, email, local file, bd, and any community / enterprise sink — implements this:

Sink {
  # Identity
  Name()          -> string
  Capabilities()  -> Capabilities

  # Lifecycle
  Open(config)    -> Error
  Close()         -> Error           # drains internal queue with timeout, then closes

  # Hot path
  Write(ctx, envelope)        -> WriteResult
  WriteBatch(ctx, envelopes)  -> []WriteResult   # optional; only if SupportsBatch

  # Async results
  Completions()   -> <-chan Completion

  # Diagnostics
  Stats()         -> Stats           # accepted, rejected, delivered, retried, dead-lettered, queue_depth
  Health()        -> Health          # ok | degraded | unhealthy + reason
}

Capabilities {
  SupportsBatch         bool
  AcceptsIntentKinds    []IntentKind    # empty = all
  RequiresAudio         bool            # true for sinks that need AudioRef populated
  RequiresLLMResponse   bool            # true for sinks that only process derived llm_response envelopes
  MaxQueueDepth         uint32          # internal queue capacity (default 1000)
}

WriteResult {
  Accepted     bool       # sink accepted the envelope into its internal pipeline
  EnvelopeID   string     # for correlating completion events
  RejectReason string?    # if !Accepted ("queue_full" | "invalid_envelope" | "sink_down")
}

Completion {
  EnvelopeID   string
  State        CompletionState    # delivered | retrying | permanently_failed | discarded
  Attempt      uint32             # 1-based; how many tries it took
  CompletedAt  Timestamp
  Error        Error?             # populated when State != delivered
  Detail       map<string, any>   # e.g., LLM response ID, S3 object key, message ID
}

Caller-side sync, sink-internal async

Write returns fast (typically microseconds). It only confirms the sink accepted the envelope into its internal queue.

  • Sync sinks (in-memory, local file, bd): set Completed: true in WriteResult and never emit a Completion event. Caller code that doesn't drain Completions() still works.
  • Async sinks (LLM, S3, email): queue the envelope; do network work in background workers; emit a Completion event when final state is known.

Per-sink isolation is mandatory. A failing LLM sink MUST NOT prevent the S3 sink from receiving the same envelope. Each sink runs its own internal worker pool.
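
The async flow described above can be sketched in Go. This is a minimal illustration, not the reference implementation: asyncSink and newAsyncSink are hypothetical names, envelopes are reduced to bare IDs, and the network work is replaced by a comment.

```go
package main

import "fmt"

// Completion is a trimmed version of the contract's Completion record.
type Completion struct {
	EnvelopeID string
	State      string // always "delivered" in this sketch
	Attempt    uint32
}

// asyncSink is a hypothetical sink: Write enqueues, one worker delivers.
type asyncSink struct {
	queue chan string
	done  chan Completion
}

func newAsyncSink(depth int) *asyncSink {
	s := &asyncSink{
		queue: make(chan string, depth),
		done:  make(chan Completion, depth),
	}
	go func() {
		for id := range s.queue {
			// A real sink would perform its network call here.
			s.done <- Completion{EnvelopeID: id, State: "delivered", Attempt: 1}
		}
		close(s.done) // no more completions once the queue is drained
	}()
	return s
}

// Write returns immediately; true means "accepted into the queue".
func (s *asyncSink) Write(id string) bool {
	select {
	case s.queue <- id:
		return true
	default:
		return false
	}
}

func (s *asyncSink) Completions() <-chan Completion { return s.done }

// Close stops intake; the worker drains what is already queued.
func (s *asyncSink) Close() { close(s.queue) }

func main() {
	s := newAsyncSink(8)
	s.Write("env-1")
	s.Write("env-2")
	s.Close()
	for c := range s.Completions() {
		fmt.Println(c.EnvelopeID, c.State, c.Attempt)
	}
}
```

Because the worker owns the outbound call, a slow provider only delays this sink's completions; the caller's Write path and every other sink are unaffected.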

Backpressure

Each sink has a bounded internal queue (default 1000 envelopes — much higher than capture buffer because envelopes are infrequent at human-speech rate).

When the queue is full:

  • Write returns Accepted: false, RejectReason: "queue_full".
  • The orchestrator's per-sink on_accept_failure policy decides what happens next: skip (default; log and move on), dead_letter (route to the dead-letter sink), or halt (stop the orchestrator and alert).
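
A rough Go sketch of this behavior, assuming a hypothetical boundedSink type and an onAcceptFailure helper that only describes the action it would take. Envelopes are again reduced to IDs.

```go
package main

import "fmt"

// WriteResult mirrors the contract's WriteResult shape.
type WriteResult struct {
	Accepted     bool
	EnvelopeID   string
	RejectReason string
}

// boundedSink is a hypothetical sink with a fixed-capacity queue.
type boundedSink struct{ queue chan string }

func newBoundedSink(depth int) *boundedSink {
	return &boundedSink{queue: make(chan string, depth)}
}

// Write never blocks: it enqueues or rejects with "queue_full".
func (s *boundedSink) Write(id string) WriteResult {
	select {
	case s.queue <- id:
		return WriteResult{Accepted: true, EnvelopeID: id}
	default:
		return WriteResult{EnvelopeID: id, RejectReason: "queue_full"}
	}
}

// onAcceptFailure maps the configured policy to an action description.
// A real orchestrator would perform the action instead of returning text.
func onAcceptFailure(policy string, r WriteResult) string {
	switch policy {
	case "dead_letter":
		return "dead-letter " + r.EnvelopeID
	case "halt":
		return "halt orchestrator"
	default: // "skip"
		return "skip " + r.EnvelopeID
	}
}

func main() {
	s := newBoundedSink(2)
	for _, id := range []string{"e1", "e2", "e3"} {
		r := s.Write(id)
		if !r.Accepted {
			fmt.Println(r.RejectReason, "->", onAcceptFailure("skip", r))
		}
	}
}
```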

Error model

Same typed-error scheme as capture/v1:

Error {
  Kind      ErrorKind     # see below
  Sink      string        # sink name
  Op        string        # which method was being called
  Message   string        # human-readable, no PII
  Cause     Error?        # optional wrapped cause
}

ErrorKind {
  ErrInvalidConfig
  ErrAuthFailed
  ErrQuotaExceeded            # rate-limit / quota
  ErrInvalidEnvelope          # envelope didn't pass sink-side validation
  ErrSinkUnavailable          # provider down, network unreachable
  ErrPermissionDenied         # sink-specific permission issue
  ErrPersistent               # logically permanent; no retry will help
  ErrTransient                # retry may help
  ErrInternal
}

Retry policy defaults

The orchestrator and sinks together implement a typed retry policy. All values configurable per sink.

| Failure type | Default behavior |
|---|---|
| Transient (timeout, 5xx, connection reset) | Retry 5 times with exponential backoff (1s, 2s, 4s, 8s, 16s) |
| Auth (401, 403) | 1 retry (in case of transient auth glitch), then permanent fail |
| Quota / rate-limit (429) | Retry with Retry-After honored, up to 10 attempts |
| Permanent (400, 404, malformed envelope) | No retry; dead-letter immediately |
| Unknown | Treat as transient (5 retries with backoff) |
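
The defaults above can be expressed as two small Go functions. This is a sketch under stated assumptions: the class names ("transient", "auth", "quota", "permanent") are hypothetical labels for the table rows, the quota row's "10 attempts" is treated as a budget of 10, and Retry-After handling is left out.

```go
package main

import (
	"fmt"
	"time"
)

// retryBudget returns the default number of retries for a failure class.
func retryBudget(class string) int {
	switch class {
	case "permanent":
		return 0
	case "auth":
		return 1
	case "quota":
		return 10
	default: // "transient" and anything unknown
		return 5
	}
}

// backoff returns the wait before retry n (1-based): base, 2*base, 4*base, ...
func backoff(base time.Duration, n int) time.Duration {
	return base << (n - 1)
}

func main() {
	for n := 1; n <= retryBudget("transient"); n++ {
		fmt.Println("retry", n, "after", backoff(time.Second, n))
	}
}
```

With a base of one second and five retries, the loop walks the 1s, 2s, 4s, 8s, 16s schedule from the table.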

Dead-letter destinations

| Destination | Default? | Notes |
|---|---|---|
| log (structured WARN line) | Always on | Cheap; envelope summary, not full body |
| file://<path> | Off (configurable) | Append-only JSONL of failed envelopes; replayable |
| Re-queue to another sink | Off (configurable) | Powerful but easy to misconfigure |
| audit/v1 event | On when audit/v1 is loaded | Compliance hook |

Multi-Sink Orchestration

The orchestrator owns the loop that delivers each envelope to its target sinks. Its job is small but precise.

Routing — both-layer model

Two layers, both must pass for a sink to receive an envelope.

Layer 1 (positive intent): the envelope's Routing block:

  • PrimarySink: the canonical destination (e.g., "llm-anthropic")
  • AlsoTo: fan-out destinations (e.g., ["s3", "bd"])
  • Suppress: explicit kill-switch list (e.g., ["s3"] to keep this envelope out of the archive)

Layer 2 (negative filter): each sink's declarative filter block in its config:

filter:
  intent_kinds: [prompt, todo, command]    # include only these
  source_kinds: [self, online]              # not in-person
  min_confidence: 0.7                       # require ASR confidence ≥ 0.7
  include_derived: true                     # include llm_response envelopes

Final delivery rule: a sink receives an envelope iff

sink_name ∈ ({PrimarySink} ∪ AlsoTo)
  AND sink_name ∉ Suppress
  AND sink.filter accepts envelope

This gives the router product-level intent ("send prompts to LLMs, commands to bd, summaries to email") while letting individual sinks defend themselves ("I don't care if the router says to send me everything, my filter says no raw_transcript").
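
The final delivery rule can be sketched as a single predicate. This is an illustration only: the Envelope and Filter types are flattened stand-ins for the real schema, include_derived is omitted, and treating an empty filter list as "accept all" is an assumption borrowed from Capabilities.AcceptsIntentKinds.

```go
package main

import "fmt"

type Routing struct {
	PrimarySink string
	AlsoTo      []string
	Suppress    []string
}

// Envelope is flattened to the fields the rule needs.
type Envelope struct {
	IntentKind string
	SourceKind string
	Confidence float64
	Routing    Routing
}

// Filter is a simplified form of the per-sink filter block.
type Filter struct {
	IntentKinds   []string // empty = accept all (assumption)
	SourceKinds   []string // empty = accept all (assumption)
	MinConfidence float64
}

func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

func (f Filter) Accepts(e Envelope) bool {
	if len(f.IntentKinds) > 0 && !contains(f.IntentKinds, e.IntentKind) {
		return false
	}
	if len(f.SourceKinds) > 0 && !contains(f.SourceKinds, e.SourceKind) {
		return false
	}
	return e.Confidence >= f.MinConfidence
}

// shouldDeliver applies both layers: routing first, then the sink's filter.
func shouldDeliver(sink string, f Filter, e Envelope) bool {
	targeted := e.Routing.PrimarySink == sink || contains(e.Routing.AlsoTo, sink)
	return targeted && !contains(e.Routing.Suppress, sink) && f.Accepts(e)
}

func main() {
	e := Envelope{
		IntentKind: "todo", SourceKind: "self", Confidence: 0.9,
		Routing: Routing{PrimarySink: "llm-anthropic", AlsoTo: []string{"s3", "bd"}, Suppress: []string{"s3"}},
	}
	f := Filter{IntentKinds: []string{"prompt", "todo", "command"}, MinConfidence: 0.7}
	for _, sink := range []string{"llm-anthropic", "s3", "bd", "email-smtp"} {
		fmt.Println(sink, shouldDeliver(sink, f, e))
	}
}
```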

Delivery order — parallel by default

Sinks must not depend on each other for their work (per-sink isolation, see above). Dependencies between sinks are expressed as data flow, not delivery order:

  • An LLM sink that produces a derived llm_response envelope can route that envelope to other sinks (email, S3, bd) via standard routing.
  • The LLM response is a first-class envelope, not a callback or side effect.

This keeps the orchestrator simple and prevents subtle "sink A succeeded but sink B failed, now what?" coupling.

Sink discovery

v1: static config. Sinks are declared in the user's config file at startup. The orchestrator validates each declared sink against the registry, opens it, then starts routing. Adding or removing sinks requires a restart.

Hot-reload of sink config is out of scope for v1; planned for v2 or the enterprise edition where multi-tenant runtime config matters.

Orchestrator error handling

| Failure | Orchestrator action |
|---|---|
| Single sink rejects at Write | Log, increment counter, continue with other sinks |
| Sink's Health() returns unhealthy | Take out of rotation; periodically probe; auto-restore when healthy; emit audit event |
| All sinks for an envelope reject | Dead-letter the envelope (configurable destination); emit ERROR log; emit audit event |
| Completions() reports permanent failure | Per-sink dead-letter policy applies; envelope-level retry-via-different-sink is opt-in (off by default) |

Built-in Sinks

LLM (BYOK)

The most product-defining open-core sink. When this sink is used, the open-core code calls the configured provider's API directly with the user's BYOK credential — no Vox proxy in the open-core code path.

Bundled-LLM — a Vox-subscription-token-driven sink that proxies through Vox's cloud service to provider APIs with aggregate-volume credentials — is implemented in the enterprise repo as a separate llm-bundled sink type. It implements this same sink/v1 LLM interface; the open-core pipeline doesn't distinguish between the two. The proprietary piece is the Vox cloud service that issues subscription tokens and proxies provider traffic.

LLMSink : Sink {
  Provider()        -> ProviderInfo
  Models()          -> []ModelInfo
  Frame(envelope)   -> LLMRequest      # default framing; user can override via templates
  SubscribeStream(envelopeID) -> <-chan StreamChunk | error
}

ProviderInfo {
  Name             string    # "anthropic" | "openai" | "google" | "mistral" | "groq" |
                             # "ollama" | "llamacpp" | "azure-openai" | "bedrock"
  Endpoint         string
  AuthType         AuthType  # "api_key" | "oauth" | "iam" | "none"
  SupportsStreaming bool
  SupportsTools    bool
  SupportsVision   bool
}

LLMRequest {
  Model           string
  SystemPrompt    string
  Messages        []Message
  Tools           []ToolDef?
  MaxTokens       uint32
  Temperature     float
  Stream          bool

  # Provenance — passed through but not sent to provider
  EnvelopeID      string
  SessionID       string
  Custom          map<string, any>     # provider-specific knobs
}

LLMResponse {
  EnvelopeID      string
  ResponseText    string
  ToolCalls       []ToolCall?
  UsageTokens     UsageInfo
  CompletedAt     Timestamp
  StreamChunks    []StreamChunk?
}

Design choices:

  • One LLMSink instance = one provider + one credential. Two Anthropic accounts = two llm-anthropic sinks. Provider-switching is config-time, not runtime.
  • Provider-agnostic at the contract layer, provider-native at the wire layer. Common fields (model, messages, tools, max_tokens, temperature, stream) translate to each provider's native API. Provider-specific knobs go in LLMRequest.Custom; sinks that recognize a key pass it through, and sinks that don't ignore it.
  • BYOK is in the sink config, NOT in the envelope. Envelopes are credential-less. Credentials live in the sink's startup config (env var, OS keychain, etc.). An envelope can fan out to N LLM sinks with N different providers and credentials without re-stamping.
  • Framing is separable from sending. LLMSink.Frame(envelope) -> LLMRequest is exposed so users can override framing per sink via templates ("for intent: prompt, system prompt is X; for intent: command, it's Y"). Auditable separately from credential / endpoint.
  • Response capture as derived envelopes. When the LLM responds, the sink emits a Completion AND optionally creates a derived IntentEnvelope with ParentID = original envelope ID, Intent.Kind = "llm_response", Transcript = ResponseText, Provenance.ASRBackend = "llm:anthropic:claude-opus-4-7". This derived envelope routes through the normal pipeline — append to bd, archive to S3, email summary.
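
The derived-envelope step in the last bullet might look like the following Go sketch. The Envelope type is a flattened subset of the schema, deriveResponse is a hypothetical helper, and copying the parent's SessionID is an assumption rather than something the contract states.

```go
package main

import "fmt"

// Envelope is a flattened subset of IntentEnvelope.
type Envelope struct {
	EnvelopeID string
	SessionID  string
	ParentID   string
	IntentKind string
	Transcript string
	ASRBackend string // Provenance.ASRBackend in the full schema
}

// deriveResponse builds the llm_response envelope for a completed request.
func deriveResponse(parent Envelope, newID, provider, model, text string) Envelope {
	return Envelope{
		EnvelopeID: newID,
		SessionID:  parent.SessionID, // assumption: stays in the parent's session
		ParentID:   parent.EnvelopeID,
		IntentKind: "llm_response",
		Transcript: text,
		ASRBackend: "llm:" + provider + ":" + model,
	}
}

func main() {
	parent := Envelope{EnvelopeID: "env-1", SessionID: "sess-1", IntentKind: "prompt"}
	d := deriveResponse(parent, "env-2", "anthropic", "claude-opus-4-7", "Here is your haiku.")
	fmt.Printf("%+v\n", d)
}
```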

v1.x additive: per-intent system prompts

The llm-anthropic sink (and by convention all LLM sinks) supports a system_prompts map that selects a different system prompt based on the envelope's Intent.Kind. This lets you tune LLM style per intent without duplicating sink configuration.

sinks:
  - name: my-anthropic
    type: llm-anthropic
    system_prompts:
      prompt: "You are a helpful assistant answering a dictated request. Reply directly."
      question: "You are a helpful assistant. Reply directly and concisely."
      command: "You are receiving an imperative command. Interpret it as an action request."
      todo: "Elaborate this todo into a clear action item with concrete next steps."
      note: "Polish this note into clean meeting-note style. Don't expand; just clean it."
      default: "You are a helpful assistant called by Blackrim Vox."
    # Legacy single-prompt form still works when system_prompts is absent:
    # system_prompt: "..."

Lookup order at Write time:

  1. system_prompts[envelope.Intent.Kind]: per-intent override
  2. system_prompts["default"]: catch-all override
  3. system_prompt field: old-style single prompt
  4. Compiled-in default
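
A minimal Go sketch of that lookup order. resolveSystemPrompt and compiledDefault are hypothetical names, and the compiled-in default text here is a placeholder rather than the real one.

```go
package main

import "fmt"

// compiledDefault is a placeholder for the sink's built-in prompt.
const compiledDefault = "You are a helpful assistant."

// resolveSystemPrompt walks the four lookup steps in order.
func resolveSystemPrompt(prompts map[string]string, legacy, kind string) string {
	if p, ok := prompts[kind]; ok {
		return p // 1. per-intent override
	}
	if p, ok := prompts["default"]; ok {
		return p // 2. catch-all override
	}
	if legacy != "" {
		return legacy // 3. old-style single prompt
	}
	return compiledDefault // 4. compiled-in default
}

func main() {
	prompts := map[string]string{
		"todo":    "Elaborate this todo into a clear action item.",
		"default": "You are a helpful assistant called by Blackrim Vox.",
	}
	fmt.Println(resolveSystemPrompt(prompts, "", "todo"))
	fmt.Println(resolveSystemPrompt(prompts, "", "note"))
	fmt.Println(resolveSystemPrompt(nil, "legacy prompt", "note"))
}
```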

Backward compatible: existing configs using only system_prompt continue to work unchanged.

Authentication precedence

Credential lookup order at Open():

  1. Explicit env var (ANTHROPIC_API_KEY, etc.)
  2. OS keychain (macOS Keychain, Windows Credential Manager, libsecret); the default
  3. Config file (~/.vox/credentials.yaml); deprecated, warns if used
  4. External secrets manager (Vault, AWS Secrets Manager, 1Password CLI) via a future secrets/v1 extension

vox auth set anthropic is the user-facing onboarding command — prompts for the key, stores in OS keychain, confirms. Easy path = secure path.

Streaming

  • Default: stream for prompt and question intent kinds; non-stream for summary and note. Configurable per sink + per intent.
  • Stream chunks emitted on a per-envelope subscribable channel via LLMSink.SubscribeStream(envelopeID).
  • If the caller doesn't subscribe, chunks are buffered until completion and returned in LLMResponse.StreamChunks.
  • Stream chunks are NOT re-emitted as separate envelopes — only the assembled final response becomes a derived envelope. Partial streams are a UI affordance, not a pipeline data type.

Tool definitions

For envelopes with Intent.Kind = "command", the sink MAY pass tool definitions to the provider. Tools are configured per sink, not per envelope.

sinks:
  - name: anthropic-with-tools
    type: llm-anthropic
    model: claude-opus-4-7
    tools:
      - name: bd_create
        description: "Create a task in beads"
        # tool definition follows JSON Schema

This keeps the envelope simple (carries intent only) and tools auditable (you know what each LLM sink CAN do without reading every envelope).

Tool execution is out of scope for sink/v1 — tools are declared to the LLM here, but the execution layer that handles a tool-call response and dispatches the action is a separate concern (planned tools/v1).

Tier-1 providers (ship with v1)

| Provider | Notes |
|---|---|
| anthropic | Claude Sonnet / Opus |
| openai | GPT-4o / o1 / o3 |
| google | Gemini Pro / Flash |
| ollama | Local-first; the "everything local" path |

Tier-2 providers (community-contributable, same contract): mistral, groq, llamacpp, azure-openai, bedrock.


S3-compatible storage

Archive sink for envelopes + audio. Tier-1 across AWS S3, Cloudflare R2, Backblaze B2, Wasabi, MinIO (self-hosted).

Object key schema

{prefix}/sessions/{session_id}/envelopes/{envelope_id}.json
{prefix}/sessions/{session_id}/audio/{envelope_id}.{ext}
{prefix}/sessions/{session_id}/manifest.json
  • {prefix} — user-configured base path (e.g., vox/prod/)
  • {session_id}, {envelope_id} — from the envelope
  • Audio extension depends on encoding (opus, flac, wav)

Session-rooted by default, so cross-midnight sessions stay together. The date-prefixed path strategy is opt-in:

s3:
  path_strategy: session-rooted     # session-rooted | date-prefixed

The date-prefixed strategy yields {prefix}/{YYYY}/{MM}/{DD}/sessions/{session_id}/..., which is convenient for lifecycle rules.
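
The two path strategies can be sketched as one key builder. This is an illustration with assumptions: envelopeKey is a hypothetical helper, and which timestamp supplies the date (session start, capture time, and whether it is UTC) is not stated in the contract, so the caller passes it in.

```go
package main

import (
	"fmt"
	"path"
	"time"
)

// exampleDay is an arbitrary timestamp used only for the demo below.
var exampleDay = time.Date(2025, time.March, 5, 23, 30, 0, 0, time.UTC)

// envelopeKey builds the object key for an envelope JSON document.
// path.Join collapses the trailing slash in prefixes like "vox/prod/".
func envelopeKey(prefix, strategy, sessionID, envelopeID string, day time.Time) string {
	root := prefix
	if strategy == "date-prefixed" {
		root = path.Join(prefix, day.UTC().Format("2006/01/02"))
	}
	return path.Join(root, "sessions", sessionID, "envelopes", envelopeID+".json")
}

func main() {
	fmt.Println(envelopeKey("vox/prod/", "session-rooted", "sess-1", "env-1", exampleDay))
	fmt.Println(envelopeKey("vox/prod/", "date-prefixed", "sess-1", "env-1", exampleDay))
}
```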

What gets written

Two objects per envelope: envelope JSON + audio (when applicable). Per-envelope PUTs — cost is negligible at speech-rate envelope volumes (≈ $0.007 per hour-long meeting at AWS S3 pricing).

A manifest.json is built incrementally per session, flushed every 30s by default, finalized at session close. Lets a consumer pull a single object to enumerate an entire session without listing under the prefix.

Audio encoding

| Format | Size (1 hr, 16 kHz mono) | Lossless | Speech-tuned |
|---|---|---|---|
| Opus (default) | 7-15 MB | No | Yes |
| FLAC | 50-70 MB | Yes | No |
| WAV | 115 MB | Effectively | No |
| none | 0 | | |

Default opus_bitrate: 24000 (24 kbps — near-transparent for speech).
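
As a sanity check on the size figures, the arithmetic can be written out in Go. This sketch ignores container overhead and assumes 16-bit samples for WAV, which the table does not state.

```go
package main

import "fmt"

// opusBytes estimates payload size for a constant-bitrate stream.
func opusBytes(bitrateBps, seconds int) int {
	return bitrateBps * seconds / 8
}

// pcmBytes estimates uncompressed PCM size.
func pcmBytes(sampleRate, bytesPerSample, channels, seconds int) int {
	return sampleRate * bytesPerSample * channels * seconds
}

func main() {
	const hour = 3600
	fmt.Printf("opus @ 24 kbps: %.1f MB\n", float64(opusBytes(24000, hour))/1e6)
	fmt.Printf("wav 16 kHz mono 16-bit: %.1f MB\n", float64(pcmBytes(16000, 2, 1, hour))/1e6)
}
```

At the default 24 kbps this gives about 10.8 MB per hour, inside the 7-15 MB Opus range, and about 115.2 MB for WAV.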

Object metadata

Every object stamped with metadata keys for lifecycle filtering without parsing content:

| Metadata key | Value |
|---|---|
| x-amz-meta-vox-session-id | UUID |
| x-amz-meta-vox-stream-id | UUID |
| x-amz-meta-vox-source-kind | self / in-person / online / file |
| x-amz-meta-vox-intent-kind | prompt / command / etc. |
| x-amz-meta-vox-captured-at | ISO 8601 |
| x-amz-meta-vox-retention-policy | default / compliance / pii / custom |
| x-amz-meta-vox-schema-version | v1 |

Authentication

AWS SDK is the underlying client. Same credential precedence as LLM sink (env → keychain → profile file → external secrets manager). Provider-agnostic config — endpoint URL specifies the destination provider:

sinks:
  - type: s3
    endpoint: https://s3.amazonaws.com      # or r2.cloudflarestorage.com, s3.wasabisys.com, etc.
    region: us-east-1
    bucket: vox-archive
    prefix: vox/prod/
    auth:
      method: keychain
      credential_name: vox-s3-aws
    path_strategy: session-rooted
    audio:
      encoding: opus
      opus_bitrate: 24000
    manifest:
      flush_interval: 30s
    encryption:
      sse: aws:s3                            # aws:s3 (default) | aws:kms | none
      kms_key_id: ""

iam-role auth method uses AWS SDK's automatic IMDS lookup for EC2 / ECS / Lambda contexts.

Server-side encryption

  • aws:s3 (default) — provider-managed SSE-S3 (AES-256)
  • aws:kms — SSE-KMS with user's key (compliance scenarios)
  • none — for self-hosted MinIO without encryption

Client-side encryption is out of scope for open-core sink/v1 — compliance-tier feature that belongs in the enterprise repo where audit / key-management infrastructure can support it properly.


Email

Tier-1 across SMTP and the major transactional providers.

Transport

Pluggable transports. Each ships as a separate sink registration with the same envelope-handling contract:

| Sink name | Transport |
|---|---|
| email-smtp | SMTP (any relay: Gmail, ProtonMail, self-hosted Postfix, etc.) |
| email-sendgrid | SendGrid HTTP API |
| email-postmark | Postmark HTTP API |
| email-mailgun | Mailgun HTTP API |
| email-resend | Resend HTTP API |
| email-ses | AWS SES |

Credential precedence is the standard chain (env → keychain → config → secrets manager).

EmailSink : Sink {
  Transport()     -> string
  TestSend(ctx)   -> Error   # send a test message; validates transport config
}

Triggering modes

| Mode | When email is sent | Use case |
|---|---|---|
| per-envelope | One email per envelope received | Dictation → email; immediate forwarding |
| per-session (default for online/in-person) | Accumulate envelopes; send one summary at session close | Meeting summaries (the canonical case) |
| scheduled | Daily / weekly digest of envelopes matching a filter | "Friday 5pm summary of what I dictated this week" |

Configurable per sink:

sinks:
  - name: meeting-summary
    type: email-smtp
    trigger: per-session
    flush_idle_after: 5m       # send after N minutes of no new envelopes
    flush_max_wait: 4h         # hard cap

  - name: dictation-forward
    type: email-smtp
    trigger: per-envelope
    filter:
      intent_kinds: [prompt, note]
      source_kinds: [self]

  - name: weekly-digest
    type: email-smtp
    trigger: scheduled
    schedule: "0 17 * * FRI"   # cron
    digest_window: 7d

Templates

Go templates (text/template + html/template). Three bundled defaults:

  • default-summary.html.tmpl: per-session meeting summary card
  • default-envelope.html.tmpl: per-envelope single transcript
  • default-digest.html.tmpl: scheduled digest list

Users override per sink:

template:
  subject: "Meeting: {{ .Session.Title }}"
  html_path: ~/.vox/templates/team-summary.html.tmpl
  text_path: ~/.vox/templates/team-summary.txt.tmpl

Template context exposes .Session, .Envelopes, .LLMResponses, .Stats.

Recipient determination

Three-level precedence:

  1. Envelope override: envelope.Custom.email.to: [...] wins (highest)
  2. Session participants: orchestrator-supplied; the sink uses them when configured
  3. Sink-config recipients: static fallback (lowest)

recipients:
  to: ["[email protected]"]
  cc: []
  bcc: ["[email protected]"]
  use_session_participants: true
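
The three-level precedence can be sketched as a small Go function. resolveTo is a hypothetical helper and only the "to" list is handled; how cc and bcc interact with an envelope override is not specified here.

```go
package main

import "fmt"

// resolveTo picks the "to" list: envelope override, then session
// participants (when enabled), then the sink's static recipients.
func resolveTo(envelopeTo, participants, configTo []string, useParticipants bool) []string {
	if len(envelopeTo) > 0 {
		return envelopeTo
	}
	if useParticipants && len(participants) > 0 {
		return participants
	}
	return configTo
}

func main() {
	participants := []string{"alice@example.com", "bob@example.com"}
	static := []string{"archive@example.com"}
	fmt.Println(resolveTo(nil, participants, static, true))
	fmt.Println(resolveTo(nil, participants, static, false))
	fmt.Println(resolveTo([]string{"carol@example.com"}, participants, static, true))
}
```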

Threading

Stable Message-ID derived from SessionID (for per-session) or digest_window_start (for scheduled).

  • Per-session first send: <{session_id}@vox.local>
  • Per-session re-flush (long session): <{session_id}-{flush_n}@vox.local> with In-Reply-To: <{session_id}@vox.local>
  • Scheduled: <digest-{date}@vox.local> with In-Reply-To: <digest-{previous_date}@vox.local> for chained digests

Hostname (vox.local) is configurable; defaults to the configured SMTP / API domain.
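
The per-session threading scheme can be sketched as follows. sessionMessageID is a hypothetical helper, and numbering re-flushes from 1 (with 0 meaning the first send) is an assumption.

```go
package main

import "fmt"

// sessionMessageID returns the Message-ID for a per-session email and the
// In-Reply-To value, which is empty for the first send.
func sessionMessageID(sessionID string, flushN int, host string) (messageID, inReplyTo string) {
	root := fmt.Sprintf("<%s@%s>", sessionID, host)
	if flushN == 0 {
		return root, ""
	}
	return fmt.Sprintf("<%s-%d@%s>", sessionID, flushN, host), root
}

func main() {
	for n := 0; n < 3; n++ {
		id, parent := sessionMessageID("sess-1", n, "vox.local")
		fmt.Printf("Message-ID: %s  In-Reply-To: %s\n", id, parent)
	}
}
```

Every re-flush replies to the first message, so mail clients group a long session into one thread.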

Attachments

| Content | Behavior |
|---|---|
| Transcript text/markdown | Attached when ≤ 100 KB; inline in body otherwise |
| Audio bytes | Never attached. Linked via S3 URL in body if S3 sink also fired |
| HTML summary | Inline (multipart/alternative with text fallback) |
| PDF summary | Out of scope for v1 (add later via separate pdf-render sink → email) |

Filtering

Standard sink filter block:

filter:
  intent_kinds: [prompt, todo, command]
  source_kinds: [self, online]
  min_confidence: 0.7
  include_derived: true

Local file

Simplest sink. Useful as a no-cloud fallback, dev/test substrate, and the default sink for self-hosters who want zero network.

Mirrors the S3 sink's key schema so users can run both side-by-side or migrate between them without rethinking layout.

sinks:
  - type: local-file
    base_dir: ~/.vox/archive
    format: jsonl                 # jsonl (default) | json-array | sqlite
    path_strategy: per-session    # per-session (default) | per-day | single-file
    audio: sidecar                # sidecar (default) | embed-base64 | none
    rotation: none                # none | daily | size:100MB (for single-file mode)
    compress: none                # none | gzip
    fsync_every: ""               # paranoid mode: fsync after N envelopes or duration

Default layout:

{base_dir}/sessions/{session_id}.jsonl
{base_dir}/sessions/{session_id}/audio/{envelope_id}.opus

JSONL is append-friendly, line-oriented, streamable, and grep-able — the standard for envelope-style streams. json-array and sqlite are available for users who want different ergonomics.


ox-ledger (SageOx team-context ledger)

Writes envelopes as murmurs into a SageOx (ox) ledger directory. Murmurs are git-tracked JSON files in data/murmurs/YYYY-MM-DD/HH/<id>.json that ox uses to share team context across humans and AI coding agents.

This sink turns voice-captured intent into team-shared context that any ox-integrated AI coworker automatically loads via ox agent prime. Full integration design + upstream-coordination tracker: docs/integrations/ox.md.

sinks:
  - name: team-ledger
    type: ox-ledger
    ledger_dir: ~/.sageox/ledger              # auto-detected from ~/.sageox/config.yaml when blank
    agent_id_template: "vox-{{ .InstanceID }}"
    agent_type: vox                           # appears in ox UI / queries
    topic_template: "voice/{{ .Envelope.Speaker.SourceKind }}/{{ .Envelope.Intent.Kind }}"
    importance_template: "{{ if gt .Envelope.Intent.Confidence 0.8 }}normal{{ else }}ambient{{ end }}"
    scope: team                               # "team" | "ledger"
    schema_version: "1"                       # ox murmur schema version
    git_commit_interval: 30s                  # batch commits to avoid repo bloat
    git_auto_push: false                      # user / ox daemon owns push
    prefer_daemon_ipc: true                   # use ox daemon when reachable; direct write otherwise
    filter:
      intent_kinds: [prompt, command, todo, note, summary, llm_response]
      source_kinds: [self, in-person, online]
      min_confidence: 0.6
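
The topic_template and importance_template values above are ordinary Go templates. The sketch below renders them with text/template; the Envelope, Speaker, and Intent types are trimmed stand-ins, and render is a hypothetical helper, not the sink's actual code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

type Speaker struct{ SourceKind string }

type Intent struct {
	Kind       string
	Confidence float64
}

type Envelope struct {
	Speaker Speaker
	Intent  Intent
}

const topicTmpl = "voice/{{ .Envelope.Speaker.SourceKind }}/{{ .Envelope.Intent.Kind }}"
const importanceTmpl = "{{ if gt .Envelope.Intent.Confidence 0.8 }}normal{{ else }}ambient{{ end }}"

// render parses and executes one template with the envelope as .Envelope.
func render(tmpl string, e Envelope) (string, error) {
	t, err := template.New("murmur").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = t.Execute(&sb, struct{ Envelope Envelope }{e})
	return sb.String(), err
}

func main() {
	e := Envelope{Speaker: Speaker{SourceKind: "self"}, Intent: Intent{Kind: "todo", Confidence: 0.9}}
	topic, _ := render(topicTmpl, e)
	importance, _ := render(importanceTmpl, e)
	fmt.Println(topic, importance)
}
```

A production sink would likely parse each template once at Open() and surface parse errors as configuration errors instead of re-parsing per envelope.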

Envelope → murmur mapping (abridged; full table in docs/integrations/ox.md):

| ox murmur field | Vox envelope source |
|---|---|
| id | EnvelopeID |
| timestamp | StartedAt |
| agent_id / agent_type | sink config (vox-{instance} / vox) |
| principal_id / principal_type | derived from Speaker.Label / Speaker.SourceKind |
| topic / importance | rendered from topic_template / importance_template |
| content | Transcript |
| metadata | namespaced vox.* keys with session_id / stream_id / intent_kind / confidence / audio_ref / etc. |
| tags | [vox, source:<kind>, intent:<kind>] plus user-supplied |
| scope | sink config (team default) |

Performance: file writes are immediate; git commits are batched (default every 30s, configurable). Never auto-pushes — that's the user's or ox daemon's responsibility.

Two integration modes:

  • Direct write (default fallback): Vox writes JSON files into the ledger directory and runs git add + git commit on the batch interval. Works when ox daemon isn't running.
  • Daemon IPC (preferred when ox daemon is reachable): Vox sends murmurs to the ox daemon via the adapter-protocol IPC; the daemon handles file I/O + commit serialization. Requires the ox-adapter-vox binary (ships in v1.1) to be installed.

Detection: ledger_dir is resolved from:

  • OX_LEDGER_DIR env var
  • ~/.sageox/config.yaml (when ox is installed)
  • explicit ledger_dir field in sink config (overrides both)

If none resolve, the sink's Open() returns ErrInvalidConfig with a human-readable pointer to the ox setup docs.


bd (Beads task tracker)

Envelopes with Intent.Kind = "todo" or "command" naturally become bd issues when bd is available in the host project. The bd sink wraps bd create and bd update.

sinks:
  - type: bd
    filter:
      intent_kinds: [todo, command]
    title_template: "{{ truncate 80 (firstSentence .Envelope.Transcript) }}"
    description_template: ""      # empty = bundled default
    default_type: task            # bd issue type
    default_priority: p2
    auto_claim: false
    include_s3_link: true         # auto-detect S3 sink output, link in description

Idempotency: envelope_id is the dedup key. On retry, the sink looks up existing issues by description-prefix containing the envelope_id; if found, updates instead of creating.

bd not present: Open() returns ErrUnsupported; orchestrator marks the sink unhealthy; other sinks continue.

bd remote sync: the sink does NOT push. The host project's normal bd workflow (bd dolt push) handles that. Keeps the sink in its lane.


Sink Registration

Sinks register via a single, documented entry point:

RegisterSink(name: string, factory: (config) -> Sink)

Registration is package-init in Go, equivalent in other languages. The core maintains a single registry; duplicate names panic at startup (intentional).
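
A minimal Go sketch of package-init registration with a duplicate-name panic. The Sink interface is cut down to Name(), the config type (map[string]any) is an assumption, and localFile is a stand-in for a real sink.

```go
package main

import "fmt"

// Sink is reduced to one method for this sketch.
type Sink interface{ Name() string }

// Factory builds a sink from its config; the config type is an assumption.
type Factory func(config map[string]any) Sink

var registry = map[string]Factory{}

// RegisterSink adds a factory; a duplicate name panics at startup.
func RegisterSink(name string, factory Factory) {
	if _, dup := registry[name]; dup {
		panic("sink already registered: " + name)
	}
	registry[name] = factory
}

type localFile struct{}

func (localFile) Name() string { return "local-file" }

// init runs before main, which is what makes registration "package-init".
func init() {
	RegisterSink("local-file", func(map[string]any) Sink { return localFile{} })
}

func main() {
	sink := registry["local-file"](nil)
	fmt.Println("opened sink:", sink.Name())
}
```

Failing fast on a duplicate name means a misconfigured plugin set is caught before any envelope is routed.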

Enterprise plugins register against the same registry — the core does not distinguish open vs. enterprise sinks at the loader level.

Built-in v1 sinks

| Name | Family | Tier | Notes |
| --- | --- | --- | --- |
| local-file | local | 1 | Reference impl; simplest possible sink |
| bd | bd | 1 | Task tracker integration |
| ox-ledger | integration | 1 | SageOx team-context ledger writer; see docs/integrations/ox.md |
| llm-anthropic | llm | 1 | Primary LLM; flagship provider |
| llm-openai | llm | 1 | GPT-4o / o1 / o3 |
| llm-google | llm | 1 | Gemini Pro / Flash |
| llm-ollama | llm | 1 | Local-first LLM path |
| llm-mistral | llm | 2 | Mistral Large / Codestral |
| llm-groq | llm | 2 | Llama 3 / Mixtral fast inference |
| llm-llamacpp | llm | 2 | Embedded; reference Whisper.cpp pattern |
| llm-azure-openai | llm | 2 | Enterprise-Azure path |
| llm-bedrock | llm | 2 | Enterprise-AWS path |
| s3 | s3 | 1 | AWS S3 + Cloudflare R2 + B2 + Wasabi + MinIO |
| email-smtp | email | 1 | Universal; works with any SMTP relay |
| email-sendgrid | email | 1 | SendGrid transactional |
| email-postmark | email | 1 | Postmark transactional |
| email-mailgun | email | 2 | Mailgun transactional |
| email-resend | email | 2 | Resend transactional |
| email-ses | email | 2 | AWS SES |

Tier 1 = ships with first stable release. Tier 2 = community-contributable adapter slots that follow the same contract.


Configuration Schema

Top-level configuration for the entire sink layer:

sinks:
  - name: my-anthropic                # unique sink instance name
    type: llm-anthropic               # registered factory name
    # ... type-specific config

  - name: my-archive
    type: s3
    # ...

orchestrator:
  on_accept_failure: skip             # skip (default) | dead_letter | halt
  dead_letter:
    log: true                         # always on; structured WARN
    file: ~/.vox/dead-letter.jsonl    # optional; off by default
    audit: true                       # on when audit/v1 loaded
  envelope_retry_via_different_sink: false   # opt-in chaining on permanent fail

Each sink config inherits the standard filter: block and the sink-type-specific fields documented in its section above.


Versioning and Stability

sink/v1 is the contract above. Once frozen:

  • Non-breaking changes (allowed in v1.x): adding optional fields with sensible defaults to IntentEnvelope, WriteResult, Completion, Capabilities, or Stats; adding new IntentKind values; adding new ErrorKind values; adding new built-in sinks; adding new providers under existing sink families.
  • Breaking changes (require v2): removing or renaming any existing field or method; changing the meaning of an existing field; changing the envelope schema in any non-additive way; changing the routing semantics.

The core supports one major version (vN) of sink/ at a time; during a migration, two adjacent versions are supported in parallel. Sinks declare which version they target via their Name() return value or a parallel SupportedVersions() method (TBD before freeze).


Reference Implementations (Build Order)

| Order | Sink | Family | Why this order |
| --- | --- | --- | --- |
| 1 | local-file | local | First. Zero dependencies; validates the base interface + envelope schema end-to-end; testable in CI without any network |
| 2 | llm-anthropic | llm | First BYOK LLM; proves the auth precedence + streaming + derived-envelope flow |
| 3 | bd | bd | Validates idempotency + intent-kind filtering; tightly bounded scope |
| 4 | ox-ledger | integration | Validates git-batching + templated murmur emission; unblocks the SageOx integration (the highest-leverage product partnership) |
| 5 | email-smtp | email | Hardest of the triggering modes (per-session); proves the orchestrator's batching path |
| 6 | s3 | s3 | Most config surface; lifecycle / encryption / metadata to validate |
| 7+ | other LLM providers, transactional email | various | Same contract, different wire protocols |

Build the sinks in this order: build the orchestrator against local-file alone first, then add the remaining sinks one at a time. This way the orchestrator and base interface stabilize before any network-bound sink lands.


Project Principle: Opinionated Defaults, Every Default Configurable

This contract continues the principle established in capture/v1. Every behavior with a defensible default (buffer_frames: 1000, retry_max_attempts: 5, audio.encoding: opus, triggering.flush_idle_after: 5m, etc.) is exposed as a config knob. The defaults reflect a considered recommendation for the typical voice-to-LLM + archive + summary use case; the knobs exist so specialized workflows can tune them.