Architecture

Blackrim Vox is a local-first voice capture and transcription system built around a five-stage pipeline: capture → segment → ASR → router → sinks. All components are pluggable via a typed registry; the open-source edition ships functional defaults for every stage, and the Enterprise edition shadows or extends those defaults without forking the core.


System overview

The pipeline flows left-to-right. Audio frames enter through a capture adapter, get sliced into utterance segments, are transcribed by an ASR backend, classified and dispatched by a router, and finally delivered to one or more output sinks. Policy gates and IPC helpers sit alongside the pipeline rather than inside it.

flowchart LR
    subgraph Input
        MIC["Microphone / System audio\n(internal/capture)"]
        WAV["File / WAV\n(internal/capture/filewav)"]
    end

    subgraph Pipeline["Pipeline — internal/orchestrator"]
        direction LR
        CAP["Capture\nAdapter"]
        SEG["Segment\nBackend"]
        ASR["ASR\nBackend"]
        RTR["Router"]
        SINKS["Sinks"]
    end

    subgraph Output
        LLM["LLM sink\n(internal/sink/llmanthropic)"]
        FILE["Local file\n(internal/sink/localfile)"]
        TTS_OUT["TTS sink\n(internal/sink/tts)"]
    end

    subgraph Helpers
        POLICY["Policy gate\n(internal/policy)"]
        IPC["IPC indicator\n(internal/indicator/ipc)"]
        AUDIT["Audit stream\n(pkg/audit)"]
    end

    MIC --> CAP
    WAV --> CAP
    CAP --> SEG
    SEG --> ASR
    ASR --> RTR
    RTR --> SINKS
    SINKS --> LLM
    SINKS --> FILE
    SINKS --> TTS_OUT

    POLICY -. "gate check\nbefore start" .-> Pipeline
    SINKS -. "emit on verdict" .-> AUDIT
    IPC -. "status updates" .-> CAP

The orchestrator (internal/orchestrator) wires these contracts together and is the only place that knows all five stages simultaneously. Individual stages depend only on their immediate upstream/downstream interface — not on each other.
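
The separation described above can be pictured with a small Go sketch. Every type and method name in it (Capture.Next, Segmenter.Push, Pipeline, Run, and the toy stages) is an assumption made for illustration; the real contracts live in the internal packages and may look quite different. The point is only that Run is the single function that sees all five interfaces, while each stage implements just its own.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
)

// Hypothetical stand-ins for the five stage contracts. The real interfaces
// under internal/ may use different names and signatures.
type (
	Frame      []byte // raw audio chunk
	Segment    []byte // audio for one utterance
	Transcript struct{ Text string }

	Capture   interface{ Next(ctx context.Context) (Frame, error) } // io.EOF when done
	Segmenter interface{ Push(f Frame) (Segment, bool) }            // true when an utterance closes
	ASR       interface{ Transcribe(ctx context.Context, s Segment) (Transcript, error) }
	Router    interface{ Route(t Transcript) []string } // returns sink names
	Sink      interface {
		Name() string
		Deliver(ctx context.Context, t Transcript) error
	}
)

// Pipeline is the only type that refers to all five contracts at once.
type Pipeline struct {
	Capture Capture
	Segment Segmenter
	ASR     ASR
	Router  Router
	Sinks   []Sink
}

// Run drives capture -> segment -> ASR -> router -> sinks until capture ends.
func Run(ctx context.Context, p Pipeline) error {
	byName := make(map[string]Sink, len(p.Sinks))
	for _, s := range p.Sinks {
		byName[s.Name()] = s
	}
	for {
		f, err := p.Capture.Next(ctx)
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return fmt.Errorf("capture: %w", err)
		}
		seg, ok := p.Segment.Push(f)
		if !ok {
			continue // utterance still open
		}
		t, err := p.ASR.Transcribe(ctx, seg)
		if err != nil {
			return fmt.Errorf("asr: %w", err)
		}
		for _, name := range p.Router.Route(t) {
			sink, ok := byName[name]
			if !ok {
				return fmt.Errorf("router chose unknown sink %q", name)
			}
			if err := sink.Deliver(ctx, t); err != nil {
				return fmt.Errorf("sink %s: %w", name, err)
			}
		}
	}
}

// Toy stages so the sketch runs without audio hardware.
type fixedCapture struct{ frames []Frame }

func (c *fixedCapture) Next(context.Context) (Frame, error) {
	if len(c.frames) == 0 {
		return nil, io.EOF
	}
	f := c.frames[0]
	c.frames = c.frames[1:]
	return f, nil
}

type perFrame struct{} // treats every frame as one utterance

func (perFrame) Push(f Frame) (Segment, bool) { return Segment(f), true }

type echoASR struct{} // "transcribes" by echoing the bytes as text

func (echoASR) Transcribe(_ context.Context, s Segment) (Transcript, error) {
	return Transcript{Text: string(s)}, nil
}

type staticRouter []string

func (r staticRouter) Route(Transcript) []string { return r }

type memSink struct {
	name string
	got  []string
}

func (m *memSink) Name() string { return m.name }
func (m *memSink) Deliver(_ context.Context, t Transcript) error {
	m.got = append(m.got, t.Text)
	return nil
}

func demo() ([]string, error) {
	sink := &memSink{name: "mem"}
	err := Run(context.Background(), Pipeline{
		Capture: &fixedCapture{frames: []Frame{Frame("hello"), Frame("world")}},
		Segment: perFrame{}, ASR: echoASR{}, Router: staticRouter{"mem"},
		Sinks: []Sink{sink},
	})
	return sink.got, err
}

func main() { fmt.Println(demo()) }
```

Swapping any toy stage for another implementation of the same interface leaves Run untouched, which is the property the paragraph above describes.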


Module map

graph TD
    EXT["pkg/extension\nRegistry of typed constructors\nfor ASR, TTS, Sink, IAM, AuditForwarder"]
    OSSVOX["pkg/ossvox\nOSS run-loop entry point;\nRegisterDefaults + Dispatcher"]
    IAM["pkg/iam\nIdentity & access management\nBackend interface + role types"]
    AUDIT_PKG["pkg/audit\nAudit/v1 stream; JSONLStream\nwith rotation + Forwarder interface"]

    ASR_INT["internal/asr\nASR Backend interface;\nwhispercli, deepgram, assemblyai, azure adapters"]
    TTS_INT["internal/tts\nTTS Backend interface;\npiper, elevenlabs, openai, say adapters"]
    SINK_INT["internal/sink\nSink interface;\nllmanthropic, localfile, tts outputs"]
    CAP_INT["internal/capture\nAdapter interface;\ncoreaudio, filewav, echo backends"]
    SEG_INT["internal/segment\nSegmentation Backend interface;\nenergy VAD default"]
    POL_INT["internal/policy\nNetwork-egress gate + per-sink\nconsent persistence"]
    ORCH_INT["internal/orchestrator\nWires capture→segment→ASR\n→router→sinks end-to-end"]
    RTR_INT["internal/router\nRouter interface;\ndefaultrouter implementation"]

    OSSVOX --> EXT
    ORCH_INT --> CAP_INT
    ORCH_INT --> SEG_INT
    ORCH_INT --> ASR_INT
    ORCH_INT --> RTR_INT
    ORCH_INT --> SINK_INT
    ORCH_INT --> POL_INT
    EXT --> ASR_INT
    EXT --> TTS_INT
    EXT --> SINK_INT
    EXT --> IAM
    EXT --> AUDIT_PKG
    AUDIT_PKG --> POL_INT

| Package | Role |
| --- | --- |
| pkg/extension | Typed constructor registry; RegisterASR, RegisterTTS, RegisterSink, RegisterIAM, RegisterAuditForwarder; the sole seam between OSS and Enterprise |
| pkg/iam | Backend interface for authentication + session management; roles (admin, user, read-only); no SDK deps |
| pkg/audit | Stream interface (Emit); JSONLStream with size/age rotation; Forwarder interface for Splunk/Datadog/syslog/Loki/Elasticsearch adapters |
| pkg/ossvox | Run entry point; RegisterDefaults pre-populates the registry with OSS backends; Dispatcher maps subcommands to handlers |
| internal/asr | Backend interface for streaming transcription; sub-packages: whispercli, deepgram, assemblyai, azure, echo (test), fallback |
| internal/tts | Backend interface for speech synthesis; sub-packages: piper, elevenlabs, openai, say (macOS), voicecache, fallback |
| internal/sink | Sink interface for output destinations; sub-packages: llmanthropic, localfile, tts |
| internal/capture | Adapter interface for audio sources; sub-packages: coreaudio, filewav, echo (test), gate |
| internal/policy | Network-egress Gate; consent sub-package persists per-sink decisions to ~/.vox/policy.json |
| internal/orchestrator | Wires the five pipeline stages; holds Pipeline config struct and Run function; sole place that touches all five contracts |

Extension registration pattern

The registry is the only contract surface between the OSS core and the Enterprise edition. All backends are identified by name strings; re-registering a name replaces the prior entry.

sequenceDiagram
    participant main as cmd/vox main()
    participant reg as extension.Registry
    participant oss as ossvox.RegisterDefaults
    participant ent as cmd/vox-enterprise (separate module)
    participant orch as orchestrator.Run

    main->>reg: extension.NewRegistry()
    main->>oss: RegisterDefaults(reg)
    note over oss: registers whispercli ASR,<br/>piper TTS, localfile sink,<br/>null IAM, null audit forwarder
    opt Enterprise binary only
        main->>ent: RegisterEnterpriseBackends(reg)
        note over ent: shadows IAM slot with WorkOS,<br/>registers ElevenLabs TTS,<br/>Splunk audit forwarder, etc.
    end
    main->>orch: ossvox.Run(ctx, args, reg, dispatcher)
    orch->>reg: ResolveASR(cfg.ASRBackend)
    orch->>reg: ResolveTTS(cfg.TTSBackend)
    orch->>reg: ResolveSink(name) [for each configured sink]
    orch->>reg: ResolveIAM(cfg.IAMBackend)

Key invariants:

  • Register* methods are not concurrency-safe; all registration must complete before Run is called.
  • Resolve* methods are safe for concurrent use after registration.
  • Enterprise backends shadow an OSS entry by registering under the same name; for example, the WorkOS backend takes over the IAM slot that RegisterDefaults fills with the null IAM backend.
  • The enterprise hint catalog (extension.LookupEnterprise) provides friendly CLI errors when an OSS user invokes a known-enterprise subcommand.
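
The last invariant can be illustrated with a rough Go sketch of a dispatcher that falls back to a hint catalog. The Handler and Dispatcher types, the lookupEnterprise helper, the "sso" subcommand, and the hint text are all hypothetical placeholders; the actual signature of extension.LookupEnterprise and the real catalog contents are not shown in this document.

```go
package main

import (
	"errors"
	"fmt"
)

// Handler runs one subcommand. Hypothetical shape, for illustration only.
type Handler func(args []string) error

// Dispatcher maps subcommand names to handlers.
type Dispatcher map[string]Handler

// enterpriseHints is a placeholder catalog of subcommands that exist only in
// the Enterprise edition. The entry below is invented for this sketch.
var enterpriseHints = map[string]string{
	"sso": "SSO login ships with the Enterprise edition",
}

// lookupEnterprise is a stand-in for extension.LookupEnterprise.
func lookupEnterprise(sub string) (string, bool) {
	hint, ok := enterpriseHints[sub]
	return hint, ok
}

// Dispatch runs a registered handler, or explains why it cannot.
func (d Dispatcher) Dispatch(args []string) error {
	if len(args) == 0 {
		return errors.New("no subcommand given")
	}
	if h, ok := d[args[0]]; ok {
		return h(args[1:])
	}
	// Known Enterprise subcommand: return a friendly hint, not a generic error.
	if hint, ok := lookupEnterprise(args[0]); ok {
		return fmt.Errorf("%q is an Enterprise subcommand: %s", args[0], hint)
	}
	return fmt.Errorf("unknown subcommand %q", args[0])
}

// explain renders the outcome of a dispatch as a string.
func explain(d Dispatcher, args ...string) string {
	if err := d.Dispatch(args); err != nil {
		return err.Error()
	}
	return "ok"
}

func main() {
	d := Dispatcher{"transcribe": func([]string) error { return nil }}
	fmt.Println(explain(d, "transcribe", "a.wav"))
	fmt.Println(explain(d, "sso"))
	fmt.Println(explain(d, "bogus"))
}
```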

The full registry interface is defined in pkg/extension/registry.go. Constructor signatures follow a uniform func(ctx context.Context, cfg map[string]any) (T, error) pattern for all five surfaces.
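
As a minimal sketch of the pattern, assuming a generic name-to-constructor table per surface, the following Go program shows registration, shadowing, and resolution for a single IAM surface. Only the constructor signature is taken from the text above; the slot type, the method signatures of RegisterIAM and ResolveIAM, the backend name "default", and both toy backends are assumptions, and the real code in pkg/extension/registry.go may be organised differently.

```go
package main

import (
	"context"
	"fmt"
)

// Constructor mirrors the uniform signature quoted above.
type Constructor[T any] func(ctx context.Context, cfg map[string]any) (T, error)

// slot is a hypothetical name -> constructor table for one surface.
type slot[T any] struct{ ctors map[string]Constructor[T] }

// register is not concurrency-safe; call it only before the run loop starts.
func (s *slot[T]) register(name string, c Constructor[T]) {
	if s.ctors == nil {
		s.ctors = make(map[string]Constructor[T])
	}
	s.ctors[name] = c // re-registering a name replaces the prior entry
}

// resolve only reads the map, so it is safe for concurrent use once
// registration has finished.
func (s *slot[T]) resolve(ctx context.Context, name string, cfg map[string]any) (T, error) {
	c, ok := s.ctors[name]
	if !ok {
		var zero T
		return zero, fmt.Errorf("no backend registered as %q", name)
	}
	return c(ctx, cfg)
}

// IAM is a stand-in for the pkg/iam Backend interface.
type IAM interface{ Provider() string }

type nullIAM struct{}

func (nullIAM) Provider() string { return "null" }

type fakeEnterpriseIAM struct{ org string }

func (f fakeEnterpriseIAM) Provider() string { return "enterprise:" + f.org }

// Registry shows one of the five surfaces; the others would follow suit.
type Registry struct{ iam slot[IAM] }

func (r *Registry) RegisterIAM(name string, c Constructor[IAM]) { r.iam.register(name, c) }
func (r *Registry) ResolveIAM(ctx context.Context, name string, cfg map[string]any) (IAM, error) {
	return r.iam.resolve(ctx, name, cfg)
}

func registerDefaults(r *Registry) {
	r.RegisterIAM("default", func(context.Context, map[string]any) (IAM, error) {
		return nullIAM{}, nil
	})
}

// registerEnterprise shadows the OSS entry by reusing the same name.
func registerEnterprise(r *Registry) {
	r.RegisterIAM("default", func(_ context.Context, cfg map[string]any) (IAM, error) {
		org, _ := cfg["org"].(string)
		return fakeEnterpriseIAM{org: org}, nil
	})
}

func demo(enterprise bool) (string, error) {
	reg := &Registry{}
	registerDefaults(reg)
	if enterprise {
		registerEnterprise(reg)
	}
	iam, err := reg.ResolveIAM(context.Background(), "default", map[string]any{"org": "acme"})
	if err != nil {
		return "", err
	}
	return iam.Provider(), nil
}

func main() {
	fmt.Println(demo(false))
	fmt.Println(demo(true))
}
```

Because resolution goes through a name, the caller does not change between editions; only the set of Register* calls made before the run loop differs.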


Policy and posture

Vox is air-gapped by default. No audio data, transcript, or credential leaves the host unless an operator explicitly enables a networked sink.

flowchart TD
    START([Pipeline.Run called]) --> GATE{Policy gate\nnon-nil?}
    GATE -- No --> SKIP[Skip check\nlegacy / test mode]
    GATE -- Yes --> CHECK[Check each sink's\nNetworkScope against gate]
    CHECK --> ALLOWED{All sinks\nallowed?}
    ALLOWED -- Yes --> PIPELINE[Execute pipeline]
    ALLOWED -- No --> CONSENT{Interactive\nmode?}
    CONSENT -- Yes --> PROMPT[Show per-sink\nconsent prompt]
    PROMPT --> PERSIST[Persist decision\nto ~/.vox/policy.json]
    PERSIST --> ALLOWED
    CONSENT -- No --> ERR[Return PolicyError\nheadless / daemon]
    SKIP --> PIPELINE

Posture summary:

| Dimension | Default | Override |
| --- | --- | --- |
| Network egress | Denied unless explicitly allowed | --i-accept-network-egress or interactive consent |
| Consent persistence | ~/.vox/policy.json, per sink and per endpoint | Cleared on sink reconfiguration |
| Edition switch | OSS binary = OSS backends only | Enterprise binary adds backends via registry shadowing |
| Air-gap enforcement | policy.Gate checks sink.Capabilities.NetworkScope before orchestrator.Run | Nil gate = no check (test / legacy) |

The internal/policy/consent sub-package manages the interactive prompt and the persisted JSON. The --i-accept-network-egress flag bypasses the prompt for CI and scripted environments.
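
The decision flow above can be approximated in a few lines of Go. This is a sketch under stated assumptions: the Gate fields, the Check method, the SinkInfo type, and the scope constants are invented for illustration, consent is held in memory rather than written to ~/.vox/policy.json, and a denied prompt returns a PolicyError immediately instead of re-entering the check.

```go
package main

import (
	"errors"
	"fmt"
)

// NetworkScope is a hypothetical two-value version of the sink capability.
type NetworkScope int

const (
	ScopeLocal NetworkScope = iota
	ScopeNetwork
)

type SinkInfo struct {
	Name  string
	Scope NetworkScope
}

// PolicyError reports the first sink that was refused network egress.
type PolicyError struct{ Sink string }

func (e *PolicyError) Error() string {
	return fmt.Sprintf("policy: network egress denied for sink %q", e.Sink)
}

// Gate is an illustrative stand-in for policy.Gate.
type Gate struct {
	AcceptEgress bool                   // models --i-accept-network-egress
	Consent      map[string]bool        // in-memory stand-in for the persisted JSON
	Prompt       func(sink string) bool // nil in headless / daemon mode
}

// Check returns nil when every sink may run, or a *PolicyError otherwise.
func (g *Gate) Check(sinks []SinkInfo) error {
	if g == nil {
		return nil // nil gate = no check (test / legacy)
	}
	for _, s := range sinks {
		if s.Scope == ScopeLocal || g.AcceptEgress || g.Consent[s.Name] {
			continue
		}
		if g.Prompt == nil {
			return &PolicyError{Sink: s.Name} // cannot ask, so refuse
		}
		allowed := g.Prompt(s.Name)
		if g.Consent == nil {
			g.Consent = make(map[string]bool)
		}
		g.Consent[s.Name] = allowed // remember the decision
		if !allowed {
			return &PolicyError{Sink: s.Name}
		}
	}
	return nil
}

// verdict renders a gate decision as a short string.
func verdict(g *Gate, sinks []SinkInfo) string {
	var pe *PolicyError
	if errors.As(g.Check(sinks), &pe) {
		return "denied:" + pe.Sink
	}
	return "ok"
}

func main() {
	sinks := []SinkInfo{{"localfile", ScopeLocal}, {"llmanthropic", ScopeNetwork}}
	fmt.Println(verdict(&Gate{}, sinks))                   // headless, no consent
	fmt.Println(verdict(&Gate{AcceptEgress: true}, sinks)) // flag set
}
```

A gate with a Prompt that answers yes records the decision in Consent, so a later headless check against the same sink passes without asking again.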


ADR index

| ADR | Title | Status |
| --- | --- | --- |
| ADR-0003 | Audit v1 stream design | Accepted |
| ADR-0005 | BYOK / org credentials (pooled model 1) | Accepted |

Additional architecture decision records are forthcoming. When filing a new ADR, use the research template at docs/research/_template-vendor-evaluation.md as a starting point for the problem/options/decision structure until a dedicated ADR template is published.