Architecture
Blackrim Vox is a local-first voice capture and transcription system built around a five-stage pipeline: capture → segment → ASR → router → sinks. All components are pluggable via a typed registry; the open-source edition ships functional defaults for every stage, and the Enterprise edition shadows or extends those defaults without forking the core.
System overview
The pipeline flows left-to-right. Audio frames enter through a capture adapter, get sliced into utterance segments, are transcribed by an ASR backend, classified and dispatched by a router, and finally delivered to one or more output sinks. Policy gates and IPC helpers sit alongside the pipeline rather than inside it.
```mermaid
flowchart LR
    subgraph Input
        MIC["Microphone / System audio\n(internal/capture)"]
        WAV["File / WAV\n(internal/capture/filewav)"]
    end
    subgraph Pipeline["Pipeline - internal/orchestrator"]
        direction LR
        CAP["Capture\nAdapter"]
        SEG["Segment\nBackend"]
        ASR["ASR\nBackend"]
        RTR["Router"]
        SINKS["Sinks"]
    end
    subgraph Output
        LLM["LLM sink\n(internal/sink/llmanthropic)"]
        FILE["Local file\n(internal/sink/localfile)"]
        TTS_OUT["TTS sink\n(internal/sink/tts)"]
    end
    subgraph Helpers
        POLICY["Policy gate\n(internal/policy)"]
        IPC["IPC indicator\n(internal/indicator/ipc)"]
        AUDIT["Audit stream\n(pkg/audit)"]
    end
    MIC --> CAP
    WAV --> CAP
    CAP --> SEG
    SEG --> ASR
    ASR --> RTR
    RTR --> SINKS
    SINKS --> LLM
    SINKS --> FILE
    SINKS --> TTS_OUT
    POLICY -. "gate check\nbefore start" .-> Pipeline
    SINKS -. "emit on verdict" .-> AUDIT
    IPC -. "status updates" .-> CAP
```
The orchestrator (internal/orchestrator) wires these contracts together and is the only place that knows all five stages simultaneously. Individual stages depend only on their immediate upstream/downstream interface — not on each other.
Module map
```mermaid
graph TD
    EXT["pkg/extension\nRegistry of typed constructors\nfor ASR, TTS, Sink, IAM, AuditForwarder"]
    OSSVOX["pkg/ossvox\nOSS run-loop entry point;\nRegisterDefaults + Dispatcher"]
    IAM["pkg/iam\nIdentity & access management\nBackend interface + role types"]
    AUDIT_PKG["pkg/audit\nAudit/v1 stream; JSONLStream\nwith rotation + Forwarder interface"]
    ASR_INT["internal/asr\nASR Backend interface;\nwhispercli, deepgram, assemblyai, azure adapters"]
    TTS_INT["internal/tts\nTTS Backend interface;\npiper, elevenlabs, openai, say adapters"]
    SINK_INT["internal/sink\nSink interface;\nllmanthropic, localfile, tts outputs"]
    CAP_INT["internal/capture\nAdapter interface;\ncoreaudio, filewav, echo backends"]
    SEG_INT["internal/segment\nSegmentation Backend interface;\nenergy VAD default"]
    POL_INT["internal/policy\nNetwork-egress gate + per-sink\nconsent persistence"]
    ORCH_INT["internal/orchestrator\nWires capture→segment→ASR\n→router→sinks end-to-end"]
    RTR_INT["internal/router\nRouter interface;\ndefaultrouter implementation"]
    OSSVOX --> EXT
    ORCH_INT --> CAP_INT
    ORCH_INT --> SEG_INT
    ORCH_INT --> ASR_INT
    ORCH_INT --> RTR_INT
    ORCH_INT --> SINK_INT
    ORCH_INT --> POL_INT
    EXT --> ASR_INT
    EXT --> TTS_INT
    EXT --> SINK_INT
    EXT --> IAM
    EXT --> AUDIT_PKG
    AUDIT_PKG --> POL_INT
```
| Package | Role |
|---|---|
| pkg/extension | Typed constructor registry; RegisterASR, RegisterTTS, RegisterSink, RegisterIAM, RegisterAuditForwarder; the sole seam between OSS and Enterprise |
| pkg/iam | Backend interface for authentication + session management; roles (admin, user, read-only); no SDK deps |
| pkg/audit | Stream interface (Emit); JSONLStream with size/age rotation; Forwarder interface for Splunk/Datadog/syslog/Loki/Elasticsearch adapters |
| pkg/ossvox | Run entry point; RegisterDefaults pre-populates the registry with OSS backends; Dispatcher maps subcommands to handlers |
| internal/asr | Backend interface for streaming transcription; sub-packages: whispercli, deepgram, assemblyai, azure, echo (test), fallback |
| internal/tts | Backend interface for speech synthesis; sub-packages: piper, elevenlabs, openai, say (macOS), voicecache, fallback |
| internal/sink | Sink interface for output destinations; sub-packages: llmanthropic, localfile, tts |
| internal/capture | Adapter interface for audio sources; sub-packages: coreaudio, filewav, echo (test), gate |
| internal/policy | Network-egress Gate; consent sub-package persists per-sink decisions to ~/.vox/policy.json |
| internal/orchestrator | Wires the five pipeline stages; holds Pipeline config struct and Run function; sole place that touches all five contracts |
Extension registration pattern
The registry is the only contract surface between the OSS core and the Enterprise edition. All backends are identified by name strings; re-registering a name replaces the prior entry.
```mermaid
sequenceDiagram
    participant main as cmd/vox main()
    participant reg as extension.Registry
    participant oss as ossvox.RegisterDefaults
    participant ent as cmd/vox-enterprise (separate module)
    participant orch as orchestrator.Run
    main->>reg: extension.NewRegistry()
    main->>oss: RegisterDefaults(reg)
    note over oss: registers whispercli ASR,<br/>piper TTS, localfile sink,<br/>null IAM, null audit forwarder
    opt Enterprise binary only
        main->>ent: RegisterEnterpriseBackends(reg)
        note over ent: shadows IAM slot with WorkOS,<br/>registers ElevenLabs TTS,<br/>Splunk audit forwarder, etc.
    end
    main->>orch: ossvox.Run(ctx, args, reg, dispatcher)
    orch->>reg: ResolveASR(cfg.ASRBackend)
    orch->>reg: ResolveTTS(cfg.TTSBackend)
    orch->>reg: ResolveSink(name) [for each configured sink]
    orch->>reg: ResolveIAM(cfg.IAMBackend)
```
Key invariants:
- Register* methods are not concurrency-safe; all registration happens before Run is called. Resolve* methods are safe for concurrent use after registration.
- Enterprise backends shadow OSS slots by registering under the same name (e.g. "workos" occupies the previously-empty "iam" slot).
- The enterprise hint catalog (extension.LookupEnterprise) provides friendly CLI errors when an OSS user invokes a known-enterprise subcommand.
The full registry interface is defined in pkg/extension/registry.go. Constructor signatures follow a uniform func(ctx context.Context, cfg map[string]any) (T, error) pattern for all five surfaces.
Policy and posture
Vox is air-gapped by default. No audio data, transcript, or credential leaves the host unless an operator explicitly enables a networked sink.
```mermaid
flowchart TD
    START([Pipeline.Run called]) --> GATE{Policy gate\nnon-nil?}
    GATE -- No --> SKIP[Skip check\nlegacy / test mode]
    GATE -- Yes --> CHECK[Check each sink's\nNetworkScope against gate]
    CHECK --> ALLOWED{All sinks\nallowed?}
    ALLOWED -- Yes --> PIPELINE[Execute pipeline]
    ALLOWED -- No --> CONSENT{Interactive\nmode?}
    CONSENT -- Yes --> PROMPT[Show per-sink\nconsent prompt]
    PROMPT --> PERSIST[Persist decision\nto ~/.vox/policy.json]
    PERSIST --> ALLOWED
    CONSENT -- No --> ERR[Return PolicyError\nheadless / daemon]
    SKIP --> PIPELINE
```
Posture summary:
| Dimension | Default | Override |
|---|---|---|
| Network egress | Denied unless explicitly allowed | --i-accept-network-egress or interactive consent |
| Consent persistence | ~/.vox/policy.json per-sink per-endpoint | Cleared on sink reconfiguration |
| Edition switch | OSS binary = OSS backends only | Enterprise binary adds backends via registry shadowing |
| Air-gap enforcement | policy.Gate checks sink.Capabilities.NetworkScope before orchestrator.Run | Nil gate = no check (test / legacy) |
The internal/policy/consent sub-package manages the interactive prompt and the persisted JSON. The --i-accept-network-egress flag bypasses the prompt for CI and scripted environments.
ADR index
| ADR | Title | Status |
|---|---|---|
| ADR-0003 | Audit v1 stream design | Accepted |
| ADR-0005 | BYOK / org credentials (pooled model 1) | Accepted |
Additional architecture decision records are forthcoming. When filing a new ADR, use the research template at docs/research/_template-vendor-evaluation.md as a starting point for the problem/options/decision structure until a dedicated ADR template is published.