ADR-001: Six-Layer Reasoning Pipeline, Separated from the Model
- Status: Accepted
- Date: 2026-03-28
- Deciders: @piyushptiwari1
Context
We have two artefacts that evolve on different cadences and serve different audiences:
- The model (trained weights + training code): the research artefact. Published as ebrm. Citable, fixed once trained, versioned by checkpoint.
- The system: the production pipeline that wraps any base reasoner with intent routing, adaptive compute, verifiers, and voting. Published as ebrm-system. Ships on a fast cadence, should be installable via pip, and should run with or without the v2 model.
Keeping them in one repository forced tight coupling: breaking changes in the pipeline churned the paper repo; retraining pinned the pipeline version. Splitting them lets each evolve independently and keeps the model repo citable.
Decision
Two repositories, one pipeline architecture.
- ebrm: model, training loop, paper reference, Hugging Face release.
- ebrm-system (this repo): installable pipeline with the following six layers, each behind a typing.Protocol:
  - Intent / difficulty classifier → emits IntentPrediction with suggested Langevin steps, restarts, and trace count.
  - Hierarchical latent reasoner → inner latent-thought loop (Coconut-inspired). Swappable with any nn.Module.
  - Adaptive Langevin → test-time compute scaled by difficulty; K parallel traces.
  - Process reward model → stepwise energy → per-trace confidence.
  - External verifier bridge → SymPyVerifier, ExecVerifier (sandboxed subprocess), RegexVerifier, composed via VerifierChain that short-circuits on rejection.
  - Self-consistency voter → SelfConsistencyVoter with uniform | confidence | inverse_energy weighting and exact / numeric bucketing.
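The idea of putting a layer behind a typing.Protocol and composing verifiers into a short-circuiting chain can be sketched as follows. This is a minimal illustration, not the ebrm-system API: the Verdict type, the verify() signature, and RangeVerifier are assumptions made for the example, and the RegexVerifier here is a stand-in that only borrows the name.

```python
import re
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Verdict:
    """Hypothetical result type; the real return type is not specified above."""
    accepted: bool
    reason: str = ""


class Verifier(Protocol):
    """Structural interface: any object with a matching verify() qualifies."""
    def verify(self, answer: str) -> Verdict: ...


class RegexVerifier:
    """Stand-in: accepts answers that fully match a pattern."""
    def __init__(self, pattern: str) -> None:
        self._pattern = re.compile(pattern)

    def verify(self, answer: str) -> Verdict:
        if self._pattern.fullmatch(answer):
            return Verdict(True)
        return Verdict(False, f"no match for {self._pattern.pattern!r}")


class RangeVerifier:
    """Hypothetical placeholder for a heavier check (symbolic or sandboxed)."""
    def __init__(self, low: float, high: float) -> None:
        self.low, self.high = low, high
        self.calls = 0  # counts invocations so short-circuiting is observable

    def verify(self, answer: str) -> Verdict:
        self.calls += 1
        ok = self.low <= float(answer) <= self.high
        return Verdict(ok, "" if ok else "out of range")


class VerifierChain:
    """Runs verifiers in order and stops at the first rejection."""
    def __init__(self, verifiers: Sequence[Verifier]) -> None:
        self._verifiers = list(verifiers)

    def verify(self, answer: str) -> Verdict:
        for verifier in self._verifiers:
            verdict = verifier.verify(answer)
            if not verdict.accepted:
                return verdict  # short-circuit: later verifiers never run
        return Verdict(True)


range_check = RangeVerifier(0, 100)
chain = VerifierChain([RegexVerifier(r"-?\d+(\.\d+)?"), range_check])
print(chain.verify("42").accepted)   # both verifiers run
print(chain.verify("abc").accepted)  # regex rejects; range_check is skipped
```

Ordering cheap checks first means an expensive verifier only runs on answers that already look well-formed, which is the practical benefit of short-circuiting on rejection.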
The pipeline is CPU-testable end-to-end without the model — every layer has a deterministic rule-based / algorithmic implementation so tests don't require GPUs.
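A deterministic, rule-based layer of the kind described above might look like the sketch below for the intent / difficulty classifier. The field names on IntentPrediction, the keyword lists, and the scaling constants are all assumptions chosen for illustration; only the idea (keyword rules in, suggested Langevin steps, restarts, and trace count out) comes from this ADR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentPrediction:
    """Field names are assumptions based on the layer description."""
    intent: str
    difficulty: float      # 0.0 (trivial) to 1.0 (hard)
    langevin_steps: int
    restarts: int
    trace_count: int


class RuleBasedIntentClassifier:
    """Deterministic keyword heuristic; needs no weights and no GPU."""

    _MATH_HINTS = ("solve", "integral", "prove", "derivative", "=")
    _CODE_HINTS = ("def ", "function", "compile", "stack trace")

    def predict(self, query: str) -> IntentPrediction:
        text = query.lower()
        if any(hint in text for hint in self._MATH_HINTS):
            intent = "math"
        elif any(hint in text for hint in self._CODE_HINTS):
            intent = "code"
        else:
            intent = "general"

        # Crude difficulty proxy: longer queries get more compute, capped at 1.
        difficulty = min(len(text.split()) / 50.0, 1.0)
        if intent != "general":
            difficulty = min(difficulty + 0.3, 1.0)

        # Compute budgets grow monotonically with difficulty.
        return IntentPrediction(
            intent=intent,
            difficulty=difficulty,
            langevin_steps=10 + int(90 * difficulty),
            restarts=1 + int(3 * difficulty),
            trace_count=1 + int(7 * difficulty),
        )


classifier = RuleBasedIntentClassifier()
print(classifier.predict("Solve x^2 = 4"))
print(classifier.predict("What is the capital of France?"))
```

Because the output depends only on the input string, a test can assert on it directly without fixtures, seeds, or a model download.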
Consequences
Positive
- Fast CI (no GPU / no model download for 95% of tests).
- Clean citation: paper → model repo; deployment → system repo.
- Pipeline can wrap third-party reasoners (Qwen, Llama, DeepSeek) with the same verifier + voting layers.
- Each Protocol is independently swappable.
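To illustrate why the voting layer is reasoner-agnostic, here is a minimal sketch of weighted self-consistency voting that only needs an answer string plus an optional confidence and energy per trace, regardless of which model produced them. The Trace fields, the vote() function, and the rounding used for numeric bucketing are assumptions; the actual SelfConsistencyVoter may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Literal

Weighting = Literal["uniform", "confidence", "inverse_energy"]


@dataclass(frozen=True)
class Trace:
    """One candidate answer; field names are assumptions."""
    answer: str
    confidence: float = 1.0
    energy: float = 1.0


def bucket_key(answer: str, numeric: bool, ndigits: int = 6) -> str:
    """Exact bucketing compares stripped strings; numeric bucketing rounds
    parseable numbers so that '0.50' and '0.5' share a bucket."""
    text = answer.strip()
    if numeric:
        try:
            return repr(round(float(text), ndigits))
        except ValueError:
            pass  # not a number: fall back to exact match
    return text


def vote(traces: list[Trace], weighting: Weighting = "uniform",
         numeric: bool = True, eps: float = 1e-8) -> str:
    """Return the bucket key with the largest total weight."""
    if not traces:
        raise ValueError("need at least one trace")
    totals: dict[str, float] = defaultdict(float)
    for trace in traces:
        if weighting == "uniform":
            weight = 1.0
        elif weighting == "confidence":
            weight = trace.confidence
        else:  # inverse_energy: lower energy means a stronger vote
            weight = 1.0 / (max(trace.energy, 0.0) + eps)
        totals[bucket_key(trace.answer, numeric)] += weight
    # max() returns the first maximal key, so ties resolve by insertion order
    return max(totals, key=totals.__getitem__)


traces = [Trace("0.5", 0.2, 5.0), Trace("0.50", 0.2, 5.0), Trace("0.7", 0.9, 0.1)]
print(vote(traces, "uniform"))         # majority bucket wins
print(vote(traces, "confidence"))      # one confident trace can outweigh two weak ones
print(vote(traces, "inverse_energy"))  # low-energy trace dominates
```

The three weightings can disagree on the same traces, so the choice of mode is a real design decision rather than a cosmetic one.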
Negative
- Two repos to maintain; cross-repo docs must be kept in sync.
- The v2 model release must pin a compatible ebrm-system minor version.
Mitigations
- Documentation tables show which ebrm-system versions match which ebrm checkpoints.
- Pipeline exports a Reasoner Protocol so model implementations can be version-gated at import time.
Alternatives considered
- Monorepo. Rejected: churn, heavier CI, weaker citability.
- Ship pipeline as part of model package. Rejected: forces GPU install for users who only want verifiers / voting.
- No pipeline, only a notebook. Rejected: users cannot pip install a notebook.