Architecture

flowchart TD
    Q[query] --> C[Intent Classifier]
    C -->|difficulty + budget| R[Hierarchical Reasoner]
    R --> L[Adaptive Langevin<br/>K parallel traces]
    L --> P[Process Reward Model]
    P --> V[Verifier Chain<br/>SymPy / Exec / Regex]
    V --> S[Self-Consistency Voter]
    S --> A[answer + audit trail]

Layers

1. Intent classifier

Routes each query and assigns it a compute budget. The rule-based baseline lives in ebrm_system.intent.RuleBasedClassifier; swap in a neural classifier by implementing the Classifier Protocol.
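A minimal sketch of how a Protocol-based classifier could look. Only the names Classifier and IntentPrediction come from the text above; the field names, the classify method, and the toy KeywordClassifier are assumptions made for illustration and may not match the package's actual API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class IntentPrediction:
    # Field names are assumptions for this sketch, not the package's schema.
    difficulty: float  # 0.0 (trivial) to 1.0 (hard)
    n_steps: int       # Langevin steps per trace
    n_restarts: int    # restarts per trace
    n_traces: int      # parallel traces


class Classifier(Protocol):
    # Assumed method name; any object with this method satisfies the Protocol.
    def classify(self, query: str) -> IntentPrediction: ...


class KeywordClassifier:
    """Toy rule-based classifier: more 'hard' keywords means a larger budget."""

    HARD_MARKERS = ("prove", "integral", "optimize")

    def classify(self, query: str) -> IntentPrediction:
        text = query.lower()
        hits = sum(marker in text for marker in self.HARD_MARKERS)
        difficulty = min(1.0, 0.2 + 0.4 * hits)
        return IntentPrediction(
            difficulty=difficulty,
            n_steps=int(10 + 90 * difficulty),
            n_restarts=1 + hits,
            n_traces=1 + 4 * hits,
        )


def route(classifier: Classifier, query: str) -> IntentPrediction:
    # Depends only on the Protocol, so a neural classifier can be passed instead.
    return classifier.classify(query)
```

Because route is typed against the Protocol rather than a concrete class, replacing the rule-based classifier only changes the argument passed in.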

2. Hierarchical latent reasoner

Runs the inner latent-thought loop, inspired by Coconut. Implemented in ebrm_system.core (WIP).

3. Adaptive Langevin

Scales test-time compute with difficulty. The number of steps N, restarts R, and parallel traces K are all controlled by the classifier's IntentPrediction. Implemented in ebrm_system.inference (WIP).
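To make the roles of N, R, and K concrete, here is a toy sketch of budgeted Langevin sampling on a one-dimensional quadratic energy. Everything in it is an assumption for illustration: the real module is marked WIP and presumably works on latent vectors with a learned energy, and keeping the lowest-energy restart per trace is just one plausible way to combine restarts.

```python
import math
import random
from typing import List, Tuple


def energy(x: float) -> float:
    # Toy quadratic energy with its minimum at x = 3 (illustrative only).
    return 0.5 * (x - 3.0) ** 2


def grad_energy(x: float) -> float:
    return x - 3.0


def langevin_run(n_steps: int, step_size: float, temperature: float,
                 rng: random.Random) -> Tuple[float, float]:
    """One Langevin chain: gradient step plus Gaussian noise, N times."""
    x = rng.uniform(-5.0, 5.0)  # random initialisation
    noise_scale = math.sqrt(2.0 * step_size * temperature)
    for _ in range(n_steps):
        x = x - step_size * grad_energy(x) + noise_scale * rng.gauss(0.0, 1.0)
    return x, energy(x)


def adaptive_langevin(n_steps: int, n_restarts: int, n_traces: int,
                      seed: int = 0) -> List[Tuple[float, float]]:
    """Return K (state, energy) pairs, each the best of R restarts."""
    rng = random.Random(seed)
    traces = []
    for _ in range(n_traces):
        runs = [langevin_run(n_steps, 0.1, 0.01, rng) for _ in range(n_restarts)]
        traces.append(min(runs, key=lambda run: run[1]))  # lowest energy wins
    return traces
```

A harder query would simply call adaptive_langevin with larger values for all three arguments; the sampler itself does not change.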

4. Process reward model

Converts stepwise energies into a per-trace confidence score. Implemented in ebrm_system.reward (WIP).
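The text does not say how energies are mapped to confidence, so the following is only one plausible sketch: average each trace's stepwise energies, then apply a softmax over the negated means so that lower-energy traces receive higher confidence. The function name and the temperature parameter are hypothetical.

```python
import math
from typing import List, Sequence


def trace_confidences(step_energies: Sequence[Sequence[float]],
                      temperature: float = 1.0) -> List[float]:
    """Map per-step energies of each trace to confidences that sum to 1."""
    if not step_energies:
        return []
    if any(len(steps) == 0 for steps in step_energies):
        raise ValueError("every trace needs at least one step energy")

    mean_energies = [sum(steps) / len(steps) for steps in step_energies]

    # Shift by the minimum before exponentiating for numerical stability.
    lowest = min(mean_energies)
    weights = [math.exp(-(e - lowest) / temperature) for e in mean_energies]

    total = sum(weights)
    return [w / total for w in weights]
```

Lowering temperature sharpens the distribution toward the lowest-energy trace; raising it moves the confidences toward uniform.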

5. Verifier chain

Mechanical checks: SymPyVerifier for algebraic equality, ExecVerifier for sandboxed Python, RegexVerifier for format. Composed via VerifierChain, which short-circuits on the first rejection.
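The short-circuit behaviour can be sketched with standard-library stand-ins. RegexCheck, RangeCheck, Verdict, and run_chain are hypothetical names chosen for this example; they are not the package's SymPyVerifier, ExecVerifier, RegexVerifier, or VerifierChain, whose signatures are not shown here.

```python
import re
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass(frozen=True)
class Verdict:
    ok: bool
    reason: str


class Verifier(Protocol):
    def verify(self, candidate: str) -> Verdict: ...


class RegexCheck:
    """Format check: the whole candidate must match the pattern."""

    def __init__(self, pattern: str) -> None:
        self._pattern = re.compile(pattern)

    def verify(self, candidate: str) -> Verdict:
        if self._pattern.fullmatch(candidate):
            return Verdict(True, "format ok")
        return Verdict(False, f"does not match {self._pattern.pattern!r}")


class RangeCheck:
    """Numeric check: assumes an earlier check confirmed the format."""

    def __init__(self, low: float, high: float) -> None:
        self._low, self._high = low, high

    def verify(self, candidate: str) -> Verdict:
        value = float(candidate)
        if self._low <= value <= self._high:
            return Verdict(True, "in range")
        return Verdict(False, f"{value} outside [{self._low}, {self._high}]")


def run_chain(verifiers: Sequence[Verifier], candidate: str) -> List[Verdict]:
    """Run verifiers in order and stop at the first rejection."""
    audit: List[Verdict] = []
    for verifier in verifiers:
        verdict = verifier.verify(candidate)
        audit.append(verdict)
        if not verdict.ok:
            break  # short-circuit: later verifiers never see this candidate
    return audit
```

Ordering matters under this design: putting the cheap format check first means RangeCheck never calls float() on a malformed string, and the returned list doubles as a small audit trail.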

6. Self-consistency voter

Aggregates K traces into a consensus. Supports exact and numerical bucketing, with uniform, confidence, or inverse-energy weighting.
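A hedged sketch of weighted voting with the two bucketing modes and three weighting schemes named above. The Trace tuple, the string option names, the rounding precision, and the epsilon in the inverse-energy weight are all assumptions; the package's voter may define these differently.

```python
from collections import defaultdict
from typing import Dict, NamedTuple, Sequence


class Trace(NamedTuple):
    answer: str
    confidence: float
    energy: float  # assumed non-negative in this sketch


def bucket_key(answer: str, numerical: bool, digits: int = 6) -> str:
    """Exact mode keys on the stripped string; numerical mode on the rounded value."""
    if numerical:
        try:
            return repr(round(float(answer), digits))
        except ValueError:
            pass  # not a plain number: fall back to exact matching
    return answer.strip()


def trace_weight(trace: Trace, weighting: str) -> float:
    if weighting == "uniform":
        return 1.0
    if weighting == "confidence":
        return trace.confidence
    if weighting == "inverse_energy":
        return 1.0 / (trace.energy + 1e-8)  # epsilon avoids division by zero
    raise ValueError(f"unknown weighting: {weighting!r}")


def vote(traces: Sequence[Trace], numerical: bool = True,
         weighting: str = "uniform") -> str:
    """Return the bucket key with the largest total weight."""
    if not traces:
        raise ValueError("need at least one trace")
    totals: Dict[str, float] = defaultdict(float)
    for trace in traces:
        totals[bucket_key(trace.answer, numerical)] += trace_weight(trace, weighting)
    return max(totals, key=lambda key: totals[key])
```

Numerical bucketing lets "0.5" and "0.50" pool their votes, whereas exact bucketing treats them as different answers, so the choice of mode can change the consensus.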

Design invariants

Mechanical verification only. Verifiers never ask an LLM to grade an LLM.
Everything is a Protocol. Swap any layer in one line.
CPU-testable. Tests do not require GPUs or model downloads.
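As an illustration of the last two invariants, the sketch below types a pipeline stage against a Protocol and swaps in a deterministic stub for a test. Voter, MajorityVoter, FirstAnswerVoter, and Pipeline are hypothetical names for this example, and the pipeline is reduced to a single layer for brevity.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Protocol


class Voter(Protocol):
    def vote(self, answers: List[str]) -> str: ...


class MajorityVoter:
    def vote(self, answers: List[str]) -> str:
        # Assumes a non-empty list; returns the most frequent answer.
        return Counter(answers).most_common(1)[0][0]


class FirstAnswerVoter:
    """Deterministic stub, handy for CPU-only unit tests."""

    def vote(self, answers: List[str]) -> str:
        return answers[0]


@dataclass
class Pipeline:
    # Typed against the Protocol, so any structurally matching object fits.
    voter: Voter = field(default_factory=MajorityVoter)

    def run(self, answers: List[str]) -> str:
        return self.voter.vote(answers)


default_pipeline = Pipeline()
test_pipeline = Pipeline(voter=FirstAnswerVoter())  # the one-line swap
```

Neither voter needs to inherit from Voter: structural typing is what keeps the swap to a single constructor argument, and neither needs a GPU or a model download to run.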