Lumera Mainnet Halt: What Happened and What We’re Building

On March 25, 2026, the Lumera mainnet experienced a consensus halt at block height 4,288,927. The chain stopped producing finalized blocks for several hours. No funds were lost. No fork occurred. No state was corrupted.

But it stopped. And the community deserves a clear, honest account of why.

What Happened

The halt was triggered by APP_HASH divergence: validators were computing different state hashes for the same block. CometBFT detected this and refused to finalize. The network entered a prevote-nil state and stopped.

The root cause was traced to EVIDENCE_TYPE_CASCADE_CLIENT_FAILURE, a new evidence type being used in production for the first time. It contained a map-based metadata field included in consensus-critical state.

The problem: Go maps don’t have a guaranteed iteration order. The same inputs produced different serialized byte outputs across different validator environments, and different bytes meant different state hashes. Validators couldn’t agree on what happened, so they didn’t.
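To make the failure mode concrete, here is a minimal sketch (not Lumera's actual serialization code) contrasting a naive map walk, whose byte output can differ from run to run and node to node, with a normalized version that sorts keys first so identical inputs always yield identical bytes:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// naiveSerialize walks the map in Go's unspecified (randomized)
// iteration order, so two nodes can emit different bytes for the
// same logical content.
func naiveSerialize(meta map[string]string) string {
	var b strings.Builder
	for k, v := range meta {
		fmt.Fprintf(&b, "%s=%s;", k, v)
	}
	return b.String()
}

// deterministicSerialize sorts the keys first, so every node emits
// the same bytes for the same logical content.
func deterministicSerialize(meta map[string]string) string {
	keys := make([]string, 0, len(meta))
	for k := range meta {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s;", k, meta[k])
	}
	return b.String()
}

func main() {
	// Hypothetical metadata fields, for illustration only.
	meta := map[string]string{"height": "4288927", "client": "c-01", "reason": "expiry"}
	fmt.Println(deterministicSerialize(meta)) // prints client=c-01;height=4288927;reason=expiry;
}
```

Because any bytes that feed the app hash must match across validators, even one unsorted map in consensus-critical state is enough to cause divergence.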

Why This Was Hard to Catch

Map-order nondeterminism is probabilistic. It doesn’t fail consistently. Standard testing validates that transactions succeed and queries return correct results. It doesn’t validate that identical inputs produce identical byte sequences across independently operated nodes.

That requires a different class of testing: multi-node comparison, byte-level validation, and adversarial inputs. We hadn’t built it yet. This incident made it a priority.

How We Responded

The root cause was isolated in under 90 minutes. A containment hotfix (v1.11.1) was shipped, disabling public submission of the affected evidence type. Validators coordinated on a rollback and upgrade, and consensus resumed roughly 3 to 4 hours after the hotfix was released.

The validator community’s response was exceptional. They moved quickly, communicated clearly, and executed a coordinated recovery. That is what a healthy decentralized network looks like under pressure.

What We’re Building

  • PR #110: Enforced deterministic serialization across all consensus-critical paths. Map-based structures are normalized before serialization. Replay validation confirms the fix.
  • PR #111: New CI layer with ephemeral multi-validator networks (6+ nodes), adversarial test payloads, and per-height app hash comparisons. Any divergence fails the build automatically.
  • New Determinism Test Suite: cross-validator hash consistency, JSON/map key permutation testing, restart consistency validation, and safe rejection of unsupported evidence types.
  • Updated release policy: every release will now execute all transaction types across multiple submission waves, with post-execution validation requiring 100 blocks of continuous consensus and a single agreed app hash across all nodes.
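The key-permutation testing described above can be sketched as follows. This is an illustrative test shape, not the actual suite: it builds the same logical map under several insertion orders, serializes each through a canonical (sorted-key) encoder, and asserts that every permutation produces the same hash:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// canonicalBytes is a hypothetical stand-in for the chain's
// normalized serialization: sorted keys, fixed delimiter.
func canonicalBytes(meta map[string]string) []byte {
	keys := make([]string, 0, len(meta))
	for k := range meta {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s;", k, meta[k])
	}
	return []byte(b.String())
}

// stateHash mimics hashing serialized state into an app-hash-like digest.
func stateHash(meta map[string]string) string {
	sum := sha256.Sum256(canonicalBytes(meta))
	return hex.EncodeToString(sum[:])
}

func main() {
	// The same key/value pairs, inserted in three different orders.
	pairs := [][2]string{{"client", "c-01"}, {"height", "4288927"}, {"reason", "expiry"}}
	orders := [][]int{{0, 1, 2}, {2, 0, 1}, {1, 2, 0}}

	var ref string
	for i, ord := range orders {
		m := make(map[string]string)
		for _, idx := range ord {
			m[pairs[idx][0]] = pairs[idx][1]
		}
		h := stateHash(m)
		if i == 0 {
			ref = h
		} else if h != ref {
			panic("hash divergence across key permutations")
		}
	}
	fmt.Println("all permutations agree")
}
```

A naive serializer fails this test intermittently, which is exactly why the CI layer compares hashes per height across independent nodes rather than relying on a single pass.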

The Bigger Picture

This wasn’t a flaw in Lumera’s core design. CometBFT did exactly what it was supposed to do: detect the divergence and halt safely before anything could be corrupted. The safety net held.

What this incident exposed was a gap in our testing methodology. We are closing that gap comprehensively, across code, testing infrastructure, CI, and release processes.

The standard we’re building toward is not whether a feature passes its tests. It’s whether it is safe to run in a distributed consensus environment across every validator on the network.

Thank you to everyone who helped coordinate the recovery.

— The Lumera Team
