Freshness and Replay
Freshness describes how current the read daemon’s projections are relative to the commit log. Replay is the mechanism that keeps projections synchronized. Understanding both is essential for operating AssetCore at scale without surprising staleness.
Problem this concept solves
In event-sourced systems, writes and reads are separated:
- The write daemon commits events immediately
- The read daemon must process those events to update its view
This creates a freshness gap: the time between when a commit is acknowledged and when it appears in read queries. Understanding this gap is essential for:
- Building applications that need consistent reads
- Diagnosing apparent data “staleness”
- Tuning system performance
Core ideas
Commit Log
The commit log is the append-only sequence of event batches. It is the only source used to rebuild projections, so it defines the authoritative timeline. Durability depends on the backend: in-memory logs are fast and disposable, while file/segmented logs persist across restarts. This design (covered in Runtime Model) is what makes AssetCore’s guarantees possible.
- Append-only: New batches are added at the end (no in-place updates, no deletes)
- Immutable: Once written, batches don’t change (cannot be modified or reordered)
- Sequenced: Each batch has a monotonic sequence number (total ordering of all events)
The write daemon appends to the log; the read daemon tails it. This separation is critical: the write daemon never blocks on read performance, and the read daemon can lag behind without affecting write throughput. What you gain: independent write and read scaling, crash recovery when using a durable log backend, and forensic debugging by replaying to any point. What you give up: reads are eventually consistent, not immediately consistent—there’s always some lag.
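To make the append-only, sequenced properties concrete, here is a minimal Python sketch of a commit log. The names (CommitLog, Batch, append, read_from) are illustrative assumptions, not AssetCore’s actual types or API, and an in-memory backend is assumed.

from dataclasses import dataclass, field

# Minimal illustration of an append-only, sequenced commit log.
# CommitLog, Batch, append, and read_from are hypothetical names,
# not AssetCore's actual API; an in-memory backend is assumed.

@dataclass(frozen=True)
class Batch:
    seq: int          # monotonic sequence number (total ordering)
    events: tuple     # immutable once written

@dataclass
class CommitLog:
    batches: list = field(default_factory=list)

    def append(self, events) -> int:
        """Write daemon: add a batch at the end; nothing is updated in place."""
        seq = len(self.batches) + 1
        self.batches.append(Batch(seq=seq, events=tuple(events)))
        return seq

    def read_from(self, after_seq: int):
        """Read daemon: tail every batch committed after a given sequence."""
        return [b for b in self.batches if b.seq > after_seq]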
Projections
Projections are in-memory views of state. They are optimized for queries and can always be rebuilt from the log, which is why replay is a core invariant. This separation between “source of truth” (commit log) and “query interface” (projections) is fundamental to AssetCore’s architecture.
- Built by applying events from the commit log (replay process, see below)
- Optimized for query performance (indexed, denormalized for fast lookups)
- Published atomically via snapshot swap (no partial updates visible to queries)
Projections are derived data. If lost, they can be rebuilt from the commit log. This property is what makes AssetCore operationally simple: you can always reconstruct query state from the log, and durability is a deployment choice. Without this, losing a read daemon would mean data loss or complex replication protocols. With it, losing a read daemon is inconvenient (rebuild time) but not catastrophic.
What you gain: disposable query state (rebuild at will), freedom to change projection structure (replay with new logic), confidence that analytics/notifications never drift from operational state (all consume same log). What you give up: rebuild time after crashes scales with log size, though checkpoints make this manageable.
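As a sketch of that rebuild property, assuming a simple event shape of asset_id plus post-state (an illustrative assumption, not AssetCore’s actual schema):

# Hypothetical rebuild: fold the whole commit log into a fresh in-memory view.
# The event shape ({"asset_id": ..., "state": ...}) is assumed for illustration.

def rebuild_projection(batches):
    view = {}                                         # asset_id -> latest post-state
    for batch in batches:
        for event in batch["events"]:
            view[event["asset_id"]] = event["state"]  # post-state: just set it
    return view

# Losing the projection is recoverable: rebuild it from the log at any time.
log = [
    {"seq": 1, "events": [{"asset_id": "a1", "state": {"status": "active"}}]},
    {"seq": 2, "events": [{"asset_id": "a1", "state": {"status": "retired"}}]},
]
assert rebuild_projection(log)["a1"]["status"] == "retired"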
Checkpoints
Checkpoints record progress through the commit log. They make recovery fast and allow you to reason about exactly how much state has been applied. Without checkpoints, every restart would require replaying the entire log from the beginning—feasible for small systems, catastrophic for production workloads with millions of events.
- Write daemon: Tracks last committed sequence (so it knows where to resume after a crash)
- Read daemon: Tracks last applied sequence (so it can resume replay without re-applying events)
Checkpoints enable:
- Fast startup: Resume from checkpoint, not beginning (replaying 1,000 events instead of 1,000,000)
- Crash recovery: Replay from checkpoint position forward (see Runtime Model)
- Freshness calculation: Compare world sequence watermarks (lag = commit_log_world_seq - world_seq)
What you gain: startup time measured in seconds not hours, predictable recovery windows, real-time freshness monitoring. What you give up: checkpoint overhead (periodic snapshot writes), though these are lightweight and configurable.
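A sketch of checkpoint persistence and the freshness math, assuming a single JSON file per daemon; the file layout and field name are assumptions, not AssetCore’s on-disk format:

import json
import os
import tempfile

def save_checkpoint(path: str, applied_seq: int) -> None:
    """Persist progress atomically: write a temp file, then rename over the old one."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"applied_seq": applied_seq}, f)
    os.replace(tmp, path)   # atomic rename: no partial checkpoint is ever visible

def load_checkpoint(path: str) -> int:
    """Resume from the last applied sequence, or 0 if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)["applied_seq"]
    except FileNotFoundError:
        return 0

def lag(commit_log_world_seq: int, world_seq: int) -> int:
    """How many committed batches the projection has not applied yet."""
    return commit_log_world_seq - world_seq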
Freshness Metadata
Read responses include freshness information; clients can pass the x-assetcore-min-world-seq header to block until the projection has applied at least that sequence. This is how you enforce read-your-writes and strict consistency at the client boundary. Without explicit freshness tracking, you’d have no way to tell whether a query returned stale or fresh data—debugging becomes guesswork.
{
"freshness": {
"namespace": 1,
"world_seq": 42,
"commit_log_world_seq": 45,
"lag": 3,
"lag_ms": 125
}
}
- world_seq: The latest sequence applied to the read projection (what queries see)
- commit_log_world_seq: The latest observed commit log sequence (what’s been committed)
- lag: The difference in commits (how many batches behind)
- lag_ms: The time difference in milliseconds (how old the data is)
The difference is the lag: how many commits haven’t been applied yet (and how long that gap is in milliseconds). This lag is normal and expected in event-sourced systems—it’s the cost of write/read separation. The critical part is that it’s measured and exposed, not hidden.
What you gain: visibility into staleness (know when data is fresh), ability to enforce consistency (block reads until fresh enough), debugging insight (correlate query issues with lag spikes). What you give up: slightly larger response payloads (freshness metadata in every read), but the operational value is worth it.
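As a client-side sketch of read-your-writes: after a write is acknowledged at some sequence, pass that sequence in the x-assetcore-min-world-seq header so the read blocks until the projection has caught up. The base URL and /assets path are assumptions for illustration; only the header and freshness fields come from the documentation above.

import requests

READ_BASE = "http://localhost:8081"   # hypothetical read daemon address

def read_at_least(min_world_seq: int) -> dict:
    """Block the read until the projection has applied at least min_world_seq."""
    resp = requests.get(
        f"{READ_BASE}/assets",         # illustrative path, not a documented endpoint
        headers={"x-assetcore-min-world-seq": str(min_world_seq)},
        timeout=5,
    )
    resp.raise_for_status()
    body = resp.json()
    assert body["freshness"]["world_seq"] >= min_world_seq
    return body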
Replay Process
Replay is the mechanism that keeps projections synchronized with the commit log. It’s not just a recovery feature—it’s the core operational loop for the read daemon. Understanding replay is understanding how AssetCore works.
When the read daemon tails the commit log:
- Fetch next batch after checkpoint (from commit log storage)
- Apply each event via L1 setters (see Runtime Model for layer details)
- Publish new snapshot (atomic swap, no partial updates visible)
- Update checkpoint (record progress)
Events are idempotent: applying the same event twice produces the same state. This makes replay safe to retry and is what enables deterministic recovery after partial failures. More importantly, it means replay isn’t fragile—if you’re unsure whether an event was applied, just apply it again.
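Putting the four steps together, a hypothetical tail loop might look like the following. It reuses the CommitLog sketch above; the class name, event shape, and in-memory checkpoint are illustrative, but the apply → publish → checkpoint ordering mirrors the invariants described later in this page.

class ReadDaemonSketch:
    """Illustrative read daemon: tails a log, applies events, swaps snapshots."""

    def __init__(self, log):
        self.log = log
        self.applied_seq = 0   # checkpoint kept in memory for the sketch
        self.snapshot = {}     # what queries see; replaced atomically, never mutated in place

    def tail_once(self):
        for batch in self.log.read_from(self.applied_seq):   # 1. fetch after checkpoint
            working = dict(self.snapshot)
            for event in batch.events:
                # 2. apply post-state; idempotent, so re-applying is harmless
                working[event["asset_id"]] = event["state"]
            self.snapshot = working                           # 3. publish: atomic reference swap
            self.applied_seq = batch.seq                      # 4. record progress last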
Why replay is a killer feature: It’s not just recovery from crashes. It’s time-travel debugging (replay to 3:47 PM Tuesday and inspect exact state), simulation (replay production events against experimental logic), analytics (replay log into different projection structure), and testing (verify behavior against real event streams). Systems without deterministic replay can’t do any of this reliably.
Recovery Flow
After a crash, AssetCore recovers automatically without manual intervention. This is one of the most operationally valuable properties of the architecture—no 3 AM pages to manually restore state.
After a crash:
- Load checkpoint from disk (last known good state snapshot)
- Fetch events from checkpoint position (only events not yet applied)
- Replay events to rebuild state (via L1 setters, fast and deterministic)
- Resume normal tail loop (back to steady-state operation)
Because events carry post-state (see Runtime Model), replay doesn’t need to re-execute business logic. It simply sets the final values, eliminating non-determinism during recovery. This is what makes recovery fast (no validation overhead) and reliable (no possibility of divergence).
What you gain: automatic recovery (no manual steps), predictable recovery time (scales with events since checkpoint), confidence that recovered state is byte-identical to pre-crash state. What you give up: recovery isn’t instant—it takes time proportional to checkpoint interval, but this is typically seconds not minutes.
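A hedged sketch of that recovery sequence, built on the hypothetical helpers above (load_checkpoint and ReadDaemonSketch); the real daemon differs in detail, but the shape is the same.

def recover(daemon, checkpoint_path: str) -> None:
    """Rebuild read state after a crash, then hand back to the normal tail loop."""
    daemon.applied_seq = load_checkpoint(checkpoint_path)   # 1. last known position
    daemon.tail_once()   # 2-3. fetch and replay only the missing events;
                         #      post-state means no business logic re-runs
    # 4. from here the daemon resumes its steady-state tail loop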
How it fits into the system
Write Path
Client → Write Daemon → Commit Log
↓
[checkpoint updated]
The write daemon updates its checkpoint after successful persistence.
Read Path
Commit Log → Read Daemon → Projections → Client
↓
[checkpoint updated after apply]
The read daemon publishes snapshots before updating its checkpoint. This ensures:
- Queries always see consistent state
- Crashes don’t lose applied but uncheckpointed work
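A small worked example of why this ordering is safe, assuming the same illustrative event shape as above: if the daemon crashes after publishing a batch but before checkpointing it, recovery replays that batch again, and the re-apply changes nothing.

# Published snapshot already includes batch 42; the checkpoint still says 41
# because the crash happened between publish and checkpoint.
snapshot = {"a1": {"status": "active"}}
applied_seq = 41

batch_42 = {"seq": 42, "events": [{"asset_id": "a1", "state": {"status": "active"}}]}

for event in batch_42["events"]:                  # replayed on recovery
    snapshot[event["asset_id"]] = event["state"]
applied_seq = batch_42["seq"]

assert snapshot == {"a1": {"status": "active"}}   # identical: the re-apply was a no-op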
Freshness Reporting
The read health endpoint reports lag for a namespace:
{
"status": "ready",
"freshness": {
"namespace": 1,
"world_seq": 98,
"commit_log_world_seq": 100,
"lag": 2,
"lag_ms": 250
}
}
Monitor this to detect:
- Normal operation (lag near 0)
- Temporary burst (lag spikes then recovers)
- Systematic issues (lag grows continuously)
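A hedged monitoring sketch along those lines; the URL, path, and thresholds are assumptions to adapt to your deployment, while the response fields match the example above.

import requests

HEALTH_URL = "http://localhost:8081/health"   # hypothetical read daemon health endpoint
LAG_MS_WARN = 1_000                           # illustrative thresholds
LAG_MS_CRIT = 10_000

def classify_freshness() -> str:
    body = requests.get(HEALTH_URL, timeout=5).json()
    lag_ms = body["freshness"]["lag_ms"]
    if lag_ms >= LAG_MS_CRIT:
        return "critical: lag keeps growing, investigate the read daemon"
    if lag_ms >= LAG_MS_WARN:
        return "warning: likely a temporary burst, watch for recovery"
    return "ok: lag near zero"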
Key invariants and guarantees
Eventual Consistency
Projections will eventually reflect all committed events:
- The read daemon continuously tails the log
- Lag may spike during bursts but recovers
- No committed event is permanently invisible
Publish-Before-Checkpoint
Snapshots are published before checkpoints are updated:
- Queries see data that is guaranteed applied
- Crashes don’t cause data to “disappear”
- Recovery replays from last safe point
Deterministic Replay
Replay produces identical state:
- Events carry post-state values
- No business logic during replay
- Same events → same state
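Expressed as a quick property check, using the illustrative rebuild_projection helper from the Projections section:

log = [
    {"seq": 1, "events": [{"asset_id": "a1", "state": {"status": "active"}}]},
    {"seq": 2, "events": [{"asset_id": "a2", "state": {"status": "active"}}]},
]
# Same events in, same state out, no matter how many times they are replayed.
assert rebuild_projection(log) == rebuild_projection(log)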
Checkpoint Safety
Checkpoints are persisted atomically:
- No partial checkpoint writes
- Recovery always finds valid checkpoint
- Progress is never lost across restarts
See also
- Runtime Model - How replay fits in the architecture
- Health and Metrics - Monitoring freshness
- HTTP API - Freshness in API responses