Arxi Docs

Proof recording and tamper-evident evidence documentation.


Arxi System-Test and Validation Architecture

Audience: Engineers maintaining release confidence, deterministic validation, and audit-ready test artifacts.


Table of Contents

  1. Executive Overview
  2. System-Test Contract
  3. Registry-Driven Inventory
  4. Coverage Matrix and Gap Model
  5. Artifact and Transcript Contract
  6. Execution Tooling
  7. Suite Structure
  8. File-by-File Cross Reference

Executive Overview

Arxi system-tests validate behavior through real production boundaries (CLI and adapter ingest), emit structured artifacts for every test, and track coverage in two TOML files: a test registry and a gap list. The harness is designed to keep runs deterministic and inspectable.

F:system-tests/README.md L12-L39 F:system-tests/AGENTS.md L12-L35


System-Test Contract

System-tests require:

  • fail-closed assertions,
  • no sleep-based correctness,
  • production command and data surfaces,
  • mandatory per-test artifact emission.

F:system-tests/AGENTS.md L14-L37
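The ban on sleep-based correctness pairs naturally with fail-closed assertions. The sketch below shows the pattern, assuming the harness can poll a readiness predicate: wait on an explicit condition with a deadline, and treat a timeout as failure. The short pause between polls is only backoff; success comes from observing the condition. `wait_until` and `FailClosedError` are hypothetical names, and the real suites are Rust, so their structure may differ.

```python
import time
from typing import Callable, Optional


class FailClosedError(AssertionError):
    """Raised when a condition cannot be positively confirmed."""


def wait_until(predicate: Callable[[], object], timeout_s: float, poll_s: float = 0.05) -> None:
    """Poll `predicate` until it returns exactly True, or fail once the deadline passes."""
    deadline = time.monotonic() + timeout_s
    last_error: Optional[BaseException] = None
    while True:
        try:
            # Anything other than an explicit True (e.g. a truthy string) is not success.
            if predicate() is True:
                return
        except Exception as exc:  # e.g. connection refused while a process is starting
            last_error = exc
        if time.monotonic() >= deadline:
            raise FailClosedError(
                f"condition not met within {timeout_s}s; last error: {last_error!r}"
            )
        time.sleep(poll_s)  # backoff only; never the reason a check passes
```

A timed-out or erroring probe surfaces as a test failure with the last error attached, rather than silently passing after a fixed delay.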

System-tests are feature-gated, so they run only when explicitly enabled, both in CI and in local runs. F:system-tests/README.md L34-L45


Registry-Driven Inventory

system-tests/test_registry.toml is authoritative for:

  • categories,
  • per-test metadata,
  • command entrypoints,
  • required artifacts,
  • estimated runtime.

F:system-tests/test_registry.toml L5-L14 F:system-tests/test_registry.toml L15-L480


Coverage Matrix and Gap Model

system-tests/TEST_MATRIX.md defines P0/P1/P2 objective coverage snapshots. F:system-tests/TEST_MATRIX.md L12-L42

system-tests/test_gaps.toml tracks open/closed gaps with explicit acceptance criteria and category/priority mapping. F:system-tests/test_gaps.toml L4-L74 F:system-tests/test_gaps.toml L75-L190

Status as of 2026-02-07:

  • All tracked P1 core gaps are closed: composite selector dedup, partial-segment proof anchors, cross-segment mismatch fail-closed, attachment closure fail-closed, and SQLite/in-memory parity. P2 stress/perf coverage remains intentionally open.
  • OSS Launch 0 security findings are system-test gated for CLI boundary limits/policies, manifest structural tamper paths, SQLite corruption materialization, and single-open-segment enforcement.
  • Sidecar HTTP lifecycle and restart-idempotency persistence are system-test gated with real sidecar subprocess and TCP transport workflows.

Status as of 2026-02-08:

  • The CLI OSS world-class expansion follow-up is fully mirrored in system-tests: recorder-id shape validation parity, attachment-recording hostile-input fail-closed checks, auto-seal duration/combined lifecycle lanes, query JSON pagination and over-limit guardrails, and Decision Gate CLI ingest-fixture strict-fail/success command paths.
  • Sidecar container packaging is system-test gated via the sidecar_docker suite: asset hardening checks plus a Docker Compose e2e lane with an explicit skip/fail policy controlled by ARXI_REQUIRE_DOCKER.
  • The Docker Compose lane also validates containerized startup/readiness probe behavior (/startup, /ready) before and after segment-open lifecycle transitions.


Artifact and Transcript Contract

Each test run emits at minimum:

  • summary.json,
  • summary.md,
  • tool_transcript.json.
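A fail-closed check for this minimum set might look like the sketch below: a run only counts as reported if all three files exist and are non-empty, and the JSON artifacts parse. `check_run_artifacts` is a hypothetical name; the real enforcement lives in the Rust helpers.

```python
import json
from pathlib import Path

REQUIRED = ("summary.json", "summary.md", "tool_transcript.json")


def check_run_artifacts(run_root: Path) -> None:
    """Raise AssertionError unless every mandatory artifact is present and well-formed."""
    for name in REQUIRED:
        path = run_root / name
        if not path.is_file() or path.stat().st_size == 0:
            raise AssertionError(f"missing or empty artifact: {path}")
        if path.suffix == ".json":
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError as exc:
                raise AssertionError(f"unparseable artifact {path}: {exc}") from exc
```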

TestReporter and TestArtifacts create deterministic run roots, enforce the run-root reuse policy, and produce standardized summary documents.

F:system-tests/tests/helpers/artifacts.rs L65-L131 F:system-tests/tests/helpers/artifacts.rs L133-L214 F:system-tests/tests/helpers/cli.rs L19-L107
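TestReporter and TestArtifacts are Rust helpers; the following Python sketch only mirrors the described behavior so the contract is concrete. It assumes the run root is derived from the test name alone (no timestamps), that reusing a populated root is refused unless explicitly allowed, and that JSON artifacts are written with sorted keys so identical runs produce identical bytes. All names and JSON fields are hypothetical.

```python
import json
from pathlib import Path


class RunRootReuseError(RuntimeError):
    """Raised when a populated run root would be silently reused."""


def prepare_run_root(base: Path, test_name: str, allow_reuse: bool = False) -> Path:
    """Derive a deterministic run root and refuse to mix artifacts from separate runs."""
    root = base / test_name
    if root.exists() and any(root.iterdir()) and not allow_reuse:
        raise RunRootReuseError(f"run root already populated: {root}")
    root.mkdir(parents=True, exist_ok=True)
    return root


def write_summaries(root: Path, test_name: str, passed: bool, transcript: list[dict]) -> None:
    """Emit the three mandatory artifacts with byte-stable JSON."""
    status = "pass" if passed else "fail"
    summary = {"test": test_name, "status": status, "transcript_entries": len(transcript)}
    # sort_keys plus a fixed indent keeps the output byte-identical across runs.
    (root / "summary.json").write_text(json.dumps(summary, sort_keys=True, indent=2) + "\n")
    (root / "summary.md").write_text(f"# {test_name}\n\nStatus: {status}\n")
    (root / "tool_transcript.json").write_text(
        json.dumps(transcript, sort_keys=True, indent=2) + "\n"
    )
```

Refusing reuse by default means a stale artifact from an earlier run can never be mistaken for output of the current one.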


Execution Tooling

Python helpers:

  • test_runner.py: registry-based execution with optional parallelism, per-test artifact roots, and manifest generation.
  • coverage_report.py: generates coverage documentation from the registry and gap files.
  • gap_tracker.py: lists/shows/closes gaps and generates implementation prompts.

F:scripts/system_tests/test_runner.py L64-L112 F:scripts/system_tests/test_runner.py L119-L199 F:scripts/system_tests/coverage_report.py L43-L101 F:scripts/system_tests/gap_tracker.py L92-L140
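The core loop of a registry-driven runner can be sketched as follows. This is not test_runner.py's code; it assumes each registry entry supplies a `name` and an `argv` list, gives every test a fresh artifact directory, runs tests in a thread pool when `jobs > 1`, and writes a manifest sorted by test name so parallel completion order does not change it. The `SYSTEM_TEST_RUN_ROOT` environment variable is a hypothetical hand-off of the run root to the test process.

```python
import json
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_one(entry: dict, artifacts_base: Path) -> dict:
    """Run one registry entry in its own artifact directory and capture the outcome."""
    run_root = artifacts_base / entry["name"]
    run_root.mkdir(parents=True, exist_ok=False)  # a fresh root per test, never shared
    # SYSTEM_TEST_RUN_ROOT is a hypothetical variable name, used only for illustration.
    env = {**os.environ, "SYSTEM_TEST_RUN_ROOT": str(run_root)}
    proc = subprocess.run(entry["argv"], env=env, capture_output=True, text=True)
    return {
        "name": entry["name"],
        "exit_code": proc.returncode,
        "run_root": str(run_root.relative_to(artifacts_base)),  # relative keeps the manifest portable
    }


def run_registry(entries: list[dict], artifacts_base: Path, jobs: int = 1) -> dict:
    """Execute entries, optionally in parallel, and write a name-sorted manifest."""
    artifacts_base.mkdir(parents=True, exist_ok=True)
    if jobs > 1:
        with ThreadPoolExecutor(max_workers=jobs) as pool:
            results = list(pool.map(lambda e: run_one(e, artifacts_base), entries))
    else:
        results = [run_one(e, artifacts_base) for e in entries]
    results.sort(key=lambda r: r["name"])
    manifest = {"passed": all(r["exit_code"] == 0 for r in results), "results": results}
    (artifacts_base / "manifest.json").write_text(
        json.dumps(manifest, sort_keys=True, indent=2) + "\n"
    )
    return manifest
```

Threads suffice here because each worker mostly waits on a child process; sorting before serialization is what keeps the manifest deterministic under parallelism.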


Suite Structure

Suite modules cover:

  • smoke: CLI startup and help/version checks,
  • bundle: build/verify/inspect and tamper detection,
  • persistence: restart, determinism, and SQLite/in-memory parity checks,
  • operations: query ordering and cursors, JSON pagination and over-limit guardrails, and parity checks for recorder-id and auto-seal config validation,
  • security: bounded CLI input surfaces, malformed-identifier rejection, secure signer-file policy, signer-rotation recovery/corruption behavior, contract path safety, hostile bundle parse-boundary checks, and hostile record-with-attachments boundary checks,
  • recorder: lifecycle plus auto-seal count/duration/combined behavior and attachment-recording persistence checks over the real CLI boundary,
  • sidecar: real sidecar process lifecycle over HTTP (record/seal/build/verify) and restart-boundary idempotency replay/conflict persistence checks,
  • sidecar_docker: Dockerfile/Compose/config hardening checks and Docker Compose build/up/down with containerized sidecar startup/readiness probe checks plus open/record/query workflow,
  • integration_openclaw: fixture-driven OpenClaw gateway/CLI ingest, signed/unsigned verification lanes, sequence-gap policy checks, sensitive field redaction, and bounded payload handling checks,
  • integration_decision_gate: fixture-driven Decision Gate MCP runpack flow ingest through the production arxi-decision-gate-adapter crate, signed/unsigned verification lanes, runpack-integrity strict-vs-anomaly policy checks (including manifest self-integrity recomputation), sensitive transcript-field redaction, bounded transcript payload handling checks, CLI decision-gate ingest-fixture command-path validation, and a fixture conformance gate that enforces canonical Decision Gate tool request/response shapes (including export-vs-verify checked_files semantics).

F:system-tests/tests/suites/smoke.rs L15-L43 F:system-tests/tests/suites/recorder.rs L20-L678 F:system-tests/tests/suites/bundle.rs L64-L684 F:system-tests/tests/suites/persistence.rs L24-L468 F:system-tests/tests/suites/operations.rs L23-L570 F:system-tests/tests/suites/security.rs L19-L1024 F:system-tests/tests/suites/sidecar.rs F:system-tests/tests/suites/sidecar_docker.rs F:system-tests/tests/suites/integration_openclaw.rs L1-L200 F:system-tests/tests/suites/integration_decision_gate.rs L1-L1161 F:Docs/architecture/arxi_openclaw_integration_architecture.md L1-L160 F:Docs/architecture/arxi_decision_gate_integration_architecture.md L1-L170
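The sidecar suite's restart-boundary idempotency checks can be pictured with a toy model. This sketch assumes the common idempotency-key contract: replaying a key with the same request body returns the stored response without a second side effect, reusing a key with a different body is a conflict, and the key table survives a process restart. The JSON-file store, class names, and response shape are hypothetical stand-ins for the real sidecar's persistence and HTTP responses.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class IdempotencyConflict(Exception):
    """A key was reused with a different request body."""


class IdempotencyStore:
    """Toy persistent key table: key -> request digest and stored response."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Reload on construction so a "restarted" store sees keys from earlier runs.
        self.table = json.loads(path.read_text()) if path.exists() else {}

    def handle(self, key: str, body: dict, compute: Callable[[dict], dict]) -> dict:
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if key in self.table:
            stored = self.table[key]
            if stored["digest"] != digest:
                raise IdempotencyConflict(f"key {key!r} reused with a different body")
            return stored["response"]  # replay: the side effect is not repeated
        response = compute(body)
        self.table[key] = {"digest": digest, "response": response}
        self.path.write_text(json.dumps(self.table, sort_keys=True))  # persist before returning
        return response
```

A restart test then constructs a second store over the same file and asserts that replay returns the original response and that a mismatched body is rejected.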


File-by-File Cross Reference

Area | File | Notes
Contract and standards | system-tests/AGENTS.md | Behavioral and artifact requirements for system-tests.
Execution overview | system-tests/README.md | How to run and extend suites.
Coverage snapshot | system-tests/TEST_MATRIX.md | P0/P1/P2 matrix.
Test registry | system-tests/test_registry.toml | Authoritative inventory and run commands.
Gap tracker data | system-tests/test_gaps.toml | Coverage gaps and acceptance criteria.
Artifact helper | system-tests/tests/helpers/artifacts.rs | Run-root and summary generation contract.
CLI helper | system-tests/tests/helpers/cli.rs | Real CLI command execution and transcript capture.
Sidecar helper | system-tests/tests/helpers/sidecar.rs | Real sidecar process start/stop and HTTP transcript capture.
Docker helper | system-tests/tests/helpers/docker.rs | Docker daemon/compose probes and command helpers for containerized lanes.
Sidecar suite | system-tests/tests/suites/sidecar.rs | Sidecar HTTP lifecycle and restart-idempotency validation.
Sidecar Docker suite | system-tests/tests/suites/sidecar_docker.rs | Sidecar container packaging hardening and Docker Compose workflow validation, including startup/readiness probes.
OpenClaw integration suite | system-tests/tests/suites/integration_openclaw.rs | Fixture-driven adapter ingest validation for gateway and CLI mock flows.
OpenClaw fixtures | system-tests/tests/fixtures/openclaw_gateway_mock_events.json | Gateway mock flow event fixture aligned to the OpenClaw event schema.
OpenClaw fixtures | system-tests/tests/fixtures/openclaw_cli_mock_events.json | CLI fallback-style flow event fixture aligned to the OpenClaw event schema.
OpenClaw integration architecture | Docs/architecture/arxi_openclaw_integration_architecture.md | Versioned mapping, redaction, and bounded payload policy contract.
Decision Gate production adapter | crates/arxi-decision-gate-adapter/src/adapter.rs | Canonical Decision Gate-to-Arxi mapping implementation exercised by system-tests.
Decision Gate integration suite | system-tests/tests/suites/integration_decision_gate.rs | Fixture-driven MCP runpack flow validation for control-plane coupling.
Decision Gate fixture | system-tests/tests/fixtures/decision_gate_runpack_mock_flow.json | Mock runpack MCP flow fixture aligned to the Decision Gate transcript and runpack manifest layout.
Decision Gate integration architecture | Docs/architecture/arxi_decision_gate_integration_architecture.md | Versioned MCP flow mapping, runpack integrity policy, and transcript redaction/bounds contract.
Env parsing | system-tests/src/config/env.rs | Strict environment parsing for test config.
Runner script | scripts/system_tests/test_runner.py | Registry-driven execution engine.
Coverage docs generator | scripts/system_tests/coverage_report.py | Generated testing docs pipeline.
Gap management script | scripts/system_tests/gap_tracker.py | Gap lifecycle tooling.