Arxi System-Test and Validation Architecture
Audience: Engineers maintaining release confidence, deterministic validation, and audit-ready test artifacts.
Table of Contents
- Executive Overview
- System-Test Contract
- Registry-Driven Inventory
- Coverage Matrix and Gap Model
- Artifact and Transcript Contract
- Execution Tooling
- Suite Structure
- File-by-File Cross Reference
Executive Overview
Arxi system-tests validate behavior through real production boundaries (CLI and adapter ingest), emit structured artifacts, and track coverage using registry + gaps TOML files. The harness is designed to keep runs deterministic and inspectable.
F:system-tests/README.md L12-L39 F:system-tests/AGENTS.md L12-L35
System-Test Contract
System-tests require:
- fail-closed assertions,
- no sleep-based correctness,
- production command and data surfaces,
- mandatory per-test artifact emission.
F:system-tests/AGENTS.md L14-L37
Feature gating keeps system-tests explicit in CI and local runs. F:system-tests/README.md L34-L45
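One way to read the fail-closed and no-sleep rules together is a deadline-bounded wait: the test passes only when an observed condition holds, and raises when the deadline expires. The sketch below is illustrative only. `wait_until` and its parameters are hypothetical names, not helpers from the Arxi harness.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float = 5.0,
    interval_s: float = 0.05,
) -> None:
    """Poll `condition` until it returns True; raise when the deadline passes.

    Hypothetical helper: the short sleep only paces the polling loop.
    The pass/fail outcome is decided by the observed condition, never by
    the elapsed time alone.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            # Fail closed: an unmet condition is an error, not a silent pass.
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(interval_s)
```

A caller would pass a probe such as "the expected file exists" or "the process answered a health request" as `condition`.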
Registry-Driven Inventory
system-tests/test_registry.toml is authoritative for:
- categories,
- per-test metadata,
- command entrypoints,
- required artifacts,
- estimated runtime.
F:system-tests/test_registry.toml L5-L14 F:system-tests/test_registry.toml L15-L480
Coverage Matrix and Gap Model
system-tests/TEST_MATRIX.md defines P0/P1/P2 objective coverage snapshots.
F:system-tests/TEST_MATRIX.md L12-L42
system-tests/test_gaps.toml tracks open/closed gaps with explicit acceptance
criteria and category/priority mapping.
F:system-tests/test_gaps.toml L4-L74
F:system-tests/test_gaps.toml L75-L190
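The open/closed model lends itself to a simple gate: list the open gaps for a priority and fail if any remain. The sketch below assumes a hypothetical in-memory shape for a gap record (`gap_id`, `priority`, `status`, `acceptance_criteria`) and invented sample entries; none of these names or values are taken from test_gaps.toml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gap:
    """Hypothetical gap record; field names are illustrative only."""
    gap_id: str
    priority: str  # "P0", "P1", or "P2"
    status: str  # "open" or "closed"
    acceptance_criteria: tuple[str, ...]


# Invented sample data for the example.
SAMPLE_GAPS = [
    Gap("example-core-1", "P1", "closed", ("criterion A",)),
    Gap("example-stress-1", "P2", "open", ("criterion B",)),
]


def open_gaps(gaps: list[Gap], priority: str) -> list[str]:
    """Return the sorted ids of open gaps at the given priority."""
    return sorted(g.gap_id for g in gaps if g.priority == priority and g.status == "open")


def assert_priority_closed(gaps: list[Gap], priority: str) -> None:
    """Raise if any gap at `priority` is still open."""
    remaining = open_gaps(gaps, priority)
    if remaining:
        raise AssertionError(f"open {priority} gaps: {remaining}")
```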
As of 2026-02-07, all tracked P1 core gaps are closed (composite selector
dedup, partial-segment proof anchors, cross-segment mismatch fail-closed,
attachment closure fail-closed, SQLite/in-memory parity). P2 stress/perf
coverage remains intentionally open.
As of 2026-02-07, OSS Launch 0 security findings are system-test gated for
CLI boundary limits/policies, manifest structural tamper paths, SQLite
corruption materialization, and single-open-segment enforcement.
As of 2026-02-07, sidecar HTTP lifecycle and restart idempotency persistence
are system-test gated with real sidecar subprocess + TCP transport workflows.
As of 2026-02-08, the CLI OSS world-class expansion follow-up is fully mirrored
in system-tests: recorder-id shape validation parity, attachment-recording
hostile-input fail-closed checks, auto-seal duration/combined lifecycle lanes,
query JSON pagination + over-limit guardrails, and Decision Gate CLI
ingest-fixture strict-fail/success command paths.
As of 2026-02-08, sidecar container packaging is also system-test gated via the
sidecar_docker suite (asset hardening checks + Docker Compose e2e lane with
explicit skip/fail policy via ARXI_REQUIRE_DOCKER).
As of 2026-02-08, the Docker Compose lane additionally validates containerized
startup/readiness probe behavior (/startup, /ready) before and after
segment-open lifecycle transitions.
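An explicit skip/fail policy of this kind can be modelled as a three-way decision. The sketch below is an assumption about the semantics, not the suite's implementation: it treats `ARXI_REQUIRE_DOCKER=1` as "Docker is mandatory" and any other value, or an unset variable, as "skip when Docker is absent". The function and enum names are hypothetical.

```python
import os
import shutil
from enum import Enum


class DockerLane(Enum):
    RUN = "run"
    SKIP = "skip"
    FAIL = "fail"


def decide_docker_lane(env: dict[str, str], docker_available: bool) -> DockerLane:
    """Decide whether the Docker lane runs, is skipped, or fails.

    Assumption for this sketch: only the exact value "1" marks Docker as
    required. The accepted values in the real harness may differ.
    """
    if docker_available:
        return DockerLane.RUN
    required = env.get("ARXI_REQUIRE_DOCKER", "") == "1"
    # A missing Docker binary is an error when required, an explicit skip otherwise.
    return DockerLane.FAIL if required else DockerLane.SKIP


def decide_from_host() -> DockerLane:
    """Apply the policy to the current process environment and PATH."""
    return decide_docker_lane(dict(os.environ), shutil.which("docker") is not None)
```

Keeping the decision a pure function of its inputs makes the policy itself testable without a Docker daemon.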
Artifact and Transcript Contract
Each test run emits at minimum:
- summary.json,
- summary.md,
- tool_transcript.json.
TestReporter and TestArtifacts create deterministic run roots, enforce
run-root reuse policy, and produce standardized summary documents.
F:system-tests/tests/helpers/artifacts.rs L65-L131 F:system-tests/tests/helpers/artifacts.rs L133-L214 F:system-tests/tests/helpers/cli.rs L19-L107
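The minimum artifact set can be checked mechanically after a run. The sketch below is an independent Python illustration rather than the Rust TestArtifacts helper; `missing_artifacts` is a hypothetical function, and only the three file names come from the contract above.

```python
from pathlib import Path

# File names taken from the artifact contract; everything else is illustrative.
REQUIRED_ARTIFACTS = ("summary.json", "summary.md", "tool_transcript.json")


def missing_artifacts(run_root: Path) -> list[str]:
    """Return the required artifact names that are absent from `run_root`."""
    return [name for name in REQUIRED_ARTIFACTS if not (run_root / name).is_file()]
```

An empty result means the run root satisfies the minimum contract; a non-empty result names exactly which files a test failed to emit.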
Execution Tooling
Python helpers:
- test_runner.py: registry-based execution with optional parallelism, per-test artifact roots, and manifest generation.
- coverage_report.py: generates docs from registry + gaps.
- gap_tracker.py: lists/shows/closes gaps and generates implementation prompts.
F:scripts/system_tests/test_runner.py L64-L112 F:scripts/system_tests/test_runner.py L119-L199 F:scripts/system_tests/coverage_report.py L43-L101 F:scripts/system_tests/gap_tracker.py L92-L140
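The runner's general shape (one artifact root per test, optional parallelism, a manifest at the end) can be sketched as follows. This is not the code in test_runner.py: the function names, the `stdout.txt` capture file, the `manifest.json` layout, and the refusal to reuse an existing root are all assumptions made for illustration.

```python
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_one(name: str, command: list[str], base: Path) -> tuple[str, int]:
    """Run one command inside its own artifact root and return its exit code."""
    root = base / name
    # Assumption: an existing root is an error, so stale artifacts are never reused.
    root.mkdir(parents=True, exist_ok=False)
    proc = subprocess.run(command, cwd=root, capture_output=True, text=True)
    (root / "stdout.txt").write_text(proc.stdout)
    return name, proc.returncode


def run_all(tests: dict[str, list[str]], base: Path, jobs: int = 1) -> dict[str, int]:
    """Run all tests with up to `jobs` workers and write a sorted manifest."""
    base.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max(1, jobs)) as pool:
        # Submit in sorted order so scheduling does not depend on dict order.
        futures = [pool.submit(run_one, n, c, base) for n, c in sorted(tests.items())]
        results = dict(f.result() for f in futures)
    (base / "manifest.json").write_text(json.dumps(results, indent=2, sort_keys=True))
    return results
```

For example, `run_all({"hello": [sys.executable, "-c", "print('hi')"]}, Path("run-root"), jobs=2)` creates `run-root/hello/stdout.txt` and `run-root/manifest.json`. Writing the manifest with sorted keys keeps its content independent of completion order.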
Suite Structure
Suite modules cover:
- smoke: CLI startup and help/version checks,
- bundle: build/verify/inspect and tamper detection,
- persistence: restart, determinism, and SQLite/in-memory parity checks,
- operations: query ordering/cursor plus JSON pagination/limit guardrails and recorder-id + auto-seal config validation parity checks,
- security: bounded CLI input surfaces, malformed-identifier rejection, secure signer-file policy, signer-rotation recovery/corruption behavior, contract path safety, hostile bundle parse-boundary checks, and hostile record-with-attachments boundary checks,
- recorder: lifecycle plus auto-seal count/duration/combined behavior and attachment-recording persistence checks over the real CLI boundary,
- sidecar: real sidecar process lifecycle over HTTP (record/seal/build/verify) and restart-boundary idempotency replay/conflict persistence checks,
- sidecar_docker: Dockerfile/Compose/config hardening checks and Docker Compose build/up/down with containerized sidecar startup/readiness probe checks plus open/record/query workflow,
- integration_openclaw: fixture-driven OpenClaw gateway/CLI ingest, signed/unsigned verification lanes, sequence-gap policy checks, sensitive field redaction, and bounded payload handling checks,
- integration_decision_gate: fixture-driven Decision Gate MCP runpack flow ingest through the production arxi-decision-gate-adapter crate, signed/unsigned verification lanes, runpack-integrity strict-vs-anomaly policy checks (including manifest self-integrity recomputation), sensitive transcript-field redaction, bounded transcript payload handling checks, CLI decision-gate ingest-fixture command-path validation, and a fixture conformance gate that enforces canonical Decision Gate tool request/response shapes (including export-vs-verify checked_files semantics).
F:system-tests/tests/suites/smoke.rs L15-L43 F:system-tests/tests/suites/recorder.rs L20-L678 F:system-tests/tests/suites/bundle.rs L64-L684 F:system-tests/tests/suites/persistence.rs L24-L468 F:system-tests/tests/suites/operations.rs L23-L570 F:system-tests/tests/suites/security.rs L19-L1024 F:system-tests/tests/suites/sidecar.rs F:system-tests/tests/suites/sidecar_docker.rs F:system-tests/tests/suites/integration_openclaw.rs L1-L200 F:system-tests/tests/suites/integration_decision_gate.rs L1-L1161 F:Docs/architecture/arxi_openclaw_integration_architecture.md L1-L160 F:Docs/architecture/arxi_decision_gate_integration_architecture.md L1-L170
File-by-File Cross Reference
| Area | File | Notes |
|---|---|---|
| Contract and standards | system-tests/AGENTS.md | Behavioral and artifact requirements for system-tests. |
| Execution overview | system-tests/README.md | How to run and extend suites. |
| Coverage snapshot | system-tests/TEST_MATRIX.md | P0/P1/P2 matrix. |
| Test registry | system-tests/test_registry.toml | Authoritative inventory and run commands. |
| Gap tracker data | system-tests/test_gaps.toml | Coverage gaps and acceptance criteria. |
| Artifact helper | system-tests/tests/helpers/artifacts.rs | Run-root and summary generation contract. |
| CLI helper | system-tests/tests/helpers/cli.rs | Real CLI command execution and transcript capture. |
| Sidecar helper | system-tests/tests/helpers/sidecar.rs | Real sidecar process start/stop and HTTP transcript capture. |
| Docker helper | system-tests/tests/helpers/docker.rs | Docker daemon/compose probes and command helpers for containerized lanes. |
| Sidecar suite | system-tests/tests/suites/sidecar.rs | Sidecar HTTP lifecycle and restart-idempotency validation. |
| Sidecar Docker suite | system-tests/tests/suites/sidecar_docker.rs | Sidecar container packaging hardening and Docker Compose workflow validation, including startup/readiness probes. |
| OpenClaw integration suite | system-tests/tests/suites/integration_openclaw.rs | Fixture-driven adapter ingest validation for gateway + CLI mock flows. |
| OpenClaw fixtures | system-tests/tests/fixtures/openclaw_gateway_mock_events.json | Gateway mock flow event fixture aligned to OpenClaw event schema. |
| OpenClaw fixtures | system-tests/tests/fixtures/openclaw_cli_mock_events.json | CLI fallback-style flow event fixture aligned to OpenClaw event schema. |
| OpenClaw integration architecture | Docs/architecture/arxi_openclaw_integration_architecture.md | Versioned mapping, redaction, and bounded payload policy contract. |
| Decision Gate production adapter | crates/arxi-decision-gate-adapter/src/adapter.rs | Canonical Decision Gate-to-Arxi mapping implementation exercised by system-tests. |
| Decision Gate integration suite | system-tests/tests/suites/integration_decision_gate.rs | Fixture-driven MCP runpack flow validation for control-plane coupling. |
| Decision Gate fixture | system-tests/tests/fixtures/decision_gate_runpack_mock_flow.json | Mock runpack MCP flow fixture aligned to Decision Gate transcript and runpack manifest layout. |
| Decision Gate integration architecture | Docs/architecture/arxi_decision_gate_integration_architecture.md | Versioned MCP flow mapping, runpack integrity policy, and transcript redaction/bounds contract. |
| Env parsing | system-tests/src/config/env.rs | Strict environment parsing for test config. |
| Runner script | scripts/system_tests/test_runner.py | Registry-driven execution engine. |
| Coverage docs generator | scripts/system_tests/coverage_report.py | Generated testing docs pipeline. |
| Gap management script | scripts/system_tests/gap_tracker.py | Gap lifecycle tooling. |