Arxi System-Test and Validation Architecture

Audience: Engineers maintaining release confidence, deterministic validation, and audit-ready test artifacts.

Executive Overview
System-Test Contract
Registry-Driven Inventory
Coverage Matrix and Gap Model
Artifact and Transcript Contract
Execution Tooling
Suite Structure
File-by-File Cross Reference

Executive Overview

Arxi system-tests validate behavior through real production boundaries (CLI and adapter ingest), emit structured artifacts, and track coverage through the registry and gaps TOML files. The harness is designed to keep runs deterministic and inspectable.

F:system-tests/README.md L12-L39 F:system-tests/AGENTS.md L12-L35

System-Test Contract

System-tests require:

fail-closed assertions,
no sleep-based correctness,
production command and data surfaces,
mandatory per-test artifact emission.

F:system-tests/AGENTS.md L14-L37
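The "fail-closed" and "no sleep-based correctness" rules can be illustrated with a bounded polling helper. This is a minimal sketch, not the harness's own helper: `wait_until` and its parameters are hypothetical names introduced here. The sleep only paces the loop; the result depends on an observed condition, and a missed deadline fails the test rather than letting it pass.

```python
import time
from typing import Callable


def wait_until(probe: Callable[[], bool], timeout_s: float, interval_s: float = 0.05) -> None:
    """Poll `probe` until it returns True; raise if the deadline passes.

    Hypothetical helper: correctness comes from the observed condition,
    never from having slept "long enough".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return
        if time.monotonic() >= deadline:
            # Fail closed: an unmet condition is a test failure, not a pass.
            raise AssertionError(f"condition not met within {timeout_s}s")
        time.sleep(interval_s)
```

A test would pass a probe that inspects real state (for example, whether an expected file exists) instead of sleeping for a fixed interval and assuming the work has finished.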

System-tests are feature-gated, so they run only when explicitly enabled, both in CI and in local runs. F:system-tests/README.md L34-L45

Registry-Driven Inventory

system-tests/test_registry.toml is authoritative for:

categories,
per-test metadata,
command entrypoints,
required artifacts,
estimated runtime.

F:system-tests/test_registry.toml L5-L14 F:system-tests/test_registry.toml L15-L480

Coverage Matrix and Gap Model

system-tests/TEST_MATRIX.md defines P0/P1/P2 objective coverage snapshots. F:system-tests/TEST_MATRIX.md L12-L42

system-tests/test_gaps.toml tracks open/closed gaps with explicit acceptance criteria and category/priority mapping. F:system-tests/test_gaps.toml L4-L74 F:system-tests/test_gaps.toml L75-L190
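The gap model described above can be pictured as a small record type plus a filter. This is an illustrative sketch only: the `Gap` class, its field names, and the status values are assumptions and are not taken from `test_gaps.toml`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gap:
    """Hypothetical in-memory view of one tracked coverage gap."""

    gap_id: str
    priority: str  # assumed values: "P0", "P1", "P2"
    category: str
    status: str  # assumed values: "open", "closed"
    acceptance_criteria: tuple[str, ...] = ()


def open_gaps(gaps: list[Gap], priority: str) -> list[Gap]:
    """Return open gaps at `priority`; reject unknown statuses outright."""
    for gap in gaps:
        if gap.status not in ("open", "closed"):
            raise ValueError(f"{gap.gap_id}: unknown status {gap.status!r}")
    return [g for g in gaps if g.status == "open" and g.priority == priority]
```

With such a model, a statement like "all tracked P1 gaps are closed" reduces to checking that `open_gaps(gaps, "P1")` is empty.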

As of 2026-02-07, all tracked P1 core gaps are closed (composite selector dedup, partial-segment proof anchors, cross-segment mismatch fail-closed, attachment closure fail-closed, SQLite/in-memory parity). P2 stress/perf coverage remains intentionally open.

As of 2026-02-07, OSS Launch 0 security findings are system-test gated for CLI boundary limits/policies, manifest structural tamper paths, SQLite corruption materialization, and single-open-segment enforcement.

As of 2026-02-07, sidecar HTTP lifecycle and restart idempotency persistence are system-test gated with real sidecar subprocess + TCP transport workflows.

As of 2026-02-08, the CLI OSS world-class expansion follow-up is fully mirrored in system-tests: recorder-id shape validation parity, attachment-recording hostile-input fail-closed checks, auto-seal duration/combined lifecycle lanes, query JSON pagination + over-limit guardrails, and Decision Gate CLI ingest-fixture strict-fail/success command paths.

As of 2026-02-08, sidecar container packaging is also system-test gated via the sidecar_docker suite (asset hardening checks + Docker Compose e2e lane with explicit skip/fail policy via ARXI_REQUIRE_DOCKER).

As of 2026-02-08, the Docker Compose lane additionally validates containerized startup/readiness probe behavior (/startup, /ready) before and after segment-open lifecycle transitions.
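An explicit skip/fail policy driven by ARXI_REQUIRE_DOCKER could look roughly like the sketch below. Only the variable name comes from this document. The accepted values (`1`/`true`, `0`/`false`), the default when the variable is unset, and the `docker_lane_action` function are assumptions made for illustration.

```python
def docker_lane_action(docker_available: bool, env: dict[str, str]) -> str:
    """Decide whether the Docker lane should "run", "skip", or "fail".

    Assumed semantics: when Docker is unavailable, the lane fails if
    ARXI_REQUIRE_DOCKER is truthy and is skipped otherwise. Unrecognized
    values are rejected rather than silently treated as false.
    """
    raw = env.get("ARXI_REQUIRE_DOCKER")
    if raw is None or raw in ("0", "false"):
        required = False
    elif raw in ("1", "true"):
        required = True
    else:
        raise ValueError(f"invalid ARXI_REQUIRE_DOCKER value: {raw!r}")
    if docker_available:
        return "run"
    return "fail" if required else "skip"
```

Making the skip an explicit, reported outcome avoids a lane that silently passes on machines without Docker.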

Artifact and Transcript Contract

Each test run emits at minimum:

summary.json,
summary.md,
tool_transcript.json.

TestReporter and TestArtifacts create deterministic run roots, enforce run-root reuse policy, and produce standardized summary documents.

F:system-tests/tests/helpers/artifacts.rs L65-L131 F:system-tests/tests/helpers/artifacts.rs L133-L214 F:system-tests/tests/helpers/cli.rs L19-L107
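The artifact contract can be checked mechanically. The harness implements this in Rust (`TestReporter`, `TestArtifacts`); the Python sketch below only mirrors the idea. The function names, the `base / test_name` run-root layout, and the `allow_reuse` flag are hypothetical, while the three file names come from the list above.

```python
from pathlib import Path

# Minimum artifact set named in the contract above.
REQUIRED_ARTIFACTS = ("summary.json", "summary.md", "tool_transcript.json")


def create_run_root(base: Path, test_name: str, allow_reuse: bool = False) -> Path:
    """Create a predictable run root; refuse to reuse one unless allowed."""
    root = base / test_name
    if root.exists() and not allow_reuse:
        raise FileExistsError(f"run root already exists: {root}")
    root.mkdir(parents=True, exist_ok=True)
    return root


def missing_artifacts(root: Path) -> list[str]:
    """Return the required artifact names that are absent from the run root."""
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).is_file()]
```

A post-run check would treat any non-empty result from `missing_artifacts` as a failure, in keeping with the mandatory per-test artifact rule.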

Execution Tooling

Python helpers:

test_runner.py: registry-based execution with optional parallelism, per-test artifact roots, and manifest generation.
coverage_report.py: generates docs from registry + gaps.
gap_tracker.py: lists/shows/closes gaps and generates implementation prompts.

F:scripts/system_tests/test_runner.py L64-L112 F:scripts/system_tests/test_runner.py L119-L199 F:scripts/system_tests/coverage_report.py L43-L101 F:scripts/system_tests/gap_tracker.py L92-L140
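Registry-based execution with optional parallelism and per-test artifact roots can be sketched as follows. This is not the code in `test_runner.py`: `run_one`, `run_all`, the `jobs` parameter, and the shape of the result records are assumptions for illustration. How the real runner hands the artifact root to each test is not shown here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_one(name: str, argv: list[str], artifact_base: Path) -> dict:
    """Run one test command with its own artifact root and report the outcome."""
    root = artifact_base / name
    root.mkdir(parents=True, exist_ok=True)
    proc = subprocess.run(argv, capture_output=True, text=True)
    return {"name": name, "artifact_root": str(root), "passed": proc.returncode == 0}


def run_all(tests: dict[str, list[str]], artifact_base: Path, jobs: int = 1) -> list[dict]:
    """Run tests with up to `jobs` workers and return a name-sorted manifest."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = list(
            pool.map(lambda item: run_one(item[0], item[1], artifact_base), tests.items())
        )
    # Sort so the manifest does not depend on completion order.
    return sorted(results, key=lambda r: r["name"])
```

Sorting the manifest by name keeps the output stable regardless of how many workers were used.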

Suite Structure

Suite modules cover:

smoke: CLI startup and help/version checks,
bundle: build/verify/inspect and tamper detection,
persistence: restart, determinism, and SQLite/in-memory parity checks,
operations: query ordering and cursor behavior, JSON pagination and limit guardrails, and recorder-id and auto-seal config validation parity checks,
security: bounded CLI input surfaces, malformed-identifier rejection, secure signer-file policy, signer-rotation recovery/corruption behavior, contract path safety, hostile bundle parse-boundary checks, and hostile record-with-attachments boundary checks,
recorder: lifecycle plus auto-seal count/duration/combined behavior and attachment-recording persistence checks over the real CLI boundary,
sidecar: real sidecar process lifecycle over HTTP (record/seal/build/verify) and restart-boundary idempotency replay/conflict persistence checks,
sidecar_docker: Dockerfile/Compose/config hardening checks and Docker Compose build/up/down with containerized sidecar startup/readiness probe checks plus open/record/query workflow,
integration_openclaw: fixture-driven OpenClaw gateway/CLI ingest, signed/unsigned verification lanes, sequence-gap policy checks, sensitive field redaction, and bounded payload handling checks,
integration_decision_gate: fixture-driven Decision Gate MCP runpack flow ingest through the production arxi-decision-gate-adapter crate, signed/unsigned verification lanes, runpack-integrity strict-vs-anomaly policy checks (including manifest self-integrity recomputation), sensitive transcript-field redaction, bounded transcript payload handling checks, CLI decision-gate ingest-fixture command-path validation, and a fixture conformance gate that enforces canonical Decision Gate tool request/response shapes (including export-vs-verify checked_files semantics).

File-by-File Cross Reference

Area	File	Notes
Contract and standards	`system-tests/AGENTS.md`	Behavioral and artifact requirements for system-tests.
Execution overview	`system-tests/README.md`	How to run and extend suites.
Coverage snapshot	`system-tests/TEST_MATRIX.md`	P0/P1/P2 matrix.
Test registry	`system-tests/test_registry.toml`	Authoritative inventory and run commands.
Gap tracker data	`system-tests/test_gaps.toml`	Coverage gaps and acceptance criteria.
Artifact helper	`system-tests/tests/helpers/artifacts.rs`	Run-root and summary generation contract.
CLI helper	`system-tests/tests/helpers/cli.rs`	Real CLI command execution and transcript capture.
Sidecar helper	`system-tests/tests/helpers/sidecar.rs`	Real sidecar process start/stop and HTTP transcript capture.
Docker helper	`system-tests/tests/helpers/docker.rs`	Docker daemon/compose probes and command helpers for containerized lanes.
Sidecar suite	`system-tests/tests/suites/sidecar.rs`	Sidecar HTTP lifecycle and restart-idempotency validation.
Sidecar Docker suite	`system-tests/tests/suites/sidecar_docker.rs`	Sidecar container packaging hardening and Docker Compose workflow validation, including startup/readiness probes.
OpenClaw integration suite	`system-tests/tests/suites/integration_openclaw.rs`	Fixture-driven adapter ingest validation for gateway + CLI mock flows.
OpenClaw fixtures	`system-tests/tests/fixtures/openclaw_gateway_mock_events.json`	Gateway mock flow event fixture aligned to OpenClaw event schema.
OpenClaw fixtures	`system-tests/tests/fixtures/openclaw_cli_mock_events.json`	CLI fallback-style flow event fixture aligned to OpenClaw event schema.
OpenClaw integration architecture	`Docs/architecture/arxi_openclaw_integration_architecture.md`	Versioned mapping, redaction, and bounded payload policy contract.
Decision Gate production adapter	`crates/arxi-decision-gate-adapter/src/adapter.rs`	Canonical Decision Gate-to-Arxi mapping implementation exercised by system-tests.
Decision Gate integration suite	`system-tests/tests/suites/integration_decision_gate.rs`	Fixture-driven MCP runpack flow validation for control-plane coupling.
Decision Gate fixture	`system-tests/tests/fixtures/decision_gate_runpack_mock_flow.json`	Mock runpack MCP flow fixture aligned to Decision Gate transcript and runpack manifest layout.
Decision Gate integration architecture	`Docs/architecture/arxi_decision_gate_integration_architecture.md`	Versioned MCP flow mapping, runpack integrity policy, and transcript redaction/bounds contract.
Env parsing	`system-tests/src/config/env.rs`	Strict environment parsing for test config.
Runner script	`scripts/system_tests/test_runner.py`	Registry-driven execution engine.
Coverage docs generator	`scripts/system_tests/coverage_report.py`	Generated testing docs pipeline.
Gap management script	`scripts/system_tests/gap_tracker.py`	Gap lifecycle tooling.