Add a dormant, best-effort v4 RDS Postgres dual-write to the CI benchmark emitters #8512

Closed
connortsui20 wants to merge 33 commits into develop from ct/bench-v4-emitters

@connortsui20

Member

The CI benchmark emitters currently need a manual vortex-bench-migrate run to populate the new v4 RDS Postgres database. This adds a best-effort, env-gated dual-write so the emitters populate it directly, ported from the unmerged ct/bench-v4 branch onto current develop. It ships dormant: every v4 step is gated on vars.GH_BENCH_INGEST_ROLE_ARN being set and carries continue-on-error: true, and the existing v2 (static S3 JSON) and v3 (DuckDB + /api/ingest) paths are left fully intact and run first. With the variable unset the new code is dead but safe: the v4 steps never run, and once they are enabled a v4 write failure still cannot fail a workflow. Turning it on (setting the role ARN, repointing the site URL) is a separate ops change after this merges.

What's in the diff:

  • A Python port of the measurement_id xxhash64 key (scripts/_measurement_id.py), the cross-language golden vectors, and a parity test asserting byte-for-byte agreement with the Rust reference. The golden vectors are the contract.
  • A --postgres mode in scripts/post-ingest.py: RDS IAM auth, verify-full TLS, a five-table-plus-commit upsert in a single transaction, a NaN/Inf guard, and a best-effort site revalidate. The existing --server (v3) path stays stdlib-only and untouched.
  • Testcontainer writer tests against Postgres 16, a revalidate test, and an operator cross-check utility.
  • The CI wiring (parity test in the required job, testcontainer suite in a docker-gated job) and the dormant v4 ingest step added to the three emitter workflows.
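
For reviewers who want a concrete picture of the NaN/Inf guard listed above, here is a minimal sketch. It is not the code in scripts/post-ingest.py: the function names and the `value_ns` field are hypothetical, and it assumes the guard's job is to keep non-finite floats out of the rows handed to the writer.

```python
import math


def find_non_finite(record: dict) -> list[str]:
    """Return the names of float fields whose value is NaN or +/-Inf."""
    bad = []
    for key, value in record.items():
        # Only floats can be non-finite; ints, bools and strings pass through.
        if isinstance(value, float) and not math.isfinite(value):
            bad.append(key)
    return bad


def guard_records(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into (writable, rejected) before any database work."""
    writable, rejected = [], []
    for record in records:
        if find_non_finite(record):
            rejected.append(record)
        else:
            writable.append(record)
    return writable, rejected
```

Running the guard before opening the transaction means a bad measurement is dealt with up front rather than surfacing as a mid-transaction error. Whether the real guard drops, skips, or aborts on such rows is not stated here, so treat the split above as one possible policy.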

The parity test and the Postgres-16 testcontainer suite run in CI. This branch also carries big-plans orchestration scaffolding under .big-plans/ (the spine, per-step plan files, and the plan: commits); that's transient and will be removed in a follow-up cleanup PR rather than being part of the shipped feature. A few hardening and doc nits that surfaced during review are tracked as deferred, non-blocking items in the spine.

🤖 Generated with Claude Code

Seed the bench-v4 CI emitter dual-write spine from the parallel exploration
sweep: goal, architecture decisions, out-of-scope, risks, scoped reviewer BANS,
and accepted tradeoffs. Phase Map decomposition deferred to Step 1.4.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Write the design spec and fold the three resolved design decisions into the
spine: code-port scope (everything incl. extras), the 4-phase A/D/C/B structure
with the ops-phase gauntlet/PR exemption, and the port-deps-as-is approach.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Fold grill-me outcomes into the spine and design spec: reorder phases to
D -> A -> C -> B; add the demo-safety model (no prod RDS writes until phase B,
phases C/B gated post-demo, pre-merge confidence via testcontainer PG16 +
golden tests with no prod/develop dependency, real-RDS verified read-only at B);
correct the deps mechanism to uv run --with (not PEP-723), the xxhash package
usage, and the global CA bundle; record the verified v4 failure-isolation
property and the testcontainer-vs-RDS gap risk.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Decompose the design into the Phase Map: Phase 1 (D, code) with sub-phases
1.1 measurement_id contract, 1.2 Postgres writer, 1.3 CI + workflow wiring
(SDD + gauntlet + PR + human gate, phase-4 depth); Phases 2-4 (A/C/B, ops) as
direct-CLI phases with machine-checkable CLI exit criteria and no gauntlet/PR.
Add the Orchestration notes section documenting the ops-phase protocol and the
dedicated-cleanup-PR wrap-up for resume.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Enter Phase 1 (D, code) at sub-phase 1.1 (measurement_id contract):
status implementing, phase_entry_sha null (filled by the next commit).

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Fill phase_entry_sha with commit 1's SHA (closes the two-commit entry window).

Signed-off-by: Connor Tsui <connor@spiraldb.com>
JIT writing-plans output for sub-phase 1.1: extract the three measurement_id
contract files from f9b36ae, repoint docstrings, verify golden parity.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Port the server-internal measurement_id xxhash64 hash to Python
(scripts/_measurement_id.py) with its 63-vector cross-language golden test
(scripts/test_measurement_id.py, scripts/measurement_id_golden.json), extracted
verbatim from the v4 emitter branch. Repoint the docstrings to the extracted
vortex-data/benchmarks-website repo. The golden vectors pin Rust == Python.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Collapse three lines the verbatim port carried from the source branch's
narrower line-length so the file passes `ruff format --check` (repo uses 120).
Whitespace-only; no test logic or values change.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Signed-off-by: Connor Tsui <connor@spiraldb.com>
Record sub-phase 1.1 (measurement_id contract) as gauntlet pr-2 accepted
(1 cycle), and park the 3 should-fix + 5 nit findings in Deferred work
(golden-vector coverage gaps are out-of-monorepo-scope; CI wiring + xxhash
importorskip + doc nits fold into sub-phase 1.3).

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Sub-phase 1.1 complete + gauntlet-accepted; advance position to 1.2,
status implementing.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
… stay out)

Class B plan-assumption violation resolved: the testcontainer writer test
depends on the real migrations/ + migrate-schema.py + the website repo's
schema.rs, all out of monorepo scope. Per user choice, adapt the test to a
self-contained scripts/_v4_schema_fixture.sql (6 tables) applied directly, drop
the migrate-runner dependency, and self-check SCHEMA_VERSION == 1 instead of
reading the absent schema.rs. Migrations + migrate-schema.py remain out.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
JIT writing-plans output for sub-phase 1.2: port the --postgres writer +
revalidate test + cross-check utility, and adapt the testcontainer test to a
self-contained schema fixture. 3 tasks.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Port the v4 Postgres dual-write path (RDS IAM auth, verify-full TLS, NaN/Inf
guard, 5-table + commit-dim upsert in one transaction, best-effort revalidate)
from the v4 emitter branch, keeping the v3 --server path stdlib-only and intact.
v4 deps (psycopg, boto3, xxhash) are imported lazily inside the --postgres path.
Repoint docstrings to the extracted vortex-data/benchmarks-website repo.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
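
The lazy-import arrangement described in the commit above can be sketched roughly as follows. This is an illustration under assumptions, not the ported code: `load_v4_deps` and `main` are hypothetical names, and the argument handling is reduced to a bare flag check.

```python
import importlib


def load_v4_deps(names=("psycopg", "boto3", "xxhash")):
    """Import the v4-only packages on demand and report any that are missing."""
    loaded, missing = {}, []
    for name in names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            missing.append(name)
    if missing:
        raise RuntimeError("missing v4 dependencies: " + ", ".join(missing))
    return loaded


def main(argv):
    """Dispatch on mode; only the --postgres branch touches third-party packages."""
    if "--postgres" in argv:
        load_v4_deps()  # raises RuntimeError if psycopg/boto3/xxhash are absent
        return "postgres"
    # The --server path never imports the v4 packages, so it stays stdlib-only.
    return "server"
```

Because the imports happen inside the `--postgres` branch, a runner without psycopg, boto3, or xxhash installed can still execute the `--server` path unchanged.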
Port the pure-stdlib revalidate-hook test (asserts refresh_site_cache sends the
bearer header and swallows every failure so it can never change the ingest exit
code) and the cross_check_python_writer.py utility (confirms the Python writer
recomputes the same measurement_id as seeded Rust-loaded rows and UPDATEs rather
than duplicating) from the v4 emitter branch.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
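
To make the revalidate property from the commit above concrete, here is a minimal stdlib sketch of a hook that sends a bearer token via POST and swallows every failure. The name `refresh_site_cache` comes from the commit message; the signature, the `opener` parameter, and the boolean return value are assumptions made for illustration.

```python
import urllib.request


def refresh_site_cache(url, token, timeout=5.0, opener=urllib.request.urlopen):
    """Best-effort POST to the site's revalidate endpoint; never raises."""
    request = urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except Exception:
        # Any failure (DNS, TLS, timeout, HTTP error) is swallowed so that
        # the caller's exit code is decided by the ingest alone.
        return False
```

Injecting `opener` lets a test substitute a fake that records the request or raises, which is one way to check the header, the method, and the failure-swallowing behaviour without network access.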
…a fixture

Port the testcontainer Postgres writer test and adapt it off the out-of-scope
migrations/ + migrate-schema.py + benchmarks-website/server/src/schema.rs: apply
a self-contained scripts/_v4_schema_fixture.sql (the 6-table DDL mirroring the
website repo's migrations/001 plus the commit_timestamp column from 006 and its
covering index from 007) and self-check SCHEMA_VERSION == 1. The suite
spins postgres:16-alpine and exercises the real upsert/transaction path, the
load-bearing pre-merge confidence gate with no prod or develop dependency.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
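
The upsert-in-one-transaction behaviour that the testcontainer suite exercises can be illustrated with a small stand-in. This sketch uses the stdlib sqlite3 module instead of Postgres 16 and psycopg, and a hypothetical two-table schema (`commits`, `measurements`) rather than the six-table fixture; it only shows the shape of the technique, namely that a re-sent measurement_id updates its row and that a failure rolls back the whole write.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (sha TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE measurements ("
        "measurement_id INTEGER PRIMARY KEY, "
        "commit_sha TEXT NOT NULL REFERENCES commits (sha), "
        "value REAL NOT NULL)"
    )
    conn.commit()
    return conn


def upsert_measurements(conn, commit_sha, rows):
    """Upsert a commit row and its measurements in a single transaction."""
    # `with conn` commits on success and rolls back if any statement raises.
    with conn:
        conn.execute(
            "INSERT INTO commits (sha) VALUES (?) ON CONFLICT (sha) DO NOTHING",
            (commit_sha,),
        )
        conn.executemany(
            "INSERT INTO measurements (measurement_id, commit_sha, value) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT (measurement_id) DO UPDATE SET value = excluded.value",
            [(mid, commit_sha, value) for mid, value in rows],
        )
```

Writing the same measurement_id twice leaves one row holding the newer value, and a failing row (for example a NULL value) leaves neither the measurement nor its commit behind. The Postgres path has its own syntax and driver semantics, so this is an analogy rather than a port.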
Signed-off-by: Connor Tsui <connor@spiraldb.com>
Record sub-phase 1.2 (Postgres writer) as gauntlet pr-3 accepted (1 cycle;
fresh + correctness + maint, zero must-fix; 100 testcontainer tests green), and
park the should-fix + nit findings in Deferred work (cross_check thin-commit
hardening, real-PG retry coverage, fixture/CONTRACT.md doc accuracy, and minor
port nits).

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Sub-phase 1.2 complete + gauntlet-accepted; advance position to 1.3,
status implementing.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
JIT writing-plans output for sub-phase 1.3: wire the scripts/ tests into ci.yml
(contract+revalidate in python-test, testcontainer in a docker-gated job, all
via uv run --with, no pyproject/uv.lock changes), and add the dormant
best-effort v4 step to the three emitter workflows. 2 tasks.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Add the measurement_id contract test + the revalidate test to the python-test
job (required, no docker), and a lightweight docker-gated scripts-test job for
the testcontainer Postgres writer suite. Deps are supplied per-invocation via
uv run --with (no pyproject/uv.lock changes).

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Insert the dormant, env-gated, continue-on-error v4 ingest step (OIDC
assume-role -> uv run post-ingest.py --postgres -> revalidate) after the v3
--server step in bench.yml, sql-benchmarks.yml, and v3-commit-metadata.yml
(the last also gains id-token: write). The block no-ops until
GH_BENCH_INGEST_ROLE_ARN is set and can never fail the job; the v3 path is
untouched.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Signed-off-by: Connor Tsui <connor@spiraldb.com>
Record sub-phase 1.3 (CI + workflow wiring) as gauntlet pr-3 accepted (1 cycle;
fresh + correctness + maint, zero must-fix). Park should-fixes: 4 cheap in-scope
ones (configure-aws-credentials SHA alignment, xxhash importorskip, AWS_REGION
message, revalidate method assertion) flagged to apply as a phase-D finalization
polish before the phase-end gauntlet; the rest are no-action nits / pre-existing
patterns. Note the scripts-test required-check as a branch-protection ops item.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Sub-phase 1.3 done + gauntlet-accepted; clear sub_phase (status stays reviewing)
to route into the Phase 3 phase-end review.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Align the new v4 configure-aws-credentials steps to the repo's existing v6 SHA
pin (one SHA per file); add pytest.importorskip("xxhash") so the measurement_id
test skips gracefully without xxhash locally; correct the RDS region error
message to AWS_DEFAULT_REGION (what boto3 reads); assert the revalidate request
method is POST.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Record the Phase 1 (D) gate: exit criteria all pass; phase-end gauntlet phase-4
accepted (spec + correctness + maint + arch, zero must-fix; 172 tests re-run
green). Fix the stale Accepted-tradeoffs PEP-723 wording (deps are via uv run
--with, not PEP-723). Park the phase-4 refactor/doc should-fixes in Deferred
work. status stays reviewing until the phase PR is opened.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Record a SESSION HANDOFF brief at the top of the spine for a clean fresh-session
resume: Phase 1 (D) is complete + phase-4 gauntlet accepted (gate in the ledger);
the open decision is the phase-D PR form (code-only [recommended] / include
scaffolding / hold); do not re-run the phase-end gauntlet on resume; demo-safety
and 1Password constraints still in force.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Record the phase-D PR (#8512, draft -> develop) in the Verdict Ledger and refresh
the SESSION HANDOFF brief: the open PR-form decision is resolved (user chose
include-scaffolding, so .big-plans/ rides in the diff), and the next step is the
Step 3.4 human gate. status stays reviewing here; the awaiting-human-gate
transition is the next commit.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
All three phase-boundary gate prerequisites now exist: exit criteria passed, the
phase-end gauntlet accepted, and the phase PR (#8512) number is committed. Move
Current Position to awaiting-human-gate so a resume re-presents the gate.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
@codspeed-hq

codspeed-hq Bot commented Jun 19, 2026

Merging this PR will degrade performance by 10.28%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 3 improved benchmarks
❌ 8 regressed benchmarks
✅ 1570 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | decompress_rd[f64, (10000, 0.01)] | 108.9 µs | 139.3 µs | -21.82% |
| Simulation | decompress_rd[f64, (10000, 0.1)] | 109.2 µs | 139.6 µs | -21.79% |
| Simulation | decompress_rd[f64, (10000, 0.0)] | 108.9 µs | 139.2 µs | -21.77% |
| Simulation | decompress_rd[f32, (100000, 0.0)] | 496.1 µs | 583.9 µs | -15.04% |
| WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 299.8 µs | 352.7 µs | -15% |
| Simulation | decompress_rd[f32, (10000, 0.1)] | 78.2 µs | 91.4 µs | -14.4% |
| Simulation | decompress_rd[f32, (10000, 0.01)] | 78.2 µs | 91.1 µs | -14.16% |
| Simulation | decompress_rd[f32, (10000, 0.0)] | 78.7 µs | 91.3 µs | -13.88% |
| Simulation | bitwise_not_vortex_buffer_mut[128] | 244.4 ns | 215.3 ns | +13.55% |
| Simulation | bitwise_not_vortex_buffer_mut[1024] | 304.7 ns | 275.6 ns | +10.58% |
| Simulation | eq_i64_constant | 317.9 µs | 287.9 µs | +10.42% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ct/bench-v4-emitters (9e601fb) with develop (9814173)

Opening the phase-D PR surfaced two real PR-only check failures, both fixed here
without touching any reviewed code:

- reuse-check: the .big-plans/ orchestration scaffolding (spine + task plans) that
  rides in this PR lacks SPDX headers. Add a REUSE.toml annotation licensing
  .big-plans/** as CC-BY-4.0, matching docs/ and .agents/skills/. The annotation is
  removed by the scaffolding-cleanup PR later.
- typos: the spell checker mis-splits the SQL verbs UPDATEs / UPDATEd / INSERTs /
  INSERTed in scripts/cross_check_python_writer.py docstrings (UPDATEs -> UPDAT).
  Add those four identifiers to _typos.toml's python ignore list rather than
  rewording the reviewed docstrings.

Verified locally: reuse lint compliant (2734/2734) and typos exits 0.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Record the config-only reuse-check + Spell Check fixes (commit 05b6b79) in the
Verdict Ledger, plus the non-blocking DCO mismatch and the changelog label, so the
gate audit trail reflects why the final PR state differs from the gauntlet-accepted
tip.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
@connortsui20

Member Author

Superseded by #8513, a minimal code-only PR: just the --postgres writer, the measurement_id port, and the dormant workflow steps. The big-plans orchestration scaffolding and the test suite (testcontainer, revalidate, golden parity, cross-check) are intentionally left out of the repo.

@connortsui20 connortsui20 deleted the ct/bench-v4-emitters branch June 19, 2026 18:25