diff --git a/.claude/skills/review-pr/SKILL.md b/.claude/skills/review-pr/SKILL.md index 9f0362a0..997ff8ca 100644 --- a/.claude/skills/review-pr/SKILL.md +++ b/.claude/skills/review-pr/SKILL.md @@ -51,9 +51,18 @@ Capture the PR identifier in `$PR` (the part of `$ARGUMENTS` left after strippin PR='' gh pr view "$PR" --json number,title,body,labels,state gh pr diff "$PR" +gh pr diff "$PR" --numstat # binary files show as `--` gh pr view "$PR" --comments ``` +**Committed-binary gate (runs at every level).** Scan the `--numstat` output for +any added/modified file git reports as binary (`-`/`-` in the added/deleted +columns). This repo builds its native/C libraries from source in CI and does not +commit build outputs, so any such file is a **Critical** finding regardless of +review level — report it even at level 0. See the "Committed build artifacts" +checklist for the rationale and the acceptable-exception (genuine test-input +fixtures only). + ## Step 2: PR title and description Check against CLAUDE.md conventions: @@ -144,7 +153,7 @@ Every agent receives: Launch the following agents in parallel. -**Agent 1 — Correctness & bugs:** NULL handling, edge cases, logic errors, off-by-one, operator precedence, error paths. Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite. +**Agent 1 — Correctness & bugs:** NULL handling, edge cases, logic errors, off-by-one, operator precedence, error paths. Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite. 
When the diff touches the store-and-forward sender, the async drainer / send loop, primary reconnect/failover, or pool startup (`lazy_connect` / `initial_connect_retry` / `SenderPool` / `QueryClientPool`), also verify the "Store-and-forward & pool startup invariants" checklist — a running drainer that propagates a transport error to the caller, imposes a reconnect time budget, or hard-fails on a transient outage is a Critical (data-loss) finding. **Agent 2 — Concurrency:** Race conditions, shared mutable state, missing volatile, lock ordering, thread-safety of data structures. Use the implicit contract list (lock order, thread-affinity) and check every callsite from 2.5b for violations of the new contract. @@ -154,7 +163,7 @@ Launch the following agents in parallel. **Agent 5 — Test coverage:** Coverage gaps, error path tests, NULL tests, boundary conditions, regression tests exist, `assertMemoryLeak()` usage. Cross-reference 2.5d: every cross-context exposure should have a test that exercises the changed symbol from that context. Missing tests for cross-context callsites is a high-priority finding. Test *efficacy* (whether those tests actually exercise the change and could fail) and test-*code* quality are handled by Agents 11-13 — here focus only on whether coverage exists for every new or changed path. -**Agent 6 — Code quality & standards:** Code smell, member ordering, naming conventions, modern Java features, dead code, third-party dependencies. +**Agent 6 — Code quality & standards:** Code smell, member ordering, naming conventions, modern Java features, dead code, third-party dependencies. Also scan the diff for any committed compiled binary / build artifact (run `git diff --numstat`/`--stat` and flag files git reports as binary) — the native/C libraries are built from source in CI, so a committed binary is a **Critical** finding (see the "Committed build artifacts" checklist). 
**Agent 7 — PR metadata & conventions:** Title format, description quality, commit messages, labels, SQL style in tests. @@ -278,6 +287,26 @@ Review the diff for: - Code smell: overly complex methods, deep nesting, unclear intent, dead code - No third-party Java dependencies on data paths +### Committed build artifacts +- **A newly committed compiled binary is always Critical.** This repo builds its + native/C libraries from source in CI (`rebuild_native_libs.yml`, + `build_native.yaml`, guarded by `check-glibc-floor.sh`) and does not commit + build outputs. A binary added or modified in the diff cannot be reviewed, + audited, or reproduced from source, can smuggle in unaudited or malicious + code, and bloats the repo history irreversibly — so it blocks the merge. +- Detect it structurally, not by extension alone: run `git diff --stat` / + `git diff --numstat` on the PR and flag every added/modified file git reports + as binary (`numstat` shows `-`/`-` for added/deleted lines; `--stat` shows a + `Bin … -> … bytes` marker). Typical offenders: `.so`, `.dylib`, `.dll`, `.a`, + `.o`, `.lib`, `.exe`, `.class`, `.jar`, `.war`, `.wasm`, `.node`, `.bin`. +- The finding stands even when the binary "looks" legitimate (e.g. a rebuilt + `libquestdb.*`): the correct source of these artifacts is the CI native-build + pipeline plus release packaging, never a PR diff. The only acceptable binaries + are genuine test-input fixtures/resources (data a test reads), not build + outputs — and even those must be justified. +- Suggested fix: drop the binary from the PR, confirm a `.gitignore` entry + covers it, and let CI native-build + release packaging produce it. 
+ ### QuestDB coding standards - Class members grouped by kind (static vs instance) and visibility, sorted alphabetically - Boolean names use `is...` / `has...` prefix @@ -288,6 +317,68 @@ Review the diff for: - try-with-resources used where applicable - Native memory freed correctly +### Store-and-forward & pool startup invariants (QWP facade) +Apply this whenever the diff touches the SF sender, the async drainer / send +loop, primary reconnect/failover, `SenderPool` / `QueryClientPool` startup, +`lazy_connect`, or `initial_connect_retry`. A violation here is a **Critical** +finding: the whole point of store-and-forward is that a running producer never +loses data and never hard-fails on a transient outage. + +**Drainer (steady state — once the pool is running).** +- Once the pool is running, an async drainer thread ships buffered SF data to + the server. It MUST NOT propagate server / transport errors back to the + client (`Sender` producer calls, `flush()`, the pooled handle). The ONLY + error a running drainer may surface to the caller is **SF out of space** (the + on-disk / backing buffer is full and can accept no more rows). Flag any other + failure class (connect-refused, DNS, unreachable/black-hole, TLS/cert, auth, + role-reject, upgrade/protocol timeout, reset) that can escape the drainer + onto a producer or borrow call. +- Primary reconnect MUST be fully contained inside the drainer thread and MUST + have **no time limit** — no `reconnect_max_duration_millis`-style budget, no + deadline, no "give up and latch terminal after N ms". A budget that latches + the sender terminal on a long outage is a Critical violation: it drops a + producer that store-and-forward promised to keep alive. Flag any bounded + reconnect loop, `deadlineNanos` / `while (now < deadline)`, or terminal + `SenderError` reachable from the running drainer's reconnect path. 
+- The drainer must retry with **exponential backoff** and handle every connect + failure class gracefully, without a hard fail — it keeps buffering and keeps + retrying until the wire is back. The per-attempt backoff may be capped (a max + delay between attempts), but the RETRY LOOP ITSELF must be unbounded. Flag a + capped total retry duration or an attempt-count cap on the steady-state + drainer. +- **Sanctioned terminals (orphan-slot drainer only).** The orphan drainer + (`BackgroundDrainer`) MAY quarantine its slot (`.failed` sentinel, + human-in-the-loop) on conditions that are terminal by design: auth failure, + a non-421 upgrade reject, and a genuine cluster-wide durable-ack capability + gap that exhausted its documented settle budget (16 consecutive + capability-gap sweeps, or a wall-clock budget anchored at the FIRST + capability-gap error of the episode — whichever is hit first). These are + NOT violations of the no-budget rule above. The settle budget applies ONLY + to consecutive capability-gap attempts: transient classes (role reject, + transport error) must never increment it or burn its wall clock — a + transient state consuming the terminal budget (shared attempt counter, + entry-anchored deadline) IS a Critical violation of this checklist. + +**Pool startup — two modes; the mode decides who sees connectivity errors.** +- `lazy_connect=true`: `build()` MUST succeed with **no server present**. The + producing `Sender` must work immediately (writes buffer via SF), and once the + server comes up the read side must also connect and read (reads are deferred, + not disabled). Verify `build()` does not fail-fast, the sender does not throw + on the first write while the server is down, and a later `borrowQuery()` + succeeds once the server is up. 
+- `lazy_connect=false` (default): `build()` / the initial connect MUST expose + connectivity problems to the caller — DNS errors, connect-refused / + unreachable, TLS/cert, authentication/authorization, and connect/upgrade + timeouts must all surface as a thrown exception at startup, not be swallowed. + Verify each of those failure classes reaches the user during initialization. +- **In BOTH modes the boundary is the same:** connectivity errors are only + ever the caller's problem DURING initialization. Once the client has + connected and is past initialization, the running drainer reverts to the + steady-state contract above — it must NEVER expose transport problems, NEVER + impose a reconnect time budget, and NEVER hard-fail on a transient outage. + Anything that undermines the store-and-forward guarantee past init is + Critical. + ### SQL conventions (if tests or SQL involved) - Keywords in UPPERCASE - `expr::TYPE` cast syntax preferred over CAST() @@ -340,7 +431,10 @@ Review the diff for: Present ONLY verified findings (false positives are excluded). Structure as: ### Critical -Issues that must be fixed before merge. Each must include: +Issues that must be fixed before merge. **A newly committed compiled binary or +other build artifact (see the "Committed build artifacts" checklist) is always +Critical, no matter how legitimate it looks — native/C libraries are built from +source in CI, so a binary in the diff is never acceptable.** Each must include: - Exact file path and line numbers (including out-of-diff files) - Whether the finding is **in-diff** or **out-of-diff** - Code path trace showing why the bug is real diff --git a/.github/scripts/check-glibc-floor.sh b/.github/scripts/check-glibc-floor.sh new file mode 100755 index 00000000..77204943 --- /dev/null +++ b/.github/scripts/check-glibc-floor.sh @@ -0,0 +1,80 @@ +#!/usr/bin/env bash +# Assert the glibc runtime floor of a Linux native library. +# +# Usage: check-glibc-floor.sh +# e.g. 
check-glibc-floor.sh core/.../linux-x86-64/libquestdb.so 2.14 +# check-glibc-floor.sh core/.../linux-aarch64/libquestdb.so 2.17 +# +# The dynamic linker resolves .gnu.version_r at load time, so the HIGHEST +# GLIBC_x.y version node the library imports is its hard load floor: a host +# whose glibc is older than that node fails System.loadLibrary/dlopen with +# `version 'GLIBC_x.y' not found`. This script extracts every versioned import +# and fails if the highest one exceeds the allowed floor. +# +# Why the floors are what they are: +# * linux-x86-64 -> 2.14. The oldest node we intentionally keep is +# memcpy@GLIBC_2.14; clock_gettime is pinned back to GLIBC_2.2.5 by +# src/main/c/share/glibc_compat.h, and stat/fstat resolve to the inline +# __xstat/__fxstat@GLIBC_2.2.5 wrappers when built in a low-glibc container. +# A build on a modern host (glibc >= 2.33) instead emits stat@GLIBC_2.33 / +# fstat@GLIBC_2.33 and trips this guard -- that is exactly the regression it +# exists to catch. +# * linux-aarch64 -> 2.17. glibc gained aarch64 support in 2.17, so 2.17 is +# the lowest floor physically achievable on that architecture. +# +# Portable to bash 3.2 (no mapfile / no negative array indices) so it can be run +# locally on macOS as well as in the glibc build containers. +set -euo pipefail + +lib="${1:?usage: check-glibc-floor.sh }" +floor="${2:?usage: check-glibc-floor.sh }" + +if [ ! -f "$lib" ]; then + echo "::error::check-glibc-floor: library not found: $lib" + exit 1 +fi + +# All distinct versioned GLIBC nodes (e.g. 2.14, 2.2.5), sorted ascending. +# objdump prints them as (GLIBC_x.y) or GLIBC_x.y depending on the toolchain; +# the -o regex captures the token regardless of surrounding parentheses. +# GLIBC_PRIVATE has no digit after the underscore, so it is naturally excluded. 
+versions="$( + objdump -T "$lib" \ + | grep -oE 'GLIBC_[0-9]+(\.[0-9]+)+' \ + | sed 's/^GLIBC_//' \ + | sort -Vu +)" + +if [ -z "$versions" ]; then + echo "::error::check-glibc-floor: no versioned GLIBC symbols found in $lib (unexpected)." + exit 1 +fi + +highest="$(printf '%s\n' "$versions" | tail -n1)" + +echo "GLIBC version nodes required by $lib:" +printf '%s\n' "$versions" | sed 's/^/ GLIBC_/' +echo "Highest required: GLIBC_${highest} (allowed floor: GLIBC_${floor})" + +# leq A B -> succeeds when version A <= version B: sorting {A, B} with -V puts B +# last, or they are equal. +leq() { + [ "$1" = "$2" ] && return 0 + [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n1)" = "$2" ] +} + +if leq "$highest" "$floor"; then + echo "OK: $lib floor is GLIBC_${highest} (<= GLIBC_${floor})." + exit 0 +fi + +echo "::error::GLIBC floor regression in $lib: requires GLIBC_${highest}, above the GLIBC_${floor} floor." +echo "::error::This library will fail to load on hosts with glibc < ${highest}." +echo "Offending nodes above the floor and the symbols that pull them in:" +printf '%s\n' "$versions" | while IFS= read -r v; do + if ! leq "$v" "$floor"; then + echo " GLIBC_${v}:" + objdump -T "$lib" | grep -E "GLIBC_${v//./\\.}([^0-9]|\$)" | awk '{print " " $NF}' | sort -u + fi +done +exit 1 diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 5ccfaa64..f6a0cd74 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -17,10 +17,10 @@ defaults: jobs: # JDK 8 is the source of truth: the client ships as a Java 8 artifact # (io.questdb:questdb-client) and is released from JDK 8, so on JDK 8 it must - # compile, the full test suite must pass against the committed native - # libraries, and the javadoc jar must build (-P javadoc attaches it at the - # package phase). The committed native .so/.dylib/.dll are enough -- the only - # git submodule (zstd) is needed solely for C++ native rebuilds, not here. 
+ # compile, the full test suite must pass, and the javadoc jar must build + # (-P javadoc attaches it at the package phase). The native libraries are no + # longer committed, so this job compiles libquestdb.so from source (hence the + # zstd submodule + cmake/nasm/build-essential toolchain) before the tests run. build-jdk8: name: Build, test & javadoc (JDK 8) runs-on: ubuntu-latest @@ -28,6 +28,9 @@ jobs: steps: - name: Check out uses: actions/checkout@v4 + with: + # zstd is required to compile the native library. + submodules: recursive - name: Set up JDK 8 uses: actions/setup-java@v4 @@ -36,6 +39,26 @@ jobs: java-version: "8" cache: maven + - name: Install native build toolchain + run: sudo apt-get update && sudo apt-get install -y cmake nasm build-essential + + - name: Build native libquestdb.so + # JAVA_HOME points at the JDK 8 above, so the lib is compiled against the + # Java 8 JNI headers -- the artifact's Java floor. Copy it into src + # resources (not target/) so it survives the `mvn clean` in the next step + # and gets packaged + loaded via the production bin/ path. + # NOTE: this builds on ubuntu-latest for FUNCTIONAL testing only; the + # library's glibc runtime floor is validated separately by the + # `glibc-floor` job, which rebuilds in the release low-glibc container. + run: | + cd core + cmake -DCMAKE_BUILD_TYPE=Release -B cmake-build-release -S. + cmake --build cmake-build-release --config Release + test -f target/classes/io/questdb/client/bin-local/libquestdb.so + mkdir -p src/main/resources/io/questdb/client/bin/linux-x86-64 + cp target/classes/io/questdb/client/bin-local/libquestdb.so \ + src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so + - name: Compile, test, and build javadoc run: mvn -B -ntp -P javadoc clean install @@ -61,3 +84,80 @@ jobs: - name: Compile (main + test) and build javadoc (no tests run) run: mvn -B -ntp -P javadoc -DskipTests clean package + + # GLIBC floor guard. 
The native libraries are built at release time in + # low-glibc manylinux containers (see maven_central_release.yml) and are NOT + # committed, so a floor regression is invisible to the functional test job + # above (it builds on ubuntu-latest, whose glibc is new enough to load almost + # anything). This job rebuilds the linux libraries in the SAME low-glibc + # environment as release and asserts the runtime floor with objdump, so a + # change that raises the floor (e.g. a new stat/fstat call pulling in + # stat@GLIBC_2.33 on a modern build host) fails the PR instead of silently + # shipping a library that cannot load on older distros. + # + # * linux-x86-64 -> GLIBC_2.14 (the intended floor: memcpy@GLIBC_2.14). + # * linux-aarch64 -> GLIBC_2.17 (the lowest floor glibc offers on aarch64). + # + # Uses manylinux_2_28 for both arches (stock Node 24, no glibc-2.17 shadow + # hack). The x86-64 floor is identical in manylinux2014 (2.17) and + # manylinux_2_28 (2.28) -- both resolve stat/fstat to the inline + # __xstat/__fxstat@GLIBC_2.2.5 wrappers -- so this validates the real shipped + # floor without the heavier manylinux2014 release toolchain. + glibc-floor: + name: GLIBC floor guard (${{ matrix.platform }}) + strategy: + fail-fast: false + matrix: + include: + - platform: linux-x86-64 + os: ubuntu-latest + image: quay.io/pypa/manylinux_2_28_x86_64 + jdk_arch: x64 + floor: "2.14" + cmake_args: "" + build_dir: cmake-build-release + - platform: linux-aarch64 + os: ubuntu-22.04-arm + image: quay.io/pypa/manylinux_2_28_aarch64 + jdk_arch: aarch64 + floor: "2.17" + cmake_args: "-DCMAKE_TOOLCHAIN_FILE=./src/main/c/toolchains/linux-arm64.cmake" + build_dir: cmake-build-release-arm64 + runs-on: ${{ matrix.os }} + timeout-minutes: 45 + container: + image: ${{ matrix.image }} + steps: + - name: Check out + uses: actions/checkout@v4 + with: + # zstd is required to compile the native library. 
+ submodules: recursive + + - name: Install tooling + # binutils provides objdump for the floor check; nasm/zstd are build deps. + run: | + yum update -y + yum install -y wget nasm zstd binutils + + - name: Install Temurin JDK 8 (for jni.h) + # Build against the Java 8 JNI headers -- JDK 8 is the artifact's floor. + # The JDK version does not affect the glibc floor; it only supplies jni.h. + run: | + wget -v --timeout=180 -O jdk8.tar.gz \ + "https://api.adoptium.net/v3/binary/latest/8/ga/linux/${{ matrix.jdk_arch }}/jdk/hotspot/normal/eclipse" + mkdir jdk8 + tar xfz jdk8.tar.gz -C jdk8 --strip-components=1 + echo "JAVA_HOME=$(pwd)/jdk8" >> "$GITHUB_ENV" + + - name: Build native libquestdb.so + run: | + cd core + cmake ${{ matrix.cmake_args }} -DCMAKE_BUILD_TYPE=Release -B ${{ matrix.build_dir }} -S. + cmake --build ${{ matrix.build_dir }} --config Release + + - name: Assert GLIBC floor + run: | + ./.github/scripts/check-glibc-floor.sh \ + core/target/classes/io/questdb/client/bin-local/libquestdb.so \ + "${{ matrix.floor }}" diff --git a/.github/workflows/maven_central_release.yml b/.github/workflows/maven_central_release.yml index 56508328..77f52891 100644 --- a/.github/workflows/maven_central_release.yml +++ b/.github/workflows/maven_central_release.yml @@ -295,6 +295,8 @@ jobs: echo "::error::libquestdb.so has unresolved dependencies." exit 1 fi + # Refuse to ship if a symbol raised the glibc floor above 2.14. + ./.github/scripts/check-glibc-floor.sh "$lib" 2.14 cat > LoadCheck.java <<'EOF' public class LoadCheck { public static void main(String[] args) { @@ -360,6 +362,8 @@ jobs: echo "::error::libquestdb.so has unresolved dependencies." exit 1 fi + # 2.17 is the lowest floor glibc offers on aarch64. 
+ ./.github/scripts/check-glibc-floor.sh "$lib" 2.17 cat > LoadCheck.java <<'EOF' public class LoadCheck { public static void main(String[] args) { diff --git a/.github/workflows/rebuild_native_libs.yml b/.github/workflows/rebuild_native_libs.yml index 026d3c3e..6878f16b 100644 --- a/.github/workflows/rebuild_native_libs.yml +++ b/.github/workflows/rebuild_native_libs.yml @@ -68,57 +68,38 @@ jobs: key: nativelibs-osx-${{ github.sha }} build-all-linux-x86-64: runs-on: ubuntu-latest - # manylinux2014 is a container with new-ish compilers and tools, but old glibc - 2.17 - # 2.17 is old enough to be compatible with most Linux distributions out there + # manylinux_2_28 (glibc 2.28) replaces the previous manylinux2014 (glibc + # 2.17) container: GitHub Actions now forces actions (checkout, cache) onto + # Node 24, whose binary requires glibc >= 2.27, so it can no longer run + # inside the glibc-2.17 image (the old Node-20-glibc-217 override hack only + # patched /__e/node20, not /__e/node24). 2.28 still runs stock Node 24 and + # matches the linux-aarch64 job, which already ships glibc-2.28 binaries. + # + # NOTE: the build container's glibc (2.28) does NOT dictate the artifact's + # runtime glibc floor. clock_gettime is pinned back to GLIBC_2.2.5 via + # src/main/c/share/glibc_compat.h so the linux-x86-64 .so keeps loading on + # glibc 2.14+ (its floor is memcpy@GLIBC_2.14), unchanged from before the + # container move. If you add a symbol with a higher version node here, the + # floor will rise -- check with: objdump -T libquestdb.so | grep GLIBC_. 
container: - image: quay.io/pypa/manylinux2014_x86_64 - volumes: - - /node20217:/node20217 - - /node20217:/__e/node20 + image: quay.io/pypa/manylinux_2_28_x86_64 steps: - - name: Install tools, most are needed to build nasm - run: | - ldd --version - yum update -y - yum install 'perl(Env)' perl-Font-TTF perl-Sort-Versions gcc wget perf asciidoc xmlto ghostscript adobe-source-sans-pro-fonts adobe-source-code-pro-fonts rpm-build zstd curl -y - - name: Build nasm - # we need nasm 2.14+ due to this bug https://bugzilla.nasm.us/show_bug.cgi?id=3392205 - # manylinux2014 distribution includes nasm 2.10 - # the nasm project itself provides RPMs, but they built against a newer glibc and other dependencies too - # thus we take src.rpm from nasm project and rebuild it in the manylinux2014 container - # this way we get a nasm binary that is compatible with the manylinux2014 environment - run: | - wget https://www.nasm.us/pub/nasm/releasebuilds/2.16.03/linux/nasm-2.16.03-0.fc39.src.rpm - rpmbuild --rebuild ./nasm-2.16.03-0.fc39.src.rpm - rpm -i ~/rpmbuild/RPMS/x86_64/nasm-2.16.03-0.el7.x86_64.rpm - - name: Install Node.js 20 glibc2.17 - # A hack to override default nodejs 20 to a build compatible with older glibc. - # Inspired by https://github.com/pytorch/test-infra/pull/5959 If it's good for pytorch, it's good for us too! :) - # Q: Why do we need this hack at all? A: Because many github actions, include action/checkout@v4, depend on nodejs 20. - # GitHub Actions runner provides a build of nodejs 20 that requires a newer glibc than manylinux2014 has. - # Thus we download a build of nodejs 20 that is compatible with manylinux2014 and override the default one. 
- run: | - curl -LO https://unofficial-builds.nodejs.org/download/release/v20.9.0/node-v20.9.0-linux-x64-glibc-217.tar.xz - tar -xf node-v20.9.0-linux-x64-glibc-217.tar.xz --strip-components 1 -C /node20217 - ldd /__e/node20/bin/node - uses: actions/checkout@v4 with: submodules: true - - name: Install up-to-date CMake + - name: Install tooling run: | - wget -nv https://github.com/Kitware/CMake/releases/download/v3.29.2/cmake-3.29.2-linux-x86_64.tar.gz - tar -zxf cmake-3.29.2-linux-x86_64.tar.gz - echo "PATH=`pwd`/cmake-3.29.2-linux-x86_64/bin/:$PATH" >> "$GITHUB_ENV" + yum update -y + yum install wget nasm zstd -y - name: Install GraalVM JDK 25 (for jni.h) run: | - wget -nv -O graalvm.tar.gz https://download.oracle.com/graalvm/25/latest/graalvm-jdk-25_linux-x64_bin.tar.gz + wget -v --timeout=180 -O graalvm.tar.gz https://download.oracle.com/graalvm/25/latest/graalvm-jdk-25_linux-x64_bin.tar.gz mkdir graalvm tar xfz graalvm.tar.gz -C graalvm --strip-components=1 echo "JAVA_HOME=`pwd`/graalvm" >> "$GITHUB_ENV" - name: Generate Makefiles run: | cd ./core - # git submodule update --init cmake -DCMAKE_BUILD_TYPE=Release -B cmake-build-release -S. - name: Build linux-x86-64 CXX Library run: | @@ -127,6 +108,11 @@ jobs: mkdir -p src/main/resources/io/questdb/client/bin/linux-x86-64/ mkdir -p src/main/bin/linux-x86-64/ cp target/classes/io/questdb/client/bin-local/libquestdb.so src/main/resources/io/questdb/client/bin/linux-x86-64/ + - name: Assert GLIBC floor (2.14) + # Never commit a library whose glibc floor regressed above 2.14. 
+ run: | + bash ./.github/scripts/check-glibc-floor.sh \ + core/src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so 2.14 - name: Save linux-x86-64 Libraries to Cache uses: actions/cache/save@v3 with: @@ -162,6 +148,11 @@ jobs: mkdir -p src/main/resources/io/questdb/client/bin/linux-aarch64/ mkdir -p src/main/bin/linux-aarch64/ cp target/classes/io/questdb/client/bin-local/libquestdb.so src/main/resources/io/questdb/client/bin/linux-aarch64/ + - name: Assert GLIBC floor (2.17) + # 2.17 is the lowest floor glibc offers on aarch64. + run: | + bash ./.github/scripts/check-glibc-floor.sh \ + core/src/main/resources/io/questdb/client/bin/linux-aarch64/libquestdb.so 2.17 - name: Save linux-aarch64 Libraries to Cache uses: actions/cache/save@v3 with: diff --git a/.gitignore b/.gitignore index 9859a7c6..2a7284c6 100644 --- a/.gitignore +++ b/.gitignore @@ -29,4 +29,9 @@ core/CMakeCache.txt **/build **/CMakeFiles .envrc -.vscode \ No newline at end of file +.vscode +# Root-level Maven build output +/target + +# pi subagents runtime artifacts +.pi-subagents/ diff --git a/.pi/skills/review-pr/SKILL.md b/.pi/skills/review-pr/SKILL.md index 7a2767c6..0c210421 100644 --- a/.pi/skills/review-pr/SKILL.md +++ b/.pi/skills/review-pr/SKILL.md @@ -60,9 +60,18 @@ Capture the PR identifier in `$PR` (the part of `$ARGUMENTS` left after strippin PR='' gh pr view "$PR" --json number,title,body,labels,state gh pr diff "$PR" +gh pr diff "$PR" --numstat # binary files show as `--` gh pr view "$PR" --comments ``` +**Committed-binary gate (runs at every level).** Scan the `--numstat` output for +any added/modified file git reports as binary (`-`/`-` in the added/deleted +columns). This repo builds its native/C libraries from source in CI and does not +commit build outputs, so any such file is a **Critical** finding regardless of +review level — report it even at level 0. 
See the "Committed build artifacts" +checklist for the rationale and the acceptable-exception (genuine test-input +fixtures only). + ## Step 2: PR title and description Check against CLAUDE.md conventions: @@ -155,7 +164,7 @@ Launch the reviewers below with the `subagent` tool in `context: "fresh"` mode, Launch the following reviewers in parallel. -**Reviewer 1 — Correctness & bugs:** NULL handling, edge cases, logic errors, off-by-one, operator precedence, error paths. Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite. +**Reviewer 1 — Correctness & bugs:** NULL handling, edge cases, logic errors, off-by-one, operator precedence, error paths. Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite. When the diff touches the store-and-forward sender, the async drainer / send loop, primary reconnect/failover, or pool startup (`lazy_connect` / `initial_connect_retry` / `SenderPool` / `QueryClientPool`), also verify the "Store-and-forward & pool startup invariants" checklist — a running drainer that propagates a transport error to the caller, imposes a reconnect time budget, or hard-fails on a transient outage is a Critical (data-loss) finding. **Reviewer 2 — Concurrency:** Race conditions, shared mutable state, missing volatile, lock ordering, thread-safety of data structures. Use the implicit contract list (lock order, thread-affinity) and check every callsite from 2.5b for violations of the new contract. @@ -165,7 +174,7 @@ Launch the following reviewers in parallel. **Reviewer 5 — Test coverage:** Coverage gaps, error path tests, NULL tests, boundary conditions, regression tests exist, `assertMemoryLeak()` usage. Cross-reference 2.5d: every cross-context exposure should have a test that exercises the changed symbol from that context. Missing tests for cross-context callsites is a high-priority finding. 
Test *efficacy* (whether those tests actually exercise the change and could fail) and test-*code* quality are handled by Reviewers 11-13 — here focus only on whether coverage exists for every new or changed path. -**Reviewer 6 — Code quality & standards:** Code smell, member ordering, naming conventions, modern Java features, dead code, third-party dependencies. +**Reviewer 6 — Code quality & standards:** Code smell, member ordering, naming conventions, modern Java features, dead code, third-party dependencies. Also scan the diff for any committed compiled binary / build artifact (run `git diff --numstat`/`--stat` and flag files git reports as binary) — the native/C libraries are built from source in CI, so a committed binary is a **Critical** finding (see the "Committed build artifacts" checklist). **Reviewer 7 — PR metadata & conventions:** Title format, description quality, commit messages, labels, SQL style in tests. @@ -289,6 +298,26 @@ Review the diff for: - Code smell: overly complex methods, deep nesting, unclear intent, dead code - No third-party Java dependencies on data paths +### Committed build artifacts +- **A newly committed compiled binary is always Critical.** This repo builds its + native/C libraries from source in CI (`rebuild_native_libs.yml`, + `build_native.yaml`, guarded by `check-glibc-floor.sh`) and does not commit + build outputs. A binary added or modified in the diff cannot be reviewed, + audited, or reproduced from source, can smuggle in unaudited or malicious + code, and bloats the repo history irreversibly — so it blocks the merge. +- Detect it structurally, not by extension alone: run `git diff --stat` / + `git diff --numstat` on the PR and flag every added/modified file git reports + as binary (`numstat` shows `-`/`-` for added/deleted lines; `--stat` shows a + `Bin … -> … bytes` marker). Typical offenders: `.so`, `.dylib`, `.dll`, `.a`, + `.o`, `.lib`, `.exe`, `.class`, `.jar`, `.war`, `.wasm`, `.node`, `.bin`. 
+- The finding stands even when the binary "looks" legitimate (e.g. a rebuilt + `libquestdb.*`): the correct source of these artifacts is the CI native-build + pipeline plus release packaging, never a PR diff. The only acceptable binaries + are genuine test-input fixtures/resources (data a test reads), not build + outputs — and even those must be justified. +- Suggested fix: drop the binary from the PR, confirm a `.gitignore` entry + covers it, and let CI native-build + release packaging produce it. + ### QuestDB coding standards - Class members grouped by kind (static vs instance) and visibility, sorted alphabetically - Boolean names use `is...` / `has...` prefix @@ -299,6 +328,68 @@ Review the diff for: - try-with-resources used where applicable - Native memory freed correctly +### Store-and-forward & pool startup invariants (QWP facade) +Apply this whenever the diff touches the SF sender, the async drainer / send +loop, primary reconnect/failover, `SenderPool` / `QueryClientPool` startup, +`lazy_connect`, or `initial_connect_retry`. A violation here is a **Critical** +finding: the whole point of store-and-forward is that a running producer never +loses data and never hard-fails on a transient outage. + +**Drainer (steady state — once the pool is running).** +- Once the pool is running, an async drainer thread ships buffered SF data to + the server. It MUST NOT propagate server / transport errors back to the + client (`Sender` producer calls, `flush()`, the pooled handle). The ONLY + error a running drainer may surface to the caller is **SF out of space** (the + on-disk / backing buffer is full and can accept no more rows). Flag any other + failure class (connect-refused, DNS, unreachable/black-hole, TLS/cert, auth, + role-reject, upgrade/protocol timeout, reset) that can escape the drainer + onto a producer or borrow call. 
+- Primary reconnect MUST be fully contained inside the drainer thread and MUST + have **no time limit** — no `reconnect_max_duration_millis`-style budget, no + deadline, no "give up and latch terminal after N ms". A budget that latches + the sender terminal on a long outage is a Critical violation: it drops a + producer that store-and-forward promised to keep alive. Flag any bounded + reconnect loop, `deadlineNanos` / `while (now < deadline)`, or terminal + `SenderError` reachable from the running drainer's reconnect path. +- The drainer must retry with **exponential backoff** and handle every connect + failure class gracefully, without a hard fail — it keeps buffering and keeps + retrying until the wire is back. The per-attempt backoff may be capped (a max + delay between attempts), but the RETRY LOOP ITSELF must be unbounded. Flag a + capped total retry duration or an attempt-count cap on the steady-state + drainer. +- **Sanctioned terminals (orphan-slot drainer only).** The orphan drainer + (`BackgroundDrainer`) MAY quarantine its slot (`.failed` sentinel, + human-in-the-loop) on conditions that are terminal by design: auth failure, + a non-421 upgrade reject, and a genuine cluster-wide durable-ack capability + gap that exhausted its documented settle budget (16 consecutive + capability-gap sweeps, or a wall-clock budget anchored at the FIRST + capability-gap error of the episode — whichever is hit first). These are + NOT violations of the no-budget rule above. The settle budget applies ONLY + to consecutive capability-gap attempts: transient classes (role reject, + transport error) must never increment it or burn its wall clock — a + transient state consuming the terminal budget (shared attempt counter, + entry-anchored deadline) IS a Critical violation of this checklist. + +**Pool startup — two modes; the mode decides who sees connectivity errors.** +- `lazy_connect=true`: `build()` MUST succeed with **no server present**. 
The + producing `Sender` must work immediately (writes buffer via SF), and once the + server comes up the read side must also connect and read (reads are deferred, + not disabled). Verify `build()` does not fail-fast, the sender does not throw + on the first write while the server is down, and a later `borrowQuery()` + succeeds once the server is up. +- `lazy_connect=false` (default): `build()` / the initial connect MUST expose + connectivity problems to the caller — DNS errors, connect-refused / + unreachable, TLS/cert, authentication/authorization, and connect/upgrade + timeouts must all surface as a thrown exception at startup, not be swallowed. + Verify each of those failure classes reaches the user during initialization. +- **In BOTH modes the boundary is the same:** connectivity errors are only + ever the caller's problem DURING initialization. Once the client has + connected and is past initialization, the running drainer reverts to the + steady-state contract above — it must NEVER expose transport problems, NEVER + impose a reconnect time budget, and NEVER hard-fail on a transient outage. + Anything that undermines the store-and-forward guarantee past init is + Critical. + ### SQL conventions (if tests or SQL involved) - Keywords in UPPERCASE - `expr::TYPE` cast syntax preferred over CAST() @@ -351,7 +442,10 @@ Review the diff for: Present ONLY verified findings (false positives are excluded). Structure as: ### Critical -Issues that must be fixed before merge. Each must include: +Issues that must be fixed before merge. 
**A newly committed compiled binary or +other build artifact (see the "Committed build artifacts" checklist) is always +Critical, no matter how legitimate it looks — native/C libraries are built from +source in CI, so a binary in the diff is never acceptable.** Each must include: - Exact file path and line numbers (including out-of-diff files) - Whether the finding is **in-diff** or **out-of-diff** - Code path trace showing why the bug is real diff --git a/ci/build_native.yaml b/ci/build_native.yaml new file mode 100644 index 00000000..a831e58d --- /dev/null +++ b/ci/build_native.yaml @@ -0,0 +1,92 @@ +# Builds the native libquestdb shared library on the test runner itself. +# +# The Linux (.so) and Windows (.dll) binaries are no longer committed to the +# repository -- they are produced and committed only by the release +# "Build and Push Release CXX Libraries" GitHub Action. So the test CI has to +# compile them locally before running the tests. +# +# All three platforms are built on their own native runner: Linux (.so), +# Windows (.dll) and macOS (.dylib). None of these binaries are committed. +# +# CMake writes the artifact to: +# core/target/classes/io/questdb/client/bin-local/libquestdb. +# which io.questdb.client.std.Os loads first (the "dev CXX lib" path), so the +# client tests pick it up directly. We additionally copy it into +# core/src/main/resources/io/questdb/client/bin//libquestdb. +# so that `mvn install` packages it into the client jar exactly like the +# committed binary used to be -- this is what the downstream QuestDB OSS server +# tests load from the installed jar. 
+# +# JAVA_HOME (set to GraalVM JDK 25 by setup.yaml) provides jni.h / jni_md.h: +# - Linux: $JAVA_HOME/include + $JAVA_HOME/include/linux +# - macOS: $JAVA_HOME/include + $JAVA_HOME/include/darwin +# - Windows: %JAVA_HOME%\include + %JAVA_HOME%\include\win32 +steps: + - bash: | + set -eux + git submodule update --init --recursive core/src/main/c/share/zstd + displayName: "Init zstd submodule" + + - bash: | + set -eux + sudo apt-get update + sudo apt-get install -y cmake nasm build-essential + cd core + cmake -DCMAKE_BUILD_TYPE=Release -B cmake-build-release -S. + cmake --build cmake-build-release --config Release + lib="target/classes/io/questdb/client/bin-local/libquestdb.so" + test -f "$lib" + # Fail fast if the linker left an unresolved dependency in the .so. + if ldd "$lib" | grep -i "not found"; then + echo "libquestdb.so has unresolved dependencies" + exit 1 + fi + mkdir -p src/main/resources/io/questdb/client/bin/linux-x86-64 + cp "$lib" src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so + displayName: "Build native libquestdb.so (Linux x86-64)" + condition: eq(variables['Agent.OS'], 'Linux') + + - bash: | + set -eux + command -v cmake >/dev/null 2>&1 || brew install cmake + command -v nasm >/dev/null 2>&1 || brew install nasm + # darwin-aarch64 on Apple silicon agents, darwin-x86-64 on Intel agents. + case "$(uname -m)" in + arm64) platform="darwin-aarch64" ;; + x86_64) platform="darwin-x86-64" ;; + *) echo "unsupported macOS arch: $(uname -m)"; exit 1 ;; + esac + cd core + # Pin the dylib's minimum macOS version so the artifact stays loadable on + # older macOS, matching the release build. + export MACOSX_DEPLOYMENT_TARGET=13.0 + cmake -DCMAKE_BUILD_TYPE=Release -B cmake-build-release -S. 
+ cmake --build cmake-build-release --config Release + lib="target/classes/io/questdb/client/bin-local/libquestdb.dylib" + test -f "$lib" + mkdir -p "src/main/resources/io/questdb/client/bin/${platform}" + cp "$lib" "src/main/resources/io/questdb/client/bin/${platform}/libquestdb.dylib" + displayName: "Build native libquestdb.dylib (macOS)" + condition: eq(variables['Agent.OS'], 'Darwin') + + - powershell: | + $ErrorActionPreference = "Stop" + # The CMake build is GCC/MinGW based (gcc flags, -static-libgcc/-static-libstdc++), + # so build the Windows DLL with the MinGW-w64 toolchain + NASM, not MSVC. + choco install -y --no-progress nasm mingw + Import-Module "$env:ChocolateyInstall\helpers\chocolateyProfile.psm1" + refreshenv + # choco's nasm package does not put nasm on PATH; add it explicitly. + $env:PATH = "C:\Program Files\NASM;" + $env:PATH + gcc --version + mingw32-make --version + nasm --version + cd core + cmake -G "MinGW Makefiles" -DCMAKE_BUILD_TYPE=Release -B cmake-build-release -S . 
+ cmake --build cmake-build-release --config Release + $lib = "target/classes/io/questdb/client/bin-local/libquestdb.dll" + if (!(Test-Path $lib)) { throw "native build produced no $lib" } + New-Item -ItemType Directory -Force -Path "src/main/resources/io/questdb/client/bin/windows-x86-64" | Out-Null + Copy-Item $lib "src/main/resources/io/questdb/client/bin/windows-x86-64/libquestdb.dll" -Force + displayName: "Build native libquestdb.dll (Windows x86-64)" + condition: eq(variables['Agent.OS'], 'Windows_NT') diff --git a/ci/run_tests_pipeline.yaml b/ci/run_tests_pipeline.yaml index 3268313b..86d65410 100644 --- a/ci/run_tests_pipeline.yaml +++ b/ci/run_tests_pipeline.yaml @@ -54,10 +54,6 @@ stages: imageName: "macos-15-arm64" poolName: "Azure Pipelines" jdkArch: "arm64" - mac-x64: - imageName: "macos-15" - poolName: "Azure Pipelines" - jdkArch: "x64" windows-msvc-2022-x64: imageName: "windows-2022" poolName: "Azure Pipelines" @@ -82,6 +78,13 @@ stages: maven | "$(Agent.OS)" path: $(HOME)/.m2/repository displayName: "Cache Maven repository" + # Compile the native libquestdb shared library on the runner; no + # platform's binary is committed anymore. Must run before the client + # jar is installed so the freshly built lib is packaged into it. The + # template builds the right artifact for the current native agent -- + # Linux (.so), Windows (.dll), and macOS (.dylib) alike (see + # build_native.yaml). + - template: build_native.yaml - bash: | BRANCH="${SYSTEM_PULLREQUEST_SOURCEBRANCH:-$BUILD_SOURCEBRANCHNAME}" BRANCH="${BRANCH#refs/heads/}" @@ -149,6 +152,9 @@ stages: maven | "$(Agent.OS)" path: $(HOME)/.m2/repository displayName: "Cache Maven repository" + # Native binaries are no longer committed; compile libquestdb.so on the + # runner so the coverage test run can load it (same as BuildAndTest). 
+ - template: build_native.yaml - task: Maven@3 displayName: "Run tests with coverage" inputs: diff --git a/core/CMakeLists.txt b/core/CMakeLists.txt index 3538aa7f..29611089 100644 --- a/core/CMakeLists.txt +++ b/core/CMakeLists.txt @@ -48,6 +48,7 @@ set( src/main/c/share/files.h src/main/c/share/net.h src/main/c/share/os.h + src/main/c/share/glibc_compat.h src/main/c/share/ooo.cpp src/main/c/share/cpprt_overrides.h src/main/c/share/cpprt_overrides.cpp diff --git a/core/src/main/c/share/glibc_compat.h b/core/src/main/c/share/glibc_compat.h new file mode 100644 index 00000000..24ea6211 --- /dev/null +++ b/core/src/main/c/share/glibc_compat.h @@ -0,0 +1,53 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +#ifndef QUESTDB_GLIBC_COMPAT_H +#define QUESTDB_GLIBC_COMPAT_H + +// Pin clock_gettime() to its original GLIBC_2.2.5 symbol version. +// +// glibc 2.17 moved clock_gettime() out of librt and into libc, exporting it +// under a NEW version node: clock_gettime@GLIBC_2.17. 
The release binaries are +// built in a modern toolchain container (CI uses manylinux_2_28 / glibc 2.28), +// so without this pin the linker binds our calls to clock_gettime@GLIBC_2.17. +// That single symbol raises the whole library's glibc floor to 2.17 and makes +// it fail to LOAD on hosts running glibc 2.14-2.16 with: +// +// version `GLIBC_2.17' not found (required by libquestdb.so) +// +// The original clock_gettime@GLIBC_2.2.5 symbol is still exported as a compat +// symbol by librt.so.1 on every glibc since (and by libc after the 2.34 librt +// merge), so forcing the reference back to it keeps the library loadable down +// to the previous floor (glibc 2.14, set by memcpy@GLIBC_2.14) with no change +// in runtime behaviour. librt is already a NEEDED dependency (CMake links rt). +// +// Scope: x86-64 glibc only. aarch64 glibc started at 2.17 and has only ever +// shipped clock_gettime in libc@GLIBC_2.17 -- there is no GLIBC_2.2.5 version +// there, so emitting the pin on aarch64 would fail the link with an undefined +// clock_gettime@GLIBC_2.2.5. The directive is a no-op on macOS/Windows. +#if defined(__linux__) && defined(__GLIBC__) && defined(__x86_64__) +__asm__(".symver clock_gettime,clock_gettime@GLIBC_2.2.5"); +#endif + +#endif // QUESTDB_GLIBC_COMPAT_H diff --git a/core/src/main/c/share/net.c b/core/src/main/c/share/net.c index 05660f2b..3b0162fc 100644 --- a/core/src/main/c/share/net.c +++ b/core/src/main/c/share/net.c @@ -33,6 +33,9 @@ #include #include #include +#include +#include +#include "glibc_compat.h" #include "net.h" #include #include "sysutil.h" @@ -298,6 +301,100 @@ JNIEXPORT jint JNICALL Java_io_questdb_client_network_Net_connectAddrInfo return handleEintrInConnect(fd, result); } +// Waits up to timeout_millis for an in-progress non-blocking connect on fd to +// finish. Returns 0 on success, -1 on connection failure (errno set so the +// caller can read it via Os.errno()), or com_questdb_network_Net_ECONNTIMEOUT +// on timeout. 
+static jint awaitConnectComplete(int fd, jint timeout_millis) { + // Fix a single absolute deadline up front. Recomputing the remaining budget + // against a moving baseline on each EINTR (reset start = now, then subtract + // whole milliseconds) lets a high-frequency signal storm extend the timeout: + // under sub-millisecond interrupts every interval truncates to 0 ms, the + // budget never decrements, and poll is re-armed with the full budget each + // time. A fixed deadline is immune to interrupt frequency -- the remaining + // time can only ever decrease. + struct timespec deadline; + clock_gettime(CLOCK_MONOTONIC, &deadline); + long budget_millis = timeout_millis > 0 ? timeout_millis : 0; + deadline.tv_sec += budget_millis / 1000L; + deadline.tv_nsec += (budget_millis % 1000L) * 1000000L; + if (deadline.tv_nsec >= 1000000000L) { + deadline.tv_sec += 1; + deadline.tv_nsec -= 1000000000L; + } + + for (;;) { + struct timespec now; + clock_gettime(CLOCK_MONOTONIC, &now); + // Remaining time until the deadline, truncated to whole milliseconds for + // poll(). Truncation only ever under-shoots by < 1 ms (it never extends + // the wait), which keeps the timeout a strict upper bound. + long long remaining_nanos = (deadline.tv_sec - now.tv_sec) * 1000000000LL + (deadline.tv_nsec - now.tv_nsec); + long remaining_millis = (long) (remaining_nanos / 1000000LL); + if (remaining_millis <= 0) { + errno = ETIMEDOUT; + return com_questdb_network_Net_ECONNTIMEOUT; + } + + struct pollfd pfd; + pfd.fd = fd; + pfd.events = POLLOUT; + pfd.revents = 0; + + int rc = poll(&pfd, 1, (int) remaining_millis); + if (rc > 0) { + // The connect attempt has finished one way or another; the only + // authoritative result is SO_ERROR (POLLOUT alone does not mean + // success -- a refused connection is also reported as writable).
+ int so_error = 0; + socklen_t len = sizeof(so_error); + if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0) { + return -1; + } + if (so_error != 0) { + errno = so_error; + return -1; + } + return 0; + } + if (rc == 0) { + errno = ETIMEDOUT; + return com_questdb_network_Net_ECONNTIMEOUT; + } + if (errno != EINTR) { + return -1; + } + // Interrupted by a signal: loop and recompute the remaining time against + // the fixed deadline. EINTR storms cannot extend the timeout. + } +} + +JNIEXPORT jint JNICALL Java_io_questdb_client_network_Net_connectAddrInfoTimeout + (JNIEnv *e, jclass cl, jint fd, jlong lpAddrInfo, jint timeoutMillis) { + struct addrinfo *addr = (struct addrinfo *) lpAddrInfo; + + // Switch to non-blocking BEFORE connect so connect() returns immediately + // with EINPROGRESS instead of blocking on the OS connect timeout. The + // socket is left non-blocking on success, matching the post-connect + // configureNonBlocking() the callers already perform. + int flags = fcntl((int) fd, F_GETFL, 0); + if (flags < 0) { + return -1; + } + if (fcntl((int) fd, F_SETFL, flags | O_NONBLOCK) < 0) { + return -1; + } + + int result = connect((int) fd, addr->ai_addr, (int) addr->ai_addrlen); + if (result == 0) { + return 0; // connected immediately (e.g. 
loopback) + } + if (errno == EINPROGRESS || errno == EINTR || errno == EWOULDBLOCK) { + return awaitConnectComplete((int) fd, timeoutMillis); + } + return -1; // immediate failure, errno set +} + JNIEXPORT void JNICALL Java_io_questdb_client_network_Net_freeAddrInfo0 (JNIEnv *e, jclass cl, jlong address) { if (address != 0) { diff --git a/core/src/main/c/share/net.h b/core/src/main/c/share/net.h index 13adafcb..27143639 100644 --- a/core/src/main/c/share/net.h +++ b/core/src/main/c/share/net.h @@ -13,6 +13,8 @@ extern "C" { #define com_questdb_network_Net_EPEERDISCONNECT -1L #undef com_questdb_network_Net_EOTHERDISCONNECT #define com_questdb_network_Net_EOTHERDISCONNECT -2L +#undef com_questdb_network_Net_ECONNTIMEOUT +#define com_questdb_network_Net_ECONNTIMEOUT -3L /* * Class: io_questdb_client_network_Net diff --git a/core/src/main/c/share/os.c b/core/src/main/c/share/os.c index 7262e3f4..ee0b1f69 100644 --- a/core/src/main/c/share/os.c +++ b/core/src/main/c/share/os.c @@ -30,6 +30,7 @@ #include #include #include +#include "glibc_compat.h" #include "../share/os.h" #ifdef __APPLE__ diff --git a/core/src/main/c/windows/net.c b/core/src/main/c/windows/net.c index c32957d4..fd290629 100644 --- a/core/src/main/c/windows/net.c +++ b/core/src/main/c/windows/net.c @@ -160,6 +160,66 @@ JNIEXPORT jint JNICALL Java_io_questdb_client_network_Net_connectAddrInfo return res; } +JNIEXPORT jint JNICALL Java_io_questdb_client_network_Net_connectAddrInfoTimeout + (JNIEnv *e, jclass cl, jint fd, jlong lpAddrInfo, jint timeoutMillis) { + struct addrinfo *addr = (struct addrinfo *) lpAddrInfo; + SOCKET s = (SOCKET) fd; + + // Switch to non-blocking BEFORE connect so it returns immediately with + // WSAEWOULDBLOCK instead of blocking on the OS connect timeout. + u_long mode = 1; + if (ioctlsocket(s, FIONBIO, &mode) != 0) { + SaveLastError(); + return -1; + } + + int res = connect(s, addr->ai_addr, (int) addr->ai_addrlen); + if (res == 0) { + return 0; // connected immediately (e.g. 
loopback) + } + if (WSAGetLastError() != WSAEWOULDBLOCK) { + SaveLastError(); + return -1; + } + + fd_set writefds, exceptfds; + FD_ZERO(&writefds); + FD_ZERO(&exceptfds); + FD_SET(s, &writefds); + FD_SET(s, &exceptfds); + + struct timeval tv; + tv.tv_sec = timeoutMillis / 1000; + tv.tv_usec = (timeoutMillis % 1000) * 1000; + + // Winsock signals a failed non-blocking connect via the exception set. + int sel = select(0, NULL, &writefds, &exceptfds, &tv); + if (sel == 0) { + WSASetLastError(WSAETIMEDOUT); + SaveLastError(); + return com_questdb_network_Net_ECONNTIMEOUT; + } + if (sel == SOCKET_ERROR) { + SaveLastError(); + return -1; + } + + int so_error = 0; + int len = sizeof(so_error); + if (FD_ISSET(s, &exceptfds) || !FD_ISSET(s, &writefds)) { + getsockopt(s, SOL_SOCKET, SO_ERROR, (char *) &so_error, &len); + WSASetLastError(so_error != 0 ? so_error : WSAECONNREFUSED); + SaveLastError(); + return -1; + } + if (getsockopt(s, SOL_SOCKET, SO_ERROR, (char *) &so_error, &len) == 0 && so_error != 0) { + WSASetLastError(so_error); + SaveLastError(); + return -1; + } + return 0; +} + JNIEXPORT jint JNICALL Java_io_questdb_client_network_Net_configureNonBlocking (JNIEnv *e, jclass cl, jint fd) { u_long mode = 1; diff --git a/core/src/main/java/io/questdb/client/Completion.java b/core/src/main/java/io/questdb/client/Completion.java index 0888370d..615799e0 100644 --- a/core/src/main/java/io/questdb/client/Completion.java +++ b/core/src/main/java/io/questdb/client/Completion.java @@ -36,15 +36,22 @@ * {@link #await(long, TimeUnit)} returning {@code true}, or an explicit * {@link #cancel()} that races to terminal). *
<p>
- * Signaling: the Completion is signaled from the I/O thread of the pooled - * query client when the handler's terminal callback ({@code onEnd}, - * {@code onError}, or {@code onExecDone}) returns. + * Signaling: the Completion is signaled on the worker (dispatch) thread of the + * pooled query client when the handler's terminal callback ({@code onEnd}, + * {@code onError}, or {@code onExecDone}) returns -- that callback runs inline + * on the worker thread, not on the I/O thread. Because of this, {@code await()} + * must never be called from inside a handler (it would self-deadlock on the + * worker thread); use {@link #cancel()} to stop a query from inside a handler. */ public interface Completion { /** * Blocks until the query completes. Rethrows any server-reported failure * as a {@link QueryException}. Returns normally on success. + *
<p>
+ * Must NOT be called from a result handler (it runs on the worker thread + * and would self-deadlock); calling it there throws + * {@link IllegalStateException}. Use {@link #cancel()} instead. * * @throws QueryException if the server reported an error or * {@link #cancel()} won the race diff --git a/core/src/main/java/io/questdb/client/HttpClientConfiguration.java b/core/src/main/java/io/questdb/client/HttpClientConfiguration.java index c644f698..587b8111 100644 --- a/core/src/main/java/io/questdb/client/HttpClientConfiguration.java +++ b/core/src/main/java/io/questdb/client/HttpClientConfiguration.java @@ -38,6 +38,15 @@ default boolean fixBrokenConnection() { return true; } + /** + * Upper bound, in milliseconds, on establishing the TCP connection. When + * {@code <= 0} (the default) no application-level connect timeout is applied + * and the connect falls back to the OS-level TCP connect timeout. + */ + default int getConnectTimeout() { + return 0; + } + default EpollFacade getEpollFacade() { return EpollFacadeImpl.INSTANCE; } diff --git a/core/src/main/java/io/questdb/client/Query.java b/core/src/main/java/io/questdb/client/Query.java index f6832e84..c2a752f7 100644 --- a/core/src/main/java/io/questdb/client/Query.java +++ b/core/src/main/java/io/questdb/client/Query.java @@ -27,19 +27,29 @@ import io.questdb.client.cutlass.qwp.client.QwpBindSetter; import io.questdb.client.cutlass.qwp.client.QwpColumnBatchHandler; +import java.io.Closeable; + /** - * Per-thread, reusable builder for one query. Obtained from - * {@link QuestDB#query()}: every call on the same thread returns the same - * instance, reset to empty. + * A query handle leased from the {@link QuestDB} pool via + * {@link QuestDB#borrowQuery()}. The handle holds one pooled query client (one + * WebSocket + I/O thread) for the lifetime of the borrow; the caller MUST + * {@link #close()} it to release the client back to the pool (typically via + * try-with-resources). + *
<p>
+ * Allocation: the per-submit path is allocation-free -- the heavy query state + * is pre-allocated on the leased pool slot and reused, and {@link #submit()} + * returns this same handle as its {@link Completion}. {@code borrowQuery()} + * creates one small lease handle per borrow (often scalar-replaced by the JIT + * when used with try-with-resources). *
<p>
* Lifecycle: configure with {@link #sql}, optional {@link #binds}, and - * {@link #handler}, then call {@link #submit()} to obtain a {@link Completion}. - * After the Completion terminates, the next {@code QuestDB.query()} call on - * the same thread returns this same instance with its state reset. + * {@link #handler}, then call {@link #submit()} to obtain a {@link Completion} + * and {@code await()} it before the next {@link #submit()}. *
<p>
- * Thread safety: not thread-safe. One in-flight query per thread. + * Thread safety: not thread-safe and single-flight -- one in-flight query per + * handle. To run queries concurrently, borrow one handle per concurrent query. */ -public interface Query { +public interface Query extends Closeable { /** Discards the current configuration without submitting. */ void abandon(); @@ -53,9 +63,39 @@ public interface Query { Query binds(QwpBindSetter binds); /** - * Sets the result-batch handler. The handler is invoked on the pooled - * query client's I/O thread; if it touches caller state, it is - * responsible for its own synchronization. + * Releases the leased pooled query client back to the pool. The caller + * MUST call this (typically via try-with-resources). A real disconnect only + * happens at {@link QuestDB#close()}. Idempotent. + *
<p>
+ * If a submit is still in flight (the caller never awaited, or its + * {@code await(timeout)} expired), {@code close()} cancels it and waits for + * the terminal event so the client is idle before it returns to the pool. + * That wait is bounded by {@code query_close_timeout_ms} (default 5000ms, + * see {@link QuestDBBuilder#queryCloseTimeoutMillis(long)}) and is + * interruptible -- interrupting the calling thread aborts it. If the query + * does not drain within the budget, the client is discarded rather than + * returned (its connection may carry late frames for the abandoned query), + * and the pool grows a fresh one on the next borrow. {@code close()} + * therefore never blocks the caller unbounded, even when the server is slow + * to honor the cancel. + *
<p>
+ * Must NOT be called from a result handler: handlers run on the worker + * thread, so {@code close()} would block waiting for a terminal event that + * only that thread can deliver. Calling it there throws + * {@link IllegalStateException}. Use {@link #cancel()} (non-blocking) to + * stop a query from inside a handler. + */ + @Override + void close(); + + /** + * Sets the result-batch handler. The handler is invoked on the worker + * (dispatch) thread that drives {@code execute()} -- it consumes the pooled + * query client's I/O-thread event queue inline, it does NOT run on the I/O + * thread. If it touches caller state, it is responsible for its own + * synchronization. A handler must not call the blocking {@link #close()} or + * {@link Completion#await()} (they would self-deadlock on the worker + * thread); use {@link #cancel()} to stop from inside a handler. */ Query handler(QwpColumnBatchHandler handler); @@ -65,11 +105,12 @@ public interface Query { Query sql(CharSequence sql); /** - * Submits the query for execution. Returns the {@link Completion} field - * cached on this instance; never allocates. Blocks up to the builder's - * configured acquire timeout if the query pool is exhausted. + * Submits the query for execution on the leased client. Returns this handle + * as its own {@link Completion}; never allocates. The handle is + * single-flight: {@code await()} the returned Completion before the next + * {@code submit()}. 
* - * @return the single-flight Completion bound to this Query instance + * @return the single-flight Completion bound to this Query handle */ Completion submit(); } diff --git a/core/src/main/java/io/questdb/client/QuestDB.java b/core/src/main/java/io/questdb/client/QuestDB.java index a608e12f..ee93afcf 100644 --- a/core/src/main/java/io/questdb/client/QuestDB.java +++ b/core/src/main/java/io/questdb/client/QuestDB.java @@ -24,8 +24,6 @@ package io.questdb.client; -import io.questdb.client.cutlass.qwp.client.QwpColumnBatchHandler; - import java.io.Closeable; /** @@ -34,37 +32,42 @@ * share across threads. *
<p>
* Steady-state allocation is zero: pooled instances are pre-allocated and - * reused, the per-thread {@link Query} handle is cached in a {@code ThreadLocal}, - * and the {@link Completion} associated with each query is a field on that - * cached handle. + * reused, each borrowed {@link Query} handle is a pre-allocated front bound to + * its pool slot, and the {@link Completion} associated with each query is a + * field on that handle. *
<p>
- * Configuration: use {@link #connect(CharSequence)} when the same address list - * and credentials serve both ingest and egress -- the most common case. - * Use {@link #connect(CharSequence, CharSequence)} or {@link #builder()} when - * ingest and egress endpoints differ. + * Configuration: one {@code ws}/{@code wss} string describes the whole cluster + * (a single {@code addr} server list) and both the ingest and query pools + * connect across it. Use {@link #connect(CharSequence)} for the common case, or + * {@link #builder()} for pool sizing and the ingest callbacks. To tolerate the + * server being down at startup, set {@code lazy_connect=true} in the config + * (async ingest + lazy reads; reads stay enabled and connect once the server + * is up). *
<p>
* Thread safety: instances are safe to share. {@link #borrowSender()} and - * {@link #query()} may be called concurrently from any thread; the pool + * {@link #borrowQuery()} may be called concurrently from any thread; the pool * guarantees mutual exclusion of pooled resources. */ public interface QuestDB extends Closeable { /** * Builder for advanced configuration (pool sizes, acquisition timeouts, - * differing ingest/egress configs). + * ingest callbacks). */ static QuestDBBuilder builder() { return new QuestDBBuilder(); } /** - * Connects with a single configuration string used for both ingest and - * egress. The schema must be {@code ws} or {@code wss}: QuestDB ingests and - * queries over QWP (the QuestDB WebSocket protocol), so one string - * configures both clients. + * Connects with a single configuration string for the whole QuestDB cluster, + * used for both ingest and egress. The schema must be {@code ws} or + * {@code wss}: QuestDB ingests and queries over QWP (the QuestDB WebSocket + * protocol), so one string configures both clients. List every cluster node + * in a single {@code addr} server list and both pools connect across it. *
<p>
- * Use {@link #connect(CharSequence, CharSequence)} or {@link #builder()} - * when ingest and egress use different addresses or credentials. + * Use {@link #builder()} for pool sizing and the ingest callbacks. To + * tolerate the server being down at startup, set {@code lazy_connect=true} + * in the config (async ingest + lazy reads, reads still enabled). * * @param configurationString a {@code ws}/{@code wss} config string (see * {@link Sender#fromConfig} or @@ -76,20 +79,29 @@ static QuestDB connect(CharSequence configurationString) { } /** - * Connects with explicit ingest and egress configuration strings. + * Borrows a {@link Query} handle from the pool. The caller MUST call + * {@link Query#close()} on the returned instance to release it back to the + * pool (typically via try-with-resources). The handle leases one pooled + * query client (one WebSocket + I/O thread) for the borrow's lifetime; + * submit one or more queries on it, then close it. + *
<p>
+ * Allocation: zero at steady state -- the returned instance is a + * pre-allocated handle bound to the leased pool slot. + *
<p>
+ * Blocking: blocks up to the builder's + * {@link QuestDBBuilder#acquireTimeoutMillis(long) acquire timeout} when + * the pool is exhausted; throws on timeout. + *
<p>
+ * Concurrency: a single handle is single-flight. To run queries + * concurrently, borrow one handle per concurrent query (up to + * {@code query_pool_max}). * - * @param ingestConfigurationString config for the {@link Sender} pool - * ({@link Sender#fromConfig} format) - * @param queryConfigurationString config for the query pool - * ({@link io.questdb.client.cutlass.qwp.client.QwpQueryClient#fromConfig} format) - * @return a connected QuestDB handle + * @return a Query handle leased from the pool; release with + * {@link Query#close()} + * @throws QueryException if the pool is exhausted beyond the acquire + * timeout, or if this handle is closed */ - static QuestDB connect(CharSequence ingestConfigurationString, CharSequence queryConfigurationString) { - return builder() - .ingestConfig(ingestConfigurationString) - .queryConfig(queryConfigurationString) - .build(); - } + Query borrowQuery(); /** * Borrows a {@link Sender} from the pool. The caller MUST call @@ -125,61 +137,4 @@ static QuestDB connect(CharSequence ingestConfigurationString, CharSequence quer */ @Override void close(); - - /** - * One-shot convenience for queries with no bind parameters. Equivalent to - * {@code query().sql(sql).handler(handler).submit()}. Returns the same - * thread-local {@link Completion} instance that {@link #query()} would, - * so this method is also zero-allocation at steady state. - * - * @param sql the SQL text; the buffer is not retained after submit - * @param handler the result-batch handler; invoked on the pooled query - * client's I/O thread - * @return a single-flight handle for the in-flight query - */ - Completion executeSql(CharSequence sql, QwpColumnBatchHandler handler); - - /** - * Allocates a fresh {@link Query} handle. Unlike {@link #query()}, this - * does NOT return the per-thread cached instance; every call allocates. - *

- * Use this when one thread needs to hold multiple in-flight queries - * concurrently (each {@code submit()} acquires its own worker from the - * query pool, so up to {@code queryPoolSize} concurrent queries on a - * single thread is fine). For the common case of one query at a time, - * prefer {@link #query()} -- it is allocation-free. - */ - Query newQuery(); - - /** - * Opens a query builder for the calling thread. Returns the same - * thread-local instance on every call: callers do not need to cache it - * themselves. The returned {@code Query} is in a reset state and is not - * thread-safe -- one in-flight query per thread. - *

- * For multiple concurrent in-flight queries from a single thread, use - * {@link #newQuery()} instead. - */ - Query query(); - - /** - * Releases the thread-affine {@link Sender} (if any) currently attached - * to the calling thread back to the pool. Call this on threads borrowed - * from pools you do not own (for example, Netty event loops) before they - * are recycled, to avoid pinning a {@link Sender} for the lifetime of - * a thread that no longer needs it. - */ - void releaseSender(); - - /** - * Returns a {@link Sender} pinned to the calling thread. First call on - * a thread takes one from the pool and pins it; subsequent calls on the - * same thread return the same instance. The pin is released by - * {@link #releaseSender()} or by {@link #close()} on this handle. - *

- * Use this for long-lived, dedicated producer threads where borrow/return - * overhead would dominate. For short-lived or event-loop callers, prefer - * {@link #borrowSender()}. - */ - Sender sender(); } diff --git a/core/src/main/java/io/questdb/client/QuestDBBuilder.java b/core/src/main/java/io/questdb/client/QuestDBBuilder.java index cae00942..be18bfbe 100644 --- a/core/src/main/java/io/questdb/client/QuestDBBuilder.java +++ b/core/src/main/java/io/questdb/client/QuestDBBuilder.java @@ -25,6 +25,7 @@ package io.questdb.client; import io.questdb.client.cutlass.qwp.client.QwpQueryClient; +import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener; import io.questdb.client.impl.ConfigString; import io.questdb.client.impl.ConfigView; import io.questdb.client.impl.QuestDBImpl; @@ -35,14 +36,20 @@ /** * Builder for {@link QuestDB}. Most callers use {@link QuestDB#connect(CharSequence)}; - * this builder is for pool sizing, idle/lifetime knobs, acquire timeout, - * and the case where ingest and egress configs differ. + * this builder adds pool sizing, idle/lifetime knobs, the acquire timeout, and + * the ingest callbacks. *

- * Both configs must use the {@code ws} or {@code wss} schema (QWP over - * WebSocket). A pool key (e.g. {@code sender_pool_min}) may be carried in the - * connect string or set with an explicit builder call; an explicit call always - * wins. When both connect strings carry the same pool key with different values, - * {@link #build()} fails. + * To tolerate the server being down at startup, set {@code lazy_connect=true} + * in the config: the ingest side connects asynchronously (writes buffer until + * the wire is up) and the read pool connects lazily on first use. Reads stay + * fully enabled -- they just connect once the server is available. + *

+ * One configuration string describes the whole QuestDB cluster (see + * {@link #fromConfig}): list every node in a single {@code addr} server list and + * both the ingest and query pools connect across it. The schema must be + * {@code ws} or {@code wss} (QWP over WebSocket). A pool key (e.g. + * {@code sender_pool_min}) may be carried in the connect string or set with an + * explicit builder call; an explicit call always wins. */ public final class QuestDBBuilder { @@ -52,6 +59,7 @@ public final class QuestDBBuilder { static final long DEFAULT_MAX_LIFETIME_MILLIS = 30 * 60_000L; static final int DEFAULT_POOL_MAX = 4; static final int DEFAULT_POOL_MIN = 1; + static final long DEFAULT_QUERY_CLOSE_TIMEOUT_MILLIS = 5_000; // Every valid pool value is >= 0, so -1 unambiguously marks "not set // explicitly". The public pool setters are the only writers of these @@ -59,11 +67,16 @@ public final class QuestDBBuilder { private static final int UNSET = -1; private long acquireTimeoutMillis = UNSET; + // Optional ingest-side async callbacks. Null -> each pooled Sender uses its + // loud-not-silent default. Applied to every Sender the pool builds. + private SenderConnectionListener connectionListener; + private BackgroundDrainerListener drainerListener; + private SenderErrorHandler errorHandler; private long housekeeperIntervalMillis = UNSET; + private String config; private long idleTimeoutMillis = UNSET; - private String ingestConfig; private long maxLifetimeMillis = UNSET; - private String queryConfig; + private long queryCloseTimeoutMillis = UNSET; private int queryPoolMax = UNSET; private int queryPoolMin = UNSET; private int senderPoolMax = UNSET; @@ -85,6 +98,73 @@ public QuestDBBuilder acquireTimeoutMillis(long millis) { return this; } + /** + * Maximum time {@link Query#close()} waits for an in-flight query to drain + * (after issuing a cancel) before discarding the leased query client and + * letting the pool grow a fresh one. 
Bounds the close of a handle whose + * {@code submit()} is still running -- e.g. when the caller's own + * {@code await(timeout)} expired and they gave up. Defaults to 5000ms. + */ + public QuestDBBuilder queryCloseTimeoutMillis(long millis) { + if (millis < 0) { + throw new IllegalArgumentException("queryCloseTimeoutMillis must be >= 0"); + } + this.queryCloseTimeoutMillis = millis; + return this; + } + + /** + * Sets the async connection-event listener applied to every pooled ingest + * {@link Sender}. The listener observes connect / disconnect / failover + * transitions across the whole sender pool; events are delivered on the + * senders' I/O threads, so the listener must be thread-safe and must not + * block. Pass {@code null} (the default) to keep each sender's + * loud-not-silent default listener. + * + * @param listener the shared connection listener, or {@code null} for the default + * @return this instance for method chaining + */ + public QuestDBBuilder connectionListener(SenderConnectionListener listener) { + this.connectionListener = listener; + return this; + } + + /** + * Sets the background orphan-slot drainer listener applied to every pooled + * ingest {@link Sender}. The listener observes the background drainer + * events of every sender the pool builds: durable-ack capability-gap + * retries, transient all-replica failover windows, and the eventual + * escalation to a {@code .failed} sentinel. Events are delivered on the + * drainers' own threads, so the listener must be thread-safe and must not + * block. Only meaningful when the configuration enables + * {@code drain_orphans}. Pass {@code null} (the default) to keep the + * drainers' default (no listener). 
+ * + * @param listener the shared drainer listener, or {@code null} for the default + * @return this instance for method chaining + */ + public QuestDBBuilder drainerListener(BackgroundDrainerListener listener) { + this.drainerListener = listener; + return this; + } + + /** + * Sets the async error handler applied to every pooled ingest + * {@link Sender}. The handler receives terminal/async ingest errors + * (terminal upgrade failures, write errors) + * from across the whole sender pool; notifications are delivered on the + * senders' I/O threads, so the handler must be thread-safe and must not + * block. Pass {@code null} (the default) to keep each sender's + * loud-not-silent default handler. + * + * @param handler the shared error handler, or {@code null} for the default + * @return this instance for method chaining + */ + public QuestDBBuilder errorHandler(SenderErrorHandler handler) { + this.errorHandler = handler; + return this; + } + /** * Builds the {@link QuestDB} handle. Validates both connect strings up * front -- so a malformed config fails here even when both pools have @@ -101,39 +181,45 @@ public QuestDBBuilder acquireTimeoutMillis(long millis) { * and is delivered once the server acks; until then it stays preserved. 
*/ public QuestDB build() { - if (ingestConfig == null) { - throw new IllegalStateException("ingest configuration is required; call fromConfig() or ingestConfig()"); + if (config == null) { + throw new IllegalStateException("configuration is required; call fromConfig()"); } - if (queryConfig == null) { - throw new IllegalStateException("query configuration is required; call fromConfig() or queryConfig()"); + ConfigString cs = ConfigString.parse(config); + ConfigView view = new ConfigView(cs); + // Validate the single cluster config exactly as both pools will, but + // without connecting: the full Sender parse plus validateParameters + // (ingress value keys are registry-STRING, so only the real parse + // validates their values), then the typed egress validateConfig. Each + // side applies the keys it owns and silently ignores the rest, so one + // string drives both. A malformed config therefore fails here even when + // a pool min is 0 and nothing connects. + Sender.LineSenderBuilder.validateWsConfigString(config); + QwpQueryClient.validateConfig(view, "wss".equals(cs.schema())); + + // lazy_connect: tolerate a down server at startup without disabling + // reads. The ingest side connects asynchronously (writes buffer until the + // wire is up) and the read pool defaults to min=0 -- it connects lazily + // on the first query once the server is up. Reads stay enabled. + boolean lazyConnect = view.getBool("lazy_connect", false); + String ingestConfig = config; + if (lazyConnect) { + ingestConfig = resolveLazyConnect(view); } - ConfigString ingestCs = ConfigString.parse(ingestConfig); - ConfigString queryCs = ConfigString.parse(queryConfig); - ConfigView ingestView = new ConfigView(ingestCs); - ConfigView queryView = new ConfigView(queryCs); - // Validate both connect strings exactly as the pools will, but without - // connecting. 
The ingest string runs the full Sender parse plus - // validateParameters -- ingress value keys are registry-STRING, so only - // the real parse validates their values. The egress string runs the - // typed validateConfig. A malformed config therefore fails here even - // when a pool min is 0 and nothing connects. - Sender.LineSenderBuilder.validateWsConfigString(ingestConfig); - QwpQueryClient.validateConfig(queryView, "wss".equals(queryCs.schema())); - - // A view carries no side; getInt/getLong read any key, so the ingest - // and query views also serve the POOL reads. - resolvePoolInt(senderPoolMin, "sender_pool_min", ingestView, queryView, DEFAULT_POOL_MIN, this::senderPoolMin); - resolvePoolInt(senderPoolMax, "sender_pool_max", ingestView, queryView, DEFAULT_POOL_MAX, this::senderPoolMax); - resolvePoolInt(queryPoolMin, "query_pool_min", ingestView, queryView, DEFAULT_POOL_MIN, this::queryPoolMin); - resolvePoolInt(queryPoolMax, "query_pool_max", ingestView, queryView, DEFAULT_POOL_MAX, this::queryPoolMax); - resolvePoolLong(acquireTimeoutMillis, "acquire_timeout_ms", ingestView, queryView, DEFAULT_ACQUIRE_TIMEOUT_MILLIS, this::acquireTimeoutMillis); - resolvePoolLong(idleTimeoutMillis, "idle_timeout_ms", ingestView, queryView, DEFAULT_IDLE_TIMEOUT_MILLIS, this::idleTimeoutMillis); - resolvePoolLong(maxLifetimeMillis, "max_lifetime_ms", ingestView, queryView, DEFAULT_MAX_LIFETIME_MILLIS, this::maxLifetimeMillis); - resolvePoolLong(housekeeperIntervalMillis, "housekeeper_interval_ms", ingestView, queryView, DEFAULT_HOUSEKEEPER_INTERVAL_MILLIS, this::housekeeperIntervalMillis); + + resolvePoolInt(senderPoolMin, "sender_pool_min", view, DEFAULT_POOL_MIN, this::senderPoolMin); + resolvePoolInt(senderPoolMax, "sender_pool_max", view, DEFAULT_POOL_MAX, this::senderPoolMax); + // lazy_connect makes the read pool lazy (min=0); without it the default min is 1. + resolvePoolInt(queryPoolMin, "query_pool_min", view, lazyConnect ? 
0 : DEFAULT_POOL_MIN, this::queryPoolMin); + resolvePoolInt(queryPoolMax, "query_pool_max", view, DEFAULT_POOL_MAX, this::queryPoolMax); + resolvePoolLong(acquireTimeoutMillis, "acquire_timeout_ms", view, DEFAULT_ACQUIRE_TIMEOUT_MILLIS, this::acquireTimeoutMillis); + resolvePoolLong(queryCloseTimeoutMillis, "query_close_timeout_ms", view, DEFAULT_QUERY_CLOSE_TIMEOUT_MILLIS, this::queryCloseTimeoutMillis); + resolvePoolLong(idleTimeoutMillis, "idle_timeout_ms", view, DEFAULT_IDLE_TIMEOUT_MILLIS, this::idleTimeoutMillis); + resolvePoolLong(maxLifetimeMillis, "max_lifetime_ms", view, DEFAULT_MAX_LIFETIME_MILLIS, this::maxLifetimeMillis); + resolvePoolLong(housekeeperIntervalMillis, "housekeeper_interval_ms", view, DEFAULT_HOUSEKEEPER_INTERVAL_MILLIS, this::housekeeperIntervalMillis); return new QuestDBImpl( ingestConfig, - queryConfig, + config, senderPoolMin, senderPoolMax, queryPoolMin, @@ -141,19 +227,63 @@ public QuestDB build() { acquireTimeoutMillis, idleTimeoutMillis, maxLifetimeMillis, - housekeeperIntervalMillis + housekeeperIntervalMillis, + queryCloseTimeoutMillis, + errorHandler, + connectionListener, + drainerListener ); } + // Validates the lazy_connect contract and returns the ingest config to use: + // the original string with a non-blocking async initial connect injected + // when the user did not set one. lazy_connect requires BOTH sides to start + // non-blocking, so an explicit knob that forces a blocking / fail-fast + // startup is a configuration conflict and is rejected with a clear remedy. + private String resolveLazyConnect(ConfigView view) { + // (1) ingest side: only initial_connect_retry=async is non-blocking; + // off/false/on/true/sync all block or fail-fast at startup. 
+ String mode = view.getStr("initial_connect_retry"); + if (mode != null && !"async".equalsIgnoreCase(mode)) { + throw new IllegalArgumentException( + "conflicting configuration: lazy_connect=true needs a non-blocking startup, but " + + "initial_connect_retry=" + mode + " makes the initial connect block / fail-fast. " + + "Resolve by removing initial_connect_retry (lazy_connect implies " + + "initial_connect_retry=async) or setting initial_connect_retry=async."); + } + // (2) read side: lazy_connect requires query_pool_min=0 so the read pool + // does not eagerly fail-fast at startup. An explicit query_pool_min > 0 + // (builder call or connect string) contradicts that. + int explicitQueryMin; + if (queryPoolMin != UNSET) { + explicitQueryMin = queryPoolMin; // explicit builder call + } else if (view.has("query_pool_min")) { + explicitQueryMin = view.getInt("query_pool_min", UNSET); // connect string + } else { + explicitQueryMin = 0; // unset -> lazy default of 0 + } + if (explicitQueryMin > 0) { + throw new IllegalArgumentException( + "conflicting configuration: lazy_connect=true needs query_pool_min=0 (the read pool " + + "connects lazily on first use and must not fail-fast at startup), but query_pool_min=" + + explicitQueryMin + " was set. Resolve by removing query_pool_min (lazy_connect " + + "defaults it to 0) or setting query_pool_min=0."); + } + // No explicit initial_connect_retry -> inject async so the ingest build + // is non-blocking. An explicit async needs no injection. + return mode == null ? withDefaultAsyncConnect(config) : config; + } + /** - * Sets a single configuration string used for both ingest and egress. The - * schema must be {@code ws} or {@code wss}. + * Sets the single configuration string for the whole QuestDB cluster -- + * used for both ingest and egress. List every cluster node in one + * {@code addr} (comma-separated, or by repeating the key); the ingest and + * query pools each connect across that one server list. 
The schema must be + * {@code ws} or {@code wss}. */ public QuestDBBuilder fromConfig(CharSequence configurationString) { - requireWebSocketSchema(configurationString, "connection"); - String s = configurationString.toString(); - this.ingestConfig = s; - this.queryConfig = s; + requireWebSocketSchema(configurationString, "cluster"); + this.config = configurationString.toString(); return this; } @@ -183,16 +313,6 @@ public QuestDBBuilder idleTimeoutMillis(long millis) { return this; } - /** - * Sets the ingest-side configuration. The schema must be {@code ws} or - * {@code wss}. - */ - public QuestDBBuilder ingestConfig(CharSequence configurationString) { - requireWebSocketSchema(configurationString, "ingest"); - this.ingestConfig = configurationString.toString(); - return this; - } - /** * Maximum age of a pooled connection before the housekeeper recycles it * (next time it is idle). Useful for picking up DNS / load-balancer @@ -206,16 +326,6 @@ public QuestDBBuilder maxLifetimeMillis(long millis) { return this; } - /** - * Sets the query-side configuration. The schema must be {@code ws} or - * {@code wss}. - */ - public QuestDBBuilder queryConfig(CharSequence configurationString) { - requireWebSocketSchema(configurationString, "query"); - this.queryConfig = configurationString.toString(); - return this; - } - /** * Maximum query-pool size. Defaults to 4. */ @@ -303,12 +413,24 @@ public java.util.Map poolConfigSnapshotForTest() { m.put("query_pool_min", queryPoolMin); m.put("query_pool_max", queryPoolMax); m.put("acquire_timeout_ms", acquireTimeoutMillis); + m.put("query_close_timeout_ms", queryCloseTimeoutMillis); m.put("idle_timeout_ms", idleTimeoutMillis); m.put("max_lifetime_ms", maxLifetimeMillis); m.put("housekeeper_interval_ms", housekeeperIntervalMillis); return m; } + // Inject a non-blocking async initial connect right after the schema + // separator so lazy_connect's build never blocks or fail-fast on a down + // server. 
Only used when the user set no initial_connect_retry of their own + // (resolveLazyConnect rejects an explicit blocking mode rather than silently + // overriding it), so placement is immaterial -- there is no competing value. + private static String withDefaultAsyncConnect(String config) { + int sep = config.indexOf("::"); + // sep >= 0: fromConfig() validated a ws/wss schema, so "::" is present. + return config.substring(0, sep + 2) + "initial_connect_retry=async;" + config.substring(sep + 2); + } + private static void requireWebSocketSchema(CharSequence config, String role) { String schema = ConfigString.parse(config).schema(); if (!"ws".equals(schema) && !"wss".equals(schema)) { @@ -317,53 +439,17 @@ private static void requireWebSocketSchema(CharSequence config, String role) { } } - private void resolvePoolInt(int current, String key, ConfigView ingest, ConfigView query, int dflt, IntConsumer setter) { + private void resolvePoolInt(int current, String key, ConfigView view, int dflt, IntConsumer setter) { if (current != UNSET) { - return; // explicit builder call wins; skip the conflict check - } - boolean inIngest = ingest.has(key); - boolean inQuery = query.has(key); - int value; - if (inIngest && inQuery) { - int vi = ingest.getInt(key, UNSET); - int vq = query.getInt(key, UNSET); - if (vi != vq) { - throw new IllegalArgumentException( - "conflicting pool config: " + key + " (ingest=" + vi + ", query=" + vq + ")"); - } - value = vi; - } else if (inIngest) { - value = ingest.getInt(key, UNSET); - } else if (inQuery) { - value = query.getInt(key, UNSET); - } else { - value = dflt; + return; // explicit builder call wins } - setter.accept(value); + setter.accept(view.has(key) ? 
view.getInt(key, UNSET) : dflt); } - private void resolvePoolLong(long current, String key, ConfigView ingest, ConfigView query, long dflt, LongConsumer setter) { + private void resolvePoolLong(long current, String key, ConfigView view, long dflt, LongConsumer setter) { if (current != UNSET) { - return; // explicit builder call wins; skip the conflict check - } - boolean inIngest = ingest.has(key); - boolean inQuery = query.has(key); - long value; - if (inIngest && inQuery) { - long vi = ingest.getLong(key, UNSET); - long vq = query.getLong(key, UNSET); - if (vi != vq) { - throw new IllegalArgumentException( - "conflicting pool config: " + key + " (ingest=" + vi + ", query=" + vq + ")"); - } - value = vi; - } else if (inIngest) { - value = ingest.getLong(key, UNSET); - } else if (inQuery) { - value = query.getLong(key, UNSET); - } else { - value = dflt; + return; // explicit builder call wins } - setter.accept(value); + setter.accept(view.has(key) ? view.getLong(key, UNSET) : dflt); } } diff --git a/core/src/main/java/io/questdb/client/Sender.java b/core/src/main/java/io/questdb/client/Sender.java index 604f45d5..4a1419a7 100644 --- a/core/src/main/java/io/questdb/client/Sender.java +++ b/core/src/main/java/io/questdb/client/Sender.java @@ -791,11 +791,12 @@ default Sender uuidColumn(CharSequence name, long lo, long hi) { * unconnected sender; the I/O thread runs the same retry loop in * the background. The user thread can call {@code at()} / * {@code flush()} immediately; rows accumulate in the cursor SF - * engine until the wire is up. A connect-budget exhaustion or a - * terminal upgrade failure is delivered to the async error inbox - * as a {@link io.questdb.client.SenderError} (no synchronous - * throw on the user call site). Wire {@code error_handler=...} - * to observe these. + * engine until the wire is up. 
Connect failures are retried + * indefinitely in the background; a terminal upgrade failure + * (auth reject, capability mismatch) is delivered to the async + * error inbox as a {@link io.questdb.client.SenderError} (no + * synchronous throw on the user call site). Wire + * {@code error_handler=...} to observe these. * *

* Default resolution when the caller does not pick a value: @@ -1011,6 +1012,9 @@ final class LineSenderBuilder { private int autoFlushRows = PARAMETER_NOT_SET_EXPLICITLY; private int bufferCapacity = PARAMETER_NOT_SET_EXPLICITLY; private long closeFlushTimeoutMillis = CLOSE_FLUSH_TIMEOUT_NOT_SET; + // Upper bound (ms) on the TCP connect. PARAMETER_NOT_SET_EXPLICITLY -> + // 0 (no application-level connect timeout; OS connect timeout applies). + private int connectTimeoutMillis = PARAMETER_NOT_SET_EXPLICITLY; // Optional user-supplied async connection-event listener. When null, // the sender uses DefaultSenderConnectionListener.INSTANCE // (loud-not-silent log of every transition). @@ -1018,6 +1022,11 @@ final class LineSenderBuilder { // Bounded inbox capacity for the async connection-event dispatcher. // PARAMETER_NOT_SET_EXPLICITLY → spec default (64). private int connectionListenerInboxCapacity = PARAMETER_NOT_SET_EXPLICITLY; + // Optional user-supplied observer for background orphan-slot drainer + // events (durable-ack capability-gap retries, all-replica failover + // windows, persistent-failure escalation). When null, drainers run + // without a listener. Only meaningful with drainOrphans=true. + private io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener drainerListener; // Orphan adoption: when true, the foreground sender scans // /*/ at startup for sibling slots that hold unacked data // and reports them. Default false. Spec calls for spawning @@ -1078,6 +1087,11 @@ public String getSettingsPath() { public int getTimeout() { return httpTimeout == PARAMETER_NOT_SET_EXPLICITLY ? DEFAULT_HTTP_TIMEOUT : httpTimeout; } + + @Override + public int getConnectTimeout() { + return connectTimeoutMillis == PARAMETER_NOT_SET_EXPLICITLY ? 
0 : connectTimeoutMillis; + } }; private long minRequestThroughput = PARAMETER_NOT_SET_EXPLICITLY; private int multicastTtl = PARAMETER_NOT_SET_EXPLICITLY; @@ -1199,6 +1213,28 @@ public AdvancedTlsSettings advancedTls() { return new AdvancedTlsSettings(); } + /** + * Upper bound, in milliseconds, on establishing the TCP connection to a + * QuestDB endpoint. When set, a connect that does not complete within + * this budget is aborted (instead of riding the much longer OS-level + * connect timeout). Applies to both the HTTP and WebSocket transports. Default + * is unset (0), which falls back to the OS connect timeout. + * + * @param millis connect timeout in milliseconds; must be > 0 + * @return this instance for method chaining + */ + public LineSenderBuilder connectTimeoutMillis(int millis) { + if (this.connectTimeoutMillis != PARAMETER_NOT_SET_EXPLICITLY) { + throw new LineSenderException("connect timeout was already configured ") + .put("[connect_timeout=").put(this.connectTimeoutMillis).put("]"); + } + if (millis <= 0) { + throw new LineSenderException("connect_timeout must be > 0: ").put(millis); + } + this.connectTimeoutMillis = millis; + return this; + } + /** * Per-endpoint timeout on the WebSocket upgrade response read. Default * {@value QwpWebSocketSender#DEFAULT_AUTH_TIMEOUT_MS} ms. @@ -1531,6 +1567,7 @@ public Sender build() { actualErrorInboxCapacity, actualDurableAckKeepaliveIntervalMillis, authTimeoutMillis, + connectTimeoutMillis == PARAMETER_NOT_SET_EXPLICITLY ? 0 : connectTimeoutMillis, connectionListener, actualConnectionListenerInboxCapacity ); @@ -1553,6 +1590,12 @@ public Sender build() { // WebSocketClient inside the abandoned `connected`. connected.setTransactional(transactional); try { + // Install the drainer listener BEFORE startOrphanDrainers + // below: drainers must see the listener at submit time so + // no early drainer event is lost to a late installation. 
+ if (drainerListener != null) { + connected.setDrainerListener(drainerListener); + } // Once the foreground sender is up, dispatch drainers // for any sibling orphan slots. Scan AFTER we acquire // our own slot lock so we never accidentally try to @@ -1755,6 +1798,31 @@ public LineSenderBuilder disableAutoFlush() { return this; } + /** + * Sets the async listener observing background orphan-slot drainer + * events: per-attempt durable-ack capability-gap retries + * ({@code onDurableAckUnavailable}), transient all-replica failover + * windows ({@code onPrimaryUnavailable}), and the eventual escalation + * to a {@code .failed} sentinel + * ({@code onDurableAckPersistentFailure}). The listener runs on the + * drainers' own threads, so it must be thread-safe and must not block + * — hand off to a queue or metrics sink and return. Only meaningful + * when {@link #drainOrphans(boolean)} is enabled. + * + *

WebSocket transport only; setting on other transports throws. + * + * @param listener the listener; {@code null} keeps the default (no listener) + * @return this instance for method chaining + */ + public LineSenderBuilder drainerListener( + io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener listener) { + if (protocol != PARAMETER_NOT_SET_EXPLICITLY && protocol != PROTOCOL_WEBSOCKET) { + throw new LineSenderException("drainer_listener is only supported for WebSocket transport"); + } + this.drainerListener = listener; + return this; + } + /** * Opt in to adopting sibling slots under {@code /*} at * startup that hold unacked data left behind by a crashed sender or @@ -1772,6 +1840,16 @@ public LineSenderBuilder disableAutoFlush() { * Slots flagged with the {@code .failed} sentinel are skipped * (manual reset required), and the foreground sender's own slot is * never adopted. + *

+ * Close-latency note: {@code close()} stops adopted drainers. A + * drainer still connecting (e.g. during an outage) is stop-signaled + * immediately and exits within ~50ms; a drainer actively replaying + * frames is given a ~2.5s grace window to finish, plus a 0.5s stop + * window — so {@code close()} may take up to ~3s while orphan + * drainers are in flight (and a drainer parked in a blocking native + * connect is abandoned to exit on its own daemon thread). + * Un-drained slots stay on disk and are re-adopted by the next + * sender that enables {@code drain_orphans}. */ public LineSenderBuilder drainOrphans(boolean enabled) { if (protocol != PARAMETER_NOT_SET_EXPLICITLY && protocol != PROTOCOL_WEBSOCKET) { @@ -2357,15 +2435,16 @@ public LineSenderBuilder reconnectMaxBackoffMillis(long millis) { } /** - * Per-outage cap on the cursor I/O loop's reconnect retry budget. - * Once a wire failure occurs, the loop retries with exponential - * backoff until either reconnect succeeds (timer resets) or this - * many millis elapse since the first failure of this outage — - * whichever comes first. On budget exhaustion, the next user - * thread API call throws. + * Cap on the blocking initial-connect retry budget when + * {@code initial_connect_retry=sync}. {@code fromConfig} retries + * with exponential backoff until connect succeeds or this many + * millis elapse, then throws. The background reconnect loop + * (mid-stream outages and async initial connect) does NOT consult + * this value: it retries indefinitely and halts only on a terminal + * auth/upgrade error or {@code close()}. *

- * Default {@code 300_000} (5 minutes). Lower for fail-fast services; - * higher for tolerating long maintenance windows. WebSocket only. + * Default {@code 300_000} (5 minutes). Lower for fail-fast startup; + * higher for tolerating a slow server boot. WebSocket only. */ public LineSenderBuilder reconnectMaxDurationMillis(long millis) { if (protocol != PARAMETER_NOT_SET_EXPLICITLY && protocol != PROTOCOL_WEBSOCKET) { @@ -3166,6 +3245,9 @@ private LineSenderBuilder fromConfig(CharSequence configurationString) { pos = getValue(configurationString, pos, sink, "request_timeout"); int requestTimeout = parseIntValue(sink, "request_timeout"); httpTimeoutMillis(requestTimeout); + } else if (Chars.equals("connect_timeout", sink)) { + pos = getValue(configurationString, pos, sink, "connect_timeout"); + connectTimeoutMillis(parseIntValue(sink, "connect_timeout")); } else if (Chars.equals("request_min_throughput", sink)) { pos = getValue(configurationString, pos, sink, "request_min_throughput"); int requestMinThroughput = parseIntValue(sink, "request_min_throughput"); @@ -3446,6 +3528,9 @@ private LineSenderBuilder fromConfigWebSocket(CharSequence configurationString) if (view.has("auth_timeout_ms")) { authTimeoutMillis(view.getLong("auth_timeout_ms", 0)); } + if (view.has("connect_timeout")) { + connectTimeoutMillis((int) view.getLong("connect_timeout", 0)); + } s = view.getStr("auto_flush_rows"); if (s != null) { @@ -3701,6 +3786,7 @@ public java.util.Map wsConfigSnapshotForTest() { m.put("connection_listener_inbox_capacity", connectionListenerInboxCapacity); m.put("token", httpToken); m.put("auth_timeout_ms", authTimeoutMillis); + m.put("connect_timeout", connectTimeoutMillis == PARAMETER_NOT_SET_EXPLICITLY ? 0 : connectTimeoutMillis); m.put("username", username); m.put("password", password); m.put("tls_verify", tlsValidationMode == null ? 
null : tlsValidationMode.name()); diff --git a/core/src/main/java/io/questdb/client/SenderConnectionEvent.java b/core/src/main/java/io/questdb/client/SenderConnectionEvent.java index 7d0c2c61..fd450ba9 100644 --- a/core/src/main/java/io/questdb/client/SenderConnectionEvent.java +++ b/core/src/main/java/io/questdb/client/SenderConnectionEvent.java @@ -96,8 +96,8 @@ public long getAttemptNumber() { /** * The classified cause of the event, or {@code null} for success/info * events ({@link Kind#CONNECTED}, {@link Kind#FAILED_OVER}, - * {@link Kind#RECONNECTED}). For terminal kinds - * ({@link Kind#AUTH_FAILED}, {@link Kind#RECONNECT_BUDGET_EXHAUSTED}) this + * {@link Kind#RECONNECTED}). For the terminal kind + * ({@link Kind#AUTH_FAILED}) this * carries the typed exception that caused the sender to halt. */ @Nullable @@ -223,8 +223,10 @@ public enum Kind { /** * Every endpoint in the configured address list was attempted and none * accepted the connection in this sweep. The client will back off and - * retry the sweep until the reconnect budget is exhausted. Fired once - * per failed sweep. + * retry the sweep — bounded by {@code reconnect_max_duration_millis} + * during a blocking (sync) initial connect, indefinitely otherwise + * (Invariant B: the background loop never gives up on a wall-clock + * budget). Fired once per failed sweep. */ ALL_ENDPOINTS_UNREACHABLE, @@ -234,14 +236,6 @@ public enum Kind { * producer-thread API call surfaces a {@code LineSenderException}. * {@link #getCause()} carries the {@code QwpAuthFailedException}. */ - AUTH_FAILED, - - /** - * Terminal: the configured reconnect time budget was exhausted without - * a successful reconnect. The sender will halt; the next producer-thread - * API call surfaces a {@code LineSenderException}. {@link #getCause()} - * carries the last observed reconnect error. 
- */ - RECONNECT_BUDGET_EXHAUSTED + AUTH_FAILED } } diff --git a/core/src/main/java/io/questdb/client/SenderConnectionListener.java b/core/src/main/java/io/questdb/client/SenderConnectionListener.java index 2620ca6c..4595fbbd 100644 --- a/core/src/main/java/io/questdb/client/SenderConnectionListener.java +++ b/core/src/main/java/io/questdb/client/SenderConnectionListener.java @@ -51,8 +51,8 @@ * {@link SenderConnectionEvent.Kind#RECONNECTED}) are guaranteed to fire on * each transition. Failure events ({@code ENDPOINT_ATTEMPT_FAILED}, * {@code ALL_ENDPOINTS_UNREACHABLE}) may be coalesced under inbox pressure. - * Terminal events ({@code AUTH_FAILED}, {@code RECONNECT_BUDGET_EXHAUSTED}) - * fire before the producer-thread {@code LineSenderException} is observable on + * The terminal event {@code AUTH_FAILED} + * fires before the producer-thread {@code LineSenderException} is observable on * the next API call -- so a listener can react sooner than the producer learns * via exception, but should not assume the listener fires first under heavy * notification load. 
diff --git a/core/src/main/java/io/questdb/client/cutlass/http/client/HttpClient.java b/core/src/main/java/io/questdb/client/cutlass/http/client/HttpClient.java index 94562663..0175ad6c 100644 --- a/core/src/main/java/io/questdb/client/cutlass/http/client/HttpClient.java +++ b/core/src/main/java/io/questdb/client/cutlass/http/client/HttpClient.java @@ -66,6 +66,7 @@ public abstract class HttpClient implements QuietCloseable { protected final NetworkFacade nf; protected final Socket socket; private final ObjectPool csPool = new ObjectPool<>(DirectUtf8String.FACTORY, 64); + private final int connectTimeout; private final int defaultTimeout; private final boolean fixBrokenConnection; private final int maxBufferSize; @@ -84,6 +85,7 @@ public HttpClient(HttpClientConfiguration configuration, SocketFactory socketFac this.nf = configuration.getNetworkFacade(); this.socket = socketFactory.newInstance(nf, LOG); this.defaultTimeout = configuration.getTimeout(); + this.connectTimeout = configuration.getConnectTimeout(); this.bufferSize = configuration.getInitialRequestBufferSize(); this.maxBufferSize = configuration.getMaximumRequestBufferSize(); this.responseParserBufSize = configuration.getResponseBufferSize(); @@ -617,10 +619,16 @@ private void connect(CharSequence host, int port) { throw new HttpClientException("could not resolve host ").put("[host=").put(host).put("]"); } - if (nf.connectAddrInfo(fd, addrInfo) != 0) { + final int connectResult = connectTimeout > 0 + ? 
nf.connectAddrInfoTimeout(fd, addrInfo, connectTimeout) + : nf.connectAddrInfo(fd, addrInfo); + if (connectResult != 0) { int errno = nf.errno(); nf.freeAddrInfo(addrInfo); disconnect(); + if (connectResult == NetworkFacade.CONNECT_TIMEOUT) { + throw new HttpClientException("connect timed out ").put("[host=").put(host).put(", port=").put(port).put(", timeout=").put(connectTimeout).put(']').flagAsTimeout(); + } throw new HttpClientException("could not connect to host ").put("[host=").put(host).put(", port=").put(port).put(", errno=").put(errno).put(']'); } nf.freeAddrInfo(addrInfo); @@ -631,9 +639,20 @@ private void connect(CharSequence host, int port) { throw new HttpClientException("could not configure socket to be non-blocking [fd=").put(fd).put(", errno=").put(errno).put(']'); } + // Register the fd with the event loop before the TLS handshake so the + // handshake can park on socket readiness via ioWait() instead of + // busy-spinning on the non-blocking socket. + setupIoWait(); + if (socket.supportsTls()) { + // Bound the TLS handshake by the connect budget (falling back to + // the request timeout when connect_timeout is unset), so a peer + // that completes TCP but stalls mid-handshake cannot hang or pin a + // CPU. + final long tlsHandshakeStartNanos = System.nanoTime(); + final int tlsHandshakeBudgetMillis = connectTimeout > 0 ? connectTimeout : defaultTimeout; try { - socket.startTlsSession(host); + socket.startTlsSession(host, op -> ioWait(remainingTime(tlsHandshakeBudgetMillis, tlsHandshakeStartNanos), op)); } catch (TlsSessionInitFailedException e) { int errno = nf.errno(); disconnect(); @@ -641,9 +660,15 @@ private void connect(CharSequence host, int port) { .put(", error=").put(e.getFlyweightMessage()) .put(", errno=").put(errno) .put(']'); + } catch (Throwable t) { + // ioWait() throws a timeout-flagged HttpClientException when the + // handshake budget is exhausted; any other error can also surface + // mid-handshake. 
Disconnect so the fd and native buffers do not + // leak, then propagate. + disconnect(); + throw t; } } - setupIoWait(); } private void doSend(long lo, long hi, int timeoutMillis) { diff --git a/core/src/main/java/io/questdb/client/cutlass/http/client/WebSocketClient.java b/core/src/main/java/io/questdb/client/cutlass/http/client/WebSocketClient.java index 81ad7c86..49ecaa6e 100644 --- a/core/src/main/java/io/questdb/client/cutlass/http/client/WebSocketClient.java +++ b/core/src/main/java/io/questdb/client/cutlass/http/client/WebSocketClient.java @@ -47,6 +47,7 @@ import java.security.MessageDigest; import java.security.NoSuchAlgorithmException; import java.util.Base64; +import java.util.concurrent.atomic.AtomicBoolean; import static java.util.concurrent.TimeUnit.NANOSECONDS; @@ -99,8 +100,15 @@ public abstract class WebSocketClient implements QuietCloseable { private final int maxRecvBufSize; private final SecureRnd rnd; private final WebSocketSendBuffer sendBuffer; - // volatile: written by user thread in close(), read by I/O thread in checkConnected()/sendFrame()/receiveFrame() - private volatile boolean closed; + // Written by whichever closer wins the CAS in close(); read by the I/O + // thread in checkConnected()/sendFrame()/receiveFrame(). An AtomicBoolean + // (not a bare volatile check-then-act) so concurrent closers cannot both + // enter close() and double-run disconnect()/Unsafe.free. + private final AtomicBoolean closed = new AtomicBoolean(); + // Upper bound (ms) on the TCP connect. <= 0 disables the application-level + // timeout and falls back to the OS connect timeout. Seeded from the + // configuration; the QWP sender may override it via setConnectTimeout(). 
+ private int connectTimeoutMillis; private int fragmentBufPos; private long fragmentBufPtr; // native buffer for accumulating fragment payloads private int fragmentBufSize; @@ -168,6 +176,7 @@ public WebSocketClient(HttpClientConfiguration configuration, SocketFactory sock this.nf = configuration.getNetworkFacade(); this.socket = socketFactory.newInstance(nf, LOG); this.defaultTimeout = configuration.getTimeout(); + this.connectTimeoutMillis = configuration.getConnectTimeout(); int sendBufSize = Math.max(configuration.getInitialRequestBufferSize(), DEFAULT_SEND_BUFFER_SIZE); int maxSendBufSize = Math.max(configuration.getMaximumRequestBufferSize(), sendBufSize); @@ -192,7 +201,7 @@ public WebSocketClient(HttpClientConfiguration configuration, SocketFactory sock this.frameParser = new WebSocketFrameParser(); this.rnd = new SecureRnd(); this.upgraded = false; - this.closed = false; + this.closed.set(false); } catch (Throwable t) { if (recvBufPtr != 0) { Unsafe.free(recvBufPtr, recvBufSize, MemoryTag.NATIVE_DEFAULT); @@ -207,8 +216,12 @@ public WebSocketClient(HttpClientConfiguration configuration, SocketFactory sock @Override public void close() { - if (!closed) { - closed = true; + // CAS gate: exactly one closer runs the teardown below. Closers can be + // the owner thread, the I/O thread's exit path, or a stale duplicate + // reference (see CursorWebSocketSendLoop) -- a bare volatile + // check-then-act here would let two concurrent closers both enter and + // double-run disconnect()/Unsafe.free (native double-free). + if (closed.compareAndSet(false, true)) { // Try to send close frame if (upgraded && !socket.isClosed()) { @@ -242,7 +255,7 @@ public void close() { * @param port the server port */ public void connect(CharSequence host, int port) { - if (closed) { + if (closed.get()) { throw new HttpClientException("WebSocket client is closed"); } @@ -375,7 +388,7 @@ public int getUpgradeStatusCode() { * Returns whether the WebSocket is connected and upgraded. 
*/ public boolean isConnected() { - return upgraded && !closed && !socket.isClosed(); + return upgraded && !closed.get() && !socket.isClosed(); } /** @@ -481,6 +494,16 @@ public void sendPing(int timeout) { } } + /** + * Overrides the TCP connect timeout (milliseconds) for subsequent + * {@link #connect} calls. {@code <= 0} disables the application-level + * timeout and falls back to the OS connect timeout. Must be called before + * {@link #connect}. + */ + public void setConnectTimeout(int connectTimeoutMillis) { + this.connectTimeoutMillis = connectTimeoutMillis; + } + /** * Sets the value sent as the {@code X-QWP-Accept-Encoding} upgrade header, * e.g. {@code "zstd;level=1,raw"}. Pass {@code null} to omit the header @@ -570,7 +593,7 @@ public boolean tryReceiveFrame(WebSocketFrameHandler handler) { * @param authorizationHeader the Authorization header value (e.g., "Basic ..."), or null */ public void upgrade(CharSequence path, int timeout, CharSequence authorizationHeader) { - if (closed) { + if (closed.get()) { throw new HttpClientException("WebSocket client is closed"); } if (socket.isClosed()) { @@ -877,7 +900,7 @@ private void appendToFragmentBuffer(long payloadPtr, int payloadLen) { } private void checkConnected() { - if (closed) { + if (closed.get()) { throw new HttpClientException("WebSocket client is closed"); } if (!upgraded) { @@ -922,10 +945,18 @@ private void doConnect(CharSequence host, int port) { throw new HttpClientException("could not resolve host [host=").put(host).put(']'); } - if (nf.connectAddrInfo(fd, addrInfo) != 0) { + final int connectResult = connectTimeoutMillis > 0 + ? 
nf.connectAddrInfoTimeout(fd, addrInfo, connectTimeoutMillis) + : nf.connectAddrInfo(fd, addrInfo); + if (connectResult != 0) { int errno = nf.errno(); nf.freeAddrInfo(addrInfo); disconnect(); + if (connectResult == NetworkFacade.CONNECT_TIMEOUT) { + throw new HttpClientException("connect timed out [host=").put(host) + .put(", port=").put(port) + .put(", timeout=").put(connectTimeoutMillis).put(']').flagAsTimeout(); + } throw new HttpClientException("could not connect [host=").put(host) .put(", port=").put(port) .put(", errno=").put(errno).put(']'); @@ -939,19 +970,35 @@ private void doConnect(CharSequence host, int port) { .put(", errno=").put(errno).put(']'); } + // Register the fd with the event loop before the TLS handshake so the + // handshake can park on socket readiness via ioWait() instead of + // busy-spinning on the non-blocking socket. + setupIoWait(); + if (socket.supportsTls()) { + // Bound the TLS handshake by the connect budget (falling back to the + // request timeout when connect_timeout is unset), so a peer that + // completes TCP but stalls mid-handshake cannot hang or pin a CPU. + final long tlsHandshakeStartNanos = System.nanoTime(); + final int tlsHandshakeBudgetMillis = connectTimeoutMillis > 0 ? connectTimeoutMillis : defaultTimeout; try { - socket.startTlsSession(host); + socket.startTlsSession(host, op -> ioWait(getRemainingTimeOrThrow(tlsHandshakeBudgetMillis, tlsHandshakeStartNanos), op)); } catch (TlsSessionInitFailedException e) { int errno = nf.errno(); disconnect(); throw new HttpClientException("could not start TLS session [fd=").put(fd) .put(", error=").put(e.getFlyweightMessage()) .put(", errno=").put(errno).put(']'); + } catch (Throwable t) { + // ioWait() throws a timeout-flagged HttpClientException when the + // handshake budget is exhausted; any other error can also surface + // mid-handshake. Disconnect so the fd and native buffers do not + // leak, then propagate. 
+ disconnect(); + throw t; } } - setupIoWait(); if (LOG.isDebugEnabled()) { LOG.debug("Connected to [host={}, port={}]", host, port); } diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpHostHealthTracker.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpHostHealthTracker.java index 166c0331..7b61f957 100644 --- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpHostHealthTracker.java +++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpHostHealthTracker.java @@ -35,9 +35,19 @@ * so a known-good cross-zone host is picked before an untried local host. *
<p>
* Each method is internally synchronized, but pickNext + recordX is not atomic - * across the pair. Callers must externally serialize a pick → record sequence - * (the QWP clients do this via the sender's {@code synchronized buildAndConnect} - * and the query client's documented one-execute-at-a-time contract). + * across the pair. Callers of the SHARED-round API (pickNext / beginRound / + * isRoundExhausted) must externally serialize a pick → record sequence (the + * ingest sender does this by keeping its foreground connect walk single-file + * behind a lock; the query client via its documented one-execute-at-a-time + * contract). + *
<p>
+ * Concurrent walkers that must not consume or poison the shared round -- + * the ingest sender's background orphan drainers -- use a private + * {@link RoundCursor} ({@link #newRoundCursor()}) paired with the + * health-only record overloads ({@code markRoundAttempted=false}): the + * cursor's attempted set is walker-local (claim-at-pick, so concurrent + * cursors never race on the pick → record pair), while state/zone updates + * flow into the shared health ledger that orders everyone's picks. */ public final class QwpHostHealthTracker { public enum HostState { @@ -250,24 +260,113 @@ public void recordMidStreamFailure(int idx) { } public void recordRoleReject(int idx, boolean isTransient) { + recordRoleReject(idx, isTransient, true); + } + + /** + * Variant with an explicit round-bit policy. {@code markRoundAttempted = + * false} updates only the shared health ledger (state), leaving the + * shared round's attempted bit untouched — for walkers on a private + * {@link RoundCursor} whose attempts must stay invisible to the shared + * round (the ingest sender's background drainers). + */ + public void recordRoleReject(int idx, boolean isTransient, boolean markRoundAttempted) { synchronized (lock) { states[idx] = isTransient ? HostState.TRANSIENT_REJECT : HostState.TOPOLOGY_REJECT; - attemptedThisRound[idx] = true; + if (markRoundAttempted) { + attemptedThisRound[idx] = true; + } } } public void recordSuccess(int idx) { + recordSuccess(idx, true); + } + + /** + * Variant with an explicit round-bit policy; see + * {@link #recordRoleReject(int, boolean, boolean)}. The success epoch + * (sticky-Healthy recency) is recorded either way — a background + * walker's success is real health data. 
+ */ + public void recordSuccess(int idx, boolean markRoundAttempted) { synchronized (lock) { states[idx] = HostState.HEALTHY; - attemptedThisRound[idx] = true; + if (markRoundAttempted) { + attemptedThisRound[idx] = true; + } lastSuccessEpoch[idx] = ++successEpoch; } } public void recordTransportError(int idx) { + recordTransportError(idx, true); + } + + /** + * Variant with an explicit round-bit policy; see + * {@link #recordRoleReject(int, boolean, boolean)}. + */ + public void recordTransportError(int idx, boolean markRoundAttempted) { synchronized (lock) { states[idx] = HostState.TRANSPORT_ERROR; - attemptedThisRound[idx] = true; + if (markRoundAttempted) { + attemptedThisRound[idx] = true; + } + } + } + + /** + * Creates a walker-private full-sweep cursor over the host list. Each + * {@link RoundCursor#next()} returns the highest-priority host this + * cursor has not yet returned — priority is the same live + * {@code (state, zone_tier)} tuple {@link #pickNext()} uses — and + * claims it at pick time in the cursor's OWN attempted set, so: + *
<ul>
+ * <li>every cursor sweeps every host exactly once regardless of what + * other walkers do concurrently (no endpoint stealing);</li>
+ * <li>the pick → record pair needs no external serialization -- the + * claim is cursor-local, and the health records are atomic;</li>
+ * <li>the shared round (attempted bits, {@link #beginRound}, + * {@link #isRoundExhausted}) is never consulted nor mutated.</li>
+ * </ul>
+ * Pair with the {@code markRoundAttempted=false} record overloads so the + * walker's results update shared health without touching the shared + * round. + */ + public RoundCursor newRoundCursor() { + return new RoundCursor(); + } + + /** See {@link #newRoundCursor()}. Not thread-safe for sharing a single + * instance across walkers; create one per walk. */ + public final class RoundCursor { + private final boolean[] attempted = new boolean[hostCount]; + + private RoundCursor() { + } + + /** + * Highest-priority host this cursor has not yet returned, claimed at + * pick time; -1 once the cursor has swept every host. Ordering reads + * the LIVE shared health state under the tracker lock, so a state + * change recorded by any walker between two calls re-ranks the + * remaining hosts. + */ + public int next() { + synchronized (lock) { + for (HostState p : PRIORITY_ORDER) { + for (ZoneTier z : ZONE_PRIORITY_ORDER) { + for (int i = 0; i < hostCount; i++) { + if (!attempted[i] && states[i] == p && zoneTiers[i] == z) { + attempted[i] = true; + return i; + } + } + } + } + return -1; + } } } diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpQueryClient.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpQueryClient.java index 1706401e..92b4f6a7 100644 --- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpQueryClient.java +++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpQueryClient.java @@ -165,6 +165,9 @@ public class QwpQueryClient implements QuietCloseable { private final Random failoverRandom = new Random(); private long authTimeoutMs = DEFAULT_AUTH_TIMEOUT_MS; private String authorizationHeader; + // Upper bound (ms) on each TCP connect attempt. 0 (default) falls back to + // the OS connect timeout. 
+ private int connectTimeoutMs = 0; private int bufferPoolSize = DEFAULT_IO_BUFFER_POOL_SIZE; private String clientId; // Client-configured zone (failover.md §1.1), opaque case-insensitive @@ -387,6 +390,7 @@ public static QwpQueryClient fromConfig(CharSequence configurationString) { Long failoverMaxDurationMs = view.has("failover_max_duration_ms") ? view.getLong("failover_max_duration_ms", 0) : null; Long authTimeoutMs = view.has("auth_timeout_ms") ? view.getLong("auth_timeout_ms", 0) : null; + Integer connectTimeout = view.has("connect_timeout") ? (int) view.getLong("connect_timeout", 0) : null; Long initialCredit = view.has("initial_credit") ? view.getLong("initial_credit", 0) : null; int poolSize = view.getInt("buffer_pool_size", DEFAULT_IO_BUFFER_POOL_SIZE); String compression = view.getEnum("compression"); @@ -442,6 +446,9 @@ public static QwpQueryClient fromConfig(CharSequence configurationString) { if (authTimeoutMs != null) { client.withAuthTimeout(authTimeoutMs); } + if (connectTimeout != null) { + client.withConnectTimeout(connectTimeout); + } if (initialCredit != null) { client.withInitialCredit(initialCredit); } @@ -497,6 +504,7 @@ public static void validateConfig(ConfigView view, boolean tls) { view.getLong("failover_max_duration_ms", -1); view.getLong("initial_credit", -1); view.getLong("auth_timeout_ms", -1); + view.getLong("connect_timeout", -1); String username = view.getStr("username"); String password = view.getStr("password"); String token = view.getStr("token"); @@ -867,6 +875,7 @@ public java.util.Map configSnapshotForTest() { m.put("client_id", clientId); m.put("zone", clientZone); m.put("auth_timeout_ms", authTimeoutMs); + m.put("connect_timeout", connectTimeoutMs); m.put("authorization_header", authorizationHeader); m.put("tls_verify", tlsValidationMode); m.put("tls_roots", trustStorePath); @@ -994,6 +1003,22 @@ public QwpQueryClient withAuthTimeout(long authTimeoutMs) { return this; } + /** + * Upper bound, in milliseconds, on 
establishing the TCP connection to an + * endpoint. Unlike {@link #withAuthTimeout(long)} this DOES bound the TCP + * connect itself (via a non-blocking connect), so a routing blackhole that + * never returns SYN-ACK is aborted within this budget instead of riding the + * OS connect timeout. {@code 0} (default) keeps the OS connect timeout. + */ + public QwpQueryClient withConnectTimeout(int connectTimeoutMs) { + checkPreConnect("withConnectTimeout"); + if (connectTimeoutMs <= 0) { + throw new IllegalArgumentException("connectTimeoutMs must be > 0"); + } + this.connectTimeoutMs = connectTimeoutMs; + return this; + } + /** * Configures HTTP Basic authentication for the WebSocket upgrade request. * The server verifies the credentials against the same user store the @@ -1369,6 +1394,7 @@ private void connectToEndpoint(Endpoint ep) { webSocketClient.setQwpClientId(clientId != null ? clientId : defaultClientId()); webSocketClient.setQwpAcceptEncoding(buildAcceptEncodingHeader()); webSocketClient.setQwpMaxBatchRows(maxBatchRows); + webSocketClient.setConnectTimeout(connectTimeoutMs); runUpgradeWithTimeout(ep); negotiatedQwpVersion = webSocketClient.getServerQwpVersion(); negotiatedZstdLevel = webSocketClient.getServerNegotiatedZstdLevel(); @@ -1745,12 +1771,21 @@ private void reconnectViaTracker() { } private void runUpgradeWithTimeout(Endpoint ep) { + // Connect first, OUTSIDE the upgrade try. A connect-phase failure -- + // including a connect_timeout overage flagged via flagAsTimeout() -- must + // keep its own message ("connect timed out ...") and must NOT be relabeled + // as an auth_timeout overage below. doConnect() tears down its own socket + // on failure; the failover walker treats the propagated HttpClientException + // as a transport error and moves on to the next endpoint. 
+ webSocketClient.connect(ep.host, ep.port); + int timeoutMs = (int) Math.min(authTimeoutMs, Integer.MAX_VALUE); try { - webSocketClient.connect(ep.host, ep.port); webSocketClient.upgrade(DEFAULT_ENDPOINT_PATH, timeoutMs, authorizationHeader); } catch (HttpClientException ex) { if (ex.isTimeout()) { + // Reachable only for an upgrade/auth-phase timeout now, so the + // auth_timeout attribution is accurate. HttpClientException timeout = new HttpClientException("WebSocket upgrade to ") .put(ep.host).put(':').put(ep.port) .put(" exceeded auth_timeout=").put(authTimeoutMs).put("ms"); diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpVersionMismatchException.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpVersionMismatchException.java index 5323f297..d03cf1ed 100644 --- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpVersionMismatchException.java +++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpVersionMismatchException.java @@ -31,10 +31,11 @@ * {@code X-QWP-Version} outside the client's supported range. Treated as * transient at every layer per sf-client.md section 13.3: the per-endpoint * round walks to the next host (rolling upgrade can leave one node ahead of - * or behind its peers), and a full round of mismatches consumes the per-outage - * reconnect budget. Only after the budget exhausts does the connect loop - * surface a terminal error -- as {@code PROTOCOL_VIOLATION} via the natural - * giveup path, not {@code SECURITY_ERROR}. + * or behind its peers). The background reconnect loop retries a full round + * of mismatches indefinitely (Invariant B: no wall-clock give-up); the + * blocking (sync) initial connect consumes its retry budget and surfaces a + * {@code LineSenderException} from {@code fromConfig} on exhaustion. Never + * classified as {@code SECURITY_ERROR}. 
*/ public final class QwpVersionMismatchException extends HttpClientException { public QwpVersionMismatchException(int serverVersion, int clientMaxVersion) { diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java index 9b9cc45d..7d6dabe8 100644 --- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java +++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java @@ -39,6 +39,7 @@ import io.questdb.client.cutlass.line.LineSenderException; import io.questdb.client.cutlass.line.array.DoubleArray; import io.questdb.client.cutlass.line.array.LongArray; +import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener; import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerPool; import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine; import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop; @@ -72,6 +73,7 @@ import java.util.List; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicReference; +import java.util.concurrent.locks.ReentrantLock; /** * QWP v1 WebSocket client sender for streaming data to QuestDB. @@ -127,6 +129,9 @@ public class QwpWebSocketSender implements Sender { public static final int DEFAULT_AUTO_FLUSH_BYTES = 8 * 1024 * 1024; public static final long DEFAULT_AUTO_FLUSH_INTERVAL_NANOS = 100_000_000L; // 100ms public static final int DEFAULT_AUTO_FLUSH_ROWS = 1_000; + // Finite fallback (ms) for BACKGROUND (drainer) TCP connects when the + // user left connect_timeout unset. See effectiveConnectTimeoutMs. 
+ public static final int DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS = 15_000; private static final int DEFAULT_BUFFER_SIZE = 8192; private static final int DEFAULT_MICROBATCH_BUFFER_SIZE = 1024 * 1024; // 1MB private static final Logger LOG = LoggerFactory.getLogger(QwpWebSocketSender.class); @@ -148,12 +153,29 @@ public class QwpWebSocketSender implements Sender { private final List endpoints; // Global symbol dictionary for delta encoding private final GlobalSymbolDictionary globalSymbolDictionary; + // Serializes FOREGROUND connect walks only (see buildAndConnect): the + // shared-round state in hostTracker (pickNext/beginRound/attempted + // bits), roundSeq, roundConnectAttemptSeq, and the foreground lifecycle + // commits (currentEndpointIdx, hasEverConnected, cap-derived sizing) + // all have exactly one writer -- the foreground walk -- and foreground + // walks cannot overlap by construction (the I/O loop is single-threaded + // and the user-thread initial connect completes before the loop + // starts); the lock is cheap insurance for that invariant. Background + // (drainer) walks take NO lock at all: they walk a private + // QwpHostHealthTracker.RoundCursor and record health-only results, so + // no network I/O ever runs under a sender-wide lock for background + // work, and neither the foreground's reconnect nor close() can queue + // behind a drainer's endpoint walk. + private final ReentrantLock connectWalkLock = new ReentrantLock(); private final QwpHostHealthTracker hostTracker; private final CharSequenceObjHashMap tableBuffers; // null means plain text (no TLS) private final ClientTlsConfiguration tlsConfig; private MicrobatchBuffer activeBuffer; private long authTimeoutMs = DEFAULT_AUTH_TIMEOUT_MS; + // Upper bound (ms) on each TCP connect attempt. 0 (default) falls back to + // the OS connect timeout. Applied to every WebSocketClient before connect. 
+ private int connectTimeoutMs = 0; // Double-buffering for async I/O private MicrobatchBuffer buffer0; // Cached column references to avoid repeated hashmap lookups @@ -161,6 +183,12 @@ public class QwpWebSocketSender implements Sender { private QwpTableBuffer.ColumnBuffer cachedTimestampNanosColumn; // WebSocket client (zero-GC native implementation) private WebSocketClient client; + // Test seam: when non-null, buildAndConnect obtains its per-attempt + // client here instead of WebSocketClientFactory, so JVM-error cleanup + // tests can observe close() on a client whose connect() throws Error. + // Null in production; set reflectively by tests. + @TestOnly + private volatile java.util.function.Supplier clientFactoryOverride; // close() drain timeout in millis. Default applied at construction. // 0 or -1 means "fast close" (skip the drain); otherwise close blocks // up to this many millis for ackedFsn to catch up to publishedFsn. @@ -193,6 +221,11 @@ public class QwpWebSocketSender implements Sender { private CursorSendEngine cursorEngine; private CursorWebSocketSendLoop cursorSendLoop; private boolean deferCommit; + // User-supplied observer for background orphan-slot drainer events. + // Volatile: written by setDrainerListener (any thread, before or after + // startOrphanDrainers) and read at pool-creation time. Null -> drainers + // run without a listener. + private volatile BackgroundDrainerListener drainerListener; // Orphan-slot drainer pool. Non-null only when the builder requested // drain_orphans=true AND we have a slot path to scan against. Closed // alongside the cursor send loop in close(). @@ -208,7 +241,8 @@ public class QwpWebSocketSender implements Sender { // advertised X-QWP-Max-Batch-Size at handshake so the wire payload stays // under the server's cap even with encoding overhead. 
Volatile because the // I/O thread writes this inside buildAndConnect on every successful - // (re)connect while the producer thread reads it from sendRow without + // FOREGROUND (re)connect -- background drainer connects never touch it -- + // while the producer thread reads it from sendRow without // holding the sender monitor. private volatile int effectiveAutoFlushBytes; private SenderErrorDispatcher errorDispatcher; @@ -219,18 +253,20 @@ public class QwpWebSocketSender implements Sender { private int errorInboxCapacity = SenderErrorDispatcher.DEFAULT_CAPACITY; private long firstPendingRowTimeNanos; private boolean hasDeferredMessages; - // Stickys true once any successful connect has happened. Drives the + // Sticky: true once any successful FOREGROUND connect has happened + // (background drainer connects never set it). Drives the // CONNECTED-vs-RECONNECTED-vs-FAILED_OVER classification at the success // point in buildAndConnect. private boolean hasEverConnected; // OFF → startup connect failure is immediately terminal (default). - // SYNC → startup connect goes through the same retry-with-backoff - // loop as in-flight reconnect; auth failures still terminal. + // SYNC → startup connect retries with backoff on the user thread, + // bounded by reconnect_max_duration_millis; auth failures + // still terminal. // ASYNC → user thread does not connect at all. The I/O thread runs - // the same retry loop in the background; terminal failures - // (auth/upgrade reject, budget exhaustion) are delivered - // to the SenderError dispatcher rather than thrown from the - // constructor. + // the reconnect loop in the background, indefinitely + // (Invariant B); terminal failures (auth/upgrade reject) + // are delivered to the SenderError dispatcher rather than + // thrown from the constructor. 
private Sender.InitialConnectMode initialConnectMode = Sender.InitialConnectMode.OFF; private boolean ownsCursorEngine; private long pendingBytes; @@ -255,8 +291,9 @@ public class QwpWebSocketSender implements Sender { CursorWebSocketSendLoop.DEFAULT_RECONNECT_MAX_DURATION_MILLIS; private boolean requestDurableAck; // Monotonic per-attempt counter snapshotted onto every connection event - // fired from buildAndConnect. Counts every endpoint try -- successes and - // failures alike -- across this sender's lifetime. + // fired from buildAndConnect. Counts every FOREGROUND endpoint try -- + // successes and failures alike -- across this sender's lifetime. + // Background (drainer) walks fire no events and do not advance it. private long roundConnectAttemptSeq; // Monotonic per-round counter incremented inside buildAndConnect on each // beginRound(true) call. roundSeq=1 is the first round; CONNECTED in the @@ -267,7 +304,8 @@ public class QwpWebSocketSender implements Sender { // arbitrarily large datasets that exceed the server's recv buffer. private boolean transactional; // Server-advertised hard cap on QWP ingest payload bytes, captured from - // X-QWP-Max-Batch-Size on each successful handshake. 0 when the server + // X-QWP-Max-Batch-Size on each successful FOREGROUND handshake (a + // background drainer's endpoint cap is irrelevant to the producer's wire). 0 when the server // did not advertise the header (older builds); the sender then falls back // to its locally configured budget. 
Volatile because buildAndConnect can // refresh this from the cursor I/O thread on a mid-stream reconnect while @@ -577,7 +615,7 @@ public static QwpWebSocketSender connect( reconnectInitialBackoffMillis, reconnectMaxBackoffMillis, initialConnectMode, errorHandler, errorInboxCapacity, durableAckKeepaliveIntervalMillis, authTimeoutMs, - null, SenderConnectionDispatcher.DEFAULT_CAPACITY); + 0, null, SenderConnectionDispatcher.DEFAULT_CAPACITY); } /** @@ -602,6 +640,7 @@ public static QwpWebSocketSender connect( int errorInboxCapacity, long durableAckKeepaliveIntervalMillis, long authTimeoutMs, + int connectTimeoutMs, SenderConnectionListener connectionListener, int connectionListenerInboxCapacity ) { @@ -613,6 +652,7 @@ public static QwpWebSocketSender connect( try { sender.requestDurableAck = requestDurableAck; sender.authTimeoutMs = authTimeoutMs; + sender.connectTimeoutMs = connectTimeoutMs; sender.closeFlushTimeoutMillis = closeFlushTimeoutMillis; sender.reconnectMaxDurationMillis = reconnectMaxDurationMillis; sender.reconnectInitialBackoffMillis = reconnectInitialBackoffMillis; @@ -918,6 +958,31 @@ public QwpWebSocketSender charColumn(CharSequence columnName, char value) { return this; } + /** + * Closes the sender: flushes user-thread state into the engine, drains + * acked data within {@code close_flush_timeout}, stops the I/O loop, + * closes the orphan-drainer pool, and frees buffers. + *

+     * Worst-case latency budget (dominant contributors, sequential):
+     * <ul>
+     *   <li>bounded drain: up to {@code close_flush_timeout} when the server
+     *       is slow or unreachable ({@code <= 0} opts out);</li>
+     *   <li>I/O loop stop: the shutdown-latch await is untimed, but the loop
+     *       exits promptly unless the I/O thread sits inside a blocking
+     *       native connect — bounded by {@code connect_timeout}, or by the
+     *       OS SYN-retry deadline (60-130s on Linux) when the default
+     *       {@code 0} is in effect. Background drainer walks never delay
+     *       this stop: they run lock-free on private round cursors and
+     *       never hold anything the foreground waits on (see
+     *       {@link #buildAndConnect});</li>
+     *   <li>drainer pool: drainers still in their connect-retry phase are
+     *       stop-signaled immediately (exit within ~50ms); drainers actively
+     *       replaying frames get a 2.5s grace window plus a 0.5s stop window
+     *       — worst case ~3s when a drainer sits in a blocking native
+     *       connect (15s background deadline) and must be abandoned to exit
+     *       on its own.</li>
+     * </ul>
+ */ @Override public void close() { if (!closed) { @@ -1014,10 +1079,14 @@ public void close() { terminalError = captureCloseError(terminalError, e); } } - // Drainer pool runs after the foreground I/O loop is wound - // down — drainers don't share state with the foreground, so - // ordering doesn't matter for correctness, just predictable - // shutdown. + // Drainer pool closes after the foreground I/O loop is wound + // down. Drainers share buildAndConnect's endpoint walk and + // hostTracker state with the foreground (never its observable + // connection state or event stream), but their + // connect gate is their own stop flag — NOT the foreground + // loop's liveness — so the pool's graceful-drain window below + // still lets in-flight drainers finish (including reconnects) + // even though cursorSendLoop is already stopped. if (drainerPool != null) { try { drainerPool.close(); @@ -1048,6 +1117,27 @@ public void close() { // The I/O thread may still be using the socket and microbatch // buffers. Freeing them would risk SIGSEGV. LOG.error("I/O thread is still running, leaking WebSocket client and microbatch buffers"); + // The engine, however, need not leak: delegate its close to + // the I/O thread's exit path, which runs it strictly after + // the thread's last engine access — the mapping and slot + // lock release as soon as the stuck wire call resolves + // (bounded by OS timeouts). slotLockReleased intentionally + // stays false: the lock is released only when the delegated + // close actually runs, so the pool must not reuse the slot + // meanwhile. A false return means the thread exited between + // the failed close() and now — then closing here is safe. 
+                if (ownsCursorEngine && cursorEngine != null && cursorSendLoop != null
+                        && !cursorSendLoop.delegateEngineClose()) {
+                    try {
+                        cursorEngine.close();
+                    } catch (Throwable t) {
+                        LOG.error("Error closing owned CursorSendEngine: {}", String.valueOf(t));
+                        terminalError = captureCloseError(terminalError, t);
+                    }
+                    cursorEngine = null;
+                    ownsCursorEngine = false;
+                    slotLockReleased = true;
+                }
                 rethrowTerminal(terminalError);
                 return;
             }
@@ -1953,6 +2043,30 @@ public CursorWebSocketSendLoop.ReconnectFactory newReconnectFactory() {
         return new ReconnectSupplier();
     }
 
+    /**
+     * Test seam: a BACKGROUND reconnect factory identical to the ones
+     * {@link #startOrphanDrainers} hands to orphan drainers (abort gate =
+     * the supplied stop flag, {@code isBackground()=true}), so tests can
+     * exercise the background side of the connect-walk lock policy (see
+     * {@link #buildAndConnect}) without reflection.
+     */
+    @TestOnly
+    public CursorWebSocketSendLoop.ReconnectFactory newBackgroundReconnectFactory(
+            java.util.function.BooleanSupplier stopFlag
+    ) {
+        return new ReconnectSupplier(stopFlag, "drainer stop requested during connect");
+    }
+
+    /**
+     * Test seam: installs the per-attempt WebSocket client factory override
+     * consulted by {@code newWebSocketClient()} inside the connect walk.
+     * Production code never sets it.
+     */
+    @TestOnly
+    public void setClientFactoryOverride(java.util.function.Supplier factory) {
+        this.clientFactoryOverride = factory;
+    }
+
     @Override
     public void reset() {
         checkNotClosed();
@@ -2035,6 +2149,33 @@ public void setCursorEngine(CursorSendEngine engine, boolean takeOwnership) {
         this.ownsCursorEngine = takeOwnership && engine != null;
     }
 
+    /**
+     * Register an async observer for background orphan-slot drainer events.
+     * May be called either before or after {@link #startOrphanDrainers} —
+     * when called before, the drainer pool picks it up as its submit-time
+     * default; when called after, it propagates to the pool AND to every
+     * live drainer (per-drainer re-assignment while running is explicitly
+     * permitted by the drainer's listener contract). Pass {@code null} to
+     * clear. {@code synchronized} to coordinate with
+     * {@code startOrphanDrainers}: a concurrent submit either observes the
+     * pool listener already set or is covered by the snapshot propagation.
+     */
+    public synchronized void setDrainerListener(BackgroundDrainerListener listener) {
+        this.drainerListener = listener;
+        BackgroundDrainerPool pool = drainerPool;
+        if (pool != null) {
+            // Submit-time fallback for drainers not yet submitted...
+            pool.setListener(listener);
+            // ...and direct re-assignment for the ones already running (the
+            // pool listener is only applied at submit time, never after).
+            ObjList live =
+                    pool.snapshot();
+            for (int i = 0, n = live.size(); i < n; i++) {
+                live.getQuick(i).setListener(listener);
+            }
+        }
+    }
+
     /**
      * Configure the user-supplied error handler. May be called either before
      * or after {@code connect()} — when called after, the change propagates
@@ -2133,18 +2274,42 @@ public synchronized void startOrphanDrainers(
         if (drainerPool == null) {
             drainerPool = new io.questdb.client.cutlass.qwp.client.sf.cursor
                     .BackgroundDrainerPool(maxBackgroundDrainers);
+            // Install the user listener as the pool's submit-time default so
+            // the drainers submitted below observe it from their first event.
+ drainerPool.setListener(this.drainerListener); } for (int i = 0, n = orphanSlotPaths.size(); i < n; i++) { String slot = orphanSlotPaths.get(i); + // The drainer's connects must NOT be gated on the foreground + // sender's lifecycle: close() stops the foreground I/O loop + // BEFORE the drainer pool's graceful-drain window, so a + // foreground-gated factory would reject every drainer + // (re)connect with "sender closed during connect" during that + // window, leaving the orphan slot un-drained (and Invariant B + // forbids quarantining it on a transport-shaped error). Gate + // each drainer's factory on the drainer's OWN stop flag + // instead. The one-element array breaks the construction cycle + // (the factory needs the drainer, the drainer's constructor + // needs the factory); the ref write happens-before the drainer + // runs because submit() publishes the task afterwards. + final io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer[] ref = + new io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer[1]; + ReconnectSupplier factory = new ReconnectSupplier( + () -> { + io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer d = ref[0]; + return d != null && d.isStopRequested(); + }, + "drainer stop requested during connect"); io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer drainer = new io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer( slot, segmentSizeBytes, sfMaxTotalBytes, - newReconnectFactory(), + factory, reconnectMaxDurationMillis, reconnectInitialBackoffMillis, reconnectMaxBackoffMillis, requestDurableAck, durableAckKeepaliveIntervalMillis); + ref[0] = drainer; drainerPool.submit(drainer); } } @@ -2282,7 +2447,7 @@ public QwpWebSocketSender uuidColumn(CharSequence columnName, long lo, long hi) * True iff this sender has at least once installed a live (connected * + upgraded) WebSocket. Sticky — once true, stays true even after a * subsequent disconnect. 
Lets a {@link SenderErrorHandler} - * disambiguate a "never reached the server" budget exhaustion (likely + * disambiguate a "never reached the server" terminal failure (likely * a config typo or firewall block) from a "lost connection after we * were up" failure (likely transient). Returns {@code false} if no * I/O loop is running. @@ -2389,25 +2554,136 @@ private void atNanos(long timestampNanos) { sendRow(); } - private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) { + /** + * Resolves the connect timeout for one {@code buildAndConnect} walk. + * Foreground connects honour the configured value verbatim: 0 (the + * default) keeps the historical untimed native connect, bounded only by + * the OS (SYN retries, 60-130s on Linux). Background (drainer) connects + * get a finite fallback instead: during an outage a drainer is routinely + * parked inside a blocking native connect that neither unpark nor + * interrupt cancels, so the drainer pool's shutdownNow path (~3s into + * sender.close()) reliably lands on the failed-stop protocol -- the + * WebSocket client and microbatch buffers are deliberately leaked and + * the slot lock is held until the OS deadline resolves the connect. A + * finite background deadline bounds that window to seconds without + * changing foreground semantics. Exposed for unit tests. + */ + @TestOnly + public static int effectiveConnectTimeoutMs(boolean background, int configuredMs) { + return background && configuredMs <= 0 ? DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS : configuredMs; + } + + /** + * Builds the per-attempt WebSocket client for {@link #buildAndConnect}. + * Production path delegates to {@link WebSocketClientFactory}; tests may + * install {@link #clientFactoryOverride} to substitute a stub. + */ + private WebSocketClient newWebSocketClient() { + java.util.function.Supplier override = clientFactoryOverride; + if (override != null) { + return override.get(); + } + return tlsConfig != null + ? 
WebSocketClientFactory.newTlsInstance(tlsConfig) + : WebSocketClientFactory.newPlainTextInstance(); + } + + /** + * Multi-endpoint connect walk shared by the foreground sender and the + * background orphan drainers. One invocation sweeps the endpoint list, + * performing a TCP/TLS connect plus a WebSocket upgrade per endpoint; + * worst-case sweep duration is + * {@code endpoints x (connect timeout + upgrade timeout)}: + *
+     * <ul>
+     *   <li>foreground walk: {@code connect_timeout} verbatim -- the default
+     *       {@code 0} keeps the untimed native connect, bounded only by the
+     *       OS SYN-retry deadline (60-130s per endpoint on Linux) -- plus
+     *       {@code auth_timeout_ms} (default 15s) for the upgrade;</li>
+     *   <li>background walk: 15s connect fallback
+     *       ({@link #DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS}) plus
+     *       {@code auth_timeout_ms} -- see
+     *       {@link #effectiveConnectTimeoutMs(boolean, int)}.</li>
+     * </ul>
+     * <p>
+ * Concurrency policy -- no network I/O under a sender-wide lock for + * background work. FOREGROUND walks (the producer's initial connect and + * the I/O loop's reconnects) hold {@link #connectWalkLock} across the + * sweep: they own the shared round state and the lifecycle commits, and + * can only ever wait behind another foreground walk (which cannot + * happen by construction -- the lock is insurance). BACKGROUND (drainer) + * walks take NO lock: each sweeps a private + * {@link QwpHostHealthTracker.RoundCursor} -- full sweep, claim-at-pick, + * ordered by the live shared health state -- and records results with + * the health-only overloads ({@code markRoundAttempted=false}), so + * concurrent drainer sweeps proceed in parallel with each other and + * with the foreground, share health observations, and can neither + * consume nor poison the foreground's round. The foreground's + * reconnect and {@code close()} paths are therefore never queued + * behind a drainer's endpoint walk. + */ + private WebSocketClient buildAndConnect(ReconnectSupplier ctx) { + if (ctx.isBackground()) { + // Lock-free: the walk below touches only internally-synchronized + // hostTracker health state and walk-local/cursor-local state on + // the background path. + return connectWalk(ctx); + } + connectWalkLock.lock(); + try { + return connectWalk(ctx); + } finally { + connectWalkLock.unlock(); + } + } + + private WebSocketClient connectWalk(ReconnectSupplier ctx) { + // Background (drainer) factories share this connect walk -- endpoint + // list and hostTracker HEALTH state (never the shared round: a + // background sweep walks its own RoundCursor and records with + // markRoundAttempted=false, so it cannot consume the foreground's + // round or skew roundSeq) -- but must stay INVISIBLE + // in the foreground sender's observable state. 
SenderConnectionEvents + // describe the FOREGROUND connection's lifecycle, and the cap-derived + // sizing (serverMaxBatchSize / effectiveAutoFlushBytes) guards the + // FOREGROUND wire: a drainer connect that committed either would + // fabricate lifecycle transitions the foreground never had, steal the + // once-per-lifetime CONNECTED classification, and re-size the + // producer's batch guard for a connection the producer is not on + // (oversize batch -> ws-close[1009] -> producer-terminal HALT caused + // by background activity). + final boolean background = ctx.isBackground(); + // Private full-sweep cursor for background walks: claim-at-pick over + // cursor-local attempted bits makes the pick -> record pair safe + // without any walk-wide lock, and guarantees every sweep tries every + // endpoint exactly once regardless of concurrent walkers. + final QwpHostHealthTracker.RoundCursor cursor = + background ? hostTracker.newRoundCursor() : null; int previousIdx = ctx.previousIdx; if (previousIdx >= 0) { // Mid-stream wire failure -- the I/O loop just observed the active - // connection drop and called us via the reconnect factory. Surface - // a DISCONNECTED event identifying which endpoint just went away - // before we start the per-endpoint walk for a replacement. - Endpoint priorEp = endpoints.get(previousIdx); - dispatchConnectionEvent( - SenderConnectionEvent.Kind.DISCONNECTED, - priorEp.host, priorEp.port, - null, SenderConnectionEvent.NO_PORT, - SenderConnectionEvent.NO_ATTEMPT_NUMBER, - roundSeq, - null); + // connection drop and called us via the reconnect factory. Only a + // FOREGROUND drop surfaces DISCONNECTED: a drainer's wire drop is + // not a foreground outage, and reporting it would claim an outage + // against an endpoint the foreground may be healthily using. The + // hostTracker health penalty is recorded either way -- the drop + // was real, whichever loop observed it. 
+ if (!background) { + Endpoint priorEp = endpoints.get(previousIdx); + dispatchConnectionEvent( + SenderConnectionEvent.Kind.DISCONNECTED, + priorEp.host, priorEp.port, + null, SenderConnectionEvent.NO_PORT, + SenderConnectionEvent.NO_ATTEMPT_NUMBER, + roundSeq, + null); + } hostTracker.recordMidStreamFailure(previousIdx); ctx.previousIdx = -1; } - if (hostTracker.isRoundExhausted()) { + // Shared-round lifecycle is foreground-only: a background walk must + // not advance the round (or roundSeq, which numbers foreground + // events) under the foreground's feet. + if (!background && hostTracker.isRoundExhausted()) { roundSeq++; hostTracker.beginRound(true); } @@ -2424,21 +2700,25 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) { QwpIngressRoleRejectedException lastRoleReject = null; Endpoint lastEndpoint = null; while (true) { - if (cursorSendLoop == null ? closed : !cursorSendLoop.isRunning()) { - throw new LineSenderException("sender closed during connect"); + if (ctx.isAborted()) { + throw new LineSenderException(ctx.abortMessage()); } - int idx = hostTracker.pickNext(); + int idx = background ? cursor.next() : hostTracker.pickNext(); if (idx < 0) break; Endpoint ep = endpoints.get(idx); lastEndpoint = ep; - long attemptNumber = ++roundConnectAttemptSeq; - WebSocketClient newClient = tlsConfig != null - ? WebSocketClientFactory.newTlsInstance(tlsConfig) - : WebSocketClientFactory.newPlainTextInstance(); + // Attempt numbers exist for foreground observability only. A + // background walk fires no events and must not skew the numbering + // the user sees on subsequent foreground events. + long attemptNumber = background + ? 
SenderConnectionEvent.NO_ATTEMPT_NUMBER + : ++roundConnectAttemptSeq; + WebSocketClient newClient = newWebSocketClient(); try { newClient.setQwpMaxVersion(QwpConstants.VERSION); newClient.setQwpClientId(QwpConstants.CLIENT_ID); newClient.setQwpRequestDurableAck(requestDurableAck); + newClient.setConnectTimeout(effectiveConnectTimeoutMs(background, connectTimeoutMs)); newClient.connect(ep.host, ep.port); int upgradeTimeoutMs = (int) Math.min(authTimeoutMs, Integer.MAX_VALUE); newClient.upgrade(WRITE_PATH, upgradeTimeoutMs, authorizationHeader); @@ -2447,13 +2727,15 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) { newClient.close(); if (classified instanceof QwpIngressRoleRejectedException) { QwpIngressRoleRejectedException re = (QwpIngressRoleRejectedException) classified; - hostTracker.recordRoleReject(idx, re.isTransient()); + hostTracker.recordRoleReject(idx, re.isTransient(), !background); lastError = re; lastRoleReject = re; - dispatchConnectionEvent( - SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED, - ep.host, ep.port, null, SenderConnectionEvent.NO_PORT, - attemptNumber, roundSeq, re); + if (!background) { + dispatchConnectionEvent( + SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED, + ep.host, ep.port, null, SenderConnectionEvent.NO_PORT, + attemptNumber, roundSeq, re); + } continue; } if (classified instanceof QwpAuthFailedException) { @@ -2463,10 +2745,12 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) { // moment the I/O thread gives up, ahead of the producer // thread learning via LineSenderException on the next // API call. 
- dispatchConnectionEvent( - SenderConnectionEvent.Kind.AUTH_FAILED, - ep.host, ep.port, null, SenderConnectionEvent.NO_PORT, - attemptNumber, roundSeq, classified); + if (!background) { + dispatchConnectionEvent( + SenderConnectionEvent.Kind.AUTH_FAILED, + ep.host, ep.port, null, SenderConnectionEvent.NO_PORT, + attemptNumber, roundSeq, classified); + } throw classified; } if (terminalUpgradeError == null && ( @@ -2475,41 +2759,76 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) { && !((WebSocketUpgradeException) classified).isRoleMismatch()))) { terminalUpgradeError = classified; } - hostTracker.recordTransportError(idx); + hostTracker.recordTransportError(idx, !background); lastError = classified; - dispatchConnectionEvent( - SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED, - ep.host, ep.port, null, SenderConnectionEvent.NO_PORT, - attemptNumber, roundSeq, classified); + if (!background) { + dispatchConnectionEvent( + SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED, + ep.host, ep.port, null, SenderConnectionEvent.NO_PORT, + attemptNumber, roundSeq, classified); + } continue; } catch (Exception e) { newClient.close(); - hostTracker.recordTransportError(idx); + hostTracker.recordTransportError(idx, !background); lastError = e; - dispatchConnectionEvent( - SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED, - ep.host, ep.port, null, SenderConnectionEvent.NO_PORT, - attemptNumber, roundSeq, e); + if (!background) { + dispatchConnectionEvent( + SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED, + ep.host, ep.port, null, SenderConnectionEvent.NO_PORT, + attemptNumber, roundSeq, e); + } continue; + } catch (Error e) { + // JVM failure (OOM, LinkageError, StackOverflowError) during + // connect/upgrade. Without this catch the half-built client + // escaped with its fd and native buffers open -- unreachable + // by GC, freed only in close(). 
Close it quietly: under OOM + // close() itself can throw, and a secondary failure must not + // mask the original Error. Deliberately NO hostTracker penalty + // and NO ENDPOINT_ATTEMPT_FAILED event -- a JVM failure is not + // endpoint health data, and misclassifying it would poison the + // walk. Rethrow: every retry loop upstream (connectWithRetry, + // the cursor reconnect loop, BackgroundDrainer) rethrows Error + // rather than retrying, so this stays a loud one-shot failure. + try { + newClient.close(); + } catch (Throwable ignored) { + // best-effort; the original Error is what must surface + } + throw e; } if (requestDurableAck && !newClient.isServerDurableAckEnabled()) { newClient.close(); - hostTracker.recordRoleReject(idx, false); + hostTracker.recordRoleReject(idx, false, !background); QwpDurableAckMismatchException ackErr = new QwpDurableAckMismatchException( ep.host, ep.port, null); if (terminalUpgradeError == null) { terminalUpgradeError = ackErr; } lastError = ackErr; - dispatchConnectionEvent( - SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED, - ep.host, ep.port, null, SenderConnectionEvent.NO_PORT, - attemptNumber, roundSeq, ackErr); + if (!background) { + dispatchConnectionEvent( + SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED, + ep.host, ep.port, null, SenderConnectionEvent.NO_PORT, + attemptNumber, roundSeq, ackErr); + } continue; } - int previousLiveIdx = currentEndpointIdx; - hostTracker.recordSuccess(idx); + hostTracker.recordSuccess(idx, !background); ctx.previousIdx = idx; + if (background) { + // Walk bookkeeping only: recordSuccess feeds the shared health + // tracker and ctx.previousIdx arms this factory's own + // mid-stream-failure handling on its next reconnect. 
No + // lifecycle event, no CONNECTED/RECONNECTED/FAILED_OVER + // classification state, no producer batch re-sizing -- the + // drainer's lifecycle is observable via + // BackgroundDrainerListener and the drainer counters, never + // the foreground connection-event stream. + return newClient; + } + int previousLiveIdx = currentEndpointIdx; currentEndpointIdx = idx; // Classify the success. CONNECTED only fires once per sender // lifetime; subsequent successes are RECONNECTED (same endpoint @@ -2550,7 +2869,7 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) { // which terminal branch fires next. The connectLoop wrapper retries, // and each retry that re-enters this method and fails again produces // its own ALL_ENDPOINTS_UNREACHABLE event. - if (lastEndpoint != null) { + if (!background && lastEndpoint != null) { dispatchConnectionEvent( SenderConnectionEvent.Kind.ALL_ENDPOINTS_UNREACHABLE, lastEndpoint.host, lastEndpoint.port, @@ -2561,21 +2880,23 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) { throw terminalUpgradeError; } if (lastRoleReject != null) { - // When the client opted into durable ack but every endpoint - // role-rejected the /write/v4 upgrade (typically a misconfigured - // address list pointing at replicas only), a primary that can - // serve durable ack will not appear by retrying. Throw the typed - // QwpDurableAckMismatchException -- the cursor send loop's terminal - // classifier recognises it by instanceof and suppresses retry, so - // the SYNC/ASYNC connect paths fail fast instead of burning the - // full reconnect_max_duration_millis budget walking the same - // replicas. 
- if (requestDurableAck) { - QwpDurableAckMismatchException ackErr = new QwpDurableAckMismatchException( - lastRoleReject.getHost(), lastRoleReject.getPort(), lastRoleReject.getRole()); - ackErr.initCause(lastRoleReject); - throw ackErr; - } + // Every endpoint role-rejected the /write/v4 upgrade: right now the + // reachable nodes are all replicas (or primary-catchup). That is a + // TRANSIENT failover window, not a terminal condition -- a replica + // can be promoted and a primary will reappear. Surface it as a + // retriable QwpRoleMismatchException so the SYNC/ASYNC connect and + // reconnect loops keep the rows in store-and-forward and retry + // within reconnect_max_duration_millis (for an SF sender the only + // terminal condition is SF exhaustion). + // + // This holds even when durable ack was requested: a replica that + // gets promoted serves durable ack, so an all-replica window must + // NOT be reported as a durable-ack mismatch. Doing so conflated a + // transient role state with a permanent capability gap and hard- + // failed HA senders that should have recovered on promotion. A + // genuine capability gap -- an endpoint that upgrades but does not + // advertise durable ack -- is still terminal: it is raised as + // terminalUpgradeError above, before this block. QwpRoleMismatchException ex = new QwpRoleMismatchException( QwpIngressRoleRejectedException.ROLE_PRIMARY, null, @@ -2811,8 +3132,9 @@ private void ensureConnected() { // version today). Frames written before the first successful // connect commit to V1 because cursor segments are immutable; // a future version bump must account for that. Auth/upgrade - // rejects and budget exhaustion are surfaced via the error - // inbox by the I/O thread, not thrown here. + // rejects are surfaced via the error inbox by the I/O + // thread, not thrown here; plain connect failures retry + // indefinitely (Invariant B). 
client = null; break; case OFF: @@ -2854,10 +3176,11 @@ private void ensureConnected() { } cursorSendLoop.setProgressDispatcher(progressDispatcher); // Connection-event dispatcher: lets the cursor I/O loop fire - // DISCONNECTED on outage entry and RECONNECT_BUDGET_EXHAUSTED on - // budget exit. Sender-side fire points (buildAndConnect) write - // directly to connectionDispatcher; this getter just shares the - // same instance with the loop. + // DISCONNECTED on outage entry. Sender-side fire points + // (buildAndConnect) write directly to connectionDispatcher; this + // getter just shares the same instance with the loop. (Invariant B: + // the loop no longer fires a terminal budget-exhaustion event -- it + // retries indefinitely.) cursorSendLoop.setConnectionDispatcher(connectionDispatcher); cursorSendLoop.start(); } catch (Throwable t) { @@ -3326,8 +3649,48 @@ public Endpoint(String host, int port) { } private final class ReconnectSupplier implements CursorWebSocketSendLoop.ReconnectFactory { + /** + * Optional caller-owned liveness gate. {@code null} means this factory + * serves the foreground sender and aborts when the foreground I/O loop + * stops. Non-null means the factory serves a {@code BackgroundDrainer}: + * the drainer must be able to (re)connect during the sender's close + * sequence (the drainer pool's graceful-drain window runs AFTER the + * foreground loop is stopped), so its gate is the drainer's own stop + * flag, supplied here, instead of the foreground loop's state. + */ + private final java.util.function.BooleanSupplier abortCheck; + private final String abortMessage; private int previousIdx = -1; + private ReconnectSupplier() { + this(null, null); + } + + private ReconnectSupplier(java.util.function.BooleanSupplier abortCheck, String abortMessage) { + this.abortCheck = abortCheck; + this.abortMessage = abortMessage; + } + + String abortMessage() { + return abortCheck != null ? 
abortMessage : "sender closed during connect"; + } + + /** + * True when this factory serves a background drainer. Background + * connects share buildAndConnect's endpoint walk and hostTracker + * health state, but commit none of the foreground sender's + * observable connection state and fire no connection events. + */ + boolean isBackground() { + return abortCheck != null; + } + + boolean isAborted() { + return abortCheck != null + ? abortCheck.getAsBoolean() + : (cursorSendLoop == null ? closed : !cursorSendLoop.isRunning()); + } + @Override public WebSocketClient reconnect() { return buildAndConnect(this); diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainer.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainer.java index d54a01dc..d3e42602 100644 --- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainer.java +++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainer.java @@ -25,7 +25,12 @@ package io.questdb.client.cutlass.qwp.client.sf.cursor; import io.questdb.client.cutlass.http.client.WebSocketClient; +import io.questdb.client.cutlass.http.client.WebSocketUpgradeException; +import io.questdb.client.cutlass.qwp.client.QwpAuthFailedException; import io.questdb.client.cutlass.qwp.client.QwpDurableAckMismatchException; +import io.questdb.client.cutlass.qwp.client.QwpIngressRoleRejectedException; +import io.questdb.client.cutlass.qwp.client.QwpRoleMismatchException; +import io.questdb.client.cutlass.qwp.client.QwpVersionMismatchException; import org.jetbrains.annotations.TestOnly; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -50,27 +55,48 @@ *

  • Close everything in reverse order; release the lock.
  • * *

    - * On terminal failure (auth-rejection on reconnect, reconnect-budget - * exhaustion, recovery error), the drainer drops a - * {@link OrphanScanner#FAILED_SENTINEL_NAME} sentinel into the slot - * before exiting. Future scans skip the slot until an operator clears - * the sentinel — bounded automatic retry, then human-in-the-loop. + * On terminal failure (auth-rejection on reconnect, a cluster-wide durable-ack + * capability gap that exhausts its settle budget, recovery error), the drainer + * drops a {@link OrphanScanner#FAILED_SENTINEL_NAME} sentinel into the slot + * before exiting. Future scans skip the slot until an operator clears the + * sentinel — bounded automatic retry, then human-in-the-loop. A transient + * all-replica failover window is NOT terminal: it is retried indefinitely + * (Invariant B), never quarantined on a wall-clock budget or attempt cap. */ public final class BackgroundDrainer implements Runnable { /** * Cap on consecutive {@link QwpDurableAckMismatchException} attempts at * initial connect before the drainer escalates to a {@code .failed} - * sentinel. The wall-clock budget {@code reconnectMaxDurationMillis} - * also caps the same loop; whichever is hit first triggers escalation. - * 16 attempts gives the cluster room to settle through a rolling - * upgrade (each attempt walks every endpoint internally) without - * letting a genuine cluster-wide misconfig hang the drainer forever. + * sentinel. Applies ONLY to a genuine cluster-wide durable-ack capability + * gap (a server that upgrades but does not advertise durable ack); a + * transient all-replica failover window (role reject) is retried + * indefinitely and is never subject to this cap (Invariant B). The + * wall-clock budget {@code reconnectMaxDurationMillis} also caps this + * capability-gap loop; whichever is hit first triggers escalation. 
Both + * halves of the budget measure a capability-gap episode: the + * wall clock accumulates only across uninterrupted gap-to-gap intervals + * (never before the first gap is observed, and never across an + * intervening transport window -- an unreachable cluster is not + * "failing to settle"), and an intervening role reject restarts the + * episode -- it proves the topology changed, so the next capability-gap + * error is a fresh episode against a newly promoted node. 16 + * attempts gives the cluster room to settle through a rolling upgrade + * (each attempt walks every endpoint internally) without letting a genuine + * cluster-wide misconfig hang the drainer forever. */ public static final int DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS = 16; private static final Logger LOG = LoggerFactory.getLogger(BackgroundDrainer.class); /** How often to wake and re-check ackedFsn vs target. */ private static final long POLL_NANOS = 50_000_000L; // 50 ms + /** + * Upper bound on a single backoff park so {@link #requestStop()} is + * honored promptly even without the unpark (e.g. a permit consumed by + * an earlier spurious wakeup). Keeps the pool's post-stop grace window + * ({@code BackgroundDrainerPool.STOP_GRACE_MILLIS}) meaningful: a + * stopping drainer wakes at least every 50ms to re-check the flag. + */ + private static final long STOP_CHECK_PARK_CHUNK_NANOS = 50_000_000L; // 50 ms private final CursorWebSocketSendLoop.ReconnectFactory clientFactory; private final long durableAckKeepaliveIntervalMillis; private final long reconnectInitialBackoffMillis; @@ -92,6 +118,13 @@ public final class BackgroundDrainer implements Runnable { */ private volatile BackgroundDrainerListener listener; private volatile DrainOutcome outcome = DrainOutcome.PENDING; + /** + * Thread currently executing {@link #run()} (or a direct + * {@link #connectWithDurableAckRetry()} call from tests). 
Lets + * {@link #requestStop()} unpark a drainer sleeping in a backoff or + * poll park instead of waiting for the park to elapse. + */ + private volatile Thread runnerThread; private volatile boolean stopRequested; public BackgroundDrainer( @@ -129,7 +162,10 @@ public BackgroundDrainer() { } /** - * Initial connect with retry on whole-cluster durable-ack unavailability. + * Budgeted connect with retry on whole-cluster durable-ack unavailability: + * the initial connect, and re-entered from {@link #run()} whenever a + * mid-drain reconnect sweep hits the same capability gap (each re-entry + * is a fresh episode -- a successful connect ended the previous one). * The wrapped {@code clientFactory.reconnect()} already walks every * configured endpoint per attempt and only throws * {@link QwpDurableAckMismatchException} when none of them advertise @@ -146,36 +182,137 @@ public BackgroundDrainer() { * budget, the drainer drops a {@code .failed} sentinel and exits * exactly as the original single-shot path did. *

    - * Other exceptions (auth failure, version mismatch, transport error, - * etc.) preserve the original behavior: mark failed, exit. They are - * either terminal in their own right or already retried inside - * {@code reconnect()}. + * The budget measures a capability-gap episode: consecutive + * {@link QwpDurableAckMismatchException} sweeps only. Transient + * conditions -- an all-replica failover window (role reject) or a + * transport error -- are retried indefinitely (Invariant B) and never + * consume the budget: the wall-clock half accumulates only across + * uninterrupted gap-to-gap intervals, so a mid-episode transport window + * pauses the clock (without touching the attempt count), and a role + * reject additionally restarts the episode, because it proves the + * topology changed under the rolling upgrade. + * Genuine terminals (auth failure, non-421 upgrade reject) preserve + * the original behavior: mark failed, exit. * * @return a fresh durable-ack-capable client, or {@code null} if * {@link #outcome} has been set to FAILED or STOPPED */ @TestOnly public WebSocketClient connectWithDurableAckRetry() { - long startNanos = System.nanoTime(); - long deadlineNanos = startNanos + reconnectMaxDurationMillis * 1_000_000L; + // run() already set runnerThread; setting it again here is a no-op + // on that path but wires up direct @TestOnly calls so requestStop() + // can unpark them too. + runnerThread = Thread.currentThread(); long backoffMillis = reconnectInitialBackoffMillis; - int mismatchAttempts = 0; + // Capability-gap settle budget. Counts ONLY consecutive + // QwpDurableAckMismatchException sweeps; the wall-clock half + // accumulates ONLY across uninterrupted gap-to-gap intervals, so + // transient churn (role reject, transport) can never burn the budget + // -- neither before the first gap is observed nor mid-episode (a + // cluster unreachable for longer than the whole budget that comes + // back still gapped has consumed none of it). 
An intervening role + // reject resets the episode (topology churn: the offending node is + // gone); a transport error neither increments nor resets the attempt + // count -- a dropped socket does not prove promotion churn, and + // resetting on it would let a flaky-but-misconfigured cluster evade + // the cap forever -- it only pauses the wall clock: the gap-to-gap + // interval spanning the transport window is not charged. + int capabilityGapAttempts = 0; + // Wall-clock time accumulated across uninterrupted gap-to-gap + // intervals of the current episode; escalates once it reaches + // capabilityGapBudgetNanos (or the attempt cap fires first). + long capabilityGapElapsedNanos = 0L; + // Timestamp of the previous capability-gap sweep; 0 = the next gap + // charges nothing (episode start, post-role-reject restart, or the + // interval was interrupted by a transport window). + long lastCapabilityGapNanos = 0L; + final long capabilityGapBudgetNanos = reconnectMaxDurationMillis * 1_000_000L; + // Observability-only counter for the transient all-replica window; + // never consulted for escalation (Invariant B). + int roleRejectAttempts = 0; + // Throttle the all-replica retry WARN to one per 5s: a real failover + // window can last minutes and (Invariant B) is retried indefinitely, so + // per-attempt logging would flood. Mirrors CursorWebSocketSendLoop. + long lastReplicaWarnNanos = 0L; + long lastTransportWarnNanos = 0L; while (!stopRequested) { + // True only for a genuine durable-ack CAPABILITY gap, which is + // bounded by the settle budget / attempt cap. A transient all-replica + // failover window (role reject) is retried indefinitely under + // Invariant B and leaves this false, so its backoff is never clamped + // to the deadline (which would otherwise busy-loop once past it). 
+ boolean boundedByBudget = false; try { return clientFactory.reconnect(); + } catch (QwpAuthFailedException | WebSocketUpgradeException e) { + // Genuinely non-retriable across the cluster (auth 401/403, or a + // non-421 upgrade reject): waiting will not fix it, so quarantine + // immediately -- exactly as the live sender's background loop + // (CursorWebSocketSendLoop.connectLoop) halts on these errors. + String msg = e.getMessage(); + LOG.error("drainer terminal upgrade/auth error for slot {}: {}", slotPath, msg); + lastErrorMessage = msg; + OrphanScanner.markFailed(slotPath, "auth/upgrade: " + msg); + outcome = DrainOutcome.FAILED; + return null; + } catch (QwpRoleMismatchException | QwpIngressRoleRejectedException e) { + // INVARIANT B: every reachable endpoint is a REPLICA right now. + // A replica is promotable and a primary will reappear, so this is + // a TRANSIENT failover window, NOT a capability gap. The drainer + // must keep retrying (capped backoff) until a primary is reachable, + // stopRequested, or SF exhaustion -- it must NEVER quarantine the + // slot on a wall-clock budget or an attempt cap. Surface the + // per-attempt observability callback, then back off and retry. + roleRejectAttempts++; + // Topology is mid-churn: whatever node produced any earlier + // capability-gap errors is no longer the primary the next + // sweep hits, so the gap episode (attempts + wall clock) + // restarts and the next gap gets the full settle budget. 
+ capabilityGapAttempts = 0; + capabilityGapElapsedNanos = 0L; + lastCapabilityGapNanos = 0L; + BackgroundDrainerListener l = listener; + if (l != null) { + try { + l.onPrimaryUnavailable(slotPath, roleRejectAttempts); + } catch (Throwable cb) { + LOG.warn("drainer listener onPrimaryUnavailable threw: {}", + cb.getMessage()); + } + } + long nowWarn = System.nanoTime(); + if (nowWarn - lastReplicaWarnNanos >= 5_000_000_000L) { + LOG.warn("drainer slot {} attempt {}: all endpoints are replicas " + + "(transient failover window), retrying after backoff", + slotPath, roleRejectAttempts); + lastReplicaWarnNanos = nowWarn; + } } catch (QwpDurableAckMismatchException e) { - mismatchAttempts++; + // Genuine cluster-wide durable-ack CAPABILITY gap: a server + // upgraded but does not advertise durable ack. Unlike a role + // reject this will not clear by waiting for a promotion, so it + // stays terminal for the drainer -- give the cluster a bounded + // settle budget (rolling upgrade), then quarantine the slot. + capabilityGapAttempts++; long now = System.nanoTime(); - long elapsedMs = (now - startNanos) / 1_000_000L; - boolean exhausted = mismatchAttempts >= DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS - || now >= deadlineNanos; + if (lastCapabilityGapNanos != 0L) { + // Charge only the interval since the PREVIOUS gap sweep, + // and only when no transient error interrupted it. Time + // spent in a transient window -- before the first gap or + // between two gaps -- is never charged to the episode. 
+ capabilityGapElapsedNanos += now - lastCapabilityGapNanos; + } + lastCapabilityGapNanos = now; + long elapsedMs = capabilityGapElapsedNanos / 1_000_000L; + boolean exhausted = capabilityGapAttempts >= DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS + || capabilityGapElapsedNanos >= capabilityGapBudgetNanos; BackgroundDrainerListener l = listener; if (exhausted) { LOG.error("drainer giving up on slot {} after {} durable-ack-mismatch attempts ({}ms): {}", - slotPath, mismatchAttempts, elapsedMs, e.getMessage()); + slotPath, capabilityGapAttempts, elapsedMs, e.getMessage()); if (l != null) { try { - l.onDurableAckPersistentFailure(slotPath, mismatchAttempts, elapsedMs); + l.onDurableAckPersistentFailure(slotPath, capabilityGapAttempts, elapsedMs); } catch (Throwable cb) { LOG.warn("drainer listener onDurableAckPersistentFailure threw: {}", cb.getMessage()); @@ -184,36 +321,95 @@ public WebSocketClient connectWithDurableAckRetry() { lastErrorMessage = e.getMessage(); OrphanScanner.markFailed(slotPath, "durable-ack persistently unavailable after " - + mismatchAttempts + " attempts: " + e.getMessage()); + + capabilityGapAttempts + " attempts: " + e.getMessage()); outcome = DrainOutcome.FAILED; return null; } + boundedByBudget = true; if (l != null) { try { - l.onDurableAckUnavailable(slotPath, mismatchAttempts); + l.onDurableAckUnavailable(slotPath, capabilityGapAttempts); } catch (Throwable cb) { LOG.warn("drainer listener onDurableAckUnavailable threw: {}", cb.getMessage()); } } LOG.warn("drainer slot {} attempt {}: durable-ack unavailable, retrying after backoff", - slotPath, mismatchAttempts); + slotPath, capabilityGapAttempts); } catch (Throwable t) { - String msg = t.getMessage(); - LOG.error("drainer initial connect failed for slot {}: {}", slotPath, msg); - lastErrorMessage = msg; - OrphanScanner.markFailed(slotPath, "initial connect: " + msg); - outcome = DrainOutcome.FAILED; - return null; + if (t instanceof Error) { + // java.lang.Error (OOM, LinkageError, 
StackOverflowError) + // is a JVM/programming failure, not a transport outage: + // retrying cannot clear it, and spinning here would pin + // the slot .lock forever with no .failed sentinel and only + // a throttled, possibly-null-message WARN as a trace. + // Rethrow: run()'s outer catch quarantines the slot + // (markFailed + FAILED) and its finally releases the lock + // -- quarantine-and-exit, exactly as genuine terminals do. + throw (Error) t; + } + // INVARIANT B: a transport failure -- the whole cluster is + // unreachable right now (server down, network partition) -- is + // TRANSIENT, exactly as the live sender's background loop treats + // it. The server will come back; keep retrying (capped backoff) + // until it does, stopRequested, or SF exhaustion. NEVER quarantine + // the slot on a transport error. Genuine terminals (auth / + // non-421 upgrade / durable-ack capability gap) are handled by the + // catches above and still fail fast. A QWP version mismatch also + // reaches here (it extends HttpClientException, not + // WebSocketUpgradeException) and is intentionally retried under + // Invariant B -- but it is NOT a transport outage, so log it + // truthfully below rather than mislabelling it "cluster unreachable". + lastErrorMessage = t.getMessage(); + // Pause the episode wall clock: the gap-to-gap interval this + // window interrupts is never charged. Attempts and elapsed + // already accumulated are preserved (anti-evasion: see the + // budget comment above). + lastCapabilityGapNanos = 0L; + long nowWarn = System.nanoTime(); + if (nowWarn - lastTransportWarnNanos >= 5_000_000_000L) { + if (t instanceof QwpVersionMismatchException) { + // The cluster IS reachable: every endpoint completed the + // WebSocket upgrade but advertised a QWP protocol version + // this client cannot speak. 
A rolling upgrade clears this + // once peers converge, so Invariant B keeps retrying -- but + // if it persists the client binary is version-incompatible + // with the whole cluster and an operator must intervene + // (upgrade the client or the servers). Name the real + // condition so it is diagnosable, not hidden behind a + // network-outage message. + LOG.warn("drainer slot {}: every reachable endpoint advertises an unsupported " + + "QWP protocol version ({}); retrying (rolling-upgrade window) -- " + + "if this persists the client is version-incompatible with the cluster", + slotPath, t.getMessage()); + } else { + LOG.warn("drainer slot {}: cluster unreachable ({}), retrying after backoff", + slotPath, t.getMessage()); + } + lastTransportWarnNanos = nowWarn; + } } // Backoff before the next sweep. Honor stopRequested by parking in // small chunks rather than a single long park so close() doesn't - // wait for a full sleep to elapse. + // wait for a full sleep to elapse. Only the bounded (capability-gap) + // path clamps to the remaining budget (the post-gap sleep is charged + // to the episode by the next gap sweep) so it escalates promptly once + // the accumulated gap-time runs out; the transient failover path + // retries indefinitely and just backs off (capped exponential), + // never busy-looping past an exhausted budget. 
long jitter = ThreadLocalRandom.current().nextLong(Math.max(1L, backoffMillis)); - long sleepMillis = Math.min(backoffMillis + jitter, - Math.max(0L, (deadlineNanos - System.nanoTime()) / 1_000_000L)); + long sleepMillis = backoffMillis + jitter; + if (boundedByBudget) { + sleepMillis = Math.min(sleepMillis, + Math.max(0L, (capabilityGapBudgetNanos - capabilityGapElapsedNanos) / 1_000_000L)); + } if (sleepMillis > 0L && !stopRequested) { - LockSupport.parkNanos(sleepMillis * 1_000_000L); + long parkDeadlineNanos = System.nanoTime() + sleepMillis * 1_000_000L; + long remaining; + while (!stopRequested + && (remaining = parkDeadlineNanos - System.nanoTime()) > 0L) { + LockSupport.parkNanos(Math.min(remaining, STOP_CHECK_PARK_CHUNK_NANOS)); + } } backoffMillis = Math.min(backoffMillis * 2L, reconnectMaxBackoffMillis); } @@ -240,10 +436,24 @@ public DrainOutcome outcome() { public void requestStop() { stopRequested = true; + // Wake the drainer out of any backoff/poll park immediately so the + // pool's bounded stop-grace window is spent unwinding (release slot + // lock, close engine), not sleeping out the remainder of a capped + // exponential backoff. + Thread t = runnerThread; + if (t != null) { + LockSupport.unpark(t); + } + } + + /** True once {@link #requestStop()} has been called. */ + public boolean isStopRequested() { + return stopRequested; } @Override public void run() { + runnerThread = Thread.currentThread(); CursorSendEngine engine = null; WebSocketClient client = null; CursorWebSocketSendLoop loop = null; @@ -278,37 +488,92 @@ public void run() { // already dropped on the FAILED path. return; } - loop = new CursorWebSocketSendLoop( - client, engine, - 0L, CursorWebSocketSendLoop.DEFAULT_PARK_NANOS, - clientFactory, - reconnectMaxDurationMillis, - reconnectInitialBackoffMillis, - reconnectMaxBackoffMillis, - requestDurableAck, - durableAckKeepaliveIntervalMillis); - loop.start(); - + // One iteration per wire session. 
Re-entered ONLY when a mid-drain + // reconnect sweep hit a durable-ack CAPABILITY gap: that is the + // exact rolling-upgrade condition the settle budget in + // connectWithDurableAckRetry() exists for, so it must not + // quarantine on the first sweep the way the initial-connect path + // never does. The engine stays alive across sessions (it holds the + // slot lock; only loop + client are recycled), and target remains + // valid -- the slot is orphaned, nothing appends to it. + drain: while (!stopRequested) { - long acked = engine.ackedFsn(); - this.ackedFsn = acked; - if (acked >= target) { - outcome = DrainOutcome.SUCCESS; - LOG.info("drainer fully drained slot {} (target={}, acked={})", - slotPath, target, acked); - return; - } - try { - loop.checkError(); - } catch (Throwable t) { - String msg = t.getMessage(); - LOG.error("drainer wire error for slot {}: {}", slotPath, msg); - lastErrorMessage = msg; - OrphanScanner.markFailed(slotPath, "wire: " + msg); - outcome = DrainOutcome.FAILED; - return; + loop = new CursorWebSocketSendLoop( + client, engine, + 0L, CursorWebSocketSendLoop.DEFAULT_PARK_NANOS, + clientFactory, + reconnectMaxDurationMillis, + reconnectInitialBackoffMillis, + reconnectMaxBackoffMillis, + requestDurableAck, + durableAckKeepaliveIntervalMillis); + loop.start(); + + while (!stopRequested) { + long acked = engine.ackedFsn(); + this.ackedFsn = acked; + if (acked >= target) { + outcome = DrainOutcome.SUCCESS; + LOG.info("drainer fully drained slot {} (target={}, acked={})", + slotPath, target, acked); + return; + } + try { + loop.checkError(); + } catch (Throwable t) { + if (loop.capabilityGapTerminal() != null) { + // Capability gap mid-drain: recycle the wire, NOT + // the slot. connectWithDurableAckRetry() owns the + // episode budget (16 consecutive gap sweeps / + // wall clock) and drops the sentinel itself if the + // gap persists. 
The loop's own failed sweep is not + // counted toward the fresh episode -- an off-by-one + // that is immaterial at budget 16. + LOG.warn("drainer slot {}: durable-ack capability gap " + + "mid-drain ({}), re-entering settle budget", + slotPath, t.getMessage()); + try { + loop.close(); + } catch (Throwable closeFailure) { + // Interrupted shutdown mid-recycle (pool + // shutdownNow): the old I/O thread is still + // alive, so opening a new wire session against + // the same engine would race its exit — and + // closing the client under a possibly mid-send + // thread risks SEGV. Bail out; the finally + // re-runs loop.close(), which re-signals the + // failed stop and routes client/engine + // teardown to the delegation protocol there. + LOG.warn("drainer slot {}: stop requested mid-recycle and the " + + "I/O thread did not stop ({}); abandoning recycle", + slotPath, closeFailure.getMessage()); + outcome = stopRequested ? DrainOutcome.STOPPED : DrainOutcome.FAILED; + return; + } + try { + client.close(); + } catch (Throwable ignored) { + } + loop = null; + client = connectWithDurableAckRetry(); + if (client == null) { + // outcome already set (FAILED after budget + // exhaustion, or STOPPED); sentinel handled. + return; + } + continue drain; + } + String msg = t.getMessage(); + LOG.error("drainer wire error for slot {}: {}", slotPath, msg); + lastErrorMessage = msg; + OrphanScanner.markFailed(slotPath, "wire: " + msg); + outcome = DrainOutcome.FAILED; + return; + } + java.util.concurrent.locks.LockSupport.parkNanos(POLL_NANOS); } - java.util.concurrent.locks.LockSupport.parkNanos(POLL_NANOS); + // Inner loop exits only on stopRequested; fall through to the + // outer condition, which is false for the same reason. 
} outcome = DrainOutcome.STOPPED; } catch (Throwable t) { @@ -333,25 +598,56 @@ public void run() { } outcome = DrainOutcome.FAILED; } finally { + boolean ioThreadStopped = true; if (loop != null) { try { loop.close(); - } catch (Throwable ignored) { + } catch (Throwable e) { + // The loop's I/O thread would not stop — close() was + // interrupted (the pool's shutdownNow path) while the + // thread sat in a blocking native connect/send that + // neither unpark nor interrupt cancels. Freeing the + // client's buffers or unmapping the engine now would + // race the live thread (C5 SEGV); both are delegated to + // the thread's own exit path below. + ioThreadStopped = false; + LOG.warn("drainer slot {}: I/O thread did not stop during close ({}); " + + "delegating client/engine teardown to its exit path", + slotPath, e.getMessage()); } } - if (client != null) { + if (client != null && ioThreadStopped) { + // Skipped on a failed stop: the thread may be mid-send on + // this very client; ioLoop's finally closes the loop's + // current client (this one, unless a reconnect swapped it — + // in which case swapClient already closed this reference). try { client.close(); } catch (Throwable ignored) { } } if (engine != null) { - try { - // engine.close() releases the slot lock too. - engine.close(); - } catch (Throwable ignored) { + // Failed-stop hand-off: delegateEngineClose() makes the I/O + // thread run engine.close() strictly after its last engine + // access, releasing the slot lock as soon as the stuck wire + // call resolves — deferred teardown, never abandoned. The + // false return covers the race where the thread exited + // between the failed close() and now: then it is safe (and + // necessary) to close the engine here. + if (ioThreadStopped || !loop.delegateEngineClose()) { + try { + // engine.close() releases the slot lock too. 
+ engine.close(); + } catch (Throwable ignored) { + } + } else { + LOG.warn("drainer slot {}: engine close delegated to the I/O thread; " + + "slot lock releases when it exits", slotPath); } } + // Don't let a later requestStop() unpark an unrelated task that + // the pool's executor may have scheduled onto this same thread. + runnerThread = null; } } diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerListener.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerListener.java index e5a298b9..55d1ae73 100644 --- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerListener.java +++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerListener.java @@ -43,27 +43,56 @@ public interface BackgroundDrainerListener { /** * Fired when the drainer has retried past its budget on consecutive - * durable-ack-unavailable failures. The drainer drops a {@code .failed} - * sentinel and exits. Treat as cluster-wide misconfiguration and - * surface to operators. + * durable-ack capability-gap failures. The drainer drops a + * {@code .failed} sentinel and exits. Treat as cluster-wide + * misconfiguration and surface to operators. * * @param slotPath slot the drainer was processing - * @param totalAttempts how many connect attempts hit the same failure - * @param elapsedMillis wall time spent on this failure mode + * @param totalAttempts capability-gap attempts in the final episode; + * transient sweeps (role reject, transport) are + * never counted + * @param elapsedMillis wall time of the final capability-gap episode, + * anchored at its first capability-gap error */ void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis); /** - * Fired when {@code clientFactory.reconnect()} threw - * {@code QwpDurableAckMismatchException} — i.e. every endpoint in the - * current sweep failed to advertise durable ack. 
The drainer will - * back off and retry; this callback is purely observability. Source - * data stays pinned regardless because the loop runs in + * Fired when a connect sweep hit a genuine durable-ack capability gap + * ({@code QwpDurableAckMismatchException}: an endpoint upgrades but does + * not advertise durable ack). The drainer will back off and retry within + * its settle budget; this callback is purely observability. Source data + * stays pinned regardless because the loop runs in * {@code durableAckMode=true} and only trims on STATUS_DURABLE_ACK. + * A transient all-replica failover window (role reject) never fires this + * callback — it is surfaced through {@link #onPrimaryUnavailable}. * * @param slotPath slot the drainer is processing - * @param attemptNumber 1-based count of consecutive durable-ack-unavailable - * failures for this drainer + * @param attemptNumber 1-based attempt number within the current + * capability-gap EPISODE. The counter restarts when + * an intervening role reject resets the episode — + * topology churn grants the next gap a fresh settle + * budget, which is correct behavior — and with the + * streams separated the reset's cause is visible as + * an {@link #onPrimaryUnavailable} delivery between + * the two episodes */ void onDurableAckUnavailable(String slotPath, int attemptNumber); + + /** + * Fired when a connect sweep found every reachable endpoint to be a + * REPLICA — a transient all-replica failover window (role reject). A + * replica is promotable and a primary will reappear, so the drainer + * retries indefinitely under Invariant B: this condition NEVER escalates + * and is never followed by {@link #onDurableAckPersistentFailure}. Runs + * on the drainer thread; implementations must not block. The no-op + * default keeps every implementor of the released 1.3.4 contract source- + * and binary-compatible. 
+ * + * @param slotPath slot the drainer is processing + * @param attemptNumber 1-based running role-reject count within the + * current connect loop (resets across connect + * re-entries) + */ + default void onPrimaryUnavailable(String slotPath, int attemptNumber) { + } } diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerPool.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerPool.java index 458a4b9a..c3ca5dee 100644 --- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerPool.java +++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerPool.java @@ -46,10 +46,15 @@ * (no orphans submitted) costs one core thread; submitted-and-finished * drainers are GC'd after they complete. *

    - * Closing the pool requests every still-running drainer to stop and - * waits up to a few seconds for them to exit cleanly. Drainers that - * don't exit in time are left to finish on their own — the pool's - * underlying executor uses daemon threads so they don't block JVM exit. + * Closing the pool uses a split stop policy: drainers that never started + * draining (still inside their connect-retry loop — e.g. the cluster is + * unreachable) are stop-signaled immediately, because no grace window can + * help them finish; drainers actively replaying frames get a graceful + * window to reach {@code acked >= target} before being signaled. Drainers + * that don't exit in time (typically parked in a blocking native connect + * that neither unpark nor interrupt cancels) are left to finish on their + * own — the pool's underlying executor uses daemon threads so they don't + * block JVM exit. */ public final class BackgroundDrainerPool implements QuietCloseable { @@ -66,9 +71,12 @@ public final class BackgroundDrainerPool implements QuietCloseable { // either lands before close (and close waits for it to finish) or // sees the closed bit and throws. private static final int CLOSED_BIT = Integer.MIN_VALUE; - // Time we let drainers finish their drain naturally before signaling - // stop. awaitTermination returns as soon as the last drainer exits, - // so this only matters when something is genuinely stuck. + // Time we let ACTIVELY DRAINING drainers finish naturally before + // signaling stop. Connect-phase drainers are stop-signaled before this + // window even starts (see close()), so during an outage — when no + // drainer can be draining — close() does not pay this in full. + // awaitTermination returns as soon as the last drainer exits, so this + // only matters when something is genuinely stuck. 
private static final long GRACEFUL_DRAIN_MILLIS = 2_500L; private static final Logger LOG = LoggerFactory.getLogger(BackgroundDrainerPool.class); // After signaling stop, give drainers a brief window to unwind cleanly @@ -125,11 +133,33 @@ public void close() { while (state.get() != CLOSED_BIT) { Compat.onSpinWait(); } - // Reject new tasks but let in-flight drainers finish their drain - // naturally. Without this grace window a drainer that's seconds - // away from acked >= target gets requestStop()'d and exits as - // STOPPED — its engine.close() then sees fullyDrained=false and - // leaves the slot's .sfa files behind, defeating drain_orphans. + // Split stop policy. The graceful window below exists so a drainer + // that is seconds away from acked >= target is not cut down + // mid-drain (its engine.close() would see fullyDrained=false and + // leave the slot's .sfa files behind, defeating drain_orphans). A + // drainer that never started draining — still inside its + // connect-retry loop, e.g. the cluster is unreachable and + // Invariant B retries forever — cannot possibly use that window + // productively, so stop it NOW: it wakes from its backoff park + // within ~50ms (STOP_CHECK_PARK_CHUNK_NANOS) and exits as STOPPED, + // cutting close() latency during an outage from + // GRACEFUL_DRAIN_MILLIS + STOP_GRACE_MILLIS (~3s) to roughly one + // stop-check park chunk. ackedFsn stays -1 until the drain loop's + // first poll, so `< 0` discriminates "never connected/started + // draining" from "actively draining"; the moments-wide race with a + // just-connected drainer is benign — it exits as STOPPED and the + // slot is re-adopted by the next scan. A drainer parked inside a + // blocking native connect ignores the stop until its background + // connect deadline resolves; that one still burns the full grace + + // stop windows below and is then abandoned to exit on its own + // (daemon thread). 
+ for (BackgroundDrainer d : active) { + if (d.outcome() == BackgroundDrainer.DrainOutcome.PENDING && d.getAckedFsn() < 0) { + d.requestStop(); + } + } + // Reject new tasks but let actively-draining drainers finish + // naturally. executor.shutdown(); try { if (!executor.awaitTermination(GRACEFUL_DRAIN_MILLIS, TimeUnit.MILLISECONDS)) { diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java index 94322e9c..8d0f71e5 100644 --- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java +++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java @@ -145,15 +145,30 @@ private CursorSendEngine(String sfDir, long segmentSizeBytes, SegmentManager man boolean memoryMode = sfDir == null; SlotLock acquiredLock = null; if (!memoryMode) { - if (sfDir.isEmpty()) { - throw new IllegalArgumentException("sfDir must not be empty"); + try { + if (sfDir.isEmpty()) { + throw new IllegalArgumentException("sfDir must not be empty"); + } + // Acquire the slot lock BEFORE we touch any *.sfa files. Two + // engines pointed at the same slot would otherwise race on + // recovery and create overlapping FSN ranges. SlotLock.acquire + // also creates the slot dir if it doesn't exist yet — no + // separate mkdir step needed here. + acquiredLock = SlotLock.acquire(sfDir); + } catch (Throwable t) { + // The delegating constructors evaluate `new SegmentManager(...)` + // BEFORE this body runs, so on a pre-try throw (e.g. slot lock + // collision) an owned manager is already alive and would leak + // its native path-scratch sink -- 256 bytes per failed + // construction attempt. Close it before propagating. + if (ownsManager) { + try { + manager.close(); + } catch (Throwable ignored) { + } + } + throw t; } - // Acquire the slot lock BEFORE we touch any *.sfa files. 
Two - // engines pointed at the same slot would otherwise race on - // recovery and create overlapping FSN ranges. SlotLock.acquire - // also creates the slot dir if it doesn't exist yet — no - // separate mkdir step needed here. - acquiredLock = SlotLock.acquire(sfDir); } this.slotLock = acquiredLock; this.sfDir = sfDir; @@ -168,7 +183,6 @@ private CursorSendEngine(String sfDir, long segmentSizeBytes, SegmentManager man // reference instead of orphaning the mmap'd segments + fds. SegmentRing ringInProgress = null; AckWatermark watermarkInProgress = null; - boolean managerStarted = false; try { // Disk mode: try to recover any *.sfa files left behind by a prior // session before deciding to start fresh. Without this the engine @@ -277,7 +291,6 @@ private CursorSendEngine(String sfDir, long segmentSizeBytes, SegmentManager man if (ownsManager) { manager.start(); - managerStarted = true; } manager.register(ringInProgress, sfDir, watermarkInProgress); // All construction succeeded — commit the ring and @@ -288,7 +301,10 @@ private CursorSendEngine(String sfDir, long segmentSizeBytes, SegmentManager man // Stop an owned manager before freeing the ring and watermark it may // touch, then release the slot lock. Each cleanup is in its own // try/catch so a single failure doesn't strand later cleanups. - if (ownsManager && managerStarted) { + // Closing an owned-but-never-started manager is safe (no worker to + // join) and required: skipping it leaked the manager's native + // path-scratch sink whenever construction failed before start(). 
+ if (ownsManager) { try { manager.close(); } catch (Throwable ignored) { diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java index 2003aa08..94f929f1 100644 --- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java +++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java @@ -35,6 +35,7 @@ import io.questdb.client.cutlass.qwp.client.QwpDurableAckMismatchException; import io.questdb.client.cutlass.qwp.client.QwpIngressRoleRejectedException; import io.questdb.client.cutlass.qwp.client.QwpRoleMismatchException; +import io.questdb.client.cutlass.qwp.client.QwpVersionMismatchException; import io.questdb.client.cutlass.qwp.client.WebSocketResponse; import io.questdb.client.cutlass.qwp.websocket.WebSocketCloseCode; import io.questdb.client.std.CharSequenceLongHashMap; @@ -61,10 +62,11 @@ * cumulative wire sequence {@code N}, calls * {@code engine.acknowledge(fsnAtZero + N)} so the segment manager * can trim fully-acked segments. - *

  • On wire failure, runs the configured reconnect policy: backoff - * with jitter up to {@code reconnect_max_duration_millis}, with - * auth-style failures (401/403/non-101 upgrade reject) treated as - * terminal. On reconnect success, repositions the cursor at + *
  • On wire failure, runs the configured reconnect policy: capped + * exponential backoff with jitter, retried indefinitely (Invariant B -- + * a store-and-forward drainer never gives up on a wall-clock budget), + * with only auth-style failures (401/403/non-101 upgrade reject) treated + * as terminal. On reconnect success, repositions the cursor at * {@code ackedFsn+1} and replays.
  • * * No locks on the steady-state path. The producer thread (user) writes @@ -140,6 +142,11 @@ public final class CursorWebSocketSendLoop implements QuietCloseable { private final ReconnectFactory reconnectFactory; private final long reconnectInitialBackoffMillis; private final long reconnectMaxBackoffMillis; + // Retained for constructor symmetry and passed in by callers, but NOT + // consulted by the background loop: Invariant B removed the wall-clock + // give-up from connectLoop. The budget still bounds the blocking (non-lazy) + // initial connect via QwpWebSocketSender -> connectWithRetry, which takes it + // as an explicit argument rather than reading this field. private final long reconnectMaxDurationMillis; private final WebSocketResponse response = new WebSocketResponse(); private final ResponseHandler responseHandler = new ResponseHandler(); @@ -171,11 +178,11 @@ public final class CursorWebSocketSendLoop implements QuietCloseable { // alike) is offered to the dispatcher for async delivery to the user's // handler. Null disables async delivery entirely; the producer-side // typed-throw path is unaffected. - // Optional: when non-null, RECONNECT_BUDGET_EXHAUSTED is offered to the - // dispatcher for async delivery to the user's listener at the moment - // connectLoop gives up. Sender-side fire points (CONNECTED, FAILED_OVER, - // ENDPOINT_ATTEMPT_FAILED, AUTH_FAILED, ALL_ENDPOINTS_UNREACHABLE) write - // directly to the same dispatcher from QwpWebSocketSender. + // Optional: when non-null, sender-side connection events (CONNECTED, + // FAILED_OVER, ENDPOINT_ATTEMPT_FAILED, AUTH_FAILED, ALL_ENDPOINTS_UNREACHABLE) + // are written to this dispatcher from QwpWebSocketSender. connectLoop itself + // no longer emits a terminal budget-exhaustion event (Invariant B: it retries + // indefinitely and never gives up on a wall-clock budget). 
private volatile SenderConnectionDispatcher connectionDispatcher; private volatile SenderErrorDispatcher errorDispatcher; // The send cursor has two coordinate systems: @@ -194,11 +201,28 @@ public final class CursorWebSocketSendLoop implements QuietCloseable { // Sticky flag: false until the very first time a live client is installed // (either via the constructor in SYNC/OFF mode or via swapClient on a // successful connect attempt in any mode). Once true, stays true. Used to - // distinguish "never reached the server" budget exhaustion (looks like a + // distinguish a "never reached the server" terminal failure (looks like a // config typo or firewall block) from "lost connection after we were // up" (looks transient). private volatile boolean hasEverConnected; private volatile Thread ioThread; + // Typed marker for a durable-ack CAPABILITY-GAP terminal: set (before the + // terminalError latch, so a checkError() caller that observes the latch is + // guaranteed to observe this marker too) when a reconnect sweep threw + // QwpDurableAckMismatchException. The orphan drainer consults it to route + // a mid-drain capability gap into its budgeted settle-retry + // (BackgroundDrainer.connectWithDurableAckRetry) instead of quarantining + // the slot on the first sweep; the foreground sender ignores it and keeps + // its spec'd loud-fail (sf-client.md section 8.1). Write-once alongside + // terminalError: the only writer runs on the I/O thread under the same + // first-writer-wins latch. + private volatile QwpDurableAckMismatchException capabilityGapTerminal; + // Failed-stop hand-off flag: set by delegateEngineClose() when an owner's + // close() could not stop the I/O thread and the engine close is therefore + // performed by the I/O thread's exit path. Write-once, owner thread only; + // read by the I/O thread strictly after its shutdown-latch countdown (see + // the handshake contract on delegateEngineClose). 
+ private volatile boolean engineCloseDelegated; // The latched terminal failure — THE exception every checkError() call // rethrows. Write-once for the loop's lifetime: the only writer is // recordFatal on the I/O thread (first-writer-wins). The whole @@ -249,9 +273,10 @@ public final class CursorWebSocketSendLoop implements QuietCloseable { * {@code client} may be {@code null} only if {@code reconnectFactory} * is non-null — this is the async-initial-connect path: the I/O thread * runs the same retry loop on its first iteration to obtain a live - * client, and a terminal failure (auth/upgrade reject or budget - * exhaustion) is delivered through the dispatcher rather than thrown - * to the constructor's caller. + * client, and a terminal failure (auth/upgrade reject) is delivered + * through the dispatcher rather than thrown to the constructor's + * caller; plain connect failures are retried indefinitely + * (Invariant B: no wall-clock budget give-up). */ public CursorWebSocketSendLoop(WebSocketClient client, CursorSendEngine engine, long fsnAtZero, long parkNanos, @@ -349,11 +374,13 @@ public static SenderError.Category classify(byte status) { } /** - * Same retry-with-exponential-backoff-and-jitter loop the I/O thread - * uses on a wire failure, but reusable from {@code ensureConnected} to - * implement {@code initial_connect_retry=true}. Returns the connected - * client on success; throws on terminal upgrade error (won't retry) or - * budget exhaustion. + * Same exponential-backoff-with-jitter machinery as the I/O thread's + * {@code connectLoop}, but reusable from {@code ensureConnected} to + * implement {@code initial_connect_retry=true}. Unlike {@code connectLoop} + * (which retries indefinitely under Invariant B), this blocking variant + * IS bounded by {@code maxDurationMillis}: it returns the connected + * client on success and throws on terminal upgrade error (won't retry) + * or budget exhaustion. *
<p>
    * Caller-supplied {@code factory} is invoked once per attempt and * should produce a fresh, connected, upgraded client (or throw). The @@ -399,11 +426,28 @@ public static WebSocketClient connectWithRetry( contextLabel, e.getMessage()); throw e; } catch (Throwable e) { + if (e instanceof Error) { + // JVM/programming failure (OOM, LinkageError): not a + // transport outage, retrying cannot clear it. Propagate + // to the caller instead of burning the connect budget. + throw (Error) e; + } lastError = e; long now = System.nanoTime(); if (now - lastLogNanos >= RECONNECT_LOG_THROTTLE_NANOS) { - LOG.warn("{} attempt {} failed: {}", - contextLabel, attempts, e.getMessage()); + if (e instanceof QwpVersionMismatchException) { + // Reachable but protocol-incompatible: consumes the connect + // budget (walks the cluster across a rolling-upgrade window) + // and, on exhaustion, surfaces as the terminal + // LineSenderException below. Name the condition so a version + // skew is diagnosable, not read as a generic connect failure. + LOG.warn("{} attempt {}: every reachable endpoint advertises an unsupported " + + "QWP protocol version ({}); retrying within connect budget", + contextLabel, attempts, e.getMessage()); + } else { + LOG.warn("{} attempt {} failed: {}", + contextLabel, attempts, e.getMessage()); + } lastLogNanos = now; } } @@ -487,6 +531,23 @@ public void checkError() { } } + /** + * The typed durable-ack capability-gap terminal, or {@code null} if the + * loop's terminal (if any) is a different failure class. Non-null only + * after {@link #checkError()} started throwing: the marker is written + * before the {@code terminalError} latch, both on the I/O thread. + *
<p>
    + * Consumer contract: the orphan drainer ({@code BackgroundDrainer}) + * checks this after a {@code checkError()} throw to decide between + * re-entering its budgeted settle-retry (capability gap: the rolling + * upgrade may still settle) and quarantining the slot (every other + * terminal). Package-private on purpose -- the foreground sender must + * not branch on it (spec'd loud-fail, sf-client.md section 8.1). + */ + QwpDurableAckMismatchException capabilityGapTerminal() { + return capabilityGapTerminal; + } + /** * Safety-net variant of {@link #checkError()} for * {@code QwpWebSocketSender.close()}: rethrows the latched terminal error @@ -524,8 +585,31 @@ public synchronized void close() { if (t.isAlive()) { try { shutdownLatch.await(); - } catch (InterruptedException ignored) { + } catch (InterruptedException e) { + // Re-assert the flag for the caller's stack, then decide. + // If the I/O thread has genuinely not exited (latch still + // up — it may be inside a blocking native connect/send + // that neither unpark nor interrupt cancels), touching the + // client here would free native buffers under a possibly + // mid-send thread, and returning quietly would let the + // owner unmap the engine under it (C5 SEGV). Signal the + // failed stop loudly instead: QwpWebSocketSender.close() + // keys its ioThreadStopped guard on this throw, and + // BackgroundDrainer switches to delegateEngineClose(). + // The I/O thread's own exit path (ioLoop's finally) + // disposes of the client either way. ioThread stays set, + // so a duplicate close() re-signals rather than silently + // succeeding against a still-live thread. 
Thread.currentThread().interrupt(); + if (shutdownLatch.getCount() != 0L) { + throw new LineSenderException( + "cursor I/O thread did not stop: close() was interrupted " + + "while awaiting shutdown; client/engine teardown " + + "is delegated to the I/O thread's exit path"); + } + // Latch hit zero concurrently with the interrupt: the + // thread is past its last client/engine access — proceed + // with normal teardown. } } ioThread = null; @@ -534,9 +618,11 @@ public synchronized void close() { // replaced the original (and closed it); the owner only retains // the stale pre-reconnect reference. Without closing the live // client here, its native socket and fds leak past sender.close() - // every time the loop reconnected at least once. close() is - // idempotent, so the owner's duplicate close on its stale - // reference is still safe. + // every time the loop reconnected at least once. ioLoop's finally + // also closes the current client on I/O-thread exit, so this read + // matters chiefly when the loop never started (SYNC construction, + // close() before start()) — and doubles as a safety net. close() + // is idempotent, so duplicate closes on any path are safe. WebSocketClient c = client; if (c != null) { try { @@ -548,6 +634,34 @@ public synchronized void close() { } } + /** + * Failed-stop hand-off for the engine. Called by an owner whose + * {@link #close()} threw because the I/O thread would not stop: the owner + * must not free the engine (munmap/Unsafe.free of segment memory) while + * the thread may still touch it with raw {@code Unsafe} reads. Setting + * the delegation flag makes the I/O thread run {@code engine.close()} on + * its exit path, strictly after its last engine access and after the + * shutdown-latch countdown — releasing the slot lock as soon as the + * stuck wire call resolves (bounded by OS timeouts) instead of leaking + * the mapping and lock forever. + *
<p>
    + * Returns {@code true} when the I/O thread is still live and has adopted + * the engine close; {@code false} when the thread has already exited — + * the caller must close the engine itself. + *
<p>
    + * Memory model — the classic store/load handshake: this method writes the + * volatile flag, then reads the latch count; the exit path counts the + * latch down, then reads the flag. Under the sequential consistency of + * volatile (and AQS latch state) accesses, if this method observes the + * latch still up, the exit path is guaranteed to observe the flag — no + * missed close. If both sides act, {@link CursorSendEngine#close()} is + * synchronized and idempotent, so the double close is benign. + */ + public boolean delegateEngineClose() { + engineCloseDelegated = true; + return shutdownLatch.getCount() != 0L; + } + /** * Typed server-rejection payload of the latched terminal error, or * {@code null} when the loop latched a wire-level failure (or nothing). @@ -647,7 +761,7 @@ public long getTotalServerErrors() { * True iff the I/O loop has at least once installed a live (connected * + upgraded) WebSocket client. Sticky — once true, stays true even * after a subsequent disconnect. Lets a {@code SenderErrorHandler} - * disambiguate a "never reached the server" budget exhaustion (likely + * disambiguate a "never reached the server" terminal failure (likely * a config typo or firewall block) from a "lost connection after we * were up" failure (likely transient). */ @@ -661,10 +775,10 @@ public boolean isRunning() { /** * Plug an async-delivery sink for {@link SenderConnectionEvent} - * notifications. The loop fires {@code RECONNECT_BUDGET_EXHAUSTED} - * through this sink when {@code connectLoop} gives up; other connection - * events fire from {@code QwpWebSocketSender.buildAndConnect} directly - * into the same dispatcher. Same lifecycle contract as + * notifications. Connection events fire from + * {@code QwpWebSocketSender.buildAndConnect} directly into this dispatcher; + * {@code connectLoop} no longer emits a terminal budget-exhaustion event + * (Invariant B: it retries indefinitely). Same lifecycle contract as * {@link #setErrorDispatcher}. 
*/ public void setConnectionDispatcher(SenderConnectionDispatcher dispatcher) { @@ -786,8 +900,9 @@ private void applyDurableAck() { * Drives the very first connect attempt on the I/O thread, used in the * async-initial-connect mode (constructed with {@code client == null}). * Reuses the same retry+backoff machinery as {@link #fail(Throwable)} — - * a terminal upgrade reject or budget exhaustion is delivered through - * the dispatcher, not thrown to the producer. + * connect failures are retried indefinitely (Invariant B), and a + * terminal upgrade reject is delivered through the dispatcher, not + * thrown to the producer. */ private void attemptInitialConnect() { connectLoop(new LineSenderException( @@ -824,17 +939,48 @@ private void connectLoop(Throwable initial, String phase) { LOG.warn("cursor I/O loop entering {} loop: {}", phase, initial.getMessage()); long outageStartNanos = System.nanoTime(); - long deadlineNanos = outageStartNanos + reconnectMaxDurationMillis * 1_000_000L; + // INVARIANT B: a store-and-forward drainer must NEVER terminate on a + // wall-clock reconnect budget. A replica-only / all-endpoints-replica + // window is TRANSIENT -- a replica gets promoted, a primary reappears -- + // so this background loop retries for as long as it is running, backing + // off between attempts. The ONLY terminal conditions are a genuinely + // non-retriable upgrade (auth / non-421 upgrade / durable-ack capability + // gap), which return directly below, or the sender being stopped. SF + // exhaustion is surfaced to the PRODUCER as append backpressure, never + // here. reconnect_max_duration_millis is intentionally NOT consulted: it + // bounds only the blocking (non-lazy) initial connect in + // QwpWebSocketSender.buildAndConnect, never this background loop. 
long backoffMillis = reconnectInitialBackoffMillis; int attempts = 0; long lastLogNanos = 0L; Throwable lastReconnectError = initial; - while (running && System.nanoTime() < deadlineNanos) { + while (running) { attempts++; totalReconnectAttempts.incrementAndGet(); try { WebSocketClient newClient = reconnectFactory.reconnect(); if (newClient != null) { + if (!running) { + // close() ran while this connect attempt was in + // flight. Its latch await may have been interrupted + // (BackgroundDrainerPool.close()'s shutdownNow path) + // and returned already — the owner's teardown, + // including the engine unmap in BackgroundDrainer's + // finally, can be complete. Installing the client now + // would (a) touch engine memory via positionCursorAt + // after a possible unmap and (b) abandon a live socket + // in a loop nothing will revisit — close() has run, + // its client read saw the pre-connect field. The + // attempt owns the client until it is installed, so + // dispose of it here, on the I/O thread, and exit + // through the quiet stopped path below. + try { + newClient.close(); + } catch (Throwable ignored) { + // best-effort + } + break; + } swapClient(newClient); totalReconnects.incrementAndGet(); long elapsedMs = (System.nanoTime() - outageStartNanos) / 1_000_000L; @@ -879,6 +1025,13 @@ private void connectLoop(Throwable initial, String phase) { // not SECURITY_ERROR -- this is not an auth failure. LOG.error("durable-ack mismatch during {} -- won't retry: {}", phase, e.getMessage()); + if (terminalError == null) { + // Mirror recordFatal's first-writer-wins latch: only the + // sweep that owns the terminal may mark the gap, and the + // marker must be visible before the terminalError volatile + // write that checkError() keys on. 
+ capabilityGapTerminal = e; + } long fromFsn = engine.ackedFsn() + 1L; long toFsn = Math.max(fromFsn, engine.publishedFsn()); SenderError err = new SenderError( @@ -897,100 +1050,81 @@ private void connectLoop(Throwable initial, String phase) { dispatchError(err); return; } catch (QwpRoleMismatchException | QwpIngressRoleRejectedException e) { - // Role mismatch: cluster reconfigured during this connect, the - // previously-writable endpoint is now read-only. Reset backoff - // (don't double on each role reject -- failover usually clears - // within seconds) and park for the initial interval before the - // next attempt. - backoffMillis = reconnectInitialBackoffMillis; + // Role mismatch: every reachable endpoint role-rejected the + // upgrade -- right now they are all replicas / primary-catchup. + // This is a TRANSIENT failover window (a replica is promotable), + // so keep retrying with no wall-clock deadline (Invariant B). + // Do NOT reset the backoff or pin it at the initial interval: + // fall through to the shared capped exponential backoff-with- + // jitter block below. Pinning at reconnectInitialBackoffMillis + // turned a persistent all-replica window (e.g. an address list + // pointing at replicas only, now surfaced here as a retriable + // role reject rather than a terminal durable-ack mismatch) into + // a fixed ~10/s storm of fresh TLS handshakes -- new + // WebSocketClient, new SSLContext, trust-store re-read -- per + // endpoint, forever. Growing to reconnectMaxBackoffMillis + // mirrors the orphan drainer's role-reject path and honours the + // documented capped-exponential-backoff contract. 
lastReconnectError = e; - if (running) { - long remainingNanos = deadlineNanos - System.nanoTime(); - if (remainingNanos <= 0L) { - break; - } - long parkNanos = Math.min(reconnectInitialBackoffMillis * 1_000_000L, remainingNanos); - LockSupport.parkNanos(parkNanos); + long now = System.nanoTime(); + if (now - lastLogNanos >= RECONNECT_LOG_THROTTLE_NANOS) { + LOG.warn("{} attempt {}: every reachable endpoint is a replica " + + "(transient failover window); retrying with capped backoff -- " + + "if this persists the configured address list may point at replicas only", + phase, attempts); + lastLogNanos = now; } - continue; + // fall through to the shared capped-backoff block } catch (Throwable e) { + if (e instanceof Error) { + // JVM/programming failure (OOM, LinkageError): retrying + // cannot clear it -- Invariant B covers transport outages + // only. Latch it as terminal FIRST so a producer parked in + // checkError() observes the failure and `running` flips + // false, then rethrow so the I/O thread dies loudly + // instead of reconnect-looping. The fail() call site sits + // inside ioLoop's catch, so ioLoop's finally still counts + // down the shutdown latch and close() cannot hang. + recordFatal(e); + throw (Error) e; + } lastReconnectError = e; long now = System.nanoTime(); if (now - lastLogNanos >= RECONNECT_LOG_THROTTLE_NANOS) { - LOG.warn("{} attempt {} failed: {}", phase, attempts, e.getMessage()); + if (e instanceof QwpVersionMismatchException) { + // Not a transport failure: the server completed the WS + // upgrade but advertised a QWP version this client cannot + // speak. Retried indefinitely under Invariant B (a rolling + // upgrade clears it once peers converge), but log the real + // condition so a persistent client/cluster version skew is + // diagnosable instead of reading as a generic connect fail. 
+ LOG.warn("{} attempt {}: every reachable endpoint advertises an unsupported " + + "QWP protocol version ({}); retrying (rolling-upgrade window) -- " + + "if this persists the client is version-incompatible with the cluster", + phase, attempts, e.getMessage()); + } else { + LOG.warn("{} attempt {} failed: {}", phase, attempts, e.getMessage()); + } lastLogNanos = now; } } if (running) { long jitter = ThreadLocalRandom.current().nextLong(backoffMillis); long sleepMillis = backoffMillis + jitter; - long remainingMillis = (deadlineNanos - System.nanoTime()) / 1_000_000L; - if (remainingMillis <= 0) { - break; - } - if (sleepMillis > remainingMillis) { - sleepMillis = remainingMillis; - } LockSupport.parkNanos(sleepMillis * 1_000_000L); backoffMillis = Math.min(backoffMillis * 2, reconnectMaxBackoffMillis); } } + // The loop exits ONLY because running == false, i.e. the sender is + // closing / stopping. Under Invariant B this is NOT a budget give-up + // (there is no wall-clock terminal): we retried until asked to stop, so + // we return quietly and let close() drive shutdown. Un-acked rows remain + // in on-disk SF for this sender's next run or an orphan drainer to ship. long elapsedMs = (System.nanoTime() - outageStartNanos) / 1_000_000L; - String lastMsg = lastReconnectError.getMessage(); - LOG.error("cursor I/O loop giving up {} after {}ms, {} attempts; last error: {}", + String lastMsg = lastReconnectError == null ? 
"n/a" : lastReconnectError.getMessage(); + LOG.info("cursor I/O loop {} stopped after {}ms, {} attempts (sender closing); " + + "un-acked rows remain in SF for retry; last error: {}", phase, elapsedMs, attempts, lastMsg); - long fromFsn = engine.ackedFsn() + 1L; - long toFsn = Math.max(fromFsn, engine.publishedFsn()); - // Disambiguate by what the sender saw on the wire: if we never got - // a successful upgrade, the user is most likely looking at a config - // problem (typo in addr, wrong port, firewall, server not deployed - // yet); if we connected at least once and then exhausted the budget, - // it's a transient connectivity issue (server down, network flap). - // Tag and free-text hint encode the same signal so both grep-the-logs - // and read-the-message users get it without parsing. - String connectivityTag; - String connectivityHint; - if (hasEverConnected) { - connectivityTag = "connection-lost-budget-exhausted"; - connectivityHint = "server unreachable since last connect (transient)"; - } else { - connectivityTag = "never-connected-budget-exhausted"; - connectivityHint = "never reached the server (check addr/port/firewall)"; - } - SenderError err = new SenderError( - SenderError.Category.PROTOCOL_VIOLATION, - SenderError.Policy.HALT, - SenderError.NO_STATUS_BYTE, - connectivityTag + ": " + elapsedMs + "ms / " + attempts - + " attempts; " + connectivityHint - + "; last error: " + lastMsg, - SenderError.NO_MESSAGE_SEQUENCE, - fromFsn, - toFsn, - null, - System.nanoTime() - ); - totalServerErrors.incrementAndGet(); - // recordFatal MUST run before dispatchError so the producer-observable - // terminal error is latched before the handler is invoked. - recordFatal(new LineSenderServerException(err)); - dispatchError(err); - // Surface the terminal classification through the connection-event - // dispatcher too. Listeners learn about budget exhaustion without - // having to also subscribe to SenderError. 
Fire AFTER recordFatal so - // a listener that immediately checks the producer-side terminal state - // sees a consistent picture. - SenderConnectionDispatcher cd = connectionDispatcher; - if (cd != null) { - cd.offer(new SenderConnectionEvent( - SenderConnectionEvent.Kind.RECONNECT_BUDGET_EXHAUSTED, - null, SenderConnectionEvent.NO_PORT, - null, SenderConnectionEvent.NO_PORT, - attempts, - SenderConnectionEvent.NO_ROUND_NUMBER, - lastReconnectError, - System.currentTimeMillis())); - } } /** @@ -1064,12 +1198,12 @@ private void enqueuePendingOk(long wireSeq) { /** * Surface a wire failure. With reconnect plumbing wired (factory + - * listener both non-null), enters the per-outage retry loop: - * exponential backoff with jitter, time-capped at - * {@code reconnectMaxDurationMillis}, terminal on auth/upgrade - * rejections (so the budget isn't burned on errors that won't fix - * themselves). On the first successful reconnect within the budget, - * the I/O loop resumes with reset wire state and replays from + * listener both non-null), enters the per-outage retry loop: capped + * exponential backoff with jitter, retried for as long as the loop is + * running -- there is NO wall-clock give-up (Invariant B: the loop ends + * only on a non-retriable auth/upgrade reject or a sender stop; SF + * exhaustion surfaces as producer append backpressure). On the first successful reconnect the + * I/O loop resumes with reset wire state and replays from * {@code engine.ackedFsn() + 1}. *
<p>
    * Without reconnect plumbing, the failure is immediately terminal @@ -1097,9 +1231,10 @@ private void ioLoop() { // a reconnect factory is wired. Drive the very first connect on // this thread so the producer thread never blocks on it. // attemptInitialConnect either sets `client` (success) or records - // a terminal failure and clears `running` (auth/upgrade reject or - // budget exhaustion). Either way, the main loop below sees the - // outcome via the `running` and `client` fields. + // a terminal failure and clears `running` (auth/upgrade reject; + // plain connect failures retry indefinitely under Invariant B). + // Either way, the main loop below sees the outcome via the + // `running` and `client` fields. if (client == null && running) { attemptInitialConnect(); } @@ -1127,9 +1262,51 @@ private void ioLoop() { } } } catch (Throwable t) { + if (t instanceof Error) { + // Never funnel a JVM Error into the reconnect loop: latch it + // as terminal so checkError() surfaces it to the producer, + // then rethrow so the thread dies loudly. The finally still + // counts down the shutdown latch, so close() cannot hang. + recordFatal(t); + throw (Error) t; + } fail(t); } finally { + // Last act of the I/O thread: dispose of whatever client it + // holds. This is the airtight half of the close()-vs-reconnect + // race — when close()'s latch await is interrupted (drainer pool + // shutdownNow), close() returns before this thread has exited, + // and its own client close saw the pre-reconnect field. A client + // swapped in by the tail of an in-flight connect attempt (running + // flipped false between connectLoop's check and swapClient) would + // be abandoned live without this. Runs BEFORE the latch countdown + // so a non-interrupted close() observes a fully disposed loop. + // Duplicate closes — loop.close()'s own, owners' stale references + // — stay safe: WebSocketClient.close() is idempotent. 
+ WebSocketClient c = client; + if (c != null) { + try { + c.close(); + } catch (Throwable ignored) { + // best-effort + } + } shutdownLatch.countDown(); + // Failed-stop hand-off (see delegateEngineClose): the owner could + // not free the engine safely while this thread was alive, so the + // engine close — and with it the slot-lock release — happens + // here, strictly after this thread's last engine access. The flag + // is read only after the countDown: the store/load pairing with + // delegateEngineClose's flag-write-then-latch-read guarantees + // either this branch or the owner's fallback runs (or both — + // engine.close() is idempotent). + if (engineCloseDelegated) { + try { + engine.close(); + } catch (Throwable ignored) { + // best-effort + } + } } } @@ -1192,7 +1369,7 @@ private void positionCursorInSegment(MmapSegment seg, long targetFsn) { /** * Mark the loop as fatally failed. Caller has decided no reconnect - * is possible (or it ran out of budget) — latch the error so + * is possible — latch the error so * {@link #checkError} can surface it to the producer thread, then * stop the loop. First-writer-wins: only the first failure latches. * The check-then-latch is unsynchronized and is safe ONLY because @@ -1279,7 +1456,7 @@ private void swapClient(WebSocketClient newClient) { this.client = newClient; // Sticky: once the wire is up, we've reached the server at least // once for this sender's lifetime. Used downstream to classify a - // subsequent budget exhaustion as transient vs config-likely. + // subsequent terminal failure as transient vs config-likely. 
this.hasEverConnected = true; if (old != null) { try { diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/DefaultSenderConnectionListener.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/DefaultSenderConnectionListener.java index adfb27f7..07213342 100644 --- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/DefaultSenderConnectionListener.java +++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/DefaultSenderConnectionListener.java @@ -36,9 +36,8 @@ * transition so silence is never the default -- connect-string-only users * still see failover and outage signals in their logs. * - *
<p>Terminal kinds ({@code AUTH_FAILED}, {@code RECONNECT_BUDGET_EXHAUSTED}) - * and {@code ALL_ENDPOINTS_UNREACHABLE} fire at WARN level; everything else - * fires at INFO. + *
<p>
    Terminal kind {@code AUTH_FAILED} and {@code ALL_ENDPOINTS_UNREACHABLE} + * fire at WARN level; everything else fires at INFO. */ public final class DefaultSenderConnectionListener implements SenderConnectionListener { @@ -52,7 +51,6 @@ private DefaultSenderConnectionListener() { public void onEvent(@NotNull SenderConnectionEvent e) { switch (e.getKind()) { case AUTH_FAILED: - case RECONNECT_BUDGET_EXHAUSTED: case ALL_ENDPOINTS_UNREACHABLE: LOG.warn("connection event {}", e); break; diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/SegmentManager.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/SegmentManager.java index 2519a002..d96b8627 100644 --- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/SegmentManager.java +++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/SegmentManager.java @@ -143,10 +143,16 @@ public SegmentManager(long segmentSizeBytes, long pollNanos) { * hold an initial active plus one hot spare. */ public SegmentManager(long segmentSizeBytes, long pollNanos, long maxTotalBytes) { + // The pathScratch field initializer has already allocated its native + // buffer by the time this body runs, so a validation throw must free + // it or every failed construction leaks 256 bytes of native memory + // (e.g. a drainer retry loop hitting the same bad config). 
if (segmentSizeBytes < MmapSegment.HEADER_SIZE + MmapSegment.FRAME_HEADER_SIZE + 1) { + pathScratch.close(); throw new IllegalArgumentException("segmentSizeBytes too small: " + segmentSizeBytes); } if (maxTotalBytes < segmentSizeBytes) { + pathScratch.close(); throw new IllegalArgumentException( "maxTotalBytes (" + maxTotalBytes + ") must allow at least one segment of " + segmentSizeBytes + " bytes"); diff --git a/core/src/main/java/io/questdb/client/impl/ConfigSchema.java b/core/src/main/java/io/questdb/client/impl/ConfigSchema.java index b36f3207..0508428e 100644 --- a/core/src/main/java/io/questdb/client/impl/ConfigSchema.java +++ b/core/src/main/java/io/questdb/client/impl/ConfigSchema.java @@ -56,6 +56,7 @@ public final class ConfigSchema { str("tls_roots", Side.COMMON); str("tls_roots_password", Side.COMMON); longRange("auth_timeout_ms", Side.COMMON, 0, OPEN_MAX, true, false); // > 0 + longRange("connect_timeout", Side.COMMON, 0, OPEN_MAX, true, false); // > 0 // INGRESS -- the WebSocket Sender applies. STRING in the registry; the // Sender parses suffix/mode values (off/on, 64k, durability) with its @@ -108,9 +109,11 @@ public final class ConfigSchema { intRange("query_pool_min", Side.POOL, OPEN, OPEN_MAX, false, false); intRange("query_pool_max", Side.POOL, OPEN, OPEN_MAX, false, false); longRange("acquire_timeout_ms", Side.POOL, OPEN, OPEN_MAX, false, false); + longRange("query_close_timeout_ms", Side.POOL, OPEN, OPEN_MAX, false, false); longRange("idle_timeout_ms", Side.POOL, OPEN, OPEN_MAX, false, false); longRange("max_lifetime_ms", Side.POOL, OPEN, OPEN_MAX, false, false); longRange("housekeeper_interval_ms", Side.POOL, OPEN, OPEN_MAX, false, false); + boolOnOff("lazy_connect", Side.POOL); // facade flag: tolerant non-blocking startup (async ingest + lazy reads) // RESERVED -- accepted no-op (error-policy keys reserved by the spec). 
str("on_internal_error", Side.RESERVED); diff --git a/core/src/main/java/io/questdb/client/impl/ConfigView.java b/core/src/main/java/io/questdb/client/impl/ConfigView.java index 1160c2d6..74621eef 100644 --- a/core/src/main/java/io/questdb/client/impl/ConfigView.java +++ b/core/src/main/java/io/questdb/client/impl/ConfigView.java @@ -95,6 +95,25 @@ public static String relocatedHint(String key) { return RELOCATED_HINTS.get(key); } + /** + * A boolean flag accepting {@code true}/{@code false} (and {@code on}/{@code off} + * for consistency with the rest of the connect-string surface). Returns + * {@code dflt} when the key is absent; throws on any other value. + */ + public boolean getBool(String key, boolean dflt) { + String v = getStr(key); + if (v == null) { + return dflt; + } + if ("true".equals(v) || "on".equals(v)) { + return true; + } + if ("false".equals(v) || "off".equals(v)) { + return false; + } + throw new IllegalArgumentException("invalid " + key + ": " + v + " (expected true, false, on, off)"); + } + public boolean getBoolOnOff(String key, boolean dflt) { String v = getStr(key); if (v == null) { diff --git a/core/src/main/java/io/questdb/client/impl/PooledSender.java b/core/src/main/java/io/questdb/client/impl/PooledSender.java index 61d89296..e36a8384 100644 --- a/core/src/main/java/io/questdb/client/impl/PooledSender.java +++ b/core/src/main/java/io/questdb/client/impl/PooledSender.java @@ -37,123 +37,112 @@ import java.time.temporal.ChronoUnit; /** - * Decorator that lends a real {@link Sender} from {@link SenderPool}. The - * decorator is pre-allocated once per pool slot and reused for every borrow. + * Thin per-borrow handle returned by {@link SenderPool#borrow()}. 
A fresh + * instance is created on every borrow, capturing the immutable lease + * {@code generation} stamped by {@code borrow()}; it forwards every + * {@link Sender} call to the reused {@link SenderSlot}'s delegate, validating + * that generation first via {@link SenderSlot#live(long)}. *
<p>
    - * Behavior difference from a raw Sender: {@link #close()} on a pooled Sender - * flushes the buffer and returns the decorator to the pool. The underlying - * Sender is only truly closed when {@link io.questdb.client.QuestDB#close()} - * shuts down the pool. + * Behaviour difference from a raw Sender: {@link #close()} flushes the buffer + * and returns the slot to the pool. The underlying Sender is only truly closed + * when {@link io.questdb.client.QuestDB#close()} shuts the pool down. + *
<p>
    + * Because the slot is reused across borrows, this wrapper -- not the slot -- + * carries the lease identity. A stale handle (held after {@link #close()}, with + * the slot since re-borrowed) fails its generation check: data calls throw and + * {@link #close()} is a no-op, so it can never flush into, release, or be + * enqueued twice for a slot a different borrower now owns. This mirrors the + * egress {@code QueryLease} guard. */ public final class PooledSender implements Sender { - private final long createdAtMillis; - private final Sender delegate; - private final SenderPool pool; - // Index of the store-and-forward slot this wrapper owns within the pool, - // or -1 when SF is disabled. Stable for the wrapper's whole life; the - // pool returns it to the free set only when the wrapper is evicted from - // {@code all} (discardBroken / reapIdle). Used to derive a distinct - // {@code sender_id} per pooled sender so concurrent SF senders sharing - // one {@code sf_dir} never collide on the slot {@code flock}. 
- private final int slotIndex; - private volatile long idleSinceMillis; - private volatile boolean inUse; - private volatile boolean invalidated; - - PooledSender(Sender delegate, SenderPool pool, int slotIndex) { - this.delegate = delegate; - this.pool = pool; - this.slotIndex = slotIndex; - this.createdAtMillis = System.currentTimeMillis(); - this.idleSinceMillis = this.createdAtMillis; + private final long generation; + private final SenderSlot slot; + + PooledSender(SenderSlot slot, long generation) { + this.slot = slot; + this.generation = generation; } @Override public void at(long timestamp, ChronoUnit unit) { - delegate.at(timestamp, unit); + slot.live(generation).at(timestamp, unit); } @Override public void at(Instant timestamp) { - delegate.at(timestamp); + slot.live(generation).at(timestamp); } @Override public void atNow() { - delegate.atNow(); + slot.live(generation).atNow(); } @Override public boolean awaitAckedFsn(long targetFsn, long timeoutMillis) { - return delegate.awaitAckedFsn(targetFsn, timeoutMillis); + return slot.live(generation).awaitAckedFsn(targetFsn, timeoutMillis); } @Override public Sender binaryColumn(CharSequence name, byte[] value) { - delegate.binaryColumn(name, value); + slot.live(generation).binaryColumn(name, value); return this; } @Override public Sender binaryColumn(CharSequence name, long ptr, long len) { - delegate.binaryColumn(name, ptr, len); + slot.live(generation).binaryColumn(name, ptr, len); return this; } @Override public Sender binaryColumn(CharSequence name, DirectByteSlice slice) { - delegate.binaryColumn(name, slice); + slot.live(generation).binaryColumn(name, slice); return this; } @Override public Sender boolColumn(CharSequence name, boolean value) { - delegate.boolColumn(name, value); + slot.live(generation).boolColumn(name, value); return this; } @Override public DirectByteSlice bufferView() { - return delegate.bufferView(); + return slot.live(generation).bufferView(); } @Override public Sender 
byteColumn(CharSequence name, byte value) { - delegate.byteColumn(name, value); + slot.live(generation).byteColumn(name, value); return this; } @Override public void cancelRow() { - delegate.cancelRow(); + slot.live(generation).cancelRow(); } @Override public Sender charColumn(CharSequence name, char value) { - delegate.charColumn(name, value); + slot.live(generation).charColumn(name, value); return this; } /** - * Flushes pending rows and returns this decorator to the pool. Does not - * actually close the underlying {@link Sender}; that only happens when - * the owning {@code QuestDB} is closed. - *
<p>
    - * Idempotent: a second call after a return is a no-op. + * Flushes pending rows and returns this lease's slot to the pool. Does not + * actually close the underlying {@link Sender}; that only happens when the + * owning {@code QuestDB} is closed. *
<p>
    - * Clears the current thread's pin (if any) before the slot becomes - * borrowable again. Without this step a thread that pinned this - * wrapper and then closed it via the public {@link Sender#close()} - * (the natural try-with-resources idiom) would still hold the pin - * in its {@link ThreadLocal}; a subsequent {@code QuestDB.sender()} - * call on that thread would return the cached wrapper even though - * another consumer has since borrowed the slot, and the two - * consumers would write to the same underlying delegate. + * Idempotent: a stale generation (the lease was already returned and the + * slot possibly re-borrowed) is a no-op, so a double close cannot flush + * into, or re-enqueue, a slot a different borrower now owns. The pool + * re-checks the generation under its lock. */ @Override public void close() { - if (!inUse) { + if (generation != slot.generation()) { return; } // Track normal completion rather than catching a specific throwable @@ -163,257 +152,222 @@ public void close() { // abnormal exit as unrecyclable, which is the fail-safe default. boolean flushed = false; try { - delegate.flush(); + slot.delegate().flush(); flushed = true; } finally { - inUse = false; - // Clear the pin BEFORE returning the slot. If we cleared - // after giveBack(), a concurrent borrower could grab the - // slot while this thread's pin still references it, and a - // re-pin on this thread would return the (now in-use) - // wrapper -- the same race this clear is meant to close. - pool.clearPinIfCurrent(this); if (flushed) { - pool.giveBack(this); + slot.pool().giveBack(this); } else { - // flush() did not complete normally. Sender does not clear - // its buffer on flush failure (see Sender Javadoc), and - // WebSocket transport latches the failure for good. Either - // way the wrapper is unsafe to recycle: the next borrower - // would inherit the failed rows or a dead connection. 
The - // original throwable propagates naturally once this finally - // returns -- no explicit rethrow needed. - pool.discardBroken(this); + // flush() did not complete normally. Sender does not clear its + // buffer on flush failure (see Sender Javadoc), and WebSocket + // transport latches the failure for good. Either way the slot + // is unsafe to recycle: the next borrower would inherit the + // failed rows or a dead connection. The original throwable + // propagates naturally once this finally returns -- no explicit + // rethrow needed. + slot.pool().discardBroken(this); } } } @Override public Sender decimalColumn(CharSequence name, Decimal256 value) { - delegate.decimalColumn(name, value); + slot.live(generation).decimalColumn(name, value); return this; } @Override public Sender decimalColumn(CharSequence name, Decimal128 value) { - delegate.decimalColumn(name, value); + slot.live(generation).decimalColumn(name, value); return this; } @Override public Sender decimalColumn(CharSequence name, Decimal64 value) { - delegate.decimalColumn(name, value); + slot.live(generation).decimalColumn(name, value); return this; } @Override public Sender decimalColumn(CharSequence name, CharSequence value) { - delegate.decimalColumn(name, value); + slot.live(generation).decimalColumn(name, value); return this; } @Override public Sender doubleArray(@NotNull CharSequence name, double[] values) { - delegate.doubleArray(name, values); + slot.live(generation).doubleArray(name, values); return this; } @Override public Sender doubleArray(@NotNull CharSequence name, double[][] values) { - delegate.doubleArray(name, values); + slot.live(generation).doubleArray(name, values); return this; } @Override public Sender doubleArray(@NotNull CharSequence name, double[][][] values) { - delegate.doubleArray(name, values); + slot.live(generation).doubleArray(name, values); return this; } @Override public Sender doubleArray(CharSequence name, DoubleArray array) { - delegate.doubleArray(name, 
array); + slot.live(generation).doubleArray(name, array); return this; } @Override public Sender doubleColumn(CharSequence name, double value) { - delegate.doubleColumn(name, value); + slot.live(generation).doubleColumn(name, value); return this; } @Override public boolean drain(long timeoutMillis) { - return delegate.drain(timeoutMillis); + return slot.live(generation).drain(timeoutMillis); } @Override public Sender floatColumn(CharSequence name, float value) { - delegate.floatColumn(name, value); + slot.live(generation).floatColumn(name, value); return this; } @Override public void flush() { - delegate.flush(); + slot.live(generation).flush(); } @Override public long flushAndGetSequence() { - return delegate.flushAndGetSequence(); + return slot.live(generation).flushAndGetSequence(); } @Override public Sender geoHashColumn(CharSequence name, long bits, int precisionBits) { - delegate.geoHashColumn(name, bits, precisionBits); + slot.live(generation).geoHashColumn(name, bits, precisionBits); return this; } @Override public Sender geoHashColumn(CharSequence name, CharSequence value) { - delegate.geoHashColumn(name, value); + slot.live(generation).geoHashColumn(name, value); return this; } @Override public long getAckedFsn() { - return delegate.getAckedFsn(); + return slot.live(generation).getAckedFsn(); } @Override public Sender intColumn(CharSequence name, int value) { - delegate.intColumn(name, value); + slot.live(generation).intColumn(name, value); return this; } @Override public Sender ipv4Column(CharSequence name, int address) { - delegate.ipv4Column(name, address); + slot.live(generation).ipv4Column(name, address); return this; } @Override public Sender ipv4Column(CharSequence name, CharSequence address) { - delegate.ipv4Column(name, address); + slot.live(generation).ipv4Column(name, address); return this; } @Override public Sender long256Column(CharSequence name, long l0, long l1, long l2, long l3) { - delegate.long256Column(name, l0, l1, l2, l3); + 
slot.live(generation).long256Column(name, l0, l1, l2, l3); return this; } @Override public Sender longArray(@NotNull CharSequence name, long[] values) { - delegate.longArray(name, values); + slot.live(generation).longArray(name, values); return this; } @Override public Sender longArray(@NotNull CharSequence name, long[][] values) { - delegate.longArray(name, values); + slot.live(generation).longArray(name, values); return this; } @Override public Sender longArray(@NotNull CharSequence name, long[][][] values) { - delegate.longArray(name, values); + slot.live(generation).longArray(name, values); return this; } @Override public Sender longArray(@NotNull CharSequence name, LongArray values) { - delegate.longArray(name, values); + slot.live(generation).longArray(name, values); return this; } @Override public Sender longColumn(CharSequence name, long value) { - delegate.longColumn(name, value); + slot.live(generation).longColumn(name, value); return this; } @Override public void reset() { - delegate.reset(); + slot.live(generation).reset(); } @Override public Sender shortColumn(CharSequence name, short value) { - delegate.shortColumn(name, value); + slot.live(generation).shortColumn(name, value); return this; } @Override public Sender stringColumn(CharSequence name, CharSequence value) { - delegate.stringColumn(name, value); + slot.live(generation).stringColumn(name, value); return this; } @Override public Sender symbol(CharSequence name, CharSequence value) { - delegate.symbol(name, value); + slot.live(generation).symbol(name, value); return this; } @Override public Sender table(CharSequence table) { - delegate.table(table); + slot.live(generation).table(table); return this; } @Override public Sender timestampColumn(CharSequence name, long value, ChronoUnit unit) { - delegate.timestampColumn(name, value, unit); + slot.live(generation).timestampColumn(name, value, unit); return this; } @Override public Sender timestampColumn(CharSequence name, Instant value) { - 
delegate.timestampColumn(name, value); + slot.live(generation).timestampColumn(name, value); return this; } @Override public Sender uuidColumn(CharSequence name, long lo, long hi) { - delegate.uuidColumn(name, lo, hi); + slot.live(generation).uuidColumn(name, lo, hi); return this; } - long createdAtMillis() { - return createdAtMillis; - } - - int slotIndex() { - return slotIndex; - } - - Sender delegate() { - return delegate; - } - - long idleSinceMillis() { - return idleSinceMillis; - } - - boolean isInUse() { - return inUse; - } - - boolean isInvalidated() { - return invalidated; - } - - void markIdleAt(long nowMillis) { - idleSinceMillis = nowMillis; - } - - void markInUse() { - inUse = true; + long generation() { + return generation; } - void markInvalidated() { - invalidated = true; + SenderSlot slot() { + return slot; } } diff --git a/core/src/main/java/io/questdb/client/impl/QueryClientPool.java b/core/src/main/java/io/questdb/client/impl/QueryClientPool.java index a6365dfa..cbbc150a 100644 --- a/core/src/main/java/io/questdb/client/impl/QueryClientPool.java +++ b/core/src/main/java/io/questdb/client/impl/QueryClientPool.java @@ -26,6 +26,7 @@ import io.questdb.client.QueryException; import io.questdb.client.cutlass.qwp.client.QwpQueryClient; +import org.jetbrains.annotations.TestOnly; import java.util.ArrayDeque; import java.util.ArrayList; @@ -49,6 +50,12 @@ */ public final class QueryClientPool implements AutoCloseable { + // Default upper bound, in milliseconds, on how long Query.close() waits for + // an in-flight query to drain (after issuing a cancel) before discarding the + // worker. Mirrors the ingest side's close_flush_timeout_millis default so a + // close() can never block the caller unbounded. Tunable per pool via + // closeQueryTimeoutMillis(long). 
+ static final long DEFAULT_CLOSE_QUERY_TIMEOUT_MILLIS = 5_000; private final long acquireTimeoutMillis; private final ArrayList all; private final ArrayDeque available; @@ -75,6 +82,10 @@ public final class QueryClientPool implements AutoCloseable { private final AtomicInteger nextSlotIndex = new AtomicInteger(); private final Condition workerReleased; private volatile boolean closed; + // Upper bound on the Query.close() drain wait; see + // DEFAULT_CLOSE_QUERY_TIMEOUT_MILLIS. Volatile because QuestDBImpl sets it + // once at build time on a different thread than the borrowers that read it. + private volatile long closeQueryTimeoutMillis = DEFAULT_CLOSE_QUERY_TIMEOUT_MILLIS; private int inFlightCreations; public QueryClientPool( @@ -89,11 +100,12 @@ public QueryClientPool( idleTimeoutMillis, maxLifetimeMillis, null); } - // Package-private constructor exposing the connectHook test seam: production - // passes null (-> the real QwpQueryClient.connect()). White-box tests in - // io.questdb.client.test.impl reach this by reflection to inject a hook that - // throws a non-RuntimeException Throwable from the native connect path. - QueryClientPool( + // Constructor exposing the connectHook seam. Production (QuestDBImpl) passes + // null -> the real QwpQueryClient.connect(); white-box tests pass a hook that + // throws a non-RuntimeException Throwable from the native connect path. This + // is the construction path QuestDBImpl uses, so it is a real (public) ctor, + // not test-only. + public QueryClientPool( String configurationString, int minSize, int maxSize, @@ -106,13 +118,12 @@ public QueryClientPool( idleTimeoutMillis, maxLifetimeMillis, connectHook, null); } - // Package-private constructor exposing both the connectHook and startHook - // test seams: production passes null for each (-> the real - // QwpQueryClient.connect() and QueryWorker.start()). 
White-box tests in - // io.questdb.client.test.impl reach this by reflection to inject a hook that - // throws a Throwable from either the native connect path (connectHook) or - // the worker thread-start path (startHook). - QueryClientPool( + // Constructor exposing both the connectHook and startHook seams. Production + // reaches it via the overload above (both null -> the real + // QwpQueryClient.connect() and QueryWorker.start()); white-box tests pass a + // hook that throws a Throwable from either the native connect path + // (connectHook) or the worker thread-start path (startHook). + public QueryClientPool( String configurationString, int minSize, int maxSize, @@ -197,7 +208,12 @@ public QueryWorker acquire() { throw new QueryException((byte) 0, "QuestDB handle is closed"); } if (!available.isEmpty()) { - return available.pollFirst(); + QueryWorker w = available.pollFirst(); + // Stamp a fresh lease id under the lock so the QueryLease + // about to be handed out can be distinguished from any + // prior, now-stale borrow of the same worker. + w.bumpGeneration(); + return w; } if (all.size() + inFlightCreations < maxSize) { inFlightCreations++; @@ -248,6 +264,8 @@ public QueryWorker acquire() { throw new QueryException((byte) 0, "QuestDB handle is closed"); } all.add(created); + // Stamp the first lease id for this freshly built worker. + created.bumpGeneration(); return created; } if (remainingNanos <= 0) { @@ -297,6 +315,87 @@ public void close() { } } + /** + * Cancels the in-flight query on {@code w} only while its lease generation + * still equals {@code gen}, holding the pool lock across both the check and + * the wire cancel. acquire() and release() bump the generation under this + * same lock, so once this method holds it the generation cannot change: a + * cancel whose lease has already gone stale (the worker was released and + * re-borrowed) is dropped instead of aborting the new borrower's query. 
The + * cancel itself is non-blocking -- a volatile flag plus an AtomicLong set -- + * so the lock is held only briefly. + */ + void cancelIfCurrent(QueryWorker w, long gen) { + lock.lock(); + try { + if (closed) { + return; + } + if (w.generation() != gen) { + return; + } + w.cancelInFlight(); + } finally { + lock.unlock(); + } + } + + long closeQueryTimeoutMillis() { + return closeQueryTimeoutMillis; + } + + void closeQueryTimeoutMillis(long millis) { + this.closeQueryTimeoutMillis = millis; + } + + /** + * Evicts a worker whose lease {@link QueryImpl#close(long)} could not drain + * the in-flight query within {@link #closeQueryTimeoutMillis} (the cancel + * was not honored in time, or the caller was interrupted). The worker's + * connection is left in an unknown protocol state -- a late {@code RESULT_*} + * frame for the abandoned query could corrupt the next borrower's stream -- + * so it must NOT return to the pool. Removes it from {@code all} (freeing + * capacity for a fresh worker) and tears it down outside the lock via + * {@link QueryWorker#shutdown()}, which interrupts the dispatch thread so a + * stuck {@code execute()} returns promptly. + *
<p>
    + * Bails when the pool is already closed: {@link #close()} owns the teardown + * of every worker via its snapshot loop, so mutating {@code all} here would + * race that iteration on a non-thread-safe {@code ArrayList}. Also bails on a + * stale generation -- the worker was already released/discarded and possibly + * re-borrowed, so discarding it would evict a worker a different borrower now + * owns. Mirrors {@link SenderPool#discardBroken} on the ingest side. + */ + void discard(QueryWorker w, long gen) { + lock.lock(); + try { + if (closed) { + return; + } + if (w.generation() != gen) { + return; + } + // Invalidate the lease so a duplicate close()/release with the same + // generation is dropped and the in-flight handle can no longer drive + // this worker. + w.bumpGeneration(); + all.remove(w); + // Capacity freed -- a waiter in acquire() may now create a fresh + // worker in this slot's place. + workerReleased.signal(); + } finally { + lock.unlock(); + } + // Tear down outside the lock so a slow join doesn't keep the pool + // latched. shutdown() is best-effort and idempotent. + try { + w.shutdown(); + } catch (Throwable ignored) { + // Best-effort: a teardown Error (e.g. an -ea AssertionError) must + // not propagate out of Query.close(). + } + } + void reapIdle() { if (closed) { return; @@ -340,14 +439,30 @@ void reapIdle() { } } - void release(QueryWorker w) { - long now = System.currentTimeMillis(); - w.markIdleAt(now); + void release(QueryWorker w, long gen) { lock.lock(); try { if (closed) { return; } + if (w.generation() != gen) { + // Stale release: this lease was already returned and the worker + // has since been re-borrowed (or this is a duplicate close of an + // already-released lease). Dropping it is what makes + // Query.close() idempotent even under a concurrent re-borrow -- + // without this guard a double close would enqueue the worker + // twice and hand it to two borrowers at once, corrupting the + // whole pool. 
The flag a stale close() reads is no longer its + // own lease's, so a non-validated release path could not catch + // this; the generation captured at borrow time can. + return; + } + // Invalidate the just-returned lease so a duplicate release with the + // same generation is also dropped and the in-flight handle can no + // longer drive this worker. + w.bumpGeneration(); + w.markIdleAt(System.currentTimeMillis()); + assert !available.contains(w) : "worker already present in available deque on release"; available.addLast(w); workerReleased.signal(); } finally { @@ -355,11 +470,12 @@ void release(QueryWorker w) { } } - // Package-private white-box accessor for tests: reports the current - // in-flight creation count under the pool lock. A non-zero value after a - // failed acquire() means the slot reservation was never released -- the - // capacity-shrink bug this guards against. - int inFlightCreations() { + // White-box accessor for tests: reports the current in-flight creation count + // under the pool lock. A non-zero value after a failed acquire() means the + // slot reservation was never released -- the capacity-shrink bug this guards + // against. + @TestOnly + public int inFlightCreations() { lock.lock(); try { return inFlightCreations; diff --git a/core/src/main/java/io/questdb/client/impl/QueryImpl.java b/core/src/main/java/io/questdb/client/impl/QueryImpl.java index fc80d263..baf483ea 100644 --- a/core/src/main/java/io/questdb/client/impl/QueryImpl.java +++ b/core/src/main/java/io/questdb/client/impl/QueryImpl.java @@ -24,8 +24,6 @@ package io.questdb.client.impl; -import io.questdb.client.Completion; -import io.questdb.client.Query; import io.questdb.client.QueryException; import io.questdb.client.cutlass.qwp.client.QwpBindSetter; import io.questdb.client.cutlass.qwp.client.QwpBindValues; @@ -40,39 +38,54 @@ import java.util.concurrent.locks.ReentrantLock; /** - * Per-thread implementation of {@link Query}. 
Holds the configured query - * state (SQL, optional binds, handler), an inner {@link Completion}, and a - * wrapping {@link QwpColumnBatchHandler} that forwards callbacks to the user - * handler and signals the Completion on terminal events. + * Reusable per-{@link QueryWorker} query state: the configured SQL, optional + * binds, handler, terminal-event signalling, and a wrapping + * {@link QwpColumnBatchHandler} that forwards callbacks to the user handler and + * signals completion on terminal events. One instance is pre-allocated per + * worker in the constructor and reused across every borrow. *
<p>
    - * Lifecycle: {@link QuestDBImpl#query()} returns a per-thread instance, reset - * to empty if it was in a terminal state. {@link #submit()} acquires a - * worker, dispatches, and returns the cached {@link Completion}. + * Because the instance is shared across borrows, it must never be handed to a + * caller directly -- a stale reference would leak into a later borrow's + * lifecycle. Callers instead receive a thin, per-borrow {@link QueryLease} that + * carries the lease {@code generation} stamped at borrow time and passes it + * into every operation here. Each operation validates that generation against + * {@link QueryWorker#generation()}: + *
<ul>
+ * <li>builder/await operations on a stale generation throw
+ * {@code IllegalStateException} ("query handle is closed"),</li>
+ * <li>{@link #close(long)} and {@link #cancel(long)} on a stale generation are
+ * no-ops -- this is what makes {@code Query.close()} idempotent and
+ * prevents a stale handle from releasing, or cancelling the in-flight
+ * query of, a worker a different borrower now owns.</li>
+ * </ul>
+ * <p>
    + * Lifecycle: {@link QueryWorker#lease()} resets this state and wraps it in a + * fresh {@link QueryLease} when {@link QuestDBImpl#borrowQuery()} acquires the + * worker. {@link #submit(long)} dispatches on the held worker (single-flight); + * {@link #close(long)} returns the worker to the pool. */ -final class QueryImpl implements Query { +final class QueryImpl { - private final InnerCompletion completion = new InnerCompletion(); private final Condition doneCondition; private final ReentrantLock doneLock = new ReentrantLock(); - private final QueryClientPool pool; private final StringSink sqlBuffer = new StringSink(); + private final QueryWorker worker; + private final QwpBindSetter wireBinds = this::applyBinds; private final WrappingHandler wrappingHandler = new WrappingHandler(); - private volatile QueryWorker currentWorker; private volatile boolean done = true; private volatile String resultMessage; private volatile byte resultStatus; private volatile Throwable unexpectedError; private QwpBindSetter userBinds; - private final QwpBindSetter wireBinds = this::applyBinds; private QwpColumnBatchHandler userHandler; - QueryImpl(QueryClientPool pool) { - this.pool = pool; + QueryImpl(QueryWorker worker) { + this.worker = worker; this.doneCondition = doneLock.newCondition(); } - @Override - public void abandon() { + void abandon(long gen) { + checkLive(gen); if (!done) { throw new IllegalStateException("a previous submit() is still in flight; await the Completion first"); } @@ -81,27 +94,113 @@ public void abandon() { sqlBuffer.clear(); } - @Override - public Query binds(QwpBindSetter binds) { + void await(long gen) throws InterruptedException { + rejectHandlerReentry("await"); + checkLive(gen); + doneLock.lock(); + try { + while (!done) { + doneCondition.await(); + } + } finally { + doneLock.unlock(); + } + throwIfFailed(); + } + + boolean await(long gen, long timeout, TimeUnit unit) throws InterruptedException { + rejectHandlerReentry("await"); + 
checkLive(gen); + long remaining = unit.toNanos(timeout); + doneLock.lock(); + try { + while (!done) { + if (remaining <= 0) { + return false; + } + remaining = doneCondition.awaitNanos(remaining); + } + } finally { + doneLock.unlock(); + } + throwIfFailed(); + return true; + } + + void cancel(long gen) { + // Fast-path drop of an obviously-stale or already-finished cancel, + // without taking the pool lock. This is only a hint -- the + // authoritative re-check runs under the pool lock inside + // worker.cancelInFlight(gen). + if (gen != worker.generation() || done) { + return; + } + // Re-check the lease generation and issue the wire cancel atomically + // under the pool lock (the same lock acquire()/release() bump the + // generation under). An unlocked check followed by an unlocked cancel + // is a TOCTOU: a cross-thread watchdog can pass the check, get + // preempted while this lease is released and the worker re-borrowed by + // another caller, then resume and abort that caller's in-flight query. + worker.cancelInFlight(gen); + } + + void close(long gen) { + rejectHandlerReentry("close"); + // A stale generation means this lease was already released and the + // worker may now be owned by another borrower. Dropping the call is + // what keeps close() idempotent without releasing someone else's + // worker or cancelling their in-flight query. release() re-checks the + // generation under the pool lock, so the worker can never be enqueued + // twice even if two threads race a close on the same live lease. + if (gen != worker.generation()) { + return; + } + // If a submit is still in flight (the caller did not await, or its + // await timed out), cancel it and wait for the terminal event so the + // leased worker is idle before it returns to the pool -- otherwise the + // next borrower would inherit a running execute(). 
+ // + // The wait is bounded (closeQueryTimeoutMillis) and interruptible, so a + // caller that bounded its own await() is never pinned to the full + // remaining query duration here. If the query does NOT drain in time (a + // server slow to honor the cancel, or the caller interrupting), the + // worker is still running execute() on a connection whose protocol state + // is now uncertain -- a late RESULT_* for the abandoned query could + // corrupt the next borrower's stream -- so it is discarded rather than + // returned. The pool grows a fresh worker on the next borrow. + if (!done) { + worker.cancelInFlight(gen); + if (!awaitDone(worker.closeQueryTimeoutMillis())) { + worker.discardFromPool(gen); + return; + } + } + worker.releaseToPool(gen); + } + + boolean isDone(long gen) { + checkLive(gen); + return done; + } + + void setBinds(long gen, QwpBindSetter binds) { + checkLive(gen); this.userBinds = binds; - return this; } - @Override - public Query handler(QwpColumnBatchHandler handler) { + void setHandler(long gen, QwpColumnBatchHandler handler) { + checkLive(gen); this.userHandler = handler; - return this; } - @Override - public Query sql(CharSequence sql) { + void setSql(long gen, CharSequence sql) { + checkLive(gen); sqlBuffer.clear(); sqlBuffer.put(sql); - return this; } - @Override - public Completion submit() { + void submit(long gen) { + checkLive(gen); if (sqlBuffer.length() == 0) { throw new IllegalStateException("sql is required"); } @@ -111,7 +210,6 @@ public Completion submit() { if (!done) { throw new IllegalStateException("a previous submit() is still in flight; await the Completion first"); } - QueryWorker w = pool.acquire(); // Reset terminal state under the lock so a stale signal from a prior // run can't be observed by the upcoming await(). 
doneLock.lock(); @@ -120,12 +218,10 @@ public Completion submit() { resultStatus = 0; resultMessage = null; unexpectedError = null; - currentWorker = w; } finally { doneLock.unlock(); } - w.dispatch(this); - return completion; + worker.dispatch(this); } private void applyBinds(QwpBindValues binds) { @@ -135,6 +231,56 @@ private void applyBinds(QwpBindValues binds) { } } + /** + * Waits up to {@code timeoutMillis} for the in-flight query's terminal + * event. Returns {@code true} once {@code done} is set, {@code false} on + * timeout or interrupt. Unlike an uninterruptible drain, an interrupt aborts + * the wait and re-raises the thread's interrupt flag, so {@code close()} + * stays responsive to a caller that wants to give up. + */ + private boolean awaitDone(long timeoutMillis) { + long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMillis); + doneLock.lock(); + try { + while (!done) { + if (remaining <= 0) { + return false; + } + try { + remaining = doneCondition.awaitNanos(remaining); + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + return false; + } + } + return true; + } finally { + doneLock.unlock(); + } + } + + private void checkLive(long gen) { + if (gen != worker.generation()) { + throw new IllegalStateException("query handle is not borrowed (closed or never leased)"); + } + } + + private void rejectHandlerReentry(String op) { + // Result handlers (onBatch/onEnd/onError) run inline on the worker's + // dispatch thread. A blocking lease op called from there would wait for + // a terminal event that only this same thread can deliver -- a + // permanent, uninterruptible self-deadlock plus a leaked worker. Fail + // loudly at the call site instead. cancel() is the non-blocking stop. + if (worker.isCurrentThreadWorker()) { + throw new IllegalStateException( + op + "() must not be called from a result handler. 
Handlers " + + "(onBatch/onEnd/onError) run on the worker thread, so " + op + + "() would block forever waiting for a terminal event that only " + + "this same thread can deliver. To stop a query from inside a " + + "handler, call cancel() (non-blocking)."); + } + } + private void signalDone(byte status, String message, Throwable unexpected) { doneLock.lock(); try { @@ -145,27 +291,38 @@ private void signalDone(byte status, String message, Throwable unexpected) { this.resultMessage = message; this.unexpectedError = unexpected; this.done = true; - this.currentWorker = null; doneCondition.signalAll(); } finally { doneLock.unlock(); } } + private void throwIfFailed() { + Throwable unexpected = unexpectedError; + if (unexpected != null) { + throw new QueryException(resultStatus, resultMessage, unexpected); + } + if (resultStatus != 0) { + throw new QueryException(resultStatus, resultMessage); + } + } + /** - * Drops any prior builder state (SQL, binds, handler) if no submit is - * currently in flight. {@link QuestDBImpl#query()} invokes this before - * returning the per-thread instance so callers see the "reset to empty" - * contract documented on {@link io.questdb.client.Query} regardless of - * whether the previous use ended at a terminal handler callback or at - * {@link #abandon()}. + * Resets builder and terminal state to empty. Called by + * {@link QueryWorker#lease()} when {@link QuestDBImpl#borrowQuery()} hands a + * freshly stamped {@link QueryLease} out, so each borrow starts from the + * documented "reset to empty" contract on {@link io.questdb.client.Query}. + * The leased worker is idle at this point (just acquired from the pool), so + * the reset is unconditional. 
*/ - void resetIfDone() { - if (done) { - userBinds = null; - userHandler = null; - sqlBuffer.clear(); - } + void resetForBorrow() { + userBinds = null; + userHandler = null; + sqlBuffer.clear(); + resultStatus = 0; + resultMessage = null; + unexpectedError = null; + done = true; } void runOn(QwpQueryClient client) { @@ -185,63 +342,6 @@ void signalUnexpected(Throwable t) { signalDone((byte) 0, t.getMessage() != null ? t.getMessage() : t.getClass().getSimpleName(), t); } - private final class InnerCompletion implements Completion { - - @Override - public void await() throws InterruptedException { - doneLock.lock(); - try { - while (!done) { - doneCondition.await(); - } - } finally { - doneLock.unlock(); - } - throwIfFailed(); - } - - @Override - public boolean await(long timeout, TimeUnit unit) throws InterruptedException { - long remaining = unit.toNanos(timeout); - doneLock.lock(); - try { - while (!done) { - if (remaining <= 0) { - return false; - } - remaining = doneCondition.awaitNanos(remaining); - } - } finally { - doneLock.unlock(); - } - throwIfFailed(); - return true; - } - - @Override - public void cancel() { - QueryWorker w = currentWorker; - if (w != null && !done) { - w.cancelInFlight(); - } - } - - @Override - public boolean isDone() { - return done; - } - - private void throwIfFailed() { - Throwable unexpected = unexpectedError; - if (unexpected != null) { - throw new QueryException(resultStatus, resultMessage, unexpected); - } - if (resultStatus != 0) { - throw new QueryException(resultStatus, resultMessage); - } - } - } - private final class WrappingHandler implements QwpColumnBatchHandler { @Override diff --git a/core/src/main/java/io/questdb/client/impl/QueryLease.java b/core/src/main/java/io/questdb/client/impl/QueryLease.java new file mode 100644 index 00000000..6083b802 --- /dev/null +++ b/core/src/main/java/io/questdb/client/impl/QueryLease.java @@ -0,0 +1,110 @@ 
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.impl;
+
+import io.questdb.client.Completion;
+import io.questdb.client.Query;
+import io.questdb.client.cutlass.qwp.client.QwpBindSetter;
+import io.questdb.client.cutlass.qwp.client.QwpColumnBatchHandler;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Thin per-borrow handle returned by {@link QuestDBImpl#borrowQuery()}. A fresh
+ * instance is created on every borrow, capturing the immutable lease
+ * {@code generation} stamped by {@link QueryClientPool#acquire()}; it delegates
+ * every {@link Query} and {@link Completion} operation to the worker's reused
+ * {@link QueryImpl}, threading that generation through so a stale handle cannot
+ * disturb a later borrow on the same worker (see {@link QueryImpl}).
+ *
+ * It implements {@link Completion} as well as {@link Query} so {@link #submit()}
+ * can return {@code this} -- the per-submit path stays allocation-free, and the
+ * single small allocation happens once per borrow (and is routinely
+ * scalar-replaced by the JIT in the common try-with-resources case).
+ */
+final class QueryLease implements Query, Completion {
+
+    private final long generation;
+    private final QueryImpl impl;
+
+    QueryLease(QueryImpl impl, long generation) {
+        this.impl = impl;
+        this.generation = generation;
+    }
+
+    @Override
+    public void abandon() {
+        impl.abandon(generation);
+    }
+
+    @Override
+    public void await() throws InterruptedException {
+        impl.await(generation);
+    }
+
+    @Override
+    public boolean await(long timeout, TimeUnit unit) throws InterruptedException {
+        return impl.await(generation, timeout, unit);
+    }
+
+    @Override
+    public Query binds(QwpBindSetter binds) {
+        impl.setBinds(generation, binds);
+        return this;
+    }
+
+    @Override
+    public void cancel() {
+        impl.cancel(generation);
+    }
+
+    @Override
+    public void close() {
+        impl.close(generation);
+    }
+
+    @Override
+    public Query handler(QwpColumnBatchHandler handler) {
+        impl.setHandler(generation, handler);
+        return this;
+    }
+
+    @Override
+    public boolean isDone() {
+        return impl.isDone(generation);
+    }
+
+    @Override
+    public Query sql(CharSequence sql) {
+        impl.setSql(generation, sql);
+        return this;
+    }
+
+    @Override
+    public Completion submit() {
+        impl.submit(generation);
+        return this;
+    }
+}
diff --git a/core/src/main/java/io/questdb/client/impl/QueryWorker.java b/core/src/main/java/io/questdb/client/impl/QueryWorker.java
index f4f641c8..040ade63 100644
--- a/core/src/main/java/io/questdb/client/impl/QueryWorker.java
+++ b/core/src/main/java/io/questdb/client/impl/QueryWorker.java
@@ -24,6 +24,7 @@
 package io.questdb.client.impl;
+import io.questdb.client.Query;
 import io.questdb.client.QueryException;
 import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
@@ 
-39,7 +40,11 @@ * The pooled query client's own I/O thread continues to drive the wire; the * worker thread exists only to keep {@code execute()} off the application's * submitting thread. Handler callbacks ({@code onBatch}, {@code onEnd}, - * {@code onError}) still run on the client's I/O thread. + * {@code onError}) run on this worker's own dispatch thread, which consumes the + * I/O thread's event queue inline -- not on the I/O thread itself. A handler + * must therefore never call the lease's blocking {@code close()}/{@code await()} + * (it would self-deadlock waiting for a terminal event only this thread can + * deliver); use the non-blocking {@code cancel()} to stop from inside a handler. */ public final class QueryWorker { @@ -47,16 +52,38 @@ public final class QueryWorker { private final QwpQueryClient client; private final long createdAtMillis; private final QueryClientPool pool; + private final QueryImpl query; private final Condition signalCondition; private final ReentrantLock signalLock = new ReentrantLock(); private final Thread thread; private volatile QueryImpl current; + // Test-only deterministic barrier for the busy-worker shutdown-drop race + // fixed in df6f7ca (while (!shuttingDown) -> while (true)). Null in + // production -- the only cost is the null check in runLoop(). A regression + // test installs a hook that runs ON THE WORKER THREAD right after a job + // returns from runOn() and before the loop re-enters the strand check, to + // re-arm current with a re-dispatched job and flip shuttingDown -- exactly + // the window where the old top-of-loop check dropped a pending job. The + // classes involved (QueryWorker, QueryImpl) are final and QwpQueryClient + // has no test seam, so this is the only race-free reproduction point. See + // QueryWorkerTest.testBusyWorkerShutdownStrandsReDispatchedCurrent. + volatile Runnable busyWorkerTestHook; + // Monotonic lease id. 
Mutated only under the QueryClientPool lock + // (bumped once in acquire() when the worker is handed out and once in + // release() when it is returned), so successive borrows of the same + // worker get distinct ids. A QueryLease captures the value live during + // its borrow; once the worker is released or re-borrowed the captured id + // no longer matches, which is how a stale handle's close()/cancel()/ + // submit() are detected and dropped. Volatile so a stale handle on another + // thread observes the latest value without taking the pool lock. + private volatile long generation; private volatile long idleSinceMillis; private volatile boolean shuttingDown; public QueryWorker(QwpQueryClient client, QueryClientPool pool, int slotIndex) { this.client = client; this.pool = pool; + this.query = new QueryImpl(this); this.signalCondition = signalLock.newCondition(); this.thread = new Thread(this::runLoop, "questdb-query-worker-" + slotIndex); this.thread.setDaemon(true); @@ -68,17 +95,48 @@ long createdAtMillis() { return createdAtMillis; } + /** + * Advances the lease generation. Called by {@link QueryClientPool} under + * the pool lock when this worker is handed out (acquire) and when it is + * returned (release). + */ + void bumpGeneration() { + generation++; + } + + /** + * Current lease generation. See {@link #generation} for the visibility and + * mutation contract. + */ + long generation() { + return generation; + } + long idleSinceMillis() { return idleSinceMillis; } + /** + * True when the calling thread is this worker's own dispatch thread -- i.e. + * a reentrant call from inside a result handler, which runs inline on this + * thread. Blocking lease operations ({@link QueryImpl#close}/ + * {@link QueryImpl#await}) use this to fail loudly instead of + * self-deadlocking. 
+ */ + boolean isCurrentThreadWorker() { + return Thread.currentThread() == thread; + } + void markIdleAt(long nowMillis) { idleSinceMillis = nowMillis; } /** - * Cancels the in-flight query on this worker's client. Safe to call from - * any thread; harmless if the worker is idle. + * Issues an unconditional wire cancel against whatever query this worker's + * client is currently running. Callers must already own the worker for the + * current lease -- in practice this runs under the pool lock via + * {@link QueryClientPool#cancelIfCurrent}, which validates the lease + * generation first. Lease code must use {@link #cancelInFlight(long)}. */ void cancelInFlight() { try { @@ -88,6 +146,18 @@ void cancelInFlight() { } } + /** + * Cancels the in-flight query only if this worker's lease generation still + * equals {@code gen}. Delegates to the pool so the generation re-check and + * the wire cancel happen together under the pool lock that + * {@link QueryClientPool#acquire} and {@link QueryClientPool#release} bump + * the generation under. That atomicity stops a stale cross-thread cancel + * from aborting a later borrower's query on the same worker. + */ + void cancelInFlight(long gen) { + pool.cancelIfCurrent(this, gen); + } + /** * Returns the {@link QwpQueryClient} this worker drives. Exposed for * introspection and tests; callers must not invoke {@code execute()} on @@ -97,6 +167,44 @@ public QwpQueryClient client() { return client; } + /** + * Resets the worker's reused {@link QueryImpl} and returns a fresh + * {@link QueryLease} stamped with the current lease {@link #generation}. + * Called by {@link QuestDBImpl#borrowQuery()} right after + * {@link QueryClientPool#acquire()} hands this worker out (which bumped the + * generation under the pool lock). The lease is a small per-borrow handle; + * the heavy state stays on the reused {@link QueryImpl}, and the per-submit + * path remains allocation-free. 
+ */ + Query lease() { + query.resetForBorrow(); + return new QueryLease(query, generation); + } + + long closeQueryTimeoutMillis() { + return pool.closeQueryTimeoutMillis(); + } + + /** + * Discards this worker from the pool instead of returning it. Called by + * {@link QueryImpl#close(long)} when the in-flight query could not be + * drained within the close budget, leaving the connection in an unknown + * protocol state. The captured lease {@code gen} lets the pool reject a + * stale discard whose worker has already been re-borrowed. + */ + void discardFromPool(long gen) { + pool.discard(this, gen); + } + + /** + * Returns this worker to the pool. Called by {@link QueryImpl#close(long)} + * when the borrowed lease is released; the captured lease {@code gen} lets + * the pool reject a stale release whose worker has already been re-borrowed. + */ + void releaseToPool(long gen) { + pool.release(this, gen); + } + void shutdown() { shuttingDown = true; signalLock.lock(); @@ -106,10 +214,19 @@ void shutdown() { signalLock.unlock(); } try { - // If a query is in flight on this worker, ask the client to abort so - // execute() returns promptly and the thread can exit before join - // times out. cancel() is documented as thread-safe and is a no-op - // when idle. + // If a query is in flight on this worker, force execute() to return + // promptly so the dispatch thread exits before the join below times + // out. Two nudges, strongest first: + // 1. Interrupt the dispatch thread. takeEvent() (QwpSpscQueue.take) + // is interrupt-aware, and executeOnce() turns the resulting + // InterruptedException into a terminal event -> signalDone. This + // releases a caller parked in Query.close() even when the I/O + // thread is wedged and client.close()'s synthetic terminal + // (closePool()) never runs -- the race that would otherwise + // strand the caller forever. + // 2. Ask the client to cancel on the wire so the server stops work. + // Best-effort and a no-op when idle. 
+ thread.interrupt(); try { client.cancel(); } catch (Throwable ignored) { @@ -140,8 +257,10 @@ void start() { } /** - * Hands a configured {@link QueryImpl} to this worker. The caller must - * have just acquired this worker via QueryClientPool#acquire(long). + * Hands a configured {@link QueryImpl} to this worker for execution. The + * worker is held by an open {@link io.questdb.client.Query} lease (see + * {@link #lease()}), so a lease may dispatch repeatedly (single-flight) + * until it is closed. */ void dispatch(QueryImpl q) { signalLock.lock(); @@ -161,7 +280,18 @@ void dispatch(QueryImpl q) { } private void runLoop() { - while (!shuttingDown) { + // Loop unconditionally -- do NOT hoist the shuttingDown check up here as + // while (!shuttingDown). The sole exit is the "if (shuttingDown) return" + // inside the signalLock block below, which strands a pending current + // before returning. Exiting at the top instead would skip that strand on + // the busy-worker path: when a reused lease's submit() -> dispatch() sets + // current between the terminal callback and this check, and shutdown() + // then flips shuttingDown, the worker would return straight after + // runOn() without re-inspecting current -- the job is dropped, never + // run, never signalled, and its caller's await() hangs forever. + // Re-entering the lock every lap funnels every shutdown ordering through + // the single strand point. + while (true) { QueryImpl q; signalLock.lock(); try { @@ -181,6 +311,17 @@ private void runLoop() { return; } q = current; + // Clear the hand-off slot under signalLock, at the moment of + // consumption -- NOT after runOn() returns. A lease is + // single-flight but reused: the user thread loops submit() -> + // await() on the same handle. The terminal callback inside + // runOn() wakes the user thread, which can call submit() -> + // dispatch() (current = q; signal) before this worker thread + // returns from runOn(). 
Clearing current after runOn() would + // race that dispatch, clobber the freshly-set job, drop its + // already-consumed signal, and park the worker forever while + // the user thread waits on a Completion that never fires. + current = null; } finally { signalLock.unlock(); } @@ -188,9 +329,12 @@ private void runLoop() { q.runOn(client); } catch (Throwable t) { q.signalUnexpected(t); - } finally { - current = null; - pool.release(this); + } + // Test-only barrier: deterministically reproduce the busy-worker + // shutdown-drop race (df6f7ca) at its exact site. Null in production. + Runnable hook = busyWorkerTestHook; + if (hook != null) { + hook.run(); } } } diff --git a/core/src/main/java/io/questdb/client/impl/QuestDBImpl.java b/core/src/main/java/io/questdb/client/impl/QuestDBImpl.java index 5bba8d46..e3da539b 100644 --- a/core/src/main/java/io/questdb/client/impl/QuestDBImpl.java +++ b/core/src/main/java/io/questdb/client/impl/QuestDBImpl.java @@ -24,27 +24,31 @@ package io.questdb.client.impl; -import io.questdb.client.Completion; import io.questdb.client.QuestDB; import io.questdb.client.Query; import io.questdb.client.Sender; -import io.questdb.client.cutlass.qwp.client.QwpColumnBatchHandler; +import io.questdb.client.SenderConnectionListener; +import io.questdb.client.SenderErrorHandler; import io.questdb.client.cutlass.qwp.client.QwpQueryClient; +import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener; +import org.jetbrains.annotations.TestOnly; import java.util.function.Consumer; import java.util.function.IntFunction; /** - * Implementation of {@link QuestDB}. Owns the elastic {@link SenderPool} - * and {@link QueryClientPool}, a {@link PoolHousekeeper} that reaps idle - * slots, and a {@link ThreadLocal} of {@link QueryImpl} instances so that - * {@link #query()} is allocation-free after the first call on each thread. + * Implementation of {@link QuestDB}. 
Owns the elastic {@link SenderPool} and + * {@link QueryClientPool} and a {@link PoolHousekeeper} that reaps idle slots. + * {@link #borrowQuery()} leases a pooled {@link QueryWorker} and hands back a + * thin {@link QueryLease} over its reused {@link QueryImpl}; the heavy per-query + * state is pre-allocated on the worker and the per-submit path is + * allocation-free, so only the small lease handle is created per borrow (and is + * routinely scalar-replaced by the JIT in the try-with-resources case). */ public final class QuestDBImpl implements QuestDB { private final PoolHousekeeper housekeeper; private final QueryClientPool queryPool; - private final ThreadLocal queryThreadLocal; private final SenderPool senderPool; private volatile boolean closed; @@ -58,20 +62,26 @@ public QuestDBImpl( long acquireTimeoutMillis, long idleTimeoutMillis, long maxLifetimeMillis, - long housekeeperIntervalMillis + long housekeeperIntervalMillis, + long queryCloseTimeoutMillis, + SenderErrorHandler errorHandler, + SenderConnectionListener connectionListener, + BackgroundDrainerListener drainerListener ) { this(ingestConfig, queryConfig, senderMin, senderMax, queryMin, queryMax, acquireTimeoutMillis, idleTimeoutMillis, maxLifetimeMillis, - housekeeperIntervalMillis, null, null); + housekeeperIntervalMillis, queryCloseTimeoutMillis, null, null, + errorHandler, connectionListener, drainerListener); } - // Package-private constructor exposing the senderFactory and connectHook test - // seams: production passes null for both (-> the real native build/connect - // paths). White-box tests in io.questdb.client.test.impl reach this by - // reflection (the main module is declared `open`) to make SenderPool prewarm - // an observable delegate while QueryClientPool construction throws an Error, + // Test-only constructor exposing the senderFactory and connectHook seams: + // production uses the public overload above, which passes null for both -> + // the real native build/connect paths. 
White-box error-safety tests in + // io.questdb.client.test.impl call this to make SenderPool prewarm an + // observable delegate while QueryClientPool construction throws an Error, // exercising the cleanup catch below. - QuestDBImpl( + @TestOnly + public QuestDBImpl( String ingestConfig, String queryConfig, int senderMin, @@ -84,6 +94,35 @@ public QuestDBImpl( long housekeeperIntervalMillis, IntFunction senderFactory, Consumer connectHook + ) { + this(ingestConfig, queryConfig, senderMin, senderMax, queryMin, queryMax, + acquireTimeoutMillis, idleTimeoutMillis, maxLifetimeMillis, + housekeeperIntervalMillis, QueryClientPool.DEFAULT_CLOSE_QUERY_TIMEOUT_MILLIS, + senderFactory, connectHook, null, null, null); + } + + // Full constructor adding the ingest-side errorHandler/connectionListener/ + // drainerListener, applied by SenderPool to every Sender it builds. The + // 12-arg overload above is the unchanged white-box test seam and delegates + // here with null callbacks; the public overload delegates here with null + // test seams. + QuestDBImpl( + String ingestConfig, + String queryConfig, + int senderMin, + int senderMax, + int queryMin, + int queryMax, + long acquireTimeoutMillis, + long idleTimeoutMillis, + long maxLifetimeMillis, + long housekeeperIntervalMillis, + long queryCloseTimeoutMillis, + IntFunction senderFactory, + Consumer connectHook, + SenderErrorHandler errorHandler, + SenderConnectionListener connectionListener, + BackgroundDrainerListener drainerListener ) { SenderPool builtSenderPool = null; QueryClientPool builtQueryPool = null; @@ -95,10 +134,12 @@ public QuestDBImpl( // Defer SF startup recovery to the PoolHousekeeper thread so // build() never blocks on a slow / reachable-but-not-acking // server; the housekeeper drives it via runStartupRecoveryStep(). 
- true); + true, + errorHandler, connectionListener, drainerListener); builtQueryPool = new QueryClientPool( queryConfig, queryMin, queryMax, acquireTimeoutMillis, idleTimeoutMillis, maxLifetimeMillis, connectHook); + builtQueryPool.closeQueryTimeoutMillis(queryCloseTimeoutMillis); builtHousekeeper = new PoolHousekeeper(builtSenderPool, builtQueryPool, housekeeperIntervalMillis); builtHousekeeper.start(); } catch (Throwable e) { @@ -128,7 +169,11 @@ public QuestDBImpl( this.senderPool = builtSenderPool; this.queryPool = builtQueryPool; this.housekeeper = builtHousekeeper; - this.queryThreadLocal = ThreadLocal.withInitial(() -> new QueryImpl(queryPool)); + } + + @Override + public Query borrowQuery() { + return queryPool.acquire().lease(); } @Override @@ -182,30 +227,4 @@ private static void closeQuietly(AutoCloseable closeable) { } } - @Override - public Completion executeSql(CharSequence sql, QwpColumnBatchHandler handler) { - return query().sql(sql).handler(handler).submit(); - } - - @Override - public Query newQuery() { - return new QueryImpl(queryPool); - } - - @Override - public Query query() { - QueryImpl q = queryThreadLocal.get(); - q.resetIfDone(); - return q; - } - - @Override - public void releaseSender() { - senderPool.releaseCurrentThread(); - } - - @Override - public Sender sender() { - return senderPool.pinToCurrentThread(); - } } diff --git a/core/src/main/java/io/questdb/client/impl/SenderPool.java b/core/src/main/java/io/questdb/client/impl/SenderPool.java index 8c9fda7a..2785f2eb 100644 --- a/core/src/main/java/io/questdb/client/impl/SenderPool.java +++ b/core/src/main/java/io/questdb/client/impl/SenderPool.java @@ -25,11 +25,15 @@ package io.questdb.client.impl; import io.questdb.client.Sender; +import io.questdb.client.SenderConnectionListener; +import io.questdb.client.SenderErrorHandler; import io.questdb.client.cutlass.line.LineSenderException; import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender; +import 
io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener; import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner; import io.questdb.client.std.Files; import io.questdb.client.std.IntList; +import org.jetbrains.annotations.TestOnly; import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -93,9 +97,14 @@ public final class SenderPool implements AutoCloseable { // transport has no application-level connect timeout to clamp it. private static final long RECOVERY_DRAIN_BUDGET_MILLIS = 1_000; private final long acquireTimeoutMillis; - private final ArrayList all; - private final ArrayDeque available; + private final ArrayList all; + private final ArrayDeque available; private final String configurationString; + // User-supplied ingest callbacks, shared across every pooled Sender this + // pool builds. Null -> each sender keeps its loud-not-silent default. + private final SenderConnectionListener connectionListener; + private final BackgroundDrainerListener drainerListener; + private final SenderErrorHandler errorHandler; private final long idleTimeoutMillis; // Test seam. Production builds delegates via defaultSender(); white-box // tests in io.questdb.client.test.impl reach the package-private @@ -132,7 +141,6 @@ public final class SenderPool implements AutoCloseable { private final Condition slotReleased; // True iff the configuration enables store-and-forward (sf_dir set). private final boolean storeAndForward; - private final ThreadLocal threadAffine = new ThreadLocal<>(); // Slots removed from `all` whose delegate is still releasing its flock. 
// They keep reserving capacity (and their slotInUse mark) until the // flock drops, so the cap check and the slot allocator stay consistent @@ -189,16 +197,17 @@ public SenderPool( long maxLifetimeMillis ) { this(configurationString, minSize, maxSize, acquireTimeoutMillis, - idleTimeoutMillis, maxLifetimeMillis, null); + idleTimeoutMillis, maxLifetimeMillis, null, false, null, null, null); } - // Package-private constructor exposing the senderFactory test seam: - // production passes null (-> the real defaultSender()). White-box tests in - // io.questdb.client.test.impl reach this by reflection to inject a factory - // that throws a non-RuntimeException Throwable mid-prewarm. Recovery runs - // inline here (deferStartupRecovery=false); the pooled QuestDB handle uses - // the 8-arg overload to defer it to the housekeeper thread. - SenderPool( + // Test-only constructor exposing the senderFactory seam: production builds + // via the full constructor below (senderFactory null -> the real + // defaultSender()). White-box tests inject a factory that throws a + // non-RuntimeException Throwable mid-prewarm. Recovery runs inline here + // (deferStartupRecovery=false); the pooled QuestDB handle uses the 8-arg + // overload to defer it to the housekeeper thread. + @TestOnly + public SenderPool( String configurationString, int minSize, int maxSize, @@ -211,14 +220,16 @@ public SenderPool( idleTimeoutMillis, maxLifetimeMillis, senderFactory, false); } - // Full constructor. deferStartupRecovery=true skips the inline, - // construction-time SF recovery (recoverOneSlotStep) so - // QuestDB.build() never blocks on a slow or reachable-but-not-acking - // server; the owner (QuestDBImpl) then drives recovery one slot per tick on - // the PoolHousekeeper thread via runStartupRecoveryStep(). The in-range - // recovery pass is concurrency-safe against borrow()/return on that + // Test-only constructor adding the deferStartupRecovery toggle. 
+ // deferStartupRecovery=true skips the inline, construction-time SF recovery + // (recoverOneSlotStep) so QuestDB.build() never blocks on a slow or + // reachable-but-not-acking server; the owner (QuestDBImpl) then drives + // recovery one slot per tick on the PoolHousekeeper thread via + // runStartupRecoveryStep(). White-box SF tests call this directly; the + // in-range recovery pass is concurrency-safe against borrow()/return on the // deferred path -- see recoverOneSlotStep(). - SenderPool( + @TestOnly + public SenderPool( String configurationString, int minSize, int maxSize, @@ -227,10 +238,36 @@ public SenderPool( long maxLifetimeMillis, IntFunction senderFactory, boolean deferStartupRecovery + ) { + this(configurationString, minSize, maxSize, acquireTimeoutMillis, + idleTimeoutMillis, maxLifetimeMillis, senderFactory, + deferStartupRecovery, null, null, null); + } + + // Full constructor adding the user-supplied ingest callbacks (error + // handler, connection listener and background-drainer listener), applied + // to every Sender the pool builds (see buildManagedSlotSender). The public + // 6-arg ctor and the test-only senderFactory overloads above both delegate + // here with null callbacks; the pooled QuestDB handle calls this directly. + SenderPool( + String configurationString, + int minSize, + int maxSize, + long acquireTimeoutMillis, + long idleTimeoutMillis, + long maxLifetimeMillis, + IntFunction senderFactory, + boolean deferStartupRecovery, + SenderErrorHandler errorHandler, + SenderConnectionListener connectionListener, + BackgroundDrainerListener drainerListener ) { if (minSize < 0 || maxSize < 1 || minSize > maxSize) { throw new IllegalArgumentException("invalid pool sizing: min=" + minSize + ", max=" + maxSize); } + this.errorHandler = errorHandler; + this.connectionListener = connectionListener; + this.drainerListener = drainerListener; this.senderFactory = senderFactory != null ? 
senderFactory : this::defaultSender; // An injected factory (tests) drives recovery too, preserving the // white-box recovery seam; production recovery forces OFF-mode connects @@ -262,7 +299,7 @@ public SenderPool( if (storeAndForward) { slotInUse[i] = true; } - PooledSender ps = createUnlocked(storeAndForward ? i : -1); + SenderSlot ps = createUnlocked(storeAndForward ? i : -1); all.add(ps); available.add(ps); built++; @@ -571,7 +608,7 @@ private boolean drainCandidateSlotForRecovery(int slotIndex, String slotPath, // createRecoverer() takes the slot flock on -slotIndex, and // delegate().close() can early-return with the I/O thread still running // (flock still held). - PooledSender recoverer = null; + SenderSlot recoverer = null; boolean stopScan = false; try { if (!OrphanScanner.isCandidateOrphan(slotPath)) { @@ -597,7 +634,7 @@ private boolean drainCandidateSlotForRecovery(int slotIndex, String slotPath, // on a timeout: a server that fails to ack within the budget // will very likely do the same for every remaining slot -- the // same reasoning as the build-failure case above. - if (!recoverer.drain(remainingMillis)) { + if (!recoverer.delegate().drain(remainingMillis)) { LOG.warn("startup SF recovery: drain did not ack slot {} " + "within {}ms; skipping remaining slots", slotPath, remainingMillis); @@ -636,9 +673,12 @@ public PooledSender borrow() { throw new LineSenderException("QuestDB handle is closed"); } if (!available.isEmpty()) { - PooledSender s = available.pollFirst(); - s.markInUse(); - return s; + SenderSlot s = available.pollFirst(); + // Stamp a fresh lease id under the lock so the PooledSender + // wrapper handed out can be told apart from any prior, + // now-stale borrow of the same slot. 
+ s.bumpGeneration(); + return new PooledSender(s, s.generation()); } if (all.size() + inFlightCreations + closingSlots + leakedSlots + recoveringSlots < maxSize) { inFlightCreations++; @@ -647,7 +687,7 @@ public PooledSender borrow() { // SF is off (no per-slot identity needed). int slotIndex = storeAndForward ? allocateSlotIndex() : -1; lock.unlock(); - PooledSender created; + SenderSlot created; try { created = createUnlocked(slotIndex); } catch (Throwable e) { @@ -685,8 +725,8 @@ public PooledSender borrow() { throw new LineSenderException("QuestDB handle is closed"); } all.add(created); - created.markInUse(); - return created; + created.bumpGeneration(); + return new PooledSender(created, created.generation()); } if (remainingNanos <= 0) { throw new LineSenderException( @@ -721,7 +761,7 @@ void markClosing() { @Override public void close() { - PooledSender[] snapshot; + SenderSlot[] snapshot; lock.lock(); try { if (closeStarted) { @@ -731,22 +771,13 @@ public void close() { // Raise the shutdown signal too (a direct, non-pooled caller may // close() without a prior markClosing()); harmless if already set. closed = true; - // Mark every pooled wrapper invalidated so pinToCurrentThread() - // on other threads -- which never takes this lock -- can detect - // that its cached entry no longer wraps a live delegate. Removing - // the calling thread's ThreadLocal only clears one slot; other - // threads' slots survive until they read the flag. - for (int i = 0; i < all.size(); i++) { - all.get(i).markInvalidated(); - } // Snapshot under the lock so the delegate-close loop below is // immune to concurrent mutation of `all`. discardBroken running // on another thread can still bail thanks to the `closed` check // it now performs; the snapshot is belt-and-braces for any // future code path that mutates `all` outside this lock's // happens-before chain. 
- snapshot = all.toArray(new PooledSender[0]); - threadAffine.remove(); + snapshot = all.toArray(new SenderSlot[0]); slotReleased.signalAll(); } finally { lock.unlock(); @@ -763,27 +794,11 @@ public void close() { } } - /** - * Clears the current thread's pin if it currently references {@code s}. - * Invoked from {@link PooledSender#close()} before the wrapper is - * returned to the pool, so a subsequent {@link #pinToCurrentThread()} - * on this thread cannot hand the wrapper back after another consumer - * has borrowed the slot. No-op when the caller never pinned, or pinned - * a different wrapper. - */ - void clearPinIfCurrent(PooledSender s) { - if (threadAffine.get() == s) { - threadAffine.remove(); - } - } - /** * Evicts a slot whose delegate has failed (typically a {@code flush()} - * failure observed in {@link PooledSender#close()}). The wrapper is - * marked invalidated so any thread-pinned reference gets rejected on the - * next {@link #pinToCurrentThread()} call; the slot is removed from - * {@code all} so the pool can grow back into a fresh slot on demand. The - * underlying delegate is closed outside the lock so a slow real-close + * failure observed in {@link PooledSender#close()}). The slot is removed + * from {@code all} so the pool can grow back into a fresh slot on demand. + * The underlying delegate is closed outside the lock so a slow real-close * does not stall other borrowers. *
<p>
    * Bails when the pool is already closed: {@link #close()} owns the @@ -792,14 +807,22 @@ void clearPinIfCurrent(PooledSender s) { * {@code ArrayList} and the {@code delegate.close()} below would be a * double-close on a delegate {@code close()} has already shut down. */ - void discardBroken(PooledSender s) { - s.markInvalidated(); + void discardBroken(PooledSender ps) { + SenderSlot s = ps.slot(); + long gen = ps.generation(); boolean reserved = false; lock.lock(); try { if (closed) { return; } + if (s.generation() != gen) { + // Stale discard: the slot was already returned/discarded and + // possibly re-borrowed. Dropping it avoids evicting a slot a + // different borrower now owns and double-closing its delegate. + return; + } + s.bumpGeneration(); boolean removed = all.remove(s); // For an SF slot, keep its index reserved (move the reservation // from `all` to `closingSlots`) until the delegate below releases @@ -844,15 +867,26 @@ void discardBroken(PooledSender s) { } } - public void giveBack(PooledSender s) { - long now = System.currentTimeMillis(); - s.markIdleAt(now); + public void giveBack(PooledSender ps) { + SenderSlot s = ps.slot(); + long gen = ps.generation(); lock.lock(); try { if (closed) { // Pool already shut down: don't requeue; let close() finish destroying. return; } + if (s.generation() != gen) { + // Stale return: this lease was already given back and the slot + // possibly re-borrowed (or this is a duplicate close). Dropping + // it keeps Sender.close() idempotent under a concurrent + // re-borrow -- without it a double close would enqueue the slot + // twice and hand it to two borrowers writing into one delegate. 
+ return; + } + s.bumpGeneration(); + s.markIdleAt(System.currentTimeMillis()); + assert !available.contains(s) : "slot already present in available deque on giveBack"; available.addLast(s); slotReleased.signal(); } finally { @@ -860,19 +894,6 @@ public void giveBack(PooledSender s) { } } - public PooledSender pinToCurrentThread() { - PooledSender pinned = threadAffine.get(); - if (pinned != null && !pinned.isInvalidated()) { - return pinned; - } - if (pinned != null) { - threadAffine.remove(); - } - PooledSender s = borrow(); - threadAffine.set(s); - return s; - } - /** * Closes idle slots that have exceeded {@code idleTimeoutMillis} or that * have aged past {@code maxLifetimeMillis}. Never shrinks below @@ -883,15 +904,15 @@ public void reapIdle() { return; } long now = System.currentTimeMillis(); - ArrayList<PooledSender> toClose = null; + ArrayList<SenderSlot> toClose = null; lock.lock(); try { if (closed) { return; } - Iterator<PooledSender> it = available.iterator(); + Iterator<SenderSlot> it = available.iterator(); while (it.hasNext() && all.size() > minSize) { - PooledSender s = it.next(); + SenderSlot s = it.next(); boolean idleExpired = idleTimeoutMillis < Long.MAX_VALUE && (now - s.idleSinceMillis()) >= idleTimeoutMillis; boolean overAge = maxLifetimeMillis < Long.MAX_VALUE @@ -933,7 +954,7 @@ public void reapIdle() { lock.lock(); try { for (int i = 0, n = toClose.size(); i < n; i++) { - PooledSender s = toClose.get(i); + SenderSlot s = toClose.get(i); if (s.slotIndex() >= 0) { reclaimSlot(s, " during idle reaping"); } @@ -983,32 +1004,19 @@ public int leakedSlotCount() { } } - public void releaseCurrentThread() { - PooledSender pinned = threadAffine.get(); - if (pinned == null) { - return; - } - threadAffine.remove(); - if (pinned.isInvalidated()) { - // Pool was closed: delegate is already closed, skip flush/giveBack.
- return; - } - pinned.close(); - } - - private PooledSender createUnlocked(int slotIndex) { - return new PooledSender(senderFactory.apply(slotIndex), this, slotIndex); + private SenderSlot createUnlocked(int slotIndex) { + return new SenderSlot(senderFactory.apply(slotIndex), this, slotIndex); } /** - * Builds a {@link PooledSender} for startup recovery of one stranded slot. + * Builds a {@link SenderSlot} for startup recovery of one stranded slot. * Routes through {@link #recoverySenderFactory}, which in production forces * a non-blocking initial connect ({@link #defaultRecoverySender}) so a * single recovery step stays bounded -- see that method and * {@link #drainCandidateSlotForRecovery}. */ - private PooledSender createRecoverer(int slotIndex) { - return new PooledSender(recoverySenderFactory.apply(slotIndex), this, slotIndex); + private SenderSlot createRecoverer(int slotIndex) { + return new SenderSlot(recoverySenderFactory.apply(slotIndex), this, slotIndex); } private Sender defaultSender(int slotIndex) { @@ -1035,9 +1043,24 @@ private Sender defaultRecoverySender(int slotIndex) { return buildManagedSlotSender(slotIndex, true); } + // Applies the user-supplied ingest callbacks to a sender builder. Null + // callbacks are skipped so the sender keeps its loud-not-silent default. 
+ private Sender.LineSenderBuilder applyUserCallbacks(Sender.LineSenderBuilder builder) { + if (errorHandler != null) { + builder.errorHandler(errorHandler); + } + if (connectionListener != null) { + builder.connectionListener(connectionListener); + } + if (drainerListener != null) { + builder.drainerListener(drainerListener); + } + return builder; + } + private Sender buildManagedSlotSender(int slotIndex, boolean forRecovery) { if (!storeAndForward) { - return Sender.fromConfig(configurationString); + return applyUserCallbacks(Sender.builder(configurationString)).build(); } // Give this pooled sender its own slot dir /- // so concurrent SF senders sharing one sf_dir never collide on @@ -1091,7 +1114,9 @@ private Sender buildManagedSlotSender(int slotIndex, boolean forRecovery) { // returns). builder.drainOrphans(false); } - return builder.build(); + // Recovery delegates are internal, short-lived, OFF-mode drain senders; + // don't surface their connect/error events to the user's callbacks. + return (forRecovery ? builder : applyUserCallbacks(builder)).build(); } /** @@ -1130,7 +1155,7 @@ private void freeSlotIndex(int idx) { * {@link QwpWebSocketSender#isSlotLockReleased()} -- false means close() * bailed early with the I/O thread still running and the flock still held. */ - private static boolean flockReleased(PooledSender s) { + private static boolean flockReleased(SenderSlot s) { Sender d = s.delegate(); return !(d instanceof QwpWebSocketSender) || ((QwpWebSocketSender) d).isSlotLockReleased(); } @@ -1153,7 +1178,7 @@ private static boolean flockReleased(PooledSender s) { * path (e.g. 
{@code ""} or {@code " during idle reaping"}) * @return {@code true} if the index was freed, {@code false} if retired */ - private boolean reclaimSlot(PooledSender s, String context) { + private boolean reclaimSlot(SenderSlot s, String context) { closingSlots--; if (flockReleased(s)) { freeSlotIndex(s.slotIndex()); diff --git a/core/src/main/java/io/questdb/client/impl/SenderSlot.java b/core/src/main/java/io/questdb/client/impl/SenderSlot.java new file mode 100644 index 00000000..19c93671 --- /dev/null +++ b/core/src/main/java/io/questdb/client/impl/SenderSlot.java @@ -0,0 +1,118 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.impl; + +import io.questdb.client.Sender; + +/** + * One reusable {@link SenderPool} slot: owns a real {@link Sender} delegate, its + * store-and-forward slot index, and the idle/age bookkeeping the pool needs. + * Pre-allocated once per slot and held in the pool's {@code all}/{@code + * available} collections across borrows; it is never handed to callers + * directly. + *
<p>
    + * Each borrow wraps the slot in a fresh {@link PooledSender} stamped with the + * slot's current lease {@link #generation}. Because the slot is shared across + * borrows, a stale handle's {@code close()} or data write must not release, or + * write through, a slot a later borrower now owns. The generation -- mutated + * only under the pool lock when the slot is handed out and returned -- is what + * lets {@link #live(long)} and {@link SenderPool#giveBack}/{@link + * SenderPool#discardBroken} detect and drop such stale calls. This is the + * ingest-side mirror of the egress {@code QueryWorker} generation guard. + */ +final class SenderSlot { + + private final long createdAtMillis; + private final Sender delegate; + private final SenderPool pool; + private final int slotIndex; + // Monotonic lease id. Mutated only under the SenderPool lock (bumped in + // borrow() when the slot is handed out and in giveBack()/discardBroken() + // when it is returned). A PooledSender wrapper captures it live for its + // borrow; once the slot is released or re-borrowed the captured id no + // longer matches. Volatile so a stale handle on another thread observes + // the latest value without taking the pool lock. + private volatile long generation; + private volatile long idleSinceMillis; + + SenderSlot(Sender delegate, SenderPool pool, int slotIndex) { + this.delegate = delegate; + this.pool = pool; + this.slotIndex = slotIndex; + this.createdAtMillis = System.currentTimeMillis(); + this.idleSinceMillis = this.createdAtMillis; + } + + /** + * Advances the lease generation. Called by {@link SenderPool} under the + * pool lock when the slot is handed out (borrow) and when it is returned + * (giveBack/discardBroken). 
+ */ + void bumpGeneration() { + generation++; + } + + long createdAtMillis() { + return createdAtMillis; + } + + Sender delegate() { + return delegate; + } + + long generation() { + return generation; + } + + long idleSinceMillis() { + return idleSinceMillis; + } + + /** + * Validates the borrowing lease's {@code gen} and returns the underlying + * delegate for a data-plane call. Throws if the lease is stale (the slot + * was returned to the pool or re-borrowed), so a stale handle cannot write + * into a slot a later borrower owns. Called by {@link PooledSender} on + * every operation. + */ + Sender live(long gen) { + if (gen != generation) { + throw new IllegalStateException("sender handle is closed (returned to the pool)"); + } + return delegate; + } + + void markIdleAt(long nowMillis) { + idleSinceMillis = nowMillis; + } + + SenderPool pool() { + return pool; + } + + int slotIndex() { + return slotIndex; + } +} diff --git a/core/src/main/java/io/questdb/client/network/JavaTlsClientSocket.java b/core/src/main/java/io/questdb/client/network/JavaTlsClientSocket.java index 4d363fbb..c1b1eec7 100644 --- a/core/src/main/java/io/questdb/client/network/JavaTlsClientSocket.java +++ b/core/src/main/java/io/questdb/client/network/JavaTlsClientSocket.java @@ -307,91 +307,13 @@ public int send(long bufferPtr, int bufferLen) { } @Override - public void startTlsSession(CharSequence peerName) throws TlsSessionInitFailedException { + public void startTlsSession(CharSequence peerName, SocketReadinessWaiter waiter) throws TlsSessionInitFailedException { assert state == STATE_PLAINTEXT; prepareInternalBuffers(); try { this.sslEngine = createSslEngine(peerName); this.sslEngine.beginHandshake(); - SSLEngineResult.HandshakeStatus handshakeStatus = sslEngine.getHandshakeStatus(); - while (handshakeStatus != SSLEngineResult.HandshakeStatus.FINISHED) { - switch (handshakeStatus) { - case NEED_TASK: - Runnable task; - while ((task = sslEngine.getDelegatedTask()) != null) { - task.run(); 
- } - handshakeStatus = sslEngine.getHandshakeStatus(); - break; - case NEED_WRAP: { - SSLEngineResult result = sslEngine.wrap(wrapInputBuffer, wrapOutputBuffer); - handshakeStatus = result.getHandshakeStatus(); - switch (result.getStatus()) { - case BUFFER_UNDERFLOW: - // there cannot be underflow since wrap() during handshake does not read from the input buffer at all - throw new AssertionError("Buffer underflow during TLS handshake. This should not happen. please report as a bug"); - case BUFFER_OVERFLOW: - if (wrapOutputBuffer.position() != 0) { - // wrap() left bytes behind without producing a complete record. The OK - // branch is the only place that drains and clears, so a non-empty - // buffer here means we would re-enter NEED_WRAP with identical state - // and spin forever. Fail loudly instead. - throw new AssertionError("Buffer overflow during TLS handshake with non-empty output buffer. This should not happen, please report as a bug"); - } - // in theory, this can happen if the output buffer is too small to fit a single TLS handshake record, - // but that would indicate our starting buffer is too small. 
- growWrapOutputBuffer(); - break; - case OK: - // wrapOutputBuffer: write mode - int written = 0; - int bufferLimit = wrapOutputBuffer.position(); - while (written < bufferLimit) { - int n = delegate.send(wrapOutputBufferPtr + written, bufferLimit - written); - if (n < 0) { - throw TlsSessionInitFailedException.instance("socket write error"); - } - written += n; - } - wrapOutputBuffer.clear(); - break; - case CLOSED: - throw TlsSessionInitFailedException.instance("server closed connection unexpectedly"); - } - break; - } - case NEED_UNWRAP: { - int n = readFromSocket(); - if (n < 0) { - throw TlsSessionInitFailedException.instance("socket read error"); - } - SSLEngineResult result = sslEngine.unwrap(unwrapInputBuffer, unwrapOutputBuffer); - handshakeStatus = result.getHandshakeStatus(); - switch (result.getStatus()) { - case BUFFER_UNDERFLOW: - // we need to receive more data from a socket, let's try again - break; - case BUFFER_OVERFLOW: - if (unwrapOutputBuffer.position() != 0) { - // unwrap() produced plaintext but signalled overflow without consuming - // the next record. Nothing in the handshake loop drains this buffer, - // so re-entering NEED_UNWRAP would spin forever. Fail loudly. - throw new AssertionError("Buffer overflow during TLS handshake with non-empty output buffer. This should not happen, please report as a bug"); - } - // in theory, this can happen if the output buffer is too small to fit a single TLS handshake record, - // but that would indicate our starting buffer is too small. 
- growUnwrapOutputBuffer(); - break; - case OK: - // good, let's see what we need to do next - break; - case CLOSED: - throw TlsSessionInitFailedException.instance("server closed connection unexpectedly"); - } - } - break; - } - } + runHandshake(waiter); // unwrap input buffer: read mode and empty unwrapInputBuffer.position(0); unwrapInputBuffer.limit(0); @@ -583,6 +505,113 @@ private int readFromSocket() { return n; } + /** + * Drives the TLS handshake state machine to completion. When the + * non-blocking socket would block, hands control to {@code waiter} (which + * parks on the event loop bounded by the connect deadline) instead of + * busy-spinning on read/write. Extracted from {@link #startTlsSession} so a + * stub {@code sslEngine} can exercise the wait paths in isolation. + */ + private void runHandshake(SocketReadinessWaiter waiter) throws SSLException, TlsSessionInitFailedException { + SSLEngineResult.HandshakeStatus handshakeStatus = sslEngine.getHandshakeStatus(); + // Exit on NOT_HANDSHAKING as well as FINISHED: getHandshakeStatus() (used by the NEED_TASK + // branch) never returns FINISHED per the JSSE contract -- it returns NOT_HANDSHAKING once the + // handshake completes. Without this, a delegated task that is the terminal step would leave the + // loop on NOT_HANDSHAKING, match no case, and busy-spin forever with no deadline escape. 
+ while (handshakeStatus != SSLEngineResult.HandshakeStatus.FINISHED + && handshakeStatus != SSLEngineResult.HandshakeStatus.NOT_HANDSHAKING) { + switch (handshakeStatus) { + case NEED_TASK: + Runnable task; + while ((task = sslEngine.getDelegatedTask()) != null) { + task.run(); + } + handshakeStatus = sslEngine.getHandshakeStatus(); + break; + case NEED_WRAP: { + SSLEngineResult result = sslEngine.wrap(wrapInputBuffer, wrapOutputBuffer); + handshakeStatus = result.getHandshakeStatus(); + switch (result.getStatus()) { + case BUFFER_UNDERFLOW: + // there cannot be underflow since wrap() during handshake does not read from the input buffer at all + throw new AssertionError("Buffer underflow during TLS handshake. This should not happen. please report as a bug"); + case BUFFER_OVERFLOW: + if (wrapOutputBuffer.position() != 0) { + // wrap() left bytes behind without producing a complete record. The OK + // branch is the only place that drains and clears, so a non-empty + // buffer here means we would re-enter NEED_WRAP with identical state + // and spin forever. Fail loudly instead. + throw new AssertionError("Buffer overflow during TLS handshake with non-empty output buffer. This should not happen, please report as a bug"); + } + // in theory, this can happen if the output buffer is too small to fit a single TLS handshake record, + // but that would indicate our starting buffer is too small. + growWrapOutputBuffer(); + break; + case OK: + // wrapOutputBuffer: write mode + int written = 0; + int bufferLimit = wrapOutputBuffer.position(); + while (written < bufferLimit) { + int n = delegate.send(wrapOutputBufferPtr + written, bufferLimit - written); + if (n < 0) { + throw TlsSessionInitFailedException.instance("socket write error"); + } + if (n == 0) { + // The non-blocking socket's send buffer is full. Wait for it to + // become writable -- bounded by the connect deadline -- instead of + // busy-spinning on send(). 
+ waiter.awaitReady(IOOperation.WRITE); + } + written += n; + } + wrapOutputBuffer.clear(); + break; + case CLOSED: + throw TlsSessionInitFailedException.instance("server closed connection unexpectedly"); + } + break; + } + case NEED_UNWRAP: { + int n = readFromSocket(); + if (n < 0) { + throw TlsSessionInitFailedException.instance("socket read error"); + } + SSLEngineResult result = sslEngine.unwrap(unwrapInputBuffer, unwrapOutputBuffer); + handshakeStatus = result.getHandshakeStatus(); + switch (result.getStatus()) { + case BUFFER_UNDERFLOW: + // Not enough bytes for a complete TLS record yet. If the last read + // drained the socket (n == 0, would-block on the non-blocking fd), wait + // for it to become readable -- bounded by the connect deadline -- instead + // of busy-spinning. A positive n means we read a partial record, so loop + // immediately and read the rest. + if (n == 0) { + waiter.awaitReady(IOOperation.READ); + } + break; + case BUFFER_OVERFLOW: + if (unwrapOutputBuffer.position() != 0) { + // unwrap() produced plaintext but signalled overflow without consuming + // the next record. Nothing in the handshake loop drains this buffer, + // so re-entering NEED_UNWRAP would spin forever. Fail loudly. + throw new AssertionError("Buffer overflow during TLS handshake with non-empty output buffer. This should not happen, please report as a bug"); + } + // in theory, this can happen if the output buffer is too small to fit a single TLS handshake record, + // but that would indicate our starting buffer is too small. 
+ growUnwrapOutputBuffer(); + break; + case OK: + // good, let's see what we need to do next + break; + case CLOSED: + throw TlsSessionInitFailedException.instance("server closed connection unexpectedly"); + } + } + break; + } + } + } + private int writeToSocket(int bytesToSend) { // wrapOutputBuffer is in the write mode int n = delegate.send(wrapOutputBufferPtr, bytesToSend); diff --git a/core/src/main/java/io/questdb/client/network/Net.java b/core/src/main/java/io/questdb/client/network/Net.java index 040a2cb7..f649d330 100644 --- a/core/src/main/java/io/questdb/client/network/Net.java +++ b/core/src/main/java/io/questdb/client/network/Net.java @@ -36,6 +36,11 @@ public final class Net { + // Sentinel returned by connectAddrInfoTimeout when the connect did not + // complete within the supplied budget. Distinct from -1 (generic error) and + // the disconnect codes so callers can flag a timeout without decoding errno. + @SuppressWarnings("unused") + public static final int CONNECT_TIMEOUT = -3; @SuppressWarnings("unused") public static final int EOTHERDISCONNECT = -2; @SuppressWarnings("unused") @@ -88,6 +93,14 @@ public static void configureKeepAlive(int fd) { public static native int connectAddrInfo(int fd, long lpAddrInfo); + /** + * Non-blocking connect bounded by {@code timeoutMillis}. Returns 0 on + * success, {@link #CONNECT_TIMEOUT} on timeout, or -1 on failure (errno set, + * readable via {@link io.questdb.client.std.Os#errno()}). The socket is left + * non-blocking on success. 
+ */ + public static native int connectAddrInfoTimeout(int fd, long lpAddrInfo, int timeoutMillis); + public static void freeAddrInfo(long pAddrInfo) { if (pAddrInfo != 0) { ADDR_INFO_COUNTER.decrementAndGet(); diff --git a/core/src/main/java/io/questdb/client/network/NetworkFacade.java b/core/src/main/java/io/questdb/client/network/NetworkFacade.java index b2e97dad..d23824a5 100644 --- a/core/src/main/java/io/questdb/client/network/NetworkFacade.java +++ b/core/src/main/java/io/questdb/client/network/NetworkFacade.java @@ -27,6 +27,12 @@ import org.slf4j.Logger; public interface NetworkFacade { + /** + * Return value of {@link #connectAddrInfoTimeout(int, long, int)} when the + * connect did not complete within the supplied budget. + */ + int CONNECT_TIMEOUT = Net.CONNECT_TIMEOUT; + int close(int fd); void close(int fd, Logger logger); @@ -39,6 +45,13 @@ public interface NetworkFacade { int connectAddrInfo(int fd, long pAddrInfo); + /** + * Non-blocking connect bounded by {@code timeoutMillis}. Returns 0 on + * success, {@link #CONNECT_TIMEOUT} on timeout, or -1 on failure (with + * {@link #errno()} set). The socket is left non-blocking on success. 
+ */ + int connectAddrInfoTimeout(int fd, long pAddrInfo, int timeoutMillis); + int errno(); void freeAddrInfo(long pAddrInfo); diff --git a/core/src/main/java/io/questdb/client/network/NetworkFacadeImpl.java b/core/src/main/java/io/questdb/client/network/NetworkFacadeImpl.java index 11195fc2..64ea0dc7 100644 --- a/core/src/main/java/io/questdb/client/network/NetworkFacadeImpl.java +++ b/core/src/main/java/io/questdb/client/network/NetworkFacadeImpl.java @@ -62,6 +62,11 @@ public int connectAddrInfo(int fd, long pAddrInfo) { return Net.connectAddrInfo(fd, pAddrInfo); } + @Override + public int connectAddrInfoTimeout(int fd, long pAddrInfo, int timeoutMillis) { + return Net.connectAddrInfoTimeout(fd, pAddrInfo, timeoutMillis); + } + @Override public int errno() { return Os.errno(); diff --git a/core/src/main/java/io/questdb/client/network/PlainSocket.java b/core/src/main/java/io/questdb/client/network/PlainSocket.java index 06e8c23e..555affd2 100644 --- a/core/src/main/java/io/questdb/client/network/PlainSocket.java +++ b/core/src/main/java/io/questdb/client/network/PlainSocket.java @@ -71,7 +71,7 @@ public int send(long bufferPtr, int bufferLen) { } @Override - public void startTlsSession(CharSequence peerName) { + public void startTlsSession(CharSequence peerName, SocketReadinessWaiter waiter) { throw new UnsupportedOperationException(); } diff --git a/core/src/main/java/io/questdb/client/network/Socket.java b/core/src/main/java/io/questdb/client/network/Socket.java index dec4db4e..0cdce517 100644 --- a/core/src/main/java/io/questdb/client/network/Socket.java +++ b/core/src/main/java/io/questdb/client/network/Socket.java @@ -84,9 +84,12 @@ public interface Socket extends QuietCloseable { * on server connections. * * @param peerName server name to use for SNI and certificate validation. 
+ * @param waiter blocks until the socket is ready for the next handshake + * read/write (bounded by the connect deadline), so the + * handshake does not busy-spin on the non-blocking socket. * @throws TlsSessionInitFailedException if the call fails. */ - void startTlsSession(@Nullable CharSequence peerName) throws TlsSessionInitFailedException; + void startTlsSession(@Nullable CharSequence peerName, SocketReadinessWaiter waiter) throws TlsSessionInitFailedException; /** * @return true if the socket support TLS encryption; false otherwise. diff --git a/core/src/main/java/io/questdb/client/network/SocketReadinessWaiter.java b/core/src/main/java/io/questdb/client/network/SocketReadinessWaiter.java new file mode 100644 index 00000000..8543d3e6 --- /dev/null +++ b/core/src/main/java/io/questdb/client/network/SocketReadinessWaiter.java @@ -0,0 +1,46 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + ******************************************************************************/ + +package io.questdb.client.network; + +/** + * Blocks until a non-blocking socket is ready for a given I/O operation, or + * throws a timeout-flagged exception once the caller's deadline passes. + *
<p>
    + * Used to drive the TLS handshake off the client's event loop: instead of + * busy-spinning on a non-blocking socket that returns "would block", the + * handshake hands control to this waiter, which parks on epoll/kqueue/select + * with the remaining connect budget. This bounds the handshake by the same + * deadline as the TCP connect and keeps a stalled peer from pinning a CPU. + */ +@FunctionalInterface +public interface SocketReadinessWaiter { + /** + * Blocks until the socket is ready for {@code ioOperation}, or throws a + * timeout-flagged exception when the connect deadline is exceeded. + * + * @param ioOperation {@link IOOperation#READ} or {@link IOOperation#WRITE} + */ + void awaitReady(int ioOperation); +} diff --git a/core/src/main/java/io/questdb/client/std/MemoryTag.java b/core/src/main/java/io/questdb/client/std/MemoryTag.java index 984f6fdb..643ceb58 100644 --- a/core/src/main/java/io/questdb/client/std/MemoryTag.java +++ b/core/src/main/java/io/questdb/client/std/MemoryTag.java @@ -38,4 +38,31 @@ public final class MemoryTag { public static final int NATIVE_TLS_RSS = NATIVE_TEXT_PARSER_RSS + 1; public static final int NATIVE_ND_ARRAY = NATIVE_TLS_RSS + 1; public static final int SIZE = NATIVE_ND_ARRAY + 1; + + public static String nameOf(int tag) { + switch (tag) { + case MMAP_DEFAULT: + return "MMAP_DEFAULT"; + case NATIVE_PATH: + return "NATIVE_PATH"; + case NATIVE_DEFAULT: + return "NATIVE_DEFAULT"; + case NATIVE_DIRECT_UTF8_SINK: + return "NATIVE_DIRECT_UTF8_SINK"; + case NATIVE_HTTP_CONN: + return "NATIVE_HTTP_CONN"; + case NATIVE_ILP_RSS: + return "NATIVE_ILP_RSS"; + case NATIVE_IO_DISPATCHER_RSS: + return "NATIVE_IO_DISPATCHER_RSS"; + case NATIVE_TEXT_PARSER_RSS: + return "NATIVE_TEXT_PARSER_RSS"; + case NATIVE_TLS_RSS: + return "NATIVE_TLS_RSS"; + case NATIVE_ND_ARRAY: + return "NATIVE_ND_ARRAY"; + default: + return "unknown[" + tag + "]"; + } + } } \ No newline at end of file diff --git 
a/core/src/main/resources/io/questdb/client/bin/darwin-aarch64/libquestdb.dylib b/core/src/main/resources/io/questdb/client/bin/darwin-aarch64/libquestdb.dylib deleted file mode 100644 index 82d21e59..00000000 Binary files a/core/src/main/resources/io/questdb/client/bin/darwin-aarch64/libquestdb.dylib and /dev/null differ diff --git a/core/src/main/resources/io/questdb/client/bin/darwin-x86-64/libquestdb.dylib b/core/src/main/resources/io/questdb/client/bin/darwin-x86-64/libquestdb.dylib deleted file mode 100644 index 647a12cb..00000000 Binary files a/core/src/main/resources/io/questdb/client/bin/darwin-x86-64/libquestdb.dylib and /dev/null differ diff --git a/core/src/main/resources/io/questdb/client/bin/linux-aarch64/libquestdb.so b/core/src/main/resources/io/questdb/client/bin/linux-aarch64/libquestdb.so deleted file mode 100644 index 94ad41c1..00000000 Binary files a/core/src/main/resources/io/questdb/client/bin/linux-aarch64/libquestdb.so and /dev/null differ diff --git a/core/src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so b/core/src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so deleted file mode 100644 index 15c0135d..00000000 Binary files a/core/src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so and /dev/null differ diff --git a/core/src/main/resources/io/questdb/client/bin/windows-x86-64/libquestdb.dll b/core/src/main/resources/io/questdb/client/bin/windows-x86-64/libquestdb.dll deleted file mode 100755 index e95dcecd..00000000 Binary files a/core/src/main/resources/io/questdb/client/bin/windows-x86-64/libquestdb.dll and /dev/null differ diff --git a/core/src/test/java/io/questdb/client/test/QuestDBBuilderTest.java b/core/src/test/java/io/questdb/client/test/QuestDBBuilderTest.java index 1734360b..5b06513c 100644 --- a/core/src/test/java/io/questdb/client/test/QuestDBBuilderTest.java +++ b/core/src/test/java/io/questdb/client/test/QuestDBBuilderTest.java @@ -51,150 +51,50 @@ public void 
testBuilderCallAfterFromConfigOverridesPoolKeysFromString() { Assert.assertEquals(150L, b.poolConfigSnapshotForTest().get("acquire_timeout_ms")); } - @Test - public void testConflictingIntPoolKeyAcrossSidesRejected() { - // Both sides carry sender_pool_max (an int pool key) with different - // values -> build fails via resolvePoolInt's conflict check. The long - // pool keys are covered by testConflictingPoolKeysAcrossSidesRejected; - // this guards the separate int code path. - try (QuestDB ignored = QuestDB.builder() - .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;sender_pool_max=2;") - .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;sender_pool_max=5;") - .build()) { - Assert.fail("expected conflicting pool config"); - } catch (IllegalArgumentException e) { - Assert.assertTrue(e.getMessage(), e.getMessage().contains("conflicting pool config: sender_pool_max")); - } - } - - @Test - public void testConflictingPoolKeysAcrossSidesRejected() { - // Both sides carry acquire_timeout_ms with different values -> build fails. - try (QuestDB ignored = QuestDB.builder() - .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;acquire_timeout_ms=1000;") - .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;acquire_timeout_ms=2000;") - .build()) { - Assert.fail("expected conflicting pool config"); - } catch (IllegalArgumentException e) { - Assert.assertTrue(e.getMessage(), e.getMessage().contains("conflicting pool config: acquire_timeout_ms")); - } - } - - @Test - public void testConnectRejectsNonWsSchemaOnSingleString() { - // QuestDB.connect(single string) must enforce the ws/wss schema, just - // like the builder's fromConfig(). - assertSchemaRejected(() -> QuestDB.connect("http::addr=h:9000;")); - } - - @Test - public void testConnectRejectsNonWsSchemaOnTwoArg() { - // QuestDB.connect(ingest, query) rejects a non-ws schema on either side. 
- assertSchemaRejected(() -> QuestDB.connect("tcp::addr=h:9009;", "ws::addr=h:9000;")); - assertSchemaRejected(() -> QuestDB.connect("ws::addr=h:9000;", "udp::addr=h:9009;")); - } - @Test public void testConnectSingleStringValidatesAndBuilds() { - // QuestDB.connect(single string) hands the same ws:: string to both the - // ingest and query sides. min=0 on both pools validates both clients - // without connecting, so build() returns a live handle. + // QuestDB.connect(single string) hands the same ws:: cluster string to + // both the ingest and query pools. min=0 on both pools validates both + // clients without connecting, so build() returns a live handle. try (QuestDB ignored = QuestDB.connect( "ws::addr=127.0.0.1:1;sender_pool_min=0;query_pool_min=0;")) { Assert.assertNotNull(ignored); } } - @Test - public void testConnectStringWithPoolKeysAppliedToBuilder() { - // Pool keys supplied via separate ingest/query strings are accepted; - // min=0 so nothing connects. - try (QuestDB ignored = QuestDB.builder() - .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;sender_pool_max=1;") - .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;query_pool_max=1;") - .build()) { - Assert.assertNotNull(ignored); - } - } - - @Test - public void testConnectTwoArgValidatesAndBuilds() { - // QuestDB.connect(ingest, query) sets the two sides independently; - // min=0 on each validates both clients without connecting. - try (QuestDB ignored = QuestDB.connect( - "ws::addr=127.0.0.1:1;sender_pool_min=0;", - "ws::addr=127.0.0.1:1;query_pool_min=0;")) { - Assert.assertNotNull(ignored); - } - } - - @Test - public void testExplicitPoolKeyWinsOverConflictingStrings() { - // The two strings disagree on acquire_timeout_ms, but an explicit builder - // call sets it: explicit wins and the conflict check is skipped, whether - // the explicit call comes after or before the config strings. The resolved - // value is the explicit 500, not either string's value. 
- QuestDBBuilder after = QuestDB.builder() - .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;acquire_timeout_ms=1000;") - .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;acquire_timeout_ms=2000;") - .acquireTimeoutMillis(500); - try (QuestDB ignored = after.build()) { - Assert.assertNotNull(ignored); - } - Assert.assertEquals(500L, after.poolConfigSnapshotForTest().get("acquire_timeout_ms")); - - QuestDBBuilder before = QuestDB.builder() - .acquireTimeoutMillis(500) - .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;acquire_timeout_ms=1000;") - .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;acquire_timeout_ms=2000;"); - try (QuestDB ignored = before.build()) { - Assert.assertNotNull(ignored); - } - Assert.assertEquals(500L, before.poolConfigSnapshotForTest().get("acquire_timeout_ms")); - } - - @Test - public void testHttpIngestConfigRejected() { - assertSchemaRejected(() -> QuestDB.builder().ingestConfig("http::addr=h:9000;")); - } - - @Test - public void testHttpSingleConfigRejected() { - assertSchemaRejected(() -> QuestDB.builder().fromConfig("http::addr=h:9000;")); - } - @Test public void testMalformedEgressConfigRejectedAtBuildWithMinZero() { // query_pool_min=0 pre-warms nothing, so build() never constructs a - // QwpQueryClient -- yet it must still reject a malformed query config up - // front via QwpQueryClient.validateConfig, mirroring the ingress side. + // QwpQueryClient -- yet it must still reject a malformed egress key in + // the single cluster config up front, mirroring the ingress side. // Covers a typed enum (compression) and a bounded int (compression_level). 
- assertEgressBuildRejected( - "ws::addr=127.0.0.1:1;compression=gzip;query_pool_min=0;query_pool_max=2;", "compression"); - assertEgressBuildRejected( - "ws::addr=127.0.0.1:1;compression_level=99;query_pool_min=0;query_pool_max=2;", "compression_level"); + assertBuildRejected( + "ws::addr=127.0.0.1:1;compression=gzip;sender_pool_min=0;query_pool_min=0;query_pool_max=2;", + "compression"); + assertBuildRejected( + "ws::addr=127.0.0.1:1;compression_level=99;sender_pool_min=0;query_pool_min=0;query_pool_max=2;", + "compression_level"); } @Test public void testMalformedIngressConfigRejectedAtBuildWithMinZero() { // sender_pool_min=0 pre-warms nothing, so build() never constructs a - // Sender -- yet it must still reject a malformed ingest config up front, - // matching the egress side. Covers a typed enum (tls_verify), a + // Sender -- yet it must still reject a malformed ingress key in the + // single cluster config up front. Covers a typed enum (tls_verify), a // registry-STRING value that only the real Sender parse validates - // (auto_flush_rows), and WebSocket build-time checks that only the full - // no-connect validation reaches: auto_flush=off and auto_flush_interval=off - // both disable auto-flush (unsupported on WebSocket), and sf_durability=flush - // is not yet supported. 
- assertIngressBuildRejected( - "wss::addr=127.0.0.1:1;tls_verify=strict;sender_pool_min=0;sender_pool_max=2;", "tls_verify"); - assertIngressBuildRejected( - "ws::addr=127.0.0.1:1;auto_flush_rows=abc;sender_pool_min=0;sender_pool_max=2;", "auto_flush_rows"); - assertIngressBuildRejected( - "ws::addr=127.0.0.1:1;auto_flush_interval=off;sender_pool_min=0;sender_pool_max=2;", "auto-flush"); - assertIngressBuildRejected( - "ws::addr=127.0.0.1:1;auto_flush=off;sender_pool_min=0;sender_pool_max=2;", "auto-flush"); - assertIngressBuildRejected( - "ws::addr=127.0.0.1:1;sf_durability=flush;sender_pool_min=0;sender_pool_max=2;", "not yet supported"); + // (auto_flush_rows), and WebSocket build-time checks: auto_flush=off and + // auto_flush_interval=off both disable auto-flush (unsupported on + // WebSocket), and sf_durability=flush is not yet supported. + assertBuildRejected( + "wss::addr=127.0.0.1:1;tls_verify=strict;sender_pool_min=0;query_pool_min=0;", "tls_verify"); + assertBuildRejected( + "ws::addr=127.0.0.1:1;auto_flush_rows=abc;sender_pool_min=0;query_pool_min=0;", "auto_flush_rows"); + assertBuildRejected( + "ws::addr=127.0.0.1:1;auto_flush_interval=off;sender_pool_min=0;query_pool_min=0;", "auto-flush"); + assertBuildRejected( + "ws::addr=127.0.0.1:1;auto_flush=off;sender_pool_min=0;query_pool_min=0;", "auto-flush"); + assertBuildRejected( + "ws::addr=127.0.0.1:1;sf_durability=flush;sender_pool_min=0;query_pool_min=0;", "not yet supported"); } @Test @@ -212,22 +112,12 @@ public void testMalformedPoolValueRejectedAtBuild() { } @Test - public void testMissingIngestConfigThrows() { - try { - QuestDB.builder().queryConfig("ws::addr=h:9000;").build().close(); - Assert.fail(); - } catch (IllegalStateException e) { - Assert.assertTrue(e.getMessage().contains("ingest")); - } - } - - @Test - public void testMissingQueryConfigThrows() { + public void testMissingConfigThrows() { try { - QuestDB.builder().ingestConfig("ws::addr=h:9000;").build().close(); + 
QuestDB.builder().build().close(); Assert.fail(); } catch (IllegalStateException e) { - Assert.assertTrue(e.getMessage().contains("query")); + Assert.assertTrue(e.getMessage(), e.getMessage().contains("configuration")); } } @@ -254,26 +144,37 @@ public void testNegativePoolSizesRejected() { } } + @Test + public void testNonWsSchemaRejected() { + // The single cluster config (and QuestDB.connect) must use ws/wss. + assertSchemaRejected(() -> QuestDB.builder().fromConfig("http::addr=h:9000;")); + assertSchemaRejected(() -> QuestDB.builder().fromConfig("tcp::addr=h:9009;")); + assertSchemaRejected(() -> QuestDB.builder().fromConfig("udp::addr=h:9009;")); + assertSchemaRejected(() -> QuestDB.connect("http::addr=h:9000;").close()); + } + @Test public void testQueryPoolBuildFailureUnwindsSenderPool() throws Exception { - // Sender pool builds against a healthy ws ingest endpoint; the query - // pool fails on a dead address. The handle must close the already-built - // sender pool (its connected senders) rather than leak them. - try (TestWebSocketServer ingest = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() { + // One server, one cluster config: the server accepts ingest write-path + // upgrades but rejects egress read-path upgrades, so the sender pool + // connects while the query pool's connect fails. The failed build() must + // close the already-built sender pool (its connected senders) rather than + // leak them. 
+ try (TestWebSocketServer server = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() { })) { - ingest.start(); - Assert.assertTrue(ingest.awaitStart(5, TimeUnit.SECONDS)); - int port = ingest.getPort(); + server.setRejectReadUpgrade(true); + server.start(); + Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + int port = server.getPort(); try { QuestDB.builder() - .ingestConfig("ws::addr=localhost:" + port + ";") - .queryConfig("ws::addr=127.0.0.1:1;auth_timeout_ms=200;") + .fromConfig("ws::addr=localhost:" + port + ";auth_timeout_ms=200;") .senderPoolSize(2) .queryPoolSize(2) .acquireTimeoutMillis(500) .build() .close(); - Assert.fail("expected build to fail when query pool cannot connect"); + Assert.fail("expected build to fail when the query pool cannot connect"); } catch (RuntimeException expected) { // The exact exception comes from QwpQueryClient.connect(). The // build failing only proves the query pool gave up; the @@ -284,75 +185,51 @@ public void testQueryPoolBuildFailureUnwindsSenderPool() throws Exception { // saw two ingest handshakes (proving the senders connected and the // assertion below is not vacuous)... awaitTrue("sender pool should have connected two ingest senders", - () -> ingest.handshakeCount() >= 2); + () -> server.handshakeCount() >= 2); // ...and the failed build() must have closed every one of them, so // no sender connection is left live on the server. The server // observes the client-side socket close asynchronously, so poll. awaitTrue("failed build() must close the already-built sender pool, leaving no live connection", - () -> ingest.liveConnectionCount() == 0); - } - } - - @Test - public void testSamePoolKeyValueAcrossSidesOk() { - // The same key at the same value on both sides builds cleanly. 
- try (QuestDB ignored = QuestDB.builder() - .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;query_pool_min=0;acquire_timeout_ms=1500;") - .queryConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;query_pool_min=0;acquire_timeout_ms=1500;") - .build()) { - Assert.assertNotNull(ignored); + () -> server.liveConnectionCount() == 0); } } @Test public void testSharedVocabularyConnectsBothPoolsLive() throws Exception { - // The headline use case: one connect-string vocabulary carrying BOTH + // The headline use case: one cluster connect-string carrying BOTH // ingress-only keys (auto_flush_rows, sender_id) and egress-only keys - // (compression, max_batch_rows, target, failover) drives both LIVE - // clients through the facade -- each side applies the keys it owns and - // silently ignores the rest. Other tests cover this validate-only - // (min=0) or on a single side; this one pre-warms min=1 so both pools - // actually connect. - // - // The mock serves ingest (ACK) and query (SERVER_INFO) semantics on - // separate sockets, so ingest and query connect to separate servers. A - // single ws:: address serving both is exercised end-to-end against a - // real server in the parent repo. - try (TestWebSocketServer ingest = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() { - }); - TestWebSocketServer query = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() { - })) { - ingest.start(); - query.setSendServerInfo(true); // the egress client's connect() waits for SERVER_INFO - query.start(); - Assert.assertTrue(ingest.awaitStart(5, TimeUnit.SECONDS)); - Assert.assertTrue(query.awaitStart(5, TimeUnit.SECONDS)); - - // Identical vocabulary on both sides, differing only in addr -- the - // same mixed key set a single-string connect() would hand to both - // clients. The pool keys carry the same value on both sides, so the - // builder's cross-string conflict check passes. 
- String shared = "auto_flush_rows=100;sender_id=probe-1;" // ingress-only - + "compression=auto;max_batch_rows=512;target=any;failover=off;" // egress-only - + "auth_timeout_ms=2000;" // COMMON + // (compression, max_batch_rows, target, failover) drives both LIVE pools + // -- each side applies the keys it owns and silently ignores the rest. + // One mock server serves both: an ACK stream on the ingest write path and + // a SERVER_INFO frame on the egress read path (the read path is gated so + // the ingest connection's ACK stream is never disturbed). + try (TestWebSocketServer server = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() { + })) { + server.setSendServerInfo(true); // the egress client's connect() waits for SERVER_INFO + server.start(); + Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + + // A single cluster config carrying the mixed key set. The pools + // pre-warm min=1, so the shared vocabulary connects a live sender AND + // a live query client, not merely validates. + String cfg = "ws::addr=localhost:" + server.getPort() + ";" + + "auto_flush_rows=100;sender_id=probe-1;" // ingress-only + + "compression=auto;max_batch_rows=512;target=any;failover=off;" // egress-only + + "auth_timeout_ms=2000;" // common + "sender_pool_min=1;sender_pool_max=2;query_pool_min=1;query_pool_max=2;"; // pool - try (QuestDB db = QuestDB.builder() - .ingestConfig("ws::addr=localhost:" + ingest.getPort() + ";" + shared) - .queryConfig("ws::addr=localhost:" + query.getPort() + ";" + shared) - .build()) { - // build() returned, so both pools pre-warmed their min=1 slot: - // the shared vocabulary connected a live sender AND a live query - // client, not merely validated. 
+ try (QuestDB db = QuestDB.builder().fromConfig(cfg).build()) { Assert.assertNotNull(db.borrowSender()); - Assert.assertNotNull(db.query()); + try (io.questdb.client.Query q = db.borrowQuery()) { + Assert.assertNotNull(q); + } } } } @Test public void testSharedWsConfigWithPoolKeys() { - // A shared ws:: string carries pool keys; min=0 so build does only - // parse-only validation (no connect). + // A cluster ws:: string carries pool keys for both pools; min=0 so build + // does only parse-only validation (no connect). try (QuestDB ignored = QuestDB.builder() .fromConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;sender_pool_max=3;" + "query_pool_min=0;query_pool_max=2;acquire_timeout_ms=1234;") @@ -361,41 +238,13 @@ public void testSharedWsConfigWithPoolKeys() { } } - @Test - public void testTcpIngestConfigRejected() { - assertSchemaRejected(() -> QuestDB.builder().ingestConfig("tcp::addr=h:9009;")); - } - - @Test - public void testUdpIngestConfigRejected() { - assertSchemaRejected(() -> QuestDB.builder().queryConfig("udp::addr=h:9009;")); - } - - private static void assertEgressBuildRejected(String query, String expectedFragment) { - try { - QuestDB.builder() - .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;sender_pool_max=2;") - .queryConfig(query) - .build() - .close(); - Assert.fail("expected build() to reject the malformed query config: " + query); - } catch (RuntimeException e) { - Assert.assertNotNull(e.getMessage()); - Assert.assertTrue(e.getMessage(), e.getMessage().contains(expectedFragment)); - } - } - - private static void assertIngressBuildRejected(String ingest, String expectedFragment) { + private static void assertBuildRejected(String config, String expectedFragment) { try { - QuestDB.builder() - .ingestConfig(ingest) - .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;query_pool_max=2;") - .build() - .close(); - Assert.fail("expected build() to reject the malformed ingest config: " + ingest); + 
QuestDB.builder().fromConfig(config).build().close(); + Assert.fail("expected build() to reject the malformed config: " + config); } catch (RuntimeException e) { - // Ingress value errors surface as LineSenderException; both it and the - // egress IllegalArgumentException are RuntimeException. + // Ingress value errors surface as LineSenderException; egress errors + // as IllegalArgumentException -- both are RuntimeException. Assert.assertNotNull(e.getMessage()); Assert.assertTrue(e.getMessage(), e.getMessage().contains(expectedFragment)); } diff --git a/core/src/test/java/io/questdb/client/test/QuestDBFacadeCallbacksTest.java b/core/src/test/java/io/questdb/client/test/QuestDBFacadeCallbacksTest.java new file mode 100644 index 00000000..3a8b96c1 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/QuestDBFacadeCallbacksTest.java @@ -0,0 +1,138 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + ******************************************************************************/ + +package io.questdb.client.test; + +import io.questdb.client.QuestDB; +import io.questdb.client.SenderConnectionEvent; +import io.questdb.client.SenderConnectionListener; +import io.questdb.client.SenderError; +import io.questdb.client.SenderErrorHandler; +import io.questdb.client.test.cutlass.qwp.client.TestPorts; +import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer; +import org.jetbrains.annotations.NotNull; +import org.junit.Assert; +import org.junit.Test; + +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicReference; + +/** + * Proves the ingest-side async callbacks exposed on the {@link QuestDB} facade + * ({@link io.questdb.client.QuestDBBuilder#errorHandler}/{@code connectionListener}) + * actually reach the pooled {@link io.questdb.client.Sender}s -- not merely the + * lower-level {@code Sender.builder()}. + *
<p>
+ * Each test eagerly prewarms one ingest sender ({@code sender_pool_min=1}) in + * {@code initial_connect_retry=async} mode with a tight reconnect budget, so the + * pool's I/O thread connects in the background and reports through whichever + * facade-wired callback is under test. The connection-listener test points the + * sender at a dead port and needs no server; the error-handler test needs a + * server that rejects the upgrade with 401, because only a genuine terminal such + * as an auth failure surfaces as an error. + */ +public class QuestDBFacadeCallbacksTest { + + private static final TestWebSocketServer.WebSocketServerHandler NOOP_HANDLER = + new TestWebSocketServer.WebSocketServerHandler() { + @Override + public void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) { + } + }; + + @Test + public void testFacadeConnectionListenerReceivesEvents() throws Exception { + int port = TestPorts.findUnusedPort(); + CountDownLatch sawEvent = new CountDownLatch(1); + SenderConnectionListener listener = new SenderConnectionListener() { + @Override + public void onEvent(@NotNull SenderConnectionEvent event) { + sawEvent.countDown(); + } + }; + try (QuestDB ignored = QuestDB.builder() + .fromConfig(config(port)) + .connectionListener(listener) + .build()) { + Assert.assertTrue( + "facade-wired connectionListener must observe at least one connection event", + sawEvent.await(5, TimeUnit.SECONDS)); + } + } + + @Test + public void testFacadeErrorHandlerReceivesAsyncIngestError() throws Exception { + // A 401 server produces a genuine auth terminal that surfaces even in + // async mode; the facade-wired errorHandler must receive it. (Under + // Invariant B a mere connection error would retry forever and never + // surface -- only a genuine terminal like auth does.) 
+ try (TestWebSocketServer server = new TestWebSocketServer(NOOP_HANDLER)) { + server.setRejectWithStatus(401, "Unauthorized"); + server.start(); + Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + ErrorInbox inbox = new ErrorInbox(); + try (QuestDB ignored = QuestDB.builder() + .fromConfig(config(server.getPort())) + .errorHandler(inbox) + .build()) { + Assert.assertTrue( + "facade-wired errorHandler must receive the async auth-terminal SenderError", + inbox.await(5, TimeUnit.SECONDS)); + Assert.assertNotNull("a SenderError must be delivered", inbox.get()); + } + } + } + + // One cluster config drives both pools. Eagerly prewarm one sender + // (sender_pool_min=1) so build() exercises the production + // buildManagedSlotSender path that applies the facade callbacks; async + a + // tight budget -> the I/O thread fails fast against the dead port. + // query_pool_min=0 -> the query pool never connects, so the test is isolated + // to the ingest callbacks. + private static String config(int port) { + return "ws::addr=localhost:" + port + ";sender_pool_min=1;sender_pool_max=1" + + ";query_pool_min=0;query_pool_max=1" + + ";initial_connect_retry=async;reconnect_max_duration_millis=400" + + ";reconnect_initial_backoff_millis=10;reconnect_max_backoff_millis=50" + + ";close_flush_timeout_millis=0;"; + } + + private static final class ErrorInbox implements SenderErrorHandler { + private final CountDownLatch latch = new CountDownLatch(1); + private final AtomicReference<SenderError> first = new AtomicReference<>(); + + boolean await(long timeout, TimeUnit unit) throws InterruptedException { + return latch.await(timeout, unit); + } + + SenderError get() { + return first.get(); + } + + @Override + public void onError(@NotNull SenderError error) { + first.compareAndSet(null, error); + latch.countDown(); + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/QuestDBFacadeDrainerListenerTest.java 
b/core/src/test/java/io/questdb/client/test/QuestDBFacadeDrainerListenerTest.java new file mode 100644 index 00000000..9dbfbc89 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/QuestDBFacadeDrainerListenerTest.java @@ -0,0 +1,465 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + ******************************************************************************/ + +package io.questdb.client.test; + +import io.questdb.client.QuestDB; +import io.questdb.client.Sender; +import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender; +import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine; +import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner; +import io.questdb.client.std.Files; +import io.questdb.client.std.MemoryTag; +import io.questdb.client.std.Unsafe; +import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer; +import io.questdb.client.test.tools.TestUtils; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.ByteOrder; +import java.nio.charset.StandardCharsets; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.function.BooleanSupplier; + +/** + * Proves the {@link io.questdb.client.QuestDBBuilder#drainerListener} and + * {@link Sender.LineSenderBuilder#drainerListener} hooks actually reach the + * background orphan-slot drainers, end-to-end against a real + * {@link TestWebSocketServer} — and that the M10 stream split holds on the + * wire: a durable-ack capability gap (server upgrades but withholds + * {@code X-QWP-Durable-Ack}) lands on {@code onDurableAckUnavailable} while a + * transient all-replica failover window (421 + {@code X-QuestDB-Role: + * REPLICA}) lands on {@code onPrimaryUnavailable}, with the other stream + * staying silent. + *
<p>
    + * Fixture shape: an orphan slot is seeded under {@code sf_dir} with unacked + * frames; the config enables {@code drain_orphans} and + * {@code request_durable_ack=on}. The server starts in the failure condition + * under test (durable-ack header suppressed, or role-rejecting), so the + * drainer deterministically observes it — no race against the drainer's first + * connect. Once the listener has recorded the scripted attempts, the server + * "settles" (header restored / reject cleared) and the drain must run to + * completion: no escalation, no {@code .failed} sentinel, slot emptied. The + * foreground sender uses {@code initial_connect_retry=async} so build() never + * blocks or fails on the same scripted condition. + */ +public class QuestDBFacadeDrainerListenerTest { + + private static final int SEEDED_FRAMES = 5; + private static final long SEGMENT_SIZE_BYTES = 16384L; + + private String sfDir; + + @Before + public void setUp() { + sfDir = Paths.get(System.getProperty("java.io.tmpdir"), + "qdb-facade-drainer-listener-" + System.nanoTime()).toString(); + Assert.assertEquals("mkdir sf_dir", 0, Files.mkdir(sfDir, Files.DIR_MODE_DEFAULT)); + } + + @After + public void tearDown() { + if (sfDir != null) rmDirRec(sfDir); + } + + /** + * Facade plumbing E2E: the {@code QuestDB.builder().drainerListener(...)} + * hook must observe the pooled senders' drainer events. The server + * completes the WS upgrade WITHOUT advertising durable ack for the first + * attempts (capability gap), then advertises it; the listener must see + * {@code onDurableAckUnavailable} with attempts {@code 1..N} (one + * uninterrupted episode) and the drain must then succeed. 
+ */ + @Test + public void testFacadeDrainerListenerObservesCapabilityGapThenDrainSucceeds() throws Exception { + TestUtils.assertMemoryLeak(() -> { + seedOrphanSlot("ghost"); + DurableAckAllHandler handler = new DurableAckAllHandler(); + try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) { + // Deterministic capability gap: withheld BEFORE the first + // drainer connect, restored only after the listener has + // recorded the gap episode. + server.setSuppressDurableAckHeader(true); + server.start(); + Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + + RecordingDrainerListener listener = new RecordingDrainerListener(); + try (QuestDB ignored = QuestDB.builder() + .fromConfig(facadeConfig(server.getPort())) + .drainerListener(listener) + .build()) { + awaitTrue(10_000, () -> listener.daAttempts.size() >= 3, + "facade-wired drainer listener must observe the capability-gap " + + "retries via onDurableAckUnavailable"); + // Cluster "settles": the next sweep connects and drains. + server.setSuppressDurableAckHeader(false); + awaitDrainedSlot("ghost"); + } + assertSingleGapEpisodeThenSilence(listener); + } + }); + } + + /** + * Role-reject discrimination E2E: with every handshake answered by 421 + + * {@code X-QuestDB-Role: REPLICA} (transient all-replica failover + * window), the facade-wired listener must receive + * {@code onPrimaryUnavailable} — and {@code onDurableAckUnavailable} must + * stay SILENT for the whole window (the released 1.3.4 contract fed both + * conditions to the DA callback; this pins the M10 split on the wire). 
+ */ + @Test + public void testFacadeDrainerListenerDiscriminatesRoleRejectWindow() throws Exception { + TestUtils.assertMemoryLeak(() -> { + seedOrphanSlot("ghost"); + DurableAckAllHandler handler = new DurableAckAllHandler(); + try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) { + // Deterministic all-replica window: rejecting BEFORE the first + // drainer connect; the durable-ack header is never withheld, + // so no capability gap can ever fire in this test. + server.setRejectWithRole("REPLICA"); + server.start(); + Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + + RecordingDrainerListener listener = new RecordingDrainerListener(); + try (QuestDB ignored = QuestDB.builder() + .fromConfig(facadeConfig(server.getPort())) + .drainerListener(listener) + .build()) { + awaitTrue(10_000, () -> listener.primaryAttempts.size() >= 3, + "facade-wired drainer listener must observe the all-replica " + + "window via onPrimaryUnavailable"); + Assert.assertEquals("onDurableAckUnavailable must stay SILENT during a " + + "role-reject window — that is the whole point of the M10 split", + 0, listener.daAttempts.size()); + // Primary reappears: the next sweep connects and drains. + server.setRejectWithRole(null); + awaitDrainedSlot("ghost"); + } + // Post-close exact assertions on the complete stream. 
+ List<Integer> primary = listener.primaryAttemptsSnapshot(); + Assert.assertTrue("expected at least the awaited role-reject attempts, got " + + primary, primary.size() >= 3); + for (int i = 0; i < primary.size(); i++) { + Assert.assertEquals("primary stream must be the uninterrupted 1-based " + + "role-reject count, got " + primary, + Integer.valueOf(i + 1), primary.get(i)); + } + Assert.assertEquals("no capability gap ever existed: the DA stream must be " + + "empty end-to-end", 0, listener.daAttempts.size()); + Assert.assertEquals("a role-reject window must NEVER escalate (Invariant B)", + 0, listener.persistentFailures.get()); + Assert.assertFalse("no .failed sentinel for a transient window", + Files.exists(sfDir + "/ghost/" + OrphanScanner.FAILED_SENTINEL_NAME)); + } + }); + } + + /** + * Same capability-gap scenario as the facade test, one level down through + * {@code Sender.builder().drainerListener(...)} — pins the plumbing that + * the pool path composes (builder field → {@code setDrainerListener} → + * drainer pool → drainer), and awaits the drain outcome via the sender's + * public drainer counters. 
+ */ + @Test + public void testSenderBuilderDrainerListenerObservesCapabilityGapThenDrainSucceeds() throws Exception { + TestUtils.assertMemoryLeak(() -> { + seedOrphanSlot("ghost"); + DurableAckAllHandler handler = new DurableAckAllHandler(); + try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) { + server.setSuppressDurableAckHeader(true); + server.start(); + Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + + String cfg = "ws::addr=localhost:" + server.getPort() + + ";sf_dir=" + sfDir + + ";sender_id=primary" + + ";request_durable_ack=on" + + ";drain_orphans=true" + + ";max_background_drainers=1" + + ";initial_connect_retry=async" + + ";reconnect_initial_backoff_millis=25" + + ";reconnect_max_backoff_millis=200" + + ";close_flush_timeout_millis=0;"; + RecordingDrainerListener listener = new RecordingDrainerListener(); + Sender sender = Sender.builder(cfg) + .drainerListener(listener) + .build(); + try { + QwpWebSocketSender ws = (QwpWebSocketSender) sender; + awaitTrue(10_000, () -> listener.daAttempts.size() >= 3, + "builder-wired drainer listener must observe the capability-gap " + + "retries via onDurableAckUnavailable"); + server.setSuppressDurableAckHeader(false); + awaitTrue(15_000, () -> ws.getTotalBackgroundDrainersSucceeded() >= 1, + "drainer must drain the slot fully once the gap clears"); + } finally { + // The FOREGROUND sender's async initial connect hit the + // scripted capability gap and latched a terminal HALT + // before the server settled (durable ack is loud-fail for + // a foreground producer). close() completes its full + // teardown and then rethrows that latched terminal -- + // expected here, and orthogonal to the drainer stream + // this test pins. The pool facade swallows the same + // rethrow in SenderPool.close(), which is why the facade + // tests use plain try-with-resources. 
+ try { + sender.close(); + Assert.fail("close() must loudly rethrow the foreground's " + + "latched capability-gap terminal"); + } catch (io.questdb.client.cutlass.line.LineSenderException expected) { + Assert.assertTrue("expected the foreground durable-ack terminal, got: " + + expected.getMessage(), + expected.getMessage().contains("durable-ack")); + } + } + assertSingleGapEpisodeThenSilence(listener); + } + }); + } + + // One cluster config drives the facade. sender_pool_min=1 eagerly prewarms + // the one sender whose build() dispatches the orphan drainer; + // query_pool_min=0 keeps the read pool out of the picture. async initial + // connect: the foreground sender must not block or fail build() on the + // very condition the drainer is scripted to observe. Small drainer + // backoffs make the awaited attempts prompt while leaving plenty of + // headroom under the 16-attempt capability-gap settle budget between + // "third callback recorded" and "header restored". + private String facadeConfig(int port) { + return "ws::addr=localhost:" + port + + ";sf_dir=" + sfDir + + ";sender_id=pool" + + ";request_durable_ack=on" + + ";drain_orphans=true" + + ";max_background_drainers=1" + + ";sender_pool_min=1;sender_pool_max=1" + + ";query_pool_min=0;query_pool_max=1" + + ";initial_connect_retry=async" + + ";reconnect_initial_backoff_millis=25" + + ";reconnect_max_backoff_millis=200" + + ";close_flush_timeout_millis=0;"; + } + + // The two capability-gap tests end the same way: one uninterrupted gap + // episode numbered 1..K (no role reject ever intervened, so no reset and + // no primary-stream traffic), then the drain succeeded without escalation. 
+ private void assertSingleGapEpisodeThenSilence(RecordingDrainerListener listener) { + List<Integer> da = listener.daAttemptsSnapshot(); + Assert.assertTrue("expected at least the awaited gap attempts, got " + da, + da.size() >= 3); + for (int i = 0; i < da.size(); i++) { + Assert.assertEquals("DA stream must be the 1-based attempt count of a single " + + "uninterrupted capability-gap episode, got " + da, + Integer.valueOf(i + 1), da.get(i)); + } + Assert.assertEquals("expected slot path on every DA delivery", + Collections.nCopies(da.size(), sfDir + "/ghost"), listener.daSlotPaths); + Assert.assertEquals("no role reject was scripted: the primary stream must be empty", + 0, listener.primaryAttempts.size()); + Assert.assertEquals("the gap cleared inside the settle budget: no escalation", + 0, listener.persistentFailures.get()); + Assert.assertFalse("no .failed sentinel after a successful drain", + Files.exists(sfDir + "/ghost/" + OrphanScanner.FAILED_SENTINEL_NAME)); + } + + // The drainer unlinks the slot's segment files once fully drained, so the + // slot stops being a candidate orphan. Probed per-slot (not via a + // whole-dir scan) because the foreground sender's own LIVE slot holds a + // pre-created segment file for as long as the sender is up, so a + // dir-level scan never reaches zero. A .failed sentinel would ALSO make + // the slot a non-candidate, so the sentinel is asserted absent explicitly. 
+ private void awaitDrainedSlot(String slotName) throws InterruptedException { + String slotPath = sfDir + "/" + slotName; + awaitTrue(15_000, () -> !OrphanScanner.isCandidateOrphan(slotPath), + "drainer must empty the seeded orphan slot once the server settles"); + Assert.assertFalse("slot must drain cleanly, not quarantine", + Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + } + + private static void awaitTrue(long timeoutMillis, BooleanSupplier condition, String message) + throws InterruptedException { + long deadline = System.currentTimeMillis() + timeoutMillis; + while (System.currentTimeMillis() < deadline) { + if (condition.getAsBoolean()) { + return; + } + Thread.sleep(10); + } + Assert.assertTrue(message + " (timed out after " + timeoutMillis + "ms)", + condition.getAsBoolean()); + } + + // Seeds <sf_dir>/<slotName> with unacked frames — the on-disk shape a + // crashed sender leaves behind (same recipe as + // BackgroundDrainerMidDrainCapabilityGapTest). The engine creates the + // slot dir itself; closing it with unacked data leaves the .sfa segments + // in place, so the slot is a candidate orphan. 
+ private void seedOrphanSlot(String slotName) { + String slotPath = sfDir + "/" + slotName; + try (CursorSendEngine engine = new CursorSendEngine(slotPath, SEGMENT_SIZE_BYTES)) { + long buf = Unsafe.malloc(16, MemoryTag.NATIVE_DEFAULT); + try { + byte[] payload = "frame-bytes-padd".getBytes(StandardCharsets.US_ASCII); + for (int i = 0; i < payload.length; i++) { + Unsafe.getUnsafe().putByte(buf + i, payload[i]); + } + for (int i = 0; i < SEEDED_FRAMES; i++) { + engine.appendBlocking(buf, 16); + } + } finally { + Unsafe.free(buf, 16, MemoryTag.NATIVE_DEFAULT); + } + } + Assert.assertEquals("seeded slot must be a candidate orphan", + 1, OrphanScanner.scan(sfDir, "observer").size()); + } + + private static void rmDirRec(String dir) { + if (!Files.exists(dir)) return; + long find = Files.findFirst(dir); + if (find > 0) { + try { + int rc = 1; + while (rc > 0) { + String name = Files.utf8ToString(Files.findName(find)); + if (name != null && !".".equals(name) && !"..".equals(name)) { + String child = dir + "/" + name; + if (!Files.remove(child)) rmDirRec(child); + } + rc = Files.findNext(find); + } + } finally { + Files.findClose(find); + } + } + Files.remove(dir); + } + + /** + * Thread-safe recording listener. Snapshot accessors copy under the same + * monitor the callbacks append under, so end-of-test assertions never + * observe a list mid-append. 
+ */ + private static final class RecordingDrainerListener implements BackgroundDrainerListener { + final List<Integer> daAttempts = Collections.synchronizedList(new ArrayList<>()); + final List<String> daSlotPaths = Collections.synchronizedList(new ArrayList<>()); + final AtomicInteger persistentFailures = new AtomicInteger(); + final List<Integer> primaryAttempts = Collections.synchronizedList(new ArrayList<>()); + + @Override + public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) { + persistentFailures.incrementAndGet(); + } + + @Override + public void onDurableAckUnavailable(String slotPath, int attemptNumber) { + daSlotPaths.add(slotPath); + daAttempts.add(attemptNumber); + } + + @Override + public void onPrimaryUnavailable(String slotPath, int attemptNumber) { + primaryAttempts.add(attemptNumber); + } + + List<Integer> daAttemptsSnapshot() { + synchronized (daAttempts) { + return new ArrayList<>(daAttempts); + } + } + + List<Integer> primaryAttemptsSnapshot() { + synchronized (primaryAttempts) { + return new ArrayList<>(primaryAttempts); + } + } + } + + /** + * Acks every inbound frame with STATUS_OK + STATUS_DURABLE_ACK on a + * per-connection wire sequence, so a durable-ack-mode drain runs to + * completion on whichever connection finally gets through (same ack + * shape as BackgroundDrainerMidDrainCapabilityGapTest's handler, without + * the scripted drop). State is keyed per ClientHandler identity; acks are + * best-effort because a connection may be racing its own close. 
+ */ + private static final class DurableAckAllHandler implements TestWebSocketServer.WebSocketServerHandler { + private static final String TABLE = "trades"; + private final java.util.Map<TestWebSocketServer.ClientHandler, long[]> wireSeqByConn = + new java.util.IdentityHashMap<>(); + + @Override + public synchronized void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) { + long[] counter = wireSeqByConn.get(client); + if (counter == null) { + counter = new long[1]; + wireSeqByConn.put(client, counter); + } + long seq = counter[0]++; + try { + client.sendBinary(okFrame(seq, seq)); + client.sendBinary(durableAckFrame(seq)); + } catch (IOException ignored) { + // best-effort: the drainer replays on its next connection + } + } + + private static byte[] durableAckFrame(long seqTxn) { + byte[] name = TABLE.getBytes(StandardCharsets.UTF_8); + ByteBuffer bb = ByteBuffer.allocate(1 + 2 + 2 + name.length + 8) + .order(ByteOrder.LITTLE_ENDIAN); + bb.put((byte) 0x02); // STATUS_DURABLE_ACK + bb.putShort((short) 1); // tableCount + bb.putShort((short) name.length); + bb.put(name); + bb.putLong(seqTxn); + return bb.array(); + } + + private static byte[] okFrame(long wireSeq, long seqTxn) { + byte[] name = TABLE.getBytes(StandardCharsets.UTF_8); + ByteBuffer bb = ByteBuffer.allocate(1 + 8 + 2 + 2 + name.length + 8) + .order(ByteOrder.LITTLE_ENDIAN); + bb.put((byte) 0x00); // STATUS_OK + bb.putLong(wireSeq); + bb.putShort((short) 1); // tableCount + bb.putShort((short) name.length); + bb.put(name); + bb.putLong(seqTxn); + return bb.array(); + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/QuestDBLazyConnectTest.java b/core/src/test/java/io/questdb/client/test/QuestDBLazyConnectTest.java new file mode 100644 index 00000000..47dd5fa8 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/QuestDBLazyConnectTest.java @@ -0,0 +1,150 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | 
| | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test; + +import io.questdb.client.QuestDB; +import io.questdb.client.QuestDBBuilder; +import io.questdb.client.Sender; +import io.questdb.client.test.cutlass.qwp.client.TestPorts; +import org.junit.Assert; +import org.junit.Test; + +/** + * {@code lazy_connect=true} makes a {@link QuestDB} facade tolerate the server + * being down at startup without disabling reads: the ingest side + * connects asynchronously (writes buffer until the wire is up) and the read pool + * connects lazily on first use. Reads stay enabled and connect once the server + * is up (the recovery lifecycle is covered end-to-end by + * {@link QuestDBServerRecoveryTest}). + *
<p>
    + * Because both sides must start non-blocking, a knob that forces a blocking / + * fail-fast startup ({@code initial_connect_retry} other than {@code async}, or + * an explicit {@code query_pool_min > 0}) is a configuration conflict and is + * rejected up front with a clear remedy. + */ +public class QuestDBLazyConnectTest { + + @Test(timeout = 30_000) + public void testLazyConnectStartsAndWritesWhileServerDown() { + int port = TestPorts.findUnusedPort(); + // No server at `port`, sender_pool_min defaults to 1, and the only + // resilience knob is lazy_connect=true. (a) build() must return promptly + // -- the read pool defaults to min=0 and the ingest side goes async, so + // neither side fail-fasts -- and (b) a write must buffer without throwing. + try (QuestDB db = QuestDB.connect("ws::addr=localhost:" + port + + ";lazy_connect=true;reconnect_max_duration_millis=200" + + ";reconnect_initial_backoff_millis=10;reconnect_max_backoff_millis=50" + + ";close_flush_timeout_millis=0;")) { + Sender sender = db.borrowSender(); + Assert.assertNotNull("a sender must be available with no server present", sender); + sender.table("t").longColumn("v", 1L).atNow(); + } + } + + @Test(timeout = 30_000) + public void testLazyConnectKeepsReadsEnabledWhileServerDown() { + int port = TestPorts.findUnusedPort(); + // Reads are ENABLED, just deferred: under lazy_connect the read pool + // defaults to min=0, so build() does not eagerly connect or fail-fast + // while the server is down. The read client connects lazily on the + // first borrowQuery() once the server is up (covered end-to-end by + // QuestDBServerRecoveryTest). This is the whole point of lazy_connect + // over the old write-only mode, which disabled reads outright. 
+ try (QuestDB db = QuestDB.connect("ws::addr=localhost:" + port + + ";lazy_connect=true;close_flush_timeout_millis=0;")) { + Assert.assertNotNull("the handle must build read-enabled while the server is down", db); + } + } + + @Test + public void testLazyConnectAcceptsOnAndAllowsExplicitAsync() { + int port = TestPorts.findUnusedPort(); + // lazy_connect accepts on/off as well as true/false, and an explicit + // initial_connect_retry=async is consistent with it (no conflict). + try (QuestDB db = QuestDB.connect("ws::addr=localhost:" + port + + ";lazy_connect=on;initial_connect_retry=async;query_pool_min=0" + + ";close_flush_timeout_millis=0;")) { + Assert.assertNotNull(db); + } + } + + @Test + public void testLazyConnectConflictsWithBlockingInitialConnectRetry() { + // off/false (OFF) and on/true/sync (SYNC) all block or fail-fast at + // startup, so each conflicts with lazy_connect and must be rejected with + // a clear remedy. + assertLazyConflict("initial_connect_retry=off", "initial_connect_retry", "async"); + assertLazyConflict("initial_connect_retry=sync", "initial_connect_retry", "async"); + assertLazyConflict("initial_connect_retry=on", "initial_connect_retry", "async"); + } + + @Test + public void testLazyConnectConflictsWithExplicitQueryPoolMinInConfig() { + // An explicit query_pool_min > 0 makes the read pool eagerly fail-fast at + // startup, contradicting lazy_connect. + assertLazyConflict("query_pool_min=1", "query_pool_min", "0"); + assertLazyConflict("query_pool_min=2", "query_pool_min", "0"); + // query_pool_min=0 is exactly what lazy_connect wants -- no conflict. 
+ int port = TestPorts.findUnusedPort(); + try (QuestDB db = QuestDB.connect("ws::addr=localhost:" + port + + ";lazy_connect=true;query_pool_min=0;close_flush_timeout_millis=0;")) { + Assert.assertNotNull(db); + } + } + + @Test + public void testLazyConnectConflictsWithExplicitQueryPoolMinFromBuilder() { + // The conflict also fires when query_pool_min > 0 comes from an explicit + // builder call (queryPoolMin / queryPoolSize), not just the connect string. + int port = TestPorts.findUnusedPort(); + assertLazyConflict(QuestDB.builder() + .fromConfig("ws::addr=localhost:" + port + ";lazy_connect=true;close_flush_timeout_millis=0;") + .queryPoolMin(1), "query_pool_min", "0"); + assertLazyConflict(QuestDB.builder() + .fromConfig("ws::addr=localhost:" + port + ";lazy_connect=true;close_flush_timeout_millis=0;") + .queryPoolSize(2), "query_pool_min", "0"); + } + + private static void assertLazyConflict(String extraKeys, String... expectedFragments) { + int port = TestPorts.findUnusedPort(); + assertLazyConflict(QuestDB.builder().fromConfig("ws::addr=localhost:" + port + + ";lazy_connect=true;" + extraKeys + ";close_flush_timeout_millis=0;"), expectedFragments); + } + + private static void assertLazyConflict(QuestDBBuilder builder, String... 
expectedFragments) { + try { + builder.build().close(); + Assert.fail("expected lazy_connect configuration conflict"); + } catch (IllegalArgumentException e) { + String msg = e.getMessage(); + Assert.assertNotNull(msg); + Assert.assertTrue(msg, msg.contains("lazy_connect")); + for (int i = 0; i < expectedFragments.length; i++) { + Assert.assertTrue("'" + msg + "' should mention '" + expectedFragments[i] + "'", + msg.contains(expectedFragments[i])); + } + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/QuestDBServerRecoveryTest.java b/core/src/test/java/io/questdb/client/test/QuestDBServerRecoveryTest.java new file mode 100644 index 00000000..c68be090 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/QuestDBServerRecoveryTest.java @@ -0,0 +1,114 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + ******************************************************************************/ + +package io.questdb.client.test; + +import io.questdb.client.QuestDB; +import io.questdb.client.Sender; +import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer; +import org.junit.Assert; +import org.junit.Test; + +import java.util.concurrent.TimeUnit; +import java.util.function.BooleanSupplier; + +/** + * End-to-end resilience: the facade starts with the server down, the producer + * keeps writing (buffered), and once the server comes up the write side + * reconnects and the read side -- previously deferred so it could not fail-fast + * the build -- can connect. + *
<p>
    + * The mock cannot answer a real SELECT (result frames are exercised against a + * real server in the parent repo), so the read step asserts the query client + * connects once the server is up, not the row contents. + */ +public class QuestDBServerRecoveryTest { + + @Test(timeout = 60_000) + public void testFacadeStartsWhileServerDownThenWritesAndReaderConnectsOnRecovery() throws Exception { + // One mock server (the whole "cluster"), bound so the port is known but + // NOT accepting yet: the address is reachable but no WebSocket upgrade + // completes, so the server is effectively "down". It serves ingest ACK + // on the write path and a SERVER_INFO frame on the read path -- the read + // path is gated so the ingest connection's ACK stream is never disturbed. + try (TestWebSocketServer server = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() { + })) { + server.setSendServerInfo(true); // the egress client's connect() waits for SERVER_INFO + // One cluster config drives both pools: + // lazy_connect=true expands to exactly this resilience: the ingest + // side goes async (the producer never blocks; writes buffer until the + // wire is up) and the read pool defaults to min=0 (the otherwise + // fail-fast reader never sinks the build while the server is down, + // and connects lazily on the first query). 
+ String cfg = "ws::addr=localhost:" + server.getPort() + + ";lazy_connect=true" + + ";sender_pool_min=1;sender_pool_max=1;query_pool_max=1" + + ";auth_timeout_ms=2000;reconnect_initial_backoff_millis=20" + + ";reconnect_max_backoff_millis=100;reconnect_max_duration_millis=600000" + + ";close_flush_timeout_millis=1000;"; + + // (1) server down + (2) client starts: + try (QuestDB db = QuestDB.builder().fromConfig(cfg).build()) { + Assert.assertEquals("no handshake while the server is down", 0, server.handshakeCount()); + + // lazy_connect keeps reads ENABLED, just deferred: the read pool + // defaults to min=0, so nothing connects while the server is + // down. The read client connects lazily on the first + // borrowQuery() once the server is up (step 5). + + // (3) client writes -> buffers in the cursor SF engine; the call + // must not throw even though the server is down. + Sender sender = db.borrowSender(); + sender.table("t").longColumn("v", 1L).atNow(); + + // (4) server starts: + server.start(); + Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + + // The write side reconnects on its own once the server is up. + awaitTrue("ingest must connect after the server comes up", + () -> server.handshakeCount() >= 1); + + // (5) client can now read: the deferred reader connects on the + // first borrowQuery() (the mock does not serve rows, so we + // assert the connection, not the result). 
+ int handshakesBeforeQuery = server.handshakeCount(); + db.borrowQuery().close(); + awaitTrue("query client must connect after the server comes up", + () -> server.handshakeCount() >= handshakesBeforeQuery + 1); + } + } + } + + private static void awaitTrue(String message, BooleanSupplier condition) throws InterruptedException { + long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(15); + while (System.nanoTime() < deadline) { + if (condition.getAsBoolean()) { + return; + } + Thread.sleep(20); + } + Assert.assertTrue(message, condition.getAsBoolean()); + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/http/client/WebSocketClientTest.java b/core/src/test/java/io/questdb/client/test/cutlass/http/client/WebSocketClientTest.java index cf121d8c..cefdac35 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/http/client/WebSocketClientTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/http/client/WebSocketClientTest.java @@ -31,16 +31,61 @@ import io.questdb.client.cutlass.http.client.WebSocketSendBuffer; import io.questdb.client.network.PlainSocketFactory; import io.questdb.client.network.Socket; +import io.questdb.client.network.SocketReadinessWaiter; import org.junit.Assert; import org.junit.Test; import java.lang.reflect.Field; import java.lang.reflect.Method; +import java.util.concurrent.CyclicBarrier; +import java.util.concurrent.atomic.AtomicReference; import static io.questdb.client.test.tools.TestUtils.assertMemoryLeak; public class WebSocketClientTest { + /** + * close() frees native memory (recv/fragment buffers, send buffers), so + * its guard must be a CAS, not a volatile check-then-act: two concurrent + * closers passing the flag check together would both run + * disconnect()/Unsafe.free -- a native double-free. Closers can race in + * practice: the owner thread's teardown vs the I/O thread's exit path vs + * stale duplicate references (see CursorWebSocketSendLoop). 
The memory + * counters checked by assertMemoryLeak flag a double-free as a counter + * mismatch. + */ + @Test + public void testConcurrentCloseRunsTeardownExactlyOnce() throws Exception { + assertMemoryLeak(() -> { + final int threads = 4; + final int iterations = 200; + for (int i = 0; i < iterations; i++) { + StubWebSocketClient client = new StubWebSocketClient(); + CyclicBarrier barrier = new CyclicBarrier(threads); + AtomicReference<Throwable> failure = new AtomicReference<>(); + Thread[] closers = new Thread[threads]; + for (int t = 0; t < threads; t++) { + closers[t] = new Thread(() -> { + try { + barrier.await(); + client.close(); + } catch (Throwable e) { + failure.compareAndSet(null, e); + } + }); + closers[t].start(); + } + for (Thread closer : closers) { + closer.join(); + } + Throwable t = failure.get(); + if (t != null) { + throw new AssertionError("concurrent close failed on iteration " + i, t); + } + } + }); + } + @Test public void testExtractMaxBatchSizeAbsentHeaderReturnsZero() throws Exception { String response = "HTTP/1.1 101 Switching Protocols\r\n" @@ -263,7 +308,7 @@ public int send(long bufferPtr, int bufferLen) { } @Override - public void startTlsSession(CharSequence peerName) { + public void startTlsSession(CharSequence peerName, SocketReadinessWaiter waiter) { throw new UnsupportedOperationException(); } diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/BackgroundConnectTimeoutDefaultTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/BackgroundConnectTimeoutDefaultTest.java new file mode 100644 index 00000000..d5f1d660 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/BackgroundConnectTimeoutDefaultTest.java @@ -0,0 +1,81 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * 
\__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client; + +import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender; +import org.junit.Assert; +import org.junit.Test; + +import static io.questdb.client.cutlass.qwp.client.QwpWebSocketSender.DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS; +import static io.questdb.client.cutlass.qwp.client.QwpWebSocketSender.effectiveConnectTimeoutMs; + +/** + * Background (drainer) connect walks must never inherit the untimed native + * connect that connect_timeout=0 (the default) means for the foreground. + *
<p>
    + * During an outage a drainer is routinely parked inside a blocking native + * connect ({@code nf.connectAddrInfo}) that neither unpark nor interrupt + * cancels. The drainer pool's close sequence (2.5s graceful drain + + * requestStop + 500ms + shutdownNow) then reliably lands on the failed-stop + * teardown protocol: the WebSocket client and microbatch buffers are + * deliberately leaked and the SF slot lock is held until the OS connect + * deadline (SYN retries, 60-130s on Linux) resolves the stuck call. A finite + * background default bounds that window to seconds. Foreground semantics are + * intentionally untouched: an explicit user value is honoured verbatim on + * both paths, and the foreground's unset default stays untimed. + */ +public class BackgroundConnectTimeoutDefaultTest { + + @Test + public void testBackgroundExplicitValueHonoured() { + Assert.assertEquals(500, effectiveConnectTimeoutMs(true, 500)); + Assert.assertEquals(60_000, effectiveConnectTimeoutMs(true, 60_000)); + } + + @Test + public void testBackgroundUnsetGetsFiniteDefault() { + Assert.assertEquals(DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS, effectiveConnectTimeoutMs(true, 0)); + // Defensive: builder validation rejects negatives, but the resolver + // must not turn a bad value back into an untimed background connect. + Assert.assertEquals(DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS, effectiveConnectTimeoutMs(true, -1)); + } + + @Test + public void testDefaultIsFinite() { + Assert.assertTrue(DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS > 0); + } + + @Test + public void testForegroundExplicitValueHonoured() { + Assert.assertEquals(500, effectiveConnectTimeoutMs(false, 500)); + } + + @Test + public void testForegroundUnsetStaysUntimed() { + // 0 => WebSocketClient falls back to nf.connectAddrInfo (OS-bounded). + // Historical foreground behaviour, deliberately preserved. 
+        Assert.assertEquals(0, effectiveConnectTimeoutMs(false, 0));
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseDrainTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseDrainTest.java
index ef012229..a233e0e1 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseDrainTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseDrainTest.java
@@ -38,8 +38,7 @@ import java.util.concurrent.atomic.AtomicLong;
 
 /**
- * Regression tests for the close() drain semantics specified in
- * design/qwp-cursor-durability.md.
+ * Regression tests for the close() drain semantics.
  *

    * Without {@code close_flush_timeout_millis}, close() returned as soon as * the cursor I/O loop's {@code running} flag flipped — meaning frames diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseSafetyNetTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseSafetyNetTest.java index fe3bb059..2a266212 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseSafetyNetTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseSafetyNetTest.java @@ -30,6 +30,7 @@ import io.questdb.client.SenderErrorHandler; import io.questdb.client.cutlass.line.LineSenderException; import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender; +import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer; import org.jetbrains.annotations.NotNull; import org.junit.Assert; import org.junit.Rule; @@ -62,47 +63,59 @@ public class CloseSafetyNetTest { public final TemporaryFolder sfDir = TemporaryFolder.builder().assureDeletion().build(); @Test(timeout = 30_000) - public void testCloseRethrowsUnsurfacedTerminalWithoutCustomHandler() { - // No server, no handler, tight reconnect budget: the I/O thread - // latches a never-connected budget-exhaustion terminal that nothing - // has surfaced to the user. close() must throw it. - Sender sender = Sender.fromConfig(cfg()); - boolean closed = false; - try { - awaitLatchedTerminal((QwpWebSocketSender) sender); + public void testCloseRethrowsUnsurfacedTerminalWithoutCustomHandler() throws Exception { + // A 401 server, no handler: the I/O thread latches a genuine auth + // terminal (ws-upgrade-failed / SECURITY_ERROR) that nothing has + // surfaced to the user. close() must throw it. (Under Invariant B a + // mere connection error would retry forever and never latch -- only a + // genuine terminal like auth does.) 
+        try (TestWebSocketServer server = new TestWebSocketServer(NOOP_HANDLER)) {
+            server.setRejectWithStatus(401, "Unauthorized");
+            server.start();
+            Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+            Sender sender = Sender.fromConfig(cfg(server.getPort()));
+            boolean closed = false;
             try {
-                closed = true;
-                sender.close();
-                Assert.fail("close() must rethrow a terminal error that no synchronous "
-                        + "caller and no custom handler has seen");
-            } catch (LineSenderException e) {
-                String msg = e.getMessage() == null ? "" : e.getMessage();
-                Assert.assertTrue("close() must rethrow the latched terminal: " + msg,
-                        msg.contains("never-connected-budget-exhausted"));
-                Assert.assertTrue("the latched instance is the typed server exception",
-                        e instanceof LineSenderServerException);
-            }
-        } finally {
-            if (!closed) {
-                sender.close();
+                awaitLatchedTerminal((QwpWebSocketSender) sender);
+                try {
+                    closed = true;
+                    sender.close();
+                    Assert.fail("close() must rethrow a terminal error that no synchronous "
+                            + "caller and no custom handler has seen");
+                } catch (LineSenderException e) {
+                    String msg = e.getMessage() == null ? "" : e.getMessage();
+                    Assert.assertTrue("close() must rethrow the latched terminal: " + msg,
+                            msg.contains("ws-upgrade-failed") || msg.contains("401"));
+                    Assert.assertTrue("the latched instance is the typed server exception",
+                            e instanceof LineSenderServerException);
+                }
+            } finally {
+                if (!closed) {
+                    sender.close();
+                }
             }
         }
     }

     @Test(timeout = 30_000)
     public void testCloseStaysSilentWhenCustomHandlerAlreadyDelivered() throws Exception {
-        // Same terminal, but the user installed a custom error handler and
+        // Same auth terminal, but the user installed a custom error handler and
         // the dispatcher delivered the error to it. close() must NOT
         // double-signal.
- ErrorInbox inbox = new ErrorInbox(); - Sender sender = Sender.builder(cfg()) - .errorHandler(inbox) - .build(); - Assert.assertTrue("terminal must reach the custom handler within 10s", - inbox.await(10, TimeUnit.SECONDS)); - Assert.assertNotNull(inbox.get()); - // The handler owns the error now; a rethrow here would double-signal. - sender.close(); + try (TestWebSocketServer server = new TestWebSocketServer(NOOP_HANDLER)) { + server.setRejectWithStatus(401, "Unauthorized"); + server.start(); + Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + ErrorInbox inbox = new ErrorInbox(); + Sender sender = Sender.builder(cfg(server.getPort())) + .errorHandler(inbox) + .build(); + Assert.assertTrue("terminal must reach the custom handler within 10s", + inbox.await(10, TimeUnit.SECONDS)); + Assert.assertNotNull(inbox.get()); + // The handler owns the error now; a rethrow here would double-signal. + sender.close(); + } } /** @@ -120,8 +133,8 @@ private static void awaitLatchedTerminal(QwpWebSocketSender sender) { } } - private String cfg() { - return "ws::addr=localhost:" + TestPorts.findUnusedPort() + private String cfg(int port) { + return "ws::addr=localhost:" + port + ";sf_dir=" + sfDir.getRoot().getAbsolutePath() + ";initial_connect_retry=async" + ";reconnect_max_duration_millis=400" @@ -130,6 +143,13 @@ private String cfg() { + ";close_flush_timeout_millis=0;"; } + private static final TestWebSocketServer.WebSocketServerHandler NOOP_HANDLER = + new TestWebSocketServer.WebSocketServerHandler() { + @Override + public void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) { + } + }; + private static class ErrorInbox implements SenderErrorHandler { private final CountDownLatch latch = new CountDownLatch(1); private final AtomicReference ref = new AtomicReference<>(); diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectAsyncTest.java 
b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectAsyncTest.java index 0733de8f..fd1c604c 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectAsyncTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectAsyncTest.java @@ -49,10 +49,11 @@ /** * Behavior of {@code initial_connect_retry=async}: the producer-thread * {@code Sender.fromConfig} must return immediately even when no server - * is reachable; the I/O thread retries connect in the background, and - * terminal failures (auth/upgrade reject, budget exhaustion) are - * delivered through the async error inbox rather than thrown at the - * call site. + * is reachable; the I/O thread retries connect in the background. Plain + * connect failures are retried indefinitely (Invariant B: no wall-clock + * budget give-up); only genuine terminals (auth/upgrade reject, + * durable-ack capability gap) are delivered through the async error + * inbox rather than thrown at the call site. */ public class InitialConnectAsyncTest { @@ -106,17 +107,19 @@ public void testAsyncAuthFailureDeliversToErrorInbox() throws Exception { } @Test - public void testAsyncBudgetExhaustionDeliversToErrorInbox() throws Exception { - // No server. With async mode and a tight cap, the I/O thread - // exhausts its connect budget and surfaces a SenderError to the - // user-supplied handler. fromConfig itself does not throw; only - // close() rethrows the latched terminal so a user who never - // installed a handler still sees the failure on shutdown. + public void testAsyncNoServerRetriesForeverNoTerminal() throws Exception { + // INVARIANT B: an SF sender in async mode pointed at a dead port must + // NEVER surface a connection-error terminal -- a down server is transient + // (it may appear; the data is safe in SF), so the I/O thread retries + // forever. 
reconnect_max_duration_millis is IGNORED as a give-up deadline: + // no SenderError lands, the sender stays usable, and wasEverConnected() + // stays false. Only a GENUINE terminal (auth/upgrade) or SF exhaustion may + // surface -- see testAsyncAuthFailureDeliversToErrorInbox. int port = TestPorts.findUnusedPort(); ErrorInbox inbox = new ErrorInbox(); String cfg = "ws::addr=localhost:" + port + sfDirOpt() + ";initial_connect_retry=async" - + ";reconnect_max_duration_millis=400" + + ";reconnect_max_duration_millis=200" + ";reconnect_initial_backoff_millis=10" + ";reconnect_max_backoff_millis=50" + ";close_flush_timeout_millis=0;"; @@ -124,38 +127,25 @@ public void testAsyncBudgetExhaustionDeliversToErrorInbox() throws Exception { .errorHandler(inbox) .build(); try { - // Wait up to 5s for the I/O thread to exhaust its budget. - Assert.assertTrue( - "async budget exhaustion must surface a SenderError within 5s", - inbox.await(5, TimeUnit.SECONDS)); - SenderError err = inbox.get(); - Assert.assertNotNull( - "async budget exhaustion must surface a SenderError to the inbox", - err); - Assert.assertEquals( - "budget exhaustion is a HALT-policy terminal", - SenderError.Policy.HALT, err.getAppliedPolicy()); - Assert.assertEquals( - "category must be PROTOCOL_VIOLATION for budget exhaustion", - SenderError.Category.PROTOCOL_VIOLATION, err.getCategory()); - String msg = err.getServerMessage() == null ? "" : err.getServerMessage(); - Assert.assertTrue( - "error message must use never-connected tag (no successful connect): " + msg, - msg.contains("never-connected-budget-exhausted")); - Assert.assertTrue( - "error message must hint at config-likely cause: " + msg, - msg.contains("never reached the server")); + // Observe well past the (ignored) 200ms budget: no terminal lands. 
Assert.assertFalse( - "wasEverConnected() must be false when no connect ever succeeded", + "async SF sender must NOT surface a connection-error terminal " + + "(Invariant B: retries forever past the budget)", + inbox.await(1500, TimeUnit.MILLISECONDS)); + Assert.assertNull("no SenderError may be delivered for a down server", inbox.get()); + // Sender stays usable -- producer keeps appending to SF. + sender.table("foo").longColumn("v", 1L).atNow(); + sender.flush(); + Assert.assertFalse( + "wasEverConnected() stays false while no server is reachable", ((QwpWebSocketSender) sender).wasEverConnected()); } finally { - assertCloseRethrowsTerminal(sender, - "never-connected-budget-exhausted"); + sender.close(); } } @Test - public void testAsyncDeliversBufferedRowsWhenServerArrivesLate() { + public void testAsyncDeliversBufferedRowsWhenServerArrivesLate() throws Exception { // Sender opens before the server is listening. Frames are // appended to the cursor SF engine on the producer thread. The // I/O thread retries connect in the background; once the server @@ -169,7 +159,10 @@ public void testAsyncDeliversBufferedRowsWhenServerArrivesLate() { + ";reconnect_initial_backoff_millis=20" + ";reconnect_max_backoff_millis=200" + ";close_flush_timeout_millis=2000;"; - try (Sender sender = Sender.fromConfig(cfg)) { + // fromConfig/flush/setup failures must fail the test -- only + // close() teardown noise is tolerated (see closeQuietly). + Sender sender = Sender.fromConfig(cfg); + try { QwpWebSocketSender wss = (QwpWebSocketSender) sender; // wasEverConnected starts false in async mode — the I/O // thread has not yet completed an upgrade. 
@@ -198,9 +191,9 @@ public void testAsyncDeliversBufferedRowsWhenServerArrivesLate() { Assert.assertTrue( "wasEverConnected() must flip to true after the I/O thread connects", ((QwpWebSocketSender) sender).wasEverConnected()); + } finally { + closeQuietly(sender); } - } catch (Exception ignored) { - // already closed } } @@ -233,13 +226,12 @@ public void testAsyncReturnsImmediatelyWithNoServer() { } @Test - public void testConnectionLostBudgetExhaustionTagsDifferently() { - // Server is up at first (initial connect succeeds + ACKs one - // batch), then we tear it down. The I/O loop tries to reconnect, - // every attempt hits TCP refused, and the budget exhausts. - // Because the loop did connect at least once before the outage, - // the SenderError must use the connection-lost tag and the sender - // must report wasEverConnected()==true. + public void testConnectionLostRetriesForeverNoTerminal() throws Exception { + // INVARIANT B: after a successful connect, if the server drops, the + // mid-stream reconnect must retry FOREVER -- it must NEVER surface a + // connection-lost terminal on a wall-clock budget. The rows are safe in + // SF and the server may return, so reconnect_max_duration_millis is + // ignored as a give-up deadline. wasEverConnected() stays true. 
AckHandler handler = new AckHandler(); try (TestWebSocketServer server = new TestWebSocketServer(handler)) { int port = server.getPort(); @@ -248,7 +240,7 @@ public void testConnectionLostBudgetExhaustionTagsDifferently() { ErrorInbox inbox = new ErrorInbox(); String cfg = "ws::addr=localhost:" + port - + ";reconnect_max_duration_millis=400" + + ";reconnect_max_duration_millis=200" + ";reconnect_initial_backoff_millis=10" + ";reconnect_max_backoff_millis=50" + ";close_flush_timeout_millis=0;"; @@ -265,54 +257,48 @@ public void testConnectionLostBudgetExhaustionTagsDifferently() { "wasEverConnected() must be true after a successful connect", ((QwpWebSocketSender) sender).wasEverConnected()); - // Tear the server down. The cursor I/O loop's tryReceiveAcks - // polls every 50us and discovers the peer disconnect on its - // own, then enters the reconnect loop and exhausts the - // 400ms budget — no producer activity required. + // Tear the server down. The I/O loop discovers the disconnect and + // enters reconnect -- which must retry forever, NOT surface a + // terminal on the (ignored) 200ms budget. server.close(); - Assert.assertTrue("budget exhaustion must surface a SenderError within 5s", - inbox.await(5, TimeUnit.SECONDS)); - SenderError err = inbox.get(); - Assert.assertNotNull("budget exhaustion must surface a SenderError", err); - String msg = err.getServerMessage() == null ? 
"" : err.getServerMessage(); - Assert.assertTrue( - "error message must use connection-lost tag: " + msg, - msg.contains("connection-lost-budget-exhausted")); - Assert.assertTrue( - "error message must hint at transient cause: " + msg, - msg.contains("server unreachable since last connect")); + Assert.assertFalse( + "mid-stream reconnect must NOT surface a connection-lost terminal " + + "(Invariant B: retries forever past the budget)", + inbox.await(1500, TimeUnit.MILLISECONDS)); + Assert.assertNull("no terminal may be delivered on a transient outage", inbox.get()); Assert.assertTrue( "wasEverConnected() must remain true after the outage", ((QwpWebSocketSender) sender).wasEverConnected()); } finally { - assertCloseRethrowsTerminal(sender, "connection-lost-budget-exhausted"); + // closeQuietly (not a bare close()) so a close-path exception + // cannot replace a pending AssertionError from the contract + // assertions above and mask a genuine failure. + closeQuietly(sender); } - } catch (Exception ignored) { - // already closed } } @Test - public void testWasEverConnectedTrueImmediatelyInSyncMode() { + public void testWasEverConnectedTrueImmediatelyInSyncMode() throws Exception { // Default (OFF) and SYNC modes both connect on the user thread // before fromConfig returns. wasEverConnected() must therefore // already be true the instant the sender becomes visible to the // caller — there is no observable "never connected" window in - // those modes, so misclassifying a budget exhaustion as - // never-connected is impossible. + // those modes. 
try (TestWebSocketServer server = new TestWebSocketServer(new AckHandler())) { int port = server.getPort(); server.start(); Assert.assertTrue(server.awaitStart(5, java.util.concurrent.TimeUnit.SECONDS)); String cfg = "ws::addr=localhost:" + port + ";close_flush_timeout_millis=0;"; - try (Sender sender = Sender.fromConfig(cfg)) { + Sender sender = Sender.fromConfig(cfg); + try { Assert.assertTrue( "wasEverConnected() must be true immediately in OFF/SYNC mode", ((QwpWebSocketSender) sender).wasEverConnected()); + } finally { + closeQuietly(sender); } - } catch (Exception ignored) { - // already closed } } @@ -335,6 +321,20 @@ private static void awaitAtLeastOneConnectAttempt(QwpWebSocketSender wss) { } } + /** + * Closes the sender, tolerating close-path teardown noise only. Used + * instead of a broad {@code catch (Exception ignored)} around a whole + * test body, which would swallow fromConfig/flush/setup failures and + * let the contract assertions pass vacuously. + */ + private static void closeQuietly(Sender sender) { + try { + sender.close(); + } catch (Exception ignored) { + // close() teardown noise only + } + } + /** * Closes the sender and tolerates either outcome: * * close() throws -- the latched terminal must mention the expected @@ -362,8 +362,11 @@ private static void assertCloseRethrowsTerminal(Sender sender, String expectedSu /** * Returns a unique temp sf_dir snippet for embedding in a config - * string. initial_connect_retry on/sync/async requires sf_dir per - * spec §3.5; without it the builder rejects construction. + * string. The builder does NOT require sf_dir for any + * initial_connect_retry mode — without it the sender builds in + * memory mode and buffers rows in the in-RAM cursor ring. These + * tests set an sf_dir so the rows accumulated before the first + * successful connect are disk-backed (the durable SF path). 
*/ private static String sfDirOpt() { String dir = java.nio.file.Paths.get( diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectRetryTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectRetryTest.java index d5c5d5af..2d775773 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectRetryTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectRetryTest.java @@ -42,9 +42,12 @@ public class InitialConnectRetryTest { /** - * Temp sf_dir for retry-mode tests. Per spec §3.5, - * initial_connect_retry on/sync/async requires sf_dir — memory-mode - * senders cannot durably retry across reconnects. + * Temp sf_dir for retry-mode tests. The builder does NOT require + * sf_dir for any initial_connect_retry mode — memory-mode senders + * share the same retry machinery, buffering rows in the in-RAM + * cursor ring instead of on disk. These tests use an sf_dir so the + * retried rows are disk-backed and the tests exercise the durable + * SF path. */ private static String makeSfDir() { return java.nio.file.Paths.get( diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/PrReviewRedTestsE2e.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/PrReviewRedTestsE2e.java index 51da7427..35da304b 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/PrReviewRedTestsE2e.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/PrReviewRedTestsE2e.java @@ -60,9 +60,9 @@ public class PrReviewRedTestsE2e { *

  • {@code fail()} auth-terminal branch (lines 437-438)
  • {@code fail()} budget-exhausted branch (lines 484-485)
-     * The locked spec ({@code design/qwp-cursor-error-api.md} § "Path 2:
-     * producer-side typed throw") requires {@code signal.terminalError = err}
-     * to be written BEFORE {@code errorInbox.offer(err)}.
+     * The error-API contract ("Path 2: producer-side typed throw") requires
+     * {@code signal.terminalError = err} to be written BEFORE
+     * {@code errorInbox.offer(err)}.
      *

    * Concrete consequence the spec calls out: a user-supplied error handler * that synchronously calls {@code sender.flush()} from inside diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpColumnBatchViewsTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpColumnBatchViewsTest.java index 697f3350..21b96af4 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpColumnBatchViewsTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpColumnBatchViewsTest.java @@ -76,6 +76,29 @@ public void setUp() { @After public void tearDown() { + // Safety net for exits that bypass the assertMemoryLeak wrapper; + // normally a no-op because the wrapper's finally already freed them. + freeAllocations(); + } + + /** + * Wraps a test body in {@link TestUtils#assertMemoryLeak} and frees the + * tracked allocations BEFORE the leak check fires -- LeakCheck closes at + * the end of the wrapped lambda, so freeing only in @After would run too + * late and fail every test now that the check asserts strict per-tag + * equality. 
+ */ + private void assertMemoryLeak(TestUtils.LeakProneCode code) throws Exception { + TestUtils.assertMemoryLeak(() -> { + try { + code.run(); + } finally { + freeAllocations(); + } + }); + } + + private void freeAllocations() { for (long[] alloc : allocations) { Unsafe.free(alloc[0], alloc[1], MemoryTag.NATIVE_DEFAULT); } @@ -84,7 +107,7 @@ public void tearDown() { @Test public void testColumnViewArrayRowAddr() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); Object l = setupArrayColumnLayout(batch, new boolean[]{false, true, false}, @@ -102,7 +125,7 @@ public void testColumnViewArrayRowAddr() throws Exception { @Test public void testColumnViewBatchAccessorReturnsParent() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 1); setupLongColumnLayout(batch, 0, "x", new long[]{42L}, new boolean[]{false}); Assert.assertSame(batch, batch.column(0).batch()); @@ -111,7 +134,7 @@ public void testColumnViewBatchAccessorReturnsParent() throws Exception { @Test public void testColumnViewBinaryAccessors() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupBinaryColumnLayout(batch, new byte[][]{{0x00, 0x7F, (byte) 0xFF}, null, {0x01}}, @@ -137,7 +160,7 @@ public void testColumnViewBinaryAccessors() throws Exception { @Test public void testColumnViewBoolValue() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 5); setupBooleanColumnLayout(batch, 0, new boolean[]{true, false, true, true, false}, @@ -154,7 +177,7 @@ public void testColumnViewBoolValue() throws Exception { @Test public void testColumnViewByteValue() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 4); setupByteColumnLayout(batch, 0, new byte[]{Byte.MIN_VALUE, -1, 0, 
Byte.MAX_VALUE}, @@ -169,7 +192,7 @@ public void testColumnViewByteValue() throws Exception { @Test public void testColumnViewBytesPerValuePerType() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(8, 1); setupLongColumnLayout(batch, 0, "l", new long[]{0}, new boolean[]{false}); setupIntColumnLayout(batch, 1, new int[]{0}, new boolean[]{false}); @@ -193,7 +216,7 @@ public void testColumnViewBytesPerValuePerType() throws Exception { @Test public void testColumnViewCachedPerColumnIndex() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(2, 1); setupLongColumnLayout(batch, 0, "a", new long[]{1L}, new boolean[]{false}); setupLongColumnLayout(batch, 1, "b", new long[]{2L}, new boolean[]{false}); @@ -215,7 +238,7 @@ public void testColumnViewCachedPerColumnIndex() throws Exception { @Test public void testColumnViewCharValue() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupCharColumnLayout(batch, 0, new char[]{'A', 'z', '0'}, @@ -229,7 +252,7 @@ public void testColumnViewCharValue() throws Exception { @Test public void testColumnViewDecimal128Accessors() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); long[] lo = {0xFFEE_DDCC_BBAA_9988L, 0L, 0x1L}; long[] hi = {0x1122_3344_5566_7788L, 0L, 0x2L}; @@ -246,7 +269,7 @@ public void testColumnViewDecimal128Accessors() throws Exception { @Test public void testColumnViewDelegatesAgreeWithBatchPrimitives() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(5, 4); setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L, 0L, 4L}, new boolean[]{false, false, true, false}); setupIntColumnLayout(batch, 1, new int[]{10, 20, 0, 40}, new boolean[]{false, false, true, false}); @@ -279,7 +302,7 @@ 
public void testColumnViewDelegatesAgreeWithBatchPrimitives() throws Exception { @Test public void testColumnViewDoubleArrayElements() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupArrayColumnLayout(batch, new boolean[]{false, true, false}, @@ -293,7 +316,7 @@ public void testColumnViewDoubleArrayElements() throws Exception { @Test public void testColumnViewDoubleValue() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 4); setupDoubleColumnLayout(batch, 0, new double[]{1.5, -1.5, 0.0, Double.MAX_VALUE}, @@ -308,7 +331,7 @@ public void testColumnViewDoubleValue() throws Exception { @Test public void testColumnViewFloatValue() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupFloatColumnLayout(batch, 0, new float[]{1.5f, -1.5f, 0.0f}, @@ -322,7 +345,7 @@ public void testColumnViewFloatValue() throws Exception { @Test public void testColumnViewGeohashValue() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(3, 2); setupGeohashColumnLayout(batch, 0, "g20", new long[]{0xABCDEL, 0L}, new boolean[]{false, true}, 20); setupGeohashColumnLayout(batch, 1, "g40", new long[]{0x12345_6789AL, 0L}, new boolean[]{false, true}, 40); @@ -344,7 +367,7 @@ public void testColumnViewGeohashValue() throws Exception { @Test public void testColumnViewGetColumnIndex() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(3, 1); setupLongColumnLayout(batch, 0, "a", new long[]{0}, new boolean[]{false}); setupLongColumnLayout(batch, 1, "b", new long[]{0}, new boolean[]{false}); @@ -357,7 +380,7 @@ public void testColumnViewGetColumnIndex() throws Exception { @Test public void testColumnViewGetColumnWireType() throws Exception { - 
TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(2, 1); setupLongColumnLayout(batch, 0, "l", new long[]{0}, new boolean[]{false}); setupVarcharColumnLayout(batch, 1, "s", new String[]{""}, new boolean[]{false}); @@ -368,7 +391,7 @@ public void testColumnViewGetColumnWireType() throws Exception { @Test public void testColumnViewIntValue() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 4); setupIntColumnLayout(batch, 0, new int[]{Integer.MIN_VALUE + 1, -1, 0, Integer.MAX_VALUE}, @@ -383,7 +406,7 @@ public void testColumnViewIntValue() throws Exception { @Test public void testColumnViewLong256AndLong256Word() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); long[][] words = {{0xAAAAL, 0xBBBBL, 0xCCCCL, 0xDDDDL}, {0L, 0L, 0L, 0L}}; setupLong256ColumnLayout(batch, words, new boolean[]{false, true}); @@ -409,7 +432,7 @@ public void testColumnViewLong256AndLong256Word() throws Exception { @Test public void testColumnViewLongValue() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 4); setupLongColumnLayout(batch, 0, "l", new long[]{Long.MIN_VALUE + 1, -1L, 0L, Long.MAX_VALUE}, @@ -424,7 +447,7 @@ public void testColumnViewLongValue() throws Exception { @Test public void testColumnViewNonNullCount() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 5); setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L, 0L, 4L, 0L}, @@ -435,7 +458,7 @@ public void testColumnViewNonNullCount() throws Exception { @Test public void testColumnViewNonNullIndex() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 5); // Rows 1 and 3 are NULL; dense indices for non-null rows are 0, 1, 2. 
setupLongColumnLayout(batch, 0, "l", @@ -452,7 +475,7 @@ public void testColumnViewNonNullIndex() throws Exception { public void testColumnViewNonNullIndexNoNulls() throws Exception { // When there are no nulls, dense index equals row index (layout skips the // nonNullIdx fill; the method just returns the row back). - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L, 3L}, @@ -466,7 +489,7 @@ public void testColumnViewNonNullIndexNoNulls() throws Exception { @Test public void testColumnViewNullBitmapAddrNoNulls() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L, 3L}, @@ -477,7 +500,7 @@ public void testColumnViewNullBitmapAddrNoNulls() throws Exception { @Test public void testColumnViewNullBitmapAddrWithNulls() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 5); setupLongColumnLayout(batch, 0, "l", new long[]{1L, 0L, 3L, 0L, 5L}, @@ -496,7 +519,7 @@ public void testColumnViewNullBitmapAddrWithNulls() throws Exception { @Test public void testColumnViewNullValuesReturnTypeSentinels() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(6, 1); setupLongColumnLayout(batch, 0, "l", new long[]{0L}, new boolean[]{true}); setupIntColumnLayout(batch, 1, new int[]{0}, new boolean[]{true}); @@ -519,7 +542,7 @@ public void testColumnViewNullValuesReturnTypeSentinels() throws Exception { @Test public void testColumnViewOfReturnsThis() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(2, 1); setupLongColumnLayout(batch, 0, "a", new long[]{1L}, new boolean[]{false}); setupLongColumnLayout(batch, 1, "b", new long[]{2L}, new boolean[]{false}); @@ -532,7 
+555,7 @@ public void testColumnViewOfReturnsThis() throws Exception { @Test public void testColumnViewRebindingPicksUpFreshLayout() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L}, new boolean[]{false, false}); ColumnView col = batch.column(0); @@ -560,7 +583,7 @@ public void testColumnViewRebindingPicksUpFreshLayout() throws Exception { @Test public void testColumnViewShortValue() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 4); setupShortColumnLayout(batch, 0, new short[]{Short.MIN_VALUE + 1, -1, 0, Short.MAX_VALUE}, @@ -577,7 +600,7 @@ public void testColumnViewShortValue() throws Exception { public void testColumnViewStrBDualHold() throws Exception { // strA and strB are independent slots; a call to strB must not invalidate // an already-obtained strA view, and vice-versa. - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupVarcharColumnLayout(batch, 0, "s", new String[]{"alpha", "beta", null}, @@ -596,7 +619,7 @@ public void testColumnViewStrBDualHold() throws Exception { @Test public void testColumnViewStringHeapAllocated() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupVarcharColumnLayout(batch, 0, "s", new String[]{"alpha", null, "gamma"}, @@ -610,7 +633,7 @@ public void testColumnViewStringHeapAllocated() throws Exception { @Test public void testColumnViewStringSink() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupVarcharColumnLayout(batch, 0, "s", new String[]{"alpha", null, "gamma"}, @@ -632,7 +655,7 @@ public void testColumnViewStringSink() throws Exception { @Test public void testColumnViewSymbolAccessors() throws Exception { - 
TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 5); String[] dict = {"AAPL", "MSFT", "GOOG"}; int[] rowIds = {0, 1, 0, 2, -1}; @@ -662,7 +685,7 @@ public void testColumnViewSymbolAccessors() throws Exception { @Test public void testColumnViewUuidLoHi() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); long[] lo = {0xCAFE_BABEL, 0L}; long[] hi = {0xDEAD_BEEFL, 0L}; @@ -677,7 +700,7 @@ public void testColumnViewUuidLoHi() throws Exception { @Test public void testColumnViewUuidWithSink() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); long[] lo = {0x1111_1111_1111_1111L, 0L}; long[] hi = {0x2222_2222_2222_2222L, 0L}; @@ -693,7 +716,7 @@ public void testColumnViewUuidWithSink() throws Exception { @Test public void testColumnViewValuesAddrMatchesLayout() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(2, 1); Object lLayout = setupLongColumnLayout(batch, 0, "l", new long[]{1L}, new boolean[]{false}); Object dLayout = setupDoubleColumnLayout(batch, 1, new double[]{2.0}, new boolean[]{false}); @@ -704,7 +727,7 @@ public void testColumnViewValuesAddrMatchesLayout() throws Exception { @Test public void testColumnViewVarcharAndStringBytesAddr() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupVarcharColumnLayout(batch, 0, "v", new String[]{"hello", "world", null}, @@ -722,7 +745,7 @@ public void testColumnViewVarcharAndStringBytesAddr() throws Exception { @Test public void testForEachRowEmptyBatch() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 0); // Register a minimal layout so column() doesn't trip on null, though // forEachRow never reaches into it. 
@@ -741,7 +764,7 @@ public void testForEachRowEmptyBatch() throws Exception { @Test public void testForEachRowExceptionPropagates() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 5); setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L, 3L, 4L, 5L}, @@ -762,7 +785,7 @@ public void testForEachRowExceptionPropagates() throws Exception { @Test public void testForEachRowReusesSameInstance() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 4); setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L, 3L, 4L}, @@ -775,7 +798,7 @@ public void testForEachRowReusesSameInstance() throws Exception { @Test public void testForEachRowVisitsRowsInOrder() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 5); setupLongColumnLayout(batch, 0, "l", new long[]{10L, 20L, 30L, 40L, 50L}, @@ -796,7 +819,7 @@ public void testForEachRowVisitsRowsInOrder() throws Exception { @Test public void testRowViewArrayAccessors() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupArrayColumnLayout(batch, new boolean[]{false, true, false}, @@ -811,7 +834,7 @@ public void testRowViewArrayAccessors() throws Exception { @Test public void testRowViewBatchAccessor() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 1); setupLongColumnLayout(batch, 0, "x", new long[]{42L}, new boolean[]{false}); Assert.assertSame(batch, batch.row(0).batch()); @@ -820,7 +843,7 @@ public void testRowViewBatchAccessor() throws Exception { @Test public void testRowViewBinaryAccessor() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); setupBinaryColumnLayout(batch, new byte[][]{{0x00, 0x7F, (byte) 0xFF}, null}, @@ 
-841,7 +864,7 @@ public void testRowViewBinaryAccessor() throws Exception { @Test public void testRowViewBinaryBDualHold() throws Exception { // binaryA and binaryB are independent slots, parallel to strA/strB. - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); setupBinaryColumnLayout(batch, new byte[][]{{0x01, 0x02}, {(byte) 0xFE, (byte) 0xFF}}, @@ -860,7 +883,7 @@ public void testRowViewBinaryBDualHold() throws Exception { @Test public void testRowViewByteAndShortAndCharAndFloat() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(4, 2); setupByteColumnLayout(batch, 0, new byte[]{(byte) 127, 0}, new boolean[]{false, true}); setupShortColumnLayout(batch, 1, new short[]{(short) -32000, 0}, new boolean[]{false, true}); @@ -884,7 +907,7 @@ public void testRowViewByteAndShortAndCharAndFloat() throws Exception { @Test public void testRowViewDecimal128() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); long[] lo = {0x1122_3344_5566_7788L, 0L}; long[] hi = {0x99AA_BBCC_DDEE_FF00L, 0L}; @@ -898,7 +921,7 @@ public void testRowViewDecimal128() throws Exception { @Test public void testRowViewDelegatesAgreeWithBatchPrimitives() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(5, 4); setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L, 0L, 4L}, new boolean[]{false, false, true, false}); setupIntColumnLayout(batch, 1, new int[]{10, 20, 0, 40}, new boolean[]{false, false, true, false}); @@ -921,7 +944,7 @@ public void testRowViewDelegatesAgreeWithBatchPrimitives() throws Exception { @Test public void testRowViewGeohashValue() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); setupGeohashColumnLayout(batch, 0, "g", new long[]{0xDEAD_BEEFL, 0L}, new 
boolean[]{false, true}, 32); Assert.assertEquals(0xDEAD_BEEFL, batch.row(0).getGeohashValue(0)); @@ -931,7 +954,7 @@ public void testRowViewGeohashValue() throws Exception { @Test public void testRowViewGetRowIndex() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 5); setupLongColumnLayout(batch, 0, "l", new long[]{0L, 0L, 0L, 0L, 0L}, @@ -947,7 +970,7 @@ public void testRowViewGetRowIndex() throws Exception { @Test public void testRowViewLong256WithSink() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); long[][] words = {{0x1L, 0x2L, 0x3L, 0x4L}, {0L, 0L, 0L, 0L}}; setupLong256ColumnLayout(batch, words, new boolean[]{false, true}); @@ -965,7 +988,7 @@ public void testRowViewLong256WithSink() throws Exception { @Test public void testRowViewLong256Word() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); long[][] words = {{0x11L, 0x22L, 0x33L, 0x44L}, {0L, 0L, 0L, 0L}}; setupLong256ColumnLayout(batch, words, new boolean[]{false, true}); @@ -980,7 +1003,7 @@ public void testRowViewLong256Word() throws Exception { @Test public void testRowViewOfReturnsThis() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); setupLongColumnLayout(batch, 0, "l", new long[]{7L, 8L}, new boolean[]{false, false}); RowView v = batch.row(0); @@ -991,7 +1014,7 @@ public void testRowViewOfReturnsThis() throws Exception { @Test public void testRowViewSingleSharedInstance() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L, 3L}, new boolean[]{false, false, false}); RowView a = batch.row(0); @@ -1006,7 +1029,7 @@ public void testRowViewSingleSharedInstance() throws Exception { @Test public 
void testRowViewStrAStrBDualHold() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); setupVarcharColumnLayout(batch, 0, "s", new String[]{"alpha", "beta"}, @@ -1023,7 +1046,7 @@ public void testRowViewStrAStrBDualHold() throws Exception { @Test public void testRowViewStringAccessors() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 3); setupVarcharColumnLayout(batch, 0, "s", new String[]{"alpha", null, "gamma"}, @@ -1046,7 +1069,7 @@ public void testRowViewStringAccessors() throws Exception { @Test public void testRowViewSymbolAccessors() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 4); String[] dict = {"AAPL", "MSFT"}; int[] rowIds = {0, 1, 0, -1}; @@ -1066,7 +1089,7 @@ public void testRowViewSymbolAccessors() throws Exception { @Test public void testRowViewUuidLoHi() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); long[] lo = {0xCAFE_BABEL, 0L}; long[] hi = {0xDEAD_BEEFL, 0L}; @@ -1080,7 +1103,7 @@ public void testRowViewUuidLoHi() throws Exception { @Test public void testRowViewUuidWithSink() throws Exception { - TestUtils.assertMemoryLeak(() -> { + assertMemoryLeak(() -> { QwpColumnBatch batch = newBatch(1, 2); long[] lo = {0xAAAAL, 0L}; long[] hi = {0xBBBBL, 0L}; diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpConnectWalkBackgroundIsolationTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpConnectWalkBackgroundIsolationTest.java new file mode 100644 index 00000000..51ef603a --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpConnectWalkBackgroundIsolationTest.java @@ -0,0 +1,224 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ 
_ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client; + +import io.questdb.client.DefaultHttpClientConfiguration; +import io.questdb.client.cutlass.http.client.WebSocketClient; +import io.questdb.client.cutlass.line.LineSenderException; +import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop; +import io.questdb.client.network.PlainSocketFactory; +import io.questdb.client.test.tools.TestUtils; +import org.junit.Test; + +import java.util.ArrayList; +import java.util.Collections; +import java.util.List; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.concurrent.atomic.AtomicReference; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertNull; +import static org.junit.Assert.assertTrue; +import static org.junit.Assert.fail; + +/** + * Coverage of the connect-walk concurrency policy (M11): no 
network I/O + * runs under a sender-wide lock for background work. A FOREGROUND walk + * holds the connect-walk lock across its sweep (it owns the shared round + * state and the lifecycle commits); BACKGROUND (drainer) walks take no + * lock at all — each sweeps a private {@code QwpHostHealthTracker + * .RoundCursor} and records health-only results — so a drainer sweep + * proceeds CONCURRENTLY with a foreground walk that is parked inside a + * blocking connect, and the foreground's reconnect and {@code close()} + * paths can never queue behind (or be queued behind by) a drainer's + * endpoint walk. + *
<p>
+ * The proof shape: pin a foreground walk inside {@code connect()} (lock + * held, I/O in flight), then run TWO full background sweeps to completion + * while the foreground is still parked. Under the old walk-wide lock + * (monitor or tryLock-yield) both background calls would have blocked or + * yielded; lock-free they must reach the client factory and fail with the + * ordinary end-of-round error. + */ +public class QwpConnectWalkBackgroundIsolationTest { + + /** Tracks every stub for defensive close (close() is idempotent). */ + private static final List<StubClient> LIVE_STUBS = + Collections.synchronizedList(new ArrayList<>()); + + @Test + public void testBackgroundSweepRunsConcurrentlyWithParkedForegroundWalk() throws Exception { + TestUtils.assertMemoryLeak(() -> { + try (QwpWebSocketSender sender = QwpWebSocketSender.createForTesting("localhost", 19999)) { + final CountDownLatch foregroundInConnect = new CountDownLatch(1); + final CountDownLatch releaseForeground = new CountDownLatch(1); + final AtomicInteger factoryCalls = new AtomicInteger(); + sender.setClientFactoryOverride(() -> { + int call = factoryCalls.incrementAndGet(); + StubClient stub = new StubClient( + call == 1 ? foregroundInConnect : null, + call == 1 ? releaseForeground : null); + LIVE_STUBS.add(stub); + return stub; + }); + + // Foreground walk on a helper thread: its stub connect() + // parks on releaseForeground, so the walk holds the + // connect-walk lock with I/O "in flight" for as long as + // this test wants. 
+ final CursorWebSocketSendLoop.ReconnectFactory foreground = + sender.newReconnectFactory(); + final AtomicReference<Throwable> foregroundError = new AtomicReference<>(); + Thread fg = new Thread(() -> { + try { + foreground.reconnect(); + } catch (Throwable e) { + foregroundError.set(e); + } + }, "test-foreground-walk"); + fg.setDaemon(true); + fg.start(); + try { + assertTrue("foreground walk must reach its (blocking) connect attempt", + foregroundInConnect.await(5, TimeUnit.SECONDS)); + + // TWO background sweeps run to completion while the + // foreground is parked mid-connect. Each must reach the + // client factory (lock-free walk, no yield, no blocking) + // and fail with the ordinary end-of-round error. Two + // sweeps prove per-walk cursor independence: the second + // sweep gets its own full walk, not the first's + // exhausted cursor. + for (int sweep = 1; sweep <= 2; sweep++) { + final CursorWebSocketSendLoop.ReconnectFactory background = + sender.newBackgroundReconnectFactory(() -> false); + try { + background.reconnect(); + fail("stub connect always throws; background sweep " + sweep + + " must fail its round"); + } catch (Exception e) { + assertTrue("background sweep " + sweep + " must fail with the " + + "ordinary end-of-round error, not a lock artifact " + + "(got: " + e.getMessage() + ")", + e instanceof LineSenderException + && String.valueOf(e.getMessage()) + .contains("Failed to connect")); + } + assertEquals("background sweep " + sweep + " must have reached the " + + "client factory while the foreground is parked", + 1 + sweep, factoryCalls.get()); + assertTrue("foreground must still be parked in connect (background " + + "sweeps must not disturb it)", + fg.isAlive()); + assertNull("foreground walk must not have failed while background " + + "sweeps ran", + foregroundError.get()); + } + } finally { + releaseForeground.countDown(); + } + fg.join(5_000); + assertFalse("foreground walk thread must exit once released", fg.isAlive()); + + // The foreground's 
own outcome is unaffected by the two + // background sweeps that ran under it: the ordinary + // end-of-round failure for its single-endpoint round. + Throwable fgErr = foregroundError.get(); + assertNotNull("foreground walk fails its (single-endpoint) round once the " + + "stub connect throws", fgErr); + assertTrue("foreground failure is the ordinary end-of-round connect error " + + "(got: " + fgErr.getMessage() + ")", + fgErr instanceof LineSenderException + && String.valueOf(fgErr.getMessage()).contains("Failed to connect")); + } finally { + closeAllStubs(); + } + }); + } + + private static void closeAllStubs() { + synchronized (LIVE_STUBS) { + for (StubClient c : LIVE_STUBS) { + try { + c.close(); + } catch (Throwable ignored) { + // best-effort; close() is idempotent + } + } + LIVE_STUBS.clear(); + } + } + + /** + * Real-constructor stub (native buffers allocated and freed by the base + * class; the walk closes failed-attempt clients itself). {@code connect} + * optionally parks on a latch to pin the walk — and, on the foreground + * path, the connect-walk lock — then always throws, so no walk ever + * "succeeds" and reaches upgrade or lifecycle commits. 
+ */ + private static final class StubClient extends WebSocketClient { + private final CountDownLatch entered; + private final CountDownLatch release; + + StubClient(CountDownLatch entered, CountDownLatch release) { + super(DefaultHttpClientConfiguration.INSTANCE, PlainSocketFactory.INSTANCE); + this.entered = entered; + this.release = release; + } + + @Override + public void connect(CharSequence host, int port) { + if (entered != null) { + entered.countDown(); + } + if (release != null) { + try { + if (!release.await(10, TimeUnit.SECONDS)) { + throw new RuntimeException("stub connect never released"); + } + } catch (InterruptedException e) { + Thread.currentThread().interrupt(); + throw new RuntimeException("stub connect interrupted", e); + } + } + throw new RuntimeException("stub: connection refused"); + } + + @Override + protected void ioWait(int timeout, int op) { + throw new UnsupportedOperationException("stub: no socket"); + } + + @Override + protected void setupIoWait() { + // no-op + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpHostHealthTrackerTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpHostHealthTrackerTest.java index 2ae217c4..ffd3a4b9 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpHostHealthTrackerTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpHostHealthTrackerTest.java @@ -319,4 +319,98 @@ public void testZone_ZoneIdComparisonIsCaseInsensitive() { Assert.assertEquals(QwpHostHealthTracker.ZoneTier.SAME, t.getZoneTier(0)); Assert.assertEquals(QwpHostHealthTracker.ZoneTier.OTHER, t.getZoneTier(1)); } + + @Test + public void testRoundCursor_FullSweepInLivePriorityOrderThenExhausted() { + QwpHostHealthTracker t = new QwpHostHealthTracker(3); + t.recordSuccess(2); // HEALTHY -> first + t.recordTransportError(0); // TRANSPORT_ERROR -> last + // host 1 stays UNKNOWN -> middle + QwpHostHealthTracker.RoundCursor c = 
t.newRoundCursor(); + Assert.assertEquals(2, c.next()); + Assert.assertEquals(1, c.next()); + Assert.assertEquals(0, c.next()); + Assert.assertEquals("cursor must be exhausted after a full sweep", -1, c.next()); + Assert.assertEquals("cursor exhaustion is sticky", -1, c.next()); + } + + @Test + public void testRoundCursor_ReRanksRemainingHostsOnLiveStateChange() { + QwpHostHealthTracker t = new QwpHostHealthTracker(3); + QwpHostHealthTracker.RoundCursor c = t.newRoundCursor(); + Assert.assertEquals(0, c.next()); // all UNKNOWN -> idx order + // Another walker observes host 2 healthy mid-sweep: it must now + // outrank the still-UNKNOWN host 1 for THIS cursor's next pick. + t.recordSuccess(2, false); + Assert.assertEquals(2, c.next()); + Assert.assertEquals(1, c.next()); + Assert.assertEquals(-1, c.next()); + } + + @Test + public void testRoundCursor_DoesNotConsumeOrDependOnSharedRound() { + QwpHostHealthTracker t = new QwpHostHealthTracker(2); + // Exhaust the SHARED round completely... + t.recordTransportError(0); + t.recordTransportError(1); + Assert.assertTrue(t.isRoundExhausted()); + Assert.assertEquals(-1, t.pickNext()); + // ...a fresh cursor still gets a FULL sweep (its attempted set is + // private), ordered by the live states. + QwpHostHealthTracker.RoundCursor c = t.newRoundCursor(); + Assert.assertEquals(0, c.next()); + Assert.assertEquals(1, c.next()); + Assert.assertEquals(-1, c.next()); + // ...and the cursor's sweep left the shared round untouched. + Assert.assertTrue(t.isRoundExhausted()); + Assert.assertEquals(-1, t.pickNext()); + } + + @Test + public void testRoundCursors_AreIndependentNoEndpointStealing() { + QwpHostHealthTracker t = new QwpHostHealthTracker(2); + QwpHostHealthTracker.RoundCursor a = t.newRoundCursor(); + QwpHostHealthTracker.RoundCursor b = t.newRoundCursor(); + // Interleaved: each cursor must sweep EVERY host exactly once; + // a's claims must not consume b's sweep or vice versa. 
+ Assert.assertEquals(0, a.next()); + Assert.assertEquals(0, b.next()); + Assert.assertEquals(1, a.next()); + Assert.assertEquals(1, b.next()); + Assert.assertEquals(-1, a.next()); + Assert.assertEquals(-1, b.next()); + } + + @Test + public void testHealthOnlyRecords_UpdateStateButNeverTheSharedRoundBit() { + QwpHostHealthTracker t = new QwpHostHealthTracker(1); + // Health-only variants (markRoundAttempted=false): state flips... + t.recordTransportError(0, false); + Assert.assertEquals(QwpHostHealthTracker.HostState.TRANSPORT_ERROR, t.getState(0)); + t.recordRoleReject(0, true, false); + Assert.assertEquals(QwpHostHealthTracker.HostState.TRANSIENT_REJECT, t.getState(0)); + t.recordSuccess(0, false); + Assert.assertEquals(QwpHostHealthTracker.HostState.HEALTHY, t.getState(0)); + // ...but the shared round never sees an attempt: the host is still + // pickable and the round is not exhausted. This is what keeps a + // background drainer's sweep invisible to the foreground's round. + Assert.assertFalse(t.isRoundExhausted()); + Assert.assertEquals(0, t.pickNext()); + } + + @Test + public void testHealthOnlySuccess_StillFeedsStickyHealthyRecency() { + QwpHostHealthTracker t = new QwpHostHealthTracker(2); + // Foreground succeeded on 0 (round-marking), a background walker + // later succeeded on 1 (health-only): the background success is the + // most RECENT and must win the sticky-Healthy pin across + // beginRound(true). 
+ t.recordSuccess(0); + t.recordSuccess(1, false); + t.beginRound(true); + Assert.assertEquals("most recent success (health-only or not) is sticky", + QwpHostHealthTracker.HostState.HEALTHY, t.getState(1)); + Assert.assertEquals(QwpHostHealthTracker.HostState.UNKNOWN, t.getState(0)); + Assert.assertEquals(1, t.pickNext()); + } } diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientConnectTimeoutTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientConnectTimeoutTest.java new file mode 100644 index 00000000..e0435b72 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientConnectTimeoutTest.java @@ -0,0 +1,88 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client; + +import io.questdb.client.cutlass.http.client.HttpClientException; +import io.questdb.client.cutlass.qwp.client.QwpQueryClient; +import org.junit.Assert; +import org.junit.Assume; +import org.junit.Test; + +public class QwpQueryClientConnectTimeoutTest { + + /** + * A connect-phase timeout must be reported as a connect_timeout failure, not + * relabeled as an "exceeded auth_timeout" overage. + *
<p>
    + * {@code QwpQueryClient.runUpgradeWithTimeout} used to wrap the {@code connect()} + * and {@code upgrade()} calls in one try block, so the timeout-flagged exception + * thrown by the (in-diff) connect_timeout path was caught by the {@code isTimeout()} + * branch intended for upgrade() and rewritten with the (much larger, and wrong) + * auth_timeout value -- e.g. a connect that bailed after 500 ms reported + * "exceeded auth_timeout=15000ms". The ingest side never had this because it + * routes through {@code QwpUpgradeFailures.classify}, which leaves the + * connect-timeout exception unmodified. + */ + @Test(timeout = 30_000) + public void testConnectTimeoutNotReportedAsAuthTimeout() { + // 192.0.2.0/24 is TEST-NET-1 (RFC 5737): on a normal network the SYN is + // silently dropped, so the TCP connect stalls and our application-level + // connect_timeout (500 ms) fires -- long before auth_timeout_ms (15000 ms). + // The WebSocket upgrade phase is never reached. + try (QwpQueryClient client = QwpQueryClient.fromConfig( + "ws::addr=192.0.2.1:9009;connect_timeout=500;auth_timeout_ms=15000;failover=off;target=any;")) { + long start = System.currentTimeMillis(); + try { + client.connect(); + Assert.fail("expected connect to fail"); + } catch (HttpClientException ex) { + long elapsed = System.currentTimeMillis() - start; + String msg = ex.getMessage(); + + // The connect_timeout path is only exercised when the runner routes + // TEST-NET-1 into a black hole (dropped SYN). Skip -- rather than + // flake -- on the other two outcomes: + // - no route: a fast ENETUNREACH surfaces as "could not connect". + // - (rare) the host accepts the connect: the upgrade then runs the + // full auth_timeout, so elapsed ~ auth_timeout (>5 s). + // Neither gate keys on the connect-vs-auth label, so neither can mask + // the regression: a black-holed connect always bails at ~500 ms with + // a message that is "connect timed out" (fixed) or "...auth_timeout..." 
+ // (the bug) -- both reach the assertions below. + Assume.assumeFalse("no route to TEST-NET-1 black hole on this runner: " + msg, + msg.contains("could not connect")); + Assume.assumeTrue("TEST-NET-1 is not a black hole on this runner (elapsed=" + elapsed + "ms): " + msg, + elapsed < 5_000); + + // It bailed at connect_timeout=500 ms, nowhere near auth_timeout=15000 ms. + // Regression: name the connect phase, never auth_timeout. + Assert.assertFalse("connect-phase timeout misreported as auth_timeout: " + msg, + msg.contains("auth_timeout")); + Assert.assertTrue("expected a connect-timeout diagnostic, got: " + msg, + msg.contains("connect timed out")); + } + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientWalkTrackerTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientWalkTrackerTest.java index ee5909ce..4a254402 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientWalkTrackerTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientWalkTrackerTest.java @@ -170,10 +170,14 @@ public void testWalk_AllUnreachableThrowsHttpClientException() { // The exception type is HttpClientException (transport-only // failure mode) -- distinct from QwpRoleMismatchException which // would falsely suggest a topology issue. - int port1 = TestPorts.findUnusedPort(); - int port2 = TestPorts.findUnusedPort(); + // findUnusedPorts (plural) holds both probe sockets open at once so + // the two ports are guaranteed distinct — two separate + // findUnusedPort() calls can return the SAME port (bind-close-return + // lets the kernel recycle it immediately), which fails the config's + // duplicate-addr validation before the walk under test even runs. 
+ int[] ports = TestPorts.findUnusedPorts(2); try (QwpQueryClient client = QwpQueryClient.fromConfig( - "ws::addr=localhost:" + port1 + ",localhost:" + port2 + ";auth_timeout_ms=300;")) { + "ws::addr=localhost:" + ports[0] + ",localhost:" + ports[1] + ";auth_timeout_ms=300;")) { try { client.connect(); Assert.fail("expected HttpClientException on unreachable hosts"); diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpRoleRejectBackoffGrowthTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpRoleRejectBackoffGrowthTest.java new file mode 100644 index 00000000..ff1b858f --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpRoleRejectBackoffGrowthTest.java @@ -0,0 +1,189 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client; + +import io.questdb.client.Sender; +import org.junit.Assert; +import org.junit.Test; + +import java.io.IOException; +import java.io.OutputStream; +import java.net.InetAddress; +import java.net.ServerSocket; +import java.net.Socket; +import java.nio.charset.StandardCharsets; +import java.util.concurrent.CopyOnWriteArrayList; +import java.util.concurrent.atomic.AtomicBoolean; + +/** + * Regression guard for the foreground role-reject retry storm. + * + *
<p>
    When every reachable endpoint role-rejects the {@code /write/v4} upgrade + * (a genuine all-replica failover window, or a misconfigured address list that + * points at replicas only), the cursor I/O loop MUST retry with the same + * capped exponential backoff-with-jitter every other reconnect branch uses -- + * NOT pin at {@code reconnect_initial_backoff_millis} forever. Pinning turned + * this into a fixed ~10/s storm of fresh TLS handshakes (new + * {@code WebSocketClient} + new {@code SSLContext} + trust-store re-read) per + * endpoint, in breach of the documented capped-exponential-backoff contract and + * asymmetric with the orphan drainer, which already grows to + * {@code reconnect_max_backoff_millis}. + * + *
<p>
The server here is plaintext loopback, so a role-reject upgrade completes + * in well under a millisecond and the wall-clock gap between successive attempts + * is dominated by the backoff park. Under the old fixed-interval bug every gap + * stayed {@code ~= reconnect_initial_backoff_millis}; under capped exponential + * backoff a later gap climbs many multiples past it. + */ +public class QwpRoleRejectBackoffGrowthTest { + + @Test(timeout = 30_000) + public void testRoleRejectRetryUsesCappedExponentialBackoff() throws Exception { + try (RoleRejectServer server = new RoleRejectServer()) { + server.start(); + + final long initialBackoffMillis = 50; + String cfg = "ws::addr=127.0.0.1:" + server.port() + + ";reconnect_initial_backoff_millis=" + initialBackoffMillis + + ";reconnect_max_backoff_millis=4000" + + ";auth_timeout_ms=2000" + + ";auto_flush_rows=1" + + ";close_flush_timeout_millis=0" + + ";initial_connect_retry=async;"; + + try (Sender sender = Sender.fromConfig(cfg)) { + // Kick the I/O thread into the connect/role-reject loop. + sender.table("t").longColumn("v", 1L).atNow(); + // Wait for enough attempts to observe several backoff doublings: + // the parked gaps run ~50, ~100, ~200, ~400, ~800, ~1600 ms + // (+jitter). Seven attempts give six gaps up to the ~1600 ms step. + waitFor(() -> server.attemptNanos.size() >= 7, 25_000); + } + + Long[] ts = server.attemptNanos.toArray(new Long[0]); + Assert.assertTrue("expected at least 7 upgrade attempts, got " + ts.length, ts.length >= 7); + + long firstGapMs = (ts[1] - ts[0]) / 1_000_000L; + long maxGapMs = 0; + StringBuilder gaps = new StringBuilder(); + for (int i = 1; i < ts.length; i++) { + long gapMs = (ts[i] - ts[i - 1]) / 1_000_000L; + gaps.append(gapMs).append(i < ts.length - 1 ? 
"," : ""); + if (gapMs > maxGapMs) { + maxGapMs = gapMs; + } + } + + // Under the fixed-interval bug every gap stayed ~= 50 ms (no jitter, + // no growth) over a sub-millisecond plaintext handshake, so maxGap + // could never climb past ~60 ms. Capped exponential backoff drives a + // later gap to 400 ms+ by the fourth doubling. Require maxGap to reach + // at least 4x the initial interval: unreachable under the old + // behaviour, comfortably cleared under the new one. + Assert.assertTrue( + "role-reject backoff did not grow (fixed-interval storm): gaps=[" + gaps + + "]ms maxGap=" + maxGapMs + "ms firstGap=" + firstGapMs + + "ms initial=" + initialBackoffMillis + "ms", + maxGapMs >= initialBackoffMillis * 4); + // And a later gap must dwarf the first, proving genuine growth rather + // than a single anomalous park. + Assert.assertTrue( + "role-reject gaps are flat, not exponential: gaps=[" + gaps + + "]ms maxGap=" + maxGapMs + "ms firstGap=" + firstGapMs + "ms", + maxGapMs >= firstGapMs * 3); + } + } + + private static void waitFor(java.util.function.BooleanSupplier cond, long timeoutMs) throws InterruptedException { + long deadline = System.currentTimeMillis() + timeoutMs; + while (System.currentTimeMillis() < deadline) { + if (cond.getAsBoolean()) { + return; + } + Thread.sleep(20); + } + } + + private static final class RoleRejectServer implements AutoCloseable { + final CopyOnWriteArrayList<Long> attemptNanos = new CopyOnWriteArrayList<>(); + private final ServerSocket socket; + private final AtomicBoolean running = new AtomicBoolean(true); + + RoleRejectServer() throws IOException { + this.socket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress()); + } + + int port() { + return socket.getLocalPort(); + } + + void start() { + Thread t = new Thread(this::loop, "role-reject-backoff-server"); + t.setDaemon(true); + t.start(); + } + + @Override + public void close() throws IOException { + running.set(false); + socket.close(); + } + + private void loop() { + 
while (running.get()) { + try { + Socket s = socket.accept(); + Thread h = new Thread(() -> handle(s), "role-reject-backoff-handler"); + h.setDaemon(true); + h.start(); + } catch (IOException e) { + if (!running.get()) { + return; + } + } + } + } + + private void handle(Socket s) { + try (Socket sock = s) { + byte[] discard = new byte[8192]; + int n = sock.getInputStream().read(discard); + if (n < 0) { + return; + } + // Record the attempt only once we have actually read the upgrade + // request, so the timestamp reflects a real handshake attempt. + attemptNanos.add(System.nanoTime()); + String resp = "HTTP/1.1 421 Misdirected Request\r\n" + + "X-QuestDB-Role: REPLICA\r\n" + + "Content-Length: 0\r\nConnection: close\r\n\r\n"; + OutputStream out = sock.getOutputStream(); + out.write(resp.getBytes(StandardCharsets.US_ASCII)); + out.flush(); + } catch (Exception ignored) { + } + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpWebSocketSenderJvmErrorCleanupTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpWebSocketSenderJvmErrorCleanupTest.java new file mode 100644 index 00000000..4a507e99 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpWebSocketSenderJvmErrorCleanupTest.java @@ -0,0 +1,277 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client; + +import io.questdb.client.cutlass.http.client.WebSocketClient; +import io.questdb.client.cutlass.line.LineSenderException; +import io.questdb.client.cutlass.qwp.client.QwpHostHealthTracker; +import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender; +import io.questdb.client.std.Unsafe; +import org.junit.Assert; +import org.junit.Test; + +import java.lang.reflect.Constructor; +import java.lang.reflect.Field; +import java.lang.reflect.InvocationTargetException; +import java.lang.reflect.Method; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.function.Supplier; + +/** + * Regression coverage (M10): {@code buildAndConnect}'s connect/upgrade try + * used to catch only {@code HttpClientException} and {@code Exception}, so a + * JVM {@link java.lang.Error} (OOM, LinkageError, StackOverflowError) thrown + * mid-connect escaped with the half-built {@code WebSocketClient} open -- fd + * plus native buffers, unreachable by GC, freed only in {@code close()}. The + * fix adds a {@code catch (Error)} arm that closes the client quietly (a + * close failure under memory pressure must not mask the original Error) and + * rethrows without recording endpoint-health penalties: a JVM failure is not + * endpoint health data. + *

    + * Uses the same bare-instance pattern as + * {@code CursorWebSocketSendLoopJvmErrorTest}: {@code Unsafe.allocateInstance} + * plus reflective wiring of the fields the connect walk dereferences, with the + * {@code clientFactoryOverride} test seam substituting a stub client whose + * {@code connect()} throws. + */ +public class QwpWebSocketSenderJvmErrorCleanupTest { + + @Test + public void testErrorDuringConnectClosesClientAndStopsWalk() throws Exception { + // Two endpoints: an Error on the FIRST connect attempt must close that + // attempt's client and propagate immediately -- no walk to endpoint 2, + // no health penalty. Contrast with the Exception path (below) which + // closes, records a transport error and keeps walking. + QwpWebSocketSender sender = newBareSender(); + QwpHostHealthTracker tracker = wireEndpoints(sender, 2); + List<StubClient> built = new ArrayList<>(); + OutOfMemoryError oom = new OutOfMemoryError("simulated allocation failure"); + installFactory(sender, () -> { + StubClient c = newStubClient(); + c.connectError = oom; + built.add(c); + return c; + }); + + try { + invokeBuildAndConnect(sender); + Assert.fail("a JVM Error must propagate out of buildAndConnect"); + } catch (InvocationTargetException ite) { + Assert.assertSame("the original Error must surface", oom, ite.getCause()); + } + Assert.assertEquals("Error must stop the walk on the first attempt", 1, built.size()); + Assert.assertEquals("half-built client must be closed exactly once", + 1, built.get(0).closeCalls); + Assert.assertEquals("a JVM failure is not endpoint health data", + QwpHostHealthTracker.HostState.UNKNOWN, tracker.getState(0)); + Assert.assertEquals("unattempted endpoint must stay untouched", + QwpHostHealthTracker.HostState.UNKNOWN, tracker.getState(1)); + } + + @Test + public void testCloseFailureDoesNotMaskOriginalError() throws Exception { + // Under OOM, close() itself can throw. The cleanup must be + // best-effort: the ORIGINAL Error surfaces, not the close failure. 
+ QwpWebSocketSender sender = newBareSender(); + wireEndpoints(sender, 1); + OutOfMemoryError oom = new OutOfMemoryError("simulated allocation failure"); + StubClient stub = newStubClient(); + stub.connectError = oom; + stub.throwOnClose = true; + installFactory(sender, () -> stub); + + try { + invokeBuildAndConnect(sender); + Assert.fail("a JVM Error must propagate out of buildAndConnect"); + } catch (InvocationTargetException ite) { + Assert.assertSame("close() failure must not mask the original Error", + oom, ite.getCause()); + } + Assert.assertEquals("close must have been attempted", 1, stub.closeCalls); + } + + @Test + public void testExceptionPathStillClosesAndWalksAllEndpoints() throws Exception { + // Seam sanity + behavioral contrast: a plain RuntimeException stays on + // the existing path -- close, record a transport penalty, walk the + // next endpoint, and surface LineSenderException once the round is + // exhausted. + QwpWebSocketSender sender = newBareSender(); + QwpHostHealthTracker tracker = wireEndpoints(sender, 2); + List<StubClient> built = new ArrayList<>(); + installFactory(sender, () -> { + StubClient c = newStubClient(); + c.connectRuntimeError = new IllegalStateException("simulated transport failure"); + built.add(c); + return c; + }); + + try { + invokeBuildAndConnect(sender); + Assert.fail("an exhausted round must surface LineSenderException"); + } catch (InvocationTargetException ite) { + Assert.assertTrue("expected LineSenderException, got " + ite.getCause(), + ite.getCause() instanceof LineSenderException); + } + Assert.assertEquals("an Exception must keep the walk going", 2, built.size()); + for (StubClient c : built) { + Assert.assertEquals("every attempt's client must be closed", 1, c.closeCalls); + } + Assert.assertEquals("Exception path records the transport penalty", + QwpHostHealthTracker.HostState.TRANSPORT_ERROR, tracker.getState(0)); + Assert.assertEquals("Exception path records the transport penalty", + 
QwpHostHealthTracker.HostState.TRANSPORT_ERROR, tracker.getState(1)); + } + + /** + * Bypasses the real constructor -- no wire client, engine or dispatcher + * needed. The connect walk dereferences only the fields wired below plus + * primitives whose zero-defaults are valid here (field initializers do + * not run under {@code Unsafe.allocateInstance}), plus the connect-walk + * lock, which buildAndConnect acquires unconditionally and is therefore + * wired here. + */ + private static QwpWebSocketSender newBareSender() throws Exception { + QwpWebSocketSender sender = (QwpWebSocketSender) Unsafe.getUnsafe() + .allocateInstance(QwpWebSocketSender.class); + setField(sender, "connectWalkLock", new java.util.concurrent.locks.ReentrantLock()); + return sender; + } + + private static QwpHostHealthTracker wireEndpoints(QwpWebSocketSender sender, + int count) throws Exception { + QwpWebSocketSender.Endpoint[] eps = new QwpWebSocketSender.Endpoint[count]; + for (int i = 0; i < count; i++) { + eps[i] = new QwpWebSocketSender.Endpoint("localhost", 9000 + i); + } + setField(sender, "endpoints", Arrays.asList(eps)); + QwpHostHealthTracker tracker = new QwpHostHealthTracker(count); + setField(sender, "hostTracker", tracker); + return tracker; + } + + private static void installFactory(QwpWebSocketSender sender, + Supplier factory) throws Exception { + setField(sender, "clientFactoryOverride", factory); + } + + /** + * Drives the private connect walk through its private foreground + * {@code ReconnectSupplier} (no-arg: abortCheck null means foreground; + * the bare sender's null {@code cursorSendLoop} and false {@code closed} + * make {@code isAborted()} false). 
+ */ + private static void invokeBuildAndConnect(QwpWebSocketSender sender) throws Exception { + Class<?> supplierClass = Class.forName( + "io.questdb.client.cutlass.qwp.client.QwpWebSocketSender$ReconnectSupplier"); + Constructor<?> ctor = supplierClass.getDeclaredConstructor(QwpWebSocketSender.class); + ctor.setAccessible(true); + Object ctx = ctor.newInstance(sender); + Method m = QwpWebSocketSender.class.getDeclaredMethod("buildAndConnect", supplierClass); + m.setAccessible(true); + m.invoke(sender, ctx); + } + + private static StubClient newStubClient() { + try { + return (StubClient) Unsafe.getUnsafe().allocateInstance(StubClient.class); + } catch (InstantiationException e) { + throw new AssertionError(e); + } + } + + private static void setField(Object target, String name, Object value) throws Exception { + Field f = QwpWebSocketSender.class.getDeclaredField(name); + f.setAccessible(true); + f.set(target, value); + } + + /** + * Minimal stub: every method the connect walk touches is overridden so no + * base-class state (native buffers, socket) is ever dereferenced -- + * instances come from {@code Unsafe.allocateInstance}, so the base + * constructor never ran. Fields rely on zero-defaults; tests assign them + * post-allocation. + */ + private static final class StubClient extends WebSocketClient { + int closeCalls; + Error connectError; + RuntimeException connectRuntimeError; + boolean throwOnClose; + + private StubClient() { + // Never invoked -- instances come from Unsafe.allocateInstance. 
+ super(null, null); + } + + @Override + public void close() { + closeCalls++; + if (throwOnClose) { + throw new IllegalStateException("simulated close failure under memory pressure"); + } + } + + @Override + public void connect(CharSequence host, int port) { + if (connectError != null) { + throw connectError; + } + if (connectRuntimeError != null) { + throw connectRuntimeError; + } + } + + @Override + public void setConnectTimeout(int connectTimeoutMillis) { + } + + @Override + public void setQwpClientId(String clientId) { + } + + @Override + public void setQwpMaxVersion(int maxVersion) { + } + + @Override + public void setQwpRequestDurableAck(boolean enabled) { + } + + @Override + public void upgrade(CharSequence path, int timeout, CharSequence authorizationHeader) { + } + + @Override + protected void ioWait(int timeout, int op) { + } + + @Override + protected void setupIoWait() { + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/ReconnectTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/ReconnectTest.java index 5c0f9bd2..bd619d14 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/ReconnectTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/ReconnectTest.java @@ -101,33 +101,37 @@ public void testReconnectAfterServerInducedDisconnect() throws Exception { } @Test - public void testReconnectGivesUpAfterCap() throws Exception { - // Server is up at first (initial connect succeeds + ACKs batch 1), - // then we tear it down — subsequent reconnect attempts get TCP - // connection-refused and accumulate against the budget. With a - // 500ms cap, the loop should give up well inside the test's 5s - // poll window and the next user-thread flush() must throw. + public void testReconnectNeverGivesUpInvariantB() throws Exception { + // INVARIANT B: server is up at first (initial connect + ACK), then torn + // down. 
The I/O loop enters reconnect and must retry FOREVER -- flush() + // must keep succeeding (publishing to on-disk SF), never surface a + // give-up / budget terminal. The rows are safe in SF and the server may + // return, so reconnect_max_duration_millis is ignored as a give-up + // deadline. try (TestWebSocketServer server = new TestWebSocketServer(new AckHandler())) { int port = server.getPort(); server.start(); Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); String cfg = "ws::addr=localhost:" + port - + ";reconnect_max_duration_millis=500" + + ";reconnect_max_duration_millis=300" + ";reconnect_initial_backoff_millis=10" + ";reconnect_max_backoff_millis=50" + ";close_flush_timeout_millis=0;"; - try (Sender sender = Sender.fromConfig(cfg)) { + Throwable observed = null; + // fromConfig/first-flush/setup failures must fail the test -- + // only close() teardown noise is tolerated in the finally below. + Sender sender = Sender.fromConfig(cfg); + try { sender.table("foo").longColumn("v", 1L).atNow(); sender.flush(); - // Tear down the server: existing client connection gets - // EOF, the I/O loop tries to reconnect, every attempt - // hits TCP refused → budget exhausts. + // Tear down the server: the I/O loop gets EOF and enters + // reconnect; every attempt hits TCP refused but must keep + // retrying past the (ignored) 300ms budget. server.close(); - Throwable observed = null; - long deadline = System.currentTimeMillis() + 5_000; + long deadline = System.currentTimeMillis() + 2_000; long iter = 0; while (System.currentTimeMillis() < deadline) { iter++; @@ -140,21 +144,18 @@ public void testReconnectGivesUpAfterCap() throws Exception { } Thread.sleep(50); } - Assert.assertNotNull( - "sender should have surfaced the terminal reconnect-cap error", - observed); - String msg = observed.getMessage() == null ? 
"" : observed.getMessage(); - Assert.assertTrue( - "error message must mention the give-up: " + msg, - msg.contains("reconnect failed") - || msg.contains("I/O thread failed") - || msg.contains("Failed to connect")); - } catch (LineSenderException ignored) { + } finally { + try { + sender.close(); + } catch (Exception ignored) { + // close() teardown noise -- the contract under test is the + // flush loop above, captured in `observed`. + } } - // close() rethrows the latched terminal reconnect-cap error - // (commit 052f6ee). Already observed and asserted above. - } catch (Exception ignored) { - // already closed + Assert.assertNull( + "mid-stream reconnect must retry forever, not surface a terminal " + + "(Invariant B); flush() threw: " + observed, + observed); } } diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/TestPorts.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/TestPorts.java index 43b3e8e0..ecf10800 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/TestPorts.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/TestPorts.java @@ -40,4 +40,44 @@ public static int findUnusedPort() { throw new RuntimeException("failed to allocate an ephemeral port", e); } } + + /** + * Allocates {@code n} DISTINCT ephemeral ports. All {@code n} probe + * sockets are held open simultaneously, so the kernel is forced to hand + * out {@code n} different ports; they are closed together only after + * every port has been collected. + *

    + * Do NOT emulate this with repeated {@link #findUnusedPort()} calls: + * that helper is bind-close-return, and once its probe socket closes the + * port returns to the kernel's ephemeral pool — Linux readily hands the + * just-released port straight back to the next {@code bind(0)}, so two + * back-to-back calls can return the SAME port. That exact race made a + * multi-addr config fail validation with "duplicate addr entry" in CI. + */ + public static int[] findUnusedPorts(int n) { + if (n <= 0) { + throw new IllegalArgumentException("n must be > 0: " + n); + } + ServerSocket[] sockets = new ServerSocket[n]; + int[] ports = new int[n]; + try { + for (int i = 0; i < n; i++) { + sockets[i] = new ServerSocket(0, 50, InetAddress.getLoopbackAddress()); + ports[i] = sockets[i].getLocalPort(); + } + return ports; + } catch (IOException e) { + throw new RuntimeException("failed to allocate " + n + " ephemeral ports", e); + } finally { + for (ServerSocket s : sockets) { + if (s != null) { + try { + s.close(); + } catch (IOException ignored) { + // best-effort; the probe socket carries no state + } + } + } + } + } } diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/DrainerForegroundEventIsolationTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/DrainerForegroundEventIsolationTest.java new file mode 100644 index 00000000..f9dd14ab --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/DrainerForegroundEventIsolationTest.java @@ -0,0 +1,376 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in 
compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client.sf; + +import io.questdb.client.Sender; +import io.questdb.client.SenderConnectionEvent; +import io.questdb.client.SenderConnectionListener; +import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender; +import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner; +import io.questdb.client.std.Files; +import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer; +import io.questdb.client.test.tools.TestUtils; +import org.jetbrains.annotations.NotNull; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.ByteOrder; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.List; +import java.util.concurrent.TimeUnit; + +/** + * Contract: background orphan-slot drainers are invisible in the foreground + * sender's connection-event stream. {@link SenderConnectionEvent}s describe + * the FOREGROUND connection's lifecycle — the documented meaning a monitoring + * integration depends on: {@code CONNECTED} fires once when the sender first + * comes up, {@code RECONNECTED}/{@code FAILED_OVER} fire when the sender's own + * connection was re-established, {@code DISCONNECTED} fires when the sender's + * own connection dropped. 
A drainer connecting, reconnecting after a wire + * drop, or failing over is background bookkeeping for an orphan slot and must + * not masquerade as foreground lifecycle transitions. + *

    + * Both tests are black-box: real {@code Sender} built from config, real + * {@link TestWebSocketServer}, events captured through the public + * {@code connectionListener} builder hook. They do not care HOW drainer + * connects are isolated from foreground state — any implementation that keeps + * drainer activity out of the user-visible event stream passes. + *

    + * Barriers: the drain outcome is awaited via the public drainer counters + * before close; sender close drains the event-dispatcher inbox before + * returning, so post-close assertions observe the complete delivered stream; + * {@code getDroppedConnectionNotifications() == 0} guards the + * absence-assertions against inbox-overflow false greens. + */ +public class DrainerForegroundEventIsolationTest { + + private static final int GHOST_ROWS = 5; + + private String sfDir; + + @Before + public void setUp() { + sfDir = Paths.get(System.getProperty("java.io.tmpdir"), + "qdb-drainer-event-iso-" + System.nanoTime()).toString(); + } + + @After + public void tearDown() { + if (sfDir != null) rmDirRec(sfDir); + } + + /** + * A drainer's successful connect must not fire a foreground success event. + * The foreground connects exactly once against a healthy server and never + * drops, so the event stream must contain exactly one success-kind event: + * the initial {@code CONNECTED}. A second success-kind event means the + * drainer's connect leaked into the foreground lifecycle stream (today it + * surfaces as a fabricated {@code RECONNECTED}/{@code FAILED_OVER} while + * the foreground connection never went away). 
+ */ + @Test + public void testDrainerConnectMustNotFireForegroundSuccessEvents() throws Exception { + TestUtils.assertMemoryLeak(() -> { + seedGhostSlot(); + + RecordingListener events = new RecordingListener(); + AckAllHandler handler = new AckAllHandler(); + try (TestWebSocketServer server = new TestWebSocketServer(handler)) { + server.start(); + Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + + String cfg = "ws::addr=localhost:" + server.getPort() + + ";sf_dir=" + sfDir + + ";sender_id=primary" + + ";drain_orphans=true" + + ";max_background_drainers=1;"; + try (Sender sender = Sender.builder(cfg) + .connectionListener(events) + .build()) { + QwpWebSocketSender ws = (QwpWebSocketSender) sender; + awaitDrainSuccess(ws, handler.distinctPayloads, 10_000); + Assert.assertEquals( + "absence-assertions require a lossless event stream", + 0, ws.getDroppedConnectionNotifications()); + } + // Sender is closed: the dispatcher inbox has been drained, the + // captured list is the complete delivered stream. + List<SenderConnectionEvent> successes = events.ofKinds( + SenderConnectionEvent.Kind.CONNECTED, + SenderConnectionEvent.Kind.RECONNECTED, + SenderConnectionEvent.Kind.FAILED_OVER); + Assert.assertEquals( + "background drainer connects must be invisible in the " + + "foreground connection-event stream; expected the " + + "initial CONNECTED only, got: " + successes, + 1, successes.size()); + Assert.assertEquals( + "the single success event must be the foreground's " + + "first-connect CONNECTED", + SenderConnectionEvent.Kind.CONNECTED, + successes.get(0).getKind()); + } + }); + } + + /** + * A drainer's mid-drain wire drop must not fire a foreground + * {@code DISCONNECTED}. The server deterministically drops the drainer's + * first connection after acking one frame; the drainer reconnects and + * finishes the slot. 
The foreground connection is healthy for the whole + * test (it never sends and is never dropped), so a {@code DISCONNECTED} + * in the stream is a phantom: it reports an outage, against an endpoint + * the foreground is healthily using, that the foreground never had. + */ + @Test + public void testDrainerWireDropMustNotFirePhantomForegroundDisconnect() throws Exception { + TestUtils.assertMemoryLeak(() -> { + seedGhostSlot(); + + RecordingListener events = new RecordingListener(); + DropFirstDataConnectionHandler handler = new DropFirstDataConnectionHandler(); + try (TestWebSocketServer server = new TestWebSocketServer(handler)) { + server.start(); + Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + + String cfg = "ws::addr=localhost:" + server.getPort() + + ";sf_dir=" + sfDir + + ";sender_id=primary" + + ";drain_orphans=true" + + ";max_background_drainers=1;"; + try (Sender sender = Sender.builder(cfg) + .connectionListener(events) + .build()) { + QwpWebSocketSender ws = (QwpWebSocketSender) sender; + awaitDrainSuccess(ws, handler.distinctPayloads, 15_000); + // Fixture sanity: the drain really did span a wire drop — + // at least two distinct data connections served frames. + Assert.assertTrue( + "expected the drainer to reconnect after the scripted " + + "drop; data connections=" + handler.dataConnections(), + handler.dataConnections() >= 2); + Assert.assertEquals( + "absence-assertions require a lossless event stream", + 0, ws.getDroppedConnectionNotifications()); + } + List disconnects = events.ofKinds( + SenderConnectionEvent.Kind.DISCONNECTED); + Assert.assertEquals( + "a background drainer's wire drop must not surface as a " + + "foreground DISCONNECTED — the foreground connection " + + "never dropped; got: " + disconnects, + 0, disconnects.size()); + } + }); + } + + // Ghost sender against a silent server leaves an unacked orphan slot with + // GHOST_ROWS frames under the group root (same recipe as + // BackgroundDrainerEndToEndTest). 
+ private void seedGhostSlot() throws Exception { + try (TestWebSocketServer silent = new TestWebSocketServer(new SilentHandler())) { + silent.start(); + Assert.assertTrue(silent.awaitStart(5, TimeUnit.SECONDS)); + String cfg = "ws::addr=localhost:" + silent.getPort() + + ";sf_dir=" + sfDir + + ";sender_id=ghost" + + ";close_flush_timeout_millis=0;"; + try (Sender g = Sender.fromConfig(cfg)) { + for (int i = 0; i < GHOST_ROWS; i++) { + g.table("foo").longColumn("v", i).atNow(); + g.flush(); + } + } + } + Assert.assertEquals("ghost slot must be a candidate orphan", + 1, OrphanScanner.scan(sfDir, "primary").size()); + } + + private static void awaitDrainSuccess( + QwpWebSocketSender ws, + java.util.Set distinctPayloads, + long timeoutMillis + ) throws InterruptedException { + long deadline = System.currentTimeMillis() + timeoutMillis; + while (System.currentTimeMillis() < deadline + && (distinctPayloads.size() < GHOST_ROWS + || ws.getTotalBackgroundDrainersSucceeded() < 1)) { + Thread.sleep(20); + } + Assert.assertEquals("drainer must replay every ghost-slot row", + GHOST_ROWS, distinctPayloads.size()); + Assert.assertEquals("drainer must drain the slot fully and exit cleanly", + 1, ws.getTotalBackgroundDrainersSucceeded()); + } + + private static void rmDirRec(String dir) { + if (!Files.exists(dir)) return; + long find = Files.findFirst(dir); + if (find > 0) { + try { + int rc = 1; + while (rc > 0) { + String name = Files.utf8ToString(Files.findName(find)); + if (name != null && !".".equals(name) && !"..".equals(name)) { + String child = dir + "/" + name; + if (!Files.remove(child)) rmDirRec(child); + } + rc = Files.findNext(find); + } + } finally { + Files.findClose(find); + } + } + Files.remove(dir); + } + + // status OK + wire seq + tableCount 0 — the minimal ack the non-durable + // drain path consumes (same shape as BackgroundDrainerEndToEndTest). 
+ private static byte[] buildAck(long wireSeq) { + byte[] buf = new byte[1 + 8 + 2]; + ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN); + bb.put((byte) 0x00); + bb.putLong(wireSeq); + bb.putShort((short) 0); + return buf; + } + + /** Captures every delivered event for post-close exact assertions. */ + private static final class RecordingListener implements SenderConnectionListener { + private final List<SenderConnectionEvent> captured = new ArrayList<>(); + + @Override + public synchronized void onEvent(@NotNull SenderConnectionEvent event) { + captured.add(event); + } + + synchronized List<SenderConnectionEvent> ofKinds(SenderConnectionEvent.Kind... kinds) { + List<SenderConnectionEvent> out = new ArrayList<>(); + for (int i = 0, n = captured.size(); i < n; i++) { + SenderConnectionEvent e = captured.get(i); + for (SenderConnectionEvent.Kind k : kinds) { + if (e.getKind() == k) { + out.add(e); + break; + } + } + } + return out; + } + } + + private static class SilentHandler implements TestWebSocketServer.WebSocketServerHandler { + @Override + public void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) { + // intentionally no ack + } + } + + /** + * Acks every frame with a per-connection wire sequence. The foreground + * connection never sends data in these tests, so only drainer connections + * show up here. 
+ */ + private static class AckAllHandler implements TestWebSocketServer.WebSocketServerHandler { + final java.util.Set<String> distinctPayloads = + java.util.Collections.synchronizedSet(new java.util.HashSet<>()); + private final java.util.Map<TestWebSocketServer.ClientHandler, long[]> wireSeqByConn = + new java.util.IdentityHashMap<>(); + + @Override + public synchronized void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) { + distinctPayloads.add(java.util.Arrays.toString(data)); + long[] counter = wireSeqByConn.get(client); + if (counter == null) { + counter = new long[1]; + wireSeqByConn.put(client, counter); + } + try { + client.sendBinary(buildAck(counter[0]++)); + } catch (IOException ignored) { + // best-effort: connection may be racing its own close + } + } + } + + /** + * Deterministic mid-drain wire drop. The first connection that sends a + * binary frame (the drainer -- the foreground never sends in these tests) + * gets exactly one frame acked, then the server closes its socket on the + * next frame. Every later connection acks all traffic with a + * per-connection wire sequence, so the reconnected drain runs to + * completion. State is keyed per {@code ClientHandler} identity: a dead + * connection's reader can deliver late buffered frames after a newer + * connection started, and those must neither ack with a stale counter nor + * disturb the live connection (same discipline as + * BackgroundDrainerMidDrainCapabilityGapTest's GapScenarioHandler). 
+ */ + private static class DropFirstDataConnectionHandler + implements TestWebSocketServer.WebSocketServerHandler { + final java.util.Set<String> distinctPayloads = + java.util.Collections.synchronizedSet(new java.util.HashSet<>()); + private final List<TestWebSocketServer.ClientHandler> arrivalOrder = new ArrayList<>(); + private final java.util.Map<TestWebSocketServer.ClientHandler, long[]> wireSeqByConn = + new java.util.IdentityHashMap<>(); + + synchronized int dataConnections() { + return arrivalOrder.size(); + } + + @Override + public synchronized void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) { + distinctPayloads.add(java.util.Arrays.toString(data)); + long[] counter = wireSeqByConn.get(client); + if (counter == null) { + counter = new long[1]; + wireSeqByConn.put(client, counter); + arrivalOrder.add(client); + } + boolean firstConnection = arrivalOrder.get(0) == client; + long seq = counter[0]++; + try { + if (firstConnection) { + if (seq == 0) { + client.sendBinary(buildAck(seq)); + } else if (seq == 1) { + client.close(); // mid-drain wire drop + } + // seq > 1: late buffered frames from the condemned + // connection; ignore. 
+ } else { + client.sendBinary(buildAck(seq)); + } + } catch (IOException ignored) { + // best-effort: the connection died under us; the drainer + // replays on its next connection + } + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/OrphanScanIntegrationTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/OrphanScanIntegrationTest.java index 2ec7b836..eccc030b 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/OrphanScanIntegrationTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/OrphanScanIntegrationTest.java @@ -39,16 +39,17 @@ import java.nio.ByteBuffer; import java.nio.ByteOrder; import java.nio.file.Paths; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.CountDownLatch; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicLong; /** * Integration check: with {@code drain_orphans=true} the foreground sender - * sees sibling slots holding unacked data and a follow-up call to - * {@link OrphanScanner#scan} from outside the sender returns the same. - *
<p>
    - * The drainer runtime that actually empties orphan slots is a follow-up; - * this test pins down the visibility/scan piece. + * sees sibling slots holding unacked data, adopts them via the background + * drainer pool, and replays their unacked frames — after which + * {@link OrphanScanner#scan} reports no candidates, both while the adopting + * sender is still open and after it closes. */ public class OrphanScanIntegrationTest { @@ -75,7 +76,8 @@ public void testScanFindsOrphanFromPriorSenderUnderSameGroupRoot() throws Except // with sender_id=primary and drain_orphans=true. // Phase 1: ghost writes + closes; never acked. - try (TestWebSocketServer ghostServer = new TestWebSocketServer(new SilentHandler())) { + SilentHandler silent = new SilentHandler(); + try (TestWebSocketServer ghostServer = new TestWebSocketServer(silent)) { ghostServer.start(); Assert.assertTrue(ghostServer.awaitStart(5, TimeUnit.SECONDS)); @@ -85,21 +87,29 @@ public void testScanFindsOrphanFromPriorSenderUnderSameGroupRoot() throws Except try (Sender ghost = Sender.fromConfig(ghostCfg)) { ghost.table("foo").longColumn("v", 7L).atNow(); ghost.flush(); + // The frame must reach the wire before we close: on-the-wire + // implies the I/O loop read it back from the slot's .sfa, so + // the recovered slot holds publishedFsn >= 1 and the drain in + // phase 2 proves something. Without this await, + // close_flush_timeout=0 can close before the async publish + // lands and the "drain" would trivially succeed on an empty + // slot (observed as "fully drained (target=0)"). + Assert.assertTrue("ghost frame must reach the wire before close", + silent.awaitFrame(5, TimeUnit.SECONDS)); // No wait for ACK — close right away; close_flush_timeout=0 // means we don't drain. } - } catch (Exception ignored) { - // best-effort } // Independent verification: the scanner sees the ghost slot. 
ObjList seen = OrphanScanner.scan(sfDir, "primary"); Assert.assertEquals("ghost slot must be a candidate orphan", 1, seen.size()); Assert.assertEquals(sfDir + "/ghost", seen.get(0)); - // Phase 2: open the primary sender with drain_orphans=true. We - // can't directly assert the log output in this test, but the - // call must not throw, and the primary's own slot must NOT - // appear in a fresh scan (sender_id-filtered). + // Phase 2: open the primary sender with drain_orphans=true. The + // background drainer pool adopts the ghost slot, replays its + // unacked frames against the ACKing primaryServer, and the + // drained slot's .sfa files are removed when the drainer's + // engine closes fully drained. try (TestWebSocketServer primaryServer = new TestWebSocketServer(new AckHandler())) { primaryServer.start(); Assert.assertTrue(primaryServer.awaitStart(5, TimeUnit.SECONDS)); @@ -112,19 +122,28 @@ public void testScanFindsOrphanFromPriorSenderUnderSameGroupRoot() throws Except try (Sender primary = Sender.fromConfig(primaryCfg)) { primary.table("foo").longColumn("v", 8L).atNow(); primary.flush(); + // Await the drain while the primary is still open so this + // assertion exercises the drainer runtime itself and does + // not depend on close()'s bounded graceful-drain window. + long deadlineNanos = System.nanoTime() + TimeUnit.SECONDS.toNanos(10); + while (OrphanScanner.scan(sfDir, "primary").size() > 0 + && System.nanoTime() < deadlineNanos) { + Thread.sleep(10); + } + Assert.assertEquals( + "drainer should have adopted + drained the ghost slot " + + "while the primary sender is open", + 0, OrphanScanner.scan(sfDir, "primary").size()); } - // With drain_orphans=true, the background drainer pool adopts - // the ghost slot, replays its unacked frames against the now- - // ACKing primaryServer, and removes the drained slot dir. // Primary's own slot drains cleanly on close() and is filtered - // out by sender_id. Net: scanner sees neither. 
+ // out by sender_id; the drained ghost slot must not resurface + // (e.g. as a spurious .failed quarantine). Net: scanner sees + // neither. ObjList postRun = OrphanScanner.scan(sfDir, "primary"); Assert.assertEquals( "drain_orphans=true should have drained + removed the " + "ghost slot; primary's own slot is sender_id-filtered", 0, postRun.size()); - } catch (Exception ignored) { - // best-effort } }); } @@ -154,20 +173,38 @@ private static void touchFile(String path) { /** Receives binary frames but never acks. Causes the sender to * leave unacked data on disk on close. */ private static class SilentHandler implements TestWebSocketServer.WebSocketServerHandler { + private final CountDownLatch frameReceived = new CountDownLatch(1); + + boolean awaitFrame(long timeout, TimeUnit unit) throws InterruptedException { + return frameReceived.await(timeout, unit); + } + @Override public void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) { - // Drop on the floor — no ACK. + // Drop on the floor — no ACK. Record receipt so the test can + // prove the frame reached the wire (hence the slot's .sfa) + // before the ghost sender closes. + frameReceived.countDown(); } } - /** Acks every binary frame. */ + /** + * Acks every binary frame. Sequence numbers are per-connection: the + * primary sender and the orphan drainer each open their own WebSocket, + * and each connection numbers its frames from 0. A single shared + * counter would hand the second connection an ack seq it never sent + * ("ACK wire seq N exceeds highest sent 0"), making the drain succeed + * only via the client's clamping fallback. 
+ */ private static class AckHandler implements TestWebSocketServer.WebSocketServerHandler { - private final AtomicLong nextSeq = new AtomicLong(0); + private final ConcurrentHashMap<TestWebSocketServer.ClientHandler, AtomicLong> seqByClient = + new ConcurrentHashMap<>(); @Override public void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) { + long seq = seqByClient.computeIfAbsent(client, k -> new AtomicLong()).getAndIncrement(); try { - client.sendBinary(buildAck(nextSeq.getAndIncrement())); + client.sendBinary(buildAck(seq)); } catch (IOException e) { throw new RuntimeException(e); } diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerDurableAckRetryTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerDurableAckRetryTest.java index 5be1b75a..0a6b69f8 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerDurableAckRetryTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerDurableAckRetryTest.java @@ -26,21 +26,26 @@ import io.questdb.client.DefaultHttpClientConfiguration; import io.questdb.client.cutlass.http.client.WebSocketClient; +import io.questdb.client.cutlass.http.client.WebSocketUpgradeException; import io.questdb.client.network.PlainSocketFactory; +import io.questdb.client.cutlass.line.LineSenderException; import io.questdb.client.cutlass.qwp.client.QwpDurableAckMismatchException; +import io.questdb.client.cutlass.qwp.client.QwpIngressRoleRejectedException; import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer; import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener; import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop; import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner; import io.questdb.client.std.Files; +import io.questdb.client.test.tools.TestUtils; import org.junit.After; import org.junit.Assert; 
import org.junit.Before; import org.junit.Test; -import java.io.IOException; import java.nio.file.Paths; import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; import java.util.List; import java.util.concurrent.CountDownLatch; import java.util.concurrent.TimeUnit; @@ -68,6 +73,14 @@ */ public class BackgroundDrainerDurableAckRetryTest { + /** + * Every {@link StubWebSocketClient} allocated by a test, so its ~128 KB + * of eagerly-malloced native buffers (recv + send + control) can be + * released before the leak check fires (M7). + */ + private static final List<StubWebSocketClient> LIVE_STUBS = + Collections.synchronizedList(new ArrayList<>()); + private static final long FAST_BACKOFF_MAX_MILLIS = 4L; private static final long FAST_BACKOFF_MILLIS = 1L; private static final long FAST_RECONNECT_MAX_DURATION_MILLIS = 60_000L; @@ -85,6 +98,10 @@ public void setUp() { @After public void tearDown() { + // Safety net for exits that bypass the assertMemoryLeak wrapper; + // normally a no-op because the wrapper's finally already closed + // and cleared the stubs (close() is idempotent). 
+ closeAllStubs(); if (slotPath == null) return; long find = Files.findFirst(slotPath); if (find > 0) { @@ -105,226 +122,787 @@ public void tearDown() { } @Test - public void testCallbackArgumentsCarrySlotPathAndAttemptNumber() { - CountingListener listener = new CountingListener(); - ScriptedFactory factory = ScriptedFactory.failingTimes(3, - () -> new QwpDurableAckMismatchException("h", 1234, "primary")); - BackgroundDrainer drainer = newDrainer(factory); - drainer.setListener(listener); - WebSocketClient out = drainer.connectWithDurableAckRetry(); - assertSame(factory.successSentinel(), out); - assertEquals(3, listener.unavailableSlotPaths.size()); - for (int i = 0; i < 3; i++) { - assertEquals(slotPath, listener.unavailableSlotPaths.get(i)); - assertEquals(Integer.valueOf(i + 1), listener.unavailableAttempts.get(i)); - } - assertEquals(0, listener.persistentFailures.get()); + public void testCallbackArgumentsCarrySlotPathAndAttemptNumber() throws Exception { + assertMemoryLeak(() -> { + CountingListener listener = new CountingListener(); + ScriptedFactory factory = ScriptedFactory.failingTimes(3, + () -> new QwpDurableAckMismatchException("h", 1234, "primary")); + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertSame(factory.successSentinel(), out); + assertEquals(3, listener.unavailableSlotPaths.size()); + for (int i = 0; i < 3; i++) { + assertEquals(slotPath, listener.unavailableSlotPaths.get(i)); + assertEquals(Integer.valueOf(i + 1), listener.unavailableAttempts.get(i)); + } + assertEquals(0, listener.persistentFailures.get()); + }); } @Test - public void testEscalatesAfterMaxAttemptsAndDropsSentinel() { - CountingListener listener = new CountingListener(); - ScriptedFactory factory = ScriptedFactory.alwaysFailing( - () -> new QwpDurableAckMismatchException("h", 1234, "primary")); - BackgroundDrainer drainer = newDrainer(factory); - 
drainer.setListener(listener); - WebSocketClient out = drainer.connectWithDurableAckRetry(); - assertNull("escalation must signal failure to caller", out); - assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); - // The escalation attempt itself must not also fire onDurableAckUnavailable. - // Threshold attempts trigger one persistent-failure callback and - // (threshold - 1) unavailable callbacks. - int threshold = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS; - assertEquals(threshold - 1, listener.unavailableAttempts.size()); - assertEquals(1, listener.persistentFailures.get()); - assertEquals(threshold, listener.lastPersistentTotalAttempts.get()); - assertTrue("elapsed >= 0", listener.lastPersistentElapsedMs.get() >= 0); - // Sentinel dropped with the right reason prefix. - String sentinel = slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME; - assertTrue("expected .failed sentinel at " + sentinel, Files.exists(sentinel)); - assertNotNull("lastErrorMessage populated", drainer.getLastErrorMessage()); + public void testEscalatesAfterMaxAttemptsAndDropsSentinel() throws Exception { + assertMemoryLeak(() -> { + CountingListener listener = new CountingListener(); + ScriptedFactory factory = ScriptedFactory.alwaysFailing( + () -> new QwpDurableAckMismatchException("h", 1234, "primary")); + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertNull("escalation must signal failure to caller", out); + assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); + // The escalation attempt itself must not also fire onDurableAckUnavailable. + // Threshold attempts trigger one persistent-failure callback and + // (threshold - 1) unavailable callbacks. 
+ int threshold = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS; + assertEquals(threshold - 1, listener.unavailableAttempts.size()); + assertEquals(1, listener.persistentFailures.get()); + assertEquals(threshold, listener.lastPersistentTotalAttempts.get()); + assertTrue("elapsed >= 0", listener.lastPersistentElapsedMs.get() >= 0); + // Sentinel dropped with the right reason prefix. + String sentinel = slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME; + assertTrue("expected .failed sentinel at " + sentinel, Files.exists(sentinel)); + assertNotNull("lastErrorMessage populated", drainer.getLastErrorMessage()); + }); + } + + @Test + public void testListenerThrowingOnPersistentFailureStillMarksFailed() throws Exception { + assertMemoryLeak(() -> { + BackgroundDrainerListener throwing = new BackgroundDrainerListener() { + @Override + public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) { + throw new RuntimeException("listener boom (persistent)"); + } + + @Override + public void onDurableAckUnavailable(String slotPath, int attemptNumber) { + // no-op + } + }; + ScriptedFactory factory = ScriptedFactory.alwaysFailing( + () -> new QwpDurableAckMismatchException("h", 1234, "primary")); + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(throwing); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertNull(out); + assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); + // Sentinel must be dropped even though the listener threw. 
+ assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + @Test + public void testListenerThrowingOnUnavailableContinuesRetrying() throws Exception { + assertMemoryLeak(() -> { + AtomicInteger unavailableCalls = new AtomicInteger(); + BackgroundDrainerListener throwing = new BackgroundDrainerListener() { + @Override + public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) { + Assert.fail("must not escalate"); + } + + @Override + public void onDurableAckUnavailable(String slotPath, int attemptNumber) { + unavailableCalls.incrementAndGet(); + throw new RuntimeException("listener boom (transient)"); + } + }; + ScriptedFactory factory = ScriptedFactory.failingTimes(3, + () -> new QwpDurableAckMismatchException("h", 1234, "primary")); + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(throwing); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertSame(factory.successSentinel(), out); + assertEquals(3, unavailableCalls.get()); + assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome()); + // No sentinel dropped on success. + assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + @Test + public void testNoListenerNoNullPointerOnEscalation() throws Exception { + assertMemoryLeak(() -> { + ScriptedFactory factory = ScriptedFactory.alwaysFailing( + () -> new QwpDurableAckMismatchException("h", 1234, "primary")); + BackgroundDrainer drainer = newDrainer(factory); + // Intentionally leave listener null. 
+ WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertNull(out); + assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); + assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + @Test + public void testTerminalUpgradeMarksFailedImmediately() throws Exception { + assertMemoryLeak(() -> { + CountingListener listener = new CountingListener(); + // A genuinely non-retriable upgrade error (non-421 5xx upgrade reject) is + // terminal -- waiting will not fix it -- so the drainer quarantines on the + // first attempt, exactly like the live sender's background loop halts on + // auth/upgrade. A TRANSPORT error, by contrast, is transient and is + // retried (see testTransportErrorNeverQuarantinesInvariantB). + ScriptedFactory factory = ScriptedFactory.alwaysFailing( + () -> new WebSocketUpgradeException(500, null, "server error during upgrade")); + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertNull(out); + assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); + // Listener must not have been touched — this path doesn't fire either callback. + assertEquals(0, listener.unavailableAttempts.size()); + assertEquals(0, listener.persistentFailures.get()); + // Sentinel dropped for a genuine terminal. + String sentinel = slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME; + assertTrue(Files.exists(sentinel)); + // The factory must have been invoked exactly once — no retry on a terminal. 
+ assertEquals(1, factory.attempts()); + }); } @Test - public void testListenerThrowingOnPersistentFailureStillMarksFailed() { - BackgroundDrainerListener throwing = new BackgroundDrainerListener() { - @Override - public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) { - throw new RuntimeException("listener boom (persistent)"); + public void testReturnsClientOnSuccessFirstAttempt() throws Exception { + assertMemoryLeak(() -> { + CountingListener listener = new CountingListener(); + ScriptedFactory factory = ScriptedFactory.alwaysSucceeding(); + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertSame(factory.successSentinel(), out); + assertEquals(1, factory.attempts()); + assertEquals(0, listener.unavailableAttempts.size()); + assertEquals(0, listener.persistentFailures.get()); + assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome()); + assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + @Test + public void testRetriesOnDurableAckMismatchThenSucceeds() throws Exception { + assertMemoryLeak(() -> { + CountingListener listener = new CountingListener(); + int failTimes = 5; + ScriptedFactory factory = ScriptedFactory.failingTimes(failTimes, + () -> new QwpDurableAckMismatchException("h", 1234, "primary")); + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertSame(factory.successSentinel(), out); + assertEquals(failTimes + 1, factory.attempts()); + assertEquals(failTimes, listener.unavailableAttempts.size()); + for (int i = 0; i < failTimes; i++) { + assertEquals(Integer.valueOf(i + 1), listener.unavailableAttempts.get(i)); + } + assertEquals(0, listener.persistentFailures.get()); + assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome()); + 
assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + @Test + public void testStopRequestedDuringRetryAbortsWithStoppedOutcome() throws Exception { + assertMemoryLeak(() -> { + CountingListener listener = new CountingListener(); + // Slow factory: each attempt blocks for ~30ms throwing DA mismatch. + // Combined with a 50ms reconnectMaxDuration we'd hit budget too, + // so set a long budget and rely on requestStop() to break the loop. + CountDownLatch firstFailureSeen = new CountDownLatch(1); + ScriptedFactory factory = new ScriptedFactory( + /* successSentinel */ stubClient(), + /* throwingTimes */ Integer.MAX_VALUE, + /* throwSupplier */ () -> { + firstFailureSeen.countDown(); + return new QwpDurableAckMismatchException("h", 1234, "primary"); + }); + BackgroundDrainer drainer = newDrainerWithBudgets( + factory, /*reconnectMaxDurationMillis*/ 60_000L, + /*backoffInit*/ 5L, /*backoffMax*/ 10L); + drainer.setListener(listener); + Thread t = new Thread(drainer::connectWithDurableAckRetry, "test-helper"); + t.setDaemon(true); + t.start(); + // Wait until at least one attempt has fired, then signal stop. + Assert.assertTrue("first failure must occur promptly", + firstFailureSeen.await(2, TimeUnit.SECONDS)); + drainer.requestStop(); + t.join(5_000); + Assert.assertFalse("helper must exit after stop", t.isAlive()); + assertEquals(BackgroundDrainer.DrainOutcome.STOPPED, drainer.outcome()); + // No persistent-failure callback on stop; no sentinel dropped. + assertEquals(0, listener.persistentFailures.get()); + assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + @Test + public void testWallTimeBudgetEscalatesBeforeAttemptCap() throws Exception { + assertMemoryLeak(() -> { + CountingListener listener = new CountingListener(); + // Each failure sleeps 12ms; budget is 25ms — second iteration must + // observe deadline crossed without reaching the 16-attempt cap. 
+ ScriptedFactory factory = new ScriptedFactory( + /* successSentinel */ stubClient(), + /* throwingTimes */ Integer.MAX_VALUE, + /* throwSupplier */ () -> { + try { + Thread.sleep(12); + } catch (InterruptedException ignored) { + Thread.currentThread().interrupt(); + } + return new QwpDurableAckMismatchException("h", 1234, "primary"); + }); + BackgroundDrainer drainer = newDrainerWithBudgets( + factory, /*reconnectMaxDurationMillis*/ 25L, + /*backoffInit*/ 1L, /*backoffMax*/ 1L); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertNull(out); + assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); + assertEquals("must escalate by wall time, not attempts", 1, listener.persistentFailures.get()); + int total = listener.lastPersistentTotalAttempts.get(); + assertTrue("escalated before reaching attempt cap (got " + total + ")", + total < BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS); + assertTrue(total >= 1); + assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + @Test + public void testAllReplicaWindowNeverEscalatesInvariantB() throws Exception { + assertMemoryLeak(() -> { + // INVARIANT B (orphan drainer): a store-and-forward drainer must NEVER + // quarantine a slot just because every reachable endpoint is a REPLICA. + // A replica is promotable and a primary will reappear, so an all-replica + // window is a TRANSIENT failover state -- the drainer must keep retrying + // (capped backoff) until a primary is reachable, stopRequested, or SF + // exhaustion. NEITHER the 16-attempt cap NOR the wall-clock reconnect + // budget may escalate it to a .failed sentinel. 
+ // + // Distinct from testEscalatesAfterMaxAttemptsAndDropsSentinel / + // testWallTimeBudgetEscalatesBeforeAttemptCap, which use a genuine + // durable-ack CAPABILITY gap (QwpDurableAckMismatchException -- a server + // upgrades but does not advertise durable ack): that is a real config + // problem and stays terminal. This test uses a role reject (every + // endpoint is a replica right now), which must NOT be terminal. + // + // Red-first: connectWithDurableAckRetry() currently lumps role rejects in + // with the durable-ack-mismatch give-up, so after the 16-attempt cap / + // the budget it markFailed()s and returns -> the helper thread dies. Goes + // green once the drainer treats an all-replica window as retry-forever + // (split the catch: role reject -> retry; capability gap -> quarantine). + CountingListener listener = new CountingListener(); + AtomicInteger attempts = new AtomicInteger(); + ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> { + attempts.incrementAndGet(); + return new QwpIngressRoleRejectedException( + QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000); + }); + // SHORT budget + tiny backoff so BOTH give-up triggers (the 16-attempt + // cap and the 200ms wall clock) would fire promptly under the bug. + BackgroundDrainer drainer = newDrainerWithBudgets( + factory, /*reconnectMaxDurationMillis*/ 200L, /*backoffInit*/ 1L, /*backoffMax*/ 2L); + drainer.setListener(listener); + Thread t = new Thread(drainer::connectWithDurableAckRetry, "invariant-b-orphan-drainer"); + t.setDaemon(true); + t.start(); + + // Observe well past BOTH the 200ms budget and the 16-attempt cap. Under + // the bug the drainer escalates (within the cap time) and the helper + // thread dies; a contract-honoring drainer is still retrying here. 
+ long observeUntilNanos = System.nanoTime() + 600_000_000L; // 600ms >> 200ms budget + while (System.nanoTime() < observeUntilNanos && t.isAlive()) { + Thread.sleep(10); } - @Override - public void onDurableAckUnavailable(String slotPath, int attemptNumber) { - // no-op + try { + assertTrue("orphan drainer gave up on a transient all-replica window (attempts=" + + attempts.get() + ", outcome=" + drainer.outcome() + "): Invariant B " + + "forbids quarantining a slot on the 16-attempt cap or the wall-clock " + + "reconnect budget -- a replica is promotable, so the drainer must keep " + + "retrying until a primary reappears or SF is exhausted", + t.isAlive()); + assertEquals("must not escalate a transient all-replica window to FAILED", + BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome()); + assertEquals("must not fire persistent-failure on an all-replica window", + 0, listener.persistentFailures.get()); + assertFalse("must not quarantine (.failed sentinel) an all-replica window", + Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + assertTrue("must have retried past the 16-attempt cap (got " + attempts.get() + ")", + attempts.get() > BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS); + } finally { + drainer.requestStop(); + t.join(5_000); } - }; - ScriptedFactory factory = ScriptedFactory.alwaysFailing( - () -> new QwpDurableAckMismatchException("h", 1234, "primary")); - BackgroundDrainer drainer = newDrainer(factory); - drainer.setListener(throwing); - WebSocketClient out = drainer.connectWithDurableAckRetry(); - assertNull(out); - assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); - // Sentinel must be dropped even though the listener threw. 
- assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + assertFalse("helper must exit after stop", t.isAlive()); + assertEquals(BackgroundDrainer.DrainOutcome.STOPPED, drainer.outcome()); + }); } @Test - public void testListenerThrowingOnUnavailableContinuesRetrying() { - AtomicInteger unavailableCalls = new AtomicInteger(); - BackgroundDrainerListener throwing = new BackgroundDrainerListener() { - @Override - public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) { - Assert.fail("must not escalate"); + public void testTransportErrorNeverQuarantinesInvariantB() throws Exception { + assertMemoryLeak(() -> { + // INVARIANT B (orphan drainer): a fully-unreachable cluster (server down, + // network partition -- every endpoint refuses / times out) is TRANSIENT, + // not terminal. The server will come back; the drainer must keep retrying + // (capped backoff) until it does, stopRequested, or SF exhaustion -- it + // must NEVER quarantine the slot on the first failed sweep. This is the + // exact behaviour of the live sender's background loop + // (CursorWebSocketSendLoop.connectLoop: a transport error backs off and + // retries), which the orphan drainer must match. + // + // Red-first: connectWithDurableAckRetry() currently routes any non-role, + // non-durable-ack Throwable (including "all endpoints unreachable") to an + // IMMEDIATE markFailed / .failed sentinel on the first attempt. Green once + // transport errors are retried indefinitely like connectLoop. (Genuine + // terminals -- auth / non-421 upgrade -- must still fail fast.) 
+ CountingListener listener = new CountingListener(); + AtomicInteger attempts = new AtomicInteger(); + ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> { + attempts.incrementAndGet(); + return new LineSenderException( + "Failed to connect: all 2 endpoint(s) unreachable; last=127.0.0.1:9000"); + }); + BackgroundDrainer drainer = newDrainerWithBudgets( + factory, /*reconnectMaxDurationMillis*/ 200L, /*backoffInit*/ 1L, /*backoffMax*/ 2L); + drainer.setListener(listener); + Thread t = new Thread(drainer::connectWithDurableAckRetry, "invariant-b-transport-drainer"); + t.setDaemon(true); + t.start(); + + // Observe well past the 200ms budget: the drainer must still be retrying. + long observeUntilNanos = System.nanoTime() + 600_000_000L; // 600ms >> 200ms budget + while (System.nanoTime() < observeUntilNanos && t.isAlive()) { + Thread.sleep(10); } - @Override - public void onDurableAckUnavailable(String slotPath, int attemptNumber) { - unavailableCalls.incrementAndGet(); - throw new RuntimeException("listener boom (transient)"); + try { + assertTrue("orphan drainer quarantined a fully-unreachable (server-down) cluster " + + "(attempts=" + attempts.get() + ", outcome=" + drainer.outcome() + + "): Invariant B says a down server is transient -- the drainer must " + + "retry indefinitely (exactly like the live background loop), never " + + "quarantine on a transport error", + t.isAlive()); + assertEquals("must not escalate a transient transport error to FAILED", + BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome()); + assertEquals("transport retry must not fire a persistent-failure escalation", + 0, listener.persistentFailures.get()); + assertFalse("must not quarantine (.failed sentinel) a down server", + Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + assertTrue("must have retried the down server well past the first sweep (got " + + attempts.get() + ")", + attempts.get() > 
BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS); + } finally { + drainer.requestStop(); + t.join(5_000); } - }; - ScriptedFactory factory = ScriptedFactory.failingTimes(3, - () -> new QwpDurableAckMismatchException("h", 1234, "primary")); - BackgroundDrainer drainer = newDrainer(factory); - drainer.setListener(throwing); - WebSocketClient out = drainer.connectWithDurableAckRetry(); - assertSame(factory.successSentinel(), out); - assertEquals(3, unavailableCalls.get()); - assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome()); - // No sentinel dropped on success. - assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + assertFalse("helper must exit after stop", t.isAlive()); + assertEquals(BackgroundDrainer.DrainOutcome.STOPPED, drainer.outcome()); + }); } @Test - public void testNoListenerNoNullPointerOnEscalation() { - ScriptedFactory factory = ScriptedFactory.alwaysFailing( - () -> new QwpDurableAckMismatchException("h", 1234, "primary")); - BackgroundDrainer drainer = newDrainer(factory); - // Intentionally leave listener null. - WebSocketClient out = drainer.connectWithDurableAckRetry(); - assertNull(out); - assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); - assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + public void testJvmErrorEscapesConnectRetryLoop() throws Exception { + assertMemoryLeak(() -> { + // Regression (M3): catch (Throwable) in connectWithDurableAckRetry used + // to swallow java.lang.Error (OOM, LinkageError, StackOverflowError) + // into the indefinite "cluster unreachable" retry -- pinning the slot + // .lock forever with no .failed sentinel and only a throttled WARN as + // a trace. A JVM/programming failure is not a transport outage: + // retrying cannot clear it, so it must escape the loop on the FIRST + // sweep. 
run()'s outer catch then quarantines the slot (markFailed + + // FAILED) and its finally releases the lock -- quarantine-and-exit. + CountingListener listener = new CountingListener(); + ScriptedFactory factory = ScriptedFactory.alwaysFailing( + () -> new LinkageError("simulated JVM failure")); + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(listener); + try { + drainer.connectWithDurableAckRetry(); + Assert.fail("a JVM Error must escape the retry loop, " + + "not spin as a transport outage"); + } catch (LinkageError expected) { + assertEquals("simulated JVM failure", expected.getMessage()); + } + // No retry: the Error propagated on the very first attempt. + assertEquals(1, factory.attempts()); + // Neither observability callback fires -- this is not a durable-ack + // episode, and no escalation decision was made inside the loop. + assertEquals(0, listener.unavailableAttempts.size()); + assertEquals(0, listener.persistentFailures.get()); + }); } @Test - public void testNonDurableAckExceptionMarksFailedImmediately() { - CountingListener listener = new CountingListener(); - ScriptedFactory factory = ScriptedFactory.alwaysFailing( - () -> new IOException("transport down")); - BackgroundDrainer drainer = newDrainer(factory); - drainer.setListener(listener); - WebSocketClient out = drainer.connectWithDurableAckRetry(); - assertNull(out); - assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); - // Listener must not have been touched — this path doesn't fire either callback. - assertEquals(0, listener.unavailableAttempts.size()); - assertEquals(0, listener.persistentFailures.get()); - // Sentinel reason should reflect the non-DA path (initial connect: ...). - String sentinel = slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME; - assertTrue(Files.exists(sentinel)); - // The factory must have been invoked exactly once — no retry on this path. 
- assertEquals(1, factory.attempts()); + public void testRoleRejectChurnDoesNotConsumeCapabilityGapBudgetInvariantB() throws Exception { + assertMemoryLeak(() -> { + // Rolling-upgrade interleave: a long all-replica window (role rejects), + // then an old-build node is promoted and upgrades WITHOUT durable ack + // (genuine capability gap). The transient window must not consume the + // 16-attempt settle budget -- the gap phase gets the full budget. + int roleRejects = 20; // > the attempt cap: under the bug the first gap attempt escalates + int cap = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS; + CountingListener listener = new CountingListener(); + AtomicInteger sweeps = new AtomicInteger(); + ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> { + if (sweeps.incrementAndGet() <= roleRejects) { + return new QwpIngressRoleRejectedException( + QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000); + } + return new QwpDurableAckMismatchException("h", 1234, "primary"); + }); + // 60s wall budget: only the attempt cap can fire in this test. + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertNull(out); + assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); + assertEquals(1, listener.persistentFailures.get()); + assertEquals("escalation must count capability-gap attempts only", + cap, listener.lastPersistentTotalAttempts.get()); + assertEquals("full settle budget must be granted after the transient window", + roleRejects + cap, factory.attempts()); + // M10 split: the transient all-replica window lands on the + // onPrimaryUnavailable stream (1..20), the capability-gap episode + // on onDurableAckUnavailable (1..15; the 16th fires + // persistent-failure instead). 
Neither stream sees the other's + // counter, so a listener alerting on "attemptNumber approaching + // the cap" no longer false-positives on role-reject churn. + assertEquals(roleRejects, listener.primaryUnavailableAttempts.size()); + for (int i = 0; i < roleRejects; i++) { + assertEquals(Integer.valueOf(i + 1), listener.primaryUnavailableAttempts.get(i)); + } + assertEquals(cap - 1, listener.unavailableAttempts.size()); + for (int i = 0; i < cap - 1; i++) { + assertEquals(Integer.valueOf(i + 1), listener.unavailableAttempts.get(i)); + } + assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); } @Test - public void testReturnsClientOnSuccessFirstAttempt() { - CountingListener listener = new CountingListener(); - ScriptedFactory factory = ScriptedFactory.alwaysSucceeding(); - BackgroundDrainer drainer = newDrainer(factory); - drainer.setListener(listener); - WebSocketClient out = drainer.connectWithDurableAckRetry(); - assertSame(factory.successSentinel(), out); - assertEquals(1, factory.attempts()); - assertEquals(0, listener.unavailableAttempts.size()); - assertEquals(0, listener.persistentFailures.get()); - assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome()); - assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + public void testFailoverWindowDoesNotBurnCapabilityGapWallClockInvariantB() throws Exception { + assertMemoryLeak(() -> { + // The wall-clock half of the settle budget must be anchored at the + // FIRST capability-gap error, not at connect entry: an all-replica + // window that outlives reconnectMaxDurationMillis must not cause the + // first genuine capability-gap attempt to escalate on an already- + // expired deadline. Catches the partial fix (separate counter but + // entry-anchored deadline) that the attempt-cap test cannot see. 
+ int roleRejects = 20; + long budgetMillis = 1_000L; + int cap = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS; + CountingListener listener = new CountingListener(); + AtomicInteger sweeps = new AtomicInteger(); + ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> { + if (sweeps.incrementAndGet() <= roleRejects) { + // Burn well past the wall-clock budget inside the transient + // window: 20 * 60ms = 1200ms of sleep alone >> 1000ms budget. + try { + Thread.sleep(60); + } catch (InterruptedException ignored) { + Thread.currentThread().interrupt(); + } + return new QwpIngressRoleRejectedException( + QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000); + } + return new QwpDurableAckMismatchException("h", 1234, "primary"); + }); + BackgroundDrainer drainer = newDrainerWithBudgets( + factory, budgetMillis, /*backoffInit*/ 1L, /*backoffMax*/ 2L); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertNull(out); + assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); + assertEquals(1, listener.persistentFailures.get()); + assertEquals("first gap attempt must not observe a deadline burned by the " + + "transient window -- full attempt budget expected", + cap, listener.lastPersistentTotalAttempts.get()); + assertEquals(roleRejects + cap, factory.attempts()); + assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + @Test + public void testRoleRejectResetsCapabilityGapEpisode() throws Exception { + assertMemoryLeak(() -> { + // An intervening role reject proves the topology changed (the node + // that produced earlier gap errors is gone), so the settle budget + // restarts: 15 gap errors, one role reject, then gaps again -- the + // second episode gets the full 16 attempts, it does not inherit the + // first episode's 15. 
+ int cap = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS; + CountingListener listener = new CountingListener(); + AtomicInteger sweeps = new AtomicInteger(); + ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> { + if (sweeps.incrementAndGet() == cap) { // 16th sweep: role reject between the gap runs + return new QwpIngressRoleRejectedException( + QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000); + } + return new QwpDurableAckMismatchException("h", 1234, "primary"); + }); + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertNull(out); + assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); + assertEquals(1, listener.persistentFailures.get()); + assertEquals("second episode must get the full budget after the reset", + cap, listener.lastPersistentTotalAttempts.get()); + // 15 gap + 1 role reject + 16 gap = 32 sweeps total. + assertEquals(2 * cap, factory.attempts()); + // M10 split, per-stream: the DA stream carries both episodes' + // per-episode numbering (1..15, then 1..15 again -- the second + // episode's 16th attempt fires persistent-failure instead), and + // the reset between them is attributable: exactly one role reject + // on the primary stream. Before the split the reset was an + // ambiguous non-monotonic drop in a single stream. 
+ List<Integer> expectedDaStream = new ArrayList<>(); + for (int episode = 0; episode < 2; episode++) { + for (int i = 1; i <= cap - 1; i++) { + expectedDaStream.add(i); + } + } + assertEquals(expectedDaStream, listener.unavailableAttempts); + assertEquals(Collections.singletonList(1), listener.primaryUnavailableAttempts); + assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + @Test + public void testRoleRejectAndCapabilityGapLandOnSeparateStreams() throws Exception { + assertMemoryLeak(() -> { + // M10 discriminator: gap -> role reject -> gap -> success. The + // released 1.3.4 contract fed BOTH conditions to + // onDurableAckUnavailable, so this script produced the ambiguous + // stream [1, 1, 1] -- a listener could not tell a budget-bound + // capability-gap episode from a never-escalating role-reject + // window, and could not see WHY the episode counter reset. With + // the split, the DA stream carries only the two one-attempt gap + // episodes ([1, 1] -- the reset stays visible) and the role + // reject that caused the reset lands on the primary stream ([1]).
+ CountingListener listener = new CountingListener(); + AtomicInteger sweeps = new AtomicInteger(); + ScriptedFactory factory = ScriptedFactory.failingTimes(3, () -> { + if (sweeps.incrementAndGet() == 2) { + return new QwpIngressRoleRejectedException( + QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000); + } + return new QwpDurableAckMismatchException("h", 1234, "primary"); + }); + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertSame(factory.successSentinel(), out); + assertEquals(4, factory.attempts()); + assertEquals("DA stream must carry only the gap episodes, each" + + " restarting at 1 after the role-reject reset", + Arrays.asList(1, 1), listener.unavailableAttempts); + assertEquals("role reject must land on the primary stream", + Collections.singletonList(1), listener.primaryUnavailableAttempts); + assertEquals(Collections.singletonList(slotPath), listener.primaryUnavailableSlotPaths); + assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome()); + assertEquals(0, listener.persistentFailures.get()); + assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + @Test + public void testTransportErrorDoesNotResetCapabilityGapEpisode() throws Exception { + assertMemoryLeak(() -> { + // A transport blip between gap attempts does not prove promotion + // churn: it must neither consume the budget (no increment) nor + // restart it (no reset) -- otherwise a flaky-but-misconfigured + // cluster would evade the cap forever. 15 gaps, one transport error, + // one gap: escalates on that 16th gap attempt. 
+ int cap = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS; + CountingListener listener = new CountingListener(); + AtomicInteger sweeps = new AtomicInteger(); + ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> { + if (sweeps.incrementAndGet() == cap) { // 16th sweep: transport error between the gap runs + return new LineSenderException("Failed to connect: all 2 endpoint(s) " + + "unreachable; last=127.0.0.1:9000"); + } + return new QwpDurableAckMismatchException("h", 1234, "primary"); + }); + BackgroundDrainer drainer = newDrainer(factory); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertNull(out); + assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); + assertEquals(1, listener.persistentFailures.get()); + assertEquals("transport blip must not restart the episode", + cap, listener.lastPersistentTotalAttempts.get()); + // 15 gap + 1 transport + 1 gap = 17 sweeps total. + assertEquals(cap + 1, factory.attempts()); + assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + @Test + public void testTransportWindowDoesNotBurnCapabilityGapWallClock() throws Exception { + assertMemoryLeak(() -> { + // Red-first: the wall-clock half of the settle budget is anchored at + // gap #1, and a transport window BETWEEN gap sweeps must PAUSE it -- + // only gap-to-gap time is the cluster "failing to settle". Under the + // bug the deadline keeps ticking while the cluster is unreachable: + // gap #1 anchors the deadline, the cluster then drops off the network + // for longer than the entire budget (transport errors are retried + // "forever" and charge nothing else), and when it comes back still + // briefly gapped, gap #2 observes an expired deadline and quarantines + // the slot after just 2 gap sweeps -- contradicting both the + // 16-attempt settle intent and Invariant B's "transients never + // consume the budget". 
Evasion is not a concern: the attempt counter + // survives the window untouched, which + // testTransportErrorDoesNotResetCapabilityGapEpisode pins. + // Here the cluster actually settles after the outage (two more gap + // sweeps, then durable-ack-capable), so the drain must proceed -- + // no escalation, no sentinel. + long budgetMillis = 250L; + CountingListener listener = new CountingListener(); + AtomicInteger sweeps = new AtomicInteger(); + ScriptedFactory factory = ScriptedFactory.failingTimes(4, () -> { + if (sweeps.incrementAndGet() == 2) { + // Cluster fully unreachable for ~2.5x the wall-clock budget. + // A real outage is time spent inside reconnect() walking + // unreachable endpoints, so model it inside the factory. + try { + Thread.sleep(budgetMillis * 2 + 100); + } catch (InterruptedException ignored) { + Thread.currentThread().interrupt(); + } + return new LineSenderException("Failed to connect: all 2 endpoint(s) " + + "unreachable; last=127.0.0.1:9000"); + } + return new QwpDurableAckMismatchException("h", 1234, "primary"); + }); + BackgroundDrainer drainer = newDrainerWithBudgets( + factory, budgetMillis, FAST_BACKOFF_MILLIS, FAST_BACKOFF_MAX_MILLIS); + drainer.setListener(listener); + WebSocketClient out = drainer.connectWithDurableAckRetry(); + assertSame("cluster recovered after the outage -- the drain must proceed, not " + + "quarantine on a wall clock burned by the transport window", + factory.successSentinel(), out); + // gap #1 + outage + gap #2 + gap #3 + success = 5 sweeps. 
+ assertEquals(5, factory.attempts()); + assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome()); + assertEquals("transport window must not trigger persistent-failure escalation", + 0, listener.persistentFailures.get()); + assertFalse("no .failed sentinel: the slot was never in a terminal state", + Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); } @Test - public void testRetriesOnDurableAckMismatchThenSucceeds() { + public void testRoleRejectGrantsFreshWallClockToNextGapEpisode() { + // Companion to testRoleRejectResetsCapabilityGapEpisode, which pins the + // ATTEMPT-counter half of the episode reset but runs under a 60s budget + // where the wall-clock half is unobservable: a mutant that resets only + // capabilityGapAttempts (leaving capabilityGapElapsedNanos / + // lastCapabilityGapNanos ticking) passes it. This test pins the + // WALL-CLOCK half: gap sweeps burn most of the budget, a role reject + // proves the topology churned, and the next gap episode must start + // from a zero wall clock -- under the counter-only mutant the stale + // elapsed (plus the still-anchored lastCapabilityGapNanos charging + // straight across the role-reject window) exhausts the budget and + // quarantines a cluster that was about to settle. + long budgetMillis = 800L; CountingListener listener = new CountingListener(); - int failTimes = 5; - ScriptedFactory factory = ScriptedFactory.failingTimes(failTimes, - () -> new QwpDurableAckMismatchException("h", 1234, "primary")); - BackgroundDrainer drainer = newDrainer(factory); + AtomicInteger sweeps = new AtomicInteger(); + ScriptedFactory factory = ScriptedFactory.failingTimes(5, () -> { + switch (sweeps.incrementAndGet()) { + case 2: + // Burn ~600ms of the 800ms budget inside the first gap + // episode (charged by this sweep's gap-to-gap interval). 
+ sleepQuietly(600); + return new QwpDurableAckMismatchException("h", 1234, "primary"); + case 3: + // Topology churn: the settle budget must restart in full. + return new QwpIngressRoleRejectedException( + QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000); + case 5: + // Second episode burns ~350ms -- well inside a fresh 800ms + // budget, but 600 + 350 > 800 under the mutant's carried-over + // wall clock. + sleepQuietly(350); + return new QwpDurableAckMismatchException("h", 1234, "primary"); + default: + return new QwpDurableAckMismatchException("h", 1234, "primary"); + } + }); + BackgroundDrainer drainer = newDrainerWithBudgets( + factory, budgetMillis, FAST_BACKOFF_MILLIS, FAST_BACKOFF_MAX_MILLIS); drainer.setListener(listener); WebSocketClient out = drainer.connectWithDurableAckRetry(); - assertSame(factory.successSentinel(), out); - assertEquals(failTimes + 1, factory.attempts()); - assertEquals(failTimes, listener.unavailableAttempts.size()); - for (int i = 0; i < failTimes; i++) { - assertEquals(Integer.valueOf(i + 1), listener.unavailableAttempts.get(i)); - } - assertEquals(0, listener.persistentFailures.get()); + assertSame("role reject restarts the episode wall clock -- the second gap " + + "episode must get the full settle budget, not the first " + + "episode's leftovers", + factory.successSentinel(), out); + // gap, gap(+600ms), roleReject, gap, gap(+350ms), success = 6 sweeps. 
+ assertEquals(6, factory.attempts()); assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome()); - assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + assertEquals("a settling cluster must never see a persistent-failure escalation", + 0, listener.persistentFailures.get()); + assertFalse("no .failed sentinel: both gap episodes stayed inside their budgets", + Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + // Per-stream attempt numbering across the reset (M10 split): the DA + // stream carries gaps 1,2 then the fresh episode's 1,2; the role + // reject that restarted the episode lands on the primary stream. + assertEquals(Arrays.asList(1, 2, 1, 2), listener.unavailableAttempts); + assertEquals(Collections.singletonList(1), listener.primaryUnavailableAttempts); } @Test - public void testStopRequestedDuringRetryAbortsWithStoppedOutcome() throws Exception { - CountingListener listener = new CountingListener(); - // Slow factory: each attempt blocks for ~30ms throwing DA mismatch. - // Combined with a 50ms reconnectMaxDuration we'd hit budget too, - // so set a long budget and rely on requestStop() to break the loop. + public void testRequestStopInterruptsLongBackoffParkPromptly() throws Exception { + // Pins the stop-promptness contract of the backoff park: requestStop() + // must break the drainer out of a LONG park (unpark, backstopped by + // the 50ms STOP_CHECK_PARK_CHUNK_NANOS chunking) instead of sleeping + // out the remainder. testStopRequestedDuringRetryAbortsWithStoppedOutcome + // cannot see this: its 5-10ms backoffs complete faster than any + // reasonable join timeout, so a monolithic park with no unpark passes + // it. Here the backoff is 5s and the exit bound is 2s -- an + // implementation that parks the full backoff in one shot fails. 
CountDownLatch firstFailureSeen = new CountDownLatch(1); - ScriptedFactory factory = new ScriptedFactory( - /* successSentinel */ stubClient(), - /* throwingTimes */ Integer.MAX_VALUE, - /* throwSupplier */ () -> { + ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> { firstFailureSeen.countDown(); - return new QwpDurableAckMismatchException("h", 1234, "primary"); + // Transport error: the un-clamped (boundedByBudget=false) sleep + // path, so the park is backoff+jitter (5-10s), never trimmed to + // the wall-clock budget. + return new LineSenderException( + "Failed to connect: all 2 endpoint(s) unreachable; last=127.0.0.1:9000"); }); BackgroundDrainer drainer = newDrainerWithBudgets( factory, /*reconnectMaxDurationMillis*/ 60_000L, - /*backoffInit*/ 5L, /*backoffMax*/ 10L); - drainer.setListener(listener); - Thread t = new Thread(drainer::connectWithDurableAckRetry, "test-helper"); + /*backoffInit*/ 5_000L, /*backoffMax*/ 5_000L); + Thread t = new Thread(drainer::connectWithDurableAckRetry, "long-park-stop-drainer"); t.setDaemon(true); t.start(); - // Wait until at least one attempt has fired, then signal stop. Assert.assertTrue("first failure must occur promptly", firstFailureSeen.await(2, TimeUnit.SECONDS)); + // Give the drainer a moment to enter the 5-10s park. If requestStop() + // instead lands before the park, the pre-park stopRequested check + // skips it entirely -- either way the exit must be prompt. 
+ Thread.sleep(100); + long stopNanos = System.nanoTime(); drainer.requestStop(); - t.join(5_000); - Assert.assertFalse("helper must exit after stop", t.isAlive()); + t.join(2_000); + long exitMillis = (System.nanoTime() - stopNanos) / 1_000_000L; + Assert.assertFalse("requestStop() must break the drainer out of a 5-10s " + + "backoff park promptly (exit took >" + exitMillis + "ms); " + + "a monolithic park with no unpark sleeps out the full backoff", + t.isAlive()); assertEquals(BackgroundDrainer.DrainOutcome.STOPPED, drainer.outcome()); - // No persistent-failure callback on stop; no sentinel dropped. - assertEquals(0, listener.persistentFailures.get()); - assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + assertFalse("stop is not a failure: no .failed sentinel", + Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); } - @Test - public void testWallTimeBudgetEscalatesBeforeAttemptCap() { - CountingListener listener = new CountingListener(); - // Each failure sleeps 12ms; budget is 25ms — second iteration must - // observe deadline crossed without reaching the 16-attempt cap. 
- ScriptedFactory factory = new ScriptedFactory( - /* successSentinel */ stubClient(), - /* throwingTimes */ Integer.MAX_VALUE, - /* throwSupplier */ () -> { - try { - Thread.sleep(12); - } catch (InterruptedException ignored) { - Thread.currentThread().interrupt(); - } - return new QwpDurableAckMismatchException("h", 1234, "primary"); - }); - BackgroundDrainer drainer = newDrainerWithBudgets( - factory, /*reconnectMaxDurationMillis*/ 25L, - /*backoffInit*/ 1L, /*backoffMax*/ 1L); - drainer.setListener(listener); - WebSocketClient out = drainer.connectWithDurableAckRetry(); - assertNull(out); - assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); - assertEquals("must escalate by wall time, not attempts", 1, listener.persistentFailures.get()); - int total = listener.lastPersistentTotalAttempts.get(); - assertTrue("escalated before reaching attempt cap (got " + total + ")", - total < BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS); - assertTrue(total >= 1); - assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + private static void sleepQuietly(long millis) { + try { + Thread.sleep(millis); + } catch (InterruptedException ignored) { + Thread.currentThread().interrupt(); + } } private BackgroundDrainer newDrainer(ScriptedFactory factory) { @@ -350,8 +928,35 @@ private BackgroundDrainer newDrainerWithBudgets( /* durableAckKeepaliveIntervalMillis */ 200L); } + /** + * Wraps a test body in {@link TestUtils#assertMemoryLeak} and closes every + * stub the body allocated BEFORE the leak check fires -- LeakCheck closes + * at the end of the wrapped lambda, so an @After-only close would run too + * late and fail every wrapped test. 
+ */ + private static void assertMemoryLeak(TestUtils.LeakProneCode code) throws Exception { + TestUtils.assertMemoryLeak(() -> { + try { + code.run(); + } finally { + closeAllStubs(); + } + }); + } + + private static void closeAllStubs() { + synchronized (LIVE_STUBS) { + for (int i = 0, n = LIVE_STUBS.size(); i < n; i++) { + LIVE_STUBS.get(i).close(); + } + LIVE_STUBS.clear(); + } + } + private static StubWebSocketClient stubClient() { - return new StubWebSocketClient(); + StubWebSocketClient client = new StubWebSocketClient(); + LIVE_STUBS.add(client); + return client; } /** @@ -362,6 +967,8 @@ private static final class CountingListener implements BackgroundDrainerListener final AtomicInteger lastPersistentElapsedMs = new AtomicInteger(-1); final AtomicInteger lastPersistentTotalAttempts = new AtomicInteger(-1); final AtomicInteger persistentFailures = new AtomicInteger(); + final List<Integer> primaryUnavailableAttempts = new ArrayList<>(); + final List<String> primaryUnavailableSlotPaths = new ArrayList<>(); final List<Integer> unavailableAttempts = new ArrayList<>(); final List<String> unavailableSlotPaths = new ArrayList<>(); @@ -377,6 +984,12 @@ public synchronized void onDurableAckUnavailable(String slotPath, int attemptNum unavailableSlotPaths.add(slotPath); unavailableAttempts.add(attemptNumber); } + + @Override + public synchronized void onPrimaryUnavailable(String slotPath, int attemptNumber) { + primaryUnavailableSlotPaths.add(slotPath); + primaryUnavailableAttempts.add(attemptNumber); + } } /** diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerInterruptedTeardownTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerInterruptedTeardownTest.java new file mode 100644 index 00000000..d1526146 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerInterruptedTeardownTest.java @@ -0,0 +1,287 @@
+/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client.sf.cursor; + +import io.questdb.client.DefaultHttpClientConfiguration; +import io.questdb.client.cutlass.http.client.WebSocketClient; +import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop; +import io.questdb.client.network.PlainSocketFactory; +import io.questdb.client.std.Compat; +import io.questdb.client.std.Files; +import io.questdb.client.std.MemoryTag; +import io.questdb.client.std.Unsafe; +import io.questdb.client.test.tools.TestUtils; +import org.junit.After; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.nio.file.Paths; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.concurrent.atomic.AtomicReference; + +/** + * Red 
test for the SEGV half of finding C5 — interrupted drainer teardown + * must not unmap the engine under a live I/O thread. + *

    + * Production sequence: during an outage the drainer's loop I/O thread sits + * inside a blocking native connect ({@code connect_timeout} defaults to 0 = + * OS timeout; neither unpark nor interrupt cancels {@code connect(2)}). + * {@code BackgroundDrainerPool.close()} escalates — graceful drain, + * {@code requestStop()}, then {@code shutdownNow()} — and the interrupt + * lands in {@code loop.close()}'s {@code shutdownLatch.await()}. Pre-fix, + * {@code close()} swallows it and returns while the I/O thread is alive; + * {@code BackgroundDrainer.run()}'s finally then closes the engine — + * {@code munmap}/{@code Unsafe.free} on segment memory a thread that is + * still alive may touch with raw {@code Unsafe} reads. + *

    + * The invariant pinned here: at the moment {@code run()} returns, NOT + * (loop I/O thread alive AND slot lock released). The slot lock is an + * on-disk protocol shared with other processes and scanners, and + * {@code engine.close()} releases it strictly after unmapping — so + * "lock released" is the public, behavioral witness of "engine torn down". + * Either valid fix shape satisfies the invariant: block until the thread + * exits (re-await), or keep the lock/engine alive past {@code run()} by + * delegating engine teardown to the I/O thread's exit path. The tail of the + * test additionally requires that the slot lock is EVENTUALLY released once + * the stuck connect resolves — a fix may defer teardown, not abandon it. + *

    + * NOTE: this test is a proxy for the memory-safety property ("no engine + * access after unmap"), which cannot be asserted in-process — a SEGV kills + * the JVM, and {@code Unsafe.free}'d memory is not guaranteed to fault. The + * invariant is a sufficient teardown discipline, deliberately stricter than + * the minimal property; see the C5 review discussion. + *
<p>

    + * Determinism: no sleeps. The interrupt is delivered after + * {@code requestStop()} while the runner is either in an interrupt-immune + * park ({@code LockSupport.parkNanos} preserves the flag) or already in the + * latch await — both routes arrive at the await with the flag set, which + * throws before parking. The "stuck connect" is a latch-gated factory + * (unpark-immune; {@code close()} never interrupts the I/O thread). The + * test only runs safely on pre-fix code because the already-landed + * discard-when-stopped fix keeps the post-teardown I/O thread away from + * engine memory — the hazard this test guards is real on any HEAD without + * that commit. + */ +public class BackgroundDrainerInterruptedTeardownTest { + + private static final long SEGMENT_BYTES = 64 * 1024; + private String tmpDir; + + @Before + public void setUp() { + tmpDir = Paths.get(System.getProperty("java.io.tmpdir"), + "qdb-c5-teardown-" + System.nanoTime()).toString(); + Assert.assertEquals(0, Files.mkdir(tmpDir, Files.DIR_MODE_DEFAULT)); + } + + @After + public void tearDown() { + if (tmpDir == null) return; + long find = Files.findFirst(tmpDir); + if (find > 0) { + try { + int rc = 1; + while (rc > 0) { + String name = Files.utf8ToString(Files.findName(find)); + if (name != null && !".".equals(name) && !"..".equals(name)) { + Files.remove(tmpDir + "/" + name); + } + rc = Files.findNext(find); + } + } finally { + Files.findClose(find); + } + } + Files.remove(tmpDir); + } + + @Test + public void testC5_interruptedTeardownMustNotReleaseSlotUnderLiveIoThread() throws Exception { + TestUtils.assertMemoryLeak(() -> { + // 1. Slot with one published, unacked frame so the drainer opens + // a real engine and spins up a send loop. 
+ long buf = Unsafe.malloc(16, MemoryTag.NATIVE_DEFAULT); + try { + CursorSendEngine prep = new CursorSendEngine(tmpDir, SEGMENT_BYTES); + try { + for (int i = 0; i < 16; i++) { + Unsafe.getUnsafe().putByte(buf + i, (byte) i); + } + Assert.assertEquals(0L, prep.appendBlocking(buf, 16)); + } finally { + // Unacked data on disk -> close() keeps the .sfa files. + prep.close(); + } + } finally { + Unsafe.free(buf, 16, MemoryTag.NATIVE_DEFAULT); + } + + final CountDownLatch enteredReconnect = new CountDownLatch(1); + final CountDownLatch releaseConnect = new CountDownLatch(1); + final AtomicInteger connects = new AtomicInteger(); + final AtomicReference<Thread> ioThreadRef = new AtomicReference<>(); + // Client #1: initial connect succeeds (on the runner thread), but + // every send throws -- driving the I/O thread into its reconnect + // loop. Client #2: handed back by the gated "connect" after the + // teardown has already run. + final StubWebSocketClient wireDownClient = new StubWebSocketClient(); + final StubWebSocketClient postTeardownClient = new StubWebSocketClient(); + + final CursorWebSocketSendLoop.ReconnectFactory factory = () -> { + if (connects.incrementAndGet() == 1) { + // Initial connect: runs on the drainer's runner thread. + return wireDownClient; + } + // Wire-failure reconnect: runs on the loop's I/O thread. + // Stand-in for a blocking native connect(2): unpark-immune + // (a latch await re-parks after a spurious wake) and never + // interrupted (loop.close() only unparks). 
+ ioThreadRef.set(Thread.currentThread()); + enteredReconnect.countDown(); + releaseConnect.await(); + return postTeardownClient; + }; + + final BackgroundDrainer drainer = new BackgroundDrainer( + tmpDir, SEGMENT_BYTES, Long.MAX_VALUE, factory, + 5_000L, 10L, 50L, false, 0L); + + Thread runner = new Thread(drainer::run, "drainer-runner"); + runner.setDaemon(true); + runner.start(); + try { + Assert.assertTrue("I/O thread never reached the reconnect factory", + enteredReconnect.await(10, TimeUnit.SECONDS)); + + // Pool-shutdown stand-in: requestStop, then the shutdownNow + // interrupt. Wherever the runner is at this instant -- the + // poll park (flag-preserving) or already in the latch await -- + // the flag arrives at the await and throws before parking. + drainer.requestStop(); + runner.interrupt(); + runner.join(10_000L); + Assert.assertFalse("drainer did not return after stop + interrupt", + runner.isAlive()); + Assert.assertEquals(BackgroundDrainer.DrainOutcome.STOPPED, + drainer.outcome()); + + Thread ioThread = ioThreadRef.get(); + Assert.assertNotNull(ioThread); + boolean ioThreadAliveAtReturn = ioThread.isAlive(); + boolean slotLockFreeAtReturn = isSlotLockFree(); + Assert.assertFalse( + "C5 (SEGV): BackgroundDrainer.run() returned with the slot lock " + + "released (engine closed -- segments munmap'd/freed) while " + + "the loop's I/O thread was still alive inside a blocking " + + "connect. loop.close() swallowed the InterruptedException " + + "from shutdownLatch.await() and returned; the finally then " + + "unmapped memory a live thread may touch with raw Unsafe " + + "reads. Teardown must either wait for the thread or be " + + "delegated to its exit path.", + ioThreadAliveAtReturn && slotLockFreeAtReturn); + } finally { + // Unblock the "connect" and quiesce regardless of verdict so + // the memory-leak wrapper sees a fully wound-down world. 
+ releaseConnect.countDown(); + Thread ioThread = ioThreadRef.get(); + if (ioThread != null) { + ioThread.join(10_000L); + Assert.assertFalse("I/O thread did not exit after the connect returned", + ioThread.isAlive()); + } + wireDownClient.close(); + postTeardownClient.close(); + } + + // Deferred is fine; abandoned is not: once the stuck connect + // resolved and the I/O thread exited, the slot lock must be + // released (engine closed by whoever ended up owning teardown), + // or no scanner can ever adopt the slot's remaining data. + long deadlineNanos = System.nanoTime() + TimeUnit.SECONDS.toNanos(10); + while (!isSlotLockFree()) { + Assert.assertTrue( + "slot lock never released after the I/O thread exited -- " + + "engine teardown was abandoned, not deferred", + System.nanoTime() < deadlineNanos); + Compat.onSpinWait(); + } + }); + } + + /** + * Public, behavioral probe of the slot lock: opening an engine on the + * slot succeeds iff no other engine holds the on-disk lock. The probe + * engine is closed immediately; the slot's unacked data keeps its files + * on disk, so probing is observation-only. + */ + private boolean isSlotLockFree() { + try { + new CursorSendEngine(tmpDir, SEGMENT_BYTES).close(); + return true; + } catch (IllegalStateException e) { + String msg = e.getMessage(); + if (msg != null && msg.contains("already in use")) { + return false; + } + throw e; + } + } + + /** + * Minimal concrete {@link WebSocketClient}: connect-level collaborator + * only. Every send throws, so handing it to a live loop deterministically + * drives the I/O thread into its reconnect path without native I/O. 
+ */ + private static final class StubWebSocketClient extends WebSocketClient { + StubWebSocketClient() { + super(DefaultHttpClientConfiguration.INSTANCE, PlainSocketFactory.INSTANCE); + } + + @Override + public void sendBinary(long dataPtr, int length) { + throw new IllegalStateException("stub: wire down"); + } + + @Override + public void sendBinary(long dataPtr, int length, int timeout) { + throw new IllegalStateException("stub: wire down"); + } + + @Override + protected void ioWait(int timeout, int op) { + throw new UnsupportedOperationException("stub: no socket"); + } + + @Override + protected void setupIoWait() { + // no-op + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerMidDrainCapabilityGapTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerMidDrainCapabilityGapTest.java new file mode 100644 index 00000000..889fd3e5 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerMidDrainCapabilityGapTest.java @@ -0,0 +1,426 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client.sf.cursor; + +import io.questdb.client.cutlass.http.client.WebSocketClient; +import io.questdb.client.cutlass.http.client.WebSocketClientFactory; +import io.questdb.client.cutlass.http.client.WebSocketUpgradeException; +import io.questdb.client.cutlass.qwp.client.QwpDurableAckMismatchException; +import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer; +import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop; +import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner; +import io.questdb.client.std.Files; +import io.questdb.client.std.MemoryTag; +import io.questdb.client.std.Unsafe; +import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer; +import io.questdb.client.test.tools.TestUtils; +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.ByteOrder; +import java.nio.charset.StandardCharsets; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; +import static org.junit.Assert.fail; + +/** + * Mid-drain durable-ack capability-gap coverage for {@link BackgroundDrainer}. + *
<p>

    + * The initial-connect path ({@code connectWithDurableAckRetry}) gives a + * cluster-wide durable-ack capability gap a bounded settle budget (16 + * consecutive sweeps / wall clock) before quarantining the slot — the budget + * exists precisely for rolling-upgrade transients. The same condition hit + * mid-drain (wire drops, the loop's reconnect sweep lands on a node + * that upgrades but does not advertise durable ack) must get the same budget: + * the drainer re-enters the budgeted connect instead of dropping a + * {@code .failed} sentinel on the first sweep. Genuine terminals (auth, + * non-421 upgrade reject) still quarantine immediately — the sanctioned + * terminal set is unchanged. + *
<p>

    + * Wire realism: a real {@link TestWebSocketServer} acks over a live socket; + * the scripted {@link CursorWebSocketSendLoop.ReconnectFactory} decides, + * per connect attempt, whether the sweep sees a healthy node or the + * capability gap. The mid-drain drop is deterministic — the server closes + * the first connection after durably acking exactly one frame. + */ +public class BackgroundDrainerMidDrainCapabilityGapTest { + + private static final long FAST_BACKOFF_MAX_MILLIS = 4L; + private static final long FAST_BACKOFF_MILLIS = 1L; + private static final long RECONNECT_MAX_DURATION_MILLIS = 60_000L; + private static final int SEEDED_FRAMES = 5; + private static final long SEGMENT_SIZE_BYTES = 16384L; + private static final long SF_MAX_TOTAL_BYTES = 1L << 20; + + private String slotPath; + + @Before + public void setUp() { + slotPath = Paths.get(System.getProperty("java.io.tmpdir"), + "qdb-mid-drain-gap-" + System.nanoTime()).toString(); + assertEquals("mkdir slot dir", 0, Files.mkdir(slotPath, Files.DIR_MODE_DEFAULT)); + } + + @After + public void tearDown() { + rmDirRec(slotPath); + } + + @Test + public void testMidDrainCapabilityGapGetsSettleBudgetNotQuarantine() throws Exception { + TestUtils.assertMemoryLeak(() -> { + seedSlot(SEEDED_FRAMES); + GapScenarioHandler handler = new GapScenarioHandler(/* dropFirstConnection */ true); + try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) { + server.start(); + assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + // Call 1: healthy connect (drain starts). The server durably + // acks one frame, then drops the wire. Calls 2-4: the + // reconnect sweep finds only the capability-gap node. Call 5+: + // the rolling upgrade settled; a capable node is back. 
+ ScriptedWireFactory factory = + new ScriptedWireFactory(server.getPort(), 2, 4); + BackgroundDrainer drainer = newDrainer(factory); + CountingListener listener = new CountingListener(); + drainer.setListener(listener); + + runToCompletion(drainer); + + assertEquals("a transient capability gap inside the settle budget " + + "must not quarantine the slot", + BackgroundDrainer.DrainOutcome.SUCCESS, drainer.outcome()); + assertFalse("no .failed sentinel after a successful drain", + Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + // 1 healthy + 3 gap sweeps + 1 healthy at minimum. Stopping at + // 2 means the drainer latched terminal on the first gap sweep. + assertTrue("expected the drainer to retry through the gap, attempts=" + + factory.attempts(), factory.attempts() >= 5); + // The loop's own failed sweep (call 2) latches the loop; budget + // attempts 1 and 2 (calls 3, 4) fire the observability callback. + assertEquals(Arrays.asList(1, 2), listener.unavailableAttempts); + assertEquals(0, listener.persistentFailures.get()); + } + }); + } + + @Test + public void testMidDrainPersistentCapabilityGapExhaustsBudgetThenQuarantines() throws Exception { + TestUtils.assertMemoryLeak(() -> { + seedSlot(SEEDED_FRAMES); + GapScenarioHandler handler = new GapScenarioHandler(/* dropFirstConnection */ true); + try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) { + server.start(); + assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + // Gap never clears: every sweep after the drop throws. 
+ ScriptedWireFactory factory = + new ScriptedWireFactory(server.getPort(), 2, Integer.MAX_VALUE); + BackgroundDrainer drainer = newDrainer(factory); + CountingListener listener = new CountingListener(); + drainer.setListener(listener); + + runToCompletion(drainer); + + int budget = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS; + assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); + assertTrue("persistent gap must quarantine after the budget", + Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + // Escalation goes through the settle budget, not the generic + // wire-error path: the persistent-failure callback fires once + // with the full budget consumed. + assertEquals(1, listener.persistentFailures.get()); + assertEquals(budget, listener.lastPersistentTotalAttempts.get()); + // 1 healthy connect + 1 loop reconnect sweep (latches the loop) + // + the full budget of re-entered sweeps. + assertEquals(2 + budget, factory.attempts()); + } + }); + } + + @Test + public void testMidDrainTerminalUpgradeErrorStillQuarantinesImmediately() throws Exception { + TestUtils.assertMemoryLeak(() -> { + seedSlot(SEEDED_FRAMES); + GapScenarioHandler handler = new GapScenarioHandler(/* dropFirstConnection */ true); + try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) { + server.start(); + assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + // Non-421 upgrade reject mid-drain: sanctioned terminal, no + // settle budget — the drainer must quarantine on the first + // sweep exactly as before. 
+ ScriptedWireFactory factory = new ScriptedWireFactory( + server.getPort(), 2, Integer.MAX_VALUE, + () -> new WebSocketUpgradeException(500, null, "server error during upgrade")); + BackgroundDrainer drainer = newDrainer(factory); + CountingListener listener = new CountingListener(); + drainer.setListener(listener); + + runToCompletion(drainer); + + assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome()); + assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + assertEquals("terminal upgrade error must not consume gap sweeps", + 2, factory.attempts()); + assertEquals(0, listener.unavailableAttempts.size()); + assertEquals(0, listener.persistentFailures.get()); + } + }); + } + + private BackgroundDrainer newDrainer(ScriptedWireFactory factory) { + return new BackgroundDrainer( + slotPath, + SEGMENT_SIZE_BYTES, + SF_MAX_TOTAL_BYTES, + factory, + RECONNECT_MAX_DURATION_MILLIS, + FAST_BACKOFF_MILLIS, + FAST_BACKOFF_MAX_MILLIS, + /* requestDurableAck */ true, + /* durableAckKeepaliveIntervalMillis */ 200L); + } + + private static void rmDirRec(String dir) { + if (dir == null || !Files.exists(dir)) return; + long find = Files.findFirst(dir); + if (find > 0) { + try { + int rc = 1; + while (rc > 0) { + String name = Files.utf8ToString(Files.findName(find)); + if (name != null && !".".equals(name) && !"..".equals(name)) { + String child = dir + "/" + name; + if (!Files.remove(child)) rmDirRec(child); + } + rc = Files.findNext(find); + } + } finally { + Files.findClose(find); + } + } + Files.remove(dir); + } + + private static void runToCompletion(BackgroundDrainer drainer) throws InterruptedException { + Thread t = new Thread(drainer, "test-mid-drain-drainer"); + t.setDaemon(true); + t.start(); + t.join(20_000); + if (t.isAlive()) { + drainer.requestStop(); + t.join(5_000); + fail("drainer did not finish within 20s (outcome=" + drainer.outcome() + ")"); + } + } + + private void seedSlot(int frames) { + try (CursorSendEngine engine 
= new CursorSendEngine(slotPath, SEGMENT_SIZE_BYTES)) { + long buf = Unsafe.malloc(16, MemoryTag.NATIVE_DEFAULT); + try { + byte[] payload = "frame-bytes-padd".getBytes(StandardCharsets.US_ASCII); + for (int i = 0; i < payload.length; i++) { + Unsafe.getUnsafe().putByte(buf + i, payload[i]); + } + for (int i = 0; i < frames; i++) { + engine.appendBlocking(buf, 16); + } + } finally { + Unsafe.free(buf, 16, MemoryTag.NATIVE_DEFAULT); + } + } + } + + /** + * Records listener invocations for exact-count assertions. + */ + private static final class CountingListener implements BackgroundDrainerListener { + final AtomicInteger lastPersistentTotalAttempts = new AtomicInteger(-1); + final AtomicInteger persistentFailures = new AtomicInteger(); + final List<Integer> unavailableAttempts = new ArrayList<>(); + + @Override + public synchronized void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) { + persistentFailures.incrementAndGet(); + lastPersistentTotalAttempts.set(totalAttempts); + } + + @Override + public synchronized void onDurableAckUnavailable(String slotPath, int attemptNumber) { + unavailableAttempts.add(attemptNumber); + } + } + + /** + * Server-side script. Connection #1 durably acks exactly one frame, then + * closes the socket — a deterministic mid-drain wire drop. Every later + * connection acks all traffic (OK + durable-ack per frame, per-connection + * wire sequence), so a reconnected loop drains to completion. + * <p>

    + * State is keyed per {@code ClientHandler} identity. A dead connection's + * reader can still deliver late buffered frames AFTER a newer connection + * started (the server reads ahead of the socket close), so any + * "latest-connection" flip-flop bookkeeping desyncs the per-connection + * wire sequence and produces phantom connections. Acks are best-effort: + * a late frame from a dead connection must neither ack with a stale + * counter nor kill the reader thread of a live one. + */ + private static final class GapScenarioHandler implements TestWebSocketServer.WebSocketServerHandler { + private static final String TABLE = "trades"; + private final boolean dropFirstConnection; + private final List<TestWebSocketServer.ClientHandler> arrivalOrder = new ArrayList<>(); + private final java.util.Map<TestWebSocketServer.ClientHandler, long[]> wireSeqByConn = + new java.util.IdentityHashMap<>(); + + GapScenarioHandler(boolean dropFirstConnection) { + this.dropFirstConnection = dropFirstConnection; + } + + @Override + public synchronized void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) { + long[] counter = wireSeqByConn.get(client); + if (counter == null) { + counter = new long[1]; + wireSeqByConn.put(client, counter); + arrivalOrder.add(client); + } + int connectionIndex = arrivalOrder.indexOf(client) + 1; + long seq = counter[0]++; + try { + if (dropFirstConnection && connectionIndex == 1) { + if (seq == 0) { + client.sendBinary(okFrame(seq, seq)); + client.sendBinary(durableAckFrame(seq)); + } else if (seq == 1) { + client.close(); // mid-drain wire drop + } + // seq > 1: late buffered frames from the condemned + // connection; ignore. + } else { + client.sendBinary(okFrame(seq, seq)); + client.sendBinary(durableAckFrame(seq)); + } + } catch (IOException ignored) { + // Best-effort ack: the connection died under us (e.g. racing + // its own close). The client replays on its next connection. 
+ } + } + + private static byte[] durableAckFrame(long seqTxn) { + byte[] name = TABLE.getBytes(StandardCharsets.UTF_8); + ByteBuffer bb = ByteBuffer.allocate(1 + 2 + 2 + name.length + 8) + .order(ByteOrder.LITTLE_ENDIAN); + bb.put((byte) 0x02); // STATUS_DURABLE_ACK + bb.putShort((short) 1); // tableCount + bb.putShort((short) name.length); + bb.put(name); + bb.putLong(seqTxn); + return bb.array(); + } + + private static byte[] okFrame(long wireSeq, long seqTxn) { + byte[] name = TABLE.getBytes(StandardCharsets.UTF_8); + ByteBuffer bb = ByteBuffer.allocate(1 + 8 + 2 + 2 + name.length + 8) + .order(ByteOrder.LITTLE_ENDIAN); + bb.put((byte) 0x00); // STATUS_OK + bb.putLong(wireSeq); + bb.putShort((short) 1); // tableCount + bb.putShort((short) name.length); + bb.put(name); + bb.putLong(seqTxn); + return bb.array(); + } + } + + /** + * Per-call-index scripted factory over a real wire. Call indexes inside + * {@code [throwFrom, throwTo]} (1-based, inclusive) throw the scripted + * exception; every other call returns a live upgraded client against the + * test server, with durable ack requested — exactly the client the + * production connect walk would hand back. 
+ */ + private static final class ScriptedWireFactory implements CursorWebSocketSendLoop.ReconnectFactory { + private final AtomicInteger calls = new AtomicInteger(); + private final int port; + private final ThrowableSupplier throwSupplier; + private final int throwFrom; + private final int throwTo; + + ScriptedWireFactory(int port, int throwFrom, int throwTo) { + this(port, throwFrom, throwTo, + () -> new QwpDurableAckMismatchException("localhost", port, "primary")); + } + + ScriptedWireFactory(int port, int throwFrom, int throwTo, ThrowableSupplier throwSupplier) { + this.port = port; + this.throwFrom = throwFrom; + this.throwTo = throwTo; + this.throwSupplier = throwSupplier; + } + + int attempts() { + return calls.get(); + } + + @Override + public WebSocketClient reconnect() throws Exception { + int n = calls.incrementAndGet(); + if (n >= throwFrom && n <= throwTo) { + Throwable t = throwSupplier.get(); + if (t instanceof RuntimeException) throw (RuntimeException) t; + if (t instanceof Exception) throw (Exception) t; + throw new RuntimeException(t); + } + WebSocketClient c = WebSocketClientFactory.newPlainTextInstance(); + try { + c.setQwpMaxVersion(1); + c.setQwpRequestDurableAck(true); + c.setConnectTimeout(5_000); + c.connect("localhost", port); + c.upgrade("/write/v4", 5_000, null); + } catch (Throwable t) { + c.close(); + throw t; + } + return c; + } + } + + @FunctionalInterface + private interface ThrowableSupplier { + Throwable get(); + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerPoolConnectPhaseCloseTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerPoolConnectPhaseCloseTest.java new file mode 100644 index 00000000..d31f60d4 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerPoolConnectPhaseCloseTest.java @@ -0,0 +1,189 @@ 
+/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client.sf.cursor; + +import io.questdb.client.cutlass.line.LineSenderException; +import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer; +import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerPool; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop; +import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner; +import io.questdb.client.std.Files; +import io.questdb.client.std.MemoryTag; +import io.questdb.client.std.Unsafe; +import io.questdb.client.test.tools.TestUtils; +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import java.nio.charset.StandardCharsets; +import java.nio.file.Paths; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; + +import static org.junit.Assert.assertEquals; +import 
static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; + +/** + * Coverage of {@link BackgroundDrainerPool#close()}'s split stop policy + * (M11): a drainer that never started draining — still inside its + * connect-retry loop, e.g. during a cluster outage — is stop-signaled + * BEFORE the graceful-drain window, so {@code close()} returns in roughly + * one stop-check park chunk (~50ms) instead of burning the full + * {@code GRACEFUL_DRAIN_MILLIS + STOP_GRACE_MILLIS} (~3s) on a drainer + * that cannot possibly finish. + *
<p>

    + * The factory throws a plain transport-shaped {@link LineSenderException} + * (the shape of every outage-time connect failure), which doubles this + * test as the contract check that such failures are retried under + * Invariant B — outcome stays PENDING while running, becomes STOPPED on + * close, and NEVER drops a {@code .failed} sentinel. + */ +public class BackgroundDrainerPoolConnectPhaseCloseTest { + + /** + * Well below the pool's 2.5s graceful window: generous enough for CI + * scheduling jitter, tight enough that a regression to + * "graceful-wait-first" (>= 2500ms) fails loudly. + */ + private static final long CLOSE_BUDGET_MILLIS = 2_000L; + /** Longer than the close budget: close() must not sleep a backoff out. */ + private static final long LONG_BACKOFF_MILLIS = 30_000L; + private static final long SEGMENT_SIZE_BYTES = 16384L; + private static final long SF_MAX_TOTAL_BYTES = 1L << 20; + + private String slotPath; + + @Before + public void setUp() { + slotPath = Paths.get(System.getProperty("java.io.tmpdir"), + "qdb-pool-connect-close-" + System.nanoTime()).toString(); + assertEquals("mkdir slot dir", 0, Files.mkdir(slotPath, Files.DIR_MODE_DEFAULT)); + } + + @After + public void tearDown() { + rmDirRec(slotPath); + } + + @Test + public void testCloseStopsConnectPhaseDrainerWithoutBurningGracefulWindow() throws Exception { + TestUtils.assertMemoryLeak(() -> { + // Unacked data on disk: without it run() exits SUCCESS before + // ever entering the connect-retry loop this test needs. + seedSlot(3); + final CountDownLatch firstAttempt = new CountDownLatch(1); + final AtomicInteger attempts = new AtomicInteger(); + final CursorWebSocketSendLoop.ReconnectFactory factory = () -> { + attempts.incrementAndGet(); + firstAttempt.countDown(); + // Plain transport-shaped failure: the shape of every + // outage-time connect error. Must be retried, never + // quarantined. 
+ throw new LineSenderException( + "Failed to connect: all endpoints unreachable (simulated outage)"); + }; + final BackgroundDrainer drainer = new BackgroundDrainer( + slotPath, + SEGMENT_SIZE_BYTES, + SF_MAX_TOTAL_BYTES, + factory, + /* reconnectMaxDurationMillis */ 60_000L, + /* reconnectInitialBackoffMillis */ LONG_BACKOFF_MILLIS, + /* reconnectMaxBackoffMillis */ LONG_BACKOFF_MILLIS, + /* requestDurableAck */ false, + /* durableAckKeepaliveIntervalMillis */ 0L); + final BackgroundDrainerPool pool = new BackgroundDrainerPool(1); + pool.submit(drainer); + assertTrue("drainer must enter its connect-retry loop", + firstAttempt.await(5, TimeUnit.SECONDS)); + + final long startNanos = System.nanoTime(); + pool.close(); + final long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000L; + + assertTrue("close() must stop a connect-phase drainer immediately (split stop " + + "policy), not wait out the graceful-drain window; took " + + elapsedMillis + "ms with a " + LONG_BACKOFF_MILLIS + + "ms drainer backoff in flight", + elapsedMillis < CLOSE_BUDGET_MILLIS); + assertEquals("a stop-signaled connect-phase drainer exits STOPPED (slot stays " + + "adoptable), never FAILED", + BackgroundDrainer.DrainOutcome.STOPPED, drainer.outcome()); + assertEquals("a connect-phase drainer must never have advanced ackedFsn", + -1L, drainer.getAckedFsn()); + assertTrue("the drainer must have attempted at least one connect sweep", + attempts.get() >= 1); + assertFalse("outage-shaped connect failures must never quarantine the slot " + + "(.failed sentinel)", + Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + /** + * Seeds {@code frames} frames and returns nothing the test needs beyond + * the on-disk unacked state; mirrors + * {@code BackgroundDrainerTransportOutageRecoveryTest.seedSlot}. 
+ */ + private void seedSlot(int frames) { + try (CursorSendEngine engine = new CursorSendEngine(slotPath, SEGMENT_SIZE_BYTES)) { + long buf = Unsafe.malloc(16, MemoryTag.NATIVE_DEFAULT); + try { + byte[] payload = "frame-bytes-padd".getBytes(StandardCharsets.US_ASCII); + for (int i = 0; i < payload.length; i++) { + Unsafe.getUnsafe().putByte(buf + i, payload[i]); + } + for (int i = 0; i < frames; i++) { + engine.appendBlocking(buf, 16); + } + } finally { + Unsafe.free(buf, 16, MemoryTag.NATIVE_DEFAULT); + } + } + } + + private static void rmDirRec(String dir) { + if (dir == null || !Files.exists(dir)) return; + long find = Files.findFirst(dir); + if (find > 0) { + try { + int rc = 1; + while (rc > 0) { + String name = Files.utf8ToString(Files.findName(find)); + if (name != null && !".".equals(name) && !"..".equals(name)) { + String child = dir + "/" + name; + if (!Files.remove(child)) rmDirRec(child); + } + rc = Files.findNext(find); + } + } finally { + Files.findClose(find); + } + } + Files.remove(dir); + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerTransportOutageRecoveryTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerTransportOutageRecoveryTest.java new file mode 100644 index 00000000..86a3decc --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerTransportOutageRecoveryTest.java @@ -0,0 +1,319 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client.sf.cursor; + +import io.questdb.client.cutlass.http.client.WebSocketClient; +import io.questdb.client.cutlass.http.client.WebSocketClientFactory; +import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer; +import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop; +import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner; +import io.questdb.client.std.Files; +import io.questdb.client.std.MemoryTag; +import io.questdb.client.std.Unsafe; +import io.questdb.client.test.cutlass.qwp.client.TestPorts; +import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer; +import io.questdb.client.test.tools.TestUtils; +import org.junit.After; +import org.junit.Before; +import org.junit.Test; + +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.ByteOrder; +import java.nio.charset.StandardCharsets; +import java.nio.file.Paths; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; +import static org.junit.Assert.fail; + +/** + * Down-then-up transport-outage recovery for {@link BackgroundDrainer}, + * end-to-end over a real 
wire (M8 conjunction gap). + *

    + * Invariant B's two halves were previously pinned only in isolation: + * "a transport outage longer than the settle budget never quarantines" + * (ScriptedFactory unit level, ends with {@code requestStop()}) and + * "the drainer recovers once errors clear" (scripted throws, no real + * outage). This test conjoins them on ONE endpoint: the server is DOWN at + * drainer start (every connect is a genuine ECONNREFUSED through the real + * {@link WebSocketClient} connect/upgrade path), stays down for several + * multiples of {@code reconnect_max_duration_millis} while the drainer + * sweeps, then comes back UP on the SAME port — and the drainer must + * complete the drain, having never dropped a {@code .failed} sentinel or + * fired a persistent-failure escalation during the outage. + */ +public class BackgroundDrainerTransportOutageRecoveryTest { + + private static final long FAST_BACKOFF_MAX_MILLIS = 4L; + private static final long FAST_BACKOFF_MILLIS = 1L; + /** Deliberately tiny: the outage below outlives it several times over. 
*/ + private static final long RECONNECT_MAX_DURATION_MILLIS = 200L; + private static final int SEEDED_FRAMES = 5; + private static final long SEGMENT_SIZE_BYTES = 16384L; + private static final long SF_MAX_TOTAL_BYTES = 1L << 20; + + private String slotPath; + + @Before + public void setUp() { + slotPath = Paths.get(System.getProperty("java.io.tmpdir"), + "qdb-outage-recovery-" + System.nanoTime()).toString(); + assertEquals("mkdir slot dir", 0, Files.mkdir(slotPath, Files.DIR_MODE_DEFAULT)); + } + + @After + public void tearDown() { + rmDirRec(slotPath); + } + + @Test + public void testDrainerSurvivesOutageLongerThanBudgetThenDrainsWhenServerReturns() throws Exception { + TestUtils.assertMemoryLeak(() -> { + long targetFsn = seedSlot(SEEDED_FRAMES); + int port = TestPorts.findUnusedPort(); + WireFactory factory = new WireFactory(port); + BackgroundDrainer drainer = new BackgroundDrainer( + slotPath, + SEGMENT_SIZE_BYTES, + SF_MAX_TOTAL_BYTES, + factory, + RECONNECT_MAX_DURATION_MILLIS, + FAST_BACKOFF_MILLIS, + FAST_BACKOFF_MAX_MILLIS, + /* requestDurableAck */ true, + /* durableAckKeepaliveIntervalMillis */ 200L); + CountingListener listener = new CountingListener(); + drainer.setListener(listener); + + Thread t = new Thread(drainer, "outage-recovery-drainer"); + t.setDaemon(true); + t.start(); + try { + // OUTAGE PHASE: nothing listens on the port, so every sweep is a + // real refused connect. Hold the outage for 3x the wall-clock + // budget AND at least a handful of sweeps, whichever is later -- + // under an Invariant B breach (transport errors charged to the + // budget / attempt cap) the drainer escalates well within this + // window and the thread dies. 
+ long outageUntilNanos = System.nanoTime() + + 3 * RECONNECT_MAX_DURATION_MILLIS * 1_000_000L; + while ((System.nanoTime() < outageUntilNanos || factory.attempts() < 8) + && t.isAlive()) { + Thread.sleep(10); + } + assertTrue("drainer gave up during a transport outage (attempts=" + + factory.attempts() + ", outcome=" + drainer.outcome() + + "): Invariant B says a down server is transient -- the " + + "drainer must still be retrying 3x past the settle budget", + t.isAlive()); + assertEquals("outage must not escalate past PENDING", + BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome()); + assertEquals("outage must not fire a persistent-failure escalation", + 0, listener.persistentFailures.get()); + assertFalse("outage must not quarantine (.failed sentinel) the slot", + Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + + // RECOVERY PHASE: the server comes back on the SAME port. The + // drainer's next sweep connects for real and ships the slot. + try (TestWebSocketServer server = new TestWebSocketServer( + new AckAllHandler(), true, null, port)) { + server.start(); + assertTrue(server.awaitStart(5, TimeUnit.SECONDS)); + t.join(20_000); + if (t.isAlive()) { + drainer.requestStop(); + t.join(5_000); + fail("drainer did not drain within 20s of the server returning " + + "(outcome=" + drainer.outcome() + + ", attempts=" + factory.attempts() + + ", lastError=" + drainer.getLastErrorMessage() + ")"); + } + } + } finally { + drainer.requestStop(); + t.join(5_000); + } + + assertEquals("server recovery must complete the drain", + BackgroundDrainer.DrainOutcome.SUCCESS, drainer.outcome()); + assertEquals("every seeded frame must be durably acked", + targetFsn, drainer.getAckedFsn()); + assertEquals(0, listener.persistentFailures.get()); + assertFalse("no .failed sentinel after a successful drain", + Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME)); + }); + } + + private static void rmDirRec(String dir) { + if (dir == null || 
!Files.exists(dir)) return; + long find = Files.findFirst(dir); + if (find > 0) { + try { + int rc = 1; + while (rc > 0) { + String name = Files.utf8ToString(Files.findName(find)); + if (name != null && !".".equals(name) && !"..".equals(name)) { + String child = dir + "/" + name; + if (!Files.remove(child)) rmDirRec(child); + } + rc = Files.findNext(find); + } + } finally { + Files.findClose(find); + } + } + Files.remove(dir); + } + + /** Seeds {@code frames} frames and returns the slot's published fsn -- + * the drain target the drainer must ack up to. */ + private long seedSlot(int frames) { + try (CursorSendEngine engine = new CursorSendEngine(slotPath, SEGMENT_SIZE_BYTES)) { + long buf = Unsafe.malloc(16, MemoryTag.NATIVE_DEFAULT); + try { + byte[] payload = "frame-bytes-padd".getBytes(StandardCharsets.US_ASCII); + for (int i = 0; i < payload.length; i++) { + Unsafe.getUnsafe().putByte(buf + i, payload[i]); + } + for (int i = 0; i < frames; i++) { + engine.appendBlocking(buf, 16); + } + } finally { + Unsafe.free(buf, 16, MemoryTag.NATIVE_DEFAULT); + } + return engine.publishedFsn(); + } + } + + /** + * Acks every frame (OK + durable-ack, per-connection wire sequence) so a + * reconnected drainer drains to completion. Trimmed-down clone of the + * mid-drain test's healthy-server behaviour. 
+ */ + private static final class AckAllHandler implements TestWebSocketServer.WebSocketServerHandler { + private static final String TABLE = "trades"; + private final java.util.Map<TestWebSocketServer.ClientHandler, long[]> wireSeqByConn = + new java.util.IdentityHashMap<>(); + + @Override + public synchronized void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) { + long[] counter = wireSeqByConn.get(client); + if (counter == null) { + counter = new long[1]; + wireSeqByConn.put(client, counter); + } + long seq = counter[0]++; + try { + client.sendBinary(okFrame(seq, seq)); + client.sendBinary(durableAckFrame(seq)); + } catch (IOException ignored) { + // Best-effort ack: the connection died under us; the client + // replays on its next connection. + } + } + + private static byte[] durableAckFrame(long seqTxn) { + byte[] name = TABLE.getBytes(StandardCharsets.UTF_8); + ByteBuffer bb = ByteBuffer.allocate(1 + 2 + 2 + name.length + 8) + .order(ByteOrder.LITTLE_ENDIAN); + bb.put((byte) 0x02); // STATUS_DURABLE_ACK + bb.putShort((short) 1); // tableCount + bb.putShort((short) name.length); + bb.put(name); + bb.putLong(seqTxn); + return bb.array(); + } + + private static byte[] okFrame(long wireSeq, long seqTxn) { + byte[] name = TABLE.getBytes(StandardCharsets.UTF_8); + ByteBuffer bb = ByteBuffer.allocate(1 + 8 + 2 + 2 + name.length + 8) + .order(ByteOrder.LITTLE_ENDIAN); + bb.put((byte) 0x00); // STATUS_OK + bb.putLong(wireSeq); + bb.putShort((short) 1); // tableCount + bb.putShort((short) name.length); + bb.put(name); + bb.putLong(seqTxn); + return bb.array(); + } + } + + /** Records persistent-failure escalations; the outage must produce none.
*/ + private static final class CountingListener implements BackgroundDrainerListener { + final AtomicInteger persistentFailures = new AtomicInteger(); + + @Override + public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) { + persistentFailures.incrementAndGet(); + } + + @Override + public void onDurableAckUnavailable(String slotPath, int attemptNumber) { + // transport errors never fire this; nothing to record + } + } + + /** + * Real-wire connect factory: every call performs a genuine TCP connect + + * WebSocket upgrade against the fixed loopback port -- refused while the + * server is down, a live upgraded client once it is up. Exactly the client + * the production connect walk would hand back. + */ + private static final class WireFactory implements CursorWebSocketSendLoop.ReconnectFactory { + private final AtomicInteger calls = new AtomicInteger(); + private final int port; + + WireFactory(int port) { + this.port = port; + } + + int attempts() { + return calls.get(); + } + + @Override + public WebSocketClient reconnect() throws Exception { + calls.incrementAndGet(); + WebSocketClient c = WebSocketClientFactory.newPlainTextInstance(); + try { + c.setQwpMaxVersion(1); + c.setQwpRequestDurableAck(true); + c.setConnectTimeout(5_000); + c.connect("localhost", port); + c.upgrade("/write/v4", 5_000, null); + } catch (Throwable t) { + c.close(); + throw t; + } + return c; + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CloseOwnershipRaceTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CloseOwnershipRaceTest.java index f4cbffd1..e04bf15c 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CloseOwnershipRaceTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CloseOwnershipRaceTest.java @@ -24,6 +24,7 @@ package io.questdb.client.test.cutlass.qwp.client.sf.cursor; +import 
io.questdb.client.cutlass.qwp.client.QwpAuthFailedException; import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine; import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop; import org.junit.Assert; @@ -59,16 +60,19 @@ public void closeOwnershipSnapshotNeverClaimsAnUnsurfacedError() { sfDir.getRoot().getAbsolutePath(), 16_384)) { Throwable leaked = null; for (int i = 0; i < ROUNDS && leaked == null; i++) { - // A null client, a reconnect factory that never produces one, - // and a zero reconnect budget: start()'s real I/O thread walks - // the production async-initial-connect path and latches a - // genuine RECONNECT_BUDGET_EXHAUSTED terminal within - // microseconds. One authentic null->error latch transition - // per round. + // A null client and a reconnect factory that throws a genuine + // terminal auth reject: start()'s real I/O thread walks the + // production async-initial-connect path and latches a genuine + // (SECURITY_ERROR) terminal within microseconds. One authentic + // null->error latch transition per round. (Under Invariant B a + // connection error / budget would retry forever and never latch; + // only a genuine terminal like auth does.) 
CursorWebSocketSendLoop loop = new CursorWebSocketSendLoop( null, engine, 0, 1_000_000L, - () -> null, - 0, // reconnect budget: exhausted on arrival + () -> { + throw new QwpAuthFailedException(401, "localhost", 1); + }, + 0, 1, 1); loop.start(); // Race close()'s exact ownership snapshot against the latch diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopInterruptedCloseLeakTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopInterruptedCloseLeakTest.java new file mode 100644 index 00000000..d0e1a8d2 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopInterruptedCloseLeakTest.java @@ -0,0 +1,208 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client.sf.cursor; + +import io.questdb.client.DefaultHttpClientConfiguration; +import io.questdb.client.cutlass.http.client.WebSocketClient; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop; +import io.questdb.client.network.PlainSocketFactory; +import io.questdb.client.std.Compat; +import io.questdb.client.test.tools.TestUtils; +import org.junit.Assert; +import org.junit.Test; + +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.concurrent.atomic.AtomicReference; + +/** + * Red test for finding C5 — interrupted drainer teardown abandons a client + * installed by an in-flight reconnect. + *

    + * Production sequence being modeled: during a server outage an orphan + * drainer's {@link CursorWebSocketSendLoop} I/O thread sits inside a + * blocking native connect ({@code connect_timeout} defaults to 0 = OS + * timeout, tens of seconds; neither {@code unpark} nor interrupt cancels + * {@code connect(2)}). Under Invariant B the drainer no longer exits on a + * wall-clock budget, so {@code BackgroundDrainerPool.close()} routinely + * escalates: 2500 ms graceful drain → {@code requestStop()} → + * 500 ms grace → {@code shutdownNow()}. The {@code shutdownNow()} + * interrupt lands in {@code loop.close()}'s {@code shutdownLatch.await()}; + * pre-fix, {@code close()} swallows the {@link InterruptedException}, + * re-interrupts, and returns while the I/O thread is still alive. When the + * in-flight {@code reconnect()} subsequently succeeds, {@code swapClient} + * installs the live client into the abandoned loop — and no code path ever + * closes it: {@code loop.close()} already ran (its {@code client} read saw + * null), and {@code ioLoop}'s exit path only counts down the latch. The + * client's native socket, fds and buffers leak for the life of the process. + *

    + * The test pins the fix-agnostic ownership contract, not a fix strategy: + * every {@code WebSocketClient} the loop obtains — via constructor or + * factory — must be closed by the time the loop is quiescent (I/O thread + * exited, {@code close()} completed or failed loudly). Any of the + * candidate fixes satisfies it: (a) re-awaiting the shutdown latch in a + * loop (close() then picks up the swapped client), (b) closing the current + * client in {@code ioLoop}'s exit path, or (c) {@code connectLoop} + * discarding-and-closing a factory client obtained after {@code running} + * went false. A guard-only fix that merely skips engine teardown (the + * SEGV half of C5) correctly leaves this test red — the leak is a distinct + * defect. + *

    + * Determinism notes: no sleeps or timing races. The interrupt is injected + * by pre-setting the closer thread's interrupt flag — + * {@code CountDownLatch.await()} checks {@code Thread.interrupted()} before + * parking, so the swallow path is entered on the first call. The "stuck + * native connect" is a factory blocked on a test latch, which is faithful: + * a latch await re-parks after {@code close()}'s spurious {@code unpark}, + * and {@code close()} never interrupts the I/O thread. + */ +public class CursorWebSocketSendLoopInterruptedCloseLeakTest { + + @Test + public void testC5_interruptedCloseMustNotLeakClientInstalledByInFlightReconnect() throws Exception { + TestUtils.assertMemoryLeak(() -> { + final CountDownLatch enteredReconnect = new CountDownLatch(1); + final CountDownLatch releaseConnect = new CountDownLatch(1); + final AtomicReference<Thread> ioThreadRef = new AtomicReference<>(); + final TrackingStubWebSocketClient liveClient = new TrackingStubWebSocketClient(); + + // Stand-in for a blocking native connect(2): entered by the loop's + // I/O thread, immune to unpark, never interrupted by loop.close(). + // Returns a live client once released — the "reconnect succeeds + // mid-teardown" arm of C5.
+ final CursorWebSocketSendLoop.ReconnectFactory stuckConnect = () -> { + ioThreadRef.set(Thread.currentThread()); + enteredReconnect.countDown(); + releaseConnect.await(); + return liveClient; + }; + + final CursorSendEngine engine = new CursorSendEngine(null, 64 * 1024); + try { + CursorWebSocketSendLoop loop = new CursorWebSocketSendLoop( + null /* async-initial-connect: the I/O thread drives the connect */, + engine, 0L, 1_000L, + stuckConnect, + 5_000L, 100L, 5_000L, false); + loop.start(); + Assert.assertTrue("I/O thread never reached the reconnect factory", + enteredReconnect.await(5, TimeUnit.SECONDS)); + + // Drainer-thread stand-in: BackgroundDrainer.run()'s finally calls + // loop.close() and shutdownNow()'s interrupt lands in the latch + // await. Pre-setting the flag makes that deterministic. + final AtomicReference<Throwable> closeFailure = new AtomicReference<>(); + Thread closer = new Thread(() -> { + Thread.currentThread().interrupt(); + try { + loop.close(); + } catch (Throwable t) { + // A close() that THROWS to signal the failed stop is a + // valid fix shape (QwpWebSocketSender.close()'s + // ioThreadStopped guard consumes exactly that signal). + // The ownership assertion below is what must hold. + closeFailure.set(t); + } + }, "drainer-close-stand-in"); + closer.setDaemon(true); + closer.start(); + + // close()'s first action is running=false. Once observable, the + // teardown is underway and the "connect" may complete. Under a + // re-await fix the closer is still blocked inside close() here, + // so the gate must open before joining it (fix-agnostic order).
+ long deadlineNanos = System.nanoTime() + TimeUnit.SECONDS.toNanos(5); + while (loop.isRunning()) { + Assert.assertTrue("close() never started", System.nanoTime() < deadlineNanos); + Compat.onSpinWait(); + } + releaseConnect.countDown(); + + closer.join(5_000L); + Assert.assertFalse("closer thread did not finish", closer.isAlive()); + Thread ioThread = ioThreadRef.get(); + Assert.assertNotNull(ioThread); + ioThread.join(5_000L); + Assert.assertFalse("I/O thread did not exit after the connect returned", + ioThread.isAlive()); + + // Loop is quiescent. Capture the verdict BEFORE any cleanup so + // the test's own close calls cannot mask the leak. + boolean closedByLoop = liveClient.closeCount() > 0; + + Assert.assertTrue( + "C5: the WebSocketClient handed to the loop by an in-flight " + + "reconnect() was never closed. loop.close() swallowed the " + + "InterruptedException from shutdownLatch.await() and returned " + + "while the I/O thread was still inside the blocking connect; " + + "swapClient then installed the live client into the abandoned " + + "loop where nothing closes it — its native socket and fds leak " + + "past drainer teardown. Every client the loop obtains " + + "(constructor or factory) must be closed by the time the loop " + + "is quiescent.", + closedByLoop); + } finally { + liveClient.close(); + engine.close(); + } + }); + } + + /** + * Minimal concrete {@link WebSocketClient} — never performs I/O; counts + * {@code close()} calls so the test can assert ownership at quiescence. + * Close remains idempotent via the superclass, matching the production + * contract owners rely on. 
+ */ + private static final class TrackingStubWebSocketClient extends WebSocketClient { + private final AtomicInteger closeCount = new AtomicInteger(); + + TrackingStubWebSocketClient() { + super(DefaultHttpClientConfiguration.INSTANCE, PlainSocketFactory.INSTANCE); + } + + @Override + public void close() { + closeCount.incrementAndGet(); + super.close(); + } + + int closeCount() { + return closeCount.get(); + } + + @Override + protected void ioWait(int timeout, int op) { + throw new UnsupportedOperationException("stub: no socket"); + } + + @Override + protected void setupIoWait() { + // no-op + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopJvmErrorTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopJvmErrorTest.java new file mode 100644 index 00000000..42e91748 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopJvmErrorTest.java @@ -0,0 +1,184 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * + ******************************************************************************/ + +package io.questdb.client.test.cutlass.qwp.client.sf.cursor; + +import io.questdb.client.cutlass.line.LineSenderException; +import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop; +import io.questdb.client.std.Unsafe; +import org.junit.Assert; +import org.junit.Test; + +import java.lang.reflect.Field; +import java.lang.reflect.InvocationTargetException; +import java.lang.reflect.Method; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.atomic.AtomicInteger; +import java.util.concurrent.atomic.AtomicLong; + +/** + * Regression coverage (M3): {@code catch (Throwable)} in the reconnect + * machinery used to swallow {@link java.lang.Error} (OOM, LinkageError, + * StackOverflowError) into an indefinite "transport outage" retry with only + * a throttled, possibly-null-message WARN as a trace. A JVM/programming + * failure is not a transport outage -- retrying cannot clear it -- so every + * retry loop must rethrow {@code Error}, after latching it as terminal where + * a producer could otherwise hang in {@code checkError()}. + *

    + * Uses the same {@code Unsafe.allocateInstance} bare-loop pattern as + * {@link CursorWebSocketSendLoopErrorLatchTest}: the retry loops only touch + * the fields wired below, so no live wire client or engine is needed. + */ +public class CursorWebSocketSendLoopJvmErrorTest { + + @Test + public void testConnectWithRetryPropagatesJvmError() { + // The budgeted blocking initial-connect helper must not burn the + // connect budget retrying a JVM Error; it propagates to the caller + // (the producer thread in buildAndConnect) on the first attempt. + AtomicInteger attempts = new AtomicInteger(); + try { + CursorWebSocketSendLoop.connectWithRetry( + () -> { + attempts.incrementAndGet(); + throw new LinkageError("simulated JVM failure"); + }, + /* maxDurationMillis */ 60_000L, + /* initialBackoffMillis */ 1L, + /* maxBackoffMillis */ 4L, + "test initial connect"); + Assert.fail("a JVM Error must propagate, not consume the connect budget"); + } catch (LinkageError expected) { + Assert.assertEquals("simulated JVM failure", expected.getMessage()); + } + Assert.assertEquals("no retry on a JVM Error", 1, attempts.get()); + } + + @Test + public void testConnectLoopPropagatesJvmErrorAndLatchesTerminal() throws Exception { + // The background per-outage reconnect loop must (1) latch the Error + // as terminal FIRST -- a producer parked in checkError() would + // otherwise never observe the failure -- and (2) rethrow so the I/O + // thread dies loudly instead of reconnect-looping forever. 
+ CursorWebSocketSendLoop loop = newBareLoop(); + AtomicInteger attempts = new AtomicInteger(); + wireReconnectPlumbing(loop, attempts); + + Method connectLoop = CursorWebSocketSendLoop.class.getDeclaredMethod( + "connectLoop", Throwable.class, String.class); + connectLoop.setAccessible(true); + try { + connectLoop.invoke(loop, new LineSenderException("initial wire failure"), "reconnect"); + Assert.fail("a JVM Error must escape connectLoop, not be retried"); + } catch (InvocationTargetException ite) { + Assert.assertTrue("expected LinkageError, got " + ite.getCause(), + ite.getCause() instanceof LinkageError); + } + Assert.assertEquals("no retry on a JVM Error", 1, attempts.get()); + assertErrorLatchedAndStopped(loop); + } + + @Test + public void testIoLoopDoesNotFunnelJvmErrorIntoReconnect() throws Exception { + // ioLoop's catch (Throwable) used to funnel EVERYTHING into + // fail(t) -> connectLoop(t, "reconnect"). An Error must instead be + // latched as terminal and rethrown; the finally still counts down + // the shutdown latch so close() cannot hang on the dead thread. + CursorWebSocketSendLoop loop = newBareLoop(); + AtomicInteger attempts = new AtomicInteger(); + wireReconnectPlumbing(loop, attempts); + CountDownLatch shutdownLatch = new CountDownLatch(1); + setField(loop, "shutdownLatch", shutdownLatch); + // client == null + running routes ioLoop into attemptInitialConnect + // -> connectLoop -> the throwing factory, exercising the full funnel. 
+ + Method ioLoop = CursorWebSocketSendLoop.class.getDeclaredMethod("ioLoop"); + ioLoop.setAccessible(true); + try { + ioLoop.invoke(loop); + Assert.fail("a JVM Error must escape ioLoop, not re-enter the reconnect loop"); + } catch (InvocationTargetException ite) { + Assert.assertTrue("expected LinkageError, got " + ite.getCause(), + ite.getCause() instanceof LinkageError); + } + Assert.assertEquals("no retry on a JVM Error", 1, attempts.get()); + Assert.assertEquals("shutdown latch must count down so close() cannot hang", + 0L, shutdownLatch.getCount()); + assertErrorLatchedAndStopped(loop); + } + + private static void assertErrorLatchedAndStopped(CursorWebSocketSendLoop loop) + throws Exception { + Throwable terminal = loop.getTerminalError(); + Assert.assertNotNull("Error must be latched as terminal for checkError()", terminal); + Assert.assertTrue("latch wraps the raw Error once", + terminal instanceof LineSenderException); + Assert.assertTrue("latched cause must be the original Error", + terminal.getCause() instanceof LinkageError); + Assert.assertFalse("recordFatal must stop the loop", + (Boolean) getField(loop, "running")); + try { + loop.checkError(); + Assert.fail("producer-facing checkError must surface the latched terminal"); + } catch (LineSenderException thrown) { + Assert.assertSame(terminal, thrown); + } + } + + /** + * Wires the minimal state the reconnect paths dereference: a factory + * throwing {@link LinkageError} on every attempt, live {@code running}, + * and the attempt counters (field initializers do not run under + * {@code Unsafe.allocateInstance}). 
+ */ + private static void wireReconnectPlumbing(CursorWebSocketSendLoop loop, + AtomicInteger attempts) throws Exception { + CursorWebSocketSendLoop.ReconnectFactory factory = () -> { + attempts.incrementAndGet(); + throw new LinkageError("simulated JVM failure"); + }; + setField(loop, "reconnectFactory", factory); + setField(loop, "running", true); + setField(loop, "totalReconnectAttempts", new AtomicLong()); + setField(loop, "totalReconnects", new AtomicLong()); + } + + private static CursorWebSocketSendLoop newBareLoop() throws Exception { + // Bypass the real constructor -- no wire client or engine needed. + return (CursorWebSocketSendLoop) Unsafe.getUnsafe() + .allocateInstance(CursorWebSocketSendLoop.class); + } + + private static Object getField(Object target, String name) throws Exception { + Field f = CursorWebSocketSendLoop.class.getDeclaredField(name); + f.setAccessible(true); + return f.get(target); + } + + private static void setField(Object target, String name, Object value) throws Exception { + Field f = CursorWebSocketSendLoop.class.getDeclaredField(name); + f.setAccessible(true); + f.set(target, value); + } +} diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/EngineCloseSlotLockReleaseTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/EngineCloseSlotLockReleaseTest.java index 19b51848..804ab6d3 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/EngineCloseSlotLockReleaseTest.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/EngineCloseSlotLockReleaseTest.java @@ -135,6 +135,14 @@ public void testSlotLockReleasedEvenIfRingCloseThrows() throws Exception { managerField.setAccessible(true); SegmentManager capturedManager = (SegmentManager) managerField.get(engine); + // The watermark's 16-byte mmap is also unreachable to the sabotaged + // close() (it NPEs before getting there), so capture and free it + // manually too or 
the leak check trips on MMAP_DEFAULT. + Field watermarkField = CursorSendEngine.class.getDeclaredField("watermark"); + watermarkField.setAccessible(true); + io.questdb.client.cutlass.qwp.client.sf.cursor.AckWatermark capturedWatermark = + (io.questdb.client.cutlass.qwp.client.sf.cursor.AckWatermark) watermarkField.get(engine); + ringField.set(engine, null); try { @@ -150,6 +158,9 @@ public void testSlotLockReleasedEvenIfRingCloseThrows() throws Exception { // are an artifact of the sabotage. capturedRing.close(); capturedManager.close(); + if (capturedWatermark != null) { + capturedWatermark.close(); + } // The user-visible test: can a fresh SlotLock acquire the // same slot? If the original lock fd is still held, the diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/websocket/TestWebSocketServer.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/websocket/TestWebSocketServer.java index 806d3750..e9f380d2 100644 --- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/websocket/TestWebSocketServer.java +++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/websocket/TestWebSocketServer.java @@ -83,12 +83,25 @@ public class TestWebSocketServer implements Closeable { // QwpQueryClient tests enable this; ingress sender tests leave it off so their // connections carry only ACK frames. private volatile boolean sendServerInfo; + // When true, the server fails the WebSocket upgrade on the egress read path + // (/read...) by dropping the connection before the 101, while still serving + // the ingest write path (/write...) normally. Lets one server + one cluster + // config drive a build where the sender pool connects but the query pool + // cannot. Set via setRejectReadUpgrade(). + private volatile boolean rejectReadUpgrade; // When non-null the next handshake responds with HTTP 421 Misdirected // Request + X-QuestDB-Role: , mimicking a server whose // QwpServerInfoProvider reports REPLICA / PRIMARY_CATCHUP. 
Set after // construction via setRejectWithRole(). private volatile String rejectingRole; private volatile int rejectingStatusCode; + // When true, 101 upgrade responses omit the X-QWP-Durable-Ack header even + // though the server was constructed with emitDurableAckHeader=true -- + // simulating a rolling-upgrade window where an endpoint upgrades but does + // not advertise durable ack (the drainer's capability-gap condition). + // Live-updatable via setSuppressDurableAckHeader(), so a test can start + // in the gap and later let the cluster "settle". + private volatile boolean suppressDurableAckHeader; // When > 0, the next handshake responds with this status code + the // reason phrase from {@link #rejectingStatusReason}. Used to simulate // 401, 403, 404, 426, 503, etc. that the failover loop should @@ -120,6 +133,23 @@ public TestWebSocketServer(WebSocketServerHandler handler, boolean emitDurableAc */ public TestWebSocketServer(WebSocketServerHandler handler, boolean emitDurableAckHeader, String advertisedRole) throws IOException { + this(handler, emitDurableAckHeader, advertisedRole, 0); + } + + /** + * @param requestedPort loopback port to bind, or {@code 0} for an + * OS-assigned ephemeral port. A caller-chosen port + * lets a test model a server that goes DOWN and later + * comes back UP on the SAME endpoint (down-then-up + * outage realism): allocate via + * {@code TestPorts.findUnusedPort()}, let the client + * bang on the refused port, then bind here. Carries + * the standard bind-close-reuse exposure every + * pre-selected-port test in this suite accepts. 
+ */ + public TestWebSocketServer(WebSocketServerHandler handler, + boolean emitDurableAckHeader, String advertisedRole, + int requestedPort) throws IOException { this.handler = handler; this.emitDurableAckHeader = emitDurableAckHeader; this.advertisedRole = advertisedRole; @@ -129,7 +159,7 @@ public TestWebSocketServer(WebSocketServerHandler handler, // which another process could grab a pre-selected port before start() // binds it. Pinning to loopback keeps client "localhost" connections // routed here rather than to a wildcard listener on the same port. - serverSocket = new ServerSocket(0, 50, java.net.InetAddress.getLoopbackAddress()); + serverSocket = new ServerSocket(requestedPort, 50, java.net.InetAddress.getLoopbackAddress()); serverSocket.setSoTimeout(100); this.port = serverSocket.getLocalPort(); } @@ -208,6 +238,18 @@ public void setRejectWithRole(String role) { this.rejectingRole = role; } + /** + * When enabled, the server fails the WebSocket upgrade on the egress read + * path ({@code /read/...}) while still serving the ingest write path + * ({@code /write/...}) normally. This lets a single server, addressed by a + * single cluster config, accept ingest senders but reject query clients -- + * e.g. to exercise build()'s unwind of an already-built sender pool when the + * query pool fails. + */ + public void setRejectReadUpgrade(boolean rejectReadUpgrade) { + this.rejectReadUpgrade = rejectReadUpgrade; + } + /** * Configure the server to reject the next handshake with an arbitrary * HTTP status code (e.g. 401, 403, 404, 426, 503). Pass {@code 0} to @@ -219,11 +261,26 @@ public void setRejectWithStatus(int statusCode, String reasonPhrase) { this.rejectingStatusReason = reasonPhrase; } + /** + * When enabled, 101 upgrade responses omit the {@code X-QWP-Durable-Ack} + * header even on a server constructed with {@code emitDurableAckHeader} — + * the next opted-in connect ({@code request_durable_ack=on}) observes a + * durable-ack capability gap. 
Pass {@code false} to clear and resume + * advertising, the way a rolling upgrade eventually settles. The setting + * applies to every new handshake until cleared. + */ + public void setSuppressDurableAckHeader(boolean suppressDurableAckHeader) { + this.suppressDurableAckHeader = suppressDurableAckHeader; + } + /** * When enabled, the server sends a {@code SERVER_INFO} frame immediately - * after a successful 101 upgrade, the way a real egress endpoint does. The - * advertised role follows {@link #setAdvertisedRole}, defaulting to - * {@code STANDALONE}. Leave disabled for ingress (Sender) tests. + * after a successful 101 upgrade on the egress read path ({@code /read/...}), + * the way a real egress endpoint does. Ingest write-path ({@code /write/...}) + * connections never receive it -- their ACK-only response stream would choke + * on an unexpected frame -- so one server can serve both an ingest and a + * query pool from a single cluster config. The advertised role follows + * {@link #setAdvertisedRole}, defaulting to {@code STANDALONE}. */ public void setSendServerInfo(boolean sendServerInfo) { this.sendServerInfo = sendServerInfo; @@ -251,6 +308,10 @@ private static byte[] buildServerInfoFrame(byte role) { return bb.array(); } + private static boolean isReadPath(String path) { + return path != null && path.startsWith("/read"); + } + private static byte roleByte(String role) { if (role == null) { return 0; // ROLE_STANDALONE @@ -313,6 +374,10 @@ public class ClientHandler implements Closeable { private boolean isClosed; private OutputStream out; private Thread readThread; + // Request path from the WebSocket upgrade GET line (e.g. /write/v4, + // /read/v1). Captured during the handshake so the post-upgrade logic can + // distinguish ingest from egress connections. 
+ private String requestPath = ""; ClientHandler(Socket socket) { this.socket = socket; @@ -459,7 +524,15 @@ private boolean performHandshake() throws IOException { } String key = null; - for (String line : request.toString().split("\r\n")) { + String[] lines = request.toString().split("\r\n"); + if (lines.length > 0) { + // GET HTTP/1.1 + String[] parts = lines[0].split(" "); + if (parts.length >= 2) { + requestPath = parts[1]; + } + } + for (String line : lines) { if (line.toLowerCase().startsWith("sec-websocket-key:")) { key = line.substring(18).trim(); break; @@ -470,6 +543,13 @@ private boolean performHandshake() throws IOException { return false; } + // Read-path reject: drop the egress upgrade before the 101 so the + // query pool's connect fails fast, while ingest write-path upgrades + // still complete on this same server. + if (rejectReadUpgrade && isReadPath(requestPath)) { + return false; + } + // Arbitrary-status reject path: tests use setRejectWithStatus // to drive the failover loop's terminal-vs-transient // classification (failover.md §6). @@ -509,7 +589,7 @@ private boolean performHandshake() throws IOException { .append("Upgrade: websocket\r\n") .append("Connection: Upgrade\r\n") .append("Sec-WebSocket-Accept: ").append(acceptKey).append("\r\n"); - if (emitDurableAckHeader) { + if (emitDurableAckHeader && !suppressDurableAckHeader) { sb.append("X-QWP-Durable-Ack: enabled\r\n"); } String role = advertisedRole; @@ -566,7 +646,11 @@ void start() { liveConnections.incrementAndGet(); try { - if (sendServerInfo) { + // SERVER_INFO is an egress-only frame: send it only on a + // read-path (query) connection. An ingest write-path + // connection parses every inbound frame as an ACK and + // would fail on it. 
+ if (sendServerInfo && isReadPath(requestPath)) { sendBinary(buildServerInfoFrame(roleByte(advertisedRole))); } diff --git a/core/src/test/java/io/questdb/client/test/example/QuestDBExamples.java b/core/src/test/java/io/questdb/client/test/example/QuestDBExamples.java index bd3e944a..1aa681f4 100644 --- a/core/src/test/java/io/questdb/client/test/example/QuestDBExamples.java +++ b/core/src/test/java/io/questdb/client/test/example/QuestDBExamples.java @@ -44,11 +44,11 @@ public class QuestDBExamples { public static void main(String[] args) throws Exception { - // 1. Connect with a single configuration string. Both sides run over - // QWP/WebSocket, so one ws:: string configures ingest and egress. - try (QuestDB db = QuestDB.connect("ws::addr=localhost:9000;")) { + // 1. Connect with a single configuration string for the whole cluster. + // Both sides run over QWP/WebSocket, so one ws:: string configures + // ingest and egress; list every node in one addr server list. + try (QuestDB db = QuestDB.connect("ws::addr=node1:9000,node2:9000,node3:9000;")) { ingestWithBorrowedSender(db); - ingestWithThreadAffineSender(db); queryOneShot(db); queryWithBinds(db); cancelExample(db); @@ -59,21 +59,24 @@ public static void main(String[] args) throws Exception { try (QuestDB db = QuestDB.connect( "wss::addr=db.questdb.cloud:9000;token=YOUR_TOKEN_HERE;")) { // ... use db ... - db.executeSql("SELECT 1", new PrintingHandler()).await(); + try (Query q = db.borrowQuery()) { + q.sql("SELECT 1").handler(new PrintingHandler()).submit().await(); + } } - // 3. Custom pool sizing and timeouts via the builder. Use this when - // ingest and egress use separate address lists, or when you need to - // override defaults. + // 3. Custom pool sizing and timeouts via the builder. One cluster config + // (a single addr server list) drives both pools; use the builder to + // override pool/timeout defaults. 
try (QuestDB db = QuestDB.builder() - .ingestConfig("ws::addr=ingest.cluster:9000;") - .queryConfig("ws::addr=read-replica.cluster:9000;") + .fromConfig("ws::addr=node1.cluster:9000,node2.cluster:9000;") .senderPoolSize(8) .queryPoolSize(4) .acquireTimeoutMillis(10_000) .build()) { // ... use db ... - db.executeSql("SELECT 1", new PrintingHandler()).await(); + try (Query q = db.borrowQuery()) { + q.sql("SELECT 1").handler(new PrintingHandler()).submit().await(); + } } } @@ -84,15 +87,17 @@ public static void main(String[] args) throws Exception { * returns normally; either way the Completion reaches a terminal state. */ static void cancelExample(QuestDB db) { - Completion c = db.executeSql( - "SELECT * FROM big_table ORDER BY ts", - new PrintingHandler()); - // ... some condition decides to abort ... - c.cancel(); - try { - c.await(); - } catch (Exception cancelled) { - // expected when cancel won the race + try (Query q = db.borrowQuery()) { + Completion c = q.sql("SELECT * FROM big_table ORDER BY ts") + .handler(new PrintingHandler()) + .submit(); + // ... some condition decides to abort ... + c.cancel(); + try { + c.await(); + } catch (Exception cancelled) { + // expected when cancel won the race + } } } @@ -113,62 +118,42 @@ static void ingestWithBorrowedSender(QuestDB db) { } /** - * Thread-affine Sender: the first call on a thread leases one and pins it; - * subsequent calls on the same thread return the same instance with zero - * borrow overhead. Best for long-lived dedicated producer threads. - *
<p>
    - * Call {@link QuestDB#releaseSender()} on threads borrowed from pools you - * don't own (Netty event loops, etc.) before they're recycled. - */ - static void ingestWithThreadAffineSender(QuestDB db) { - Sender s = db.sender(); - for (int i = 0; i < 1_000; i++) { - s.table("trades") - .symbol("symbol", "BTC-USD") - .doubleColumn("price", 42_500.50 + i) - .longColumn("size", 100) - .atNow(); - } - s.flush(); - // Not strictly required: db.close() reaps pinned Senders. Call it - // only when handing this thread back to a foreign pool. - // db.releaseSender(); - } - - /** - * One-shot query, no bind parameters. {@link QuestDB#executeSql} returns - * a {@link Completion} that you can {@code await()} synchronously, time + * One-shot query, no bind parameters. Borrow a {@link Query} handle, + * submit, await, and close it (try-with-resources). {@code submit()} + * returns a {@link Completion} you can {@code await()} synchronously, time * out on, or cancel. */ static void queryOneShot(QuestDB db) throws InterruptedException { - Completion c = db.executeSql( - "SELECT price FROM trades WHERE symbol = 'BTC-USD' LIMIT 10", - new PrintingHandler()); - c.await(); + try (Query q = db.borrowQuery()) { + q.sql("SELECT price FROM trades WHERE symbol = 'BTC-USD' LIMIT 10") + .handler(new PrintingHandler()) + .submit() + .await(); + } } /** - * Query with bind parameters. Use {@link QuestDB#query()} to get the - * per-thread Query builder, then set SQL, binds (via QwpBindSetter), and - * handler. + * Query with bind parameters. Borrow a {@link Query} handle, then set SQL, + * binds (via QwpBindSetter), and handler. *
<p>
    * The same SQL text reuses the server's compiled-factory cache -- bind * values supply the per-call inputs. Interpolating values into the SQL * string defeats that cache. */ static void queryWithBinds(QuestDB db) throws InterruptedException { - Query q = db.query() - .sql("SELECT price FROM trades WHERE symbol = $1 LIMIT $2") - .binds(binds -> { - binds.setVarchar(0, "BTC-USD"); - binds.setLong(1, 10L); - }) - .handler(new PrintingHandler()); - Completion c = q.submit(); - // Optional timeout: returns false if the query is still in flight. - if (!c.await(5, TimeUnit.SECONDS)) { - c.cancel(); - c.await(); + try (Query q = db.borrowQuery()) { + q.sql("SELECT price FROM trades WHERE symbol = $1 LIMIT $2") + .binds(binds -> { + binds.setVarchar(0, "BTC-USD"); + binds.setLong(1, 10L); + }) + .handler(new PrintingHandler()); + Completion c = q.submit(); + // Optional timeout: returns false if the query is still in flight. + if (!c.await(5, TimeUnit.SECONDS)) { + c.cancel(); + c.await(); + } } } diff --git a/core/src/test/java/io/questdb/client/test/impl/ConfigViewTest.java b/core/src/test/java/io/questdb/client/test/impl/ConfigViewTest.java index 38891719..d8258c3b 100644 --- a/core/src/test/java/io/questdb/client/test/impl/ConfigViewTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/ConfigViewTest.java @@ -129,6 +129,35 @@ public void testGetLongNonNumericRejected() { "invalid auth_timeout_ms: abc"); } + @Test + public void testGetBoolAcceptsTrueFalseOnOff() { + Assert.assertTrue(view("ws::addr=h:9000;lazy_connect=true;").getBool("lazy_connect", false)); + Assert.assertTrue(view("ws::addr=h:9000;lazy_connect=on;").getBool("lazy_connect", false)); + Assert.assertFalse(view("ws::addr=h:9000;lazy_connect=false;").getBool("lazy_connect", true)); + Assert.assertFalse(view("ws::addr=h:9000;lazy_connect=off;").getBool("lazy_connect", true)); + // absent key -> caller's default, both polarities + Assert.assertTrue(view("ws::addr=h:9000;").getBool("lazy_connect", 
true)); + Assert.assertFalse(view("ws::addr=h:9000;").getBool("lazy_connect", false)); + } + + @Test + public void testGetBoolInvalidRejected() { + assertParseError("ws::addr=h:9000;lazy_connect=maybe;", + v -> v.getBool("lazy_connect", false), + "invalid lazy_connect: maybe (expected true, false, on, off)"); + } + + @Test + public void testGetBoolIsCaseSensitive() { + // The connect-string value surface is exact-match lowercase: the + // tokenizer preserves value case and getBool accepts only + // true/false/on/off, so TRUE is rejected loudly rather than silently + // coerced (or worse, silently treated as the default). + assertParseError("ws::addr=h:9000;lazy_connect=TRUE;", + v -> v.getBool("lazy_connect", false), + "invalid lazy_connect: TRUE (expected true, false, on, off)"); + } + @Test public void testGetBoolOnOffInvalidRejected() { assertParseError("ws::addr=h:9000;failover=maybe;", diff --git a/core/src/test/java/io/questdb/client/test/impl/PoolConfigHonoredTest.java b/core/src/test/java/io/questdb/client/test/impl/PoolConfigHonoredTest.java index 34ba4d1a..a8c94e40 100644 --- a/core/src/test/java/io/questdb/client/test/impl/PoolConfigHonoredTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/PoolConfigHonoredTest.java @@ -28,6 +28,7 @@ import io.questdb.client.QuestDBBuilder; import io.questdb.client.impl.ConfigSchema; import io.questdb.client.impl.Side; +import io.questdb.client.test.tools.TestUtils; import org.junit.Assert; import org.junit.Test; @@ -43,40 +44,49 @@ public class PoolConfigHonoredTest { @Test - public void testEveryPoolKeyIsHonored() { - // Drive both the value assertions and the drift guard from one map, so the - // coverage check cannot drift from what is actually asserted. min=0 keys - // let build() resolve the pool keys without pre-warming/connecting. Pool - // sizes resolve to int, the timeouts to long (the snapshot's boxed types). 
- Map expected = new LinkedHashMap<>(); - expected.put("sender_pool_min", 0); - expected.put("sender_pool_max", 7); - expected.put("query_pool_min", 0); - expected.put("query_pool_max", 5); - expected.put("acquire_timeout_ms", 1234L); - expected.put("idle_timeout_ms", 4321L); - expected.put("max_lifetime_ms", 98765L); - expected.put("housekeeper_interval_ms", 222L); + public void testEveryPoolKeyIsHonored() throws Exception { + TestUtils.assertMemoryLeak(() -> { + // Drive both the value assertions and the drift guard from one map, so the + // coverage check cannot drift from what is actually asserted. min=0 keys + // let build() resolve the pool keys without pre-warming/connecting. Pool + // sizes resolve to int, the timeouts to long (the snapshot's boxed types). + Map expected = new LinkedHashMap<>(); + expected.put("sender_pool_min", 0); + expected.put("sender_pool_max", 7); + expected.put("query_pool_min", 0); + expected.put("query_pool_max", 5); + expected.put("acquire_timeout_ms", 1234L); + expected.put("query_close_timeout_ms", 2468L); + expected.put("idle_timeout_ms", 4321L); + expected.put("max_lifetime_ms", 98765L); + expected.put("housekeeper_interval_ms", 222L); - StringBuilder cfg = new StringBuilder("ws::addr=127.0.0.1:1;"); - for (Map.Entry e : expected.entrySet()) { - cfg.append(e.getKey()).append('=').append(e.getValue()).append(';'); - } - QuestDBBuilder b = QuestDB.builder().fromConfig(cfg.toString()); - b.build().close(); + StringBuilder cfg = new StringBuilder("ws::addr=127.0.0.1:1;"); + for (Map.Entry e : expected.entrySet()) { + cfg.append(e.getKey()).append('=').append(e.getValue()).append(';'); + } + QuestDBBuilder b = QuestDB.builder().fromConfig(cfg.toString()); + b.build().close(); - Map snap = b.poolConfigSnapshotForTest(); - for (Map.Entry e : expected.entrySet()) { - Assert.assertEquals("pool key '" + e.getKey() + "' not honored", e.getValue(), snap.get(e.getKey())); - } + Map snap = b.poolConfigSnapshotForTest(); + for (Map.Entry e : 
expected.entrySet()) { + Assert.assertEquals("pool key '" + e.getKey() + "' not honored", e.getValue(), snap.get(e.getKey())); + } - // Drift guard: every POOL registry key must appear in the map that drove - // the assertions above, so a new pool key with no assertion trips this. - for (ConfigSchema.KeySpec spec : ConfigSchema.all()) { - if (spec.side() == Side.POOL) { - Assert.assertTrue("registry pool key '" + spec.name() + "' has no honored assertion", - expected.containsKey(spec.name())); + // Drift guard: every POOL registry key must appear in the map that drove + // the assertions above, so a new pool key with no assertion trips this. + for (ConfigSchema.KeySpec spec : ConfigSchema.all()) { + if (spec.side() == Side.POOL) { + // lazy_connect is a facade flag (build()'s tolerant-startup + // branch, covered by QuestDBLazyConnectTest), not a numeric + // pool-sizing knob resolved into the snapshot. + if ("lazy_connect".equals(spec.name())) { + continue; + } + Assert.assertTrue("registry pool key '" + spec.name() + "' has no honored assertion", + expected.containsKey(spec.name())); + } } - } + }); } } diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolErrorSafetyTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolErrorSafetyTest.java index 3994a1d2..3ef9a1b0 100644 --- a/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolErrorSafetyTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolErrorSafetyTest.java @@ -30,11 +30,10 @@ import io.questdb.client.impl.QueryWorker; import io.questdb.client.std.MemoryTag; import io.questdb.client.std.Unsafe; +import io.questdb.client.test.tools.TestUtils; import org.junit.Assert; import org.junit.Test; -import java.lang.reflect.Constructor; -import java.lang.reflect.Method; import java.util.concurrent.atomic.AtomicInteger; import java.util.function.Consumer; @@ -44,8 +43,8 @@ // OutOfMemoryError); the old catches let that Error skip cleanup. 
// // QwpQueryClient is a concrete class with no fake seam, so these tests inject an -// Error at the real connect step via the package-private connectHook constructor -// (reached by reflection -- the main module is declared `open`). fromConfig() +// Error at the real connect step via the public connectHook constructor. +// fromConfig() // still runs for real, committing the NATIVE_DEFAULT scratch the cleanup must // reclaim, so the memory assertions are meaningful. public class QueryClientPoolErrorSafetyTest { @@ -61,22 +60,24 @@ public class QueryClientPoolErrorSafetyTest { // GREEN: catch (Throwable) -> client.close() runs -> no leak. @Test(timeout = 30_000) public void acquireDoesNotLeakNativeScratchOnErrorFromConnect() throws Exception { - QueryClientPool pool = newPool(CFG, 0, 1, 250, alwaysThrow()); - try { - long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + TestUtils.assertMemoryLeak(() -> { + QueryClientPool pool = newPool(CFG, 0, 1, 250, alwaysThrow()); try { - pool.acquire(); - Assert.fail("expected acquire() to propagate the injected Error"); - } catch (Throwable expected) { - // wrapped or raw -- the leak check is the discriminator + long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + try { + pool.acquire(); + Assert.fail("expected acquire() to propagate the injected Error"); + } catch (Throwable expected) { + // wrapped or raw -- the leak check is the discriminator + } + long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + Assert.assertEquals( + "acquire() leaked NATIVE_DEFAULT scratch on an Error from connect()", + baseline, after); + } finally { + pool.close(); } - long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); - Assert.assertEquals( - "acquire() leaked NATIVE_DEFAULT scratch on an Error from connect()", - baseline, after); - } finally { - pool.close(); - } + }); } // Site: acquire() outer catch around createUnlocked()/start(). 
An Error must @@ -86,35 +87,37 @@ public void acquireDoesNotLeakNativeScratchOnErrorFromConnect() throws Exception // GREEN: catch (Throwable) -> inFlightCreations restored to 0. @Test(timeout = 30_000) public void acquireRestoresInFlightCreationsOnErrorFromConnect() throws Exception { - QueryClientPool pool = newPool(CFG, 0, 1, 250, alwaysThrow()); - try { + TestUtils.assertMemoryLeak(() -> { + QueryClientPool pool = newPool(CFG, 0, 1, 250, alwaysThrow()); try { - pool.acquire(); - Assert.fail("expected acquire() to propagate the injected Error"); - } catch (Throwable expected) { - // expected - } + try { + pool.acquire(); + Assert.fail("expected acquire() to propagate the injected Error"); + } catch (Throwable expected) { + // expected + } - Assert.assertEquals( - "acquire() leaked an in-flight creation slot on an Error from connect()", - 0, inFlightCreations(pool)); + Assert.assertEquals( + "acquire() leaked an in-flight creation slot on an Error from connect()", + 0, inFlightCreations(pool)); - // Corollary: capacity is usable again -- the next acquire() must - // reach the creation path (and fail there) rather than time out. - try { - pool.acquire(); - Assert.fail("expected second acquire() to re-attempt creation"); - } catch (QueryException e) { - Assert.assertFalse( - "pool wedged: second acquire() timed out -> capacity permanently lost (" - + e.getMessage() + ")", - e.getMessage() != null && e.getMessage().contains("timed out")); - } catch (Throwable injectedAgain) { - // also fine: the Error surfaced again from the re-attempt + // Corollary: capacity is usable again -- the next acquire() must + // reach the creation path (and fail there) rather than time out. 
+ try { + pool.acquire(); + Assert.fail("expected second acquire() to re-attempt creation"); + } catch (QueryException e) { + Assert.assertFalse( + "pool wedged: second acquire() timed out -> capacity permanently lost (" + + e.getMessage() + ")", + e.getMessage() != null && e.getMessage().contains("timed out")); + } catch (Throwable injectedAgain) { + // also fine: the Error surfaced again from the re-attempt + } + } finally { + pool.close(); } - } finally { - pool.close(); - } + }); } // Site: constructor prewarm outer catch. An Error mid-prewarm must run the @@ -124,25 +127,27 @@ public void acquireRestoresInFlightCreationsOnErrorFromConnect() throws Exceptio // GREEN: catch (Throwable) -> cleanup loop closes it -> no leak. @Test(timeout = 30_000) public void preWarmDoesNotLeakNativeScratchOnErrorFromConnect() throws Exception { - long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); - // First connect() succeeds (no-op, leaves the client unconnected but - // built); the second throws an Error mid-prewarm. - AtomicInteger calls = new AtomicInteger(); - Consumer hook = client -> { - if (calls.incrementAndGet() >= 2) { - throw new AssertionError("injected native connect failure"); + TestUtils.assertMemoryLeak(() -> { + long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + // First connect() succeeds (no-op, leaves the client unconnected but + // built); the second throws an Error mid-prewarm. 
+ AtomicInteger calls = new AtomicInteger(); + Consumer hook = client -> { + if (calls.incrementAndGet() >= 2) { + throw new AssertionError("injected native connect failure"); + } + }; + try { + newPool(CFG, 2, 2, 250, hook); + Assert.fail("expected prewarm to propagate the injected Error"); + } catch (Throwable expected) { + // expected -- construction aborts } - }; - try { - newPool(CFG, 2, 2, 250, hook); - Assert.fail("expected prewarm to propagate the injected Error"); - } catch (Throwable expected) { - // expected -- construction aborts - } - long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); - Assert.assertEquals( - "prewarm leaked NATIVE_DEFAULT scratch of an already-built worker on an Error", - baseline, after); + long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + Assert.assertEquals( + "prewarm leaked NATIVE_DEFAULT scratch of an already-built worker on an Error", + baseline, after); + }); } // Site: acquire() outer catch around createUnlocked()/start(). When start() @@ -155,26 +160,28 @@ public void preWarmDoesNotLeakNativeScratchOnErrorFromConnect() throws Exception // GREEN: catch calls created.shutdown() -> client.close() -> no leak. 
@Test(timeout = 30_000) public void acquireDoesNotLeakNativeScratchOnErrorFromStart() throws Exception { - QueryClientPool pool = newPool(CFG, 0, 1, 250, noConnect(), alwaysThrowStart()); - try { - long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + TestUtils.assertMemoryLeak(() -> { + QueryClientPool pool = newPool(CFG, 0, 1, 250, noConnect(), alwaysThrowStart()); try { - pool.acquire(); - Assert.fail("expected acquire() to propagate the injected start Error"); - } catch (Throwable expected) { - // wrapped or raw -- the leak check is the discriminator + long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + try { + pool.acquire(); + Assert.fail("expected acquire() to propagate the injected start Error"); + } catch (Throwable expected) { + // wrapped or raw -- the leak check is the discriminator + } + long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + Assert.assertEquals( + "acquire() leaked NATIVE_DEFAULT scratch on an Error from start()", + baseline, after); + // The reservation must also be restored so the pool is not wedged. + Assert.assertEquals( + "acquire() leaked an in-flight creation slot on an Error from start()", + 0, inFlightCreations(pool)); + } finally { + pool.close(); } - long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); - Assert.assertEquals( - "acquire() leaked NATIVE_DEFAULT scratch on an Error from start()", - baseline, after); - // The reservation must also be restored so the pool is not wedged. - Assert.assertEquals( - "acquire() leaked an in-flight creation slot on an Error from start()", - 0, inFlightCreations(pool)); - } finally { - pool.close(); - } + }); } // Site: constructor prewarm. When start() throws after createUnlocked() @@ -186,30 +193,32 @@ public void acquireDoesNotLeakNativeScratchOnErrorFromStart() throws Exception { // GREEN: the pending-worker teardown closes it -> no leak. 
@Test(timeout = 30_000) public void preWarmDoesNotLeakNativeScratchOnErrorFromStart() throws Exception { - long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); - // First worker is admitted to `all` (start is a no-op here -- the test - // never runs queries, and an unstarted thread keeps the later - // shutdown()'s join() instant); the second throws at start() after its - // client is fully built. That second worker is the stranded one: it - // never made it into `all`, so only the new pending-worker teardown - // closes it. The assertion catches a leak of EITHER worker's scratch. - AtomicInteger calls = new AtomicInteger(); - Consumer startHook = w -> { - if (calls.incrementAndGet() >= 2) { - throw new AssertionError("injected thread-start failure"); + TestUtils.assertMemoryLeak(() -> { + long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + // First worker is admitted to `all` (start is a no-op here -- the test + // never runs queries, and an unstarted thread keeps the later + // shutdown()'s join() instant); the second throws at start() after its + // client is fully built. That second worker is the stranded one: it + // never made it into `all`, so only the new pending-worker teardown + // closes it. The assertion catches a leak of EITHER worker's scratch. 
+ AtomicInteger calls = new AtomicInteger(); + Consumer startHook = w -> { + if (calls.incrementAndGet() >= 2) { + throw new AssertionError("injected thread-start failure"); + } + // first worker: leave the dispatch thread unstarted (see above) + }; + try { + newPool(CFG, 2, 2, 250, noConnect(), startHook); + Assert.fail("expected prewarm to propagate the injected start Error"); + } catch (Throwable expected) { + // expected -- construction aborts } - // first worker: leave the dispatch thread unstarted (see above) - }; - try { - newPool(CFG, 2, 2, 250, noConnect(), startHook); - Assert.fail("expected prewarm to propagate the injected start Error"); - } catch (Throwable expected) { - // expected -- construction aborts - } - long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); - Assert.assertEquals( - "prewarm leaked NATIVE_DEFAULT scratch of a start()-failed worker", - baseline, after); + long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + Assert.assertEquals( + "prewarm leaked NATIVE_DEFAULT scratch of a start()-failed worker", + baseline, after); + }); } private static Consumer alwaysThrow() { @@ -232,30 +241,21 @@ private static Consumer alwaysThrowStart() { }; } - private static int inFlightCreations(QueryClientPool pool) throws Exception { - Method m = QueryClientPool.class.getDeclaredMethod("inFlightCreations"); - m.setAccessible(true); - return (int) m.invoke(pool); + private static int inFlightCreations(QueryClientPool pool) { + return pool.inFlightCreations(); } private static QueryClientPool newPool( String cfg, int min, int max, long acquireMs, Consumer connectHook - ) throws Exception { - Constructor c = QueryClientPool.class.getDeclaredConstructor( - String.class, int.class, int.class, long.class, long.class, long.class, Consumer.class); - c.setAccessible(true); - return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, connectHook); + ) { + return new QueryClientPool(cfg, min, max, acquireMs, 
Long.MAX_VALUE, Long.MAX_VALUE, connectHook); } private static QueryClientPool newPool( String cfg, int min, int max, long acquireMs, Consumer connectHook, Consumer startHook - ) throws Exception { - Constructor c = QueryClientPool.class.getDeclaredConstructor( - String.class, int.class, int.class, long.class, long.class, long.class, - Consumer.class, Consumer.class); - c.setAccessible(true); - return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, + ) { + return new QueryClientPool(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, connectHook, startHook); } } diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolLeakTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolLeakTest.java index 53def4b5..a32a488d 100644 --- a/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolLeakTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolLeakTest.java @@ -27,6 +27,7 @@ import io.questdb.client.impl.QueryClientPool; import io.questdb.client.std.MemoryTag; import io.questdb.client.std.Unsafe; +import io.questdb.client.test.tools.TestUtils; import org.junit.Assert; import org.junit.Test; @@ -57,51 +58,55 @@ public class QueryClientPoolLeakTest { @Test(timeout = 10_000) public void acquireDoesNotLeakNativeScratchOnConnectFailure() throws Exception { - try (FakeStatusServer rejecter = new FakeStatusServer(421, "X-QuestDB-Role: REPLICA")) { - rejecter.start(); - String cfg = "ws::addr=127.0.0.1:" + rejecter.port() - + ";target=primary;failover=off;auth_timeout_ms=1000;"; - - QueryClientPool pool = new QueryClientPool( - cfg, 0, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE); - try { - long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + TestUtils.assertMemoryLeak(() -> { + try (FakeStatusServer rejecter = new FakeStatusServer(421, "X-QuestDB-Role: REPLICA")) { + rejecter.start(); + String cfg = "ws::addr=127.0.0.1:" + rejecter.port() + + 
";target=primary;failover=off;auth_timeout_ms=1000;"; + + QueryClientPool pool = new QueryClientPool( + cfg, 0, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE); try { - pool.acquire(); - Assert.fail("expected acquire() to throw on connect rejection"); - } catch (RuntimeException expected) { - // QueryException wrapping the underlying connect failure. + long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + try { + pool.acquire(); + Assert.fail("expected acquire() to throw on connect rejection"); + } catch (RuntimeException expected) { + // QueryException wrapping the underlying connect failure. + } + long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + Assert.assertEquals( + "acquire() leaked NATIVE_DEFAULT bytes on connect failure", + baseline, after); + } finally { + pool.close(); } - long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); - Assert.assertEquals( - "acquire() leaked NATIVE_DEFAULT bytes on connect failure", - baseline, after); - } finally { - pool.close(); } - } + }); } @Test(timeout = 10_000) public void preWarmDoesNotLeakNativeScratchOnConnectFailure() throws Exception { - try (FakeStatusServer rejecter = new FakeStatusServer(421, "X-QuestDB-Role: REPLICA")) { - rejecter.start(); - String cfg = "ws::addr=127.0.0.1:" + rejecter.port() - + ";target=primary;failover=off;auth_timeout_ms=1000;"; - - long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); - try { - new QueryClientPool(cfg, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE); - Assert.fail("expected QueryClientPool ctor to throw on connect rejection"); - } catch (RuntimeException expected) { - // target=primary against role=REPLICA yields a connect failure - // out of createUnlocked(). 
+ TestUtils.assertMemoryLeak(() -> { + try (FakeStatusServer rejecter = new FakeStatusServer(421, "X-QuestDB-Role: REPLICA")) { + rejecter.start(); + String cfg = "ws::addr=127.0.0.1:" + rejecter.port() + + ";target=primary;failover=off;auth_timeout_ms=1000;"; + + long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + try { + new QueryClientPool(cfg, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE); + Assert.fail("expected QueryClientPool ctor to throw on connect rejection"); + } catch (RuntimeException expected) { + // target=primary against role=REPLICA yields a connect failure + // out of createUnlocked(). + } + long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); + Assert.assertEquals( + "pool ctor leaked NATIVE_DEFAULT bytes on connect failure", + baseline, after); } - long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT); - Assert.assertEquals( - "pool ctor leaked NATIVE_DEFAULT bytes on connect failure", - baseline, after); - } + }); } private static final class FakeStatusServer implements AutoCloseable { diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryCloseDrainTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryCloseDrainTest.java new file mode 100644 index 00000000..76a4adb7 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/impl/QueryCloseDrainTest.java @@ -0,0 +1,174 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.impl; + +import io.questdb.client.cutlass.qwp.client.QwpQueryClient; +import io.questdb.client.impl.QueryClientPool; +import io.questdb.client.impl.QueryWorker; +import io.questdb.client.test.tools.TestUtils; +import org.junit.Assert; +import org.junit.Test; + +import java.lang.reflect.Field; +import java.lang.reflect.Method; +import java.util.ArrayList; +import java.util.function.Consumer; + +/** + * Regression tests for the bounded, interruptible {@code Query.close()} drain. + * When a submit is still in flight at close() time, the old drain blocked the + * caller unbounded and uninterruptibly on the terminal event (and could hang + * forever if a racing {@code QuestDB.close()} stranded it). The drain now waits + * at most {@code closeQueryTimeoutMillis}, an interrupt aborts it, and a worker + * that fails to drain in time is discarded -- its connection may still carry + * late frames for the abandoned query -- rather than returned to the pool. + *

    + * White-box style: a no-op connect hook builds workers without a network, and + * the in-flight state is simulated by setting {@code QueryImpl.done=false} + * reflectively, so no server or real {@code execute()} is needed to exercise + * the close() drain logic. + */ +public class QueryCloseDrainTest { + + private static final String CFG = "ws::addr=127.0.0.1:1;"; + private static final Consumer NO_CONNECT = c -> { + }; + + @Test(timeout = 30_000) + public void testCloseDiscardsWorkerWhenDrainTimesOut() throws Exception { + TestUtils.assertMemoryLeak(() -> { + try (QueryClientPool pool = new QueryClientPool( + CFG, 0, 2, 1_000L, Long.MAX_VALUE, Long.MAX_VALUE, NO_CONNECT)) { + setCloseQueryTimeout(pool, 150L); + QueryWorker w = pool.acquire(); + long gen = generation(w); + setDone(w, false); // pretend a submit is in flight; nothing will ever signal done + + long startNanos = System.nanoTime(); + closeQuery(w, gen); + long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000; + + Assert.assertTrue("close() must wait about the close budget, elapsed=" + elapsedMs, + elapsedMs >= 120); + Assert.assertTrue("close() must be bounded, not block unbounded, elapsed=" + elapsedMs, + elapsedMs < 5_000); + Assert.assertFalse("a worker that did not drain must be discarded, not returned to the pool", + allWorkers(pool).contains(w)); + Assert.assertEquals("the discarded worker must leave the pool so it can grow a fresh one", + 0, allWorkers(pool).size()); + Assert.assertFalse("the discarded worker's dispatch thread must have exited", + dispatchThread(w).isAlive()); + } + }); + } + + @Test(timeout = 30_000) + public void testCloseIsInterruptible() throws Exception { + TestUtils.assertMemoryLeak(() -> { + try (QueryClientPool pool = new QueryClientPool( + CFG, 0, 2, 1_000L, Long.MAX_VALUE, Long.MAX_VALUE, NO_CONNECT)) { + // A long budget: the only way close() can return promptly is by + // honoring the caller's interrupt. 
+ setCloseQueryTimeout(pool, 60_000L); + QueryWorker w = pool.acquire(); + long gen = generation(w); + setDone(w, false); + + Thread.currentThread().interrupt(); + long startNanos = System.nanoTime(); + closeQuery(w, gen); + long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000; + + Assert.assertTrue("close() must preserve the caller's interrupt flag", Thread.interrupted()); + Assert.assertTrue("interrupt must abort the drain promptly, elapsed=" + elapsedMs, + elapsedMs < 5_000); + Assert.assertFalse("an interrupted close() must discard the worker", + allWorkers(pool).contains(w)); + } + }); + } + + @Test(timeout = 30_000) + public void testCloseReturnsWorkerWhenAlreadyDrained() throws Exception { + TestUtils.assertMemoryLeak(() -> { + try (QueryClientPool pool = new QueryClientPool( + CFG, 0, 2, 1_000L, Long.MAX_VALUE, Long.MAX_VALUE, NO_CONNECT)) { + setCloseQueryTimeout(pool, 150L); + QueryWorker w = pool.acquire(); + long gen = generation(w); + // done stays true (no in-flight submit): close() must take the fast + // path and return the worker to the pool for reuse, not discard it. 
+ closeQuery(w, gen); + Assert.assertTrue("an already-drained worker must be returned to the pool, not discarded", + allWorkers(pool).contains(w)); + } + }); + } + + @SuppressWarnings("unchecked") + private static ArrayList allWorkers(QueryClientPool pool) throws Exception { + Field f = QueryClientPool.class.getDeclaredField("all"); + f.setAccessible(true); + return (ArrayList) f.get(pool); + } + + private static void closeQuery(QueryWorker w, long gen) throws Exception { + Object impl = queryImpl(w); + Method close = impl.getClass().getDeclaredMethod("close", long.class); + close.setAccessible(true); + close.invoke(impl, gen); + } + + private static Thread dispatchThread(QueryWorker w) throws Exception { + Field f = QueryWorker.class.getDeclaredField("thread"); + f.setAccessible(true); + return (Thread) f.get(w); + } + + private static long generation(QueryWorker w) throws Exception { + Method m = QueryWorker.class.getDeclaredMethod("generation"); + m.setAccessible(true); + return (long) m.invoke(w); + } + + private static Object queryImpl(QueryWorker w) throws Exception { + Field queryF = QueryWorker.class.getDeclaredField("query"); + queryF.setAccessible(true); + return queryF.get(w); + } + + private static void setCloseQueryTimeout(QueryClientPool pool, long millis) throws Exception { + Field f = QueryClientPool.class.getDeclaredField("closeQueryTimeoutMillis"); + f.setAccessible(true); + f.setLong(pool, millis); + } + + private static void setDone(QueryWorker w, boolean done) throws Exception { + Object impl = queryImpl(w); + Field doneF = impl.getClass().getDeclaredField("done"); + doneF.setAccessible(true); + doneF.setBoolean(impl, done); + } +} diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryImplResetTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryImplResetTest.java index 1ff33b76..bfe3c24f 100644 --- a/core/src/test/java/io/questdb/client/test/impl/QueryImplResetTest.java +++ 
b/core/src/test/java/io/questdb/client/test/impl/QueryImplResetTest.java @@ -24,11 +24,12 @@ package io.questdb.client.test.impl; -import io.questdb.client.Query; import io.questdb.client.cutlass.qwp.client.QwpBindSetter; import io.questdb.client.cutlass.qwp.client.QwpColumnBatch; import io.questdb.client.cutlass.qwp.client.QwpColumnBatchHandler; import io.questdb.client.cutlass.qwp.client.QwpServerInfo; +import io.questdb.client.std.str.StringSink; +import io.questdb.client.test.tools.TestUtils; import org.junit.Assert; import org.junit.Test; @@ -39,113 +40,101 @@ public class QueryImplResetTest { /** - * Regression test for the state-carryover bug between consecutive - * submits on the per-thread {@code QuestDB#query()} handle. + * The Javadoc on both {@code Query} and {@code QuestDB#borrowQuery()} + * promises the leased handle is handed out "reset to empty". The reset is + * {@code QueryImpl.resetForBorrow()}, invoked from {@code QueryWorker.lease()} + * when {@code borrowQuery()} hands the pre-allocated handle out. It must + * clear the builder state (SQL, binds, handler) so a follow-up + * {@code submit()} cannot silently reuse a prior borrow's handler/binds, + * and it must leave the handle idle (done). *

    - * The Javadoc on both {@code Query} and {@code QuestDB#query()} promises - * that the returned instance is "reset to empty" / "in a reset state". - * Before the fix, {@code QuestDBImpl.query()} returned the bare - * thread-local without nulling {@code userHandler} / {@code userBinds}, - * so the second call below would silently reuse {@code h1}: - *

    -     *   db.query().sql("SELECT 1").handler(h1).submit().await();
    -     *   db.query().sql("SELECT 2").submit();    // no .handler() -- reuses h1
    -     * 
    - * The {@code if (userHandler == null)} check in {@code submit()} could - * not catch the misuse because the field was still set from the prior - * submit. - *

    - * The fix is {@code QueryImpl.resetIfDone()}, invoked from - * {@code QuestDBImpl.query()} before the per-thread handle is returned. - * This test reaches into {@code QueryImpl} via reflection (the class is - * package-private and lives in a different package from this test) and - * asserts the reset clears all three configured fields when the prior - * run is in a terminal state. + * The reset is unconditional: the leased worker was just acquired from the + * pool, so it is always idle (done) at borrow time. This test reaches into + * {@code QueryImpl} by reflection (the class is package-private and lives + * in a different package from this test). Builder state is seeded directly + * via reflection rather than through the {@code Query} API because the + * lease-generation guard on the setters would dereference the (null) worker. */ @Test - public void testResetIfDoneClearsBuilderStateInTerminalState() throws Exception { - Class queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl"); - Class poolClass = Class.forName("io.questdb.client.impl.QueryClientPool"); - - Constructor ctor = queryImplClass.getDeclaredConstructor(poolClass); - ctor.setAccessible(true); - // QueryImpl never dereferences the pool outside of submit(); a null - // pool is fine for this state-only test. - Query q = (Query) ctor.newInstance(new Object[]{null}); - - // Mirror the post-submit().await() state: builder fields set, - // done flag true (the constructor default). 
- QwpColumnBatchHandler h = new NoopHandler(); - QwpBindSetter b = values -> { - // no-op - }; - q.sql("SELECT 1").binds(b).handler(h); - - Method reset = queryImplClass.getDeclaredMethod("resetIfDone"); - reset.setAccessible(true); - reset.invoke(q); - - Field handlerF = queryImplClass.getDeclaredField("userHandler"); - Field bindsF = queryImplClass.getDeclaredField("userBinds"); - Field sqlBufF = queryImplClass.getDeclaredField("sqlBuffer"); - handlerF.setAccessible(true); - bindsF.setAccessible(true); - sqlBufF.setAccessible(true); - - Assert.assertNull("userHandler must be cleared so a follow-up submit() without .handler() fails fast", - handlerF.get(q)); - Assert.assertNull("userBinds must be cleared so a follow-up submit() without .binds() does not reuse the prior setter", - bindsF.get(q)); - CharSequence sqlBuffer = (CharSequence) sqlBufF.get(q); - Assert.assertEquals("sqlBuffer must be empty so a follow-up submit() without .sql() throws 'sql is required'", - 0, sqlBuffer.length()); + public void testResetForBorrowClearsBuilderState() throws Exception { + TestUtils.assertMemoryLeak(() -> { + Class queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl"); + Class workerClass = Class.forName("io.questdb.client.impl.QueryWorker"); + + Constructor ctor = queryImplClass.getDeclaredConstructor(workerClass); + ctor.setAccessible(true); + // resetForBorrow() never dereferences the worker; a null worker is fine + // for this state-only test. + Object q = ctor.newInstance(new Object[]{null}); + + Field handlerF = queryImplClass.getDeclaredField("userHandler"); + Field bindsF = queryImplClass.getDeclaredField("userBinds"); + Field sqlBufF = queryImplClass.getDeclaredField("sqlBuffer"); + Field doneF = queryImplClass.getDeclaredField("done"); + handlerF.setAccessible(true); + bindsF.setAccessible(true); + sqlBufF.setAccessible(true); + doneF.setAccessible(true); + + // Seed builder state as a prior borrow would have left it. 
+ handlerF.set(q, new NoopHandler()); + bindsF.set(q, (QwpBindSetter) values -> { + // no-op + }); + ((StringSink) sqlBufF.get(q)).put("SELECT 1"); + doneF.setBoolean(q, false); + + Method reset = queryImplClass.getDeclaredMethod("resetForBorrow"); + reset.setAccessible(true); + reset.invoke(q); + + Assert.assertNull("userHandler must be cleared so a follow-up submit() without .handler() fails fast", + handlerF.get(q)); + Assert.assertNull("userBinds must be cleared so a follow-up submit() without .binds() does not reuse the prior setter", + bindsF.get(q)); + CharSequence sqlBuffer = (CharSequence) sqlBufF.get(q); + Assert.assertEquals("sqlBuffer must be empty so a follow-up submit() without .sql() throws 'sql is required'", + 0, sqlBuffer.length()); + Assert.assertTrue("done must be true so the handle starts idle, not in flight", + doneF.getBoolean(q)); + }); } /** - * Symmetric guard: when a submit is in flight ({@code done == false}), - * {@code resetIfDone()} must NOT touch the configured fields. The - * dispatched worker thread is reading {@code sqlBuffer} in - * {@code runOn()} and {@code userHandler} via the wrapping handler; - * clearing them mid-flight would race. + * {@code QuestDB#borrowQuery()} returns a thin lease that is freshly + * allocated per borrow, but the heavy state it wraps -- the per-worker + * {@code QueryImpl} -- is pre-allocated once and reused across borrows. This + * pins that contract: two {@code lease()} calls on the same worker return + * distinct lease wrappers that delegate to the same pooled {@code QueryImpl}. + * Reaches both package-private classes by reflection. 
*/ @Test - public void testResetIfDoneIsNoOpWhileSubmitInFlight() throws Exception { - Class queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl"); - Class poolClass = Class.forName("io.questdb.client.impl.QueryClientPool"); - - Constructor ctor = queryImplClass.getDeclaredConstructor(poolClass); - ctor.setAccessible(true); - Query q = (Query) ctor.newInstance(new Object[]{null}); - - QwpColumnBatchHandler h = new NoopHandler(); - QwpBindSetter b = values -> { - // no-op - }; - q.sql("SELECT 1").binds(b).handler(h); - - // Flip the in-flight flag by setting done=false directly. - Field doneF = queryImplClass.getDeclaredField("done"); - doneF.setAccessible(true); - doneF.setBoolean(q, false); - - Method reset = queryImplClass.getDeclaredMethod("resetIfDone"); - reset.setAccessible(true); - reset.invoke(q); - - Field handlerF = queryImplClass.getDeclaredField("userHandler"); - Field bindsF = queryImplClass.getDeclaredField("userBinds"); - Field sqlBufF = queryImplClass.getDeclaredField("sqlBuffer"); - handlerF.setAccessible(true); - bindsF.setAccessible(true); - sqlBufF.setAccessible(true); - - Assert.assertSame("userHandler must survive resetIfDone() while a submit is in flight", - h, handlerF.get(q)); - Assert.assertSame("userBinds must survive resetIfDone() while a submit is in flight", - b, bindsF.get(q)); - CharSequence sqlBuffer = (CharSequence) sqlBufF.get(q); - Assert.assertEquals("sqlBuffer must survive resetIfDone() while a submit is in flight", - "SELECT 1", sqlBuffer.toString()); + public void testLeaseWrapsSamePooledQueryImpl() throws Exception { + TestUtils.assertMemoryLeak(() -> { + Class workerClass = Class.forName("io.questdb.client.impl.QueryWorker"); + Class poolClass = Class.forName("io.questdb.client.impl.QueryClientPool"); + Class clientClass = Class.forName("io.questdb.client.cutlass.qwp.client.QwpQueryClient"); + Class leaseClass = Class.forName("io.questdb.client.impl.QueryLease"); + + // lease() never dereferences the client or 
pool (it only resets the + // reused QueryImpl and stamps the current generation), so nulls are fine + // for this structure-only test -- mirrors the null-worker shortcut above. + Constructor ctor = workerClass.getDeclaredConstructor(clientClass, poolClass, int.class); + ctor.setAccessible(true); + Object worker = ctor.newInstance(null, null, 0); + + Method leaseM = workerClass.getDeclaredMethod("lease"); + leaseM.setAccessible(true); + Object leaseA = leaseM.invoke(worker); + Object leaseB = leaseM.invoke(worker); + + Assert.assertNotSame("each borrow must hand back a fresh lease wrapper", leaseA, leaseB); + + Field implF = leaseClass.getDeclaredField("impl"); + implF.setAccessible(true); + Assert.assertSame("both leases must wrap the same pooled QueryImpl (zero-allocation reuse of the heavy state)", + implF.get(leaseA), implF.get(leaseB)); + }); } private static final class NoopHandler implements QwpColumnBatchHandler { diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryLeaseGenerationTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryLeaseGenerationTest.java new file mode 100644 index 00000000..f878ccd0 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/impl/QueryLeaseGenerationTest.java @@ -0,0 +1,280 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.impl; + +import io.questdb.client.cutlass.qwp.client.QwpQueryClient; +import io.questdb.client.impl.QueryClientPool; +import io.questdb.client.impl.QueryWorker; +import io.questdb.client.test.tools.TestUtils; +import org.junit.Assert; +import org.junit.Test; + +import java.lang.reflect.Field; +import java.lang.reflect.Method; +import java.util.ArrayDeque; +import java.util.concurrent.CountDownLatch; +import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicReference; +import java.util.concurrent.locks.ReentrantLock; + +/** + * Regression tests for M1: a stale {@code Query} lease (held after close, or a + * cached {@code Completion}) must not disturb a later borrow of the same + * worker. The reused per-worker {@code QueryImpl} alone cannot distinguish a + * stale handle from a live one -- the fix stamps each borrow with a monotonic + * generation under the pool lock and validates it on close/cancel/release. + *

    + * These exercise the package-private internals by reflection (the same + * white-box style as the other tests in this package). They construct workers + * with a non-connected {@code newPlainText} client and never start the worker + * thread, so no network or I/O thread is involved. + */ +public class QueryLeaseGenerationTest { + + /** + * A stale {@code Completion.cancel()} (its lease long since released and the + * worker re-borrowed) must NOT reach the worker's client -- otherwise it + * would cancel whatever query the current borrower is running. We observe + * "reached the client" via the client's pending-cancel latch, which + * {@code QwpQueryClient.cancel()} sets first thing. + */ + @Test + public void testStaleCancelDoesNotReachClient() throws Exception { + TestUtils.assertMemoryLeak(() -> { + Class workerClass = Class.forName("io.questdb.client.impl.QueryWorker"); + Class queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl"); + Method bump = workerClass.getDeclaredMethod("bumpGeneration"); + bump.setAccessible(true); + Field queryF = workerClass.getDeclaredField("query"); + queryF.setAccessible(true); + Field doneF = queryImplClass.getDeclaredField("done"); + doneF.setAccessible(true); + Method cancel = queryImplClass.getDeclaredMethod("cancel", long.class); + cancel.setAccessible(true); + + // cancel(gen) validates the generation under the pool lock, so the + // worker needs a real pool to lock on (the worker thread is never + // started, so no network or I/O thread is involved). + QueryClientPool pool = new QueryClientPool( + "ws::addr=localhost:9000;", + /*minSize*/ 0, /*maxSize*/ 2, + /*acquireTimeoutMillis*/ 1_000L, + /*idleTimeoutMillis*/ Long.MAX_VALUE, + /*maxLifetimeMillis*/ Long.MAX_VALUE); + try { + // Live lease: generation 1 (one acquire), query in flight -> cancel(1) + // must reach the client. 
+ try (QwpQueryClient live = QwpQueryClient.newPlainText("localhost", 9000)) { + QueryWorker w = new QueryWorker(live, pool, 0); + bump.invoke(w); // generation -> 1 (acquire stamp) + Object impl = queryF.get(w); + doneF.setBoolean(impl, false); // pretend a submit is in flight + cancel.invoke(impl, 1L); + Assert.assertTrue("cancel() on the live lease must reach the client", + live.isPendingCancelForTest()); + } + + // Stale lease: the worker was borrowed (gen 1), released and re-borrowed + // (gen now 3). A cancel from the old lease (gen 1) must be dropped, even + // though the current query is in flight. + try (QwpQueryClient reused = QwpQueryClient.newPlainText("localhost", 9000)) { + QueryWorker w = new QueryWorker(reused, pool, 0); + bump.invoke(w); // -> 1 (first acquire) + bump.invoke(w); // -> 2 (release) + bump.invoke(w); // -> 3 (second acquire by a new borrower) + Object impl = queryF.get(w); + doneF.setBoolean(impl, false); // the new borrower's query is in flight + cancel.invoke(impl, 1L); // stale lease cancels + Assert.assertFalse("a stale lease's cancel() must NOT reach the client and " + + "cancel a different borrower's in-flight query", + reused.isPendingCancelForTest()); + } + } finally { + pool.close(); + } + }); + } + + /** + * The TOCTOU the locked cancel closes: a cross-thread watchdog calls + * {@code cancel(gen)} while its lease is live, but the lease goes stale (the + * worker is released and re-borrowed) before the wire cancel fires. The + * cancel must re-validate the generation atomically with the cancel, under + * the pool lock, or it would abort the new borrower's query. + *

    + * Driven deterministically: the test thread holds the pool lock, so the + * watchdog's cancel parks inside the pool's generation re-check. We then + * advance the generation (release + re-borrow) under the lock and release + * it. The parked cancel must observe the new generation and drop. An + * unlocked check-then-cancel would not park, would pass its check at the + * still-live generation, and would fire the wire cancel. + */ + @Test + public void testConcurrentCancelDoesNotReachClientAfterReborrow() throws Exception { + TestUtils.assertMemoryLeak(() -> { + Method bump = QueryWorker.class.getDeclaredMethod("bumpGeneration"); + bump.setAccessible(true); + Field queryF = QueryWorker.class.getDeclaredField("query"); + queryF.setAccessible(true); + Class queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl"); + Field doneF = queryImplClass.getDeclaredField("done"); + doneF.setAccessible(true); + Method cancel = queryImplClass.getDeclaredMethod("cancel", long.class); + cancel.setAccessible(true); + Field poolLockF = QueryClientPool.class.getDeclaredField("lock"); + poolLockF.setAccessible(true); + + QueryClientPool pool = new QueryClientPool( + "ws::addr=localhost:9000;", + /*minSize*/ 0, /*maxSize*/ 2, + /*acquireTimeoutMillis*/ 1_000L, + /*idleTimeoutMillis*/ Long.MAX_VALUE, + /*maxLifetimeMillis*/ Long.MAX_VALUE); + QwpQueryClient client = QwpQueryClient.newPlainText("localhost", 9000); + try { + final QueryWorker w = new QueryWorker(client, pool, 0); + bump.invoke(w); // generation -> 1; the watchdog's lease captured 1 + final Object impl = queryF.get(w); + doneF.setBoolean(impl, false); // a query is in flight + + ReentrantLock poolLock = (ReentrantLock) poolLockF.get(pool); + final CountDownLatch atCancel = new CountDownLatch(1); + final CountDownLatch cancelReturned = new CountDownLatch(1); + final AtomicReference err = new AtomicReference<>(); + + // Hold the pool lock so the watchdog's cancel cannot finish its + // generation re-check + 
wire cancel until we let go. + poolLock.lock(); + Thread watchdog = new Thread(() -> { + atCancel.countDown(); + try { + cancel.invoke(impl, 1L); // lease generation captured at borrow = 1 + } catch (Throwable t) { + err.set(t); + } finally { + cancelReturned.countDown(); + } + }, "watchdog-cancel"); + watchdog.start(); + Assert.assertTrue("watchdog must start", atCancel.await(5, TimeUnit.SECONDS)); + + // With the locked cancel, cancel() parks on the pool lock and cannot + // return while we hold it. An unlocked check-then-cancel would have + // already fired the wire cancel and returned. + Assert.assertFalse("cancel() must re-check the generation under the pool " + + "lock, so it cannot complete while the lock is held", + cancelReturned.await(200, TimeUnit.MILLISECONDS)); + + // The lease goes stale underneath the parked cancel: released (-> 2) + // and re-borrowed by a new owner (-> 3). + bump.invoke(w); + bump.invoke(w); + poolLock.unlock(); + + Assert.assertTrue("cancel() must return once the pool lock is free", + cancelReturned.await(5, TimeUnit.SECONDS)); + if (err.get() != null) { + throw new AssertionError("cancel() threw", err.get()); + } + Assert.assertFalse("a cancel whose lease went stale while parked on the pool " + + "lock must NOT reach the client and abort the new borrower's query", + client.isPendingCancelForTest()); + } finally { + client.close(); + pool.close(); + } + }); + } + + /** + * The pool-wide blast radius of M1: a stale (duplicate / post-reborrow) + * release must never enqueue a worker that a live borrower owns, otherwise + * the worker sits in {@code available} twice and is handed to two borrowers + * at once. The generation captured at borrow time, re-checked under the pool + * lock, makes this impossible. 
+ */ + @Test + @SuppressWarnings("unchecked") + public void testStaleReleaseDoesNotEnqueueWorkerTwice() throws Exception { + TestUtils.assertMemoryLeak(() -> { + Class poolClass = Class.forName("io.questdb.client.impl.QueryClientPool"); + Method release = poolClass.getDeclaredMethod("release", QueryWorker.class, long.class); + release.setAccessible(true); + Field availableF = poolClass.getDeclaredField("available"); + availableF.setAccessible(true); + Method bump = QueryWorker.class.getDeclaredMethod("bumpGeneration"); + bump.setAccessible(true); + Method generation = QueryWorker.class.getDeclaredMethod("generation"); + generation.setAccessible(true); + + QueryClientPool pool = new QueryClientPool( + "ws::addr=localhost:9000;", + /*minSize*/ 0, /*maxSize*/ 2, + /*acquireTimeoutMillis*/ 1_000L, + /*idleTimeoutMillis*/ Long.MAX_VALUE, + /*maxLifetimeMillis*/ Long.MAX_VALUE); + QwpQueryClient client = QwpQueryClient.newPlainText("localhost", 9000); + try { + ArrayDeque available = (ArrayDeque) availableF.get(pool); + QueryWorker w = new QueryWorker(client, pool, 0); + + // acquire #1 stamps generation 1; the lease (A) captures 1. + bump.invoke(w); + Assert.assertEquals(1L, generation.invoke(w)); + + // close A -> release(w, 1): matches, enqueues once. + release.invoke(pool, w, 1L); + Assert.assertEquals("valid release must enqueue the worker once", 1, available.size()); + + // close A again (duplicate, e.g. explicit close + try-with-resources) + // -> release(w, 1): generation already bumped to 2, so it is dropped. + release.invoke(pool, w, 1L); + Assert.assertEquals("duplicate release of the same lease must be dropped", + 1, available.size()); + + // acquire #2 hands the worker to a new borrower (B): pull it out and + // stamp generation 3. + available.pollFirst(); + bump.invoke(w); + Assert.assertEquals(3L, generation.invoke(w)); + + // A stray close from the long-dead lease A -> release(w, 1): dropped, + // so B's worker is NOT re-enqueued while B still owns it. 
+ release.invoke(pool, w, 1L); + Assert.assertEquals("a post-reborrow stale release must NOT enqueue the " + + "worker while another borrower owns it", + 0, available.size()); + + // B's own close -> release(w, 3): matches, enqueues legitimately. + release.invoke(pool, w, 3L); + Assert.assertEquals("the current borrower's release must still work", + 1, available.size()); + } finally { + client.close(); + pool.close(); + } + }); + } +} diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryWorkerTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryWorkerTest.java index e9041448..0dd6ee75 100644 --- a/core/src/test/java/io/questdb/client/test/impl/QueryWorkerTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/QueryWorkerTest.java @@ -26,16 +26,37 @@ import io.questdb.client.Completion; import io.questdb.client.cutlass.qwp.client.QwpQueryClient; +import io.questdb.client.impl.QueryClientPool; import io.questdb.client.impl.QueryWorker; +import io.questdb.client.test.tools.TestUtils; import org.junit.Assert; import org.junit.Test; import java.lang.reflect.Constructor; import java.lang.reflect.Field; +import java.lang.reflect.InvocationTargetException; +import java.lang.reflect.Method; import java.util.concurrent.TimeUnit; +import java.util.concurrent.atomic.AtomicBoolean; import java.util.concurrent.locks.Condition; import java.util.concurrent.locks.ReentrantLock; +/** + * Unit tests for {@link QueryWorker}. + *
<p>
    + * Coverage boundary: the lost-dispatch fix for the single-flight-reuse race + * (clearing {@code current} under {@code signalLock} at the moment of + * consumption rather than in a post-{@code runOn()} finally) has no + * deterministic unit reproduction here. Reproducing the clobber needs the + * worker to be mid-{@code runOn(client)} when the user thread re-dispatches on + * the same lease, which requires a live query client to drive + * {@code client.execute(...)} to its terminal callback. That regression is + * guarded end-to-end by {@code QuestDBFacadeE2ETest.testSustainedMixedConcurrency} + * in the parent questdb repo (more threads than pool slots, repeated + * submit/await per lease). {@link #testShutdownRacingDispatchMustNotStrandCaller()} + * below covers the adjacent but distinct shutdown-vs-dispatch branch only -- + * reverting the lost-dispatch hunk would not fail it. + */ public class QueryWorkerTest { /** @@ -44,14 +65,16 @@ public class QueryWorkerTest { * connect is needed; {@code newPlainText} only allocates the client. */ @Test - public void testClientGetterReturnsConstructorInstance() { - try (QwpQueryClient client = QwpQueryClient.newPlainText("localhost", 9000)) { - QueryWorker worker = new QueryWorker(client, null, 0); - Assert.assertSame("client() must return the instance passed to the constructor", - client, worker.client()); - // Idempotent across calls -- the field is final. - Assert.assertSame(worker.client(), worker.client()); - } + public void testClientGetterReturnsConstructorInstance() throws Exception { + TestUtils.assertMemoryLeak(() -> { + try (QwpQueryClient client = QwpQueryClient.newPlainText("localhost", 9000)) { + QueryWorker worker = new QueryWorker(client, null, 0); + Assert.assertSame("client() must return the instance passed to the constructor", + client, worker.client()); + // Idempotent across calls -- the field is final. 
+ Assert.assertSame(worker.client(), worker.client()); + } + }); } /** @@ -68,97 +91,283 @@ public void testClientGetterReturnsConstructorInstance() { * state directly: it parks the worker on its condition, then takes the * worker's own {@code signalLock} and atomically sets both * {@code current} and {@code shuttingDown} before signalling. After the - * worker thread exits, the test asserts the {@link Completion} has been - * signalled. Today the assertion fails because the run loop's early - * return strands the {@code QueryImpl}. + * worker thread exits, the test asserts the {@code QueryImpl} was signalled + * to done. Without the fix the assertion fails because the run loop's early + * return strands the {@code QueryImpl} with {@code done==false}, so any + * caller blocked in {@code Completion.await()} would hang forever. */ @Test(timeout = 30_000) public void testShutdownRacingDispatchMustNotStrandCaller() throws Exception { - Class queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl"); - Class poolClass = Class.forName("io.questdb.client.impl.QueryClientPool"); - - Field lockF = QueryWorker.class.getDeclaredField("signalLock"); - Field condF = QueryWorker.class.getDeclaredField("signalCondition"); - Field currentF = QueryWorker.class.getDeclaredField("current"); - Field shuttingF = QueryWorker.class.getDeclaredField("shuttingDown"); - Field threadF = QueryWorker.class.getDeclaredField("thread"); - for (Field f : new Field[]{lockF, condF, currentF, shuttingF, threadF}) { - f.setAccessible(true); - } + TestUtils.assertMemoryLeak(() -> { + Class queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl"); + + Field lockF = QueryWorker.class.getDeclaredField("signalLock"); + Field condF = QueryWorker.class.getDeclaredField("signalCondition"); + Field currentF = QueryWorker.class.getDeclaredField("current"); + Field shuttingF = QueryWorker.class.getDeclaredField("shuttingDown"); + Field threadF = QueryWorker.class.getDeclaredField("thread"); + 
for (Field f : new Field[]{lockF, condF, currentF, shuttingF, threadF}) { + f.setAccessible(true); + } + + Field doneF = queryImplClass.getDeclaredField("done"); + Field unexpectedF = queryImplClass.getDeclaredField("unexpectedError"); + doneF.setAccessible(true); + unexpectedF.setAccessible(true); - Field doneF = queryImplClass.getDeclaredField("done"); - Field completionF = queryImplClass.getDeclaredField("completion"); - doneF.setAccessible(true); - completionF.setAccessible(true); - - // No QwpQueryClient is constructed here: runLoop exits at the - // shuttingDown check before reaching the first reference to - // {@code client} or {@code pool}, so passing null for both is fine - // and keeps the test cleanly isolated from any network or socket state. - QueryWorker worker = new QueryWorker(null, null, 0); - Thread t = (Thread) threadF.get(worker); - t.start(); - - ReentrantLock lock = (ReentrantLock) lockF.get(worker); - Condition cond = (Condition) condF.get(worker); - - // Wait until the worker thread is parked on its signalCondition. - long deadlineNanos = System.nanoTime() + TimeUnit.SECONDS.toNanos(5); - while (true) { - boolean parked; + // No QwpQueryClient is constructed here: runLoop exits at the + // shuttingDown check before reaching the first reference to + // {@code client} or {@code pool}, so passing null for both is fine + // and keeps the test cleanly isolated from any network or socket state. + QueryWorker worker = new QueryWorker(null, null, 0); + Thread t = (Thread) threadF.get(worker); + t.start(); + + ReentrantLock lock = (ReentrantLock) lockF.get(worker); + Condition cond = (Condition) condF.get(worker); + + // Wait until the worker thread is parked on its signalCondition. 
+ long deadlineNanos = System.nanoTime() + TimeUnit.SECONDS.toNanos(5); + while (true) { + boolean parked; + lock.lock(); + try { + parked = lock.hasWaiters(cond); + } finally { + lock.unlock(); + } + if (parked) { + break; + } + if (System.nanoTime() > deadlineNanos) { + Assert.fail("worker thread never parked on its signalCondition"); + } + Thread.sleep(1); + } + + // Construct a QueryImpl with done=false, mimicking the state set up + // by QueryImpl.submit() just before it calls worker.dispatch(). + Constructor ctor = queryImplClass.getDeclaredConstructor(QueryWorker.class); + ctor.setAccessible(true); + Object queryImpl = ctor.newInstance(new Object[]{null}); + doneF.setBoolean(queryImpl, false); + + // Atomically force the racy state under the worker's own lock: + // current set AND shuttingDown set before the worker wakes. lock.lock(); try { - parked = lock.hasWaiters(cond); + currentF.set(worker, queryImpl); + shuttingF.setBoolean(worker, true); + cond.signalAll(); } finally { lock.unlock(); } - if (parked) { - break; - } - if (System.nanoTime() > deadlineNanos) { - Assert.fail("worker thread never parked on its signalCondition"); + + // The worker thread must exit (it has observed shuttingDown). + t.join(5_000); + Assert.assertFalse("worker thread did not exit after shuttingDown=true", + t.isAlive()); + + // The QueryImpl must have been signalled to done. Without the fix, + // done stays false because signalDone is never called, so a caller in + // Completion.await() would hang forever. The worker reaches the + // shutdown-race branch and calls signalUnexpected("QuestDB handle is + // closed"), which sets done=true and records the unexpected error. + Assert.assertTrue("BUG: QueryWorker.runLoop returned with shuttingDown=true " + + "while current!=null, never invoking runOn or signalUnexpected. 
" + + "The caller's Completion.await() hangs forever.", doneF.getBoolean(queryImpl)); + Assert.assertNotNull("signalUnexpected must record the closed-handle error", + unexpectedF.get(queryImpl)); + }); + } + + /** + * Busy-worker variant of the shutdown-drop race fixed in df6f7ca + * ({@code while (!shuttingDown)} -> {@code while (true)} in + * {@link QueryWorker}'s run loop). Unlike + * {@link #testShutdownRacingDispatchMustNotStrandCaller()} -- which only + * drives the PARKED-worker branch (worker blocked in + * {@code awaitUninterruptibly} before {@code shuttingDown} flips) and stays + * green even with the fix reverted -- this test forces the worker THROUGH a + * job's {@code runOn()} and then, on the worker thread at the exact instant + * that job returns, reproduces a reused lease re-dispatching + * ({@code current = q2}) racing a shutdown ({@code shuttingDown = true}), + * both set before the loop re-enters the strand check. + *
<p>
    + * With the fix the loop re-enters the {@code signalLock} block, observes + * {@code shuttingDown}, and strands q2 (signalling its caller). With the bug + * the loop exits at the top without re-reading {@code current}, so q2 is + * dropped -- never run, never signalled -- and its caller's + * {@code Completion.await()} would hang forever. The assertion on + * {@code q2.done} fails if the fix is reverted. + *
<p>
    + * The interleaving is made deterministic with a test-only worker-thread + * barrier ({@code QueryWorker.busyWorkerTestHook}) instead of a sleep: + * {@link QueryWorker} and {@code QueryImpl} are final and + * {@code QwpQueryClient} has no test seam, so pausing between + * {@code runOn()} and the loop check is the only race-free reproduction. + * {@code client}/{@code pool} are null -- {@code q1.runOn(null)} throws an + * NPE that {@code runLoop} catches and turns into q1's terminal signal, a + * fast stand-in for a real job returning from {@code runOn()}. + */ + @Test(timeout = 30_000) + public void testBusyWorkerShutdownStrandsReDispatchedCurrent() throws Exception { + TestUtils.assertMemoryLeak(() -> { + Class queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl"); + + Field lockF = QueryWorker.class.getDeclaredField("signalLock"); + Field currentF = QueryWorker.class.getDeclaredField("current"); + Field shuttingF = QueryWorker.class.getDeclaredField("shuttingDown"); + Field threadF = QueryWorker.class.getDeclaredField("thread"); + Field hookF = QueryWorker.class.getDeclaredField("busyWorkerTestHook"); + for (Field f : new Field[]{lockF, currentF, shuttingF, threadF, hookF}) { + f.setAccessible(true); } - Thread.sleep(1); - } - // Construct a QueryImpl with done=false, mimicking the state set up - // by QueryImpl.submit() just before it calls worker.dispatch(). - Constructor ctor = queryImplClass.getDeclaredConstructor(poolClass); - ctor.setAccessible(true); - Object queryImpl = ctor.newInstance(new Object[]{null}); - doneF.setBoolean(queryImpl, false); - Completion completion = (Completion) completionF.get(queryImpl); - - // Atomically force the racy state under the worker's own lock: - // current set AND shuttingDown set before the worker wakes. 
- lock.lock(); - try { - currentF.set(worker, queryImpl); - shuttingF.setBoolean(worker, true); - cond.signalAll(); - } finally { - lock.unlock(); - } + Field doneF = queryImplClass.getDeclaredField("done"); + Field unexpectedF = queryImplClass.getDeclaredField("unexpectedError"); + doneF.setAccessible(true); + unexpectedF.setAccessible(true); + + // client == null: q1.runOn(null) throws NPE, which runLoop catches and + // turns into q1's terminal signal -- a fast, deterministic stand-in for + // a real job returning from runOn(). pool == null is never touched here. + QueryWorker worker = new QueryWorker(null, null, 0); + + Constructor ctor = queryImplClass.getDeclaredConstructor(QueryWorker.class); + ctor.setAccessible(true); + Object q1 = ctor.newInstance(new Object[]{worker}); + Object q2 = ctor.newInstance(new Object[]{worker}); + doneF.setBoolean(q1, false); + doneF.setBoolean(q2, false); + + ReentrantLock lock = (ReentrantLock) lockF.get(worker); + AtomicBoolean fired = new AtomicBoolean(false); + + // The busy-worker barrier: the FIRST time the worker returns from a + // job's runOn(), simulate submit() -> dispatch() re-arming current with + // q2 while shutdown() flips shuttingDown -- both set, under signalLock, + // before the loop re-checks. Runs on the worker thread. + Runnable hook = () -> { + if (fired.compareAndSet(false, true)) { + lock.lock(); + try { + currentF.set(worker, q2); + shuttingF.setBoolean(worker, true); + } catch (IllegalAccessException e) { + throw new RuntimeException(e); + } finally { + lock.unlock(); + } + } + }; + hookF.set(worker, hook); + + // Pre-arm current with q1 so the worker consumes it immediately on + // start (no need to wait for the await park); start() establishes the + // happens-before that publishes current and the hook to the worker. 
+ currentF.set(worker, q1); + + Thread t = (Thread) threadF.get(worker); + t.start(); + + t.join(5_000); + Assert.assertFalse("worker thread must exit after shuttingDown=true", t.isAlive()); + + Assert.assertTrue( + "BUG (df6f7ca regressed): the busy worker returned from runOn() with a " + + "re-dispatched current!=null and shuttingDown=true, then exited the loop " + + "without stranding it. q2 was never signalled; its caller's await() hangs " + + "forever.", + doneF.getBoolean(q2)); + Assert.assertNotNull("the stranded busy-path job must record the closed-handle error", + unexpectedF.get(q2)); + }); + } + + /** + * Result handlers (onBatch/onEnd/onError) run inline on the worker's + * dispatch thread. The blocking lease ops -- {@code close()} and the two + * {@code await()} variants -- would there wait on a terminal event that + * only this same thread can deliver, a permanent self-deadlock. The + * reentrancy guard must turn that into an immediate IllegalStateException. + *
<p>
    + * The guard compares {@code Thread.currentThread()} to the worker's + * dispatch thread, so this test points that field at the test thread (the + * worker is never started) to stand in for a reentrant in-handler call. + * Without the guard, {@code close()}/{@code await()} would park forever and + * the method-level timeout would fail the test. + */ + @Test(timeout = 30_000) + public void testCloseAndAwaitFromWorkerThreadThrowInsteadOfDeadlocking() throws Exception { + TestUtils.assertMemoryLeak(() -> { + Class queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl"); + Field queryF = QueryWorker.class.getDeclaredField("query"); + queryF.setAccessible(true); + Field threadF = QueryWorker.class.getDeclaredField("thread"); + threadF.setAccessible(true); + Field doneF = queryImplClass.getDeclaredField("done"); + doneF.setAccessible(true); + Method bump = QueryWorker.class.getDeclaredMethod("bumpGeneration"); + bump.setAccessible(true); + Method isWorker = QueryWorker.class.getDeclaredMethod("isCurrentThreadWorker"); + isWorker.setAccessible(true); + Method close = queryImplClass.getDeclaredMethod("close", long.class); + close.setAccessible(true); + Method awaitNoTimeout = queryImplClass.getDeclaredMethod("await", long.class); + awaitNoTimeout.setAccessible(true); + Method awaitTimed = queryImplClass.getDeclaredMethod("await", long.class, long.class, TimeUnit.class); + awaitTimed.setAccessible(true); + + QueryClientPool pool = new QueryClientPool( + "ws::addr=localhost:9000;", + /*minSize*/ 0, /*maxSize*/ 2, + /*acquireTimeoutMillis*/ 1_000L, + /*idleTimeoutMillis*/ Long.MAX_VALUE, + /*maxLifetimeMillis*/ Long.MAX_VALUE); + QwpQueryClient client = QwpQueryClient.newPlainText("localhost", 9000); + try { + QueryWorker w = new QueryWorker(client, pool, 0); + bump.invoke(w); // generation -> 1: a live lease + Object impl = queryF.get(w); + doneF.setBoolean(impl, false); // a submit is in flight, as during a handler + + // Off the worker thread the guard must 
NOT fire. + Assert.assertFalse("guard must not fire on a normal caller thread", + (Boolean) isWorker.invoke(w)); - // The worker thread must exit (it has observed shuttingDown). - t.join(5_000); - Assert.assertFalse("worker thread did not exit after shuttingDown=true", - t.isAlive()); + // Stand in for a reentrant call from inside a result handler: the + // guard compares Thread.currentThread() to the worker's dispatch + // thread, so point that field at this thread. + threadF.set(w, Thread.currentThread()); + Assert.assertTrue((Boolean) isWorker.invoke(w)); - // The Completion must have been signalled. Without the fix, await(2s) - // returns false because signalDone is never called. - boolean completed; + assertThrowsHandlerReentry("close", () -> close.invoke(impl, 1L)); + assertThrowsHandlerReentry("await", () -> awaitNoTimeout.invoke(impl, 1L)); + assertThrowsHandlerReentry("await(timeout)", + () -> awaitTimed.invoke(impl, 1L, 5L, TimeUnit.SECONDS)); + } finally { + client.close(); + pool.close(); + } + }); + } + + private static void assertThrowsHandlerReentry(String op, ReflectiveCall call) throws Exception { try { - completed = completion.await(2, TimeUnit.SECONDS); - } catch (RuntimeException expectedAfterFix) { - // Once fixed, the worker is expected to call signalUnexpected - // with a QueryException("QuestDB handle is closed") which - // await() rethrows. Either form of "completed" is acceptable; - // the bug is the silent hang. 
- completed = true; + call.run(); + Assert.fail(op + "() from the worker thread must throw, not block/deadlock"); + } catch (InvocationTargetException e) { + Throwable cause = e.getCause(); + Assert.assertTrue(op + "(): expected IllegalStateException, was " + cause, + cause instanceof IllegalStateException); + Assert.assertTrue(op + "(): message must point at cancel(), was: " + cause.getMessage(), + cause.getMessage().contains("cancel()")); } - Assert.assertTrue("BUG: QueryWorker.runLoop returned with shuttingDown=true " - + "while current!=null, never invoking runOn or signalUnexpected. " - + "The caller's Completion.await() hangs forever.", completed); + } + + @FunctionalInterface + private interface ReflectiveCall { + void run() throws Exception; } } diff --git a/core/src/test/java/io/questdb/client/test/impl/QuestDBImplErrorSafetyTest.java b/core/src/test/java/io/questdb/client/test/impl/QuestDBImplErrorSafetyTest.java index 93b10301..75f89c3a 100644 --- a/core/src/test/java/io/questdb/client/test/impl/QuestDBImplErrorSafetyTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/QuestDBImplErrorSafetyTest.java @@ -27,11 +27,10 @@ import io.questdb.client.Sender; import io.questdb.client.cutlass.qwp.client.QwpQueryClient; import io.questdb.client.impl.QuestDBImpl; +import io.questdb.client.test.tools.TestUtils; import org.junit.Assert; import org.junit.Test; -import java.lang.reflect.Constructor; -import java.lang.reflect.InvocationTargetException; import java.lang.reflect.Proxy; import java.util.concurrent.atomic.AtomicBoolean; import java.util.function.Consumer; @@ -48,9 +47,9 @@ // // Sender is an interface, faked with a Proxy whose close() flips a flag, injected // via the SenderPool senderFactory seam. The connect Error is injected via the -// QueryClientPool connectHook seam. 
Both are passed through the package-private -// QuestDBImpl seam constructor (reached by reflection -- the main module is -// declared `open`); production callers pass null for both. +// QueryClientPool connectHook seam. Both are passed through the @TestOnly public +// QuestDBImpl seam constructor; production uses the public overload that passes +// null for both. public class QuestDBImplErrorSafetyTest { // Non-SF http config: the SenderPool factory replaces the build, but the @@ -67,25 +66,27 @@ public class QuestDBImplErrorSafetyTest { // delegate's close() runs. @Test(timeout = 30_000) public void ctorClosesBuiltSenderPoolWhenQueryPoolConstructionThrowsError() throws Exception { - AtomicBoolean senderClosed = new AtomicBoolean(false); - // senderMin = 1 -> SenderPool prewarms one observable delegate. - IntFunction senderFactory = slotIndex -> fakeSender(senderClosed); - // queryMin = 1 -> QueryClientPool prewarm reaches connect(), which throws. - Consumer connectHook = client -> { - throw new AssertionError("injected native connect failure"); - }; + TestUtils.assertMemoryLeak(() -> { + AtomicBoolean senderClosed = new AtomicBoolean(false); + // senderMin = 1 -> SenderPool prewarms one observable delegate. + IntFunction senderFactory = slotIndex -> fakeSender(senderClosed); + // queryMin = 1 -> QueryClientPool prewarm reaches connect(), which throws. 
+ Consumer connectHook = client -> { + throw new AssertionError("injected native connect failure"); + }; - try { - newQuestDB(senderFactory, connectHook); - Assert.fail("expected QuestDBImpl construction to propagate the injected Error"); - } catch (Throwable expected) { - // expected -- construction aborts - } + try { + newQuestDB(senderFactory, connectHook); + Assert.fail("expected QuestDBImpl construction to propagate the injected Error"); + } catch (Throwable expected) { + // expected -- construction aborts + } - Assert.assertTrue( - "QuestDBImpl ctor leaked the already-built SenderPool on an Error from " - + "QueryClientPool construction: the prewarmed delegate's close() was never called", - senderClosed.get()); + Assert.assertTrue( + "QuestDBImpl ctor leaked the already-built SenderPool on an Error from " + + "QueryClientPool construction: the prewarmed delegate's close() was never called", + senderClosed.get()); + }); } private static Sender fakeSender(AtomicBoolean closedFlag) { @@ -122,33 +123,15 @@ private static Sender fakeSender(AtomicBoolean closedFlag) { private static QuestDBImpl newQuestDB( IntFunction senderFactory, Consumer connectHook - ) throws Exception { - Constructor c = QuestDBImpl.class.getDeclaredConstructor( - String.class, String.class, int.class, int.class, int.class, int.class, - long.class, long.class, long.class, long.class, - IntFunction.class, Consumer.class); - c.setAccessible(true); - try { - return c.newInstance( - SENDER_CFG, QUERY_CFG, - /*senderMin*/ 1, /*senderMax*/ 1, - /*queryMin*/ 1, /*queryMax*/ 1, - /*acquireTimeoutMillis*/ 250L, - /*idleTimeoutMillis*/ Long.MAX_VALUE, - /*maxLifetimeMillis*/ Long.MAX_VALUE, - /*housekeeperIntervalMillis*/ Long.MAX_VALUE, - senderFactory, connectHook); - } catch (InvocationTargetException e) { - // Unwrap so the caller sees the real construction failure (Error or - // RuntimeException), matching a direct constructor invocation. 
- Throwable cause = e.getCause(); - if (cause instanceof RuntimeException) { - throw (RuntimeException) cause; - } - if (cause instanceof Error) { - throw (Error) cause; - } - throw e; - } + ) { + return new QuestDBImpl( + SENDER_CFG, QUERY_CFG, + /*senderMin*/ 1, /*senderMax*/ 1, + /*queryMin*/ 1, /*queryMax*/ 1, + /*acquireTimeoutMillis*/ 250L, + /*idleTimeoutMillis*/ Long.MAX_VALUE, + /*maxLifetimeMillis*/ Long.MAX_VALUE, + /*housekeeperIntervalMillis*/ Long.MAX_VALUE, + senderFactory, connectHook); } } diff --git a/core/src/test/java/io/questdb/client/test/impl/QwpConfigKeysTest.java b/core/src/test/java/io/questdb/client/test/impl/QwpConfigKeysTest.java index b0706189..526f1d74 100644 --- a/core/src/test/java/io/questdb/client/test/impl/QwpConfigKeysTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/QwpConfigKeysTest.java @@ -28,6 +28,7 @@ import io.questdb.client.cutlass.qwp.client.QwpQueryClient; import io.questdb.client.impl.ConfigSchema; import io.questdb.client.impl.ConfigView; +import io.questdb.client.test.tools.TestUtils; import org.junit.Assert; import org.junit.Test; @@ -40,71 +41,81 @@ public class QwpConfigKeysTest { @Test - public void testEverySchemaKeyIsRecognizedByBothClients() { - for (ConfigSchema.KeySpec spec : ConfigSchema.all()) { - String cfg = "ws::addr=h:9000;" + spec.name() + "=" + sampleValue(spec) + ";"; - // A key may still fail a cross-key or range check; it must NOT fail - // as an unknown key -- that would mean it is missing from the - // registry (or that a consumer rejects a key it should ignore). 
- assertNotUnknown(spec.name(), () -> Sender.builder(cfg)); - assertNotUnknown(spec.name(), () -> QwpQueryClient.fromConfig(cfg).close()); - } + public void testEverySchemaKeyIsRecognizedByBothClients() throws Exception { + TestUtils.assertMemoryLeak(() -> { + for (ConfigSchema.KeySpec spec : ConfigSchema.all()) { + String cfg = "ws::addr=h:9000;" + spec.name() + "=" + sampleValue(spec) + ";"; + // A key may still fail a cross-key or range check; it must NOT fail + // as an unknown key -- that would mean it is missing from the + // registry (or that a consumer rejects a key it should ignore). + assertNotUnknown(spec.name(), () -> Sender.builder(cfg)); + assertNotUnknown(spec.name(), () -> QwpQueryClient.fromConfig(cfg).close()); + } + }); } @Test - public void testJunkKeyRejectedOnBoth() { - assertRejected("ws::addr=h:9000;not_a_real_key=foo;", - "unknown configuration key: not_a_real_key"); + public void testJunkKeyRejectedOnBoth() throws Exception { + TestUtils.assertMemoryLeak(() -> { + assertRejected("ws::addr=h:9000;not_a_real_key=foo;", + "unknown configuration key: not_a_real_key"); + }); } @Test - public void testLegacyKeysRejectedWithHintOnBoth() { - String legacyHint = "(applies to legacy http/tcp/udp transports only)"; - assertRejected("ws::addr=h:9000;init_buf_size=1024;", - "unknown configuration key: init_buf_size", legacyHint); - assertRejected("ws::addr=h:9000;max_buf_size=1024;", - "unknown configuration key: max_buf_size", legacyHint); - assertRejected("ws::addr=h:9000;request_timeout=1000;", - "unknown configuration key: request_timeout", legacyHint); - assertRejected("ws::addr=h:9000;request_min_throughput=1000;", - "unknown configuration key: request_min_throughput", legacyHint); - assertRejected("ws::addr=h:9000;max_datagram_size=1400;", - "unknown configuration key: max_datagram_size", legacyHint); - assertRejected("ws::addr=h:9000;multicast_ttl=4;", - "unknown configuration key: multicast_ttl", legacyHint); - 
assertRejected("ws::addr=h:9000;retry_timeout=1000;", - "unknown configuration key: retry_timeout", "(use reconnect_max_duration_millis on ws/wss)"); - assertRejected("ws::addr=h:9000;protocol_version=2;", - "unknown configuration key: protocol_version", "(QWP negotiates the protocol version during the WebSocket upgrade)"); + public void testLegacyKeysRejectedWithHintOnBoth() throws Exception { + TestUtils.assertMemoryLeak(() -> { + String legacyHint = "(applies to legacy http/tcp/udp transports only)"; + assertRejected("ws::addr=h:9000;init_buf_size=1024;", + "unknown configuration key: init_buf_size", legacyHint); + assertRejected("ws::addr=h:9000;max_buf_size=1024;", + "unknown configuration key: max_buf_size", legacyHint); + assertRejected("ws::addr=h:9000;request_timeout=1000;", + "unknown configuration key: request_timeout", legacyHint); + assertRejected("ws::addr=h:9000;request_min_throughput=1000;", + "unknown configuration key: request_min_throughput", legacyHint); + assertRejected("ws::addr=h:9000;max_datagram_size=1400;", + "unknown configuration key: max_datagram_size", legacyHint); + assertRejected("ws::addr=h:9000;multicast_ttl=4;", + "unknown configuration key: multicast_ttl", legacyHint); + assertRejected("ws::addr=h:9000;retry_timeout=1000;", + "unknown configuration key: retry_timeout", "(use reconnect_max_duration_millis on ws/wss)"); + assertRejected("ws::addr=h:9000;protocol_version=2;", + "unknown configuration key: protocol_version", "(QWP negotiates the protocol version during the WebSocket upgrade)"); + }); } @Test - public void testRelocatedHintTableIsExactlyTheLegacyKeys() { - String legacyHint = "(applies to legacy http/tcp/udp transports only)"; - Assert.assertEquals(legacyHint, ConfigView.relocatedHint("init_buf_size")); - Assert.assertEquals(legacyHint, ConfigView.relocatedHint("max_buf_size")); - Assert.assertEquals(legacyHint, ConfigView.relocatedHint("request_timeout")); - Assert.assertEquals(legacyHint, 
ConfigView.relocatedHint("request_min_throughput")); - Assert.assertEquals(legacyHint, ConfigView.relocatedHint("max_datagram_size")); - Assert.assertEquals(legacyHint, ConfigView.relocatedHint("multicast_ttl")); - Assert.assertEquals("(use reconnect_max_duration_millis on ws/wss)", ConfigView.relocatedHint("retry_timeout")); - Assert.assertEquals("(QWP negotiates the protocol version during the WebSocket upgrade)", ConfigView.relocatedHint("protocol_version")); + public void testRelocatedHintTableIsExactlyTheLegacyKeys() throws Exception { + TestUtils.assertMemoryLeak(() -> { + String legacyHint = "(applies to legacy http/tcp/udp transports only)"; + Assert.assertEquals(legacyHint, ConfigView.relocatedHint("init_buf_size")); + Assert.assertEquals(legacyHint, ConfigView.relocatedHint("max_buf_size")); + Assert.assertEquals(legacyHint, ConfigView.relocatedHint("request_timeout")); + Assert.assertEquals(legacyHint, ConfigView.relocatedHint("request_min_throughput")); + Assert.assertEquals(legacyHint, ConfigView.relocatedHint("max_datagram_size")); + Assert.assertEquals(legacyHint, ConfigView.relocatedHint("multicast_ttl")); + Assert.assertEquals("(use reconnect_max_duration_millis on ws/wss)", ConfigView.relocatedHint("retry_timeout")); + Assert.assertEquals("(QWP negotiates the protocol version during the WebSocket upgrade)", ConfigView.relocatedHint("protocol_version")); - // No registry key (including POOL keys) carries a relocated hint. - for (ConfigSchema.KeySpec spec : ConfigSchema.all()) { - Assert.assertNull("registry key '" + spec.name() + "' must not be in the hint table", - ConfigView.relocatedHint(spec.name())); - } - // ECDSA keys are plain unknowns (only the C client handles them). - Assert.assertNull(ConfigView.relocatedHint("token_x")); - Assert.assertNull(ConfigView.relocatedHint("token_y")); - Assert.assertNull(ConfigView.relocatedHint("not_a_real_key")); + // No registry key (including POOL keys) carries a relocated hint. 
+ for (ConfigSchema.KeySpec spec : ConfigSchema.all()) { + Assert.assertNull("registry key '" + spec.name() + "' must not be in the hint table", + ConfigView.relocatedHint(spec.name())); + } + // ECDSA keys are plain unknowns (only the C client handles them). + Assert.assertNull(ConfigView.relocatedHint("token_x")); + Assert.assertNull(ConfigView.relocatedHint("token_y")); + Assert.assertNull(ConfigView.relocatedHint("not_a_real_key")); + }); } @Test - public void testTokenXYRejectedWithoutHintOnBoth() { - assertRejectedNoHint("ws::addr=h:9000;token_x=abc;", "token_x"); - assertRejectedNoHint("ws::addr=h:9000;token_y=def;", "token_y"); + public void testTokenXYRejectedWithoutHintOnBoth() throws Exception { + TestUtils.assertMemoryLeak(() -> { + assertRejectedNoHint("ws::addr=h:9000;token_x=abc;", "token_x"); + assertRejectedNoHint("ws::addr=h:9000;token_y=def;", "token_y"); + }); } private static void assertNotUnknown(String key, Runnable action) { diff --git a/core/src/test/java/io/questdb/client/test/impl/QwpQueryClientConfigHonoredTest.java b/core/src/test/java/io/questdb/client/test/impl/QwpQueryClientConfigHonoredTest.java index c5c5edb7..00313004 100644 --- a/core/src/test/java/io/questdb/client/test/impl/QwpQueryClientConfigHonoredTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/QwpQueryClientConfigHonoredTest.java @@ -28,6 +28,7 @@ import io.questdb.client.cutlass.qwp.client.QwpQueryClient; import io.questdb.client.impl.ConfigSchema; import io.questdb.client.impl.Side; +import io.questdb.client.test.tools.TestUtils; import org.junit.Assert; import org.junit.Test; @@ -51,59 +52,62 @@ public class QwpQueryClientConfigHonoredTest { private final Set honored = new HashSet<>(); @Test - public void testEveryEgressKeyIsHonored() { - assertHonored("target=primary", "target", "primary"); - assertHonored("failover=off", "failover", false); - assertHonored("failover_max_attempts=9", "failover_max_attempts", 9); - 
assertHonored("failover_backoff_initial_ms=120", "failover_backoff_initial_ms", 120L); - assertHonored("failover_backoff_max_ms=99999", "failover_backoff_max_ms", 99999L); - assertHonored("failover_max_duration_ms=56000", "failover_max_duration_ms", 56000L); - assertHonored("max_batch_rows=512", "max_batch_rows", 512); - assertHonored("initial_credit=65536", "initial_credit", 65536L); - assertHonored("buffer_pool_size=3", "buffer_pool_size", 3); - assertHonored("compression=zstd", "compression", "zstd"); - assertHonored("compression_level=9", "compression_level", 9); - assertHonored("client_id=probe/1.0", "client_id", "probe/1.0"); - assertHonored("zone=us-east", "zone", "us-east"); - // COMMON applied by egress. - assertHonored("auth_timeout_ms=7777", "auth_timeout_ms", 7777L); + public void testEveryEgressKeyIsHonored() throws Exception { + TestUtils.assertMemoryLeak(() -> { + assertHonored("target=primary", "target", "primary"); + assertHonored("failover=off", "failover", false); + assertHonored("failover_max_attempts=9", "failover_max_attempts", 9); + assertHonored("failover_backoff_initial_ms=120", "failover_backoff_initial_ms", 120L); + assertHonored("failover_backoff_max_ms=99999", "failover_backoff_max_ms", 99999L); + assertHonored("failover_max_duration_ms=56000", "failover_max_duration_ms", 56000L); + assertHonored("max_batch_rows=512", "max_batch_rows", 512); + assertHonored("initial_credit=65536", "initial_credit", 65536L); + assertHonored("buffer_pool_size=3", "buffer_pool_size", 3); + assertHonored("compression=zstd", "compression", "zstd"); + assertHonored("compression_level=9", "compression_level", 9); + assertHonored("client_id=probe/1.0", "client_id", "probe/1.0"); + assertHonored("zone=us-east", "zone", "us-east"); + // COMMON applied by egress. 
+ assertHonored("auth_timeout_ms=7777", "auth_timeout_ms", 7777L); + assertHonored("connect_timeout=6000", "connect_timeout", 6000); - // Credentials become the Authorization header, including the user/pass aliases. - String basic = "Basic " + Base64.getEncoder() - .encodeToString("alice:secret".getBytes(StandardCharsets.UTF_8)); - Assert.assertEquals(basic, snapshot("ws::addr=h:9000;username=alice;password=secret;").get("authorization_header")); - Assert.assertEquals(basic, snapshot("ws::addr=h:9000;user=alice;pass=secret;").get("authorization_header")); - Assert.assertEquals("Bearer ey.abc", snapshot("ws::addr=h:9000;token=ey.abc;").get("authorization_header")); - markHonored("username", "password", "token"); + // Credentials become the Authorization header, including the user/pass aliases. + String basic = "Basic " + Base64.getEncoder() + .encodeToString("alice:secret".getBytes(StandardCharsets.UTF_8)); + Assert.assertEquals(basic, snapshot("ws::addr=h:9000;username=alice;password=secret;").get("authorization_header")); + Assert.assertEquals(basic, snapshot("ws::addr=h:9000;user=alice;pass=secret;").get("authorization_header")); + Assert.assertEquals("Bearer ey.abc", snapshot("ws::addr=h:9000;token=ey.abc;").get("authorization_header")); + markHonored("username", "password", "token"); - // COMMON TLS keys applied by egress (require the wss schema). tls_verify - // drives the validation mode; tls_roots/tls_roots_password set the trust - // store. All three read back from the snapshot. - Assert.assertEquals(ClientTlsConfiguration.TLS_VALIDATION_MODE_NONE, - snapshot("wss::addr=h:9000;tls_verify=unsafe_off;").get("tls_verify")); - Map tls = snapshot("wss::addr=h:9000;tls_roots=/ca.p12;tls_roots_password=pw;"); - Assert.assertEquals("/ca.p12", tls.get("tls_roots")); - Assert.assertEquals("pw", tls.get("tls_roots_password")); - markHonored("tls_verify", "tls_roots", "tls_roots_password"); + // COMMON TLS keys applied by egress (require the wss schema). 
tls_verify + // drives the validation mode; tls_roots/tls_roots_password set the trust + // store. All three read back from the snapshot. + Assert.assertEquals(ClientTlsConfiguration.TLS_VALIDATION_MODE_NONE, + snapshot("wss::addr=h:9000;tls_verify=unsafe_off;").get("tls_verify")); + Map tls = snapshot("wss::addr=h:9000;tls_roots=/ca.p12;tls_roots_password=pw;"); + Assert.assertEquals("/ca.p12", tls.get("tls_roots")); + Assert.assertEquals("pw", tls.get("tls_roots_password")); + markHonored("tls_verify", "tls_roots", "tls_roots_password"); - // Drift guard: every egress-applied registry key must have an assertion - // above. The honored set is populated by the assertions themselves, so - // deleting one trips this -- unlike a hand-maintained list, it cannot - // silently drift from what is actually asserted. - for (ConfigSchema.KeySpec spec : ConfigSchema.all()) { - if (!spec.name().equals(spec.canonical())) { - continue; // alias (user/pass) -- covered via its canonical key + // Drift guard: every egress-applied registry key must have an assertion + // above. The honored set is populated by the assertions themselves, so + // deleting one trips this -- unlike a hand-maintained list, it cannot + // silently drift from what is actually asserted. + for (ConfigSchema.KeySpec spec : ConfigSchema.all()) { + if (!spec.name().equals(spec.canonical())) { + continue; // alias (user/pass) -- covered via its canonical key + } + // The egress client applies its own EGRESS keys plus the COMMON keys + // (credentials, TLS, auth_timeout_ms). addr is the endpoint list (the + // connection target), not a snapshot value, so it is excluded. 
+ boolean egressApplied = spec.side() == Side.EGRESS + || (spec.side() == Side.COMMON && !spec.name().equals("addr")); + if (egressApplied) { + Assert.assertTrue("registry egress key '" + spec.name() + "' has no honored assertion", + honored.contains(spec.name())); + } } - // The egress client applies its own EGRESS keys plus the COMMON keys - // (credentials, TLS, auth_timeout_ms). addr is the endpoint list (the - // connection target), not a snapshot value, so it is excluded. - boolean egressApplied = spec.side() == Side.EGRESS - || (spec.side() == Side.COMMON && !spec.name().equals("addr")); - if (egressApplied) { - Assert.assertTrue("registry egress key '" + spec.name() + "' has no honored assertion", - honored.contains(spec.name())); - } - } + }); } private void assertHonored(String kv, String snapKey, Object expected) { diff --git a/core/src/test/java/io/questdb/client/test/impl/SenderLeaseGenerationTest.java b/core/src/test/java/io/questdb/client/test/impl/SenderLeaseGenerationTest.java new file mode 100644 index 00000000..7b5b627a --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/impl/SenderLeaseGenerationTest.java @@ -0,0 +1,152 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.impl; + +import io.questdb.client.Sender; +import io.questdb.client.impl.PooledSender; +import io.questdb.client.impl.SenderPool; +import io.questdb.client.test.tools.TestUtils; +import org.junit.Assert; +import org.junit.Test; + +import java.lang.reflect.Constructor; +import java.lang.reflect.Field; +import java.lang.reflect.Method; +import java.util.ArrayDeque; + +/** + * Ingest-side mirror of {@code QueryLeaseGenerationTest}: a stale pooled-Sender + * handle (held after close, with the slot since re-borrowed) must not disturb a + * later borrow of the same slot. {@code PooledSender} is now a fresh per-borrow + * wrapper carrying the lease generation; the reused {@code SenderSlot} validates + * it under the pool lock so a stale close/write is dropped. + *
<p>
    + * Reaches package-private internals by reflection (same white-box style as the + * other tests here); {@code SenderSlot} is constructed with a {@code null} + * delegate, which the paths under test never dereference. + */ +public class SenderLeaseGenerationTest { + + private static final String DEAD_HTTP_CONFIG = + "http::addr=127.0.0.1:1;protocol_version=2;auto_flush=off;"; + + /** + * The pool-wide blast radius: a stale (duplicate / post-reborrow) close must + * never enqueue a slot a live borrower owns, or two borrowers would write + * into one delegate's buffer at once. {@code giveBack} validates the lease + * generation under the pool lock, so this is impossible. + */ + @Test + @SuppressWarnings("unchecked") + public void testStaleGiveBackDoesNotEnqueueSlotTwice() throws Exception { + TestUtils.assertMemoryLeak(() -> { + Class slotClass = Class.forName("io.questdb.client.impl.SenderSlot"); + Constructor slotCtor = slotClass.getDeclaredConstructor(Sender.class, SenderPool.class, int.class); + slotCtor.setAccessible(true); + Method bump = slotClass.getDeclaredMethod("bumpGeneration"); + bump.setAccessible(true); + Method generation = slotClass.getDeclaredMethod("generation"); + generation.setAccessible(true); + Constructor leaseCtor = + PooledSender.class.getDeclaredConstructor(slotClass, long.class); + leaseCtor.setAccessible(true); + Field availableF = SenderPool.class.getDeclaredField("available"); + availableF.setAccessible(true); + + try (SenderPool pool = new SenderPool( + DEAD_HTTP_CONFIG, /*minSize*/ 0, /*maxSize*/ 2, + /*acquireTimeoutMillis*/ 1_000L, + /*idleTimeoutMillis*/ Long.MAX_VALUE, + /*maxLifetimeMillis*/ Long.MAX_VALUE)) { + ArrayDeque available = (ArrayDeque) availableF.get(pool); + Object slot = slotCtor.newInstance(null, pool, -1); + + // borrow #1 stamps generation 1; lease A captures 1. 
+ bump.invoke(slot); + Assert.assertEquals(1L, generation.invoke(slot)); + PooledSender leaseA = leaseCtor.newInstance(slot, 1L); + + // close A -> giveBack(A): matches, enqueues once. + pool.giveBack(leaseA); + Assert.assertEquals("valid close must enqueue the slot once", 1, available.size()); + + // duplicate close A (e.g. explicit close + try-with-resources) + // -> giveBack(A): generation already bumped to 2, so it is dropped. + pool.giveBack(leaseA); + Assert.assertEquals("duplicate close of the same lease must be dropped", + 1, available.size()); + + // borrow #2 hands the slot to a new borrower B: pull it out, stamp 3. + available.pollFirst(); + bump.invoke(slot); + Assert.assertEquals(3L, generation.invoke(slot)); + PooledSender leaseB = leaseCtor.newInstance(slot, 3L); + + // A stray close from the long-dead lease A -> dropped, so B's slot is + // NOT re-enqueued while B still owns it. + pool.giveBack(leaseA); + Assert.assertEquals("a post-reborrow stale close must NOT enqueue the slot " + + "while another borrower owns it", 0, available.size()); + + // B's own close -> giveBack(B): matches, enqueues legitimately. + pool.giveBack(leaseB); + Assert.assertEquals("the current borrower's close must still work", + 1, available.size()); + } + }); + } + + /** + * A stale lease's data write must be rejected (not silently land in a slot a + * later borrower now owns). The generation guard in + * {@code SenderSlot.live()} throws before the delegate is touched. 
+ */ + @Test + public void testStaleWriteIsRejected() throws Exception { + TestUtils.assertMemoryLeak(() -> { + Class slotClass = Class.forName("io.questdb.client.impl.SenderSlot"); + Constructor slotCtor = slotClass.getDeclaredConstructor(Sender.class, SenderPool.class, int.class); + slotCtor.setAccessible(true); + Method bump = slotClass.getDeclaredMethod("bumpGeneration"); + bump.setAccessible(true); + Constructor leaseCtor = + PooledSender.class.getDeclaredConstructor(slotClass, long.class); + leaseCtor.setAccessible(true); + + Object slot = slotCtor.newInstance(null, null, -1); + bump.invoke(slot); // generation -> 1, lease A captures 1 + PooledSender leaseA = leaseCtor.newInstance(slot, 1L); + bump.invoke(slot); // released + bump.invoke(slot); // re-borrowed -> generation 3 + + try { + leaseA.table("x"); + Assert.fail("a stale lease's write must throw, not reach the re-borrowed slot"); + } catch (IllegalStateException expected) { + Assert.assertTrue(expected.getMessage(), expected.getMessage().contains("closed")); + } + }); + } +} diff --git a/core/src/test/java/io/questdb/client/test/impl/SenderPoolErrorSafetyTest.java b/core/src/test/java/io/questdb/client/test/impl/SenderPoolErrorSafetyTest.java index b7b56e7a..81055bb6 100644 --- a/core/src/test/java/io/questdb/client/test/impl/SenderPoolErrorSafetyTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/SenderPoolErrorSafetyTest.java @@ -25,11 +25,13 @@ package io.questdb.client.test.impl; import io.questdb.client.Sender; +import io.questdb.client.impl.PooledSender; import io.questdb.client.impl.SenderPool; +import io.questdb.client.test.tools.TestUtils; import org.junit.Assert; import org.junit.Test; -import java.lang.reflect.Constructor; +import java.lang.reflect.Field; import java.lang.reflect.Proxy; import java.nio.file.Paths; import java.util.concurrent.atomic.AtomicBoolean; @@ -58,25 +60,27 @@ public class SenderPoolErrorSafetyTest { // GREEN: catch (Throwable) -> the cleanup loop closes the 
1st delegate. @Test(timeout = 30_000) public void preWarmClosesBuiltDelegatesWhenBuildThrowsError() throws Exception { - AtomicBoolean firstClosed = new AtomicBoolean(false); - AtomicInteger calls = new AtomicInteger(); - IntFunction factory = slotIndex -> { - if (calls.incrementAndGet() >= 2) { - throw new AssertionError("injected native build failure"); + TestUtils.assertMemoryLeak(() -> { + AtomicBoolean firstClosed = new AtomicBoolean(false); + AtomicInteger calls = new AtomicInteger(); + IntFunction factory = slotIndex -> { + if (calls.incrementAndGet() >= 2) { + throw new AssertionError("injected native build failure"); + } + return fakeSender(firstClosed); + }; + + try { + newPool(CFG, 2, 2, 250, factory); + Assert.fail("expected prewarm to propagate the injected Error"); + } catch (Throwable expected) { + // expected -- construction aborts } - return fakeSender(firstClosed); - }; - - try { - newPool(CFG, 2, 2, 250, factory); - Assert.fail("expected prewarm to propagate the injected Error"); - } catch (Throwable expected) { - // expected -- construction aborts - } - - Assert.assertTrue( - "prewarm leaked an already-built delegate: its close() was never called on an Error", - firstClosed.get()); + + Assert.assertTrue( + "prewarm leaked an already-built delegate: its close() was never called on an Error", + firstClosed.get()); + }); } // Companion to the catch (RuntimeException) -> track-normal-completion fix in @@ -92,31 +96,46 @@ public void preWarmClosesBuiltDelegatesWhenBuildThrowsError() throws Exception { // discardBroken() -> the next borrow() builds a fresh wrapper. 
@Test(timeout = 30_000) public void flushErrorDiscardsBrokenSenderInsteadOfRecycling() throws Exception { - IntFunction factory = slotIndex -> flushThrowingSender(); + TestUtils.assertMemoryLeak(() -> { + IntFunction factory = slotIndex -> flushThrowingSender(); - try (SenderPool pool = newPool(CFG, 1, 1, 1_000, factory)) { - Sender first = pool.borrow(); - try { - first.close(); - Assert.fail("close() must propagate the Error thrown by flush()"); - } catch (AssertionError expected) { - // expected: the original throwable propagates naturally - } + try (SenderPool pool = newPool(CFG, 1, 1, 1_000, factory)) { + Sender first = pool.borrow(); + // Capture the underlying slot before close(): borrow() always hands + // out a FRESH PooledSender wrapper, so assertNotSame(first, second) + // on the wrappers is vacuously true and proves nothing -- it stays + // true whether or not the broken slot was discarded. The pool + // recycles slots, not wrappers, so a broken slot leaking back to + // the next borrower shows up as the SAME slot. Assert on the slot. + Object firstSlot = slotOf(first); + try { + first.close(); + Assert.fail("close() must propagate the Error thrown by flush()"); + } catch (AssertionError expected) { + // expected: the original throwable propagates naturally + } - Sender second = pool.borrow(); - try { - Assert.assertNotSame( - "a sender whose flush() exited with an Error must be discarded, not recycled", - first, second); - } finally { - // second's flush() also throws on close(); swallow on teardown. + Sender second = pool.borrow(); try { - second.close(); - } catch (AssertionError ignored) { - // expected + Assert.assertNotSame( + "a sender whose flush() exited with an Error must be discarded, not recycled", + firstSlot, slotOf(second)); + } finally { + // second's flush() also throws on close(); swallow on teardown. 
+ try { + second.close(); + } catch (AssertionError ignored) { + // expected + } } } - } + }); + } + + private static Object slotOf(Sender pooledWrapper) throws Exception { + Field f = PooledSender.class.getDeclaredField("slot"); + f.setAccessible(true); + return f.get(pooledWrapper); } // Like fakeSender(), but flush() throws an Error to drive the @@ -173,42 +192,44 @@ private static Sender flushThrowingSender() { // succeeds, proving capacity survived the failed grow. @Test(timeout = 30_000) public void borrowReleasesSfSlotIndexWhenCreationFails() throws Exception { - // Unique, non-existent sf_dir: minSize=0 means no pre-warm, so the dir - // is never created and the constructor's startup SF recovery is a no-op. - // The factory replaces createUnlocked(), so localhost:1 is never dialed. - String sfDir = Paths.get(System.getProperty("java.io.tmpdir"), - "qdb-sf-borrowfail-" + System.nanoTime()).toString(); - String sfCfg = "ws::addr=localhost:1;sf_dir=" + sfDir + ";"; - - AtomicInteger calls = new AtomicInteger(); - IntFunction factory = slotIndex -> { - // First borrow-triggered build fails (the slot index reserved for - // it must be released); later builds succeed. - if (calls.getAndIncrement() == 0) { - throw new AssertionError("injected native build failure on first grow"); - } - return fakeSender(new AtomicBoolean()); - }; + TestUtils.assertMemoryLeak(() -> { + // Unique, non-existent sf_dir: minSize=0 means no pre-warm, so the dir + // is never created and the constructor's startup SF recovery is a no-op. + // The factory replaces createUnlocked(), so localhost:1 is never dialed. 
+ String sfDir = Paths.get(System.getProperty("java.io.tmpdir"), + "qdb-sf-borrowfail-" + System.nanoTime()).toString(); + String sfCfg = "ws::addr=localhost:1;sf_dir=" + sfDir + ";"; - try (SenderPool pool = newPool(sfCfg, 0, 1, 2_000, factory)) { - try { - pool.borrow(); - Assert.fail("borrow() must propagate the Error from the failed build"); - } catch (AssertionError expected) { - // expected: the original throwable propagates out of borrow() - } + AtomicInteger calls = new AtomicInteger(); + IntFunction factory = slotIndex -> { + // First borrow-triggered build fails (the slot index reserved for + // it must be released); later builds succeed. + if (calls.getAndIncrement() == 0) { + throw new AssertionError("injected native build failure on first grow"); + } + return fakeSender(new AtomicBoolean()); + }; - // The single SF slot index must have been returned to the free set. - // If it leaked, this borrow() trips the capacity invariant (or, in - // the timeout-only variant, exhausts the acquire budget). - Sender second = pool.borrow(); - try { - Assert.assertNotNull( - "after a failed grow the SF slot index must be reusable", second); - } finally { - second.close(); + try (SenderPool pool = newPool(sfCfg, 0, 1, 2_000, factory)) { + try { + pool.borrow(); + Assert.fail("borrow() must propagate the Error from the failed build"); + } catch (AssertionError expected) { + // expected: the original throwable propagates out of borrow() + } + + // The single SF slot index must have been returned to the free set. + // If it leaked, this borrow() trips the capacity invariant (or, in + // the timeout-only variant, exhausts the acquire budget). 
+ Sender second = pool.borrow(); + try { + Assert.assertNotNull( + "after a failed grow the SF slot index must be reusable", second); + } finally { + second.close(); + } } - } + }); } private static Sender fakeSender(AtomicBoolean closedFlag) { @@ -246,10 +267,7 @@ private static Sender fakeSender(AtomicBoolean closedFlag) { private static SenderPool newPool( String cfg, int min, int max, long acquireMs, IntFunction senderFactory - ) throws Exception { - Constructor c = SenderPool.class.getDeclaredConstructor( - String.class, int.class, int.class, long.class, long.class, long.class, IntFunction.class); - c.setAccessible(true); - return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, senderFactory); + ) { + return new SenderPool(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, senderFactory); } } diff --git a/core/src/test/java/io/questdb/client/test/impl/SenderPoolSfTest.java b/core/src/test/java/io/questdb/client/test/impl/SenderPoolSfTest.java index e4b2b49a..2c76997d 100644 --- a/core/src/test/java/io/questdb/client/test/impl/SenderPoolSfTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/SenderPoolSfTest.java @@ -43,7 +43,6 @@ import org.slf4j.LoggerFactory; import java.io.IOException; -import java.lang.reflect.Constructor; import java.lang.reflect.Field; import java.lang.reflect.Method; import java.nio.ByteBuffer; @@ -207,7 +206,10 @@ public void testReturnedSenderReusesSameSlot() throws Exception { first.close(); PooledSender second = pool.borrow(); try { - Assert.assertSame("returned slot must be recycled", first, second); + // borrow() now returns a fresh wrapper each time; the + // recycled thing is the underlying slot. 
+ Assert.assertSame("returned slot must be recycled", + getField(first, "slot"), getField(second, "slot")); Assert.assertEquals("no new slot dir on recycle", 1, countSlotDirs()); Assert.assertTrue(Files.exists(slot("default-0"))); } finally { @@ -1883,9 +1885,12 @@ private static void rmDir(String dir) { } private static Sender getDelegate(PooledSender ps) throws Exception { - Field f = PooledSender.class.getDeclaredField("delegate"); + Field slotF = PooledSender.class.getDeclaredField("slot"); + slotF.setAccessible(true); + Object slot = slotF.get(ps); + Field f = slot.getClass().getDeclaredField("delegate"); f.setAccessible(true); - return (Sender) f.get(ps); + return (Sender) f.get(slot); } // Invokes one of the pool's private managed-slot delegate factories @@ -1931,27 +1936,20 @@ private static void invokeDiscardBroken(SenderPool pool, PooledSender ps) throws m.invoke(pool, ps); } - // Reaches the package-private senderFactory test seam by reflection so a - // test can inject a fake/forged delegate (mirrors SenderPoolErrorSafetyTest). + // Uses the @TestOnly senderFactory seam so a test can inject a fake/forged + // delegate (mirrors SenderPoolErrorSafetyTest). private static SenderPool newPoolWithFactory( String cfg, int min, int max, long acquireMs, IntFunction senderFactory - ) throws Exception { - Constructor c = SenderPool.class.getDeclaredConstructor( - String.class, int.class, int.class, long.class, long.class, long.class, IntFunction.class); - c.setAccessible(true); - return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, senderFactory); + ) { + return new SenderPool(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, senderFactory); } - // Reaches the package-private 8-arg constructor (deferStartupRecovery=true) - // by reflection so a test can build a pool whose SF startup recovery is NOT - // run inline -- mirroring the pooled QuestDB handle, which defers it to the - // housekeeper. 
senderFactory=null -> the real defaultSender(). - private static SenderPool newDeferredPool(String cfg, int min, int max, long acquireMs) throws Exception { - Constructor c = SenderPool.class.getDeclaredConstructor( - String.class, int.class, int.class, long.class, long.class, long.class, - IntFunction.class, boolean.class); - c.setAccessible(true); - return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, null, true); + // Uses the @TestOnly 8-arg constructor (deferStartupRecovery=true) so a test + // can build a pool whose SF startup recovery is NOT run inline -- mirroring + // the pooled QuestDB handle, which defers it to the housekeeper. + // senderFactory=null -> the real defaultSender(). + private static SenderPool newDeferredPool(String cfg, int min, int max, long acquireMs) { + return new SenderPool(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, null, true); } // Drives a deferred pool's startup recovery to completion (the housekeeper @@ -1982,12 +1980,8 @@ private static void invokeMarkClosing(SenderPool pool) throws Exception { // test can drive the housekeeper recovery path against fully controlled // (fake) recoverers. 
private static SenderPool newDeferredPoolWithFactory( - String cfg, int min, int max, long acquireMs, IntFunction factory) throws Exception { - Constructor c = SenderPool.class.getDeclaredConstructor( - String.class, int.class, int.class, long.class, long.class, long.class, - IntFunction.class, boolean.class); - c.setAccessible(true); - return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, factory, true); + String cfg, int min, int max, long acquireMs, IntFunction factory) { + return new SenderPool(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, factory, true); } // Fake Sender whose drain() (for slot 0 only) parks until released, opening a diff --git a/core/src/test/java/io/questdb/client/test/impl/SenderPoolTest.java b/core/src/test/java/io/questdb/client/test/impl/SenderPoolTest.java index 85952f85..3f16b965 100644 --- a/core/src/test/java/io/questdb/client/test/impl/SenderPoolTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/SenderPoolTest.java @@ -34,10 +34,7 @@ import java.lang.reflect.Field; import java.lang.reflect.Proxy; -import java.util.concurrent.CountDownLatch; -import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicInteger; -import java.util.concurrent.atomic.AtomicReference; /** * Unit tests for the {@link SenderPool} borrow/return semantics. 
Uses the @@ -57,26 +54,36 @@ public class SenderPoolTest { "http::addr=127.0.0.1:1;protocol_version=2;auto_flush=off;"; @Test - public void testBorrowReturnRecyclesSameDecorator() { + public void testBorrowReturnRecyclesSameDecorator() throws Exception { try (SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE)) { Sender first = pool.borrow(); first.close(); Sender second = pool.borrow(); - Assert.assertSame("returned decorator should be reused after close()", first, second); + // Each borrow is a fresh PooledSender wrapper; what the pool recycles + // is the underlying slot, so compare those rather than the handles. + Assert.assertSame("returned slot should be recycled after close()", + slotOf(first), slotOf(second)); second.close(); } } + private static Object slotOf(Sender pooledWrapper) throws Exception { + Field f = PooledSender.class.getDeclaredField("slot"); + f.setAccessible(true); + return f.get(pooledWrapper); + } + @Test - public void testBrokenSenderIsNotReturnedToPool() { + public void testBrokenSenderIsNotReturnedToPool() throws Exception { // Borrowing, buffering a row, and then closing forces flush() against - // the unreachable address, which throws. The broken wrapper must not - // be returned to the pool: its delegate's buffer still holds the - // failed row, and on transports with terminal-failure semantics the - // delegate is also unusable. Either way, the next borrower must get - // a fresh wrapper. + // the unreachable address, which throws. The broken slot must not be + // returned to the pool: its delegate's buffer still holds the failed + // row, and on transports with terminal-failure semantics the delegate + // is also unusable. Either way, the next borrower must get a fresh + // slot, not the broken one. 
try (SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE)) { Sender first = pool.borrow(); + Object firstSlot = slotOf(first); first.table("t").longColumn("v", 1).atNow(); try { first.close(); @@ -86,11 +93,23 @@ public void testBrokenSenderIsNotReturnedToPool() { } Sender second = pool.borrow(); try { - Assert.assertNotSame("broken sender must not be handed back to next borrower", - first, second); + // borrow() always hands out a FRESH PooledSender wrapper, so + // assertNotSame(first, second) on the wrappers is vacuously + // true and proves nothing -- it stays true whether or not the + // broken slot was discarded. What the pool recycles is the + // underlying slot, so a broken slot leaking back to the next + // borrower shows up as the SAME slot. Assert the slot differs. + Assert.assertNotSame("broken slot must not be handed back to next borrower", + firstSlot, slotOf(second)); } finally { - if (second != first) { + // On the failing path (broken slot recycled) second.close() + // re-throws, since its delegate's buffer still holds the + // failed row; swallow it so the assertion above is what + // surfaces rather than this incidental close() failure. + try { second.close(); + } catch (LineSenderException ignored) { + // expected only when the regression is present } } } @@ -319,180 +338,6 @@ public void testReapIdleRespectsMinSize() throws InterruptedException { } } - @Test - public void testPinAfterCloseRejectsStaleEntry() throws Exception { - // Pin from a worker thread, close the pool from main. The worker's - // ThreadLocal still references its PooledSender, but the underlying - // delegate has been closed. The next pinToCurrentThread() on the - // worker must reject the stale entry instead of handing it back. 
- SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE); - CountDownLatch pinned = new CountDownLatch(1); - CountDownLatch closed = new CountDownLatch(1); - AtomicReference secondCallError = new AtomicReference<>(); - Thread worker = new Thread(() -> { - try { - pool.pinToCurrentThread(); - pinned.countDown(); - Assert.assertTrue(closed.await(2, TimeUnit.SECONDS)); - try { - pool.pinToCurrentThread(); - secondCallError.set(new AssertionError("pinToCurrentThread after close must throw")); - } catch (LineSenderException e) { - // expected - } - } catch (Throwable t) { - secondCallError.set(t); - } - }); - worker.start(); - Assert.assertTrue(pinned.await(2, TimeUnit.SECONDS)); - pool.close(); - closed.countDown(); - worker.join(2_000); - if (secondCallError.get() != null) { - throw new AssertionError(secondCallError.get()); - } - } - - @Test - public void testPinAfterUserCloseDoesNotShareWrapper() { - // Same-thread reproducer for the pinToCurrentThread() sharing bug. - // The user closes a pinned Sender (the natural try-with-resources - // idiom on the public Sender API), then another consumer borrows - // the slot. pinToCurrentThread() must not hand that wrapper back: - // it is now owned by the second consumer. - // - // Pool size 1 collapses the race window into a linear sequence: - // the second borrower deterministically receives the same slot - // that was just returned, so the bug is observable at the - // wrapper-identity level without timing. 
- try (SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 100, Long.MAX_VALUE, Long.MAX_VALUE)) { - Sender pinned = pool.pinToCurrentThread(); - pinned.close(); // pool slot returned; ThreadLocal still points at it - Sender stolen = pool.borrow(); // pollFirst hands the same wrapper to a new consumer - try { - Sender repinned = pool.pinToCurrentThread(); - Assert.fail("pinToCurrentThread() returned wrapper " + repinned - + " already borrowed by another consumer " + stolen); - } catch (LineSenderException expected) { - // After fix: TL cleared (or owner-thread invalidated) on close; - // re-pin tries to borrow, pool is empty, acquireTimeout fires. - } finally { - stolen.close(); - } - } - } - - @Test - public void testPinAfterUserCloseDoesNotShareWrapperCrossThread() throws InterruptedException { - // Cross-thread variant of the same bug, mirroring the originally - // reported trigger: Thread A pins, closes, then re-pins while - // Thread B has borrowed the slot in between. A's ThreadLocal still - // references the wrapper, and pinToCurrentThread() hands it back -- - // so A and B end up writing to the same underlying Sender. - try (SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 100, Long.MAX_VALUE, Long.MAX_VALUE)) { - CountDownLatch aClosed = new CountDownLatch(1); - CountDownLatch bBorrowed = new CountDownLatch(1); - AtomicReference bSender = new AtomicReference<>(); - AtomicReference failure = new AtomicReference<>(); - - Thread a = new Thread(() -> { - try { - Sender s = pool.pinToCurrentThread(); - s.close(); - aClosed.countDown(); - Assert.assertTrue(bBorrowed.await(2, TimeUnit.SECONDS)); - try { - Sender repinned = pool.pinToCurrentThread(); - failure.compareAndSet(null, new AssertionError( - "pinToCurrentThread() returned wrapper " + repinned - + " already borrowed by another thread " + bSender.get())); - } catch (LineSenderException expected) { - // After fix: re-pin tries to borrow, pool is empty, times out. 
- } - } catch (Throwable t) { - failure.compareAndSet(null, t); - } - }); - Thread b = new Thread(() -> { - try { - Assert.assertTrue(aClosed.await(2, TimeUnit.SECONDS)); - bSender.set(pool.borrow()); - } catch (Throwable t) { - failure.compareAndSet(null, t); - } finally { - bBorrowed.countDown(); - } - }); - - a.start(); - b.start(); - a.join(4_000); - b.join(4_000); - - if (bSender.get() != null) { - bSender.get().close(); - } - if (failure.get() != null) { - throw new AssertionError(failure.get()); - } - } - } - - @Test - public void testReleaseAfterCloseIsSafe() throws Exception { - // Same setup as the pin test, but exercise releaseCurrentThread() - // instead. With a closed delegate underneath, the release path must - // not invoke flush() on the dead Sender. - SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE); - CountDownLatch pinned = new CountDownLatch(1); - CountDownLatch closed = new CountDownLatch(1); - AtomicReference releaseError = new AtomicReference<>(); - Thread worker = new Thread(() -> { - try { - pool.pinToCurrentThread(); - pinned.countDown(); - Assert.assertTrue(closed.await(2, TimeUnit.SECONDS)); - pool.releaseCurrentThread(); - } catch (Throwable t) { - releaseError.set(t); - } - }); - worker.start(); - Assert.assertTrue(pinned.await(2, TimeUnit.SECONDS)); - pool.close(); - closed.countDown(); - worker.join(2_000); - if (releaseError.get() != null) { - throw new AssertionError(releaseError.get()); - } - } - - @Test - public void testThreadAffinityIsPerThread() throws InterruptedException { - try (SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 2, 2, 1_000, Long.MAX_VALUE, Long.MAX_VALUE)) { - Sender mainPinned = pool.pinToCurrentThread(); - Assert.assertSame("re-pin on same thread returns same instance", - mainPinned, pool.pinToCurrentThread()); - - AtomicReference otherPinned = new AtomicReference<>(); - CountDownLatch done = new CountDownLatch(1); - Thread t = new Thread(() -> { - try { - 
otherPinned.set(pool.pinToCurrentThread()); - } finally { - done.countDown(); - } - }); - t.start(); - Assert.assertTrue(done.await(2, TimeUnit.SECONDS)); - Assert.assertNotSame("different threads must get different pinned Senders", - mainPinned, otherPinned.get()); - - pool.releaseCurrentThread(); - } - } - // ---------------------------------------------------------------------- // Teardown robustness: a delegate close() can throw an Error (e.g. an // -ea AssertionError), not just a RuntimeException. The pool's best-effort @@ -578,9 +423,12 @@ public void testCloseSurvivesDelegateCloseError() throws Exception { * while the test does not leak native memory. */ private static void installFailingCloseDelegate(PooledSender ps, AtomicInteger closeAttempts) throws Exception { - Field f = PooledSender.class.getDeclaredField("delegate"); + Field slotF = PooledSender.class.getDeclaredField("slot"); + slotF.setAccessible(true); + Object slot = slotF.get(ps); + Field f = slot.getClass().getDeclaredField("delegate"); f.setAccessible(true); - Sender real = (Sender) f.get(ps); + Sender real = (Sender) f.get(slot); Sender failing = (Sender) Proxy.newProxyInstance( Sender.class.getClassLoader(), new Class[]{Sender.class}, @@ -601,6 +449,6 @@ private static void installFailingCloseDelegate(PooledSender ps, AtomicInteger c } return method.invoke(real, args); }); - f.set(ps, failing); + f.set(slot, failing); } } diff --git a/core/src/test/java/io/questdb/client/test/impl/WsSenderConfigHonoredTest.java b/core/src/test/java/io/questdb/client/test/impl/WsSenderConfigHonoredTest.java index 69453c77..51003bfc 100644 --- a/core/src/test/java/io/questdb/client/test/impl/WsSenderConfigHonoredTest.java +++ b/core/src/test/java/io/questdb/client/test/impl/WsSenderConfigHonoredTest.java @@ -77,6 +77,7 @@ public void testEveryIngressKeyIsHonored() { assertHonored("connection_listener_inbox_capacity=64", "connection_listener_inbox_capacity", 64); assertHonored("token=ey.abc", "token", 
"ey.abc"); assertHonored("auth_timeout_ms=4321", "auth_timeout_ms", 4321L); + assertHonored("connect_timeout=7000", "connect_timeout", 7000); // username/password together (both-or-neither), and the user/pass aliases. Map creds = snapshot("ws::addr=h:9000;username=alice;password=secret;"); diff --git a/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketHandshakeOverflowTest.java b/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketHandshakeOverflowTest.java index 25b138bd..8d4ca755 100644 --- a/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketHandshakeOverflowTest.java +++ b/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketHandshakeOverflowTest.java @@ -81,7 +81,8 @@ public void testHandshakeWrapOverflowWithNonEmptyBufferShouldNotLoopForever() th CountDownLatch done = new CountDownLatch(1); t = new Thread(() -> { try { - socket.startTlsSession("test.host"); + socket.startTlsSession("test.host", op -> { + }); } catch (Throwable ignored) { // Expected: a healthy handshake loop should fail loudly here, // not spin forever. 
Any exception (AssertionError, SSLException, diff --git a/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketTest.java b/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketTest.java index 506ce783..05313368 100644 --- a/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketTest.java +++ b/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketTest.java @@ -25,9 +25,11 @@ package io.questdb.client.test.network; import io.questdb.client.ClientTlsConfiguration; +import io.questdb.client.network.IOOperation; import io.questdb.client.network.JavaTlsClientSocket; import io.questdb.client.network.NetworkFacade; import io.questdb.client.network.NetworkFacadeImpl; +import io.questdb.client.network.SocketReadinessWaiter; import io.questdb.client.std.MemoryTag; import io.questdb.client.std.Unsafe; import io.questdb.client.test.tools.TestUtils; @@ -40,9 +42,11 @@ import javax.net.ssl.SSLParameters; import javax.net.ssl.SSLSession; import java.lang.reflect.Field; +import java.lang.reflect.InvocationTargetException; import java.lang.reflect.Method; import java.nio.ByteBuffer; import java.util.List; +import java.util.concurrent.atomic.AtomicInteger; import java.util.function.BiFunction; import static org.junit.Assert.assertEquals; @@ -190,6 +194,136 @@ public void testRecvProcessesBufferedRecordAfterEmptyOkUnwrap() throws Exception } } + /** + * Regression test for the TLS handshake busy-spin / unbounded handshake. + * On a non-blocking socket, a peer that completes TCP but stalls before + * sending its half of the handshake leaves the engine in NEED_UNWRAP with + * the socket returning "would block" (recv == 0). The handshake must hand + * control to the readiness waiter -- which in production parks on the event + * loop bounded by the connect deadline -- instead of re-reading in a tight + * loop. 
Here the waiter stands in for that deadline: it records the wait + * and then throws, exactly as the bounded ioWait() does once the budget is + * spent. The method-level timeout fails the test if the handshake ever + * busy-spins past the waiter (i.e. if the deadline-aware wait is removed). + */ + @Test(timeout = 30_000) + public void testHandshakeWaitsForReadabilityInsteadOfBusySpinning() throws Exception { + TestUtils.assertMemoryLeak(() -> { + try (JavaTlsClientSocket socket = newSocket()) { + invoke(socket, "prepareInternalBuffers"); + setField(socket, "sslEngine", new StallingUnwrapSslEngine()); + // Mark the session as TLS so try-with-resources close() frees the internal buffers + // allocated above. Without this the socket stays STATE_EMPTY and close() returns early, + // leaking the 3x256KB NATIVE_TLS_RSS buffers. + setIntField(socket, "state", 2); + + Method runHandshake = JavaTlsClientSocket.class.getDeclaredMethod( + "runHandshake", SocketReadinessWaiter.class); + runHandshake.setAccessible(true); + + AtomicInteger readWaits = new AtomicInteger(); + AtomicInteger writeWaits = new AtomicInteger(); + SocketReadinessWaiter waiter = op -> { + if (op == IOOperation.READ) { + readWaits.incrementAndGet(); + } else { + writeWaits.incrementAndGet(); + } + // Stand in for the connect deadline firing inside ioWait(). 
+ throw new DeadlineReached(); + }; + + try { + runHandshake.invoke(socket, waiter); + Assert.fail("runHandshake must not complete the handshake against a stalled peer"); + } catch (InvocationTargetException e) { + Assert.assertTrue( + "handshake must surface the readiness waiter's deadline, was: " + e.getCause(), + e.getCause() instanceof DeadlineReached); + } + + Assert.assertEquals( + "handshake must wait for the socket to become readable instead of busy-spinning", + 1, readWaits.get()); + Assert.assertEquals( + "a NEED_UNWRAP stall must not trigger a write wait", 0, writeWaits.get()); + } + }); + } + + /** + * Happy-path guard for the refactor: when the engine makes progress (a + * complete record is available, unwrap returns OK and the handshake + * finishes), runHandshake must complete without ever parking on socket + * readiness. The would-block waits only fire on recv/send == 0, so a + * responsive peer never triggers them. + */ + @Test(timeout = 30_000) + public void testHandshakeCompletesWithoutWaitingWhenEngineMakesProgress() throws Exception { + TestUtils.assertMemoryLeak(() -> { + try (JavaTlsClientSocket socket = newSocket()) { + invoke(socket, "prepareInternalBuffers"); + setField(socket, "sslEngine", new ProgressingUnwrapSslEngine()); + // Mark the session as TLS so try-with-resources close() frees the internal buffers + // allocated above. Without this the socket stays STATE_EMPTY and close() returns early, + // leaking the 3x256KB NATIVE_TLS_RSS buffers. 
+ setIntField(socket, "state", 2); + + Method runHandshake = JavaTlsClientSocket.class.getDeclaredMethod( + "runHandshake", SocketReadinessWaiter.class); + runHandshake.setAccessible(true); + + AtomicInteger waits = new AtomicInteger(); + SocketReadinessWaiter waiter = op -> waits.incrementAndGet(); + + runHandshake.invoke(socket, waiter); // must return normally (handshake finished) + + Assert.assertEquals( + "a handshake that makes progress must not wait on socket readiness", + 0, waits.get()); + } + }); + } + + /** + * Regression guard for the NOT_HANDSHAKING loop exit. Per the JSSE + * contract, {@code getHandshakeStatus()} never returns FINISHED -- once a + * delegated task is the TERMINAL handshake step, the re-polled status is + * NOT_HANDSHAKING. runHandshake must treat that as completion; without + * the explicit NOT_HANDSHAKING exit clause the status matches no switch + * case and the loop busy-spins forever with no deadline escape (this + * method's timeout is the tripwire). + */ + @Test(timeout = 30_000) + public void testHandshakeExitsOnNotHandshakingAfterTerminalDelegatedTask() throws Exception { + TestUtils.assertMemoryLeak(() -> { + try (JavaTlsClientSocket socket = newSocket()) { + invoke(socket, "prepareInternalBuffers"); + TerminalDelegatedTaskSslEngine engine = new TerminalDelegatedTaskSslEngine(); + setField(socket, "sslEngine", engine); + // Mark the session as TLS so try-with-resources close() frees the internal buffers + // allocated above. Without this the socket stays STATE_EMPTY and close() returns early, + // leaking the 3x256KB NATIVE_TLS_RSS buffers. 
+ setIntField(socket, "state", 2); + + Method runHandshake = JavaTlsClientSocket.class.getDeclaredMethod( + "runHandshake", SocketReadinessWaiter.class); + runHandshake.setAccessible(true); + + AtomicInteger waits = new AtomicInteger(); + SocketReadinessWaiter waiter = op -> waits.incrementAndGet(); + + runHandshake.invoke(socket, waiter); // must return: NOT_HANDSHAKING == done + + Assert.assertEquals("the terminal delegated task must run exactly once", + 1, engine.tasksRun.get()); + Assert.assertEquals( + "completion via NOT_HANDSHAKING must not park on socket readiness", + 0, waits.get()); + } + }); + } + private static void assertBytes(String expected, long ptr, int len) { Assert.assertEquals(expected.length(), len); for (int i = 0; i < len; i++) { @@ -333,6 +467,80 @@ public SSLEngineResult unwrap(ByteBuffer src, ByteBuffer[] dsts, int offset, int } } + private static final class DeadlineReached extends RuntimeException { + } + + private static final class ProgressingUnwrapSslEngine extends StubSslEngine { + @Override + public SSLEngineResult.HandshakeStatus getHandshakeStatus() { + return SSLEngineResult.HandshakeStatus.NEED_UNWRAP; + } + + @Override + public SSLEngineResult unwrap(ByteBuffer src, ByteBuffer[] dsts, int offset, int length) { + // A complete record was available: consume it and finish the + // handshake, so the loop exits without waiting. + return new SSLEngineResult( + SSLEngineResult.Status.OK, + SSLEngineResult.HandshakeStatus.FINISHED, + 0, + 0 + ); + } + } + + private static final class StallingUnwrapSslEngine extends StubSslEngine { + @Override + public SSLEngineResult.HandshakeStatus getHandshakeStatus() { + return SSLEngineResult.HandshakeStatus.NEED_UNWRAP; + } + + @Override + public SSLEngineResult unwrap(ByteBuffer src, ByteBuffer[] dsts, int offset, int length) { + // No complete TLS record buffered yet: ask for more bytes from the + // socket. 
The stalled peer never sends them, so the handshake must + // wait on readability rather than spin. + return new SSLEngineResult( + SSLEngineResult.Status.BUFFER_UNDERFLOW, + SSLEngineResult.HandshakeStatus.NEED_UNWRAP, + 0, + 0 + ); + } + } + + /** + * Models the JSSE terminal-delegated-task shape: NEED_TASK until the + * handed-out task has run, then NOT_HANDSHAKING (never FINISHED -- + * getHandshakeStatus() cannot return it per the JSSE contract). + */ + private static final class TerminalDelegatedTaskSslEngine extends StubSslEngine { + final AtomicInteger tasksRun = new AtomicInteger(); + private boolean taskHandedOut; + + @Override + public Runnable getDelegatedTask() { + if (taskHandedOut) { + return null; + } + taskHandedOut = true; + return tasksRun::incrementAndGet; + } + + @Override + public SSLEngineResult.HandshakeStatus getHandshakeStatus() { + return tasksRun.get() == 0 + ? SSLEngineResult.HandshakeStatus.NEED_TASK + : SSLEngineResult.HandshakeStatus.NOT_HANDSHAKING; + } + + @Override + public SSLEngineResult unwrap(ByteBuffer src, ByteBuffer[] dsts, int offset, int length) { + throw new IllegalStateException( + "NEED_TASK -> NOT_HANDSHAKING completion must not unwrap"); + } + } + private static abstract class StubSslEngine extends SSLEngine { @Override public void beginHandshake() { diff --git a/core/src/test/java/io/questdb/client/test/network/NetConnectTimeoutTest.java b/core/src/test/java/io/questdb/client/test/network/NetConnectTimeoutTest.java new file mode 100644 index 00000000..b5d2c5d0 --- /dev/null +++ b/core/src/test/java/io/questdb/client/test/network/NetConnectTimeoutTest.java @@ -0,0 +1,118 @@ +/*+***************************************************************************** + * ___ _ ____ ____ + * / _ \ _ _ ___ ___| |_| _ \| __ ) + * | | | | | | |/ _ \/ __| __| | | | _ \ + * | |_| | |_| | __/\__ \ |_| |_| | |_) | + * \__\_\\__,_|\___||___/\__|____/|____/ + * + * Copyright (c) 2014-2019 Appsicle + * Copyright (c) 2019-2026 QuestDB + * + 
* Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + ******************************************************************************/ + +package io.questdb.client.test.network; + +import io.questdb.client.network.NetworkFacade; +import io.questdb.client.network.NetworkFacadeImpl; +import org.junit.Assert; +import org.junit.Assume; +import org.junit.Test; + +import java.net.InetSocketAddress; +import java.net.ServerSocket; + +/** + * Exercises the native non-blocking connect-with-timeout primitive + * ({@link NetworkFacade#connectAddrInfoTimeout}). + */ +public class NetConnectTimeoutTest { + + private static final NetworkFacade NF = NetworkFacadeImpl.INSTANCE; + + @Test + public void testConnectRefusedReturnsErrorNotTimeout() throws Exception { + // Bind then immediately close to obtain a port with no listener; a + // connect to it is refused (RST) rather than timed out. 
+ int port; + try (ServerSocket ss = new ServerSocket()) { + ss.bind(new InetSocketAddress("127.0.0.1", 0)); + port = ss.getLocalPort(); + } + + long addrInfo = NF.getAddrInfo("127.0.0.1", port); + Assert.assertNotEquals(-1, addrInfo); + int fd = NF.socketTcp(true); + try { + int rc = NF.connectAddrInfoTimeout(fd, addrInfo, 5_000); + Assert.assertNotEquals("refused connect must not report success", 0, rc); + Assert.assertNotEquals("refused connect must not be reported as a timeout", + NetworkFacade.CONNECT_TIMEOUT, rc); + } finally { + NF.freeAddrInfo(addrInfo); + NF.close(fd); + } + } + + @Test + public void testConnectSucceedsWithinTimeout() throws Exception { + try (ServerSocket ss = new ServerSocket()) { + ss.bind(new InetSocketAddress("127.0.0.1", 0)); + int port = ss.getLocalPort(); + + long addrInfo = NF.getAddrInfo("127.0.0.1", port); + Assert.assertNotEquals(-1, addrInfo); + int fd = NF.socketTcp(true); + try { + int rc = NF.connectAddrInfoTimeout(fd, addrInfo, 5_000); + Assert.assertEquals("loopback connect should succeed", 0, rc); + } finally { + NF.freeAddrInfo(addrInfo); + NF.close(fd); + } + } + } + + @Test + public void testConnectToBlackholeTimesOut() { + // 192.0.2.0/24 is TEST-NET-1 (RFC 5737); packets are silently dropped on + // a normal network, so the SYN goes unanswered and the timeout fires + // instead of the (much longer) OS connect timeout. + long addrInfo = NF.getAddrInfo("192.0.2.1", 9009); + Assert.assertNotEquals(-1, addrInfo); + int fd = NF.socketTcp(true); + try { + long start = System.nanoTime(); + int rc = NF.connectAddrInfoTimeout(fd, addrInfo, 500); + long elapsedMs = (System.nanoTime() - start) / 1_000_000L; + + // Whatever the outcome, the key guarantee is that we never blocked + // on the (multi-minute) OS connect timeout. 
+ Assert.assertTrue("connect must return near the budget, was " + elapsedMs + "ms", elapsedMs < 5_000); + + // The deterministic outcome depends on the runner's routing for + // TEST-NET-1: a dropped SYN yields a real timeout (the path under + // test), while a runner with no route to 192.0.2.0/24 fails fast + // with ENETUNREACH/EHOSTUNREACH (rc == -1) and a rare appliance may + // even accept it (rc == 0). Only the timeout case is assertable; the + // others can't exercise the timeout, so skip rather than flake. + Assume.assumeTrue("no route to blackhole on this runner (rc=" + rc + ")", + rc == NetworkFacade.CONNECT_TIMEOUT); + Assert.assertEquals("blackhole connect should time out", NetworkFacade.CONNECT_TIMEOUT, rc); + } finally { + NF.freeAddrInfo(addrInfo); + NF.close(fd); + } + } +} diff --git a/core/src/test/java/io/questdb/client/test/tools/TestUtils.java b/core/src/test/java/io/questdb/client/test/tools/TestUtils.java index 60cbc9ef..270311cf 100644 --- a/core/src/test/java/io/questdb/client/test/tools/TestUtils.java +++ b/core/src/test/java/io/questdb/client/test/tools/TestUtils.java @@ -266,20 +266,23 @@ public void close() { return; } - // Checks that the same tag used for allocation and freeing native memory + // Every tag must return to its baseline. The previous shape + // (ported from upstream, which exempts NATIVE_SQL_COMPILER only) + // absorbed any growth confined to a single tag into a tolerated + // diff, so a lone-tag leak (e.g. NATIVE_DEFAULT) passed the check. + // This client has no SQL-compiler tag, so no exemption applies: + // assert strict per-tag equality, then total equality. 
long memAfter = Unsafe.getMemUsed(); - long memNativeSqlCompilerDiff = 0; Assert.assertTrue(memAfter > -1); - if (mem != memAfter) { - for (int i = MemoryTag.MMAP_DEFAULT; i < MemoryTag.SIZE; i++) { - long actualMemByTag = Unsafe.getMemUsedByTag(i); - if (memoryUsageByTag[i] != actualMemByTag) { - Assert.assertTrue(actualMemByTag >= memoryUsageByTag[i]); - memNativeSqlCompilerDiff = actualMemByTag - memoryUsageByTag[i]; - } + for (int i = MemoryTag.MMAP_DEFAULT; i < MemoryTag.SIZE; i++) { + long actualMemByTag = Unsafe.getMemUsedByTag(i); + if (memoryUsageByTag[i] != actualMemByTag) { + Assert.assertEquals( + "native memory leaked or over-freed under tag " + MemoryTag.nameOf(i), + memoryUsageByTag[i], actualMemByTag); } - Assert.assertEquals(mem + memNativeSqlCompilerDiff, memAfter); } + Assert.assertEquals("total native memory", mem, memAfter); } public void skipChecks() { diff --git a/core/src/test/java/io/questdb/client/test/tools/TlsProxy.java b/core/src/test/java/io/questdb/client/test/tools/TlsProxy.java deleted file mode 100644 index 69511007..00000000 --- a/core/src/test/java/io/questdb/client/test/tools/TlsProxy.java +++ /dev/null @@ -1,248 +0,0 @@ -/*+***************************************************************************** - * ___ _ ____ ____ - * / _ \ _ _ ___ ___| |_| _ \| __ ) - * | | | | | | |/ _ \/ __| __| | | | _ \ - * | |_| | |_| | __/\__ \ |_| |_| | |_) | - * \__\_\\__,_|\___||___/\__|____/|____/ - * - * Copyright (c) 2014-2019 Appsicle - * Copyright (c) 2019-2026 QuestDB - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
- * See the License for the specific language governing permissions and - * limitations under the License. - * - ******************************************************************************/ - -package io.questdb.client.test.tools; - -import javax.net.SocketFactory; -import javax.net.ssl.KeyManagerFactory; -import javax.net.ssl.SSLContext; -import javax.net.ssl.SSLServerSocketFactory; -import javax.net.ssl.TrustManagerFactory; -import java.io.Closeable; -import java.io.IOException; -import java.io.InputStream; -import java.io.OutputStream; -import java.net.ServerSocket; -import java.net.Socket; -import java.security.KeyStore; -import java.security.SecureRandom; -import java.util.Collections; -import java.util.Iterator; -import java.util.Set; -import java.util.concurrent.ConcurrentHashMap; -import java.util.concurrent.atomic.AtomicInteger; - -public final class TlsProxy { - private final String dstHost; - private final int dstPort; - private final String keystore; - private final char[] keystorePassword; - private final Set<Link> links = Collections.newSetFromMap(new ConcurrentHashMap<>()); - private Thread acceptorThread; - private volatile boolean killAfterAccepting; - private ServerSocket serverSocket; - private volatile boolean shutdownRequested; - - public TlsProxy(String dstHost, int dstPort, String keystore, char[] keystorePassword) { - this.dstHost = dstHost; - this.dstPort = dstPort; - this.keystore = keystore; - this.keystorePassword = keystorePassword; - } - - public synchronized void killAfterAccepting() { - killAfterAccepting = true; - } - - public synchronized void killConnections() { - Iterator<Link> iterator = links.iterator(); - while (iterator.hasNext()) { - Link link = iterator.next(); - link.kill(); - iterator.remove(); - } - } - - public int start() { - return TestUtils.unchecked(() -> { - SSLContext sslContext = SSLContext.getInstance("TLS"); - TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm()); - 
tmf.init(KeyStore.getInstance(KeyStore.getDefaultType())); - - KeyStore myKeyStore = KeyStore.getInstance(KeyStore.getDefaultType()); - myKeyStore.load(TlsProxy.class.getResourceAsStream(keystore), keystorePassword); - - KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm()); - kmf.init(myKeyStore, keystorePassword); - sslContext.init(kmf.getKeyManagers(), tmf.getTrustManagers(), new SecureRandom()); - SSLServerSocketFactory factory = sslContext.getServerSocketFactory(); - serverSocket = factory.createServerSocket(); - serverSocket.bind(null); - - acceptorThread = new Thread(() -> acceptorLoop(serverSocket)); - acceptorThread.start(); - return serverSocket.getLocalPort(); - }); - } - - public synchronized void stop() { - shutdownRequested = true; - TestUtils.unchecked(() -> serverSocket.close()); - acceptorThread.interrupt(); - TestUtils.unchecked(() -> acceptorThread.join()); - for (Link link : links) { - link.shutDown(); - } - } - - private static void closeQuietly(Closeable closeable) { - if (closeable != null) { - try { - closeable.close(); - } catch (IOException e) { - // whatever - } - } - } - - private void acceptorLoop(ServerSocket socket) { - while (!shutdownRequested) { - Socket frontendSocket = null; - Socket backendSocket; - try { - frontendSocket = socket.accept(); - backendSocket = SocketFactory.getDefault().createSocket(dstHost, dstPort); - } catch (IOException e) { - if (shutdownRequested) { - return; - } - closeQuietly(frontendSocket); - continue; - } - synchronized (this) { - if (shutdownRequested) { - closeQuietly(frontendSocket); - closeQuietly(backendSocket); - return; - } - if (killAfterAccepting) { - closeQuietly(frontendSocket); - closeQuietly(backendSocket); - continue; - } - Link link = new Link(frontendSocket, backendSocket); - links.add(link); - link.start(); - } - } - } - - private static class Link { - private final Socket backend; - private final Pump backendToFrontend; - private final Socket 
frontend; - private final Pump frontendToBackend; - - private Link(Socket frontend, Socket backend) { - AtomicInteger race = new AtomicInteger(2); - this.frontend = frontend; - this.backend = backend; - frontendToBackend = TestUtils.unchecked(() -> new Pump(frontend.getInputStream(), backend.getOutputStream(), race, "front->backend")); - backendToFrontend = TestUtils.unchecked(() -> new Pump(backend.getInputStream(), frontend.getOutputStream(), race, "backend->frontend")); - } - - private void kill() { - closeQuietly(frontend); - closeQuietly(backend); - } - - private void shutDown() { - frontendToBackend.shutdown(); - backendToFrontend.shutdown(); - } - - private void start() { - Thread frontToBackThread = new Thread(frontendToBackend); - frontToBackThread.setName("front-to-back"); - frontendToBackend.setOwningThread(frontToBackThread); - frontToBackThread.start(); - Thread backToFrontThread = new Thread(backendToFrontend); - backToFrontThread.setName("back-to-front"); - backendToFrontend.setOwningThread(backToFrontThread); - backToFrontThread.start(); - } - } - - private static final class Pump implements Runnable { - private final InputStream from; - private final String name; - private final AtomicInteger race; - private final OutputStream to; - private volatile Thread owningThread; - private volatile boolean shutdownRequested; - - private Pump(InputStream from, OutputStream to, AtomicInteger race, String name) { - this.from = from; - this.to = to; - this.race = race; - this.name = name; - } - - @Override - public void run() { - byte[] buffer = new byte[1024]; - long totalRead = 0; - long totalWritten = 0; - while (!shutdownRequested) { - int i; - try { - i = from.read(buffer); - if (i < 0) { - break; - } - totalRead += i; - } catch (IOException e) { - break; - } - try { - to.write(buffer, 0, i); - to.flush(); - totalWritten += i; - } catch (IOException e) { - break; - } - } - try { - to.flush(); - } catch (IOException e) { - // already closed, no problem - } - 
System.out.println(name + "Total read: " + totalRead + ", Total written: " + totalWritten); - if (race.decrementAndGet() == 0) { - closeQuietly(from); - closeQuietly(to); - } - } - - public void setOwningThread(Thread owningThread) { - this.owningThread = owningThread; - } - - private void shutdown() { - shutdownRequested = true; - owningThread.interrupt(); - TestUtils.unchecked(() -> owningThread.join()); - } - } -} diff --git a/design/qwp-cursor-durability-todo.md b/design/qwp-cursor-durability-todo.md deleted file mode 100644 index 2598af51..00000000 --- a/design/qwp-cursor-durability-todo.md +++ /dev/null @@ -1,126 +0,0 @@ -# Cursor SF — remaining work - -Branch: `vi_sf` (off `main`). -Spec: `design/qwp-cursor-durability.md` (decisions 1–14 locked). -Memory: project memory `project_sf_self_sufficient_frames.md` documents the "every frame on disk carries full schema" decision — load-bearing for replay/drainer correctness, do not undo without revisiting. - -## What's already done on this branch - -Every locked spec decision (1–14), every knob in the spec table, every counter accessor, plus four bugs uncovered along the way. Recent commits, newest first: - -- `c25773f` background drainer pool — adopt orphan slots and replay them -- `fa5c838` recovery replays sealed segments from baseSeq, not active (3-bug fix: start-position, ackedFsn-seed, fileGeneration-seed) -- `520231c` cursor frames are self-sufficient — full schemas, full dict -- `b9b6e2f` orphan-slot scanner + .failed sentinel + drain_orphans knob -- `40f9742` initial-connect retry opt-in + replay/attempt counters -- `f152583` slot directory model — sender_id + advisory exclusive .lock -- `8828038` cursor reconnect policy — backoff cap + auth-terminal - -Test count: 788 in `io.questdb.client.test.cutlass.qwp.client.**`, 0 failures, 1 skipped (pre-existing). - -## TODO - -### 1. 
Multi-host failover (HIGH — needs server access) - -The connect-string parses `addr=h1:p1,h2:p2,h3:p3` and stores all hosts in `hosts/ports` lists, but `Sender.build()` only passes `hosts.getQuick(0)` and `ports.getQuick(0)` to `QwpWebSocketSender.connect`. Every reconnect, initial-connect retry, and drainer connect uses the same single host. If host A is down for the per-outage cap, host B is never tried. - -**What to change:** -- `QwpWebSocketSender.buildAndConnect()` — currently builds `WebSocketClient` against `host:port` (single string fields). Either: - - Take a list of (host, port) pairs and round-robin / try-in-order each attempt, OR - - Take a `Supplier` that yields the next endpoint to try and let the sender / loop round-robin externally. -- The reconnect retry-with-backoff loop in `CursorWebSocketSendLoop.fail()` and the helper `connectWithRetry` should treat each host as one attempt — backoff applies *after* exhausting the host list once. -- `Sender.build()` plumbs the full list down (don't drop hosts 1..n). -- `BackgroundDrainer` inherits the same failover via the `ReconnectFactory` it gets from the sender. -- Auth-terminal still terminal across all hosts (one host returning 401 means config is wrong; trying others is unlikely to help — but spec doesn't pin this; could be argued either way). - -**Why server access matters:** to verify failover actually crosses hosts, you want a real multi-server setup (or two `TestWebSocketServer` instances on different ports) with one going down mid-stream and traffic landing on the other. The existing `TestWebSocketServer` is fine for this — but server-side validation that frames arrive intact and dedup-by-messageSequence handles cross-host duplicates is the value-add of the server-side environment. - -**Tests to add:** -- 3 hosts, kill the first connected one, expect reconnect to land on host 2 inside the cap. -- All hosts down at startup → init-connect retry exhausts → terminal. 
-- Auth failure on host 1 — does it fall through to host 2 or stay terminal? (Spec ambiguity; pick one and document.) - -### 2. `sf_durability=flush` and `sf_durability=append` (deferred per spec) - -Cursor today only supports `sf_durability=memory` (page cache) and rejects `flush`/`append` at build time. Spec line 1001: - -```java -if (sfDurability != SfDurability.MEMORY) { - throw new LineSenderException(... + "is not yet supported (deferred follow-up; use sf_durability=memory)"); -} -``` - -**What to change:** -- `flush` semantics: producer returns from `flush()` only after the engine has called `Files.fsync(fd)` on the active segment up to the just-published cursor position. -- `append` semantics: every `appendBlocking` call fsyncs before returning the FSN. -- Plumb a per-segment `fsync()` method on `MmapSegment` (low-level Files.fsync wrapper exists already). -- Backpressure cost is significant — fsync per-batch (`flush`) is acceptable; fsync per-frame (`append`) is the slow setting. -- Re-enable the rejected paths in `Sender.build()`. - -**Tests:** -- After `flush()` returns and a `kill -9` of the JVM, recovery picks up every flushed frame. Hard to write portably; a soft equivalent: after `flush()`, the file's `fsync` was called (instrumented). -- Throughput regression test for `append` mode (10x slowdown is expected). - -### 3. Drainer + terminal upgrade error e2e test - -Today the drainer's "exhausts cap → drops `.failed`" path is exercised only by unit-level reasoning. There's a synthetic `OrphanScanner.markFailed()` test, but no integration test where: -1. Ghost slot has data, -2. Drainer's connect attempts hit a 401-emitting fixture (or unreachable host), -3. Cap exhausts, -4. `.failed` sentinel ends up in the slot, -5. Future foreground scans skip it. - -The blocker today: the drainer inherits its `ReconnectFactory` from the foreground sender, so they share a target host. 
To exercise the drainer-fails-while-foreground-succeeds path, the drainer needs a configurable `ReconnectFactory` distinct from the foreground's. OR: stand up two servers on different ports and have the foreground point at the live one while the drainer is wired to point at the dead one. - -This is small once the multi-host failover work clarifies how connection params flow through the drainer. - -### 4. Run the full `core` test suite - -Only `io.questdb.client.test.cutlass.qwp.client.**` was run after each commit. A `mvn -pl core test` end-to-end would catch any unrelated regressions in non-QWP code paths. Last run before this branch: presumably clean (the changes are confined to QWP). - -### 5. JMH benchmark sanity check - -`core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpIngressLatencyBenchmark.java` exists. Self-sufficient frames bloat per-batch bytes vs the prior delta-encoded format — the perf delta should be measured. Run, compare to a baseline from before commit `520231c`, document the result. - -### 6. Cleanups (LOW) - -- `connectionGeneration` retry loop in `QwpWebSocketSender.flushPendingRows` is now dead code — the race it guarded (encode using stale schema state mid-reconnect) can't fire because encode no longer reads `maxSentSchemaId` / `maxSentSymbolId`. Worth ripping out to shrink surface area, but it's harmless as-is (one volatile read per encode). -- `OrphanScanner.hasAnySegmentFile` reports a slot as a candidate orphan if any `.sfa` file exists, including stale empty hot-spares. The drainer no-ops on empty slots (engine.publishedFsn = -1 → ackedFsn already past), but log noise. Filter on actual frame content via a header read. -- README / public-API docs untouched. New connect-string keys, new builder methods, new accessors all have Javadoc but no top-level doc reference. - -### 7. Spec coverage check - -`design/qwp-cursor-durability.md` decision table claims `max_backoff_millis` is "reuse existing". 
I added `reconnect_max_backoff_millis` as a new key. If `max_backoff_millis` already exists somewhere in the codebase (likely for HTTP retries elsewhere), align names — either rename mine to match, or document that they're distinct. - -## How to run things - -```bash -# Compile everything -mvn -pl core compile test-compile - -# QWP-only suite (fast, ~30s) -mvn -pl core test -Dtest='io.questdb.client.test.cutlass.qwp.client.**' - -# Single test -mvn -pl core test -Dtest=ReconnectTest - -# Full core suite -mvn -pl core test -``` - -Native lib for macOS-aarch64 is already in the repo -(`core/src/main/resources/io/questdb/client/bin/darwin-aarch64/libquestdb.dylib`); -no rebuild needed unless touching `Files.java` natives. - -## Files to know - -- `core/src/main/java/io/questdb/client/Sender.java` — top-level builder + connect-string parser. Scroll to `LineSenderBuilder` (line ~571) for the builder, `build()` for the WS branch (line ~989), and the connect-string switch (line ~2330). -- `core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java` — main sender. `buildAndConnect()` is the host:port-bound connect path (line ~1408 area). -- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java` — I/O thread, reconnect retry loop, replay positioning. -- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java` — engine + slot lock + recovery. -- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainer.java` and `BackgroundDrainerPool.java` — orphan adoption. -- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/OrphanScanner.java` and `SlotLock.java` — slot model. - -## Notes on the testing environment - -The QWP test suite uses `TestWebSocketServer` (in-process, hand-rolled WS server) for everything. It receives binary frames as opaque bytes — does NOT parse the QWP wire format. 
So tests assert wire behavior (frame counts, byte equivalence, connection lifecycle) but cannot assert server-side semantic correctness (does the server accept these schemas? are messageSequence dedups working?). Validating the wire-protocol bytes against a real QuestDB server is the part that needs the server-code repo. diff --git a/design/qwp-cursor-durability.md b/design/qwp-cursor-durability.md deleted file mode 100644 index 686a2687..00000000 --- a/design/qwp-cursor-durability.md +++ /dev/null @@ -1,174 +0,0 @@ -# QWP WebSocket sender — durability & reconnect spec - -Status: **draft v3**, working notes for the cursor SF refactor on `vi_sf`. - -## Goals -- **Reduce data loss.** SF mode preserves every batch the producer has handed to the engine until the server has ACK'd it, surviving JVM crashes, process restarts, and transient network outages. -- Memory mode (`ws::addr=...;` no `sf_dir`) is reliable enough for typical use under transient network blips. -- SF mode (`ws::...;sf_dir=...`) survives process restarts and JVM crashes; disk does not grow under steady-state traffic (only ACK'd data is trimmed). -- Failure surfaces are loud and distinguishable: "server slow" ≠ "server unreachable" ≠ "data refused". - -## Modes -| | Memory | SF | -|---|---|---| -| Storage | malloc'd ring | mmap'd files under sender's slot dir | -| Cap | `sf_max_total_bytes` (default 128 MiB) | `sf_max_total_bytes` (default 10 GiB) | -| Cap-full behavior | Producer's `flush()`/`at()` blocks up to `sf_append_deadline_millis`, then throws | Same | -| Survives JVM exit | No | Yes (recovered on next startup; orphans optionally drained by another sender) | -| Reconnect retries | Yes | Yes | - -## flush() contract -- Encodes accumulated rows into the cursor engine. -- Returns when data is **published into the engine** (in-RAM for memory mode, on-disk for SF). **Never** waits for server ACK — ACKs are asynchronous and not every flush correlates to one. 
-- The I/O loop drains in the background and retries on reconnect until either ACK or the cap forces backpressure → hard error to the producer. - -## close() contract -- One knob: `close_flush_timeout_millis`. - - **Default `5000`**: close() blocks waiting for `engine.ackedFsn() >= engine.publishedFsn()` (server ACK'd everything published) for up to 5 s, then logs WARN and proceeds with stop. - - **`0` or `-1`**: close() does not flush at all — fast exit. Pending data is lost (memory mode) or recovered by next sender (SF mode). - - Any other positive value: that timeout in millis. - -## Reconnect policy (both modes) -- I/O loop catches any wire error (send fail, recv fail, server close, ACK timeout). Logs WARN and enters reconnect. -- Backoff: exponential with jitter. Reuse `LineSenderBuilder.maxBackoffMillis` (initial 100 ms, cap as configured). -- **Budget: `reconnect_max_duration_millis`** — per-outage time cap (resets on each successful reconnect). Once total elapsed time since the first failure of *this* outage exceeds the cap, the I/O loop gives up. - - **Default 300_000 ms (5 min).** Long enough to ride out most server restarts and brief outages where the cause needs investigation; short enough that a permanently-gone server surfaces within minutes. -- **Auth failure on reconnect (401, 403, non-101 upgrade reject) is terminal** — don't burn the retry budget on errors that won't fix themselves. -- On successful reconnect: I/O loop restarts `nextWireSeq=0`, sets `fsnAtZero = engine.ackedFsn() + 1`, walks segments forward from there, and replays. Producer thread is signaled (volatile counter bump) so the next encoded batch carries full schema definitions instead of refs. -- On budget exhaustion: connection error recorded → next user-thread API call throws. - -### Initial connect -- **Default: terminal.** Initial-connect failures (DNS, refused, bad auth, version mismatch) usually mean misconfig; throw immediately so the user sees the error, not a 5-minute hang. 
-- **Opt-in: `initial_connect_retry=true`** uses the same backoff + `reconnect_max_duration_millis` cap as reconnect. Useful for "publisher comes up before server" scenarios (k8s ordering, dev environments). - -### Logging cadence -- WARN at first failure of an outage: `"disconnected from , reconnecting"`. -- WARN throttled to once per `BACKPRESSURE_LOG_THROTTLE_NANOS` (5 s) during the retry storm — not one per backoff sleep, otherwise a 5-min outage at 100 ms backoff = 3000 lines. -- INFO on each successful reconnect: `"reconnected to after , attempts"`. -- ERROR on budget exhaustion: `"giving up reconnecting to after , attempts"`. - -## Backpressure semantics -- Engine cap full → `appendBlocking` spins for `sf_append_deadline_millis` (default 30 s) → throws. -- Error message must distinguish: - - `"backpressured for Xms — wire path is not draining (server slow?)"` (engine published, but server hasn't ACKed) - - `"backpressured for Xms — Y reconnect attempts in progress (server unreachable since Z)"` (the I/O loop is in retry-backoff) - -## Schema state on reconnect -- Single volatile counter, single writer (I/O thread), shared across two roles: - ```java - private volatile long connectionGeneration; // bumped by I/O loop on every successful reconnect AND on initial recovery from disk - ``` -- Producer's `flushPendingRows` does: - ```java - int retries = 0; - while (true) { - long genBefore = connectionGeneration; - if (genBefore != lastSeenGeneration) { - resetSchemaStateForNewConnection(); - lastSeenGeneration = genBefore; - } - encoder.beginMessage(...); /* encode all tables */ - int messageSize = encoder.finishMessage(); - if (connectionGeneration == genBefore) break; // common case - if (++retries >= MAX_SCHEMA_RACE_RETRIES /* =10 */) throw new LineSenderException("schema-reset race exceeded retry limit"); - // gen advanced mid-encode → bytes are poisoned, discard + loop. - // Table buffers are NOT reset until after this loop, so source rows are intact. 
- } - ``` -- **On initial open with on-disk recovery** (SF mode, non-empty slot): `connectionGeneration` starts at 1, not 0. Recovered FSNs were never seen by *this* server connection, so the first batch must publish full schemas. - -## Slot directory model - -**`sf_dir` is a parent (group root)**, not a slot. The actual slot is `//`. - -### Identity -- **`sender_id` defaults to `"default"`.** Single-sender users get zero-config: their slot is `/default/`. -- **Multi-sender users must set `sender_id` explicitly.** Two senders trying to use the default name will collide on the lock — surfaced loudly as `"sf slot already in use by PID X"`. -- The slot dir holds segments + `.lock` (advisory exclusive `FileChannel.tryLock`). -- Lock released on `engine.close()` or OS-level process exit (kernel releases `fcntl`/`LockFileEx` locks automatically on crash). - -### Foreground sender -- Locks `//.lock`. -- Recovers segments via `SegmentRing.openExisting`. Recovery is per-slot, in baseSeq order — preserves publishing order trivially. -- Seeds `SegmentManager.fileGeneration` to `max(existing sf-.sfa hex) + 1` to avoid filename collisions with recovered files. - -### Background drainers (orphan adoption) -- **Opt-in: `drain_orphans=true`** (default false). -- At foreground sender startup, scan `/*/` for sibling slots that are (a) unlocked and (b) contain unacked segments. -- For each orphan, spawn a background drainer: - - Locks the orphan's `.lock` - - Opens its own `WebSocketClient` (separate connection from the foreground sender) - - Recovers segments, drains them in baseSeq order - - Releases lock and exits when the slot is fully ACK'd and empty -- **Drain-only**: no user appends, no public API for writing. -- **Cap concurrent drainers: `max_background_drainers=4`** (default). Excess orphans are queued and started as earlier drainers finish. 
-- **Drain failure policy**: drainer's reconnect cap exhausts, or auth fails, or segments are corrupt → drainer drops a `.failed` sentinel in the slot, releases the lock, exits. Future foreground startups skip slots with `.failed` until the user clears the sentinel manually. Bounded automatic retry, then human-in-the-loop. -- **No automatic cleanup of empty slot dirs.** Goal is data preservation; only ACK'd data is trimmed (within a slot, by the segment manager). Empty slot dirs are cheap and stay forever unless the user removes them. - -### Visibility -- Three WS-only counter accessors on `QwpWebSocketSender`: - - `getActiveBackgroundDrainers()` — current count of running drainers - - `getTotalBackgroundDrainersSucceeded()` — cumulative since startup - - `getTotalBackgroundDrainersFailed()` — cumulative since startup -- Per-drainer event observation goes through the existing - `BackgroundDrainerListener` callback. The pool's `.failed` sentinels - on disk remain the canonical record of giveup events; the three - counters are for dashboards and post-startup health checks. - -### Per-sender threading cost -- Each engine (foreground + each background drainer) has its own `SegmentManager`. That's 1 manager thread + 1 I/O thread per engine. With `max_background_drainers=4`, worst case is 1 (foreground) + 4 (drainers) = 5 engines = 10 threads + 5 sockets per `Sender.fromConfig` call. Acceptable for typical deployments; users with hundreds of senders per JVM should set `max_background_drainers` low. 
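The worst-case count in the threading-cost note above is plain arithmetic, and a small sketch makes it easy to redo for other `max_background_drainers` values. `SenderThreadCost` and its method names are hypothetical and not part of the client; the two-threads-per-engine and one-socket-per-engine figures are taken from the bullet above.

```java
// Hypothetical helper, not part of the client: restates the worst-case
// arithmetic from the "Per-sender threading cost" note.
public final class SenderThreadCost {
    // Per the spec text: 1 SegmentManager thread + 1 I/O thread per engine.
    static final int THREADS_PER_ENGINE = 2;

    // One foreground engine plus one engine per running background drainer.
    static int engines(int maxBackgroundDrainers) {
        return 1 + maxBackgroundDrainers;
    }

    static int threads(int maxBackgroundDrainers) {
        return engines(maxBackgroundDrainers) * THREADS_PER_ENGINE;
    }

    // Each engine opens its own WebSocket connection.
    static int sockets(int maxBackgroundDrainers) {
        return engines(maxBackgroundDrainers);
    }

    public static void main(String[] args) {
        int drainers = 4; // documented default for max_background_drainers
        System.out.println(engines(drainers) + " engines, "
                + threads(drainers) + " threads, "
                + sockets(drainers) + " sockets");
        // prints "5 engines, 10 threads, 5 sockets"
    }
}
```

Under the same assumptions, a JVM hosting 100 such senders at the default setting could reach 1,000 threads and 500 sockets in the worst case, which is why the note suggests lowering `max_background_drainers` for that kind of deployment.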
- -## Configuration knobs (connect string) -| Key | Default | Mode | Status | -|---|---|---|---| -| `sf_dir` | unset | both | existing (semantics: now a parent dir) | -| `sender_id` | `"default"` | SF | **NEW** | -| `sf_max_bytes` | 4 MiB | both | existing | -| `sf_max_total_bytes` | 128 MiB / 10 GiB | both | existing | -| `sf_durability` | `memory` | SF | existing (`flush`/`append` reserved) | -| `sf_append_deadline_millis` | 30000 | both | **NEW** (currently a constant) | -| `reconnect_max_duration_millis` | 300000 | both | **NEW** | -| `reconnect_initial_backoff_millis` | 100 | both | **NEW** | -| `max_backoff_millis` | already exists | both | reuse existing | -| `initial_connect_retry` | `false` | both | **NEW** | -| `close_flush_timeout_millis` | 5000 (0/-1 = fast close) | both | **NEW** | -| `drain_orphans` | `false` | SF | **NEW** | -| `max_background_drainers` | 4 | SF | **NEW** | - -Each new knob also gets a `LineSenderBuilder` setter. - -## Counter accessors (WS-only, on QwpWebSocketSender) -- `getTotalBackpressureStalls()` -- `getTotalReconnectAttempts()` -- `getTotalReconnectsSucceeded()` -- `getTotalFramesReplayed()` -- `getActiveBackgroundDrainers()` -- `getTotalBackgroundDrainersSucceeded()` -- `getTotalBackgroundDrainersFailed()` - -## Stated assumptions (server contract) -- Server **dedups** replayed batches by `messageSequence`. Replay-after-reconnect produces duplicates; without server-side dedup, every reconnect = double-write. Legacy code already relied on this; the new design continues to. -- Server's dedup window must be ≥ a sender's `sf_max_total_bytes` worth of FSNs (else replay = double-write under sustained outage + full cap). -- Coordination/testing of the recovery + dedup contract is **outside this repo's scope**. - -## Self-sufficient frames (locked 2026-04-27) -Every frame written through the cursor SF path **must carry its full schema definition and the complete symbol-dictionary delta from id 0**. 
No schema-by-id refs, no incremental delta-dicts. The bytes survive process restart and replay against fresh server connections (post-reconnect, post-restart, drainer adopting an orphan slot) — frames with refs to IDs the new server has never seen are unrecoverable. Costs more bytes per batch; pays for replay correctness across every recovery path. Producer-side `maxSentSchemaId` / `maxSentSymbolId` retention is treated as a no-op for the cursor path; the encode call always passes `confirmedMaxId=-1` and `useSchemaRef=false`. - -## Decisions locked -1. ✅ flush() never waits for ACK (ACKs are async). -2. ✅ Reconnect cap is per-outage time-based, default 300s. -3. ✅ close() drains by default with 5s timeout; `close_flush_timeout_millis=0|-1` opts out for fast close. -4. ✅ Schema-reset is also fired on disk recovery (recovered state == post-reconnect state). -5. ✅ Encode-mid-reconnect race closed via single volatile `connectionGeneration` counter + retry loop in `flushPendingRows`. -6. ✅ Slot dir model: `sf_dir` is parent; per-sender slots `//`; default `sender_id="default"`. -7. ✅ Orphan adoption is opt-in (`drain_orphans=true`); foreground sender spawns background drainers per orphan, capped at `max_background_drainers`. -8. ✅ Drain failure → `.failed` sentinel; bounded retry + human-in-the-loop. -9. ✅ Initial connect terminal by default; opt-in retry via `initial_connect_retry=true`. -10. ✅ Auth failures (401/403/non-101) terminal even on reconnect. -11. ✅ Logging: WARN on outage entry/exit-attempt, INFO on reconnect success, ERROR on budget exhaustion; throttled. -12. ✅ Counters and orphan-drainer visibility on `QwpWebSocketSender` (WS-only). -13. ✅ No automatic cleanup of empty slot dirs — preserve goal of data-loss reduction. -14. ✅ Frames on disk are self-sufficient — every frame carries its full schema + full symbol-dict delta from id 0; refs forbidden. - -## Open -None. Ready to implement. 
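As an illustration of the reconnect policy specified earlier in this document (exponential backoff with jitter, bounded by a per-outage `reconnect_max_duration_millis` budget), here is a minimal sketch. It is not the `CursorWebSocketSendLoop` code: the class and method names are hypothetical, the "half fixed, half random" jitter scheme is an assumption (the spec only says "with jitter"), and the 5,000 ms backoff cap in `main` is an arbitrary stand-in for "cap as configured".

```java
import java.util.Random;

// Hypothetical sketch of the per-outage reconnect budget; not the shipped loop.
public final class ReconnectBackoffSketch {

    // Delay before the given retry attempt (0-based): exponential growth from
    // initialMillis, capped at maxMillis, then jittered into [capped/2, capped).
    static long backoffMillis(int attempt, long initialMillis, long maxMillis, Random rnd) {
        long uncapped = initialMillis << Math.min(attempt, 20); // bounded shift avoids overflow for small initial values
        long capped = Math.min(uncapped, maxMillis);
        long half = capped / 2;
        return half + (long) (rnd.nextDouble() * (capped - half));
    }

    // The budget is per outage: elapsed time is measured from the first failure
    // of the current outage and resets after a successful reconnect.
    static boolean budgetExhausted(long outageStartMillis, long nowMillis, long maxDurationMillis) {
        return nowMillis - outageStartMillis > maxDurationMillis;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        long maxDuration = 300_000; // documented default, 5 minutes
        long elapsed = 0;           // virtual clock: every attempt fails instantly
        int attempt = 0;
        while (!budgetExhausted(0, elapsed, maxDuration)) {
            elapsed += backoffMillis(attempt++, 100, 5_000, rnd);
        }
        System.out.println("gave up after " + attempt + " attempts, "
                + elapsed + " ms of simulated backoff");
    }
}
```

The sketch leaves out the parts of the policy that do not depend on timing: auth failures stay terminal regardless of remaining budget, and a successful reconnect would reset the outage start.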
diff --git a/design/qwp-cursor-error-api-todo.md b/design/qwp-cursor-error-api-todo.md deleted file mode 100644 index 82e42f4c..00000000 --- a/design/qwp-cursor-error-api-todo.md +++ /dev/null @@ -1,234 +0,0 @@ -# Cursor SF — server error API: implementation plan - -Branch: `vi_sf` (continues off the cursor SF work). -Spec: `design/qwp-cursor-error-api.md` (decisions 1–14 locked). -Depends on: `qwp-cursor-durability.md` (the SF substrate this builds on). - -## Shipped on `vi_sf` - -| Step | Status | Notes | -|---|---|---| -| 1. Public types | ✅ | `SenderError`, `SenderErrorHandler`, `LineSenderServerException` (all in `io.questdb.client`); 11 unit tests in `SenderErrorTest`. | -| 2. Typed terminal-error stash | ✅ | Sibling `volatile SenderError lastTerminalServerError` on `CursorWebSocketSendLoop`; `recordFatal(Throwable, SenderError)` overload; `getLastTerminalServerError()` on the loop, `getLastTerminalError()` on `QwpWebSocketSender`. | -| 3. Wire-byte classification + DROP/HALT branches | ✅ | `classify()`, `defaultPolicyFor()`, `handleServerRejection()` in `CursorWebSocketSendLoop`; HALT routes through typed `LineSenderServerException`, DROP advances `engine.acknowledge` and continues. 12 tests in `CursorWebSocketSendLoopErrorClassificationTest`. | -| 4. WS close-frame routing | ✅ | `isTerminalCloseCode()` splits PROTOCOL_ERROR/UNSUPPORTED_DATA/INVALID_PAYLOAD_DATA/POLICY_VIOLATION/MESSAGE_TOO_BIG/MANDATORY_EXTENSION as terminal `PROTOCOL_VIOLATION`; reconnect-eligible codes preserve existing `fail()` retry. Auth-terminal upgrade and reconnect-budget exhaustion now stash typed `SenderError` payloads. | -| 5. Bounded inbox + dispatcher daemon | ✅ | `SenderErrorDispatcher` (lazy-start daemon, bounded `ArrayBlockingQueue`, idempotent close, drained handler exceptions). 11 tests in `SenderErrorDispatcherTest`. | -| 6. Default error handler | ✅ | `DefaultSenderErrorHandler.INSTANCE` — ERROR for HALT, WARN for DROP, full structured payload in the log line. | -| 7. 
Builder + connect-string knobs | ✅ (partial) | Builder: `errorHandler(SenderErrorHandler)`, `errorInboxCapacity(int)` — both gated to WebSocket. Connect string: `error_inbox_capacity=N`. **Per-category policy override (`errorPolicy(Category, Policy)`, `errorPolicyResolver(...)`, `on_*_error` keys) deferred — see § Deferred follow-ups.** 9 tests in `SenderBuilderErrorApiTest`. | -| 8. New `Sender` API | ✅ (partial) | `flushAndGetSequence(): long`, `getLastTerminalError()`, `getTotalServerErrors()`, `getDroppedErrorNotifications()`, `getTotalErrorNotificationsDelivered()`. **`resumeAfterHalt()` deferred** — the I/O loop is one-shot today; restart primitive is non-trivial. Workaround: close + rebuild the sender. | -| 9. End-to-end per-category integration tests | ⏭️ deferred | Lands in the `questdb` repo (`TestWebSocketServer` doesn't parse QWP wire format, so it cannot be scripted to emit category-specific frames in this repo without significant fixture work). | -| 10. `tableName` wiring | ✅ | Best-effort: populates `tableName` from `response.tableNames` when single-table; null otherwise. Today the response parser does not populate `tableNames` on error frames (only on STATUS_OK), so `tableName` is null on error frames until both client parser and server are extended. The wiring is forward-compatible. | -| 11. Docs | this doc | Spec + this implementation log. README/javadoc updates pending. | - -Test totals on `vi_sf`: 154 non-mmap tests pass on linux x86_64. (`Files.mmap0` UnsatisfiedLinkError on linux — pre-existing, repo only ships macOS-aarch64 native lib. The mmap-dependent tests will run green on macOS / when the linux native lib is added.) - -## Deferred follow-ups (not blocking) - -1. **Per-category policy override** (`errorPolicy(Category, Policy)` + `errorPolicyResolver(...)`). Spec § "User overrides — one knob, two grains" describes the resolver composition (programmatic resolver > per-category map > global default). 
Today every category uses `defaultPolicyFor` baked into the loop. The most-asked variant — strict-mode `on_server_error=halt` — needs the connect-string parser side too. Moderate-sized addition; fits in a focused commit. -2. **`resumeAfterHalt()` escape hatch.** The cursor I/O loop today is one-shot (`running` is volatile boolean, no restart primitive). To resume, the loop needs: clear `lastError` / `lastTerminalServerError`, reopen the wire client via the reconnect factory, restart the thread. Today's workaround: close + rebuild the sender; SF data on disk survives. Document that. -3. **End-to-end integration tests in the `questdb` repo.** Use a real `ServerMain` to drive each `STATUS_*` byte against this client, asserting category, policy, FSN span, callback delivery, and producer-thread typed throw. -4. **Server-side gaps tracked in the spec § "Server-side follow-ups"**: split `0x06`/`0x09` for retry semantics, add retryable bit, per-table attribution. Each unblocks a corresponding client follow-up — e.g. retryable bit unblocks `RETRY_TRANSIENT` policy and full strict-ETL semantics. -5. **README + public Javadoc.** Document the new connect-string keys, builder methods, and accessor surface. The spec is locked but user-facing docs aren't yet. - -## Context - -The cursor SF send loop today (`CursorWebSocketSendLoop.ResponseHandler.onBinaryMessage`, line 712 onward) classifies inbound frames as `STATUS_OK` (advance ackedFsn) vs everything-else (always terminal via `recordFatal`). The "everything-else" branch is what we're refining: classify by status byte → category, resolve policy, surface to user via callback (async) and / or typed exception (next API call). - -Wire codes already exist (`WebSocketResponse.java:74-83`, `WebSocketResponse.getStatusName()`). Nothing new on the wire. - -## Discrete deliverables - -### 1. Public API surfaces (do first, in isolation) -New types in `core/src/main/java/io/questdb/client/`: -- `SenderError.java` — immutable, public. 
Fields per spec § "SenderError". Include `Category` and `Policy` as nested public enums. -- `SenderErrorHandler.java` — `@FunctionalInterface` with `void onError(SenderError)`. -- `LineSenderServerException.java` — `extends LineSenderException`. Single field `SenderError serverError`; `getServerError()` accessor; `getMessage()` synthesizes from category + FSN span + serverMessage. - -These are leaf types — write them and their unit tests first; nothing else depends on internals. - -### 2. Typed terminal-error stash on the I/O loop -**Note:** the `connectionGeneration` field described in `qwp-cursor-durability.md` is an idealization — it didn't ship. The actual code already has the producer-side latch infrastructure: -- `CursorWebSocketSendLoop.lastError` (`volatile Throwable`, line 122) — terminal error, set by `recordFatal(...)`. -- `QwpWebSocketSender.connectionError` (`AtomicReference`, line 119) — connection-level latch. -- `QwpWebSocketSender.checkConnectionError()` (line 1417) polls both on every public API entry. - -So the cache-line / `@Contended` extraction is unnecessary — the volatile that the producer thread already reads on every API call is the latch we need. What's left: - -- Add `private volatile SenderError lastTerminalServerError` on `CursorWebSocketSendLoop`, sibling to `lastError`. Null in steady state. -- Overload `recordFatal(Throwable t)` → `recordFatal(Throwable t, SenderError serverError)`. Existing callers (wire-level failures) call the original signature with implicit `null`. Server-rejection callers (deliverable #3) pass the `SenderError`. Idempotent — only the first failure wins. -- Add `public SenderError getLastTerminalServerError()` accessor on the loop. -- Add `public SenderError getLastTerminalError()` on `QwpWebSocketSender`, delegating to the loop (with the standard `cursorSendLoop == null ? null` guard used by other accessors). - -That's the whole change for #2. 
The producer-thread typed throw lands automatically once #3 starts stuffing `LineSenderServerException` (which extends `LineSenderException`) into `lastError` — `checkError()` already throws whatever `lastError` is; user code can `instanceof LineSenderServerException` to unpack the typed payload. - -### 3. Error frame classification (`CursorWebSocketSendLoop.ResponseHandler.onBinaryMessage`) -Replace the current `else` branch (lines ~734-751) with classification: -```java -SenderError.Category category = classify(response.getStatus()); // wire byte → enum -SenderError.Policy policy = policyResolver.resolve(category); // user override > per-cat > default -String tableName = response.getTableEntryCount() == 1 - ? response.getTableName(0) - : null; -long fromFsn = fsnAtZero + Math.max(0, response.getSequence()); // single-frame span today -long toFsn = fromFsn; -SenderError err = new SenderError(category, policy, response.getStatus(), - response.getErrorMessage(), response.getSequence(), - fromFsn, toFsn, tableName, System.nanoTime()); -totalServerErrors.incrementAndGet(); -lastTerminalError = (policy == HALT) ? err : lastTerminalError; - -if (policy == HALT) { - signal.terminalError = err; // memory-ordered write before inbox offer - errorInbox.offer(err); // non-blocking; drop+count if full - recordFatal(new LineSenderServerException(err)); // breaks the loop; existing path -} else { // DROP_AND_CONTINUE - errorInbox.offer(err); - engine.acknowledge(fromFsn); // advance past the rejected span - totalAcks.incrementAndGet(); // for parity with success path counters -} -``` -- Keep the success path untouched. -- Verify `WebSocketResponse` already exposes the error message after parsing a non-OK status (the `errorMessage` field is read by `getErrorMessage()` — confirm parser populates it on the error path). -- `STATUS_DURABLE_ACK` (0x02) handling stays as-is; it is not an error. 
- -Helper: -```java -private static SenderError.Category classify(byte status) { - switch (status) { - case STATUS_SCHEMA_MISMATCH: return Category.SCHEMA_MISMATCH; - case STATUS_PARSE_ERROR: return Category.PARSE_ERROR; - case STATUS_INTERNAL_ERROR: return Category.INTERNAL_ERROR; - case STATUS_SECURITY_ERROR: return Category.SECURITY_ERROR; - case STATUS_WRITE_ERROR: return Category.WRITE_ERROR; - default: return Category.UNKNOWN; - } -} -``` - -### 4. WS close-frame routing -`ResponseHandler.onClose(int code, String reason)` (line 708) currently builds a `LineSenderException` directly and calls `fail(...)` → reconnect. Two cases: -- **Reconnect-eligible close** (server idle close, network blip): keep existing behavior — `fail(...)` enters reconnect loop. -- **Terminal close** (PROTOCOL_ERROR 1002, UNSUPPORTED_DATA 1003, MESSAGE_TOO_BIG 1009, policy violation 1008, custom server reason that asserts terminal): build a `SenderError(category=PROTOCOL_VIOLATION, status=-1, seq=-1, message="ws-close[]: " + reason, fsn=ackedFsn+1..publishedFsn, tableName=null, policy=HALT)`, write `signal.terminalError`, inbox, then `recordFatal`. - -Decision boundary between the two: the existing reconnect logic already differentiates terminal codes (see auth-terminal handling in commit `8828038`). Mirror that taxonomy here — anything currently treated as terminal becomes a `PROTOCOL_VIOLATION` with the same FSN span. - -### 5. Bounded inbox + dispatcher daemon -- Implement as `ArrayBlockingQueue` for v1 (single producer = I/O thread; single consumer = dispatcher; capacity from builder). Project idiom prefers `QwpSpscQueue` — use it if a generic version exists, else `ArrayBlockingQueue` is fine for the off-hot-path side channel. -- Dispatcher thread: lazy-start on first `inbox.offer` success. Daemon, named `qwp-error-dispatcher-`. Loop: `take()` → `try { handler.onError(err); } catch (Throwable t) { LOG.error(...); }`. 
Stops when `engine.close()` interrupts it; drains remaining queue entries on stop with a short deadline (~100ms) before giving up. -- Overflow handling on `offer`: returns false; I/O thread bumps `droppedErrorNotifications` and continues. Never block. - -### 6. Default error handler -```java -class DefaultErrorHandler implements SenderErrorHandler { - public void onError(SenderError e) { - LogRecord r = (e.appliedPolicy == HALT) ? LOG.error() : LOG.advisory(); - r.$("server error: ").$(e.category) - .$(" status=0x").$hex(e.serverStatusByte) - .$(" fsn=[").$(e.fromFsn).$(',').$(e.toFsn).$(']') - .$(" table=").$(e.tableName != null ? e.tableName : "(multi)") - .$(" msg=").$(e.serverMessage) - .$(); - } -} -``` -Wire as the default if the user does not call `errorHandler(...)` on the builder. Match the project's logging idioms (use `LogFactory.getLog`, etc). - -### 7. Builder + connect-string knobs -- `LineSenderBuilder.errorHandler(SenderErrorHandler)`, `errorPolicy(Category, Policy)`, `errorPolicyResolver(...)`, `errorInboxCapacity(int)`. -- Connect-string parser additions in `Sender.fromConfig` / `LineSenderBuilder.fromConfig`: - - `on_server_error` (auto/halt/drop) - - `on_schema_error`, `on_parse_error`, `on_internal_error`, `on_security_error`, `on_write_error` (halt/drop) - - `error_inbox_capacity` (int) -- Internal `PolicyResolver`: composes user resolver (highest) → per-category map → global → per-spec defaults. Single method `Policy resolve(Category)`. - -### 8. New public API methods on `Sender` / `QwpWebSocketSender` -- `Sender.flushAndGetSequence(): long` — returns `engine.publishedFsn()` after the publish, before returning. The existing `flush()` keeps `void` return — call the new method internally or have `flush()` discard the return. -- `Sender.resumeAfterHalt()` — only meaningful on QWP WS sender; default impl on `Sender` interface throws `UnsupportedOperationException("only WS senders support resumeAfterHalt")`. 
Implementation: - ```java - signal.terminalError = null; - loop.requestReconnect(); // existing primitive used by reconnect path - LOG.warn("resumeAfterHalt: clearing terminal error and restarting I/O loop"); - ``` -- WS-only accessors on `QwpWebSocketSender`: `getTotalServerErrors()`, `getDroppedErrorNotifications()`, `getLastTerminalError()`. Match the existing accessor style (see § "Counter accessors" in `qwp-cursor-durability.md`). - -### 9. Tests (mirror existing `io.questdb.client.test.cutlass.qwp.client.**` layout) - -Per category: -- `ServerErrorSchemaMismatchTest` — `TestWebSocketServer` is augmented to send a `STATUS_SCHEMA_MISMATCH` frame; assert callback fires, FSN span correct, ackedFsn advances (DROP), `flush()` does NOT throw, error counter increments. -- `ServerErrorParseErrorTest` — same with `STATUS_PARSE_ERROR`; assert HALT, terminal latched, next `flush()` throws `LineSenderServerException` with correct `getServerError()`. -- `ServerErrorInternalErrorTest`, `ServerErrorSecurityErrorTest`, `ServerErrorWriteErrorTest` — similar. -- `ServerErrorUnknownStatusTest` — server sends 0xFF; assert `Category.UNKNOWN` + HALT. -- `ServerErrorWsCloseTest` — server sends WS close 1002; assert `Category.PROTOCOL_VIOLATION`, FSN span = unacked window. - -Behavioral: -- `ErrorPolicyOverrideTest` — connect string `on_schema_error=halt` flips SCHEMA_MISMATCH default; assert HALT. -- `ErrorPolicyResolverTest` — programmatic resolver returns DROP for everything; assert no terminal latch even on PARSE_ERROR. -- `ErrorInboxOverflowTest` — slow handler + flood of errors; assert `droppedErrorNotifications > 0`, no I/O thread stall. -- `ResumeAfterHaltTest` — induce HALT, call `resumeAfterHalt()`, send fresh batch, assert it lands. -- `FlushAndGetSequenceTest` — assert returned FSN matches the FSN span surfaced in a synthesized rejection. 
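The overflow rule that `ErrorInboxOverflowTest` is meant to pin down (a full inbox makes `offer` return false, the I/O thread bumps a drop counter and never blocks) can be sketched as follows. This is a minimal illustration built on `java.util.concurrent.ArrayBlockingQueue`; the class `InboxOverflowSketch`, its methods, and the `String` payload standing in for `SenderError` are assumptions, not the shipped `SenderErrorDispatcher`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the bounded error inbox; String stands in for SenderError.
public final class InboxOverflowSketch {
    private final ArrayBlockingQueue<String> inbox;
    private final AtomicLong droppedErrorNotifications = new AtomicLong();

    InboxOverflowSketch(int capacity) {
        this.inbox = new ArrayBlockingQueue<>(capacity);
    }

    // I/O-thread side: offer() returns immediately, so a slow handler can
    // never stall the send loop; overflow is counted instead.
    boolean publish(String error) {
        if (inbox.offer(error)) {
            return true;
        }
        droppedErrorNotifications.incrementAndGet();
        return false;
    }

    long dropped() {
        return droppedErrorNotifications.get();
    }

    int queued() {
        return inbox.size();
    }

    public static void main(String[] args) {
        InboxOverflowSketch sketch = new InboxOverflowSketch(2);
        for (int i = 0; i < 5; i++) {
            sketch.publish("err-" + i); // no consumer is running, so only 2 fit
        }
        System.out.println("queued=" + sketch.queued() + " dropped=" + sketch.dropped());
        // prints "queued=2 dropped=3"
    }
}
```

A real test would add a deliberately slow handler on the consumer side and assert that the drop counter grows while the publishing thread keeps making progress.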
- -Hot-path: -- `ErrorPathHotPathBenchmark` (JMH, sibling of `QwpIngressLatencyBenchmark`) — measure per-batch publish latency with no errors before/after the change. Target: zero measurable regression. - -Concurrency: -- `ErrorRaceTest` — fire HALT and a producer `flush()` simultaneously, repeat 10k times, assert: producer always sees the latch, never observes "callback fired but flush passed" or vice versa. - -### 10. Wire `SenderError.tableName` from existing response state -`WebSocketResponse` already carries `tableNames` (list, see line 224 area). When the response has exactly 1 entry, we have a single-table batch; pass it as `tableName`. Multi-entry → null per spec. Verify the parser populates `tableNames` even on error frames (it might only populate on `STATUS_OK` today — if so, that's a server-side gap and `tableName` will always be null on the error path until both sides extend it). - -### 11. README / public-API docs -- Connect-string reference table needs the new keys. -- New `LineSenderBuilder` setters documented. -- Worked example in javadoc of `SenderErrorHandler`: dead-letter to file from an error callback. - -## Order of work - -Recommended sequence (each step compiles + tests pass independently): - -1. Public types (#1) — pure leaves, no risk. -2. ProducerSignal refactor (#2) — internal, behavior-preserving. -3. Default handler + dispatcher + inbox (#5, #6) — wire as plumbing; not yet hooked. -4. Classification + DROP/HALT branches in `ResponseHandler.onBinaryMessage` (#3) — flips behavior. -5. WS close routing (#4). -6. Builder + connect-string knobs (#7). -7. Public methods on `Sender` (#8). -8. Tests (#9), per category as you implement. -9. `tableName` wiring (#10) — last, depends on parser audit. -10. Docs (#11). 
- -## How to run things - -```bash -# QWP-only suite (fast, ~30s) -mvn -pl core test -Dtest='io.questdb.client.test.cutlass.qwp.client.**' - -# Single test -mvn -pl core test -Dtest=ServerErrorSchemaMismatchTest - -# Full core suite (run before merge) -mvn -pl core test - -# Hot-path benchmark -mvn -pl core test -Dtest=ErrorPathHotPathBenchmark -``` - -## Files to know - -Existing: -- `core/src/main/java/io/questdb/client/cutlass/qwp/client/WebSocketResponse.java` — status-byte constants, error frame parser (`readFrom`, `getStatusName`, `getErrorMessage`, `getSequence`). -- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java` — I/O thread, ResponseHandler at line 706, current terminal-on-error path at line 734. -- `core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java` — the Sender impl. Holds `connectionGeneration`, `flushPendingRows` is the producer entry point. -- `core/src/main/java/io/questdb/client/Sender.java` — top-level interface + `LineSenderBuilder` + connect-string parser. -- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java` — `engine.acknowledge(fsn)` is the trim hook used by DROP path. - -New (per #1): -- `core/src/main/java/io/questdb/client/SenderError.java` -- `core/src/main/java/io/questdb/client/SenderErrorHandler.java` -- `core/src/main/java/io/questdb/client/LineSenderServerException.java` - -## Notes on the testing environment - -`TestWebSocketServer` (in-process, hand-rolled) does NOT parse QWP wire format — it sees opaque binary frames. To test server error frames we need to extend it with a small "responder" hook: `setNextResponse(byte status, long seq, String msg)` that builds a synthetic error frame and sends it on the next inbound batch. Match the binary layout from `WebSocketResponse.readFrom` (line 256 onward). One such helper covers all category tests. - -## Open -None. Ready to implement step 1. 
diff --git a/design/qwp-cursor-error-api.md b/design/qwp-cursor-error-api.md deleted file mode 100644 index eae99bc0..00000000 --- a/design/qwp-cursor-error-api.md +++ /dev/null @@ -1,220 +0,0 @@ -# QWP cursor SF — server error API spec - -Status: **draft v1**, follow-on to `qwp-cursor-durability.md`. Targets branch `vi_sf`. - -## Goals -- **Surface server-side rejections** (schema mismatch, parse, security, write, internal) to user code without compromising the async `flush()` contract. -- **Match the wire**: client categories align 1:1 with the stable status bytes already shipped by the server (`WebSocketResponse` + `QwpProcessorState` mapping). No client-side category the wire can't actually distinguish. -- **Zero hot-path cost** in the no-error case. One volatile load per batch boundary, no allocations, no locks. -- **Two surfacing paths**: builder-registered `errorHandler` for async dead-lettering, typed exception on next API call for connect-string-only users. Both deliver the same `SenderError` payload. -- **Loud defaults** — silence is forbidden. The default handler logs ERROR for HALT and WARN for DROP, with category + FSN span + table. - -## Non-goals (this spec) -- Retryable / transient distinction. Server does not ship a retryable bit today; everything potentially transient is folded into `STATUS_INTERNAL_ERROR (0x06)` / `STATUS_WRITE_ERROR (0x09)`. The `RETRY_TRANSIENT` policy is reserved but not implemented; revisit when the server splits codes. -- Per-table attribution in multi-table batches. Server NACKs the whole batch atomically; `tableName` is best-effort and may be null. -- Per-row attribution (which row in the batch was bad). Out of scope until the wire format grows a row index field. 
- -## Wire anchor (server-side, already shipped) -Server error frame layout (binary, **not** a WS close frame): -``` -1 byte status -8 byte messageSequence (LE) — server's per-frame counter, mirrored back -2 byte message length (LE) -≤1024 byte UTF-8 message -``` -Source: `QwpWebSocketUpgradeProcessor.java:895-956` (server repo). - -Stable status bytes (`WebSocketResponse.java:74-83`, mirrored from server `QwpConstants.java:174-190`): - -| Code | Constant | Server triggers | -|---|---|---| -| 0x00 | `STATUS_OK` | accepted | -| 0x02 | `STATUS_DURABLE_ACK` | post-fsync ack (per-table) | -| 0x03 | `STATUS_SCHEMA_MISMATCH` | `QwpParseException.SCHEMA_MISMATCH` | -| 0x05 | `STATUS_PARSE_ERROR` | other `QwpParseException` | -| 0x06 | `STATUS_INTERNAL_ERROR` | `CairoException.isCritical()` + catch-all `Throwable` | -| 0x08 | `STATUS_SECURITY_ERROR` | `CairoException.isAuthorizationError()` | -| 0x09 | `STATUS_WRITE_ERROR` | non-critical Cairo errors / table not accepting writes | - -WS-level violations (fragmented binary, text frame, oversized payload, malformed header) come as **WebSocket close frames** with codes PROTOCOL_ERROR / UNSUPPORTED_DATA / MESSAGE_TOO_BIG, not QWP error frames. These need to be funnelled into the same surface. - -## Client `Category` enum - -```java -public enum Category { - SCHEMA_MISMATCH, // 0x03 - PARSE_ERROR, // 0x05 — QWP-level malformed payload (likely client bug) - INTERNAL_ERROR, // 0x06 — catch-all server fault; bundles resource/transient - SECURITY_ERROR, // 0x08 — auth / ACL - WRITE_ERROR, // 0x09 — table not accepting writes; bundles rate-limit-style - PROTOCOL_VIOLATION, // n/a — WS-level close frame - UNKNOWN // forward-compat for any new server status byte -} -``` - -Forward-compat: unknown bytes map to `UNKNOWN`, the raw byte is preserved on `SenderError.serverStatusByte` for debugging. 
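The status-byte table and the `Category` enum above imply a simple mapping, sketched below. This is a minimal illustration rather than the client's actual code: the wrapper class `StatusMappingSketch` and the helper name `fromStatusByte` are assumptions, and the sketch assumes the caller has already handled `STATUS_OK` (0x00) and `STATUS_DURABLE_ACK` (0x02) before classifying.

```java
// Sketch only. StatusMappingSketch and fromStatusByte are hypothetical names;
// the enum constants and byte values are taken from the spec tables above.
final class StatusMappingSketch {
    enum Category {
        SCHEMA_MISMATCH, PARSE_ERROR, INTERNAL_ERROR, SECURITY_ERROR,
        WRITE_ERROR, PROTOCOL_VIOLATION, UNKNOWN
    }

    /**
     * Maps a raw server status byte to a category.
     * Assumes 0x00 and 0x02 were filtered out by the caller; any byte that is
     * not a known error code (including those two) falls through to UNKNOWN.
     */
    static Category fromStatusByte(int status) {
        switch (status & 0xFF) { // mask so a signed byte such as (byte) 0xFF is read as 255
            case 0x03: return Category.SCHEMA_MISMATCH;
            case 0x05: return Category.PARSE_ERROR;
            case 0x06: return Category.INTERNAL_ERROR;
            case 0x08: return Category.SECURITY_ERROR;
            case 0x09: return Category.WRITE_ERROR;
            default:   return Category.UNKNOWN;
        }
    }
}
```

`PROTOCOL_VIOLATION` is never produced here because, per the spec, it comes from a WebSocket close frame rather than a status byte.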
- -## `Policy` enum - -```java -public enum Policy { - DROP_AND_CONTINUE, // ackedFsn advances past the bad span; loop keeps draining - HALT // terminalError latched; next producer API call throws -} -``` - -`RETRY_TRANSIENT` is **not** implemented — the wire has no retryable bit to drive it. The enum is binary today; expand later. - -## Default category → policy - -| Category | Default | Reasoning | -|---|---|---| -| SCHEMA_MISMATCH | DROP_AND_CONTINUE | Replay reproduces the same rejection; halting blocks unrelated tables on the same connection. | -| PARSE_ERROR | HALT | Almost certainly a client bug (we sent malformed bytes). Halt preserves the on-disk frames for postmortem. | -| INTERNAL_ERROR | HALT | Catch-all server fault; conservatively halt — could be transient, could be poison. Without a retryable bit we cannot tell. | -| SECURITY_ERROR | HALT | Misconfig; loud failure wanted. | -| WRITE_ERROR | DROP_AND_CONTINUE | "Non-critical Cairo errors / table not accepting writes" — per-batch in character. Halting blocks other tables. **Debatable; revisit once server splits 0x09 into transient vs permanent.** | -| PROTOCOL_VIOLATION | HALT (forced) | Connection is gone — no choice. | -| UNKNOWN | HALT | Never silently drop something we don't understand. | - -User overrides via builder (`errorPolicy(Category, Policy)` or full `errorPolicyResolver`) and via connect-string knobs (see below). 
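One way to read the default table together with the override rule is sketched below. The class `PolicyDefaultsSketch` and its method names are assumptions made for illustration, not the builder's real internals. The sketch also applies the rule stated in the configuration section that `PROTOCOL_VIOLATION` and `UNKNOWN` are always HALT.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch only. PolicyDefaultsSketch, override and resolve are hypothetical names.
final class PolicyDefaultsSketch {
    enum Category {
        SCHEMA_MISMATCH, PARSE_ERROR, INTERNAL_ERROR, SECURITY_ERROR,
        WRITE_ERROR, PROTOCOL_VIOLATION, UNKNOWN
    }

    enum Policy { DROP_AND_CONTINUE, HALT }

    private final Map<Category, Policy> overrides = new EnumMap<>(Category.class);

    /** Defaults from the table: only SCHEMA_MISMATCH and WRITE_ERROR drop. */
    static Policy defaultFor(Category category) {
        switch (category) {
            case SCHEMA_MISMATCH:
            case WRITE_ERROR:
                return Policy.DROP_AND_CONTINUE;
            default:
                return Policy.HALT;
        }
    }

    /** Records a per-category override, as errorPolicy(Category, Policy) would. */
    void override(Category category, Policy policy) {
        overrides.put(category, policy);
    }

    /** Forced-HALT categories ignore overrides; others use override, then default. */
    Policy resolve(Category category) {
        if (category == Category.PROTOCOL_VIOLATION || category == Category.UNKNOWN) {
            return Policy.HALT;
        }
        Policy chosen = overrides.get(category);
        return chosen != null ? chosen : defaultFor(category);
    }
}
```

For example, after `override(Category.SCHEMA_MISMATCH, Policy.HALT)` the resolver returns `HALT` for schema mismatches, while an override on `UNKNOWN` has no effect.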
- -## `SenderError` (public, immutable) - -```java -/** - * @param appliedPolicy what the loop actually did - * @param serverStatusByte raw byte (0x03/0x05/...); -1 for PROTOCOL_VIOLATION - * @param serverMessage ≤1024 UTF-8 from frame, or WS close reason - * @param messageSequence server's per-frame seq (mirrors what server logs); -1 for PROTOCOL_VIOLATION - * @param fromFsn client-side FSN span — load-bearing for correlation - * @param toFsn inclusive - * @param tableName best-effort; null if multi-table batch - * @param detectedAtNanos System.nanoTime() at I/O thread receipt */ -public record SenderError(Category category, Policy appliedPolicy, int serverStatusByte, String serverMessage, - long messageSequence, long fromFsn, long toFsn, String tableName, long detectedAtNanos) { - // accessors only; no mutation -} -``` - -**Load-bearing fields**: `[fromFsn, toFsn]` and `appliedPolicy`. The FSN span is what the user joins to their producer-side log to identify the rejected data. `appliedPolicy` tells the user whether the data was dropped (must dead-letter) or halted (will be re-thrown on the next call) or — when retry lands — observed only. - -`messageSequence` is preserved for cross-team debugging (server-side ops think in `messageSequence`). - -## Mechanism — surfacing paths - -### Path 1: async callback -- Builder-time `errorHandler(SenderErrorHandler)`. Default impl: ERROR log for HALT, WARN log for DROP, both with `category`, `[fromFsn, toFsn]`, `tableName`, `serverMessage`. Bumps a counter. -- I/O thread, on rejection frame, builds `SenderError` and calls `errorInbox.offer(err)` on a bounded SPSC queue. -- Bounded inbox: default cap 256. Overflow → drop the notification, bump `droppedErrorNotifications` counter, never block the I/O thread. -- Dispatcher daemon thread (`QwpSender-error-dispatcher-`, lazy-start on first error) does `take()` + invokes user handler; catches `Throwable` so a buggy handler can't poison the dispatcher. 
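The Path 1 bullets above can be sketched as a bounded inbox with a lazily started daemon dispatcher. This is an illustration under stated assumptions, not the planned implementation: `ErrorInboxSketch` and its method names are hypothetical, and `java.util.concurrent.ArrayBlockingQueue` stands in for the bounded SPSC queue the spec calls for (it uses an internal lock, which the real queue may avoid).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Sketch only. ErrorInboxSketch is a hypothetical name; ArrayBlockingQueue is a
// stand-in for the bounded SPSC queue described in the spec.
final class ErrorInboxSketch<E> {
    private final BlockingQueue<E> inbox;
    private final Consumer<E> handler;
    private final AtomicLong dropped = new AtomicLong();
    private Thread dispatcher; // touched only by the single publishing (I/O) thread

    ErrorInboxSketch(int capacity, Consumer<E> handler) {
        this.inbox = new ArrayBlockingQueue<>(capacity);
        this.handler = handler;
    }

    /** Called from the I/O thread. offer() never blocks; a full inbox drops and counts. */
    void publish(E error) {
        if (!inbox.offer(error)) {
            dropped.incrementAndGet();
            return;
        }
        if (dispatcher == null) { // lazy start on the first accepted error
            dispatcher = new Thread(this::drain, "error-dispatcher-sketch");
            dispatcher.setDaemon(true);
            dispatcher.start();
        }
    }

    long droppedCount() {
        return dropped.get();
    }

    private void drain() {
        while (true) {
            try {
                handler.accept(inbox.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (Throwable t) {
                // A buggy handler must not kill the dispatcher; swallow and keep draining.
            }
        }
    }
}
```

The I/O thread only ever calls `publish`, so user code runs exclusively on the dispatcher thread.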
- -### Path 2: producer-side typed throw -- Single volatile field on the existing producer-signal object (the one that already holds `connectionGeneration`): - ```java - @Contended - final class ProducerSignal { - volatile long connectionGeneration; // existing - volatile SenderError terminalError; // new - } - ``` -- I/O thread, on a HALT-policy error (or PROTOCOL_VIOLATION, or UNKNOWN), writes `signal.terminalError = err` **before** `errorInbox.offer(err)`. Ordering matters: producer must see the latch no later than the dispatcher delivers, otherwise a `flush()` post-callback could still pass. -- Producer: `flushPendingRows` reads `signal.terminalError` once at batch entry (same cache line as `connectionGeneration` — single load-acquire). If non-null, throws `LineSenderServerException` carrying the `SenderError`. - -### Producer hot path -- Per `at()` / `column*()`: zero change. -- Per batch boundary (`flush()` or implicit batch publish): one volatile load that piggybacks on the existing `connectionGeneration` read. Same cache line. In steady state the line stays in producer L1; the I/O thread does not write to it on the ACK path. - -### I/O thread allocation -- Per ACK (common case): zero change. -- Per rejection: one `SenderError`, one queue node. NACK rate is bounded by batch rate, not row rate, and is rare in steady state. Pooling not justified. 
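The ordering rule in Path 2 (latch first, notify second) can be sketched as follows. All names here are hypothetical: `TerminalLatchSketch`, `RejectedException` and the `Object` payload stand in for `ProducerSignal`, `LineSenderServerException` and `SenderError`, and a `ConcurrentLinkedQueue` stands in for the error inbox.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch only. Every name in this class is a hypothetical stand-in for the types in the spec.
final class TerminalLatchSketch {
    static final class RejectedException extends RuntimeException {
        final Object error;

        RejectedException(Object error) {
            super("server rejected batch: " + error);
            this.error = error;
        }
    }

    private volatile Object terminalError;                             // stand-in for signal.terminalError
    private final Queue<Object> inbox = new ConcurrentLinkedQueue<>(); // stand-in for errorInbox

    /** I/O thread, HALT path: write the latch before publishing to the inbox. */
    void onHalt(Object err) {
        terminalError = err; // 1. volatile write: visible to the producer from here on
        inbox.offer(err);    // 2. only then can the dispatcher deliver the callback
    }

    /** Producer, at batch entry: a single volatile load; throws once latched. */
    void checkAtBatchEntry() {
        Object err = terminalError;
        if (err != null) {
            throw new RejectedException(err);
        }
    }
}
```

Because the volatile write precedes the `offer`, any thread that observes the error through the inbox is also guaranteed to observe the latch, which is the property the spec relies on to rule out "callback fired but `flush()` passed".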
- -## WS close frames - -WS-level violations from `WebSocketCloseCode`-style paths (PROTOCOL_ERROR, UNSUPPORTED_DATA, MESSAGE_TOO_BIG, generic close-with-reason) surface as a `SenderError` with: -- `category = PROTOCOL_VIOLATION` -- `serverStatusByte = -1` -- `messageSequence = -1` -- `serverMessage = "ws-close[]: "` or whatever `onClose(code, reason)` was given -- `appliedPolicy = HALT` (always — the connection is gone) -- FSN span = `[engine.ackedFsn() + 1, engine.publishedFsn()]` (the unacked window at close time) - -This routes the existing `ResponseHandler.onClose` through the new sink instead of just calling `fail(...)`. - -## Configuration knobs (connect string) - -| Key | Default | Values | Notes | -|---|---|---|---| -| `on_server_error` | `auto` | `auto` \| `halt` \| `drop` | global default; `auto` uses per-category table | -| `on_schema_error` | `drop` | `halt` \| `drop` | overrides global for SCHEMA_MISMATCH | -| `on_parse_error` | `halt` | `halt` \| `drop` | | -| `on_internal_error` | `halt` | `halt` \| `drop` | | -| `on_security_error` | `halt` | `halt` \| `drop` | | -| `on_write_error` | `drop` | `halt` \| `drop` | | -| `error_inbox_capacity` | `256` | int ≥ 16 | bounded SPSC capacity | - -PROTOCOL_VIOLATION and UNKNOWN are not user-configurable — both forced HALT. - -Per-category knob takes precedence over `on_server_error` if both are set. - -## Builder additions (`LineSenderBuilder`) - -```java -.errorHandler(SenderErrorHandler) // default: log ERROR/WARN + counter -.errorPolicy(Category, Policy) // overrides for one category -.errorPolicyResolver(SenderError -> Policy) // full programmatic control; takes precedence -.errorInboxCapacity(int) -``` - -## Public API surface - -- `SenderError` — public, final, immutable, in `io.questdb.client` package. -- `SenderError.Category`, `SenderError.Policy` — public enums on `SenderError`. -- `SenderErrorHandler` — `@FunctionalInterface` with `void onError(SenderError)`. 
-- `LineSenderServerException extends LineSenderException` — `getServerError(): SenderError` accessor. -- `Sender.flushAndGetSequence(): long` — returns FSN published; existing `flush()` kept verbatim. The returned FSN is the user's correlation handle for matching against `SenderError.fromFsn`. -- `Sender.resumeAfterHalt()` — opt-in escape hatch: clears `terminalError`, restarts I/O loop reconnect, logs WARN. No auto-resume. -- WS-only counter accessors on `QwpWebSocketSender`: - - `getTotalServerErrors(): long` - - `getDroppedErrorNotifications(): long` - - `getLastTerminalError(): SenderError` (snapshot; null if none). - -## Interaction with existing reconnect / ack paths - -- `CursorWebSocketSendLoop.ResponseHandler.onBinaryMessage` (line 712 onward, current branch): currently routes any non-`STATUS_OK` to `recordFatal(...)`, always terminal. New behavior: classify by status byte → category, resolve policy, build `SenderError`, then either: - - `DROP_AND_CONTINUE`: call `engine.acknowledge(fsnAtZero + wireSeq)` to advance past the bad span (the server already rejected it; we're not going to land it), inbox the error, continue. - - `HALT`: write `terminalError`, inbox the error, then call `recordFatal(...)` to break the loop. The `LineSenderException` raised by `recordFatal` carries the `SenderError` via `LineSenderServerException`. -- `STATUS_DURABLE_ACK` (0x02) is unchanged — it's an upload-confirmation, not an error, and the existing handler already keeps it separate. -- Reconnect budget exhaustion remains terminal (existing behavior). Surfaces as a synthesized `SenderError` with `category = PROTOCOL_VIOLATION` and FSN span = unacked window at giveup time. -- Auth-terminal on reconnect (existing) is preserved as `category = SECURITY_ERROR` for consistency. - -## DROP_AND_CONTINUE: what about the disk? - -When the loop drops a rejected batch, the on-disk segment for that FSN range becomes garbage from the server's perspective — but the bytes are still there. 
Trim happens via the existing `engine.acknowledge(...)` → `SegmentManager.trim` path. Calling `acknowledge` with the rejected wireSeq advances `ackedFsn` past the bad batch, which trims it from disk on the next maintenance pass. - -This means the dropped bytes are **lost forever** from the sender's perspective. The user must dead-letter via `errorHandler` if they want a record. This is by design: SF preserves data until the server acks; once the server has explicitly rejected, the data is no longer the sender's responsibility. - -## Decisions locked -1. ✅ 5 wire-aligned categories + `PROTOCOL_VIOLATION` + `UNKNOWN`. No abstracted-up category not distinguishable on the wire. -2. ✅ Two policies only: `DROP_AND_CONTINUE`, `HALT`. `RETRY_TRANSIENT` reserved for post-server-split. -3. ✅ Defaults per the table above. WRITE_ERROR is DROP (debatable; revisit when server splits). -4. ✅ `SenderError` is public API, immutable, carries both `messageSequence` and `[fromFsn, toFsn]`. -5. ✅ Multi-table batches: `tableName` may be null; user correlates via FSN span. -6. ✅ WS close frames surface as `PROTOCOL_VIOLATION` with `serverStatusByte = -1`, `messageSequence = -1`, always HALT. -7. ✅ Connect string carries policy knobs + inbox capacity. Callbacks require builder. Typed exception covers connect-string-only users. -8. ✅ Producer hot path: zero allocations, one volatile load per batch (piggybacks `connectionGeneration` cache line). -9. ✅ I/O thread never invokes user code. Bounded inbox + lazy-start dispatcher daemon. Inbox overflow drops + counts. -10. ✅ Default handler is loud (ERROR for HALT, WARN for DROP). Silence forbidden. -11. ✅ Counters and `getLastTerminalError()` accessor for ops visibility. -12. ✅ `resumeAfterHalt()` is opt-in escape hatch; never auto-resume. -13. ✅ `DROP_AND_CONTINUE` advances `ackedFsn` past the rejected span; data is dropped from disk via existing trim path. -14. ✅ `flush()` signature unchanged.
New `flushAndGetSequence()` returns FSN for user-side correlation. - -## Server-side follow-ups (track separately, not blocking client work) -1. Split `0x06` and `0x09` to add explicit `RESOURCE_EXHAUSTED`, `RATE_LIMITED`, `TRANSIENT` codes — unblocks `RETRY_TRANSIENT` client policy. -2. Or: add an explicit retryable bit (1 reserved byte in the error frame) — alternative to (1). -3. Per-table attribution in multi-table batch errors — extend the error frame with an optional table index (`-1` = batch-level). -4. Document whether rejected `messageSequence` values count toward the server's dedup window or are excluded. - -## Open -None. Ready to implement.