diff --git a/.claude/skills/review-pr/SKILL.md b/.claude/skills/review-pr/SKILL.md
index 9f0362a0..997ff8ca 100644
--- a/.claude/skills/review-pr/SKILL.md
+++ b/.claude/skills/review-pr/SKILL.md
@@ -51,9 +51,18 @@ Capture the PR identifier in `$PR` (the part of `$ARGUMENTS` left after strippin
 PR='<PR number or URL from $ARGUMENTS, with any --level=N / -lN / bare-digit level token removed>'
 gh pr view "$PR" --json number,title,body,labels,state
 gh pr diff "$PR"
+gh pr diff "$PR" | grep -E '^(Binary files |GIT binary patch)' || true   # binary-change markers; `gh pr diff` has no --numstat flag
 gh pr view "$PR" --comments
 ```
 
+**Committed-binary gate (runs at every level).** Scan the diff for any
+added/modified file git reports as binary (a `Binary files … differ` line or a
+`GIT binary patch` block). This repo builds its native/C libraries from source
+in CI and does not commit build outputs, so any such file is a **Critical**
+finding regardless of review level. Report it even at level 0. See the
+"Committed build artifacts" checklist for the rationale and the only acceptable
+exception (genuine test-input fixtures).
+
 ## Step 2: PR title and description
 
 Check against CLAUDE.md conventions:
@@ -144,7 +153,7 @@ Every agent receives:
 
 Launch the following agents in parallel.
 
-**Agent 1 — Correctness & bugs:** NULL handling, edge cases, logic errors, off-by-one, operator precedence, error paths. Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite.
+**Agent 1 — Correctness & bugs:** NULL handling, edge cases, logic errors, off-by-one, operator precedence, error paths. Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite. When the diff touches the store-and-forward sender, the async drainer / send loop, primary reconnect/failover, or pool startup (`lazy_connect` / `initial_connect_retry` / `SenderPool` / `QueryClientPool`), also verify the "Store-and-forward & pool startup invariants" checklist — a running drainer that propagates a transport error to the caller, imposes a reconnect time budget, or hard-fails on a transient outage is a Critical (data-loss) finding.
 
 **Agent 2 — Concurrency:** Race conditions, shared mutable state, missing volatile, lock ordering, thread-safety of data structures. Use the implicit contract list (lock order, thread-affinity) and check every callsite from 2.5b for violations of the new contract.
 
@@ -154,7 +163,7 @@ Launch the following agents in parallel.
 
 **Agent 5 — Test coverage:** Coverage gaps, error path tests, NULL tests, boundary conditions, regression tests exist, `assertMemoryLeak()` usage. Cross-reference 2.5d: every cross-context exposure should have a test that exercises the changed symbol from that context. Missing tests for cross-context callsites is a high-priority finding. Test *efficacy* (whether those tests actually exercise the change and could fail) and test-*code* quality are handled by Agents 11-13 — here focus only on whether coverage exists for every new or changed path.
 
-**Agent 6 — Code quality & standards:** Code smell, member ordering, naming conventions, modern Java features, dead code, third-party dependencies.
+**Agent 6 — Code quality & standards:** Code smell, member ordering, naming conventions, modern Java features, dead code, third-party dependencies. Also scan the diff for any committed compiled binary / build artifact (run `git diff --numstat`/`--stat` and flag files git reports as binary) — the native/C libraries are built from source in CI, so a committed binary is a **Critical** finding (see the "Committed build artifacts" checklist).
 
 **Agent 7 — PR metadata & conventions:** Title format, description quality, commit messages, labels, SQL style in tests.
 
@@ -278,6 +287,26 @@ Review the diff for:
 - Code smell: overly complex methods, deep nesting, unclear intent, dead code
 - No third-party Java dependencies on data paths
 
+### Committed build artifacts
+- **A newly committed compiled binary is always Critical.** This repo builds its
+  native/C libraries from source in CI (`rebuild_native_libs.yml`,
+  `build_native.yaml`, guarded by `check-glibc-floor.sh`) and does not commit
+  build outputs. A binary added or modified in the diff cannot be reviewed,
+  audited, or reproduced from source, can smuggle in unaudited or malicious
+  code, and bloats the repo history irreversibly — so it blocks the merge.
+- Detect it structurally, not by extension alone: run `git diff --stat` /
+  `git diff --numstat` on the PR and flag every added/modified file git reports
+  as binary (`numstat` shows `-`/`-` for added/deleted lines; `--stat` shows a
+  `Bin … -> … bytes` marker). Typical offenders: `.so`, `.dylib`, `.dll`, `.a`,
+  `.o`, `.lib`, `.exe`, `.class`, `.jar`, `.war`, `.wasm`, `.node`, `.bin`.
+- The finding stands even when the binary "looks" legitimate (e.g. a rebuilt
+  `libquestdb.*`): the correct source of these artifacts is the CI native-build
+  pipeline plus release packaging, never a PR diff. The only acceptable binaries
+  are genuine test-input fixtures/resources (data a test reads), not build
+  outputs — and even those must be justified.
+- Suggested fix: drop the binary from the PR, confirm a `.gitignore` entry
+  covers it, and let CI native-build + release packaging produce it.
+
 ### QuestDB coding standards
 - Class members grouped by kind (static vs instance) and visibility, sorted alphabetically
 - Boolean names use `is...` / `has...` prefix
@@ -288,6 +317,68 @@ Review the diff for:
 - try-with-resources used where applicable
 - Native memory freed correctly
 
+### Store-and-forward & pool startup invariants (QWP facade)
+Apply this whenever the diff touches the SF sender, the async drainer / send
+loop, primary reconnect/failover, `SenderPool` / `QueryClientPool` startup,
+`lazy_connect`, or `initial_connect_retry`. A violation here is a **Critical**
+finding: the whole point of store-and-forward is that a running producer never
+loses data and never hard-fails on a transient outage.
+
+**Drainer (steady state — once the pool is running).**
+- Once the pool is running, an async drainer thread ships buffered SF data to
+  the server. It MUST NOT propagate server / transport errors back to the
+  client (`Sender` producer calls, `flush()`, the pooled handle). The ONLY
+  error a running drainer may surface to the caller is **SF out of space** (the
+  on-disk / backing buffer is full and can accept no more rows). Flag any other
+  failure class (connect-refused, DNS, unreachable/black-hole, TLS/cert, auth,
+  role-reject, upgrade/protocol timeout, reset) that can escape the drainer
+  onto a producer or borrow call.
+- Primary reconnect MUST be fully contained inside the drainer thread and MUST
+  have **no time limit** — no `reconnect_max_duration_millis`-style budget, no
+  deadline, no "give up and latch terminal after N ms". A budget that latches
+  the sender terminal on a long outage is a Critical violation: it drops a
+  producer that store-and-forward promised to keep alive. Flag any bounded
+  reconnect loop, `deadlineNanos` / `while (now < deadline)`, or terminal
+  `SenderError` reachable from the running drainer's reconnect path.
+- The drainer must retry with **exponential backoff** and handle every connect
+  failure class gracefully, without a hard fail — it keeps buffering and keeps
+  retrying until the wire is back. The per-attempt backoff may be capped (a max
+  delay between attempts), but the RETRY LOOP ITSELF must be unbounded. Flag a
+  capped total retry duration or an attempt-count cap on the steady-state
+  drainer.
+- **Sanctioned terminals (orphan-slot drainer only).** The orphan drainer
+  (`BackgroundDrainer`) MAY quarantine its slot (`.failed` sentinel,
+  human-in-the-loop) on conditions that are terminal by design: auth failure,
+  a non-421 upgrade reject, and a genuine cluster-wide durable-ack capability
+  gap that exhausted its documented settle budget (16 consecutive
+  capability-gap sweeps, or a wall-clock budget anchored at the FIRST
+  capability-gap error of the episode — whichever is hit first). These are
+  NOT violations of the no-budget rule above. The settle budget applies ONLY
+  to consecutive capability-gap attempts: transient classes (role reject,
+  transport error) must never increment it or burn its wall clock — a
+  transient state consuming the terminal budget (shared attempt counter,
+  entry-anchored deadline) IS a Critical violation of this checklist.
+
+**Pool startup — two modes; the mode decides who sees connectivity errors.**
+- `lazy_connect=true`: `build()` MUST succeed with **no server present**. The
+  producing `Sender` must work immediately (writes buffer via SF), and once the
+  server comes up the read side must also connect and read (reads are deferred,
+  not disabled). Verify `build()` does not fail-fast, the sender does not throw
+  on the first write while the server is down, and a later `borrowQuery()`
+  succeeds once the server is up.
+- `lazy_connect=false` (default): `build()` / the initial connect MUST expose
+  connectivity problems to the caller — DNS errors, connect-refused /
+  unreachable, TLS/cert, authentication/authorization, and connect/upgrade
+  timeouts must all surface as a thrown exception at startup, not be swallowed.
+  Verify each of those failure classes reaches the user during initialization.
+- **In BOTH modes the boundary is the same:** connectivity errors are only
+  ever the caller's problem DURING initialization. Once the client has
+  connected and is past initialization, the running drainer reverts to the
+  steady-state contract above — it must NEVER expose transport problems, NEVER
+  impose a reconnect time budget, and NEVER hard-fail on a transient outage.
+  Anything that undermines the store-and-forward guarantee past init is
+  Critical.
+
 ### SQL conventions (if tests or SQL involved)
 - Keywords in UPPERCASE
 - `expr::TYPE` cast syntax preferred over CAST()
@@ -340,7 +431,10 @@ Review the diff for:
 Present ONLY verified findings (false positives are excluded). Structure as:
 
 ### Critical
-Issues that must be fixed before merge. Each must include:
+Issues that must be fixed before merge. **A newly committed compiled binary or
+other build artifact (see the "Committed build artifacts" checklist) is always
+Critical, no matter how legitimate it looks — native/C libraries are built from
+source in CI, so a binary in the diff is never acceptable.** Each must include:
 - Exact file path and line numbers (including out-of-diff files)
 - Whether the finding is **in-diff** or **out-of-diff**
 - Code path trace showing why the bug is real
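Reviewer note (not part of the patch): the structural check described in the "Committed build artifacts" checklist can be sketched in shell as below. This is a minimal illustration under assumptions: `list_binary_paths` is a hypothetical helper name, and the sample lines are made up to stand in for real `git diff --numstat` output, where a binary file shows `-` in both the added and deleted columns.

```shell
#!/usr/bin/env bash
# Sketch only: filter `git diff --numstat` output down to binary files.
# numstat lines look like: <added>TAB<deleted>TAB<path>
# Binary files carry "-" in both count columns.

# Hypothetical helper: reads numstat lines on stdin, prints binary paths.
list_binary_paths() {
  awk -F'\t' '$1 == "-" && $2 == "-" { print $3 }'
}

# Made-up sample input; in a real review this would be piped from
# `git diff --numstat <base>...<head>`.
printf '12\t3\tcore/src/main/java/Foo.java\n-\t-\tcore/src/main/resources/bin/libquestdb.so\n' \
  | list_binary_paths
```

Any path the helper prints would be a candidate Critical finding; an empty result means git reported no binary changes in the input.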
diff --git a/.github/scripts/check-glibc-floor.sh b/.github/scripts/check-glibc-floor.sh
new file mode 100755
index 00000000..77204943
--- /dev/null
+++ b/.github/scripts/check-glibc-floor.sh
@@ -0,0 +1,80 @@
+#!/usr/bin/env bash
+# Assert the glibc runtime floor of a Linux native library.
+#
+# Usage: check-glibc-floor.sh <path-to-libquestdb.so> <max-glibc-version>
+#   e.g. check-glibc-floor.sh core/.../linux-x86-64/libquestdb.so 2.14
+#        check-glibc-floor.sh core/.../linux-aarch64/libquestdb.so 2.17
+#
+# The dynamic linker resolves .gnu.version_r at load time, so the HIGHEST
+# GLIBC_x.y version node the library imports is its hard load floor: a host
+# whose glibc is older than that node fails System.loadLibrary/dlopen with
+# `version 'GLIBC_x.y' not found`. This script extracts every versioned import
+# and fails if the highest one exceeds the allowed floor.
+#
+# Why the floors are what they are:
+#   * linux-x86-64 -> 2.14. The highest node we intentionally keep is
+#     memcpy@GLIBC_2.14; clock_gettime is pinned back to GLIBC_2.2.5 by
+#     src/main/c/share/glibc_compat.h, and stat/fstat resolve to the inline
+#     __xstat/__fxstat@GLIBC_2.2.5 wrappers when built in a low-glibc container.
+#     A build on a modern host (glibc >= 2.33) instead emits stat@GLIBC_2.33 /
+#     fstat@GLIBC_2.33 and trips this guard -- that is exactly the regression it
+#     exists to catch.
+#   * linux-aarch64 -> 2.17. glibc gained aarch64 support in 2.17, so 2.17 is
+#     the lowest floor physically achievable on that architecture.
+#
+# Portable to bash 3.2 (no mapfile / no negative array indices) so it can be run
+# locally on macOS as well as in the glibc build containers.
+set -euo pipefail
+
+lib="${1:?usage: check-glibc-floor.sh <lib.so> <max-glibc-version>}"
+floor="${2:?usage: check-glibc-floor.sh <lib.so> <max-glibc-version>}"
+
+if [ ! -f "$lib" ]; then
+  echo "::error::check-glibc-floor: library not found: $lib"
+  exit 1
+fi
+
+# All distinct versioned GLIBC nodes (e.g. 2.14, 2.2.5), sorted ascending.
+# objdump prints them as (GLIBC_x.y) or GLIBC_x.y depending on the toolchain;
+# the -o regex captures the token regardless of surrounding parentheses.
+# GLIBC_PRIVATE has no digit after the underscore, so it is naturally excluded.
+versions="$(
+  objdump -T "$lib" \
+    | grep -oE 'GLIBC_[0-9]+(\.[0-9]+)+' \
+    | sed 's/^GLIBC_//' \
+    | sort -Vu
+)"
+
+if [ -z "$versions" ]; then
+  echo "::error::check-glibc-floor: no versioned GLIBC symbols found in $lib (unexpected)."
+  exit 1
+fi
+
+highest="$(printf '%s\n' "$versions" | tail -n1)"
+
+echo "GLIBC version nodes required by $lib:"
+printf '%s\n' "$versions" | sed 's/^/  GLIBC_/'
+echo "Highest required: GLIBC_${highest} (allowed floor: GLIBC_${floor})"
+
+# leq A B -> succeeds when version A <= version B: either they are equal, or
+# sorting {A, B} with sort -V puts B last.
+leq() {
+  [ "$1" = "$2" ] && return 0
+  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n1)" = "$2" ]
+}
+
+if leq "$highest" "$floor"; then
+  echo "OK: $lib floor is GLIBC_${highest} (<= GLIBC_${floor})."
+  exit 0
+fi
+
+echo "::error::GLIBC floor regression in $lib: requires GLIBC_${highest}, above the GLIBC_${floor} floor."
+echo "::error::This library will fail to load on hosts with glibc < ${highest}."
+echo "Offending nodes above the floor and the symbols that pull them in:"
+printf '%s\n' "$versions" | while IFS= read -r v; do
+  if ! leq "$v" "$floor"; then
+    echo "  GLIBC_${v}:"
+    objdump -T "$lib" | grep -E "GLIBC_${v//./\\.}([^0-9.]|\$)" | awk '{print "    " $NF}' | sort -u
+  fi
+done
+exit 1
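Reviewer note (not part of the patch): the two core steps of the script above, extracting the version nodes and comparing them with a version-aware sort, can be exercised without a real library. The sketch below is a hedged illustration: the function names `extract_glibc_nodes` and `version_leq` are hypothetical, the sample lines are invented to resemble `objdump -T` output, and a `sort` that supports `-V` (GNU coreutils or a recent BSD sort) is assumed.

```shell
#!/usr/bin/env bash
# Sketch only: mirrors the extraction and comparison steps of
# check-glibc-floor.sh on made-up input.

# Print the distinct GLIBC_x.y version numbers found on stdin, ascending.
# GLIBC_PRIVATE has no digit after the underscore, so it never matches.
extract_glibc_nodes() {
  grep -oE 'GLIBC_[0-9]+(\.[0-9]+)+' | sed 's/^GLIBC_//' | sort -Vu
}

# Succeeds when version $1 <= version $2 under version-sort ordering.
version_leq() {
  [ "$1" = "$2" ] && return 0
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n1)" = "$2" ]
}

# Invented sample lines shaped like `objdump -T` output (not from a real library).
sample='0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 clock_gettime
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.14) memcpy
0000000000000000      DF *UND*  0000000000000000  GLIBC_PRIVATE __libc_example'

highest="$(printf '%s\n' "$sample" | extract_glibc_nodes | tail -n1)"
echo "highest node: GLIBC_${highest}"

if version_leq "$highest" 2.14; then
  echo "within the 2.14 floor"
else
  echo "floor regression"
fi
```

The version-aware sort matters because a plain lexical comparison would rank 2.14 below 2.2.5; with `sort -V` the sample resolves to GLIBC_2.14 as the highest node.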
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 5ccfaa64..f6a0cd74 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -17,10 +17,10 @@ defaults:
 jobs:
   # JDK 8 is the source of truth: the client ships as a Java 8 artifact
   # (io.questdb:questdb-client) and is released from JDK 8, so on JDK 8 it must
-  # compile, the full test suite must pass against the committed native
-  # libraries, and the javadoc jar must build (-P javadoc attaches it at the
-  # package phase). The committed native .so/.dylib/.dll are enough -- the only
-  # git submodule (zstd) is needed solely for C++ native rebuilds, not here.
+  # compile, the full test suite must pass, and the javadoc jar must build
+  # (-P javadoc attaches it at the package phase). The native libraries are no
+  # longer committed, so this job compiles libquestdb.so from source (hence the
+  # zstd submodule + cmake/nasm/build-essential toolchain) before the tests run.
   build-jdk8:
     name: Build, test & javadoc (JDK 8)
     runs-on: ubuntu-latest
@@ -28,6 +28,9 @@ jobs:
     steps:
       - name: Check out
         uses: actions/checkout@v4
+        with:
+          # zstd is required to compile the native library.
+          submodules: recursive
 
       - name: Set up JDK 8
         uses: actions/setup-java@v4
@@ -36,6 +39,26 @@ jobs:
           java-version: "8"
           cache: maven
 
+      - name: Install native build toolchain
+        run: sudo apt-get update && sudo apt-get install -y cmake nasm build-essential
+
+      - name: Build native libquestdb.so
+        # JAVA_HOME points at the JDK 8 above, so the lib is compiled against the
+        # Java 8 JNI headers -- the artifact's Java floor. Copy it into src
+        # resources (not target/) so it survives the `mvn clean` in the next step
+        # and gets packaged + loaded via the production bin/<platform> path.
+        # NOTE: this builds on ubuntu-latest for FUNCTIONAL testing only; the
+        # library's glibc runtime floor is validated separately by the
+        # `glibc-floor` job, which rebuilds in a low-glibc manylinux container.
+        run: |
+          cd core
+          cmake -DCMAKE_BUILD_TYPE=Release -B cmake-build-release -S.
+          cmake --build cmake-build-release --config Release
+          test -f target/classes/io/questdb/client/bin-local/libquestdb.so
+          mkdir -p src/main/resources/io/questdb/client/bin/linux-x86-64
+          cp target/classes/io/questdb/client/bin-local/libquestdb.so \
+             src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so
+
       - name: Compile, test, and build javadoc
         run: mvn -B -ntp -P javadoc clean install
 
@@ -61,3 +84,80 @@ jobs:
 
       - name: Compile (main + test) and build javadoc (no tests run)
         run: mvn -B -ntp -P javadoc -DskipTests clean package
+
+  # GLIBC floor guard. The native libraries are built at release time in
+  # low-glibc manylinux containers (see maven_central_release.yml) and are NOT
+  # committed, so a floor regression is invisible to the functional test job
+  # above (it builds on ubuntu-latest, whose glibc is new enough to load almost
+  # anything). This job rebuilds the linux libraries in a low-glibc manylinux
+  # container and asserts the runtime floor with objdump, so a change that
+  # raises the floor (e.g. a new stat/fstat call pulling in stat@GLIBC_2.33 on
+  # a modern build host) fails the PR instead of silently shipping a library
+  # that cannot load on older distros.
+  #
+  #   * linux-x86-64  -> GLIBC_2.14 (the intended floor: memcpy@GLIBC_2.14).
+  #   * linux-aarch64 -> GLIBC_2.17 (the lowest floor glibc offers on aarch64).
+  #
+  # Uses manylinux_2_28 for both arches (stock Node 24, no glibc-2.17 shadow
+  # hack). The x86-64 floor is identical in manylinux2014 (2.17) and
+  # manylinux_2_28 (2.28) -- both resolve stat/fstat to the inline
+  # __xstat/__fxstat@GLIBC_2.2.5 wrappers -- so this validates the real shipped
+  # floor without the heavier manylinux2014 release toolchain.
+  glibc-floor:
+    name: GLIBC floor guard (${{ matrix.platform }})
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - platform: linux-x86-64
+            os: ubuntu-latest
+            image: quay.io/pypa/manylinux_2_28_x86_64
+            jdk_arch: x64
+            floor: "2.14"
+            cmake_args: ""
+            build_dir: cmake-build-release
+          - platform: linux-aarch64
+            os: ubuntu-22.04-arm
+            image: quay.io/pypa/manylinux_2_28_aarch64
+            jdk_arch: aarch64
+            floor: "2.17"
+            cmake_args: "-DCMAKE_TOOLCHAIN_FILE=./src/main/c/toolchains/linux-arm64.cmake"
+            build_dir: cmake-build-release-arm64
+    runs-on: ${{ matrix.os }}
+    timeout-minutes: 45
+    container:
+      image: ${{ matrix.image }}
+    steps:
+      - name: Check out
+        uses: actions/checkout@v4
+        with:
+          # zstd is required to compile the native library.
+          submodules: recursive
+
+      - name: Install tooling
+        # binutils provides objdump for the floor check; nasm/zstd are build deps.
+        run: |
+          yum update -y
+          yum install -y wget nasm zstd binutils
+
+      - name: Install Temurin JDK 8 (for jni.h)
+        # Build against the Java 8 JNI headers -- JDK 8 is the artifact's floor.
+        # The JDK version does not affect the glibc floor; it only supplies jni.h.
+        run: |
+          wget -v --timeout=180 -O jdk8.tar.gz \
+            "https://api.adoptium.net/v3/binary/latest/8/ga/linux/${{ matrix.jdk_arch }}/jdk/hotspot/normal/eclipse"
+          mkdir jdk8
+          tar xfz jdk8.tar.gz -C jdk8 --strip-components=1
+          echo "JAVA_HOME=$(pwd)/jdk8" >> "$GITHUB_ENV"
+
+      - name: Build native libquestdb.so
+        run: |
+          cd core
+          cmake ${{ matrix.cmake_args }} -DCMAKE_BUILD_TYPE=Release -B ${{ matrix.build_dir }} -S.
+          cmake --build ${{ matrix.build_dir }} --config Release
+
+      - name: Assert GLIBC floor
+        run: |
+          ./.github/scripts/check-glibc-floor.sh \
+            core/target/classes/io/questdb/client/bin-local/libquestdb.so \
+            "${{ matrix.floor }}"
diff --git a/.github/workflows/maven_central_release.yml b/.github/workflows/maven_central_release.yml
index 56508328..77f52891 100644
--- a/.github/workflows/maven_central_release.yml
+++ b/.github/workflows/maven_central_release.yml
@@ -295,6 +295,8 @@ jobs:
             echo "::error::libquestdb.so has unresolved dependencies."
             exit 1
           fi
+          # Refuse to ship if a symbol raised the glibc floor above 2.14.
+          ./.github/scripts/check-glibc-floor.sh "$lib" 2.14
           cat > LoadCheck.java <<'EOF'
           public class LoadCheck {
               public static void main(String[] args) {
@@ -360,6 +362,8 @@ jobs:
             echo "::error::libquestdb.so has unresolved dependencies."
             exit 1
           fi
+          # 2.17 is the lowest floor glibc offers on aarch64.
+          ./.github/scripts/check-glibc-floor.sh "$lib" 2.17
           cat > LoadCheck.java <<'EOF'
           public class LoadCheck {
               public static void main(String[] args) {
diff --git a/.github/workflows/rebuild_native_libs.yml b/.github/workflows/rebuild_native_libs.yml
index 026d3c3e..6878f16b 100644
--- a/.github/workflows/rebuild_native_libs.yml
+++ b/.github/workflows/rebuild_native_libs.yml
@@ -68,57 +68,38 @@ jobs:
           key: nativelibs-osx-${{ github.sha }}
   build-all-linux-x86-64:
     runs-on: ubuntu-latest
-    # manylinux2014 is a container with new-ish compilers and tools, but old glibc - 2.17
-    # 2.17 is old enough to be compatible with most Linux distributions out there
+    # manylinux_2_28 (glibc 2.28) replaces the previous manylinux2014 (glibc
+    # 2.17) container: GitHub Actions now forces actions (checkout, cache) onto
+    # Node 24, whose binary requires glibc >= 2.27, so it can no longer run
+    # inside the glibc-2.17 image (the old Node-20-glibc-217 override hack only
+    # patched /__e/node20, not /__e/node24). 2.28 still runs stock Node 24 and
+    # matches the linux-aarch64 job, which already ships glibc-2.28 binaries.
+    #
+    # NOTE: the build container's glibc (2.28) does NOT dictate the artifact's
+    # runtime glibc floor. clock_gettime is pinned back to GLIBC_2.2.5 via
+    # src/main/c/share/glibc_compat.h so the linux-x86-64 .so keeps loading on
+    # glibc 2.14+ (its floor is memcpy@GLIBC_2.14), unchanged from before the
+    # container move. If you add a symbol with a higher version node here, the
+    # floor will rise -- check with: objdump -T libquestdb.so | grep GLIBC_.
     container:
-      image: quay.io/pypa/manylinux2014_x86_64
-      volumes:
-        - /node20217:/node20217
-        - /node20217:/__e/node20
+      image: quay.io/pypa/manylinux_2_28_x86_64
     steps:
-      - name: Install tools, most are needed to build nasm
-        run: |
-          ldd --version
-          yum update -y
-          yum install 'perl(Env)' perl-Font-TTF perl-Sort-Versions gcc wget perf asciidoc xmlto ghostscript adobe-source-sans-pro-fonts adobe-source-code-pro-fonts rpm-build zstd curl -y
-      - name: Build nasm
-        # we need nasm 2.14+ due to this bug https://bugzilla.nasm.us/show_bug.cgi?id=3392205
-        # manylinux2014 distribution includes nasm 2.10
-        # the nasm project itself provides RPMs, but they built against a newer glibc and other dependencies too
-        # thus we take src.rpm from nasm project and rebuild it in the manylinux2014 container
-        # this way we get a nasm binary that is compatible with the manylinux2014 environment
-        run: |
-          wget https://www.nasm.us/pub/nasm/releasebuilds/2.16.03/linux/nasm-2.16.03-0.fc39.src.rpm
-          rpmbuild --rebuild ./nasm-2.16.03-0.fc39.src.rpm
-          rpm -i ~/rpmbuild/RPMS/x86_64/nasm-2.16.03-0.el7.x86_64.rpm
-      - name: Install Node.js 20 glibc2.17
-        # A hack to override default nodejs 20 to a build compatible with older glibc.
-        # Inspired by https://github.com/pytorch/test-infra/pull/5959 If it's good for pytorch, it's good for us too! :)
-        # Q: Why do we need this hack at all? A: Because many github actions, include action/checkout@v4, depend on nodejs 20.
-        # GitHub Actions runner provides a build of nodejs 20 that requires a newer glibc than manylinux2014 has.
-        # Thus we download a build of nodejs 20 that is compatible with manylinux2014 and override the default one.
-        run: |
-          curl -LO https://unofficial-builds.nodejs.org/download/release/v20.9.0/node-v20.9.0-linux-x64-glibc-217.tar.xz
-          tar -xf node-v20.9.0-linux-x64-glibc-217.tar.xz --strip-components 1 -C /node20217
-          ldd /__e/node20/bin/node
       - uses: actions/checkout@v4
         with:
           submodules: true
-      - name: Install up-to-date CMake
+      - name: Install tooling
         run: |
-          wget -nv https://github.com/Kitware/CMake/releases/download/v3.29.2/cmake-3.29.2-linux-x86_64.tar.gz
-          tar -zxf cmake-3.29.2-linux-x86_64.tar.gz
-          echo "PATH=`pwd`/cmake-3.29.2-linux-x86_64/bin/:$PATH" >> "$GITHUB_ENV"
+          yum update -y
+          yum install wget nasm zstd -y
       - name: Install GraalVM JDK 25 (for jni.h)
         run: |
-          wget -nv -O graalvm.tar.gz https://download.oracle.com/graalvm/25/latest/graalvm-jdk-25_linux-x64_bin.tar.gz
+          wget -v --timeout=180 -O graalvm.tar.gz https://download.oracle.com/graalvm/25/latest/graalvm-jdk-25_linux-x64_bin.tar.gz
           mkdir graalvm
           tar xfz graalvm.tar.gz -C graalvm --strip-components=1
           echo "JAVA_HOME=`pwd`/graalvm" >> "$GITHUB_ENV"
       - name: Generate Makefiles
         run: |
           cd ./core
-          # git submodule update --init
           cmake -DCMAKE_BUILD_TYPE=Release -B cmake-build-release -S.
       - name: Build linux-x86-64 CXX Library
         run: |
@@ -127,6 +108,11 @@ jobs:
           mkdir -p src/main/resources/io/questdb/client/bin/linux-x86-64/
           mkdir -p src/main/bin/linux-x86-64/
           cp target/classes/io/questdb/client/bin-local/libquestdb.so src/main/resources/io/questdb/client/bin/linux-x86-64/
+      - name: Assert GLIBC floor (2.14)
+        # Never commit a library whose glibc floor regressed above 2.14.
+        run: |
+          bash ./.github/scripts/check-glibc-floor.sh \
+            core/src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so 2.14
       - name: Save linux-x86-64 Libraries to Cache
         uses: actions/cache/save@v3
         with:
@@ -162,6 +148,11 @@ jobs:
           mkdir -p src/main/resources/io/questdb/client/bin/linux-aarch64/
           mkdir -p src/main/bin/linux-aarch64/
           cp target/classes/io/questdb/client/bin-local/libquestdb.so src/main/resources/io/questdb/client/bin/linux-aarch64/
+      - name: Assert GLIBC floor (2.17)
+        # 2.17 is the lowest floor glibc offers on aarch64.
+        run: |
+          bash ./.github/scripts/check-glibc-floor.sh \
+            core/src/main/resources/io/questdb/client/bin/linux-aarch64/libquestdb.so 2.17
       - name: Save linux-aarch64 Libraries to Cache
         uses: actions/cache/save@v3
         with:
diff --git a/.gitignore b/.gitignore
index 9859a7c6..2a7284c6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -29,4 +29,9 @@ core/CMakeCache.txt
 **/build
 **/CMakeFiles
 .envrc
-.vscode
\ No newline at end of file
+.vscode
+# Root-level Maven build output
+/target
+
+# pi subagents runtime artifacts
+.pi-subagents/
diff --git a/.pi/skills/review-pr/SKILL.md b/.pi/skills/review-pr/SKILL.md
index 7a2767c6..0c210421 100644
--- a/.pi/skills/review-pr/SKILL.md
+++ b/.pi/skills/review-pr/SKILL.md
@@ -60,9 +60,18 @@ Capture the PR identifier in `$PR` (the part of `$ARGUMENTS` left after strippin
 PR='<PR number or URL from $ARGUMENTS, with any --level=N / -lN / bare-digit level token removed>'
 gh pr view "$PR" --json number,title,body,labels,state
 gh pr diff "$PR"
+gh pr diff "$PR" | grep -E '^(Binary files |GIT binary patch)' || true   # binary-change markers; `gh pr diff` has no --numstat flag
 gh pr view "$PR" --comments
 ```
 
+**Committed-binary gate (runs at every level).** Scan the diff for any
+added/modified file git reports as binary (a `Binary files … differ` line or a
+`GIT binary patch` block). This repo builds its native/C libraries from source
+in CI and does not commit build outputs, so any such file is a **Critical**
+finding regardless of review level. Report it even at level 0. See the
+"Committed build artifacts" checklist for the rationale and the only acceptable
+exception (genuine test-input fixtures).
+
 ## Step 2: PR title and description
 
 Check against CLAUDE.md conventions:
@@ -155,7 +164,7 @@ Launch the reviewers below with the `subagent` tool in `context: "fresh"` mode,
 
 Launch the following reviewers in parallel.
 
-**Reviewer 1 — Correctness & bugs:** NULL handling, edge cases, logic errors, off-by-one, operator precedence, error paths. Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite.
+**Reviewer 1 — Correctness & bugs:** NULL handling, edge cases, logic errors, off-by-one, operator precedence, error paths. Cross-reference every changed symbol against its callsite inventory and verify the new behavior is correct at each callsite. When the diff touches the store-and-forward sender, the async drainer / send loop, primary reconnect/failover, or pool startup (`lazy_connect` / `initial_connect_retry` / `SenderPool` / `QueryClientPool`), also verify the "Store-and-forward & pool startup invariants" checklist — a running drainer that propagates a transport error to the caller, imposes a reconnect time budget, or hard-fails on a transient outage is a Critical (data-loss) finding.
 
 **Reviewer 2 — Concurrency:** Race conditions, shared mutable state, missing volatile, lock ordering, thread-safety of data structures. Use the implicit contract list (lock order, thread-affinity) and check every callsite from 2.5b for violations of the new contract.
 
@@ -165,7 +174,7 @@ Launch the following reviewers in parallel.
 
 **Reviewer 5 — Test coverage:** Coverage gaps, error path tests, NULL tests, boundary conditions, regression tests exist, `assertMemoryLeak()` usage. Cross-reference 2.5d: every cross-context exposure should have a test that exercises the changed symbol from that context. Missing tests for cross-context callsites is a high-priority finding. Test *efficacy* (whether those tests actually exercise the change and could fail) and test-*code* quality are handled by Reviewers 11-13 — here focus only on whether coverage exists for every new or changed path.
 
-**Reviewer 6 — Code quality & standards:** Code smell, member ordering, naming conventions, modern Java features, dead code, third-party dependencies.
+**Reviewer 6 — Code quality & standards:** Code smell, member ordering, naming conventions, modern Java features, dead code, third-party dependencies. Also scan the diff for any committed compiled binary / build artifact (run `git diff --numstat`/`--stat` and flag files git reports as binary) — the native/C libraries are built from source in CI, so a committed binary is a **Critical** finding (see the "Committed build artifacts" checklist).
 
 **Reviewer 7 — PR metadata & conventions:** Title format, description quality, commit messages, labels, SQL style in tests.
 
@@ -289,6 +298,26 @@ Review the diff for:
 - Code smell: overly complex methods, deep nesting, unclear intent, dead code
 - No third-party Java dependencies on data paths
 
+### Committed build artifacts
+- **A newly committed compiled binary is always Critical.** This repo builds its
+  native/C libraries from source in CI (`rebuild_native_libs.yml`,
+  `build_native.yaml`, guarded by `check-glibc-floor.sh`) and does not commit
+  build outputs. A binary added or modified in the diff cannot be reviewed,
+  audited, or reproduced from source, can smuggle in unaudited or malicious
+  code, and bloats the repo history irreversibly — so it blocks the merge.
+- Detect it structurally, not by extension alone: run `git diff --stat` /
+  `git diff --numstat` on the PR and flag every added/modified file git reports
+  as binary (`numstat` shows `-`/`-` for added/deleted lines; `--stat` shows a
+  `Bin … -> … bytes` marker). Typical offenders: `.so`, `.dylib`, `.dll`, `.a`,
+  `.o`, `.lib`, `.exe`, `.class`, `.jar`, `.war`, `.wasm`, `.node`, `.bin`.
+- The finding stands even when the binary "looks" legitimate (e.g. a rebuilt
+  `libquestdb.*`): the correct source of these artifacts is the CI native-build
+  pipeline plus release packaging, never a PR diff. The only acceptable binaries
+  are genuine test-input fixtures/resources (data a test reads), not build
+  outputs — and even those must be justified.
+- Suggested fix: drop the binary from the PR, confirm a `.gitignore` entry
+  covers it, and let CI native-build + release packaging produce it.
+
 ### QuestDB coding standards
 - Class members grouped by kind (static vs instance) and visibility, sorted alphabetically
 - Boolean names use `is...` / `has...` prefix
@@ -299,6 +328,68 @@ Review the diff for:
 - try-with-resources used where applicable
 - Native memory freed correctly
 
+### Store-and-forward & pool startup invariants (QWP facade)
+Apply this whenever the diff touches the SF sender, the async drainer / send
+loop, primary reconnect/failover, `SenderPool` / `QueryClientPool` startup,
+`lazy_connect`, or `initial_connect_retry`. A violation here is a **Critical**
+finding: the whole point of store-and-forward is that a running producer never
+loses data and never hard-fails on a transient outage.
+
+**Drainer (steady state — once the pool is running).**
+- Once the pool is running, an async drainer thread ships buffered SF data to
+  the server. It MUST NOT propagate server / transport errors back to the
+  client (`Sender` producer calls, `flush()`, the pooled handle). The ONLY
+  error a running drainer may surface to the caller is **SF out of space** (the
+  on-disk / backing buffer is full and can accept no more rows). Flag any other
+  failure class (connect-refused, DNS, unreachable/black-hole, TLS/cert, auth,
+  role-reject, upgrade/protocol timeout, reset) that can escape the drainer
+  onto a producer or borrow call.
+- Primary reconnect MUST be fully contained inside the drainer thread and MUST
+  have **no time limit** — no `reconnect_max_duration_millis`-style budget, no
+  deadline, no "give up and latch terminal after N ms". A budget that latches
+  the sender terminal on a long outage is a Critical violation: it drops a
+  producer that store-and-forward promised to keep alive. Flag any bounded
+  reconnect loop, `deadlineNanos` / `while (now < deadline)`, or terminal
+  `SenderError` reachable from the running drainer's reconnect path.
+- The drainer must retry with **exponential backoff** and handle every connect
+  failure class gracefully, without a hard fail — it keeps buffering and keeps
+  retrying until the wire is back. The per-attempt backoff may be capped (a max
+  delay between attempts), but the RETRY LOOP ITSELF must be unbounded. Flag a
+  capped total retry duration or an attempt-count cap on the steady-state
+  drainer.
+- **Sanctioned terminals (orphan-slot drainer only).** The orphan drainer
+  (`BackgroundDrainer`) MAY quarantine its slot (`.failed` sentinel,
+  human-in-the-loop) on conditions that are terminal by design: auth failure,
+  a non-421 upgrade reject, and a genuine cluster-wide durable-ack capability
+  gap that exhausted its documented settle budget (16 consecutive
+  capability-gap sweeps, or a wall-clock budget anchored at the FIRST
+  capability-gap error of the episode — whichever is hit first). These are
+  NOT violations of the no-budget rule above. The settle budget applies ONLY
+  to consecutive capability-gap attempts: transient classes (role reject,
+  transport error) must never increment it or burn its wall clock — a
+  transient state consuming the terminal budget (shared attempt counter,
+  entry-anchored deadline) IS a Critical violation of this checklist.
+
+**Pool startup — two modes; the mode decides who sees connectivity errors.**
+- `lazy_connect=true`: `build()` MUST succeed with **no server present**. The
+  producing `Sender` must work immediately (writes buffer via SF), and once the
+  server comes up the read side must also connect and read (reads are deferred,
+  not disabled). Verify `build()` does not fail-fast, the sender does not throw
+  on the first write while the server is down, and a later `borrowQuery()`
+  succeeds once the server is up.
+- `lazy_connect=false` (default): `build()` / the initial connect MUST expose
+  connectivity problems to the caller — DNS errors, connect-refused /
+  unreachable, TLS/cert, authentication/authorization, and connect/upgrade
+  timeouts must all surface as a thrown exception at startup, not be swallowed.
+  Verify each of those failure classes reaches the user during initialization.
+- **In BOTH modes the boundary is the same:** connectivity errors are only
+  ever the caller's problem DURING initialization. Once the client has
+  connected and is past initialization, the running drainer reverts to the
+  steady-state contract above — it must NEVER expose transport problems, NEVER
+  impose a reconnect time budget, and NEVER hard-fail on a transient outage.
+  Anything that undermines the store-and-forward guarantee past init is
+  Critical.
+
 ### SQL conventions (if tests or SQL involved)
 - Keywords in UPPERCASE
 - `expr::TYPE` cast syntax preferred over CAST()
@@ -351,7 +442,10 @@ Review the diff for:
 Present ONLY verified findings (false positives are excluded). Structure as:
 
 ### Critical
-Issues that must be fixed before merge. Each must include:
+Issues that must be fixed before merge. **A newly committed compiled binary or
+other build artifact (see the "Committed build artifacts" checklist) is always
+Critical, no matter how legitimate it looks: native/C libraries are built from
+source in CI, so a build output in the diff is never acceptable.** Each must include:
 - Exact file path and line numbers (including out-of-diff files)
 - Whether the finding is **in-diff** or **out-of-diff**
 - Code path trace showing why the bug is real
diff --git a/ci/build_native.yaml b/ci/build_native.yaml
new file mode 100644
index 00000000..a831e58d
--- /dev/null
+++ b/ci/build_native.yaml
@@ -0,0 +1,92 @@
+# Builds the native libquestdb shared library on the test runner itself.
+#
+# The Linux (.so) and Windows (.dll) binaries are no longer committed to the
+# repository -- they are produced and committed only by the release
+# "Build and Push Release CXX Libraries" GitHub Action. So the test CI has to
+# compile them locally before running the tests.
+#
+# All three platforms are built on their own native runner: Linux (.so),
+# Windows (.dll) and macOS (.dylib). None of these binaries are committed.
+#
+# CMake writes the artifact to:
+#   core/target/classes/io/questdb/client/bin-local/libquestdb.<ext>
+# which io.questdb.client.std.Os loads first (the "dev CXX lib" path), so the
+# client tests pick it up directly. We additionally copy it into
+#   core/src/main/resources/io/questdb/client/bin/<platform>/libquestdb.<ext>
+# so that `mvn install` packages it into the client jar exactly like the
+# committed binary used to be -- this is what the downstream QuestDB OSS server
+# tests load from the installed jar.
+#
+# JAVA_HOME (set to GraalVM JDK 25 by setup.yaml) provides jni.h / jni_md.h:
+#   - Linux:   $JAVA_HOME/include + $JAVA_HOME/include/linux
+#   - macOS:   $JAVA_HOME/include + $JAVA_HOME/include/darwin
+#   - Windows: %JAVA_HOME%\include + %JAVA_HOME%\include\win32
+steps:
+  - bash: |
+      set -eux
+      git submodule update --init --recursive core/src/main/c/share/zstd
+    displayName: "Init zstd submodule"
+
+  - bash: |
+      set -eux
+      sudo apt-get update
+      sudo apt-get install -y cmake nasm build-essential
+      cd core
+      cmake -DCMAKE_BUILD_TYPE=Release -B cmake-build-release -S.
+      cmake --build cmake-build-release --config Release
+      lib="target/classes/io/questdb/client/bin-local/libquestdb.so"
+      test -f "$lib"
+      # Fail fast if the linker left an unresolved dependency in the .so.
+      if ldd "$lib" | grep -i "not found"; then
+        echo "libquestdb.so has unresolved dependencies"
+        exit 1
+      fi
+      mkdir -p src/main/resources/io/questdb/client/bin/linux-x86-64
+      cp "$lib" src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so
+    displayName: "Build native libquestdb.so (Linux x86-64)"
+    condition: eq(variables['Agent.OS'], 'Linux')
+
+  - bash: |
+      set -eux
+      command -v cmake >/dev/null 2>&1 || brew install cmake
+      command -v nasm  >/dev/null 2>&1 || brew install nasm
+      # darwin-aarch64 on Apple silicon agents, darwin-x86-64 on Intel agents.
+      case "$(uname -m)" in
+        arm64)  platform="darwin-aarch64" ;;
+        x86_64) platform="darwin-x86-64" ;;
+        *) echo "unsupported macOS arch: $(uname -m)"; exit 1 ;;
+      esac
+      cd core
+      # Pin the dylib's minimum macOS version so the artifact stays loadable on
+      # older macOS, matching the release build.
+      export MACOSX_DEPLOYMENT_TARGET=13.0
+      cmake -DCMAKE_BUILD_TYPE=Release -B cmake-build-release -S.
+      cmake --build cmake-build-release --config Release
+      lib="target/classes/io/questdb/client/bin-local/libquestdb.dylib"
+      test -f "$lib"
+      mkdir -p "src/main/resources/io/questdb/client/bin/${platform}"
+      cp "$lib" "src/main/resources/io/questdb/client/bin/${platform}/libquestdb.dylib"
+    displayName: "Build native libquestdb.dylib (macOS)"
+    condition: eq(variables['Agent.OS'], 'Darwin')
+
+  - powershell: |
+      $ErrorActionPreference = "Stop"
+      # The CMake build is GCC/MinGW based (gcc flags, -static-libgcc/-static-libstdc++),
+      # so build the Windows DLL with the MinGW-w64 toolchain + NASM, not MSVC.
+      choco install -y --no-progress nasm mingw
+      Import-Module "$env:ChocolateyInstall\helpers\chocolateyProfile.psm1"
+      refreshenv
+      # choco's nasm package does not put nasm on PATH; add it explicitly.
+      $env:PATH = "C:\Program Files\NASM;" + $env:PATH
+      gcc --version
+      mingw32-make --version
+      nasm --version
+      cd core
+      cmake -G "MinGW Makefiles" -DCMAKE_BUILD_TYPE=Release -B cmake-build-release -S .
+      cmake --build cmake-build-release --config Release
+      $lib = "target/classes/io/questdb/client/bin-local/libquestdb.dll"
+      if (!(Test-Path $lib)) { throw "native build produced no $lib" }
+      New-Item -ItemType Directory -Force -Path "src/main/resources/io/questdb/client/bin/windows-x86-64" | Out-Null
+      Copy-Item $lib "src/main/resources/io/questdb/client/bin/windows-x86-64/libquestdb.dll" -Force
+    displayName: "Build native libquestdb.dll (Windows x86-64)"
+    condition: eq(variables['Agent.OS'], 'Windows_NT')
diff --git a/ci/run_tests_pipeline.yaml b/ci/run_tests_pipeline.yaml
index 3268313b..86d65410 100644
--- a/ci/run_tests_pipeline.yaml
+++ b/ci/run_tests_pipeline.yaml
@@ -54,10 +54,6 @@ stages:
               imageName: "macos-15-arm64"
               poolName: "Azure Pipelines"
               jdkArch: "arm64"
-            mac-x64:
-              imageName: "macos-15"
-              poolName: "Azure Pipelines"
-              jdkArch: "x64"
             windows-msvc-2022-x64:
               imageName: "windows-2022"
               poolName: "Azure Pipelines"
@@ -82,6 +78,13 @@ stages:
                 maven | "$(Agent.OS)"
               path: $(HOME)/.m2/repository
             displayName: "Cache Maven repository"
+          # Compile the native libquestdb shared library on the runner; no
+          # platform's binary is committed anymore. Must run before the client
+          # jar is installed so the freshly built lib is packaged into it. The
+          # template builds the right artifact for the current native agent --
+          # Linux (.so), Windows (.dll), and macOS (.dylib) alike (see
+          # build_native.yaml).
+          - template: build_native.yaml
           - bash: |
               BRANCH="${SYSTEM_PULLREQUEST_SOURCEBRANCH:-$BUILD_SOURCEBRANCHNAME}"
               BRANCH="${BRANCH#refs/heads/}"
@@ -149,6 +152,9 @@ stages:
                 maven | "$(Agent.OS)"
               path: $(HOME)/.m2/repository
             displayName: "Cache Maven repository"
+          # Native binaries are no longer committed; compile libquestdb.so on the
+          # runner so the coverage test run can load it (same as BuildAndTest).
+          - template: build_native.yaml
           - task: Maven@3
             displayName: "Run tests with coverage"
             inputs:
diff --git a/core/CMakeLists.txt b/core/CMakeLists.txt
index 3538aa7f..29611089 100644
--- a/core/CMakeLists.txt
+++ b/core/CMakeLists.txt
@@ -48,6 +48,7 @@ set(
         src/main/c/share/files.h
         src/main/c/share/net.h
         src/main/c/share/os.h
+        src/main/c/share/glibc_compat.h
         src/main/c/share/ooo.cpp
         src/main/c/share/cpprt_overrides.h
         src/main/c/share/cpprt_overrides.cpp
diff --git a/core/src/main/c/share/glibc_compat.h b/core/src/main/c/share/glibc_compat.h
new file mode 100644
index 00000000..24ea6211
--- /dev/null
+++ b/core/src/main/c/share/glibc_compat.h
@@ -0,0 +1,53 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+#ifndef QUESTDB_GLIBC_COMPAT_H
+#define QUESTDB_GLIBC_COMPAT_H
+
+// Pin clock_gettime() to its original GLIBC_2.2.5 symbol version.
+//
+// glibc 2.17 moved clock_gettime() out of librt and into libc, exporting it
+// under a NEW version node: clock_gettime@GLIBC_2.17. The release binaries are
+// built in a modern toolchain container (CI uses manylinux_2_28 / glibc 2.28),
+// so without this pin the linker binds our calls to clock_gettime@GLIBC_2.17.
+// That single symbol raises the whole library's glibc floor to 2.17 and makes
+// it fail to LOAD on hosts running glibc 2.14-2.16 with:
+//
+//     version `GLIBC_2.17' not found (required by libquestdb.so)
+//
+// The original clock_gettime@GLIBC_2.2.5 symbol is still exported as a compat
+// symbol by librt.so.1 on every glibc since (and by libc after the 2.34 librt
+// merge), so forcing the reference back to it keeps the library loadable down
+// to the previous floor (glibc 2.14, set by memcpy@GLIBC_2.14) with no change
+// in runtime behaviour. librt is already a NEEDED dependency (CMake links rt).
+//
+// Scope: x86-64 glibc only. aarch64 glibc started at 2.17 and has only ever
+// shipped clock_gettime in libc@GLIBC_2.17 -- there is no GLIBC_2.2.5 version
+// there, so emitting the pin on aarch64 would fail the link with an undefined
+// clock_gettime@GLIBC_2.2.5. The directive is a no-op on macOS/Windows.
+#if defined(__linux__) && defined(__GLIBC__) && defined(__x86_64__)
+__asm__(".symver clock_gettime,clock_gettime@GLIBC_2.2.5");
+#endif
+
+#endif // QUESTDB_GLIBC_COMPAT_H
diff --git a/core/src/main/c/share/net.c b/core/src/main/c/share/net.c
index 05660f2b..3b0162fc 100644
--- a/core/src/main/c/share/net.c
+++ b/core/src/main/c/share/net.c
@@ -33,6 +33,9 @@
 #include <stdlib.h>
 #include <stdint.h>
 #include <string.h>
+#include <poll.h>
+#include <time.h>
+#include "glibc_compat.h"
 #include "net.h"
 #include <netdb.h>
 #include "sysutil.h"
@@ -298,6 +301,100 @@ JNIEXPORT jint JNICALL Java_io_questdb_client_network_Net_connectAddrInfo
     return handleEintrInConnect(fd, result);
 }
 
+// Waits up to timeout_millis for an in-progress non-blocking connect on fd to
+// finish. Returns 0 on success, -1 on connection failure (errno set so the
+// caller can read it via Os.errno()), or com_questdb_network_Net_ECONNTIMEOUT
+// on timeout.
+static jint awaitConnectComplete(int fd, jint timeout_millis) {
+    // Fix a single absolute deadline up front. Recomputing the remaining budget
+    // against a moving baseline on each EINTR (reset start = now, then subtract
+    // whole milliseconds) lets a high-frequency signal storm extend the timeout:
+    // under sub-millisecond interrupts every interval truncates to 0 ms, the
+    // budget never decrements, and poll is re-armed with the full budget each
+    // time. A fixed deadline is immune to interrupt frequency -- the remaining
+    // time can only ever decrease.
+    struct timespec deadline;
+    clock_gettime(CLOCK_MONOTONIC, &deadline);
+    long budget_millis = timeout_millis > 0 ? timeout_millis : 0;
+    deadline.tv_sec += budget_millis / 1000L;
+    deadline.tv_nsec += (budget_millis % 1000L) * 1000000L;
+    if (deadline.tv_nsec >= 1000000000L) {
+        deadline.tv_sec += 1;
+        deadline.tv_nsec -= 1000000000L;
+    }
+
+    for (;;) {
+        struct timespec now;
+        clock_gettime(CLOCK_MONOTONIC, &now);
+        // Remaining time until the deadline, truncated to whole milliseconds for
+        // poll(). Seconds and nanoseconds are combined before dividing, so the
+        // truncation only under-shoots (by < 1 ms) and never extends the wait.
+        long remaining_millis = (long) (((long long) (deadline.tv_sec - now.tv_sec) * 1000000000LL
+                                         + (deadline.tv_nsec - now.tv_nsec)) / 1000000LL);
+        if (remaining_millis <= 0) {
+            errno = ETIMEDOUT;
+            return com_questdb_network_Net_ECONNTIMEOUT;
+        }
+
+        struct pollfd pfd;
+        pfd.fd = fd;
+        pfd.events = POLLOUT;
+        pfd.revents = 0;
+
+        int rc = poll(&pfd, 1, (int) remaining_millis);
+        if (rc > 0) {
+            // The connect attempt has finished one way or another; the only
+            // authoritative result is SO_ERROR (POLLOUT alone does not mean
+            // success -- a refused connection is also reported as writable).
+            int so_error = 0;
+            socklen_t len = sizeof(so_error);
+            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0) {
+                return -1;
+            }
+            if (so_error != 0) {
+                errno = so_error;
+                return -1;
+            }
+            return 0;
+        }
+        if (rc == 0) {
+            errno = ETIMEDOUT;
+            return com_questdb_network_Net_ECONNTIMEOUT;
+        }
+        if (errno != EINTR) {
+            return -1;
+        }
+        // Interrupted by a signal: loop and recompute the remaining time against
+        // the fixed deadline. EINTR storms cannot extend the timeout.
+    }
+}
+
+JNIEXPORT jint JNICALL Java_io_questdb_client_network_Net_connectAddrInfoTimeout
+        (JNIEnv *e, jclass cl, jint fd, jlong lpAddrInfo, jint timeoutMillis) {
+    struct addrinfo *addr = (struct addrinfo *) lpAddrInfo;
+
+    // Switch to non-blocking BEFORE connect so connect() returns immediately
+    // with EINPROGRESS instead of blocking on the OS connect timeout. The
+    // socket is left non-blocking on success, matching the post-connect
+    // configureNonBlocking() the callers already perform.
+    int flags = fcntl((int) fd, F_GETFL, 0);
+    if (flags < 0) {
+        return -1;
+    }
+    if (fcntl((int) fd, F_SETFL, flags | O_NONBLOCK) < 0) {
+        return -1;
+    }
+
+    int result = connect((int) fd, addr->ai_addr, (int) addr->ai_addrlen);
+    if (result == 0) {
+        return 0; // connected immediately (e.g. loopback)
+    }
+    if (errno == EINPROGRESS || errno == EINTR || errno == EWOULDBLOCK) {
+        return awaitConnectComplete((int) fd, timeoutMillis);
+    }
+    return -1; // immediate failure, errno set
+}
+
 JNIEXPORT void JNICALL Java_io_questdb_client_network_Net_freeAddrInfo0
         (JNIEnv *e, jclass cl, jlong address) {
     if (address != 0) {
diff --git a/core/src/main/c/share/net.h b/core/src/main/c/share/net.h
index 13adafcb..27143639 100644
--- a/core/src/main/c/share/net.h
+++ b/core/src/main/c/share/net.h
@@ -13,6 +13,8 @@ extern "C" {
 #define com_questdb_network_Net_EPEERDISCONNECT -1L
 #undef com_questdb_network_Net_EOTHERDISCONNECT
 #define com_questdb_network_Net_EOTHERDISCONNECT -2L
+#undef com_questdb_network_Net_ECONNTIMEOUT
+#define com_questdb_network_Net_ECONNTIMEOUT -3L
 
 /*
  * Class:     io_questdb_client_network_Net
diff --git a/core/src/main/c/share/os.c b/core/src/main/c/share/os.c
index 7262e3f4..ee0b1f69 100644
--- a/core/src/main/c/share/os.c
+++ b/core/src/main/c/share/os.c
@@ -30,6 +30,7 @@
 #include <string.h>
 #include <sys/time.h>
 #include <time.h>
+#include "glibc_compat.h"
 #include "../share/os.h"
 
 #ifdef __APPLE__
diff --git a/core/src/main/c/windows/net.c b/core/src/main/c/windows/net.c
index c32957d4..fd290629 100644
--- a/core/src/main/c/windows/net.c
+++ b/core/src/main/c/windows/net.c
@@ -160,6 +160,66 @@ JNIEXPORT jint JNICALL Java_io_questdb_client_network_Net_connectAddrInfo
     return res;
 }
 
+JNIEXPORT jint JNICALL Java_io_questdb_client_network_Net_connectAddrInfoTimeout
+        (JNIEnv *e, jclass cl, jint fd, jlong lpAddrInfo, jint timeoutMillis) {
+    struct addrinfo *addr = (struct addrinfo *) lpAddrInfo;
+    SOCKET s = (SOCKET) fd;
+
+    // Switch to non-blocking BEFORE connect so it returns immediately with
+    // WSAEWOULDBLOCK instead of blocking on the OS connect timeout.
+    u_long mode = 1;
+    if (ioctlsocket(s, FIONBIO, &mode) != 0) {
+        SaveLastError();
+        return -1;
+    }
+
+    int res = connect(s, addr->ai_addr, (int) addr->ai_addrlen);
+    if (res == 0) {
+        return 0; // connected immediately (e.g. loopback)
+    }
+    if (WSAGetLastError() != WSAEWOULDBLOCK) {
+        SaveLastError();
+        return -1;
+    }
+
+    fd_set writefds, exceptfds;
+    FD_ZERO(&writefds);
+    FD_ZERO(&exceptfds);
+    FD_SET(s, &writefds);
+    FD_SET(s, &exceptfds);
+
+    struct timeval tv;
+    tv.tv_sec = timeoutMillis / 1000;
+    tv.tv_usec = (timeoutMillis % 1000) * 1000;
+
+    // Winsock signals a failed non-blocking connect via the exception set.
+    int sel = select(0, NULL, &writefds, &exceptfds, &tv);
+    if (sel == 0) {
+        WSASetLastError(WSAETIMEDOUT);
+        SaveLastError();
+        return com_questdb_network_Net_ECONNTIMEOUT;
+    }
+    if (sel == SOCKET_ERROR) {
+        SaveLastError();
+        return -1;
+    }
+
+    int so_error = 0;
+    int len = sizeof(so_error);
+    if (FD_ISSET(s, &exceptfds) || !FD_ISSET(s, &writefds)) {
+        getsockopt(s, SOL_SOCKET, SO_ERROR, (char *) &so_error, &len);
+        WSASetLastError(so_error != 0 ? so_error : WSAECONNREFUSED);
+        SaveLastError();
+        return -1;
+    }
+    if (getsockopt(s, SOL_SOCKET, SO_ERROR, (char *) &so_error, &len) == 0 && so_error != 0) {
+        WSASetLastError(so_error);
+        SaveLastError();
+        return -1;
+    }
+    return 0;
+}
+
 JNIEXPORT jint JNICALL Java_io_questdb_client_network_Net_configureNonBlocking
         (JNIEnv *e, jclass cl, jint fd) {
     u_long mode = 1;
diff --git a/core/src/main/java/io/questdb/client/Completion.java b/core/src/main/java/io/questdb/client/Completion.java
index 0888370d..615799e0 100644
--- a/core/src/main/java/io/questdb/client/Completion.java
+++ b/core/src/main/java/io/questdb/client/Completion.java
@@ -36,15 +36,22 @@
  * {@link #await(long, TimeUnit)} returning {@code true}, or an explicit
  * {@link #cancel()} that races to terminal).
  * <p>
- * Signaling: the Completion is signaled from the I/O thread of the pooled
- * query client when the handler's terminal callback ({@code onEnd},
- * {@code onError}, or {@code onExecDone}) returns.
+ * Signaling: the Completion is signaled on the worker (dispatch) thread of the
+ * pooled query client when the handler's terminal callback ({@code onEnd},
+ * {@code onError}, or {@code onExecDone}) returns -- that callback runs inline
+ * on the worker thread, not on the I/O thread. Because of this, {@code await()}
+ * must never be called from inside a handler (it would self-deadlock on the
+ * worker thread); use {@link #cancel()} to stop a query from inside a handler.
  */
 public interface Completion {
 
     /**
      * Blocks until the query completes. Rethrows any server-reported failure
      * as a {@link QueryException}. Returns normally on success.
+     * <p>
+     * Must NOT be called from a result handler (it runs on the worker thread
+     * and would self-deadlock); calling it there throws
+     * {@link IllegalStateException}. Use {@link #cancel()} instead.
      *
      * @throws QueryException       if the server reported an error or
      *                              {@link #cancel()} won the race
diff --git a/core/src/main/java/io/questdb/client/HttpClientConfiguration.java b/core/src/main/java/io/questdb/client/HttpClientConfiguration.java
index c644f698..587b8111 100644
--- a/core/src/main/java/io/questdb/client/HttpClientConfiguration.java
+++ b/core/src/main/java/io/questdb/client/HttpClientConfiguration.java
@@ -38,6 +38,15 @@ default boolean fixBrokenConnection() {
         return true;
     }
 
+    /**
+     * Upper bound, in milliseconds, on establishing the TCP connection. When
+     * {@code <= 0} (the default) no application-level connect timeout is applied
+     * and the connect falls back to the OS-level TCP connect timeout.
+     */
+    default int getConnectTimeout() {
+        return 0;
+    }
+
     default EpollFacade getEpollFacade() {
         return EpollFacadeImpl.INSTANCE;
     }
diff --git a/core/src/main/java/io/questdb/client/Query.java b/core/src/main/java/io/questdb/client/Query.java
index f6832e84..c2a752f7 100644
--- a/core/src/main/java/io/questdb/client/Query.java
+++ b/core/src/main/java/io/questdb/client/Query.java
@@ -27,19 +27,29 @@
 import io.questdb.client.cutlass.qwp.client.QwpBindSetter;
 import io.questdb.client.cutlass.qwp.client.QwpColumnBatchHandler;
 
+import java.io.Closeable;
+
 /**
- * Per-thread, reusable builder for one query. Obtained from
- * {@link QuestDB#query()}: every call on the same thread returns the same
- * instance, reset to empty.
+ * A query handle leased from the {@link QuestDB} pool via
+ * {@link QuestDB#borrowQuery()}. The handle holds one pooled query client (one
+ * WebSocket + I/O thread) for the lifetime of the borrow; the caller MUST
+ * {@link #close()} it to release the client back to the pool (typically via
+ * try-with-resources).
+ * <p>
+ * Allocation: the per-submit path is allocation-free -- the heavy query state
+ * is pre-allocated on the leased pool slot and reused, and {@link #submit()}
+ * returns this same handle as its {@link Completion}. {@code borrowQuery()}
+ * creates one small lease handle per borrow (often scalar-replaced by the JIT
+ * when used with try-with-resources).
  * <p>
  * Lifecycle: configure with {@link #sql}, optional {@link #binds}, and
- * {@link #handler}, then call {@link #submit()} to obtain a {@link Completion}.
- * After the Completion terminates, the next {@code QuestDB.query()} call on
- * the same thread returns this same instance with its state reset.
+ * {@link #handler}, then call {@link #submit()} to obtain a {@link Completion}
+ * and {@code await()} it before the next {@link #submit()}.
  * <p>
- * Thread safety: not thread-safe. One in-flight query per thread.
+ * Thread safety: not thread-safe and single-flight -- one in-flight query per
+ * handle. To run queries concurrently, borrow one handle per concurrent query.
  */
-public interface Query {
+public interface Query extends Closeable {
 
     /** Discards the current configuration without submitting. */
     void abandon();
@@ -53,9 +63,39 @@ public interface Query {
     Query binds(QwpBindSetter binds);
 
     /**
-     * Sets the result-batch handler. The handler is invoked on the pooled
-     * query client's I/O thread; if it touches caller state, it is
-     * responsible for its own synchronization.
+     * Releases the leased pooled query client back to the pool. The caller
+     * MUST call this (typically via try-with-resources). A real disconnect only
+     * happens at {@link QuestDB#close()}. Idempotent.
+     * <p>
+     * If a submit is still in flight (the caller never awaited, or its
+     * {@code await(timeout)} expired), {@code close()} cancels it and waits for
+     * the terminal event so the client is idle before it returns to the pool.
+     * That wait is bounded by {@code query_close_timeout_ms} (default 5000ms,
+     * see {@link QuestDBBuilder#queryCloseTimeoutMillis(long)}) and is
+     * interruptible -- interrupting the calling thread aborts it. If the query
+     * does not drain within the budget, the client is discarded rather than
+     * returned (its connection may carry late frames for the abandoned query),
+     * and the pool grows a fresh one on the next borrow. {@code close()}
+     * therefore never blocks the caller indefinitely, even when the server is
+     * slow to honor the cancel.
+     * <p>
+     * Must NOT be called from a result handler: handlers run on the worker
+     * thread, so {@code close()} would block waiting for a terminal event that
+     * only that thread can deliver. Calling it there throws
+     * {@link IllegalStateException}. Use {@link #cancel()} (non-blocking) to
+     * stop a query from inside a handler.
+     */
+    @Override
+    void close();
+
+    /**
+     * Sets the result-batch handler. The handler is invoked on the worker
+     * (dispatch) thread that drives {@code execute()}, which consumes the pooled
+     * query client's I/O-thread event queue inline; the handler does NOT run on
+     * the I/O thread. If it touches caller state, it is responsible for its own
+     * synchronization. A handler must not call the blocking {@link #close()} or
+     * {@link Completion#await()} (they would self-deadlock on the worker
+     * thread); use {@link #cancel()} to stop from inside a handler.
      */
     Query handler(QwpColumnBatchHandler handler);
 
@@ -65,11 +105,12 @@ public interface Query {
     Query sql(CharSequence sql);
 
     /**
-     * Submits the query for execution. Returns the {@link Completion} field
-     * cached on this instance; never allocates. Blocks up to the builder's
-     * configured acquire timeout if the query pool is exhausted.
+     * Submits the query for execution on the leased client. Returns this handle
+     * as its own {@link Completion}; never allocates. The handle is
+     * single-flight: {@code await()} the returned Completion before the next
+     * {@code submit()}.
      *
-     * @return the single-flight Completion bound to this Query instance
+     * @return the single-flight Completion bound to this Query handle
      */
     Completion submit();
 }
diff --git a/core/src/main/java/io/questdb/client/QuestDB.java b/core/src/main/java/io/questdb/client/QuestDB.java
index a608e12f..ee93afcf 100644
--- a/core/src/main/java/io/questdb/client/QuestDB.java
+++ b/core/src/main/java/io/questdb/client/QuestDB.java
@@ -24,8 +24,6 @@
 
 package io.questdb.client;
 
-import io.questdb.client.cutlass.qwp.client.QwpColumnBatchHandler;
-
 import java.io.Closeable;
 
 /**
@@ -34,37 +32,42 @@
  * share across threads.
  * <p>
  * Steady-state allocation is zero: pooled instances are pre-allocated and
- * reused, the per-thread {@link Query} handle is cached in a {@code ThreadLocal},
- * and the {@link Completion} associated with each query is a field on that
- * cached handle.
+ * reused, each borrowed {@link Query} handle is a pre-allocated front bound to
+ * its pool slot, and the {@link Completion} associated with each query is a
+ * field on that handle.
  * <p>
- * Configuration: use {@link #connect(CharSequence)} when the same address list
- * and credentials serve both ingest and egress -- the most common case.
- * Use {@link #connect(CharSequence, CharSequence)} or {@link #builder()} when
- * ingest and egress endpoints differ.
+ * Configuration: one {@code ws}/{@code wss} string describes the whole cluster
+ * (a single {@code addr} server list) and both the ingest and query pools
+ * connect across it. Use {@link #connect(CharSequence)} for the common case, or
+ * {@link #builder()} for pool sizing and the ingest callbacks. To tolerate the
+ * server being down at startup, set {@code lazy_connect=true} in the config
+ * (async ingest + lazy reads; reads stay enabled and connect once the server
+ * is up).
  * <p>
  * Thread safety: instances are safe to share. {@link #borrowSender()} and
- * {@link #query()} may be called concurrently from any thread; the pool
+ * {@link #borrowQuery()} may be called concurrently from any thread; the pool
  * guarantees mutual exclusion of pooled resources.
  */
 public interface QuestDB extends Closeable {
 
     /**
      * Builder for advanced configuration (pool sizes, acquisition timeouts,
-     * differing ingest/egress configs).
+     * ingest callbacks).
      */
     static QuestDBBuilder builder() {
         return new QuestDBBuilder();
     }
 
     /**
-     * Connects with a single configuration string used for both ingest and
-     * egress. The schema must be {@code ws} or {@code wss}: QuestDB ingests and
-     * queries over QWP (the QuestDB WebSocket protocol), so one string
-     * configures both clients.
+     * Connects with a single configuration string for the whole QuestDB cluster,
+     * used for both ingest and egress. The schema must be {@code ws} or
+     * {@code wss}: QuestDB ingests and queries over QWP (the QuestDB WebSocket
+     * protocol), so one string configures both clients. List every cluster node
+     * in a single {@code addr} server list and both pools connect across it.
      * <p>
-     * Use {@link #connect(CharSequence, CharSequence)} or {@link #builder()}
-     * when ingest and egress use different addresses or credentials.
+     * Use {@link #builder()} for pool sizing and the ingest callbacks. To
+     * tolerate the server being down at startup, set {@code lazy_connect=true}
+     * in the config (async ingest + lazy reads, reads still enabled).
      *
      * @param configurationString a {@code ws}/{@code wss} config string (see
      *                            {@link Sender#fromConfig} or
@@ -76,20 +79,29 @@ static QuestDB connect(CharSequence configurationString) {
     }
 
     /**
-     * Connects with explicit ingest and egress configuration strings.
+     * Borrows a {@link Query} handle from the pool. The caller MUST call
+     * {@link Query#close()} on the returned instance to release it back to the
+     * pool (typically via try-with-resources). The handle leases one pooled
+     * query client (one WebSocket + I/O thread) for the borrow's lifetime;
+     * submit one or more queries on it, then close it.
+     * <p>
+     * Allocation: zero at steady state -- the returned instance is a
+     * pre-allocated handle bound to the leased pool slot.
+     * <p>
+     * Blocking: blocks up to the builder's
+     * {@link QuestDBBuilder#acquireTimeoutMillis(long) acquire timeout} when
+     * the pool is exhausted; throws on timeout.
+     * <p>
+     * Concurrency: a single handle is single-flight. To run queries
+     * concurrently, borrow one handle per concurrent query (up to
+     * {@code query_pool_max}).
      *
-     * @param ingestConfigurationString config for the {@link Sender} pool
-     *                                  ({@link Sender#fromConfig} format)
-     * @param queryConfigurationString  config for the query pool
-     *                                  ({@link io.questdb.client.cutlass.qwp.client.QwpQueryClient#fromConfig} format)
-     * @return a connected QuestDB handle
+     * @return a Query handle leased from the pool; release with
+     * {@link Query#close()}
+     * @throws QueryException if the pool is exhausted beyond the acquire
+     *                        timeout, or if this {@code QuestDB} handle is closed
      */
-    static QuestDB connect(CharSequence ingestConfigurationString, CharSequence queryConfigurationString) {
-        return builder()
-                .ingestConfig(ingestConfigurationString)
-                .queryConfig(queryConfigurationString)
-                .build();
-    }
+    Query borrowQuery();
 
     /**
      * Borrows a {@link Sender} from the pool. The caller MUST call
@@ -125,61 +137,4 @@ static QuestDB connect(CharSequence ingestConfigurationString, CharSequence quer
      */
     @Override
     void close();
-
-    /**
-     * One-shot convenience for queries with no bind parameters. Equivalent to
-     * {@code query().sql(sql).handler(handler).submit()}. Returns the same
-     * thread-local {@link Completion} instance that {@link #query()} would,
-     * so this method is also zero-allocation at steady state.
-     *
-     * @param sql     the SQL text; the buffer is not retained after submit
-     * @param handler the result-batch handler; invoked on the pooled query
-     *                client's I/O thread
-     * @return a single-flight handle for the in-flight query
-     */
-    Completion executeSql(CharSequence sql, QwpColumnBatchHandler handler);
-
-    /**
-     * Allocates a fresh {@link Query} handle. Unlike {@link #query()}, this
-     * does NOT return the per-thread cached instance; every call allocates.
-     * <p>
-     * Use this when one thread needs to hold multiple in-flight queries
-     * concurrently (each {@code submit()} acquires its own worker from the
-     * query pool, so up to {@code queryPoolSize} concurrent queries on a
-     * single thread is fine). For the common case of one query at a time,
-     * prefer {@link #query()} -- it is allocation-free.
-     */
-    Query newQuery();
-
-    /**
-     * Opens a query builder for the calling thread. Returns the same
-     * thread-local instance on every call: callers do not need to cache it
-     * themselves. The returned {@code Query} is in a reset state and is not
-     * thread-safe -- one in-flight query per thread.
-     * <p>
-     * For multiple concurrent in-flight queries from a single thread, use
-     * {@link #newQuery()} instead.
-     */
-    Query query();
-
-    /**
-     * Releases the thread-affine {@link Sender} (if any) currently attached
-     * to the calling thread back to the pool. Call this on threads borrowed
-     * from pools you do not own (for example, Netty event loops) before they
-     * are recycled, to avoid pinning a {@link Sender} for the lifetime of
-     * a thread that no longer needs it.
-     */
-    void releaseSender();
-
-    /**
-     * Returns a {@link Sender} pinned to the calling thread. First call on
-     * a thread takes one from the pool and pins it; subsequent calls on the
-     * same thread return the same instance. The pin is released by
-     * {@link #releaseSender()} or by {@link #close()} on this handle.
-     * <p>
-     * Use this for long-lived, dedicated producer threads where borrow/return
-     * overhead would dominate. For short-lived or event-loop callers, prefer
-     * {@link #borrowSender()}.
-     */
-    Sender sender();
 }
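Reviewer note (illustrative, not part of the patch): the borrow/close contract documented for `borrowQuery()` above can be sketched with a self-contained stand-in. `BorrowSketch`, `Pool`, and `Handle` are hypothetical names, not the client's classes, and the sketch throws `IllegalStateException` where the documented API throws `QueryException`. It only assumes what the Javadoc states: a borrow blocks up to the acquire timeout, throws when the pool stays exhausted, and `close()` releases the lease.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class BorrowSketch {

    // Hypothetical stand-in for a pooled query handle bound to one pool slot.
    static final class Handle implements AutoCloseable {
        private final BlockingQueue<Handle> home;
        private final AtomicBoolean leased = new AtomicBoolean(false);

        Handle(BlockingQueue<Handle> home) {
            this.home = home;
        }

        @Override
        public void close() {
            // Return the handle only once per lease; a repeated close is a no-op.
            if (leased.compareAndSet(true, false)) {
                home.offer(this);
            }
        }
    }

    // Hypothetical fixed-size pool with pre-allocated handles.
    static final class Pool {
        private final BlockingQueue<Handle> idle;
        private final long acquireTimeoutMillis;

        Pool(int size, long acquireTimeoutMillis) {
            this.idle = new ArrayBlockingQueue<>(size);
            this.acquireTimeoutMillis = acquireTimeoutMillis;
            for (int i = 0; i < size; i++) {
                idle.add(new Handle(idle));
            }
        }

        // Blocks up to the acquire timeout, then throws if no handle is free.
        Handle borrow() {
            try {
                Handle h = idle.poll(acquireTimeoutMillis, TimeUnit.MILLISECONDS);
                if (h == null) {
                    throw new IllegalStateException("pool exhausted beyond acquire timeout");
                }
                h.leased.set(true);
                return h;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a handle", e);
            }
        }

        int idleCount() {
            return idle.size();
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool(2, 50);
        // One handle per concurrent query; try-with-resources releases both.
        try (Handle a = pool.borrow(); Handle b = pool.borrow()) {
            try {
                pool.borrow(); // pool of 2 is fully leased, so this times out
            } catch (IllegalStateException e) {
                System.out.println("third borrow failed: " + e.getMessage());
            }
        }
        System.out.println("idle handles after close: " + pool.idleCount());
    }
}
```

The point of the sketch is the caller-side shape: a leaked handle (no `close()`) permanently shrinks the pool, which is why the Javadoc recommends try-with-resources.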
diff --git a/core/src/main/java/io/questdb/client/QuestDBBuilder.java b/core/src/main/java/io/questdb/client/QuestDBBuilder.java
index cae00942..be18bfbe 100644
--- a/core/src/main/java/io/questdb/client/QuestDBBuilder.java
+++ b/core/src/main/java/io/questdb/client/QuestDBBuilder.java
@@ -25,6 +25,7 @@
 package io.questdb.client;
 
 import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener;
 import io.questdb.client.impl.ConfigString;
 import io.questdb.client.impl.ConfigView;
 import io.questdb.client.impl.QuestDBImpl;
@@ -35,14 +36,20 @@
 
 /**
  * Builder for {@link QuestDB}. Most callers use {@link QuestDB#connect(CharSequence)};
- * this builder is for pool sizing, idle/lifetime knobs, acquire timeout,
- * and the case where ingest and egress configs differ.
+ * this builder adds pool sizing, idle/lifetime knobs, the acquire timeout, and
+ * the ingest callbacks.
  * <p>
- * Both configs must use the {@code ws} or {@code wss} schema (QWP over
- * WebSocket). A pool key (e.g. {@code sender_pool_min}) may be carried in the
- * connect string or set with an explicit builder call; an explicit call always
- * wins. When both connect strings carry the same pool key with different values,
- * {@link #build()} fails.
+ * To tolerate the server being down at startup, set {@code lazy_connect=true}
+ * in the config: the ingest side connects asynchronously (writes buffer until
+ * the wire is up) and the read pool connects lazily on first use. Reads stay
+ * fully enabled -- they just connect once the server is available.
+ * <p>
+ * One configuration string describes the whole QuestDB cluster (see
+ * {@link #fromConfig}): list every node in a single {@code addr} server list and
+ * both the ingest and query pools connect across it. The schema must be
+ * {@code ws} or {@code wss} (QWP over WebSocket). A pool key (e.g.
+ * {@code sender_pool_min}) may be carried in the connect string or set with an
+ * explicit builder call; an explicit call always wins.
  */
 public final class QuestDBBuilder {
 
@@ -52,6 +59,7 @@ public final class QuestDBBuilder {
     static final long DEFAULT_MAX_LIFETIME_MILLIS = 30 * 60_000L;
     static final int DEFAULT_POOL_MAX = 4;
     static final int DEFAULT_POOL_MIN = 1;
+    static final long DEFAULT_QUERY_CLOSE_TIMEOUT_MILLIS = 5_000;
 
     // Every valid pool value is >= 0, so -1 unambiguously marks "not set
     // explicitly". The public pool setters are the only writers of these
@@ -59,11 +67,16 @@ public final class QuestDBBuilder {
     private static final int UNSET = -1;
 
     private long acquireTimeoutMillis = UNSET;
+    // Optional ingest-side async callbacks. Null -> each pooled Sender uses its
+    // loud-not-silent default. Applied to every Sender the pool builds.
+    private SenderConnectionListener connectionListener;
+    private BackgroundDrainerListener drainerListener;
+    private SenderErrorHandler errorHandler;
     private long housekeeperIntervalMillis = UNSET;
+    private String config;
     private long idleTimeoutMillis = UNSET;
-    private String ingestConfig;
     private long maxLifetimeMillis = UNSET;
-    private String queryConfig;
+    private long queryCloseTimeoutMillis = UNSET;
     private int queryPoolMax = UNSET;
     private int queryPoolMin = UNSET;
     private int senderPoolMax = UNSET;
@@ -85,6 +98,73 @@ public QuestDBBuilder acquireTimeoutMillis(long millis) {
         return this;
     }
 
+    /**
+     * Maximum time {@link Query#close()} waits for an in-flight query to drain
+     * (after issuing a cancel) before discarding the leased query client and
+     * letting the pool grow a fresh one. Bounds the close of a handle whose
+     * {@code submit()} is still running -- e.g. when the caller's own
+     * {@code await(timeout)} expired and they gave up. Defaults to 5000ms.
+     */
+    public QuestDBBuilder queryCloseTimeoutMillis(long millis) {
+        if (millis < 0) {
+            throw new IllegalArgumentException("queryCloseTimeoutMillis must be >= 0");
+        }
+        this.queryCloseTimeoutMillis = millis;
+        return this;
+    }
+
+    /**
+     * Sets the async connection-event listener applied to every pooled ingest
+     * {@link Sender}. The listener observes connect / disconnect / failover
+     * transitions across the whole sender pool; events are delivered on the
+     * senders' I/O threads, so the listener must be thread-safe and must not
+     * block. Pass {@code null} (the default) to keep each sender's
+     * loud-not-silent default listener.
+     *
+     * @param listener the shared connection listener, or {@code null} for the default
+     * @return this instance for method chaining
+     */
+    public QuestDBBuilder connectionListener(SenderConnectionListener listener) {
+        this.connectionListener = listener;
+        return this;
+    }
+
+    /**
+     * Sets the background orphan-slot drainer listener applied to every pooled
+     * ingest {@link Sender}. The listener observes the background drainer
+     * events of every sender the pool builds: durable-ack capability-gap
+     * retries, transient all-replica failover windows, and the eventual
+     * escalation to a {@code .failed} sentinel. Events are delivered on the
+     * drainers' own threads, so the listener must be thread-safe and must not
+     * block. Only meaningful when the configuration enables
+     * {@code drain_orphans}. Pass {@code null} (the default) to keep the
+     * drainers' default (no listener).
+     *
+     * @param listener the shared drainer listener, or {@code null} for the default
+     * @return this instance for method chaining
+     */
+    public QuestDBBuilder drainerListener(BackgroundDrainerListener listener) {
+        this.drainerListener = listener;
+        return this;
+    }
+
+    /**
+     * Sets the async error handler applied to every pooled ingest
+     * {@link Sender}. The handler receives terminal/async ingest errors
+     * (terminal upgrade failures, write errors) from across the whole
+     * sender pool; notifications are delivered on the senders' I/O
+     * threads, so the handler must be thread-safe and must not block.
+     * Pass {@code null} (the default) to keep each sender's
+     * loud-not-silent default handler.
+     *
+     * @param handler the shared error handler, or {@code null} for the default
+     * @return this instance for method chaining
+     */
+    public QuestDBBuilder errorHandler(SenderErrorHandler handler) {
+        this.errorHandler = handler;
+        return this;
+    }
+
     /**
      * Builds the {@link QuestDB} handle. Validates both connect strings up
      * front -- so a malformed config fails here even when both pools have
@@ -101,39 +181,45 @@ public QuestDBBuilder acquireTimeoutMillis(long millis) {
      * and is delivered once the server acks; until then it stays preserved.
      */
     public QuestDB build() {
-        if (ingestConfig == null) {
-            throw new IllegalStateException("ingest configuration is required; call fromConfig() or ingestConfig()");
+        if (config == null) {
+            throw new IllegalStateException("configuration is required; call fromConfig()");
         }
-        if (queryConfig == null) {
-            throw new IllegalStateException("query configuration is required; call fromConfig() or queryConfig()");
+        ConfigString cs = ConfigString.parse(config);
+        ConfigView view = new ConfigView(cs);
+        // Validate the single cluster config exactly as both pools will, but
+        // without connecting: the full Sender parse plus validateParameters
+        // (ingress value keys are registry-STRING, so only the real parse
+        // validates their values), then the typed egress validateConfig. Each
+        // side applies the keys it owns and silently ignores the rest, so one
+        // string drives both. A malformed config therefore fails here even when
+        // a pool min is 0 and nothing connects.
+        Sender.LineSenderBuilder.validateWsConfigString(config);
+        QwpQueryClient.validateConfig(view, "wss".equals(cs.schema()));
+
+        // lazy_connect: tolerate a down server at startup without disabling
+        // reads. The ingest side connects asynchronously (writes buffer until the
+        // wire is up) and the read pool defaults to min=0 -- it connects lazily
+        // on the first query once the server is up. Reads stay enabled.
+        boolean lazyConnect = view.getBool("lazy_connect", false);
+        String ingestConfig = config;
+        if (lazyConnect) {
+            ingestConfig = resolveLazyConnect(view);
         }
-        ConfigString ingestCs = ConfigString.parse(ingestConfig);
-        ConfigString queryCs = ConfigString.parse(queryConfig);
-        ConfigView ingestView = new ConfigView(ingestCs);
-        ConfigView queryView = new ConfigView(queryCs);
-        // Validate both connect strings exactly as the pools will, but without
-        // connecting. The ingest string runs the full Sender parse plus
-        // validateParameters -- ingress value keys are registry-STRING, so only
-        // the real parse validates their values. The egress string runs the
-        // typed validateConfig. A malformed config therefore fails here even
-        // when a pool min is 0 and nothing connects.
-        Sender.LineSenderBuilder.validateWsConfigString(ingestConfig);
-        QwpQueryClient.validateConfig(queryView, "wss".equals(queryCs.schema()));
-
-        // A view carries no side; getInt/getLong read any key, so the ingest
-        // and query views also serve the POOL reads.
-        resolvePoolInt(senderPoolMin, "sender_pool_min", ingestView, queryView, DEFAULT_POOL_MIN, this::senderPoolMin);
-        resolvePoolInt(senderPoolMax, "sender_pool_max", ingestView, queryView, DEFAULT_POOL_MAX, this::senderPoolMax);
-        resolvePoolInt(queryPoolMin, "query_pool_min", ingestView, queryView, DEFAULT_POOL_MIN, this::queryPoolMin);
-        resolvePoolInt(queryPoolMax, "query_pool_max", ingestView, queryView, DEFAULT_POOL_MAX, this::queryPoolMax);
-        resolvePoolLong(acquireTimeoutMillis, "acquire_timeout_ms", ingestView, queryView, DEFAULT_ACQUIRE_TIMEOUT_MILLIS, this::acquireTimeoutMillis);
-        resolvePoolLong(idleTimeoutMillis, "idle_timeout_ms", ingestView, queryView, DEFAULT_IDLE_TIMEOUT_MILLIS, this::idleTimeoutMillis);
-        resolvePoolLong(maxLifetimeMillis, "max_lifetime_ms", ingestView, queryView, DEFAULT_MAX_LIFETIME_MILLIS, this::maxLifetimeMillis);
-        resolvePoolLong(housekeeperIntervalMillis, "housekeeper_interval_ms", ingestView, queryView, DEFAULT_HOUSEKEEPER_INTERVAL_MILLIS, this::housekeeperIntervalMillis);
+
+        resolvePoolInt(senderPoolMin, "sender_pool_min", view, DEFAULT_POOL_MIN, this::senderPoolMin);
+        resolvePoolInt(senderPoolMax, "sender_pool_max", view, DEFAULT_POOL_MAX, this::senderPoolMax);
+        // lazy_connect makes the read pool lazy (min=0); without it the default min is 1.
+        resolvePoolInt(queryPoolMin, "query_pool_min", view, lazyConnect ? 0 : DEFAULT_POOL_MIN, this::queryPoolMin);
+        resolvePoolInt(queryPoolMax, "query_pool_max", view, DEFAULT_POOL_MAX, this::queryPoolMax);
+        resolvePoolLong(acquireTimeoutMillis, "acquire_timeout_ms", view, DEFAULT_ACQUIRE_TIMEOUT_MILLIS, this::acquireTimeoutMillis);
+        resolvePoolLong(queryCloseTimeoutMillis, "query_close_timeout_ms", view, DEFAULT_QUERY_CLOSE_TIMEOUT_MILLIS, this::queryCloseTimeoutMillis);
+        resolvePoolLong(idleTimeoutMillis, "idle_timeout_ms", view, DEFAULT_IDLE_TIMEOUT_MILLIS, this::idleTimeoutMillis);
+        resolvePoolLong(maxLifetimeMillis, "max_lifetime_ms", view, DEFAULT_MAX_LIFETIME_MILLIS, this::maxLifetimeMillis);
+        resolvePoolLong(housekeeperIntervalMillis, "housekeeper_interval_ms", view, DEFAULT_HOUSEKEEPER_INTERVAL_MILLIS, this::housekeeperIntervalMillis);
 
         return new QuestDBImpl(
                 ingestConfig,
-                queryConfig,
+                config,
                 senderPoolMin,
                 senderPoolMax,
                 queryPoolMin,
@@ -141,19 +227,63 @@ public QuestDB build() {
                 acquireTimeoutMillis,
                 idleTimeoutMillis,
                 maxLifetimeMillis,
-                housekeeperIntervalMillis
+                housekeeperIntervalMillis,
+                queryCloseTimeoutMillis,
+                errorHandler,
+                connectionListener,
+                drainerListener
         );
     }
 
+    // Validates the lazy_connect contract and returns the ingest config to use:
+    // the original string with a non-blocking async initial connect injected
+    // when the user did not set one. lazy_connect requires BOTH sides to start
+    // non-blocking, so an explicit knob that forces a blocking / fail-fast
+    // startup is a configuration conflict and is rejected with a clear remedy.
+    private String resolveLazyConnect(ConfigView view) {
+        // (1) ingest side: only initial_connect_retry=async is non-blocking;
+        // off/false/on/true/sync all block or fail-fast at startup.
+        String mode = view.getStr("initial_connect_retry");
+        if (mode != null && !"async".equalsIgnoreCase(mode)) {
+            throw new IllegalArgumentException(
+                    "conflicting configuration: lazy_connect=true needs a non-blocking startup, but "
+                            + "initial_connect_retry=" + mode + " makes the initial connect block / fail-fast. "
+                            + "Resolve by removing initial_connect_retry (lazy_connect implies "
+                            + "initial_connect_retry=async) or setting initial_connect_retry=async.");
+        }
+        // (2) read side: lazy_connect requires query_pool_min=0 so the read pool
+        // does not eagerly fail-fast at startup. An explicit query_pool_min > 0
+        // (builder call or connect string) contradicts that.
+        int explicitQueryMin;
+        if (queryPoolMin != UNSET) {
+            explicitQueryMin = queryPoolMin; // explicit builder call
+        } else if (view.has("query_pool_min")) {
+            explicitQueryMin = view.getInt("query_pool_min", UNSET); // connect string
+        } else {
+            explicitQueryMin = 0; // unset -> lazy default of 0
+        }
+        if (explicitQueryMin > 0) {
+            throw new IllegalArgumentException(
+                    "conflicting configuration: lazy_connect=true needs query_pool_min=0 (the read pool "
+                            + "connects lazily on first use and must not fail-fast at startup), but query_pool_min="
+                            + explicitQueryMin + " was set. Resolve by removing query_pool_min (lazy_connect "
+                            + "defaults it to 0) or setting query_pool_min=0.");
+        }
+        // No explicit initial_connect_retry -> inject async so the ingest build
+        // is non-blocking. An explicit async needs no injection.
+        return mode == null ? withDefaultAsyncConnect(config) : config;
+    }
+
     /**
-     * Sets a single configuration string used for both ingest and egress. The
-     * schema must be {@code ws} or {@code wss}.
+     * Sets the single configuration string for the whole QuestDB cluster --
+     * used for both ingest and egress. List every cluster node in one
+     * {@code addr} (comma-separated, or by repeating the key); the ingest and
+     * query pools each connect across that one server list. The schema must be
+     * {@code ws} or {@code wss}.
      */
     public QuestDBBuilder fromConfig(CharSequence configurationString) {
-        requireWebSocketSchema(configurationString, "connection");
-        String s = configurationString.toString();
-        this.ingestConfig = s;
-        this.queryConfig = s;
+        requireWebSocketSchema(configurationString, "cluster");
+        this.config = configurationString.toString();
         return this;
     }
 
@@ -183,16 +313,6 @@ public QuestDBBuilder idleTimeoutMillis(long millis) {
         return this;
     }
 
-    /**
-     * Sets the ingest-side configuration. The schema must be {@code ws} or
-     * {@code wss}.
-     */
-    public QuestDBBuilder ingestConfig(CharSequence configurationString) {
-        requireWebSocketSchema(configurationString, "ingest");
-        this.ingestConfig = configurationString.toString();
-        return this;
-    }
-
     /**
      * Maximum age of a pooled connection before the housekeeper recycles it
      * (next time it is idle). Useful for picking up DNS / load-balancer
@@ -206,16 +326,6 @@ public QuestDBBuilder maxLifetimeMillis(long millis) {
         return this;
     }
 
-    /**
-     * Sets the query-side configuration. The schema must be {@code ws} or
-     * {@code wss}.
-     */
-    public QuestDBBuilder queryConfig(CharSequence configurationString) {
-        requireWebSocketSchema(configurationString, "query");
-        this.queryConfig = configurationString.toString();
-        return this;
-    }
-
     /**
      * Maximum query-pool size. Defaults to 4.
      */
@@ -303,12 +413,24 @@ public java.util.Map<String, Object> poolConfigSnapshotForTest() {
         m.put("query_pool_min", queryPoolMin);
         m.put("query_pool_max", queryPoolMax);
         m.put("acquire_timeout_ms", acquireTimeoutMillis);
+        m.put("query_close_timeout_ms", queryCloseTimeoutMillis);
         m.put("idle_timeout_ms", idleTimeoutMillis);
         m.put("max_lifetime_ms", maxLifetimeMillis);
         m.put("housekeeper_interval_ms", housekeeperIntervalMillis);
         return m;
     }
 
+    // Inject a non-blocking async initial connect right after the schema
+    // separator so lazy_connect's build never blocks or fails fast on a down
+    // server. Only used when the user set no initial_connect_retry of their own
+    // (resolveLazyConnect rejects an explicit blocking mode rather than silently
+    // overriding it), so placement is immaterial -- there is no competing value.
+    private static String withDefaultAsyncConnect(String config) {
+        int sep = config.indexOf("::");
+        // sep >= 0: fromConfig() validated a ws/wss schema, so "::" is present.
+        return config.substring(0, sep + 2) + "initial_connect_retry=async;" + config.substring(sep + 2);
+    }
+
     private static void requireWebSocketSchema(CharSequence config, String role) {
         String schema = ConfigString.parse(config).schema();
         if (!"ws".equals(schema) && !"wss".equals(schema)) {
@@ -317,53 +439,17 @@ private static void requireWebSocketSchema(CharSequence config, String role) {
         }
     }
 
-    private void resolvePoolInt(int current, String key, ConfigView ingest, ConfigView query, int dflt, IntConsumer setter) {
+    private void resolvePoolInt(int current, String key, ConfigView view, int dflt, IntConsumer setter) {
         if (current != UNSET) {
-            return; // explicit builder call wins; skip the conflict check
-        }
-        boolean inIngest = ingest.has(key);
-        boolean inQuery = query.has(key);
-        int value;
-        if (inIngest && inQuery) {
-            int vi = ingest.getInt(key, UNSET);
-            int vq = query.getInt(key, UNSET);
-            if (vi != vq) {
-                throw new IllegalArgumentException(
-                        "conflicting pool config: " + key + " (ingest=" + vi + ", query=" + vq + ")");
-            }
-            value = vi;
-        } else if (inIngest) {
-            value = ingest.getInt(key, UNSET);
-        } else if (inQuery) {
-            value = query.getInt(key, UNSET);
-        } else {
-            value = dflt;
+            return; // explicit builder call wins
         }
-        setter.accept(value);
+        setter.accept(view.has(key) ? view.getInt(key, UNSET) : dflt);
     }
 
-    private void resolvePoolLong(long current, String key, ConfigView ingest, ConfigView query, long dflt, LongConsumer setter) {
+    private void resolvePoolLong(long current, String key, ConfigView view, long dflt, LongConsumer setter) {
         if (current != UNSET) {
-            return; // explicit builder call wins; skip the conflict check
-        }
-        boolean inIngest = ingest.has(key);
-        boolean inQuery = query.has(key);
-        long value;
-        if (inIngest && inQuery) {
-            long vi = ingest.getLong(key, UNSET);
-            long vq = query.getLong(key, UNSET);
-            if (vi != vq) {
-                throw new IllegalArgumentException(
-                        "conflicting pool config: " + key + " (ingest=" + vi + ", query=" + vq + ")");
-            }
-            value = vi;
-        } else if (inIngest) {
-            value = ingest.getLong(key, UNSET);
-        } else if (inQuery) {
-            value = query.getLong(key, UNSET);
-        } else {
-            value = dflt;
+            return; // explicit builder call wins
         }
-        setter.accept(value);
+        setter.accept(view.has(key) ? view.getLong(key, UNSET) : dflt);
     }
 }
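Reviewer note (illustrative, not part of the patch): a minimal sketch of two behaviours in the `QuestDBBuilder` changes above, assuming the `schema::key=value;` config layout that the patch itself relies on. `withDefaultAsyncConnect` mirrors the private helper in the diff (with an added guard for a missing separator). `parseKeys` is a hypothetical, naive stand-in for `ConfigString`/`ConfigView`, and the `resolvePoolInt` here returns the value instead of calling a setter. The host names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LazyConnectSketch {

    static final int UNSET = -1;

    // Mirrors the diff's helper: inject the async mode right after "::".
    public static String withDefaultAsyncConnect(String config) {
        int sep = config.indexOf("::");
        if (sep < 0) {
            throw new IllegalArgumentException("missing '::' schema separator");
        }
        return config.substring(0, sep + 2)
                + "initial_connect_retry=async;"
                + config.substring(sep + 2);
    }

    // Hypothetical parser: splits the part after "::" into key=value pairs.
    public static Map<String, String> parseKeys(String config) {
        Map<String, String> keys = new LinkedHashMap<>();
        int sep = config.indexOf("::");
        String body = sep < 0 ? "" : config.substring(sep + 2);
        for (String pair : body.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                keys.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return keys;
    }

    // Precedence from the diff: explicit builder call, then config key, then default.
    public static int resolvePoolInt(int explicit, String key, Map<String, String> keys, int dflt) {
        if (explicit != UNSET) {
            return explicit;
        }
        String value = keys.get(key);
        return value != null ? Integer.parseInt(value) : dflt;
    }

    public static void main(String[] args) {
        String config = "ws::addr=db1:9000,db2:9000;lazy_connect=true;query_pool_max=8;";
        Map<String, String> keys = parseKeys(config);
        boolean lazy = Boolean.parseBoolean(keys.getOrDefault("lazy_connect", "false"));

        // No explicit initial_connect_retry, so the ingest config gets async injected.
        String ingestConfig = keys.containsKey("initial_connect_retry")
                ? config
                : withDefaultAsyncConnect(config);
        System.out.println(ingestConfig);

        // lazy_connect lowers the default query pool minimum from 1 to 0.
        System.out.println(resolvePoolInt(UNSET, "query_pool_min", keys, lazy ? 0 : 1));
        System.out.println(resolvePoolInt(UNSET, "query_pool_max", keys, 4));
    }
}
```

The sketch omits the conflict checks in `resolveLazyConnect`; in the patch, an explicit non-async `initial_connect_retry` or a `query_pool_min` above 0 is rejected rather than overridden.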
diff --git a/core/src/main/java/io/questdb/client/Sender.java b/core/src/main/java/io/questdb/client/Sender.java
index 604f45d5..4a1419a7 100644
--- a/core/src/main/java/io/questdb/client/Sender.java
+++ b/core/src/main/java/io/questdb/client/Sender.java
@@ -791,11 +791,12 @@ default Sender uuidColumn(CharSequence name, long lo, long hi) {
      *       unconnected sender; the I/O thread runs the same retry loop in
      *       the background. The user thread can call {@code at()} /
      *       {@code flush()} immediately; rows accumulate in the cursor SF
-     *       engine until the wire is up. A connect-budget exhaustion or a
-     *       terminal upgrade failure is delivered to the async error inbox
-     *       as a {@link io.questdb.client.SenderError} (no synchronous
-     *       throw on the user call site). Wire {@code error_handler=...}
-     *       to observe these.</li>
+     *       engine until the wire is up. Connect failures are retried
+     *       indefinitely in the background; a terminal upgrade failure
+     *       (auth reject, capability mismatch) is delivered to the async
+     *       error inbox as a {@link io.questdb.client.SenderError} (no
+     *       synchronous throw on the user call site). Wire
+     *       {@code error_handler=...} to observe these.</li>
      * </ul>
      * <p>
      * Default resolution when the caller does not pick a value:
@@ -1011,6 +1012,9 @@ final class LineSenderBuilder {
         private int autoFlushRows = PARAMETER_NOT_SET_EXPLICITLY;
         private int bufferCapacity = PARAMETER_NOT_SET_EXPLICITLY;
         private long closeFlushTimeoutMillis = CLOSE_FLUSH_TIMEOUT_NOT_SET;
+        // Upper bound (ms) on the TCP connect. PARAMETER_NOT_SET_EXPLICITLY ->
+        // 0 (no application-level connect timeout; OS connect timeout applies).
+        private int connectTimeoutMillis = PARAMETER_NOT_SET_EXPLICITLY;
         // Optional user-supplied async connection-event listener. When null,
         // the sender uses DefaultSenderConnectionListener.INSTANCE
         // (loud-not-silent log of every transition).
@@ -1018,6 +1022,11 @@ final class LineSenderBuilder {
         // Bounded inbox capacity for the async connection-event dispatcher.
         // PARAMETER_NOT_SET_EXPLICITLY → spec default (64).
         private int connectionListenerInboxCapacity = PARAMETER_NOT_SET_EXPLICITLY;
+        // Optional user-supplied observer for background orphan-slot drainer
+        // events (durable-ack capability-gap retries, all-replica failover
+        // windows, persistent-failure escalation). When null, drainers run
+        // without a listener. Only meaningful with drainOrphans=true.
+        private io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener drainerListener;
         // Orphan adoption: when true, the foreground sender scans
         // <sf_dir>/*/ at startup for sibling slots that hold unacked data
         // and reports them. Default false. Spec calls for spawning
@@ -1078,6 +1087,11 @@ public String getSettingsPath() {
             public int getTimeout() {
                 return httpTimeout == PARAMETER_NOT_SET_EXPLICITLY ? DEFAULT_HTTP_TIMEOUT : httpTimeout;
             }
+
+            @Override
+            public int getConnectTimeout() {
+                return connectTimeoutMillis == PARAMETER_NOT_SET_EXPLICITLY ? 0 : connectTimeoutMillis;
+            }
         };
         private long minRequestThroughput = PARAMETER_NOT_SET_EXPLICITLY;
         private int multicastTtl = PARAMETER_NOT_SET_EXPLICITLY;
@@ -1199,6 +1213,28 @@ public AdvancedTlsSettings advancedTls() {
             return new AdvancedTlsSettings();
         }
 
+        /**
+         * Upper bound, in milliseconds, on establishing the TCP connection to a
+         * QuestDB endpoint. When set, a connect that does not complete within
+         * this budget is aborted (instead of riding the much longer OS-level
+         * connect timeout). Applies to both HTTP and WebSocket transports. Default
+         * is unset (0), which falls back to the OS connect timeout.
+         *
+         * @param millis connect timeout in milliseconds; must be &gt; 0
+         * @return this instance for method chaining
+         */
+        public LineSenderBuilder connectTimeoutMillis(int millis) {
+            if (this.connectTimeoutMillis != PARAMETER_NOT_SET_EXPLICITLY) {
+                throw new LineSenderException("connect timeout was already configured ")
+                        .put("[connect_timeout=").put(this.connectTimeoutMillis).put("]");
+            }
+            if (millis <= 0) {
+                throw new LineSenderException("connect_timeout must be > 0: ").put(millis);
+            }
+            this.connectTimeoutMillis = millis;
+            return this;
+        }
+
         /**
          * Per-endpoint timeout on the WebSocket upgrade response read. Default
          * {@value QwpWebSocketSender#DEFAULT_AUTH_TIMEOUT_MS} ms.
@@ -1531,6 +1567,7 @@ public Sender build() {
                             actualErrorInboxCapacity,
                             actualDurableAckKeepaliveIntervalMillis,
                             authTimeoutMillis,
+                            connectTimeoutMillis == PARAMETER_NOT_SET_EXPLICITLY ? 0 : connectTimeoutMillis,
                             connectionListener,
                             actualConnectionListenerInboxCapacity
                     );
@@ -1553,6 +1590,12 @@ public Sender build() {
                 // WebSocketClient inside the abandoned `connected`.
                 connected.setTransactional(transactional);
                 try {
+                    // Install the drainer listener BEFORE startOrphanDrainers
+                    // below: drainers must see the listener at submit time so
+                    // no early drainer event is lost to a late installation.
+                    if (drainerListener != null) {
+                        connected.setDrainerListener(drainerListener);
+                    }
                     // Once the foreground sender is up, dispatch drainers
                     // for any sibling orphan slots. Scan AFTER we acquire
                     // our own slot lock so we never accidentally try to
@@ -1755,6 +1798,31 @@ public LineSenderBuilder disableAutoFlush() {
             return this;
         }
 
+        /**
+         * Sets the async listener observing background orphan-slot drainer
+         * events: per-attempt durable-ack capability-gap retries
+         * ({@code onDurableAckUnavailable}), transient all-replica failover
+         * windows ({@code onPrimaryUnavailable}), and the eventual escalation
+         * to a {@code .failed} sentinel
+         * ({@code onDurableAckPersistentFailure}). The listener runs on the
+         * drainers' own threads, so it must be thread-safe and must not block
+         * — hand off to a queue or metrics sink and return. Only meaningful
+         * when {@link #drainOrphans(boolean)} is enabled.
+         *
+         * <p>WebSocket transport only; setting on other transports throws.
+         *
+         * @param listener the listener; {@code null} keeps the default (no listener)
+         * @return this instance for method chaining
+         */
+        public LineSenderBuilder drainerListener(
+                io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener listener) {
+            if (protocol != PARAMETER_NOT_SET_EXPLICITLY && protocol != PROTOCOL_WEBSOCKET) {
+                throw new LineSenderException("drainer_listener is only supported for WebSocket transport");
+            }
+            this.drainerListener = listener;
+            return this;
+        }
+
         /**
          * Opt in to adopting sibling slots under {@code <sf_dir>/*} at
          * startup that hold unacked data left behind by a crashed sender or
@@ -1772,6 +1840,16 @@ public LineSenderBuilder disableAutoFlush() {
          * Slots flagged with the {@code .failed} sentinel are skipped
          * (manual reset required), and the foreground sender's own slot is
          * never adopted.
+         * <p>
+         * Close-latency note: {@code close()} stops adopted drainers. A
+         * drainer still connecting (e.g. during an outage) is stop-signaled
+         * immediately and exits within ~50ms; a drainer actively replaying
+         * frames is given a ~2.5s grace window to finish, plus a 0.5s stop
+         * window — so {@code close()} may take up to ~3s while orphan
+         * drainers are in flight (and a drainer parked in a blocking native
+         * connect is abandoned to exit on its own daemon thread).
+         * Un-drained slots stay on disk and are re-adopted by the next
+         * sender that enables {@code drain_orphans}.
          */
         public LineSenderBuilder drainOrphans(boolean enabled) {
             if (protocol != PARAMETER_NOT_SET_EXPLICITLY && protocol != PROTOCOL_WEBSOCKET) {
@@ -2357,15 +2435,16 @@ public LineSenderBuilder reconnectMaxBackoffMillis(long millis) {
         }
 
         /**
-         * Per-outage cap on the cursor I/O loop's reconnect retry budget.
-         * Once a wire failure occurs, the loop retries with exponential
-         * backoff until either reconnect succeeds (timer resets) or this
-         * many millis elapse since the first failure of this outage —
-         * whichever comes first. On budget exhaustion, the next user
-         * thread API call throws.
+         * Cap on the blocking initial-connect retry budget when
+         * {@code initial_connect_retry=sync}. {@code fromConfig} retries
+         * with exponential backoff until connect succeeds or this many
+         * millis elapse, then throws. The background reconnect loop
+         * (mid-stream outages and async initial connect) does NOT consult
+         * this value: it retries indefinitely and halts only on a terminal
+         * auth/upgrade error or {@code close()}.
          * <p>
-         * Default {@code 300_000} (5 minutes). Lower for fail-fast services;
-         * higher for tolerating long maintenance windows. WebSocket only.
+         * Default {@code 300_000} (5 minutes). Lower for fail-fast startup;
+         * higher for tolerating a slow server boot. WebSocket only.
          */
         public LineSenderBuilder reconnectMaxDurationMillis(long millis) {
             if (protocol != PARAMETER_NOT_SET_EXPLICITLY && protocol != PROTOCOL_WEBSOCKET) {
@@ -3166,6 +3245,9 @@ private LineSenderBuilder fromConfig(CharSequence configurationString) {
                     pos = getValue(configurationString, pos, sink, "request_timeout");
                     int requestTimeout = parseIntValue(sink, "request_timeout");
                     httpTimeoutMillis(requestTimeout);
+                } else if (Chars.equals("connect_timeout", sink)) {
+                    pos = getValue(configurationString, pos, sink, "connect_timeout");
+                    connectTimeoutMillis(parseIntValue(sink, "connect_timeout"));
                 } else if (Chars.equals("request_min_throughput", sink)) {
                     pos = getValue(configurationString, pos, sink, "request_min_throughput");
                     int requestMinThroughput = parseIntValue(sink, "request_min_throughput");
@@ -3446,6 +3528,9 @@ private LineSenderBuilder fromConfigWebSocket(CharSequence configurationString)
                 if (view.has("auth_timeout_ms")) {
                     authTimeoutMillis(view.getLong("auth_timeout_ms", 0));
                 }
+                if (view.has("connect_timeout")) {
+                    connectTimeoutMillis((int) view.getLong("connect_timeout", 0));
+                }
 
                 s = view.getStr("auto_flush_rows");
                 if (s != null) {
@@ -3701,6 +3786,7 @@ public java.util.Map<String, Object> wsConfigSnapshotForTest() {
             m.put("connection_listener_inbox_capacity", connectionListenerInboxCapacity);
             m.put("token", httpToken);
             m.put("auth_timeout_ms", authTimeoutMillis);
+            m.put("connect_timeout", connectTimeoutMillis == PARAMETER_NOT_SET_EXPLICITLY ? 0 : connectTimeoutMillis);
             m.put("username", username);
             m.put("password", password);
             m.put("tls_verify", tlsValidationMode == null ? null : tlsValidationMode.name());
diff --git a/core/src/main/java/io/questdb/client/SenderConnectionEvent.java b/core/src/main/java/io/questdb/client/SenderConnectionEvent.java
index 7d0c2c61..fd450ba9 100644
--- a/core/src/main/java/io/questdb/client/SenderConnectionEvent.java
+++ b/core/src/main/java/io/questdb/client/SenderConnectionEvent.java
@@ -96,8 +96,8 @@ public long getAttemptNumber() {
     /**
      * The classified cause of the event, or {@code null} for success/info
      * events ({@link Kind#CONNECTED}, {@link Kind#FAILED_OVER},
-     * {@link Kind#RECONNECTED}). For terminal kinds
-     * ({@link Kind#AUTH_FAILED}, {@link Kind#RECONNECT_BUDGET_EXHAUSTED}) this
+     * {@link Kind#RECONNECTED}). For the terminal kind
+     * {@link Kind#AUTH_FAILED}, this
      * carries the typed exception that caused the sender to halt.
      */
     @Nullable
@@ -223,8 +223,10 @@ public enum Kind {
         /**
          * Every endpoint in the configured address list was attempted and none
          * accepted the connection in this sweep. The client will back off and
-         * retry the sweep until the reconnect budget is exhausted. Fired once
-         * per failed sweep.
+         * retry the sweep — bounded by {@code reconnect_max_duration_millis}
+         * during a blocking (sync) initial connect, indefinitely otherwise
+         * (Invariant B: the background loop never gives up on a wall-clock
+         * budget). Fired once per failed sweep.
          */
         ALL_ENDPOINTS_UNREACHABLE,
 
@@ -234,14 +236,6 @@ public enum Kind {
          * producer-thread API call surfaces a {@code LineSenderException}.
          * {@link #getCause()} carries the {@code QwpAuthFailedException}.
          */
-        AUTH_FAILED,
-
-        /**
-         * Terminal: the configured reconnect time budget was exhausted without
-         * a successful reconnect. The sender will halt; the next producer-thread
-         * API call surfaces a {@code LineSenderException}. {@link #getCause()}
-         * carries the last observed reconnect error.
-         */
-        RECONNECT_BUDGET_EXHAUSTED
+        AUTH_FAILED
     }
 }
diff --git a/core/src/main/java/io/questdb/client/SenderConnectionListener.java b/core/src/main/java/io/questdb/client/SenderConnectionListener.java
index 2620ca6c..4595fbbd 100644
--- a/core/src/main/java/io/questdb/client/SenderConnectionListener.java
+++ b/core/src/main/java/io/questdb/client/SenderConnectionListener.java
@@ -51,8 +51,8 @@
  * {@link SenderConnectionEvent.Kind#RECONNECTED}) are guaranteed to fire on
  * each transition. Failure events ({@code ENDPOINT_ATTEMPT_FAILED},
  * {@code ALL_ENDPOINTS_UNREACHABLE}) may be coalesced under inbox pressure.
- * Terminal events ({@code AUTH_FAILED}, {@code RECONNECT_BUDGET_EXHAUSTED})
- * fire before the producer-thread {@code LineSenderException} is observable on
+ * The terminal event {@code AUTH_FAILED} fires before the
+ * producer-thread {@code LineSenderException} is observable on
  * the next API call -- so a listener can react sooner than the producer learns
  * via exception, but should not assume the listener fires first under heavy
  * notification load.
diff --git a/core/src/main/java/io/questdb/client/cutlass/http/client/HttpClient.java b/core/src/main/java/io/questdb/client/cutlass/http/client/HttpClient.java
index 94562663..0175ad6c 100644
--- a/core/src/main/java/io/questdb/client/cutlass/http/client/HttpClient.java
+++ b/core/src/main/java/io/questdb/client/cutlass/http/client/HttpClient.java
@@ -66,6 +66,7 @@ public abstract class HttpClient implements QuietCloseable {
     protected final NetworkFacade nf;
     protected final Socket socket;
     private final ObjectPool<DirectUtf8String> csPool = new ObjectPool<>(DirectUtf8String.FACTORY, 64);
+    private final int connectTimeout;
     private final int defaultTimeout;
     private final boolean fixBrokenConnection;
     private final int maxBufferSize;
@@ -84,6 +85,7 @@ public HttpClient(HttpClientConfiguration configuration, SocketFactory socketFac
         this.nf = configuration.getNetworkFacade();
         this.socket = socketFactory.newInstance(nf, LOG);
         this.defaultTimeout = configuration.getTimeout();
+        this.connectTimeout = configuration.getConnectTimeout();
         this.bufferSize = configuration.getInitialRequestBufferSize();
         this.maxBufferSize = configuration.getMaximumRequestBufferSize();
         this.responseParserBufSize = configuration.getResponseBufferSize();
@@ -617,10 +619,16 @@ private void connect(CharSequence host, int port) {
                 throw new HttpClientException("could not resolve host ").put("[host=").put(host).put("]");
             }
 
-            if (nf.connectAddrInfo(fd, addrInfo) != 0) {
+            final int connectResult = connectTimeout > 0
+                    ? nf.connectAddrInfoTimeout(fd, addrInfo, connectTimeout)
+                    : nf.connectAddrInfo(fd, addrInfo);
+            if (connectResult != 0) {
                 int errno = nf.errno();
                 nf.freeAddrInfo(addrInfo);
                 disconnect();
+                if (connectResult == NetworkFacade.CONNECT_TIMEOUT) {
+                    throw new HttpClientException("connect timed out ").put("[host=").put(host).put(", port=").put(port).put(", timeout=").put(connectTimeout).put(']').flagAsTimeout();
+                }
                 throw new HttpClientException("could not connect to host ").put("[host=").put(host).put(", port=").put(port).put(", errno=").put(errno).put(']');
             }
             nf.freeAddrInfo(addrInfo);
@@ -631,9 +639,20 @@ private void connect(CharSequence host, int port) {
                 throw new HttpClientException("could not configure socket to be non-blocking [fd=").put(fd).put(", errno=").put(errno).put(']');
             }
 
+            // Register the fd with the event loop before the TLS handshake so the
+            // handshake can park on socket readiness via ioWait() instead of
+            // busy-spinning on the non-blocking socket.
+            setupIoWait();
+
             if (socket.supportsTls()) {
+                // Bound the TLS handshake by the connect budget (falling back to
+                // the request timeout when connect_timeout is unset), so a peer
+                // that completes TCP but stalls mid-handshake cannot hang or pin a
+                // CPU.
+                final long tlsHandshakeStartNanos = System.nanoTime();
+                final int tlsHandshakeBudgetMillis = connectTimeout > 0 ? connectTimeout : defaultTimeout;
                 try {
-                    socket.startTlsSession(host);
+                    socket.startTlsSession(host, op -> ioWait(remainingTime(tlsHandshakeBudgetMillis, tlsHandshakeStartNanos), op));
                 } catch (TlsSessionInitFailedException e) {
                     int errno = nf.errno();
                     disconnect();
@@ -641,9 +660,15 @@ private void connect(CharSequence host, int port) {
                             .put(", error=").put(e.getFlyweightMessage())
                             .put(", errno=").put(errno)
                             .put(']');
+                } catch (Throwable t) {
+                    // ioWait() throws a timeout-flagged HttpClientException when the
+                    // handshake budget is exhausted; any other error can also surface
+                    // mid-handshake. Disconnect so the fd and native buffers do not
+                    // leak, then propagate.
+                    disconnect();
+                    throw t;
                 }
             }
-            setupIoWait();
         }
 
         private void doSend(long lo, long hi, int timeoutMillis) {
diff --git a/core/src/main/java/io/questdb/client/cutlass/http/client/WebSocketClient.java b/core/src/main/java/io/questdb/client/cutlass/http/client/WebSocketClient.java
index 81ad7c86..49ecaa6e 100644
--- a/core/src/main/java/io/questdb/client/cutlass/http/client/WebSocketClient.java
+++ b/core/src/main/java/io/questdb/client/cutlass/http/client/WebSocketClient.java
@@ -47,6 +47,7 @@
 import java.security.MessageDigest;
 import java.security.NoSuchAlgorithmException;
 import java.util.Base64;
+import java.util.concurrent.atomic.AtomicBoolean;
 
 import static java.util.concurrent.TimeUnit.NANOSECONDS;
 
@@ -99,8 +100,15 @@ public abstract class WebSocketClient implements QuietCloseable {
     private final int maxRecvBufSize;
     private final SecureRnd rnd;
     private final WebSocketSendBuffer sendBuffer;
-    // volatile: written by user thread in close(), read by I/O thread in checkConnected()/sendFrame()/receiveFrame()
-    private volatile boolean closed;
+    // Written by whichever closer wins the CAS in close(); read by the I/O
+    // thread in checkConnected()/sendFrame()/receiveFrame(). An AtomicBoolean
+    // (not a bare volatile check-then-act) so concurrent closers cannot both
+    // enter close() and double-run disconnect()/Unsafe.free.
+    private final AtomicBoolean closed = new AtomicBoolean();
+    // Upper bound (ms) on the TCP connect. <= 0 disables the application-level
+    // timeout and falls back to the OS connect timeout. Seeded from the
+    // configuration; the QWP sender may override it via setConnectTimeout().
+    private int connectTimeoutMillis;
     private int fragmentBufPos;
     private long fragmentBufPtr;       // native buffer for accumulating fragment payloads
     private int fragmentBufSize;
@@ -168,6 +176,7 @@ public WebSocketClient(HttpClientConfiguration configuration, SocketFactory sock
         this.nf = configuration.getNetworkFacade();
         this.socket = socketFactory.newInstance(nf, LOG);
         this.defaultTimeout = configuration.getTimeout();
+        this.connectTimeoutMillis = configuration.getConnectTimeout();
 
         int sendBufSize = Math.max(configuration.getInitialRequestBufferSize(), DEFAULT_SEND_BUFFER_SIZE);
         int maxSendBufSize = Math.max(configuration.getMaximumRequestBufferSize(), sendBufSize);
@@ -192,7 +201,7 @@ public WebSocketClient(HttpClientConfiguration configuration, SocketFactory sock
             this.frameParser = new WebSocketFrameParser();
             this.rnd = new SecureRnd();
             this.upgraded = false;
-            this.closed = false;
+            this.closed.set(false);
         } catch (Throwable t) {
             if (recvBufPtr != 0) {
                 Unsafe.free(recvBufPtr, recvBufSize, MemoryTag.NATIVE_DEFAULT);
@@ -207,8 +216,12 @@ public WebSocketClient(HttpClientConfiguration configuration, SocketFactory sock
 
     @Override
     public void close() {
-        if (!closed) {
-            closed = true;
+        // CAS gate: exactly one closer runs the teardown below. Closers can be
+        // the owner thread, the I/O thread's exit path, or a stale duplicate
+        // reference (see CursorWebSocketSendLoop) -- a bare volatile
+        // check-then-act here would let two concurrent closers both enter and
+        // double-run disconnect()/Unsafe.free (native double-free).
+        if (closed.compareAndSet(false, true)) {
 
             // Try to send close frame
             if (upgraded && !socket.isClosed()) {
@@ -242,7 +255,7 @@ public void close() {
      * @param port the server port
      */
     public void connect(CharSequence host, int port) {
-        if (closed) {
+        if (closed.get()) {
             throw new HttpClientException("WebSocket client is closed");
         }
 
@@ -375,7 +388,7 @@ public int getUpgradeStatusCode() {
      * Returns whether the WebSocket is connected and upgraded.
      */
     public boolean isConnected() {
-        return upgraded && !closed && !socket.isClosed();
+        return upgraded && !closed.get() && !socket.isClosed();
     }
 
     /**
@@ -481,6 +494,16 @@ public void sendPing(int timeout) {
         }
     }
 
+    /**
+     * Overrides the TCP connect timeout (milliseconds) for subsequent
+     * {@link #connect} calls. {@code <= 0} disables the application-level
+     * timeout and falls back to the OS connect timeout. Must be called before
+     * {@link #connect}.
+     */
+    public void setConnectTimeout(int connectTimeoutMillis) {
+        this.connectTimeoutMillis = connectTimeoutMillis;
+    }
+
     /**
      * Sets the value sent as the {@code X-QWP-Accept-Encoding} upgrade header,
      * e.g. {@code "zstd;level=1,raw"}. Pass {@code null} to omit the header
@@ -570,7 +593,7 @@ public boolean tryReceiveFrame(WebSocketFrameHandler handler) {
      * @param authorizationHeader the Authorization header value (e.g., "Basic ..."), or null
      */
     public void upgrade(CharSequence path, int timeout, CharSequence authorizationHeader) {
-        if (closed) {
+        if (closed.get()) {
             throw new HttpClientException("WebSocket client is closed");
         }
         if (socket.isClosed()) {
@@ -877,7 +900,7 @@ private void appendToFragmentBuffer(long payloadPtr, int payloadLen) {
     }
 
     private void checkConnected() {
-        if (closed) {
+        if (closed.get()) {
             throw new HttpClientException("WebSocket client is closed");
         }
         if (!upgraded) {
@@ -922,10 +945,18 @@ private void doConnect(CharSequence host, int port) {
             throw new HttpClientException("could not resolve host [host=").put(host).put(']');
         }
 
-        if (nf.connectAddrInfo(fd, addrInfo) != 0) {
+        final int connectResult = connectTimeoutMillis > 0
+                ? nf.connectAddrInfoTimeout(fd, addrInfo, connectTimeoutMillis)
+                : nf.connectAddrInfo(fd, addrInfo);
+        if (connectResult != 0) {
             int errno = nf.errno();
             nf.freeAddrInfo(addrInfo);
             disconnect();
+            if (connectResult == NetworkFacade.CONNECT_TIMEOUT) {
+                throw new HttpClientException("connect timed out [host=").put(host)
+                        .put(", port=").put(port)
+                        .put(", timeout=").put(connectTimeoutMillis).put(']').flagAsTimeout();
+            }
             throw new HttpClientException("could not connect [host=").put(host)
                     .put(", port=").put(port)
                     .put(", errno=").put(errno).put(']');
@@ -939,19 +970,35 @@ private void doConnect(CharSequence host, int port) {
                     .put(", errno=").put(errno).put(']');
         }
 
+        // Register the fd with the event loop before the TLS handshake so the
+        // handshake can park on socket readiness via ioWait() instead of
+        // busy-spinning on the non-blocking socket.
+        setupIoWait();
+
         if (socket.supportsTls()) {
+            // Bound the TLS handshake by the connect budget (falling back to the
+            // request timeout when connect_timeout is unset), so a peer that
+            // completes TCP but stalls mid-handshake cannot hang or pin a CPU.
+            final long tlsHandshakeStartNanos = System.nanoTime();
+            final int tlsHandshakeBudgetMillis = connectTimeoutMillis > 0 ? connectTimeoutMillis : defaultTimeout;
             try {
-                socket.startTlsSession(host);
+                socket.startTlsSession(host, op -> ioWait(getRemainingTimeOrThrow(tlsHandshakeBudgetMillis, tlsHandshakeStartNanos), op));
             } catch (TlsSessionInitFailedException e) {
                 int errno = nf.errno();
                 disconnect();
                 throw new HttpClientException("could not start TLS session [fd=").put(fd)
                         .put(", error=").put(e.getFlyweightMessage())
                         .put(", errno=").put(errno).put(']');
+            } catch (Throwable t) {
+                // ioWait() throws a timeout-flagged HttpClientException when the
+                // handshake budget is exhausted; any other error can also surface
+                // mid-handshake. Disconnect so the fd and native buffers do not
+                // leak, then propagate.
+                disconnect();
+                throw t;
             }
         }
 
-        setupIoWait();
         if (LOG.isDebugEnabled()) {
             LOG.debug("Connected to [host={}, port={}]", host, port);
         }
diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpHostHealthTracker.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpHostHealthTracker.java
index 166c0331..7b61f957 100644
--- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpHostHealthTracker.java
+++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpHostHealthTracker.java
@@ -35,9 +35,19 @@
  * so a known-good cross-zone host is picked before an untried local host.
  * <p>
  * Each method is internally synchronized, but pickNext + recordX is not atomic
- * across the pair. Callers must externally serialize a pick → record sequence
- * (the QWP clients do this via the sender's {@code synchronized buildAndConnect}
- * and the query client's documented one-execute-at-a-time contract).
+ * across the pair. Callers of the SHARED-round API (pickNext / beginRound /
+ * isRoundExhausted) must externally serialize a pick → record sequence (the
+ * ingest sender does this by keeping its foreground connect walk single-file
+ * behind a lock; the query client via its documented one-execute-at-a-time
+ * contract).
+ * <p>
+ * Concurrent walkers that must not consume or poison the shared round --
+ * the ingest sender's background orphan drainers -- use a private
+ * {@link RoundCursor} ({@link #newRoundCursor()}) paired with the
+ * health-only record overloads ({@code markRoundAttempted=false}): the
+ * cursor's attempted set is walker-local (claim-at-pick, so concurrent
+ * cursors never race on the pick → record pair), while state/zone updates
+ * flow into the shared health ledger that orders everyone's picks.
  */
 public final class QwpHostHealthTracker {
     public enum HostState {
@@ -250,24 +260,113 @@ public void recordMidStreamFailure(int idx) {
     }
 
     public void recordRoleReject(int idx, boolean isTransient) {
+        recordRoleReject(idx, isTransient, true);
+    }
+
+    /**
+     * Variant with an explicit round-bit policy.
+     * {@code markRoundAttempted=false} updates only the shared health ledger
+     * (state) and leaves the shared round's attempted bit untouched. Use it
+     * for walkers on a private {@link RoundCursor} whose attempts must stay
+     * invisible to the shared round (the ingest sender's background drainers).
+     */
+    public void recordRoleReject(int idx, boolean isTransient, boolean markRoundAttempted) {
         synchronized (lock) {
             states[idx] = isTransient ? HostState.TRANSIENT_REJECT : HostState.TOPOLOGY_REJECT;
-            attemptedThisRound[idx] = true;
+            if (markRoundAttempted) {
+                attemptedThisRound[idx] = true;
+            }
         }
     }
 
     public void recordSuccess(int idx) {
+        recordSuccess(idx, true);
+    }
+
+    /**
+     * Variant with an explicit round-bit policy; see
+     * {@link #recordRoleReject(int, boolean, boolean)}. The success epoch
+     * (sticky-Healthy recency) is recorded either way — a background
+     * walker's success is real health data.
+     */
+    public void recordSuccess(int idx, boolean markRoundAttempted) {
         synchronized (lock) {
             states[idx] = HostState.HEALTHY;
-            attemptedThisRound[idx] = true;
+            if (markRoundAttempted) {
+                attemptedThisRound[idx] = true;
+            }
             lastSuccessEpoch[idx] = ++successEpoch;
         }
     }
 
     public void recordTransportError(int idx) {
+        recordTransportError(idx, true);
+    }
+
+    /**
+     * Variant with an explicit round-bit policy; see
+     * {@link #recordRoleReject(int, boolean, boolean)}.
+     */
+    public void recordTransportError(int idx, boolean markRoundAttempted) {
         synchronized (lock) {
             states[idx] = HostState.TRANSPORT_ERROR;
-            attemptedThisRound[idx] = true;
+            if (markRoundAttempted) {
+                attemptedThisRound[idx] = true;
+            }
+        }
+    }
+
+    /**
+     * Creates a walker-private full-sweep cursor over the host list. Each
+     * {@link RoundCursor#next()} returns the highest-priority host this
+     * cursor has not yet returned — priority is the same live
+     * {@code (state, zone_tier)} tuple {@link #pickNext()} uses — and
+     * claims it at pick time in the cursor's OWN attempted set, so:
+     * <ul>
+     *   <li>every cursor sweeps every host exactly once regardless of what
+     *       other walkers do concurrently (no endpoint stealing);</li>
+     *   <li>the pick → record pair needs no external serialization — the
+     *       claim is cursor-local, and the health records are atomic;</li>
+     *   <li>the shared round (attempted bits, {@link #beginRound},
+     *       {@link #isRoundExhausted}) is never consulted nor mutated.</li>
+     * </ul>
+     * Pair with the {@code markRoundAttempted=false} record overloads so the
+     * walker's results update shared health without touching the shared
+     * round.
+     */
+    public RoundCursor newRoundCursor() {
+        return new RoundCursor();
+    }
+
+    /** See {@link #newRoundCursor()}. An instance is walker-private: do not
+     *  share one across walkers; create a new cursor per walk. */
+    public final class RoundCursor {
+        private final boolean[] attempted = new boolean[hostCount];
+
+        private RoundCursor() {
+        }
+
+        /**
+         * Highest-priority host this cursor has not yet returned, claimed at
+         * pick time; -1 once the cursor has swept every host. Ordering reads
+         * the LIVE shared health state under the tracker lock, so a state
+         * change recorded by any walker between two calls re-ranks the
+         * remaining hosts.
+         */
+        public int next() {
+            synchronized (lock) {
+                for (HostState p : PRIORITY_ORDER) {
+                    for (ZoneTier z : ZONE_PRIORITY_ORDER) {
+                        for (int i = 0; i < hostCount; i++) {
+                            if (!attempted[i] && states[i] == p && zoneTiers[i] == z) {
+                                attempted[i] = true;
+                                return i;
+                            }
+                        }
+                    }
+                }
+                return -1;
+            }
         }
     }
 
diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpQueryClient.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpQueryClient.java
index 1706401e..92b4f6a7 100644
--- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpQueryClient.java
+++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpQueryClient.java
@@ -165,6 +165,9 @@ public class QwpQueryClient implements QuietCloseable {
     private final Random failoverRandom = new Random();
     private long authTimeoutMs = DEFAULT_AUTH_TIMEOUT_MS;
     private String authorizationHeader;
+    // Upper bound (ms) on each TCP connect attempt. 0 (default) falls back to
+    // the OS connect timeout.
+    private int connectTimeoutMs = 0;
     private int bufferPoolSize = DEFAULT_IO_BUFFER_POOL_SIZE;
     private String clientId;
     // Client-configured zone (failover.md §1.1), opaque case-insensitive
@@ -387,6 +390,7 @@ public static QwpQueryClient fromConfig(CharSequence configurationString) {
         Long failoverMaxDurationMs = view.has("failover_max_duration_ms")
                 ? view.getLong("failover_max_duration_ms", 0) : null;
         Long authTimeoutMs = view.has("auth_timeout_ms") ? view.getLong("auth_timeout_ms", 0) : null;
+        Integer connectTimeout = view.has("connect_timeout") ? (int) view.getLong("connect_timeout", 0) : null;
         Long initialCredit = view.has("initial_credit") ? view.getLong("initial_credit", 0) : null;
         int poolSize = view.getInt("buffer_pool_size", DEFAULT_IO_BUFFER_POOL_SIZE);
         String compression = view.getEnum("compression");
@@ -442,6 +446,9 @@ public static QwpQueryClient fromConfig(CharSequence configurationString) {
             if (authTimeoutMs != null) {
                 client.withAuthTimeout(authTimeoutMs);
             }
+            if (connectTimeout != null) {
+                client.withConnectTimeout(connectTimeout);
+            }
             if (initialCredit != null) {
                 client.withInitialCredit(initialCredit);
             }
@@ -497,6 +504,7 @@ public static void validateConfig(ConfigView view, boolean tls) {
         view.getLong("failover_max_duration_ms", -1);
         view.getLong("initial_credit", -1);
         view.getLong("auth_timeout_ms", -1);
+        view.getLong("connect_timeout", -1);
         String username = view.getStr("username");
         String password = view.getStr("password");
         String token = view.getStr("token");
@@ -867,6 +875,7 @@ public java.util.Map<String, Object> configSnapshotForTest() {
         m.put("client_id", clientId);
         m.put("zone", clientZone);
         m.put("auth_timeout_ms", authTimeoutMs);
+        m.put("connect_timeout", connectTimeoutMs);
         m.put("authorization_header", authorizationHeader);
         m.put("tls_verify", tlsValidationMode);
         m.put("tls_roots", trustStorePath);
@@ -994,6 +1003,22 @@ public QwpQueryClient withAuthTimeout(long authTimeoutMs) {
         return this;
     }
 
+    /**
+     * Upper bound, in milliseconds, on establishing the TCP connection to an
+     * endpoint. Unlike {@link #withAuthTimeout(long)} this DOES bound the TCP
+     * connect itself (via a non-blocking connect), so a routing blackhole that
+     * never returns SYN-ACK is aborted within this budget instead of riding the
+     * OS connect timeout. Must be {@code > 0}; when unset, the OS timeout applies.
+     */
+    public QwpQueryClient withConnectTimeout(int connectTimeoutMs) {
+        checkPreConnect("withConnectTimeout");
+        if (connectTimeoutMs <= 0) {
+            throw new IllegalArgumentException("connectTimeoutMs must be > 0");
+        }
+        this.connectTimeoutMs = connectTimeoutMs;
+        return this;
+    }
+
     /**
      * Configures HTTP Basic authentication for the WebSocket upgrade request.
      * The server verifies the credentials against the same user store the
@@ -1369,6 +1394,7 @@ private void connectToEndpoint(Endpoint ep) {
         webSocketClient.setQwpClientId(clientId != null ? clientId : defaultClientId());
         webSocketClient.setQwpAcceptEncoding(buildAcceptEncodingHeader());
         webSocketClient.setQwpMaxBatchRows(maxBatchRows);
+        webSocketClient.setConnectTimeout(connectTimeoutMs);
         runUpgradeWithTimeout(ep);
         negotiatedQwpVersion = webSocketClient.getServerQwpVersion();
         negotiatedZstdLevel = webSocketClient.getServerNegotiatedZstdLevel();
@@ -1745,12 +1771,21 @@ private void reconnectViaTracker() {
     }
 
     private void runUpgradeWithTimeout(Endpoint ep) {
+        // Connect first, OUTSIDE the upgrade try. A connect-phase failure --
+        // including a connect_timeout overage flagged via flagAsTimeout() -- must
+        // keep its own message ("connect timed out ...") and must NOT be relabeled
+        // as an auth_timeout overage below. doConnect() tears down its own socket
+        // on failure; the failover walker treats the propagated HttpClientException
+        // as a transport error and moves on to the next endpoint.
+        webSocketClient.connect(ep.host, ep.port);
+
         int timeoutMs = (int) Math.min(authTimeoutMs, Integer.MAX_VALUE);
         try {
-            webSocketClient.connect(ep.host, ep.port);
             webSocketClient.upgrade(DEFAULT_ENDPOINT_PATH, timeoutMs, authorizationHeader);
         } catch (HttpClientException ex) {
             if (ex.isTimeout()) {
+                // Reachable only for an upgrade/auth-phase timeout now, so the
+                // auth_timeout attribution is accurate.
                 HttpClientException timeout = new HttpClientException("WebSocket upgrade to ")
                         .put(ep.host).put(':').put(ep.port)
                         .put(" exceeded auth_timeout=").put(authTimeoutMs).put("ms");
diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpVersionMismatchException.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpVersionMismatchException.java
index 5323f297..d03cf1ed 100644
--- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpVersionMismatchException.java
+++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpVersionMismatchException.java
@@ -31,10 +31,11 @@
  * {@code X-QWP-Version} outside the client's supported range. Treated as
  * transient at every layer per sf-client.md section 13.3: the per-endpoint
  * round walks to the next host (rolling upgrade can leave one node ahead of
- * or behind its peers), and a full round of mismatches consumes the per-outage
- * reconnect budget. Only after the budget exhausts does the connect loop
- * surface a terminal error -- as {@code PROTOCOL_VIOLATION} via the natural
- * giveup path, not {@code SECURITY_ERROR}.
+ * or behind its peers). The background reconnect loop retries a full round
+ * of mismatches indefinitely (Invariant B: no wall-clock give-up); the
+ * blocking (sync) initial connect consumes its retry budget and surfaces a
+ * {@code LineSenderException} from {@code fromConfig} on exhaustion. Never
+ * classified as {@code SECURITY_ERROR}.
  */
 public final class QwpVersionMismatchException extends HttpClientException {
     public QwpVersionMismatchException(int serverVersion, int clientMaxVersion) {
diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java
index 9b9cc45d..7d6dabe8 100644
--- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java
+++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java
@@ -39,6 +39,7 @@
 import io.questdb.client.cutlass.line.LineSenderException;
 import io.questdb.client.cutlass.line.array.DoubleArray;
 import io.questdb.client.cutlass.line.array.LongArray;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener;
 import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerPool;
 import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine;
 import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop;
@@ -72,6 +73,7 @@
 import java.util.List;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.locks.ReentrantLock;
 
 /**
  * QWP v1 WebSocket client sender for streaming data to QuestDB.
@@ -127,6 +129,9 @@ public class QwpWebSocketSender implements Sender {
     public static final int DEFAULT_AUTO_FLUSH_BYTES = 8 * 1024 * 1024;
     public static final long DEFAULT_AUTO_FLUSH_INTERVAL_NANOS = 100_000_000L; // 100ms
     public static final int DEFAULT_AUTO_FLUSH_ROWS = 1_000;
+    // Finite fallback (ms) for BACKGROUND (drainer) TCP connects when the
+    // user left connect_timeout unset. See effectiveConnectTimeoutMs.
+    public static final int DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS = 15_000;
     private static final int DEFAULT_BUFFER_SIZE = 8192;
     private static final int DEFAULT_MICROBATCH_BUFFER_SIZE = 1024 * 1024; // 1MB
     private static final Logger LOG = LoggerFactory.getLogger(QwpWebSocketSender.class);
@@ -148,12 +153,29 @@ public class QwpWebSocketSender implements Sender {
     private final List<Endpoint> endpoints;
     // Global symbol dictionary for delta encoding
     private final GlobalSymbolDictionary globalSymbolDictionary;
+    // Serializes FOREGROUND connect walks only (see buildAndConnect): the
+    // shared-round state in hostTracker (pickNext/beginRound/attempted
+    // bits), roundSeq, roundConnectAttemptSeq, and the foreground lifecycle
+    // commits (currentEndpointIdx, hasEverConnected, cap-derived sizing)
+    // all have exactly one writer -- the foreground walk -- and foreground
+    // walks cannot overlap by construction (the I/O loop is single-threaded
+    // and the user-thread initial connect completes before the loop
+    // starts); the lock is cheap insurance for that invariant. Background
+    // (drainer) walks take NO lock at all: they walk a private
+    // QwpHostHealthTracker.RoundCursor and record health-only results, so
+    // no network I/O ever runs under a sender-wide lock for background
+    // work, and neither the foreground's reconnect nor close() can queue
+    // behind a drainer's endpoint walk.
+    private final ReentrantLock connectWalkLock = new ReentrantLock();
     private final QwpHostHealthTracker hostTracker;
     private final CharSequenceObjHashMap<QwpTableBuffer> tableBuffers;
     // null means plain text (no TLS)
     private final ClientTlsConfiguration tlsConfig;
     private MicrobatchBuffer activeBuffer;
     private long authTimeoutMs = DEFAULT_AUTH_TIMEOUT_MS;
+    // Upper bound (ms) on each TCP connect attempt. 0 (default) falls back to
+    // the OS connect timeout. Applied to every WebSocketClient before connect.
+    private int connectTimeoutMs = 0;
     // Double-buffering for async I/O
     private MicrobatchBuffer buffer0;
     // Cached column references to avoid repeated hashmap lookups
@@ -161,6 +183,12 @@ public class QwpWebSocketSender implements Sender {
     private QwpTableBuffer.ColumnBuffer cachedTimestampNanosColumn;
     // WebSocket client (zero-GC native implementation)
     private WebSocketClient client;
+    // Test seam: when non-null, buildAndConnect obtains its per-attempt
+    // client here instead of WebSocketClientFactory, so JVM-error cleanup
+    // tests can observe close() on a client whose connect() throws Error.
+    // Null in production; set reflectively by tests.
+    @TestOnly
+    private volatile java.util.function.Supplier<WebSocketClient> clientFactoryOverride;
     // close() drain timeout in millis. Default applied at construction.
     // 0 or -1 means "fast close" (skip the drain); otherwise close blocks
     // up to this many millis for ackedFsn to catch up to publishedFsn.
@@ -193,6 +221,11 @@ public class QwpWebSocketSender implements Sender {
     private CursorSendEngine cursorEngine;
     private CursorWebSocketSendLoop cursorSendLoop;
     private boolean deferCommit;
+    // User-supplied observer for background orphan-slot drainer events.
+    // Volatile: written by setDrainerListener (any thread, before or after
+    // startOrphanDrainers) and read at pool-creation time. Null -> drainers
+    // run without a listener.
+    private volatile BackgroundDrainerListener drainerListener;
     // Orphan-slot drainer pool. Non-null only when the builder requested
     // drain_orphans=true AND we have a slot path to scan against. Closed
     // alongside the cursor send loop in close().
@@ -208,7 +241,8 @@ public class QwpWebSocketSender implements Sender {
     // advertised X-QWP-Max-Batch-Size at handshake so the wire payload stays
     // under the server's cap even with encoding overhead. Volatile because the
     // I/O thread writes this inside buildAndConnect on every successful
-    // (re)connect while the producer thread reads it from sendRow without
+    // FOREGROUND (re)connect -- background drainer connects never touch it --
+    // while the producer thread reads it from sendRow without
     // holding the sender monitor.
     private volatile int effectiveAutoFlushBytes;
     private SenderErrorDispatcher errorDispatcher;
@@ -219,18 +253,20 @@ public class QwpWebSocketSender implements Sender {
     private int errorInboxCapacity = SenderErrorDispatcher.DEFAULT_CAPACITY;
     private long firstPendingRowTimeNanos;
     private boolean hasDeferredMessages;
-    // Stickys true once any successful connect has happened. Drives the
+    // Sticky: true once any successful FOREGROUND connect has happened
+    // (background drainer connects never set it). Drives the
     // CONNECTED-vs-RECONNECTED-vs-FAILED_OVER classification at the success
     // point in buildAndConnect.
     private boolean hasEverConnected;
     // OFF   → startup connect failure is immediately terminal (default).
-    // SYNC  → startup connect goes through the same retry-with-backoff
-    //         loop as in-flight reconnect; auth failures still terminal.
+    // SYNC  → startup connect retries with backoff on the user thread,
+    //         bounded by reconnect_max_duration_millis; auth failures
+    //         still terminal.
     // ASYNC → user thread does not connect at all. The I/O thread runs
-    //         the same retry loop in the background; terminal failures
-    //         (auth/upgrade reject, budget exhaustion) are delivered
-    //         to the SenderError dispatcher rather than thrown from the
-    //         constructor.
+    //         the reconnect loop in the background, indefinitely
+    //         (Invariant B); terminal failures (auth/upgrade reject)
+    //         are delivered to the SenderError dispatcher rather than
+    //         thrown from the constructor.
     private Sender.InitialConnectMode initialConnectMode = Sender.InitialConnectMode.OFF;
     private boolean ownsCursorEngine;
     private long pendingBytes;
@@ -255,8 +291,9 @@ public class QwpWebSocketSender implements Sender {
             CursorWebSocketSendLoop.DEFAULT_RECONNECT_MAX_DURATION_MILLIS;
     private boolean requestDurableAck;
     // Monotonic per-attempt counter snapshotted onto every connection event
-    // fired from buildAndConnect. Counts every endpoint try -- successes and
-    // failures alike -- across this sender's lifetime.
+    // fired from buildAndConnect. Counts every FOREGROUND endpoint try --
+    // successes and failures alike -- across this sender's lifetime.
+    // Background (drainer) walks fire no events and do not advance it.
     private long roundConnectAttemptSeq;
     // Monotonic per-round counter incremented inside buildAndConnect on each
     // beginRound(true) call. roundSeq=1 is the first round; CONNECTED in the
@@ -267,7 +304,8 @@ public class QwpWebSocketSender implements Sender {
     // arbitrarily large datasets that exceed the server's recv buffer.
     private boolean transactional;
     // Server-advertised hard cap on QWP ingest payload bytes, captured from
-    // X-QWP-Max-Batch-Size on each successful handshake. 0 when the server
+    // X-QWP-Max-Batch-Size on each successful FOREGROUND handshake (a
+    // background drainer's endpoint cap is irrelevant to the producer's wire). 0 when the server
     // did not advertise the header (older builds); the sender then falls back
     // to its locally configured budget. Volatile because buildAndConnect can
     // refresh this from the cursor I/O thread on a mid-stream reconnect while
@@ -577,7 +615,7 @@ public static QwpWebSocketSender connect(
                 reconnectInitialBackoffMillis, reconnectMaxBackoffMillis,
                 initialConnectMode, errorHandler, errorInboxCapacity,
                 durableAckKeepaliveIntervalMillis, authTimeoutMs,
-                null, SenderConnectionDispatcher.DEFAULT_CAPACITY);
+                0, null, SenderConnectionDispatcher.DEFAULT_CAPACITY);
     }
 
     /**
@@ -602,6 +640,7 @@ public static QwpWebSocketSender connect(
             int errorInboxCapacity,
             long durableAckKeepaliveIntervalMillis,
             long authTimeoutMs,
+            int connectTimeoutMs,
             SenderConnectionListener connectionListener,
             int connectionListenerInboxCapacity
     ) {
@@ -613,6 +652,7 @@ public static QwpWebSocketSender connect(
         try {
             sender.requestDurableAck = requestDurableAck;
             sender.authTimeoutMs = authTimeoutMs;
+            sender.connectTimeoutMs = connectTimeoutMs;
             sender.closeFlushTimeoutMillis = closeFlushTimeoutMillis;
             sender.reconnectMaxDurationMillis = reconnectMaxDurationMillis;
             sender.reconnectInitialBackoffMillis = reconnectInitialBackoffMillis;
@@ -918,6 +958,31 @@ public QwpWebSocketSender charColumn(CharSequence columnName, char value) {
         return this;
     }
 
+    /**
+     * Closes the sender: flushes user-thread state into the engine, drains
+     * acked data within {@code close_flush_timeout}, stops the I/O loop,
+     * closes the orphan-drainer pool, and frees buffers.
+     * <p>
+     * Worst-case latency budget (dominant contributors, sequential):
+     * <ul>
+     *   <li>bounded drain: up to {@code close_flush_timeout} when the server
+     *       is slow or unreachable ({@code <= 0} opts out);</li>
+     *   <li>I/O loop stop: the shutdown-latch await is untimed, but the loop
+     *       exits promptly unless the I/O thread sits inside a blocking
+     *       native connect — bounded by {@code connect_timeout}, or by the
+     *       OS SYN-retry deadline (60-130s on Linux) when the default
+     *       {@code 0} is in effect. Background drainer walks never delay
+     *       this stop: they run lock-free on private round cursors and
+     *       never hold anything the foreground waits on (see
+     *       {@link #buildAndConnect});</li>
+     *   <li>drainer pool: drainers still in their connect-retry phase are
+     *       stop-signaled immediately (exit within ~50ms); drainers actively
+     *       replaying frames get a 2.5s grace window plus a 0.5s stop window
+     *       — worst case ~3s when a drainer sits in a blocking native
+     *       connect (15s background deadline) and must be abandoned to exit
+     *       on its own.</li>
+     * </ul>
+     */
     @Override
     public void close() {
         if (!closed) {
@@ -1014,10 +1079,14 @@ public void close() {
                     terminalError = captureCloseError(terminalError, e);
                 }
             }
-            // Drainer pool runs after the foreground I/O loop is wound
-            // down — drainers don't share state with the foreground, so
-            // ordering doesn't matter for correctness, just predictable
-            // shutdown.
+            // Drainer pool closes after the foreground I/O loop is wound
+            // down. Drainers share buildAndConnect's endpoint walk and
+            // hostTracker state with the foreground (never its observable
+            // connection state or event stream), but their
+            // connect gate is their own stop flag — NOT the foreground
+            // loop's liveness — so the pool's graceful-drain window below
+            // still lets in-flight drainers finish (including reconnects)
+            // even though cursorSendLoop is already stopped.
             if (drainerPool != null) {
                 try {
                     drainerPool.close();
@@ -1048,6 +1117,27 @@ public void close() {
                 // The I/O thread may still be using the socket and microbatch
                 // buffers. Freeing them would risk SIGSEGV.
                 LOG.error("I/O thread is still running, leaking WebSocket client and microbatch buffers");
+                // The engine, however, need not leak: delegate its close to
+                // the I/O thread's exit path, which runs it strictly after
+                // the thread's last engine access — the mapping and slot
+                // lock release as soon as the stuck wire call resolves
+                // (bounded by OS timeouts). slotLockReleased intentionally
+                // stays false: the lock is released only when the delegated
+                // close actually runs, so the pool must not reuse the slot
+                // meanwhile. A false return means the thread exited between
+                // the failed close() and now — then closing here is safe.
+                if (ownsCursorEngine && cursorEngine != null && cursorSendLoop != null
+                        && !cursorSendLoop.delegateEngineClose()) {
+                    try {
+                        cursorEngine.close();
+                    } catch (Throwable t) {
+                        LOG.error("Error closing owned CursorSendEngine: {}", String.valueOf(t));
+                        terminalError = captureCloseError(terminalError, t);
+                    }
+                    cursorEngine = null;
+                    ownsCursorEngine = false;
+                    slotLockReleased = true;
+                }
                 rethrowTerminal(terminalError);
                 return;
             }
@@ -1953,6 +2043,30 @@ public CursorWebSocketSendLoop.ReconnectFactory newReconnectFactory() {
         return new ReconnectSupplier();
     }
 
+    /**
+     * Test seam: a BACKGROUND reconnect factory identical to the ones
+     * {@link #startOrphanDrainers} hands to orphan drainers (abort gate =
+     * the supplied stop flag, {@code isBackground()=true}), so tests can
+     * exercise the background side of the connect-walk lock policy (see
+     * {@link #buildAndConnect}) without reflection.
+     */
+    @TestOnly
+    public CursorWebSocketSendLoop.ReconnectFactory newBackgroundReconnectFactory(
+            java.util.function.BooleanSupplier stopFlag
+    ) {
+        return new ReconnectSupplier(stopFlag, "drainer stop requested during connect");
+    }
+
+    /**
+     * Test seam: installs the per-attempt WebSocket client factory override
+     * consulted by {@code newWebSocketClient()} inside the connect walk.
+     * Production code never sets it.
+     */
+    @TestOnly
+    public void setClientFactoryOverride(java.util.function.Supplier<WebSocketClient> factory) {
+        this.clientFactoryOverride = factory;
+    }
+
     @Override
     public void reset() {
         checkNotClosed();
@@ -2035,6 +2149,33 @@ public void setCursorEngine(CursorSendEngine engine, boolean takeOwnership) {
         this.ownsCursorEngine = takeOwnership && engine != null;
     }
 
+    /**
+     * Register an async observer for background orphan-slot drainer events.
+     * May be called either before or after {@link #startOrphanDrainers} —
+     * when called before, the drainer pool picks it up as its submit-time
+     * default; when called after, it propagates to the pool AND to every
+     * live drainer (per-drainer re-assignment while running is explicitly
+     * permitted by the drainer's listener contract). Pass {@code null} to
+     * clear. {@code synchronized} to coordinate with
+     * {@code startOrphanDrainers}: a concurrent submit either observes the
+     * pool listener already set or is covered by the snapshot propagation.
+     */
+    public synchronized void setDrainerListener(BackgroundDrainerListener listener) {
+        this.drainerListener = listener;
+        BackgroundDrainerPool pool = drainerPool;
+        if (pool != null) {
+            // Submit-time fallback for drainers not yet submitted...
+            pool.setListener(listener);
+            // ...and direct re-assignment for the ones already running (the
+            // pool listener is only applied at submit time, never after).
+            ObjList<io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer> live =
+                    pool.snapshot();
+            for (int i = 0, n = live.size(); i < n; i++) {
+                live.getQuick(i).setListener(listener);
+            }
+        }
+    }
+
     /**
      * Configure the user-supplied error handler. May be called either before
      * or after {@code connect()} — when called after, the change propagates
@@ -2133,18 +2274,42 @@ public synchronized void startOrphanDrainers(
         if (drainerPool == null) {
             drainerPool = new io.questdb.client.cutlass.qwp.client.sf.cursor
                     .BackgroundDrainerPool(maxBackgroundDrainers);
+            // Install the user listener as the pool's submit-time default so
+            // the drainers submitted below observe it from their first event.
+            drainerPool.setListener(this.drainerListener);
         }
         for (int i = 0, n = orphanSlotPaths.size(); i < n; i++) {
             String slot = orphanSlotPaths.get(i);
+            // The drainer's connects must NOT be gated on the foreground
+            // sender's lifecycle: close() stops the foreground I/O loop
+            // BEFORE the drainer pool's graceful-drain window, so a
+            // foreground-gated factory would reject every drainer
+            // (re)connect with "sender closed during connect" during that
+            // window, leaving the orphan slot un-drained (and Invariant B
+            // forbids quarantining it on a transport-shaped error). Gate
+            // each drainer's factory on the drainer's OWN stop flag
+            // instead. The one-element array breaks the construction cycle
+            // (the factory needs the drainer, the drainer's constructor
+            // needs the factory); the ref write happens-before the drainer
+            // runs because submit() publishes the task afterwards.
+            final io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer[] ref =
+                    new io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer[1];
+            ReconnectSupplier factory = new ReconnectSupplier(
+                    () -> {
+                        io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer d = ref[0];
+                        return d != null && d.isStopRequested();
+                    },
+                    "drainer stop requested during connect");
             io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer drainer =
                     new io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer(
                             slot, segmentSizeBytes, sfMaxTotalBytes,
-                            newReconnectFactory(),
+                            factory,
                             reconnectMaxDurationMillis,
                             reconnectInitialBackoffMillis,
                             reconnectMaxBackoffMillis,
                             requestDurableAck,
                             durableAckKeepaliveIntervalMillis);
+            ref[0] = drainer;
             drainerPool.submit(drainer);
         }
     }
@@ -2282,7 +2447,7 @@ public QwpWebSocketSender uuidColumn(CharSequence columnName, long lo, long hi)
      * True iff this sender has at least once installed a live (connected
      * + upgraded) WebSocket. Sticky — once true, stays true even after a
      * subsequent disconnect. Lets a {@link SenderErrorHandler}
-     * disambiguate a "never reached the server" budget exhaustion (likely
+     * disambiguate a "never reached the server" terminal failure (likely
      * a config typo or firewall block) from a "lost connection after we
      * were up" failure (likely transient). Returns {@code false} if no
      * I/O loop is running.
@@ -2389,25 +2554,136 @@ private void atNanos(long timestampNanos) {
         sendRow();
     }
 
-    private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) {
+    /**
+     * Resolves the connect timeout for one {@code buildAndConnect} walk.
+     * Foreground connects honour the configured value verbatim: 0 (the
+     * default) keeps the historical untimed native connect, bounded only by
+     * the OS (SYN retries, 60-130s on Linux). Background (drainer) connects
+     * get a finite fallback instead: during an outage a drainer is routinely
+     * parked inside a blocking native connect that neither unpark nor
+     * interrupt cancels, so the drainer pool's shutdownNow path (~3s into
+     * sender.close()) reliably lands on the failed-stop protocol -- the
+     * WebSocket client and microbatch buffers are deliberately leaked and
+     * the slot lock is held until the OS deadline resolves the connect. A
+     * finite background deadline bounds that window to seconds without
+     * changing foreground semantics. Exposed for unit tests.
+     */
+    @TestOnly
+    public static int effectiveConnectTimeoutMs(boolean background, int configuredMs) {
+        return background && configuredMs <= 0 ? DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS : configuredMs;
+    }
+
+    /**
+     * Builds the per-attempt WebSocket client for {@link #buildAndConnect}.
+     * Production path delegates to {@link WebSocketClientFactory}; tests may
+     * install {@link #clientFactoryOverride} to substitute a stub.
+     */
+    private WebSocketClient newWebSocketClient() {
+        java.util.function.Supplier<WebSocketClient> override = clientFactoryOverride;
+        if (override != null) {
+            return override.get();
+        }
+        return tlsConfig != null
+                ? WebSocketClientFactory.newTlsInstance(tlsConfig)
+                : WebSocketClientFactory.newPlainTextInstance();
+    }
+
+    /**
+     * Multi-endpoint connect walk shared by the foreground sender and the
+     * background orphan drainers. One invocation sweeps the endpoint list,
+     * performing a TCP/TLS connect plus a WebSocket upgrade per endpoint;
+     * worst-case sweep duration is
+     * {@code endpoints x (connect timeout + upgrade timeout)}:
+     * <ul>
+     *   <li>foreground walk: {@code connect_timeout} verbatim -- the default
+     *       {@code 0} keeps the untimed native connect, bounded only by the
+     *       OS SYN-retry deadline (60-130s per endpoint on Linux) -- plus
+     *       {@code auth_timeout_ms} (default 15s) for the upgrade;</li>
+     *   <li>background walk: 15s connect fallback
+     *       ({@link #DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS}) plus
+     *       {@code auth_timeout_ms} -- see
+     *       {@link #effectiveConnectTimeoutMs(boolean, int)}.</li>
+     * </ul>
+     * <p>
+     * Concurrency policy -- no network I/O under a sender-wide lock for
+     * background work. FOREGROUND walks (the producer's initial connect and
+     * the I/O loop's reconnects) hold {@link #connectWalkLock} across the
+     * sweep: they own the shared round state and the lifecycle commits, and
+     * can only ever wait behind another foreground walk (which cannot
+     * happen by construction -- the lock is insurance). BACKGROUND (drainer)
+     * walks take NO lock: each sweeps a private
+     * {@link QwpHostHealthTracker.RoundCursor} -- full sweep, claim-at-pick,
+     * ordered by the live shared health state -- and records results with
+     * the health-only overloads ({@code markRoundAttempted=false}), so
+     * concurrent drainer sweeps proceed in parallel with each other and
+     * with the foreground, share health observations, and can neither
+     * consume nor poison the foreground's round. The foreground's
+     * reconnect and {@code close()} paths are therefore never queued
+     * behind a drainer's endpoint walk.
+     */
+    private WebSocketClient buildAndConnect(ReconnectSupplier ctx) {
+        if (ctx.isBackground()) {
+            // Lock-free: the walk below touches only internally-synchronized
+            // hostTracker health state and walk-local/cursor-local state on
+            // the background path.
+            return connectWalk(ctx);
+        }
+        connectWalkLock.lock();
+        try {
+            return connectWalk(ctx);
+        } finally {
+            connectWalkLock.unlock();
+        }
+    }
+
+    private WebSocketClient connectWalk(ReconnectSupplier ctx) {
+        // Background (drainer) factories share this connect walk -- endpoint
+        // list and hostTracker HEALTH state (never the shared round: a
+        // background sweep walks its own RoundCursor and records with
+        // markRoundAttempted=false, so it cannot consume the foreground's
+        // round or skew roundSeq) -- but must stay INVISIBLE
+        // in the foreground sender's observable state. SenderConnectionEvents
+        // describe the FOREGROUND connection's lifecycle, and the cap-derived
+        // sizing (serverMaxBatchSize / effectiveAutoFlushBytes) guards the
+        // FOREGROUND wire: a drainer connect that committed either would
+        // fabricate lifecycle transitions the foreground never had, steal the
+        // once-per-lifetime CONNECTED classification, and re-size the
+        // producer's batch guard for a connection the producer is not on
+        // (oversize batch -> ws-close[1009] -> producer-terminal HALT caused
+        // by background activity).
+        final boolean background = ctx.isBackground();
+        // Private full-sweep cursor for background walks: claim-at-pick over
+        // cursor-local attempted bits makes the pick -> record pair safe
+        // without any walk-wide lock, and guarantees every sweep tries every
+        // endpoint exactly once regardless of concurrent walkers.
+        final QwpHostHealthTracker.RoundCursor cursor =
+                background ? hostTracker.newRoundCursor() : null;
         int previousIdx = ctx.previousIdx;
         if (previousIdx >= 0) {
             // Mid-stream wire failure -- the I/O loop just observed the active
-            // connection drop and called us via the reconnect factory. Surface
-            // a DISCONNECTED event identifying which endpoint just went away
-            // before we start the per-endpoint walk for a replacement.
-            Endpoint priorEp = endpoints.get(previousIdx);
-            dispatchConnectionEvent(
-                    SenderConnectionEvent.Kind.DISCONNECTED,
-                    priorEp.host, priorEp.port,
-                    null, SenderConnectionEvent.NO_PORT,
-                    SenderConnectionEvent.NO_ATTEMPT_NUMBER,
-                    roundSeq,
-                    null);
+            // connection drop and called us via the reconnect factory. Only a
+            // FOREGROUND drop surfaces DISCONNECTED: a drainer's wire drop is
+            // not a foreground outage, and reporting it would claim an outage
+            // against an endpoint the foreground may be using without trouble. The
+            // hostTracker health penalty is recorded either way -- the drop
+            // was real, whichever loop observed it.
+            if (!background) {
+                Endpoint priorEp = endpoints.get(previousIdx);
+                dispatchConnectionEvent(
+                        SenderConnectionEvent.Kind.DISCONNECTED,
+                        priorEp.host, priorEp.port,
+                        null, SenderConnectionEvent.NO_PORT,
+                        SenderConnectionEvent.NO_ATTEMPT_NUMBER,
+                        roundSeq,
+                        null);
+            }
             hostTracker.recordMidStreamFailure(previousIdx);
             ctx.previousIdx = -1;
         }
-        if (hostTracker.isRoundExhausted()) {
+        // Shared-round lifecycle is foreground-only: a background walk must
+        // not advance the round (or roundSeq, which numbers foreground
+        // events) under the foreground's feet.
+        if (!background && hostTracker.isRoundExhausted()) {
             roundSeq++;
             hostTracker.beginRound(true);
         }
@@ -2424,21 +2700,25 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) {
         QwpIngressRoleRejectedException lastRoleReject = null;
         Endpoint lastEndpoint = null;
         while (true) {
-            if (cursorSendLoop == null ? closed : !cursorSendLoop.isRunning()) {
-                throw new LineSenderException("sender closed during connect");
+            if (ctx.isAborted()) {
+                throw new LineSenderException(ctx.abortMessage());
             }
-            int idx = hostTracker.pickNext();
+            int idx = background ? cursor.next() : hostTracker.pickNext();
             if (idx < 0) break;
             Endpoint ep = endpoints.get(idx);
             lastEndpoint = ep;
-            long attemptNumber = ++roundConnectAttemptSeq;
-            WebSocketClient newClient = tlsConfig != null
-                    ? WebSocketClientFactory.newTlsInstance(tlsConfig)
-                    : WebSocketClientFactory.newPlainTextInstance();
+            // Attempt numbers exist for foreground observability only. A
+            // background walk fires no events and must not skew the numbering
+            // the user sees on subsequent foreground events.
+            long attemptNumber = background
+                    ? SenderConnectionEvent.NO_ATTEMPT_NUMBER
+                    : ++roundConnectAttemptSeq;
+            WebSocketClient newClient = newWebSocketClient();
             try {
                 newClient.setQwpMaxVersion(QwpConstants.VERSION);
                 newClient.setQwpClientId(QwpConstants.CLIENT_ID);
                 newClient.setQwpRequestDurableAck(requestDurableAck);
+                newClient.setConnectTimeout(effectiveConnectTimeoutMs(background, connectTimeoutMs));
                 newClient.connect(ep.host, ep.port);
                 int upgradeTimeoutMs = (int) Math.min(authTimeoutMs, Integer.MAX_VALUE);
                 newClient.upgrade(WRITE_PATH, upgradeTimeoutMs, authorizationHeader);
@@ -2447,13 +2727,15 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) {
                 newClient.close();
                 if (classified instanceof QwpIngressRoleRejectedException) {
                     QwpIngressRoleRejectedException re = (QwpIngressRoleRejectedException) classified;
-                    hostTracker.recordRoleReject(idx, re.isTransient());
+                    hostTracker.recordRoleReject(idx, re.isTransient(), !background);
                     lastError = re;
                     lastRoleReject = re;
-                    dispatchConnectionEvent(
-                            SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED,
-                            ep.host, ep.port, null, SenderConnectionEvent.NO_PORT,
-                            attemptNumber, roundSeq, re);
+                    if (!background) {
+                        dispatchConnectionEvent(
+                                SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED,
+                                ep.host, ep.port, null, SenderConnectionEvent.NO_PORT,
+                                attemptNumber, roundSeq, re);
+                    }
                     continue;
                 }
                 if (classified instanceof QwpAuthFailedException) {
@@ -2463,10 +2745,12 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) {
                     // moment the I/O thread gives up, ahead of the producer
                     // thread learning via LineSenderException on the next
                     // API call.
-                    dispatchConnectionEvent(
-                            SenderConnectionEvent.Kind.AUTH_FAILED,
-                            ep.host, ep.port, null, SenderConnectionEvent.NO_PORT,
-                            attemptNumber, roundSeq, classified);
+                    if (!background) {
+                        dispatchConnectionEvent(
+                                SenderConnectionEvent.Kind.AUTH_FAILED,
+                                ep.host, ep.port, null, SenderConnectionEvent.NO_PORT,
+                                attemptNumber, roundSeq, classified);
+                    }
                     throw classified;
                 }
                 if (terminalUpgradeError == null && (
@@ -2475,41 +2759,76 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) {
                                 && !((WebSocketUpgradeException) classified).isRoleMismatch()))) {
                     terminalUpgradeError = classified;
                 }
-                hostTracker.recordTransportError(idx);
+                hostTracker.recordTransportError(idx, !background);
                 lastError = classified;
-                dispatchConnectionEvent(
-                        SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED,
-                        ep.host, ep.port, null, SenderConnectionEvent.NO_PORT,
-                        attemptNumber, roundSeq, classified);
+                if (!background) {
+                    dispatchConnectionEvent(
+                            SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED,
+                            ep.host, ep.port, null, SenderConnectionEvent.NO_PORT,
+                            attemptNumber, roundSeq, classified);
+                }
                 continue;
             } catch (Exception e) {
                 newClient.close();
-                hostTracker.recordTransportError(idx);
+                hostTracker.recordTransportError(idx, !background);
                 lastError = e;
-                dispatchConnectionEvent(
-                        SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED,
-                        ep.host, ep.port, null, SenderConnectionEvent.NO_PORT,
-                        attemptNumber, roundSeq, e);
+                if (!background) {
+                    dispatchConnectionEvent(
+                            SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED,
+                            ep.host, ep.port, null, SenderConnectionEvent.NO_PORT,
+                            attemptNumber, roundSeq, e);
+                }
                 continue;
+            } catch (Error e) {
+                // JVM failure (OOM, LinkageError, StackOverflowError) during
+                // connect/upgrade. Without this catch the half-built client
+                // escaped with its fd and native buffers open -- unreachable
+                // by GC, freed only in close(). Close it quietly: under OOM
+                // close() itself can throw, and a secondary failure must not
+                // mask the original Error. Deliberately NO hostTracker penalty
+                // and NO ENDPOINT_ATTEMPT_FAILED event -- a JVM failure is not
+                // endpoint health data, and misclassifying it would poison the
+                // walk. Rethrow: every retry loop upstream (connectWithRetry,
+                // the cursor reconnect loop, BackgroundDrainer) rethrows Error
+                // rather than retrying, so this stays a loud one-shot failure.
+                try {
+                    newClient.close();
+                } catch (Throwable ignored) {
+                    // best-effort; the original Error is what must surface
+                }
+                throw e;
             }
             if (requestDurableAck && !newClient.isServerDurableAckEnabled()) {
                 newClient.close();
-                hostTracker.recordRoleReject(idx, false);
+                hostTracker.recordRoleReject(idx, false, !background);
                 QwpDurableAckMismatchException ackErr = new QwpDurableAckMismatchException(
                         ep.host, ep.port, null);
                 if (terminalUpgradeError == null) {
                     terminalUpgradeError = ackErr;
                 }
                 lastError = ackErr;
-                dispatchConnectionEvent(
-                        SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED,
-                        ep.host, ep.port, null, SenderConnectionEvent.NO_PORT,
-                        attemptNumber, roundSeq, ackErr);
+                if (!background) {
+                    dispatchConnectionEvent(
+                            SenderConnectionEvent.Kind.ENDPOINT_ATTEMPT_FAILED,
+                            ep.host, ep.port, null, SenderConnectionEvent.NO_PORT,
+                            attemptNumber, roundSeq, ackErr);
+                }
                 continue;
             }
-            int previousLiveIdx = currentEndpointIdx;
-            hostTracker.recordSuccess(idx);
+            hostTracker.recordSuccess(idx, !background);
             ctx.previousIdx = idx;
+            if (background) {
+                // Walk bookkeeping only: recordSuccess feeds the shared health
+                // tracker and ctx.previousIdx arms this factory's own
+                // mid-stream-failure handling on its next reconnect. No
+                // lifecycle event, no CONNECTED/RECONNECTED/FAILED_OVER
+                // classification state, no producer batch re-sizing -- the
+                // drainer's lifecycle is observable via
+                // BackgroundDrainerListener and the drainer counters, never
+                // the foreground connection-event stream.
+                return newClient;
+            }
+            int previousLiveIdx = currentEndpointIdx;
             currentEndpointIdx = idx;
             // Classify the success. CONNECTED only fires once per sender
             // lifetime; subsequent successes are RECONNECTED (same endpoint
@@ -2550,7 +2869,7 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) {
         // which terminal branch fires next. The connectLoop wrapper retries,
         // and each retry that re-enters this method and fails again produces
         // its own ALL_ENDPOINTS_UNREACHABLE event.
-        if (lastEndpoint != null) {
+        if (!background && lastEndpoint != null) {
             dispatchConnectionEvent(
                     SenderConnectionEvent.Kind.ALL_ENDPOINTS_UNREACHABLE,
                     lastEndpoint.host, lastEndpoint.port,
@@ -2561,21 +2880,23 @@ private synchronized WebSocketClient buildAndConnect(ReconnectSupplier ctx) {
             throw terminalUpgradeError;
         }
         if (lastRoleReject != null) {
-            // When the client opted into durable ack but every endpoint
-            // role-rejected the /write/v4 upgrade (typically a misconfigured
-            // address list pointing at replicas only), a primary that can
-            // serve durable ack will not appear by retrying. Throw the typed
-            // QwpDurableAckMismatchException -- the cursor send loop's terminal
-            // classifier recognises it by instanceof and suppresses retry, so
-            // the SYNC/ASYNC connect paths fail fast instead of burning the
-            // full reconnect_max_duration_millis budget walking the same
-            // replicas.
-            if (requestDurableAck) {
-                QwpDurableAckMismatchException ackErr = new QwpDurableAckMismatchException(
-                        lastRoleReject.getHost(), lastRoleReject.getPort(), lastRoleReject.getRole());
-                ackErr.initCause(lastRoleReject);
-                throw ackErr;
-            }
+            // Every endpoint role-rejected the /write/v4 upgrade: right now the
+            // reachable nodes are all replicas (or primary-catchup). That is a
+            // TRANSIENT failover window, not a terminal condition -- a replica
+            // can be promoted and a primary will reappear. Surface it as a
+            // retriable QwpRoleMismatchException so the SYNC/ASYNC connect and
+            // reconnect loops keep the rows in store-and-forward and retry
+            // within reconnect_max_duration_millis (for an SF sender the only
+            // terminal condition is SF exhaustion).
+            //
+            // This holds even when durable ack was requested: a replica that
+            // gets promoted serves durable ack, so an all-replica window must
+            // NOT be reported as a durable-ack mismatch. Doing so conflated a
+            // transient role state with a permanent capability gap and
+            // hard-failed HA senders that should have recovered on promotion. A
+            // genuine capability gap -- an endpoint that upgrades but does not
+            // advertise durable ack -- is still terminal: it is raised as
+            // terminalUpgradeError above, before this block.
             QwpRoleMismatchException ex = new QwpRoleMismatchException(
                     QwpIngressRoleRejectedException.ROLE_PRIMARY,
                     null,
@@ -2811,8 +3132,9 @@ private void ensureConnected() {
                 // version today). Frames written before the first successful
                 // connect commit to V1 because cursor segments are immutable;
                 // a future version bump must account for that. Auth/upgrade
-                // rejects and budget exhaustion are surfaced via the error
-                // inbox by the I/O thread, not thrown here.
+                // rejects are surfaced via the error inbox by the I/O
+                // thread, not thrown here; plain connect failures retry
+                // indefinitely (Invariant B).
                 client = null;
                 break;
             case OFF:
@@ -2854,10 +3176,11 @@ private void ensureConnected() {
             }
             cursorSendLoop.setProgressDispatcher(progressDispatcher);
             // Connection-event dispatcher: lets the cursor I/O loop fire
-            // DISCONNECTED on outage entry and RECONNECT_BUDGET_EXHAUSTED on
-            // budget exit. Sender-side fire points (buildAndConnect) write
-            // directly to connectionDispatcher; this getter just shares the
-            // same instance with the loop.
+            // DISCONNECTED on outage entry. Sender-side fire points
+            // (buildAndConnect) write directly to connectionDispatcher; this
+            // getter just shares the same instance with the loop. (Invariant B:
+            // the loop no longer fires a terminal budget-exhaustion event -- it
+            // retries indefinitely.)
             cursorSendLoop.setConnectionDispatcher(connectionDispatcher);
             cursorSendLoop.start();
         } catch (Throwable t) {
@@ -3326,8 +3649,48 @@ public Endpoint(String host, int port) {
     }
 
     private final class ReconnectSupplier implements CursorWebSocketSendLoop.ReconnectFactory {
+        /**
+         * Optional caller-owned liveness gate. {@code null} means this factory
+         * serves the foreground sender and aborts when the foreground I/O loop
+         * stops. Non-null means the factory serves a {@code BackgroundDrainer}:
+         * the drainer must be able to (re)connect during the sender's close
+         * sequence (the drainer pool's graceful-drain window runs AFTER the
+         * foreground loop is stopped), so its gate is the drainer's own stop
+         * flag, supplied here, instead of the foreground loop's state.
+         */
+        private final java.util.function.BooleanSupplier abortCheck;
+        private final String abortMessage;
         private int previousIdx = -1;
 
+        private ReconnectSupplier() {
+            this(null, null);
+        }
+
+        private ReconnectSupplier(java.util.function.BooleanSupplier abortCheck, String abortMessage) {
+            this.abortCheck = abortCheck;
+            this.abortMessage = abortMessage;
+        }
+
+        String abortMessage() {
+            return abortCheck != null ? abortMessage : "sender closed during connect";
+        }
+
+        /**
+         * True when this factory serves a background drainer. Background
+         * connects share buildAndConnect's endpoint walk and hostTracker
+         * health state, but commit none of the foreground sender's
+         * observable connection state and fire no connection events.
+         */
+        boolean isBackground() {
+            return abortCheck != null;
+        }
+
+        boolean isAborted() {
+            return abortCheck != null
+                    ? abortCheck.getAsBoolean()
+                    : (cursorSendLoop == null ? closed : !cursorSendLoop.isRunning());
+        }
+
         @Override
         public WebSocketClient reconnect() {
             return buildAndConnect(this);
diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainer.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainer.java
index d54a01dc..d3e42602 100644
--- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainer.java
+++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainer.java
@@ -25,7 +25,12 @@
 package io.questdb.client.cutlass.qwp.client.sf.cursor;
 
 import io.questdb.client.cutlass.http.client.WebSocketClient;
+import io.questdb.client.cutlass.http.client.WebSocketUpgradeException;
+import io.questdb.client.cutlass.qwp.client.QwpAuthFailedException;
 import io.questdb.client.cutlass.qwp.client.QwpDurableAckMismatchException;
+import io.questdb.client.cutlass.qwp.client.QwpIngressRoleRejectedException;
+import io.questdb.client.cutlass.qwp.client.QwpRoleMismatchException;
+import io.questdb.client.cutlass.qwp.client.QwpVersionMismatchException;
 import org.jetbrains.annotations.TestOnly;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -50,27 +55,48 @@
  *   <li>Close everything in reverse order; release the lock.</li>
  * </ol>
  * <p>
- * On terminal failure (auth-rejection on reconnect, reconnect-budget
- * exhaustion, recovery error), the drainer drops a
- * {@link OrphanScanner#FAILED_SENTINEL_NAME} sentinel into the slot
- * before exiting. Future scans skip the slot until an operator clears
- * the sentinel — bounded automatic retry, then human-in-the-loop.
+ * On terminal failure (auth-rejection on reconnect, a cluster-wide durable-ack
+ * capability gap that exhausts its settle budget, recovery error), the drainer
+ * drops a {@link OrphanScanner#FAILED_SENTINEL_NAME} sentinel into the slot
+ * before exiting. Future scans skip the slot until an operator clears the
+ * sentinel — bounded automatic retry, then human-in-the-loop. A transient
+ * all-replica failover window is NOT terminal: it is retried indefinitely
+ * (Invariant B), never quarantined on a wall-clock budget or attempt cap.
  */
 public final class BackgroundDrainer implements Runnable {
 
     /**
      * Cap on consecutive {@link QwpDurableAckMismatchException} attempts at
      * initial connect before the drainer escalates to a {@code .failed}
-     * sentinel. The wall-clock budget {@code reconnectMaxDurationMillis}
-     * also caps the same loop; whichever is hit first triggers escalation.
-     * 16 attempts gives the cluster room to settle through a rolling
-     * upgrade (each attempt walks every endpoint internally) without
-     * letting a genuine cluster-wide misconfig hang the drainer forever.
+     * sentinel. Applies ONLY to a genuine cluster-wide durable-ack capability
+     * gap (a server that upgrades but does not advertise durable ack); a
+     * transient all-replica failover window (role reject) is retried
+     * indefinitely and is never subject to this cap (Invariant B). The
+     * wall-clock budget {@code reconnectMaxDurationMillis} also caps this
+     * capability-gap loop; whichever is hit first triggers escalation. Both
+     * halves of the budget measure a capability-gap <i>episode</i>: the
+     * wall clock accumulates only across uninterrupted gap-to-gap intervals
+     * (never before the first gap is observed, and never across an
+     * intervening transport window -- an unreachable cluster is not
+     * "failing to settle"), and an intervening role reject restarts the
+     * episode -- it proves the topology changed, so the next capability-gap
+     * error is a fresh episode against a newly promoted node. 16
+     * attempts give the cluster room to settle through a rolling upgrade
+     * (each attempt walks every endpoint internally) without letting a genuine
+     * cluster-wide misconfig hang the drainer forever.
      */
     public static final int DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS = 16;
     private static final Logger LOG = LoggerFactory.getLogger(BackgroundDrainer.class);
     /** How often to wake and re-check ackedFsn vs target. */
     private static final long POLL_NANOS = 50_000_000L; // 50 ms
+    /**
+     * Upper bound on a single backoff park so {@link #requestStop()} is
+     * honored promptly even without the unpark (e.g. a permit consumed by
+     * an earlier spurious wakeup). Keeps the pool's post-stop grace window
+     * ({@code BackgroundDrainerPool.STOP_GRACE_MILLIS}) meaningful: a
+     * stopping drainer wakes at least every 50ms to re-check the flag.
+     */
+    private static final long STOP_CHECK_PARK_CHUNK_NANOS = 50_000_000L; // 50 ms
     private final CursorWebSocketSendLoop.ReconnectFactory clientFactory;
     private final long durableAckKeepaliveIntervalMillis;
     private final long reconnectInitialBackoffMillis;
@@ -92,6 +118,13 @@ public final class BackgroundDrainer implements Runnable {
      */
     private volatile BackgroundDrainerListener listener;
     private volatile DrainOutcome outcome = DrainOutcome.PENDING;
+    /**
+     * Thread currently executing {@link #run()} (or a direct
+     * {@link #connectWithDurableAckRetry()} call from tests). Lets
+     * {@link #requestStop()} unpark a drainer sleeping in a backoff or
+     * poll park instead of waiting for the park to elapse.
+     */
+    private volatile Thread runnerThread;
     private volatile boolean stopRequested;
 
     public BackgroundDrainer(
@@ -129,7 +162,10 @@ public BackgroundDrainer() {
     }
 
     /**
-     * Initial connect with retry on whole-cluster durable-ack unavailability.
+     * Budgeted connect with retry on whole-cluster durable-ack unavailability:
+     * the initial connect, and re-entered from {@link #run()} whenever a
+     * mid-drain reconnect sweep hits the same capability gap (each re-entry
+     * is a fresh episode -- a successful connect ended the previous one).
      * The wrapped {@code clientFactory.reconnect()} already walks every
      * configured endpoint per attempt and only throws
      * {@link QwpDurableAckMismatchException} when none of them advertise
@@ -146,36 +182,137 @@ public BackgroundDrainer() {
      * budget, the drainer drops a {@code .failed} sentinel and exits
      * exactly as the original single-shot path did.
      * <p>
-     * Other exceptions (auth failure, version mismatch, transport error,
-     * etc.) preserve the original behavior: mark failed, exit. They are
-     * either terminal in their own right or already retried inside
-     * {@code reconnect()}.
+     * The budget measures a capability-gap <i>episode</i>: consecutive
+     * {@link QwpDurableAckMismatchException} sweeps only. Transient
+     * conditions -- an all-replica failover window (role reject) or a
+     * transport error -- are retried indefinitely (Invariant B) and never
+     * consume the budget: the wall-clock half accumulates only across
+     * uninterrupted gap-to-gap intervals, so a mid-episode transport window
+     * pauses the clock (without touching the attempt count), and a role
+     * reject additionally restarts the episode, because it proves the
+     * topology changed under the rolling upgrade.
+     * Genuine terminals (auth failure, non-421 upgrade reject) preserve
+     * the original behavior: mark failed, exit.
      *
      * @return a fresh durable-ack-capable client, or {@code null} if
      *         {@link #outcome} has been set to FAILED or STOPPED
      */
     @TestOnly
     public WebSocketClient connectWithDurableAckRetry() {
-        long startNanos = System.nanoTime();
-        long deadlineNanos = startNanos + reconnectMaxDurationMillis * 1_000_000L;
+        // run() already set runnerThread; setting it again here is a no-op
+        // on that path but wires up direct @TestOnly calls so requestStop()
+        // can unpark them too.
+        runnerThread = Thread.currentThread();
         long backoffMillis = reconnectInitialBackoffMillis;
-        int mismatchAttempts = 0;
+        // Capability-gap settle budget. Counts ONLY consecutive
+        // QwpDurableAckMismatchException sweeps; the wall-clock half
+        // accumulates ONLY across uninterrupted gap-to-gap intervals, so
+        // transient churn (role reject, transport) can never burn the budget
+        // -- neither before the first gap is observed nor mid-episode (a
+        // cluster unreachable for longer than the whole budget that comes
+        // back still gapped has consumed none of it). An intervening role
+        // reject resets the episode (topology churn: the offending node is
+        // gone); a transport error neither increments nor resets the attempt
+        // count -- a dropped socket does not prove promotion churn, and
+        // resetting on it would let a flaky-but-misconfigured cluster evade
+        // the cap forever -- it only pauses the wall clock: the gap-to-gap
+        // interval spanning the transport window is not charged.
+        int capabilityGapAttempts = 0;
+        // Wall-clock time accumulated across uninterrupted gap-to-gap
+        // intervals of the current episode; escalates once it reaches
+        // capabilityGapBudgetNanos (or the attempt cap fires first).
+        long capabilityGapElapsedNanos = 0L;
+        // Timestamp of the previous capability-gap sweep; 0 = the next gap
+        // charges nothing (episode start, post-role-reject restart, or the
+        // interval was interrupted by a transport window).
+        long lastCapabilityGapNanos = 0L;
+        final long capabilityGapBudgetNanos = reconnectMaxDurationMillis * 1_000_000L;
+        // Observability-only counter for the transient all-replica window;
+        // never consulted for escalation (Invariant B).
+        int roleRejectAttempts = 0;
+        // Throttle the all-replica retry WARN to one per 5s: a real failover
+        // window can last minutes and (Invariant B) is retried indefinitely, so
+        // per-attempt logging would flood. Mirrors CursorWebSocketSendLoop.
+        long lastReplicaWarnNanos = 0L;
+        long lastTransportWarnNanos = 0L;
         while (!stopRequested) {
+            // True only for a genuine durable-ack CAPABILITY gap, which is
+            // bounded by the settle budget / attempt cap. A transient all-replica
+            // failover window (role reject) is retried indefinitely under
+            // Invariant B and leaves this false, so its backoff is never clamped
+            // to the deadline (which would otherwise busy-loop once past it).
+            boolean boundedByBudget = false;
             try {
                 return clientFactory.reconnect();
+            } catch (QwpAuthFailedException | WebSocketUpgradeException e) {
+                // Genuinely non-retriable across the cluster (auth 401/403, or a
+                // non-421 upgrade reject): waiting will not fix it, so quarantine
+                // immediately -- exactly as the live sender's background loop
+                // (CursorWebSocketSendLoop.connectLoop) halts on these errors.
+                String msg = e.getMessage();
+                LOG.error("drainer terminal upgrade/auth error for slot {}: {}", slotPath, msg);
+                lastErrorMessage = msg;
+                OrphanScanner.markFailed(slotPath, "auth/upgrade: " + msg);
+                outcome = DrainOutcome.FAILED;
+                return null;
+            } catch (QwpRoleMismatchException | QwpIngressRoleRejectedException e) {
+                // INVARIANT B: every reachable endpoint is a REPLICA right now.
+                // A replica is promotable and a primary will reappear, so this is
+                // a TRANSIENT failover window, NOT a capability gap. The drainer
+                // must keep retrying (capped backoff) until a primary is reachable,
+                // stopRequested, or SF exhaustion -- it must NEVER quarantine the
+                // slot on a wall-clock budget or an attempt cap. Surface the
+                // per-attempt observability callback, then back off and retry.
+                roleRejectAttempts++;
+                // Topology is mid-churn: whatever node produced any earlier
+                // capability-gap errors is no longer the primary the next
+                // sweep hits, so the gap episode (attempts + wall clock)
+                // restarts and the next gap gets the full settle budget.
+                capabilityGapAttempts = 0;
+                capabilityGapElapsedNanos = 0L;
+                lastCapabilityGapNanos = 0L;
+                BackgroundDrainerListener l = listener;
+                if (l != null) {
+                    try {
+                        l.onPrimaryUnavailable(slotPath, roleRejectAttempts);
+                    } catch (Throwable cb) {
+                        LOG.warn("drainer listener onPrimaryUnavailable threw: {}",
+                                cb.getMessage());
+                    }
+                }
+                long nowWarn = System.nanoTime();
+                if (nowWarn - lastReplicaWarnNanos >= 5_000_000_000L) {
+                    LOG.warn("drainer slot {} attempt {}: all endpoints are replicas "
+                            + "(transient failover window), retrying after backoff",
+                            slotPath, roleRejectAttempts);
+                    lastReplicaWarnNanos = nowWarn;
+                }
             } catch (QwpDurableAckMismatchException e) {
-                mismatchAttempts++;
+                // Genuine cluster-wide durable-ack CAPABILITY gap: a server
+                // upgraded but does not advertise durable ack. Unlike a role
+                // reject this will not clear by waiting for a promotion, so it
+                // stays terminal for the drainer -- give the cluster a bounded
+                // settle budget (rolling upgrade), then quarantine the slot.
+                capabilityGapAttempts++;
                 long now = System.nanoTime();
-                long elapsedMs = (now - startNanos) / 1_000_000L;
-                boolean exhausted = mismatchAttempts >= DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS
-                        || now >= deadlineNanos;
+                if (lastCapabilityGapNanos != 0L) {
+                    // Charge only the interval since the PREVIOUS gap sweep,
+                    // and only when no transient error interrupted it. Time
+                    // spent in a transient window -- before the first gap or
+                    // between two gaps -- is never charged to the episode.
+                    capabilityGapElapsedNanos += now - lastCapabilityGapNanos;
+                }
+                lastCapabilityGapNanos = now;
+                long elapsedMs = capabilityGapElapsedNanos / 1_000_000L;
+                boolean exhausted = capabilityGapAttempts >= DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS
+                        || capabilityGapElapsedNanos >= capabilityGapBudgetNanos;
                 BackgroundDrainerListener l = listener;
                 if (exhausted) {
                     LOG.error("drainer giving up on slot {} after {} durable-ack-mismatch attempts ({}ms): {}",
-                            slotPath, mismatchAttempts, elapsedMs, e.getMessage());
+                            slotPath, capabilityGapAttempts, elapsedMs, e.getMessage());
                     if (l != null) {
                         try {
-                            l.onDurableAckPersistentFailure(slotPath, mismatchAttempts, elapsedMs);
+                            l.onDurableAckPersistentFailure(slotPath, capabilityGapAttempts, elapsedMs);
                         } catch (Throwable cb) {
                             LOG.warn("drainer listener onDurableAckPersistentFailure threw: {}",
                                     cb.getMessage());
@@ -184,36 +321,95 @@ public WebSocketClient connectWithDurableAckRetry() {
                     lastErrorMessage = e.getMessage();
                     OrphanScanner.markFailed(slotPath,
                             "durable-ack persistently unavailable after "
-                                    + mismatchAttempts + " attempts: " + e.getMessage());
+                                    + capabilityGapAttempts + " attempts: " + e.getMessage());
                     outcome = DrainOutcome.FAILED;
                     return null;
                 }
+                boundedByBudget = true;
                 if (l != null) {
                     try {
-                        l.onDurableAckUnavailable(slotPath, mismatchAttempts);
+                        l.onDurableAckUnavailable(slotPath, capabilityGapAttempts);
                     } catch (Throwable cb) {
                         LOG.warn("drainer listener onDurableAckUnavailable threw: {}",
                                 cb.getMessage());
                     }
                 }
                 LOG.warn("drainer slot {} attempt {}: durable-ack unavailable, retrying after backoff",
-                        slotPath, mismatchAttempts);
+                        slotPath, capabilityGapAttempts);
             } catch (Throwable t) {
-                String msg = t.getMessage();
-                LOG.error("drainer initial connect failed for slot {}: {}", slotPath, msg);
-                lastErrorMessage = msg;
-                OrphanScanner.markFailed(slotPath, "initial connect: " + msg);
-                outcome = DrainOutcome.FAILED;
-                return null;
+                if (t instanceof Error) {
+                    // java.lang.Error (OOM, LinkageError, StackOverflowError)
+                    // is a JVM/programming failure, not a transport outage:
+                    // retrying cannot clear it, and spinning here would pin
+                    // the slot .lock forever with no .failed sentinel and only
+                    // a throttled, possibly-null-message WARN as a trace.
+                    // Rethrow: run()'s outer catch quarantines the slot
+                    // (markFailed + FAILED) and its finally releases the lock
+                    // -- quarantine-and-exit, exactly as genuine terminals do.
+                    throw (Error) t;
+                }
+                // INVARIANT B: a transport failure -- the whole cluster is
+                // unreachable right now (server down, network partition) -- is
+                // TRANSIENT, exactly as the live sender's background loop treats
+                // it. The server will come back; keep retrying (capped backoff)
+                // until it does, stopRequested, or SF exhaustion. NEVER quarantine
+                // the slot on a transport error. Genuine terminals (auth /
+                // non-421 upgrade / durable-ack capability gap) are handled by the
+                // catches above and still fail fast. A QWP version mismatch also
+                // reaches here (it extends HttpClientException, not
+                // WebSocketUpgradeException) and is intentionally retried under
+                // Invariant B -- but it is NOT a transport outage, so log it
+                // truthfully below rather than mislabelling it "cluster unreachable".
+                lastErrorMessage = t.getMessage();
+                // Pause the episode wall clock: the gap-to-gap interval this
+                // window interrupts is never charged. Attempts and elapsed
+                // already accumulated are preserved (anti-evasion: see the
+                // budget comment above).
+                lastCapabilityGapNanos = 0L;
+                long nowWarn = System.nanoTime();
+                if (nowWarn - lastTransportWarnNanos >= 5_000_000_000L) {
+                    if (t instanceof QwpVersionMismatchException) {
+                        // The cluster IS reachable: every endpoint completed the
+                        // WebSocket upgrade but advertised a QWP protocol version
+                        // this client cannot speak. A rolling upgrade clears this
+                        // once peers converge, so Invariant B keeps retrying -- but
+                        // if it persists the client binary is version-incompatible
+                        // with the whole cluster and an operator must intervene
+                        // (upgrade the client or the servers). Name the real
+                        // condition so it is diagnosable, not hidden behind a
+                        // network-outage message.
+                        LOG.warn("drainer slot {}: every reachable endpoint advertises an unsupported "
+                                        + "QWP protocol version ({}); retrying (rolling-upgrade window) -- "
+                                        + "if this persists the client is version-incompatible with the cluster",
+                                slotPath, t.getMessage());
+                    } else {
+                        LOG.warn("drainer slot {}: cluster unreachable ({}), retrying after backoff",
+                                slotPath, t.getMessage());
+                    }
+                    lastTransportWarnNanos = nowWarn;
+                }
             }
             // Backoff before the next sweep. Honor stopRequested by parking in
             // small chunks rather than a single long park so close() doesn't
-            // wait for a full sleep to elapse.
+            // wait for a full sleep to elapse. Only the bounded (capability-gap)
+            // path clamps to the remaining budget (the post-gap sleep is charged
+            // to the episode by the next gap sweep) so it escalates promptly once
+            // the accumulated gap-time runs out; the transient failover path
+            // retries indefinitely and just backs off (capped exponential),
+            // never busy-looping past an exhausted budget.
             long jitter = ThreadLocalRandom.current().nextLong(Math.max(1L, backoffMillis));
-            long sleepMillis = Math.min(backoffMillis + jitter,
-                    Math.max(0L, (deadlineNanos - System.nanoTime()) / 1_000_000L));
+            long sleepMillis = backoffMillis + jitter;
+            if (boundedByBudget) {
+                sleepMillis = Math.min(sleepMillis,
+                        Math.max(0L, (capabilityGapBudgetNanos - capabilityGapElapsedNanos) / 1_000_000L));
+            }
             if (sleepMillis > 0L && !stopRequested) {
-                LockSupport.parkNanos(sleepMillis * 1_000_000L);
+                long parkDeadlineNanos = System.nanoTime() + sleepMillis * 1_000_000L;
+                long remaining;
+                while (!stopRequested
+                        && (remaining = parkDeadlineNanos - System.nanoTime()) > 0L) {
+                    LockSupport.parkNanos(Math.min(remaining, STOP_CHECK_PARK_CHUNK_NANOS));
+                }
             }
             backoffMillis = Math.min(backoffMillis * 2L, reconnectMaxBackoffMillis);
         }
@@ -240,10 +436,24 @@ public DrainOutcome outcome() {
 
     public void requestStop() {
         stopRequested = true;
+        // Wake the drainer out of any backoff/poll park immediately so the
+        // pool's bounded stop-grace window is spent unwinding (release slot
+        // lock, close engine), not sleeping out the remainder of a capped
+        // exponential backoff.
+        Thread t = runnerThread;
+        if (t != null) {
+            LockSupport.unpark(t);
+        }
+    }
+
+    /** True once {@link #requestStop()} has been called. */
+    public boolean isStopRequested() {
+        return stopRequested;
     }
 
     @Override
     public void run() {
+        runnerThread = Thread.currentThread();
         CursorSendEngine engine = null;
         WebSocketClient client = null;
         CursorWebSocketSendLoop loop = null;
@@ -278,37 +488,92 @@ public void run() {
                 // already dropped on the FAILED path.
                 return;
             }
-            loop = new CursorWebSocketSendLoop(
-                    client, engine,
-                    0L, CursorWebSocketSendLoop.DEFAULT_PARK_NANOS,
-                    clientFactory,
-                    reconnectMaxDurationMillis,
-                    reconnectInitialBackoffMillis,
-                    reconnectMaxBackoffMillis,
-                    requestDurableAck,
-                    durableAckKeepaliveIntervalMillis);
-            loop.start();
-
+            // One iteration per wire session. Re-entered ONLY when a mid-drain
+            // reconnect sweep hit a durable-ack CAPABILITY gap: that is the
+            // exact rolling-upgrade condition the settle budget in
+            // connectWithDurableAckRetry() exists for, so it must not
+            // quarantine on the first sweep, just as the initial-connect path
+            // never does. The engine stays alive across sessions (it holds the
+            // slot lock; only loop + client are recycled), and target remains
+            // valid -- the slot is orphaned, nothing appends to it.
+            drain:
             while (!stopRequested) {
-                long acked = engine.ackedFsn();
-                this.ackedFsn = acked;
-                if (acked >= target) {
-                    outcome = DrainOutcome.SUCCESS;
-                    LOG.info("drainer fully drained slot {} (target={}, acked={})",
-                            slotPath, target, acked);
-                    return;
-                }
-                try {
-                    loop.checkError();
-                } catch (Throwable t) {
-                    String msg = t.getMessage();
-                    LOG.error("drainer wire error for slot {}: {}", slotPath, msg);
-                    lastErrorMessage = msg;
-                    OrphanScanner.markFailed(slotPath, "wire: " + msg);
-                    outcome = DrainOutcome.FAILED;
-                    return;
+                loop = new CursorWebSocketSendLoop(
+                        client, engine,
+                        0L, CursorWebSocketSendLoop.DEFAULT_PARK_NANOS,
+                        clientFactory,
+                        reconnectMaxDurationMillis,
+                        reconnectInitialBackoffMillis,
+                        reconnectMaxBackoffMillis,
+                        requestDurableAck,
+                        durableAckKeepaliveIntervalMillis);
+                loop.start();
+
+                while (!stopRequested) {
+                    long acked = engine.ackedFsn();
+                    this.ackedFsn = acked;
+                    if (acked >= target) {
+                        outcome = DrainOutcome.SUCCESS;
+                        LOG.info("drainer fully drained slot {} (target={}, acked={})",
+                                slotPath, target, acked);
+                        return;
+                    }
+                    try {
+                        loop.checkError();
+                    } catch (Throwable t) {
+                        if (loop.capabilityGapTerminal() != null) {
+                            // Capability gap mid-drain: recycle the wire, NOT
+                            // the slot. connectWithDurableAckRetry() owns the
+                            // episode budget (16 consecutive gap sweeps /
+                            // wall clock) and drops the sentinel itself if the
+                            // gap persists. The loop's own failed sweep is not
+                            // counted toward the fresh episode -- an off-by-one
+                            // that is immaterial at budget 16.
+                            LOG.warn("drainer slot {}: durable-ack capability gap "
+                                            + "mid-drain ({}), re-entering settle budget",
+                                    slotPath, t.getMessage());
+                            try {
+                                loop.close();
+                            } catch (Throwable closeFailure) {
+                                // Interrupted shutdown mid-recycle (pool
+                                // shutdownNow): the old I/O thread is still
+                                // alive, so opening a new wire session against
+                                // the same engine would race its exit — and
+                                // closing the client under a possibly mid-send
+                                // thread risks SEGV. Bail out; the finally
+                                // re-runs loop.close(), which re-signals the
+                                // failed stop and routes client/engine
+                                // teardown to the delegation protocol there.
+                                LOG.warn("drainer slot {}: stop requested mid-recycle and the "
+                                                + "I/O thread did not stop ({}); abandoning recycle",
+                                        slotPath, closeFailure.getMessage());
+                                outcome = stopRequested ? DrainOutcome.STOPPED : DrainOutcome.FAILED;
+                                return;
+                            }
+                            try {
+                                client.close();
+                            } catch (Throwable ignored) {
+                            }
+                            loop = null;
+                            client = connectWithDurableAckRetry();
+                            if (client == null) {
+                                // outcome already set (FAILED after budget
+                                // exhaustion, or STOPPED); sentinel handled.
+                                return;
+                            }
+                            continue drain;
+                        }
+                        String msg = t.getMessage();
+                        LOG.error("drainer wire error for slot {}: {}", slotPath, msg);
+                        lastErrorMessage = msg;
+                        OrphanScanner.markFailed(slotPath, "wire: " + msg);
+                        outcome = DrainOutcome.FAILED;
+                        return;
+                    }
+                    LockSupport.parkNanos(POLL_NANOS);
                 }
-                java.util.concurrent.locks.LockSupport.parkNanos(POLL_NANOS);
+                // Inner loop exits only on stopRequested; fall through to the
+                // outer condition, which is false for the same reason.
             }
             outcome = DrainOutcome.STOPPED;
         } catch (Throwable t) {
@@ -333,25 +598,56 @@ public void run() {
             }
             outcome = DrainOutcome.FAILED;
         } finally {
+            boolean ioThreadStopped = true;
             if (loop != null) {
                 try {
                     loop.close();
-                } catch (Throwable ignored) {
+                } catch (Throwable e) {
+                    // The loop's I/O thread would not stop — close() was
+                    // interrupted (the pool's shutdownNow path) while the
+                    // thread sat in a blocking native connect/send that
+                    // neither unpark nor interrupt cancels. Freeing the
+                    // client's buffers or unmapping the engine now would
+                    // race the live thread (C5 SEGV); both are delegated to
+                    // the thread's own exit path below.
+                    ioThreadStopped = false;
+                    LOG.warn("drainer slot {}: I/O thread did not stop during close ({}); "
+                                    + "delegating client/engine teardown to its exit path",
+                            slotPath, e.getMessage());
                 }
             }
-            if (client != null) {
+            if (client != null && ioThreadStopped) {
+                // Skipped on a failed stop: the thread may be mid-send on
+                // this very client; ioLoop's finally closes the loop's
+                // current client (this one, unless a reconnect swapped it —
+                // in which case swapClient already closed this reference).
                 try {
                     client.close();
                 } catch (Throwable ignored) {
                 }
             }
             if (engine != null) {
-                try {
-                    // engine.close() releases the slot lock too.
-                    engine.close();
-                } catch (Throwable ignored) {
+                // Failed-stop hand-off: delegateEngineClose() makes the I/O
+                // thread run engine.close() strictly after its last engine
+                // access, releasing the slot lock as soon as the stuck wire
+                // call resolves — deferred teardown, never abandoned. The
+                // false return covers the race where the thread exited
+                // between the failed close() and now: then it is safe (and
+                // necessary) to close the engine here.
+                if (ioThreadStopped || !loop.delegateEngineClose()) {
+                    try {
+                        // engine.close() releases the slot lock too.
+                        engine.close();
+                    } catch (Throwable ignored) {
+                    }
+                } else {
+                    LOG.warn("drainer slot {}: engine close delegated to the I/O thread; "
+                            + "slot lock releases when it exits", slotPath);
                 }
             }
+            // Don't let a later requestStop() unpark an unrelated task that
+            // the pool's executor may have scheduled onto this same thread.
+            runnerThread = null;
         }
     }
 
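The settle-budget comments in `connectWithDurableAckRetry()` describe three rules: a capability-gap sweep charges only the interval since the previous gap sweep, a role reject restarts the episode, and a transport error pauses the wall clock without touching the attempt count. The sketch below models only that accounting so the rules can be checked in isolation. `GapBudget` and its method names are hypothetical and do not exist in the PR; the constructor arguments stand in for `DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS` and `capabilityGapBudgetNanos`.

```java
/** Hypothetical stand-alone model of the settle-budget accounting; not a class from the PR. */
final class GapBudget {
    private final int maxAttempts;   // stands in for DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS
    private final long budgetNanos;  // stands in for capabilityGapBudgetNanos
    private int attempts;
    private long elapsedNanos;
    // 0 means "the next gap charges nothing", as in the diff. Like the diff,
    // this assumes a real timestamp is never exactly 0.
    private long lastGapNanos;

    GapBudget(int maxAttempts, long budgetNanos) {
        this.maxAttempts = maxAttempts;
        this.budgetNanos = budgetNanos;
    }

    /** Records a capability-gap sweep; returns true once the episode is exhausted. */
    boolean onGap(long nowNanos) {
        attempts++;
        if (lastGapNanos != 0L) {
            // Charge only an uninterrupted gap-to-gap interval.
            elapsedNanos += nowNanos - lastGapNanos;
        }
        lastGapNanos = nowNanos;
        return attempts >= maxAttempts || elapsedNanos >= budgetNanos;
    }

    /** Role reject: topology churn, so the whole episode restarts. */
    void onRoleReject() {
        attempts = 0;
        elapsedNanos = 0L;
        lastGapNanos = 0L;
    }

    /** Transport error: pause the clock, keep attempts and elapsed time. */
    void onTransportError() {
        lastGapNanos = 0L;
    }

    int attempts() {
        return attempts;
    }

    long elapsedNanos() {
        return elapsedNanos;
    }
}
```

With a 1,000 ns budget, gap sweeps at 100 and 400 charge 300 ns; a transport error followed by a gap at 5,000 charges nothing more but still counts as the third attempt, which is the anti-evasion property the comments call out.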
diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerListener.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerListener.java
index e5a298b9..55d1ae73 100644
--- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerListener.java
+++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerListener.java
@@ -43,27 +43,56 @@ public interface BackgroundDrainerListener {
 
     /**
      * Fired when the drainer has retried past its budget on consecutive
-     * durable-ack-unavailable failures. The drainer drops a {@code .failed}
-     * sentinel and exits. Treat as cluster-wide misconfiguration and
-     * surface to operators.
+     * durable-ack capability-gap failures. The drainer drops a
+     * {@code .failed} sentinel and exits. Treat as cluster-wide
+     * misconfiguration and surface to operators.
      *
      * @param slotPath      slot the drainer was processing
-     * @param totalAttempts how many connect attempts hit the same failure
-     * @param elapsedMillis wall time spent on this failure mode
+     * @param totalAttempts capability-gap attempts in the final episode;
+     *                      transient sweeps (role reject, transport) are
+     *                      never counted
+     * @param elapsedMillis time accumulated over the final episode's uninterrupted
+     *                      gap-to-gap intervals (transient windows are not charged)
      */
     void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis);
 
     /**
-     * Fired when {@code clientFactory.reconnect()} threw
-     * {@code QwpDurableAckMismatchException} — i.e. every endpoint in the
-     * current sweep failed to advertise durable ack. The drainer will
-     * back off and retry; this callback is purely observability. Source
-     * data stays pinned regardless because the loop runs in
+     * Fired when a connect sweep hit a genuine durable-ack capability gap
+     * ({@code QwpDurableAckMismatchException}: an endpoint upgrades but does
+     * not advertise durable ack). The drainer will back off and retry within
+     * its settle budget; this callback is purely observability. Source data
+     * stays pinned regardless because the loop runs in
      * {@code durableAckMode=true} and only trims on STATUS_DURABLE_ACK.
+     * A transient all-replica failover window (role reject) never fires this
+     * callback — it is surfaced through {@link #onPrimaryUnavailable}.
      *
      * @param slotPath      slot the drainer is processing
-     * @param attemptNumber 1-based count of consecutive durable-ack-unavailable
-     *                      failures for this drainer
+     * @param attemptNumber 1-based attempt number within the current
+     *                      capability-gap EPISODE. The counter restarts when
+     *                      an intervening role reject resets the episode —
+     *                      topology churn grants the next gap a fresh settle
+     *                      budget, which is correct behavior — and with the
+     *                      streams separated the reset's cause is visible as
+     *                      an {@link #onPrimaryUnavailable} delivery between
+     *                      the two episodes
      */
     void onDurableAckUnavailable(String slotPath, int attemptNumber);
+
+    /**
+     * Fired when a connect sweep found every reachable endpoint to be a
+     * REPLICA — a transient all-replica failover window (role reject). A
+     * replica is promotable and a primary will reappear, so the drainer
+     * retries indefinitely under Invariant B: this condition NEVER escalates
+     * and is never followed by {@link #onDurableAckPersistentFailure}. Runs
+     * on the drainer thread; implementations must not block. The no-op
+     * default keeps every implementor of the released 1.3.4 contract source-
+     * and binary-compatible.
+     *
+     * @param slotPath      slot the drainer is processing
+     * @param attemptNumber 1-based running role-reject count within the
+     *                      current connect loop (resets across connect
+     *                      re-entries)
+     */
+    default void onPrimaryUnavailable(String slotPath, int attemptNumber) {
+    }
 }
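The Javadoc above separates two event streams: capability-gap retries (`onDurableAckUnavailable`, escalating to `onDurableAckPersistentFailure`) and transient failover retries (`onPrimaryUnavailable`, which never escalate). It also says the callbacks run on the drainer thread and must not block. A minimal sketch of a listener that respects both points follows. `DrainerEvents` is a hypothetical local stand-in that copies only the three callback signatures shown in this diff, so that the example compiles on its own; `CountingDrainerEvents` and its fields are likewise illustrative names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in mirroring the three callback signatures shown in the diff. */
interface DrainerEvents {
    void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis);

    void onDurableAckUnavailable(String slotPath, int attemptNumber);

    default void onPrimaryUnavailable(String slotPath, int attemptNumber) {
    }
}

/** Records events with lock-free counters so the calling thread is never blocked. */
final class CountingDrainerEvents implements DrainerEvents {
    final AtomicInteger gapRetries = new AtomicInteger();
    final AtomicInteger failoverRetries = new AtomicInteger();
    final AtomicReference<String> quarantinedSlot = new AtomicReference<>();

    @Override
    public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) {
        // Terminal: the slot was quarantined; remember it for an operator alert.
        quarantinedSlot.set(slotPath);
    }

    @Override
    public void onDurableAckUnavailable(String slotPath, int attemptNumber) {
        // Capability-gap retry inside the settle budget.
        gapRetries.incrementAndGet();
    }

    @Override
    public void onPrimaryUnavailable(String slotPath, int attemptNumber) {
        // Transient all-replica window; retried indefinitely, never terminal.
        failoverRetries.incrementAndGet();
    }
}
```

Keeping the two counters apart lets a dashboard distinguish a long failover window from a cluster that genuinely lacks durable ack, which is the diagnosis the separated callbacks are meant to support.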
diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerPool.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerPool.java
index 458a4b9a..c3ca5dee 100644
--- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerPool.java
+++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainerPool.java
@@ -46,10 +46,15 @@
  * (no orphans submitted) costs one core thread; submitted-and-finished
  * drainers are GC'd after they complete.
  * <p>
- * Closing the pool requests every still-running drainer to stop and
- * waits up to a few seconds for them to exit cleanly. Drainers that
- * don't exit in time are left to finish on their own — the pool's
- * underlying executor uses daemon threads so they don't block JVM exit.
+ * Closing the pool uses a split stop policy: drainers that never started
+ * draining (still inside their connect-retry loop — e.g. the cluster is
+ * unreachable) are stop-signaled immediately, because no grace window can
+ * help them finish; drainers actively replaying frames get a graceful
+ * window to reach {@code acked >= target} before being signaled. Drainers
+ * that don't exit in time (typically parked in a blocking native connect
+ * that neither unpark nor interrupt cancels) are left to finish on their
+ * own — the pool's underlying executor uses daemon threads so they don't
+ * block JVM exit.
  */
 public final class BackgroundDrainerPool implements QuietCloseable {
 
@@ -66,9 +71,12 @@ public final class BackgroundDrainerPool implements QuietCloseable {
     // either lands before close (and close waits for it to finish) or
     // sees the closed bit and throws.
     private static final int CLOSED_BIT = Integer.MIN_VALUE;
-    // Time we let drainers finish their drain naturally before signaling
-    // stop. awaitTermination returns as soon as the last drainer exits,
-    // so this only matters when something is genuinely stuck.
+    // Time we let ACTIVELY DRAINING drainers finish naturally before
+    // signaling stop. Connect-phase drainers are stop-signaled before this
+    // window even starts (see close()), so during an outage — when no
+    // drainer can be draining — close() does not pay this in full.
+    // awaitTermination returns as soon as the last drainer exits, so this
+    // only matters when something is genuinely stuck.
     private static final long GRACEFUL_DRAIN_MILLIS = 2_500L;
     private static final Logger LOG = LoggerFactory.getLogger(BackgroundDrainerPool.class);
     // After signaling stop, give drainers a brief window to unwind cleanly
@@ -125,11 +133,33 @@ public void close() {
         while (state.get() != CLOSED_BIT) {
             Compat.onSpinWait();
         }
-        // Reject new tasks but let in-flight drainers finish their drain
-        // naturally. Without this grace window a drainer that's seconds
-        // away from acked >= target gets requestStop()'d and exits as
-        // STOPPED — its engine.close() then sees fullyDrained=false and
-        // leaves the slot's .sfa files behind, defeating drain_orphans.
+        // Split stop policy. The graceful window below exists so a drainer
+        // that is seconds away from acked >= target is not cut down
+        // mid-drain (its engine.close() would see fullyDrained=false and
+        // leave the slot's .sfa files behind, defeating drain_orphans). A
+        // drainer that never started draining — still inside its
+        // connect-retry loop, e.g. the cluster is unreachable and
+        // Invariant B retries forever — cannot possibly use that window
+        // productively, so stop it NOW: it wakes from its backoff park
+        // within ~50ms (STOP_CHECK_PARK_CHUNK_NANOS) and exits as STOPPED,
+        // cutting close() latency during an outage from
+        // GRACEFUL_DRAIN_MILLIS + STOP_GRACE_MILLIS (~3s) to roughly one
+        // stop-check park chunk. ackedFsn stays -1 until the drain loop's
+        // first poll, so `< 0` discriminates "never connected/started
+        // draining" from "actively draining"; the brief race with a
+        // just-connected drainer is benign — it exits as STOPPED and the
+        // slot is re-adopted by the next scan. A drainer parked inside a
+        // blocking native connect ignores the stop until its background
+        // connect deadline resolves; that one still burns the full grace +
+        // stop windows below and is then abandoned to exit on its own
+        // (daemon thread).
+        for (BackgroundDrainer d : active) {
+            if (d.outcome() == BackgroundDrainer.DrainOutcome.PENDING && d.getAckedFsn() < 0) {
+                d.requestStop();
+            }
+        }
+        // Reject new tasks but let actively-draining drainers finish
+        // naturally.
         executor.shutdown();
         try {
             if (!executor.awaitTermination(GRACEFUL_DRAIN_MILLIS, TimeUnit.MILLISECONDS)) {
diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java
index 94322e9c..8d0f71e5 100644
--- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java
+++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java
@@ -145,15 +145,30 @@ private CursorSendEngine(String sfDir, long segmentSizeBytes, SegmentManager man
         boolean memoryMode = sfDir == null;
         SlotLock acquiredLock = null;
         if (!memoryMode) {
-            if (sfDir.isEmpty()) {
-                throw new IllegalArgumentException("sfDir must not be empty");
+            try {
+                if (sfDir.isEmpty()) {
+                    throw new IllegalArgumentException("sfDir must not be empty");
+                }
+                // Acquire the slot lock BEFORE we touch any *.sfa files. Two
+                // engines pointed at the same slot would otherwise race on
+                // recovery and create overlapping FSN ranges. SlotLock.acquire
+                // also creates the slot dir if it doesn't exist yet — no
+                // separate mkdir step needed here.
+                acquiredLock = SlotLock.acquire(sfDir);
+            } catch (Throwable t) {
+                // The delegating constructors evaluate `new SegmentManager(...)`
+                // BEFORE this body runs, so on a throw ahead of the main try
+                // below (e.g. slot lock collision) an owned manager is already
+                // alive and would leak its native path-scratch sink -- 256 bytes
+                // per failed construction attempt. Close it before propagating.
+                if (ownsManager) {
+                    try {
+                        manager.close();
+                    } catch (Throwable ignored) {
+                    }
+                }
+                throw t;
             }
-            // Acquire the slot lock BEFORE we touch any *.sfa files. Two
-            // engines pointed at the same slot would otherwise race on
-            // recovery and create overlapping FSN ranges. SlotLock.acquire
-            // also creates the slot dir if it doesn't exist yet — no
-            // separate mkdir step needed here.
-            acquiredLock = SlotLock.acquire(sfDir);
         }
         this.slotLock = acquiredLock;
         this.sfDir = sfDir;
@@ -168,7 +183,6 @@ private CursorSendEngine(String sfDir, long segmentSizeBytes, SegmentManager man
         // reference instead of orphaning the mmap'd segments + fds.
         SegmentRing ringInProgress = null;
         AckWatermark watermarkInProgress = null;
-        boolean managerStarted = false;
         try {
             // Disk mode: try to recover any *.sfa files left behind by a prior
             // session before deciding to start fresh. Without this the engine
@@ -277,7 +291,6 @@ private CursorSendEngine(String sfDir, long segmentSizeBytes, SegmentManager man
 
             if (ownsManager) {
                 manager.start();
-                managerStarted = true;
             }
             manager.register(ringInProgress, sfDir, watermarkInProgress);
             // All construction succeeded — commit the ring and
@@ -288,7 +301,10 @@ private CursorSendEngine(String sfDir, long segmentSizeBytes, SegmentManager man
             // Stop an owned manager before freeing the ring and watermark it may
             // touch, then release the slot lock. Each cleanup is in its own
             // try/catch so a single failure doesn't strand later cleanups.
-            if (ownsManager && managerStarted) {
+            // Closing an owned-but-never-started manager is safe (no worker to
+            // join) and required: skipping it leaked the manager's native
+            // path-scratch sink whenever construction failed before start().
+            if (ownsManager) {
                 try {
                     manager.close();
                 } catch (Throwable ignored) {
diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java
index 2003aa08..94f929f1 100644
--- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java
+++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java
@@ -35,6 +35,7 @@
 import io.questdb.client.cutlass.qwp.client.QwpDurableAckMismatchException;
 import io.questdb.client.cutlass.qwp.client.QwpIngressRoleRejectedException;
 import io.questdb.client.cutlass.qwp.client.QwpRoleMismatchException;
+import io.questdb.client.cutlass.qwp.client.QwpVersionMismatchException;
 import io.questdb.client.cutlass.qwp.client.WebSocketResponse;
 import io.questdb.client.cutlass.qwp.websocket.WebSocketCloseCode;
 import io.questdb.client.std.CharSequenceLongHashMap;
@@ -61,10 +62,11 @@
  *       cumulative wire sequence {@code N}, calls
  *       {@code engine.acknowledge(fsnAtZero + N)} so the segment manager
  *       can trim fully-acked segments.</li>
- *   <li>On wire failure, runs the configured reconnect policy: backoff
- *       with jitter up to {@code reconnect_max_duration_millis}, with
- *       auth-style failures (401/403/non-101 upgrade reject) treated as
- *       terminal. On reconnect success, repositions the cursor at
+ *   <li>On wire failure, runs the configured reconnect policy: capped
+ *       exponential backoff with jitter, retried indefinitely (Invariant B --
+ *       a store-and-forward drainer never gives up on a wall-clock budget),
+ *       with only auth-style failures (401/403/non-101 upgrade reject) treated
+ *       as terminal. On reconnect success, repositions the cursor at
  *       {@code ackedFsn+1} and replays.</li>
  * </ol>
  * No locks on the steady-state path. The producer thread (user) writes
@@ -140,6 +142,11 @@ public final class CursorWebSocketSendLoop implements QuietCloseable {
     private final ReconnectFactory reconnectFactory;
     private final long reconnectInitialBackoffMillis;
     private final long reconnectMaxBackoffMillis;
+    // Retained for constructor symmetry and passed in by callers, but NOT
+    // consulted by the background loop: Invariant B removed the wall-clock
+    // give-up from connectLoop. The budget still bounds the blocking (non-lazy)
+    // initial connect via QwpWebSocketSender -> connectWithRetry, which takes it
+    // as an explicit argument rather than reading this field.
     private final long reconnectMaxDurationMillis;
     private final WebSocketResponse response = new WebSocketResponse();
     private final ResponseHandler responseHandler = new ResponseHandler();
@@ -171,11 +178,11 @@ public final class CursorWebSocketSendLoop implements QuietCloseable {
     // alike) is offered to the dispatcher for async delivery to the user's
     // handler. Null disables async delivery entirely; the producer-side
     // typed-throw path is unaffected.
-    // Optional: when non-null, RECONNECT_BUDGET_EXHAUSTED is offered to the
-    // dispatcher for async delivery to the user's listener at the moment
-    // connectLoop gives up. Sender-side fire points (CONNECTED, FAILED_OVER,
-    // ENDPOINT_ATTEMPT_FAILED, AUTH_FAILED, ALL_ENDPOINTS_UNREACHABLE) write
-    // directly to the same dispatcher from QwpWebSocketSender.
+    // Optional: when non-null, sender-side connection events (CONNECTED,
+    // FAILED_OVER, ENDPOINT_ATTEMPT_FAILED, AUTH_FAILED, ALL_ENDPOINTS_UNREACHABLE)
+    // are written to this dispatcher from QwpWebSocketSender. connectLoop itself
+    // no longer emits a terminal budget-exhaustion event (Invariant B: it retries
+    // indefinitely and never gives up on a wall-clock budget).
     private volatile SenderConnectionDispatcher connectionDispatcher;
     private volatile SenderErrorDispatcher errorDispatcher;
     // The send cursor has two coordinate systems:
@@ -194,11 +201,28 @@ public final class CursorWebSocketSendLoop implements QuietCloseable {
     // Sticky flag: false until the very first time a live client is installed
     // (either via the constructor in SYNC/OFF mode or via swapClient on a
     // successful connect attempt in any mode). Once true, stays true. Used to
-    // distinguish "never reached the server" budget exhaustion (looks like a
+    // distinguish a "never reached the server" terminal failure (looks like a
     // config typo or firewall block) from "lost connection after we were
     // up" (looks transient).
     private volatile boolean hasEverConnected;
     private volatile Thread ioThread;
+    // Typed marker for a durable-ack CAPABILITY-GAP terminal: set (before the
+    // terminalError latch, so a checkError() caller that observes the latch is
+    // guaranteed to observe this marker too) when a reconnect sweep threw
+    // QwpDurableAckMismatchException. The orphan drainer consults it to route
+    // a mid-drain capability gap into its budgeted settle-retry
+    // (BackgroundDrainer.connectWithDurableAckRetry) instead of quarantining
+    // the slot on the first sweep; the foreground sender ignores it and keeps
+    // its spec'd loud-fail (sf-client.md section 8.1). Write-once alongside
+    // terminalError: the only writer runs on the I/O thread under the same
+    // first-writer-wins latch.
+    private volatile QwpDurableAckMismatchException capabilityGapTerminal;
+    // Failed-stop hand-off flag: set by delegateEngineClose() when an owner's
+    // close() could not stop the I/O thread and the engine close is therefore
+    // performed by the I/O thread's exit path. Write-once, owner thread only;
+    // read by the I/O thread strictly after its shutdown-latch countdown (see
+    // the handshake contract on delegateEngineClose).
+    private volatile boolean engineCloseDelegated;
     // The latched terminal failure — THE exception every checkError() call
     // rethrows. Write-once for the loop's lifetime: the only writer is
     // recordFatal on the I/O thread (first-writer-wins). The whole
@@ -249,9 +273,10 @@ public final class CursorWebSocketSendLoop implements QuietCloseable {
      * {@code client} may be {@code null} only if {@code reconnectFactory}
      * is non-null — this is the async-initial-connect path: the I/O thread
      * runs the same retry loop on its first iteration to obtain a live
-     * client, and a terminal failure (auth/upgrade reject or budget
-     * exhaustion) is delivered through the dispatcher rather than thrown
-     * to the constructor's caller.
+     * client, and a terminal failure (auth/upgrade reject) is delivered
+     * through the dispatcher rather than thrown to the constructor's
+     * caller; plain connect failures are retried indefinitely
+     * (Invariant B: no wall-clock budget give-up).
      */
     public CursorWebSocketSendLoop(WebSocketClient client, CursorSendEngine engine,
                                    long fsnAtZero, long parkNanos,
@@ -349,11 +374,13 @@ public static SenderError.Category classify(byte status) {
     }
 
     /**
-     * Same retry-with-exponential-backoff-and-jitter loop the I/O thread
-     * uses on a wire failure, but reusable from {@code ensureConnected} to
-     * implement {@code initial_connect_retry=true}. Returns the connected
-     * client on success; throws on terminal upgrade error (won't retry) or
-     * budget exhaustion.
+     * Same exponential-backoff-with-jitter machinery as the I/O thread's
+     * {@code connectLoop}, but reusable from {@code ensureConnected} to
+     * implement {@code initial_connect_retry=true}. Unlike {@code connectLoop}
+     * (which retries indefinitely under Invariant B), this blocking variant
+     * IS bounded by {@code maxDurationMillis}: it returns the connected
+     * client on success and throws on terminal upgrade error (won't retry)
+     * or budget exhaustion.
      * <p>
      * Caller-supplied {@code factory} is invoked once per attempt and
      * should produce a fresh, connected, upgraded client (or throw). The
@@ -399,11 +426,28 @@ public static WebSocketClient connectWithRetry(
                         contextLabel, e.getMessage());
                 throw e;
             } catch (Throwable e) {
+                if (e instanceof Error) {
+                    // JVM/programming failure (OOM, LinkageError): not a
+                    // transport outage, retrying cannot clear it. Propagate
+                    // to the caller instead of burning the connect budget.
+                    throw (Error) e;
+                }
                 lastError = e;
                 long now = System.nanoTime();
                 if (now - lastLogNanos >= RECONNECT_LOG_THROTTLE_NANOS) {
-                    LOG.warn("{} attempt {} failed: {}",
-                            contextLabel, attempts, e.getMessage());
+                    if (e instanceof QwpVersionMismatchException) {
+                        // Reachable but protocol-incompatible: consumes the connect
+                        // budget (walks the cluster across a rolling-upgrade window)
+                        // and, on exhaustion, surfaces as the terminal
+                        // LineSenderException below. Name the condition so a version
+                        // skew is diagnosable, not read as a generic connect failure.
+                        LOG.warn("{} attempt {}: every reachable endpoint advertises an unsupported "
+                                        + "QWP protocol version ({}); retrying within connect budget",
+                                contextLabel, attempts, e.getMessage());
+                    } else {
+                        LOG.warn("{} attempt {} failed: {}",
+                                contextLabel, attempts, e.getMessage());
+                    }
                     lastLogNanos = now;
                 }
             }
@@ -487,6 +531,23 @@ public void checkError() {
         }
     }
 
+    /**
+     * The typed durable-ack capability-gap terminal, or {@code null} if the
+     * loop's terminal (if any) is a different failure class. Guaranteed
+     * visible once {@link #checkError()} has started throwing: the marker is
+     * written before the {@code terminalError} latch, both on the I/O thread.
+     * <p>
+     * Consumer contract: the orphan drainer ({@code BackgroundDrainer})
+     * checks this after a {@code checkError()} throw to decide between
+     * re-entering its budgeted settle-retry (capability gap: the rolling
+     * upgrade may still settle) and quarantining the slot (every other
+     * terminal). Package-private on purpose -- the foreground sender must
+     * not branch on it (spec'd loud-fail, sf-client.md section 8.1).
+     */
+    QwpDurableAckMismatchException capabilityGapTerminal() {
+        return capabilityGapTerminal;
+    }
+
     /**
      * Safety-net variant of {@link #checkError()} for
      * {@code QwpWebSocketSender.close()}: rethrows the latched terminal error
@@ -524,8 +585,31 @@ public synchronized void close() {
             if (t.isAlive()) {
                 try {
                     shutdownLatch.await();
-                } catch (InterruptedException ignored) {
+                } catch (InterruptedException e) {
+                    // Re-assert the interrupt flag for the caller's stack, then decide.
+                    // If the I/O thread has genuinely not exited (latch still
+                    // up — it may be inside a blocking native connect/send
+                    // that neither unpark nor interrupt cancels), touching the
+                    // client here would free native buffers under a possibly
+                    // mid-send thread, and returning quietly would let the
+                    // owner unmap the engine under it (C5 SEGV). Signal the
+                    // failed stop loudly instead: QwpWebSocketSender.close()
+                    // keys its ioThreadStopped guard on this throw, and
+                    // BackgroundDrainer switches to delegateEngineClose().
+                    // The I/O thread's own exit path (ioLoop's finally)
+                    // disposes of the client either way. ioThread stays set,
+                    // so a duplicate close() re-signals rather than silently
+                    // succeeding against a still-live thread.
                     Thread.currentThread().interrupt();
+                    if (shutdownLatch.getCount() != 0L) {
+                        throw new LineSenderException(
+                                "cursor I/O thread did not stop: close() was interrupted "
+                                        + "while awaiting shutdown; client/engine teardown "
+                                        + "is delegated to the I/O thread's exit path");
+                    }
+                    // Latch hit zero concurrently with the interrupt: the
+                    // thread is past its last client/engine access — proceed
+                    // with normal teardown.
                 }
             }
             ioThread = null;
@@ -534,9 +618,11 @@ public synchronized void close() {
         // replaced the original (and closed it); the owner only retains
         // the stale pre-reconnect reference. Without closing the live
         // client here, its native socket and fds leak past sender.close()
-        // every time the loop reconnected at least once. close() is
-        // idempotent, so the owner's duplicate close on its stale
-        // reference is still safe.
+        // every time the loop reconnected at least once. ioLoop's finally
+        // also closes the current client on I/O-thread exit, so this read
+        // matters chiefly when the loop never started (SYNC construction,
+        // close() before start()) — and doubles as a safety net. close()
+        // is idempotent, so duplicate closes on any path are safe.
         WebSocketClient c = client;
         if (c != null) {
             try {
@@ -548,6 +634,34 @@ public synchronized void close() {
         }
     }
 
+    /**
+     * Failed-stop hand-off for the engine. Called by an owner whose
+     * {@link #close()} threw because the I/O thread would not stop: the owner
+     * must not free the engine (munmap/Unsafe.free of segment memory) while
+     * the thread may still touch it with raw {@code Unsafe} reads. Setting
+     * the delegation flag makes the I/O thread run {@code engine.close()} on
+     * its exit path, strictly after its last engine access and after the
+     * shutdown-latch countdown — releasing the slot lock as soon as the
+     * stuck wire call resolves (bounded by OS timeouts) instead of leaking
+     * the mapping and lock forever.
+     * <p>
+     * Returns {@code true} when the I/O thread is still live and has adopted
+     * the engine close; {@code false} when the thread has already exited —
+     * the caller must close the engine itself.
+     * <p>
+     * Memory model — the classic store/load handshake: this method writes the
+     * volatile flag, then reads the latch count; the exit path counts the
+     * latch down, then reads the flag. Under the sequential consistency of
+     * volatile (and AQS latch state) accesses, if this method observes the
+     * latch still up, the exit path is guaranteed to observe the flag — no
+     * missed close. If both sides act, {@link CursorSendEngine#close()} is
+     * synchronized and idempotent, so the double close is benign.
+     */
+    public boolean delegateEngineClose() {
+        engineCloseDelegated = true;
+        return shutdownLatch.getCount() != 0L;
+    }
+
     /**
      * Typed server-rejection payload of the latched terminal error, or
      * {@code null} when the loop latched a wire-level failure (or nothing).
@@ -647,7 +761,7 @@ public long getTotalServerErrors() {
      * True iff the I/O loop has at least once installed a live (connected
      * + upgraded) WebSocket client. Sticky — once true, stays true even
      * after a subsequent disconnect. Lets a {@code SenderErrorHandler}
-     * disambiguate a "never reached the server" budget exhaustion (likely
+     * disambiguate a "never reached the server" terminal failure (likely
      * a config typo or firewall block) from a "lost connection after we
      * were up" failure (likely transient).
      */
@@ -661,10 +775,10 @@ public boolean isRunning() {
 
     /**
      * Plug an async-delivery sink for {@link SenderConnectionEvent}
-     * notifications. The loop fires {@code RECONNECT_BUDGET_EXHAUSTED}
-     * through this sink when {@code connectLoop} gives up; other connection
-     * events fire from {@code QwpWebSocketSender.buildAndConnect} directly
-     * into the same dispatcher. Same lifecycle contract as
+     * notifications. Connection events fire from
+     * {@code QwpWebSocketSender.buildAndConnect} directly into this dispatcher;
+     * {@code connectLoop} no longer emits a terminal budget-exhaustion event
+     * (Invariant B: it retries indefinitely). Same lifecycle contract as
      * {@link #setErrorDispatcher}.
      */
     public void setConnectionDispatcher(SenderConnectionDispatcher dispatcher) {
@@ -786,8 +900,9 @@ private void applyDurableAck() {
      * Drives the very first connect attempt on the I/O thread, used in the
      * async-initial-connect mode (constructed with {@code client == null}).
      * Reuses the same retry+backoff machinery as {@link #fail(Throwable)} —
-     * a terminal upgrade reject or budget exhaustion is delivered through
-     * the dispatcher, not thrown to the producer.
+     * connect failures are retried indefinitely (Invariant B), and a
+     * terminal upgrade reject is delivered through the dispatcher, not
+     * thrown to the producer.
      */
     private void attemptInitialConnect() {
         connectLoop(new LineSenderException(
@@ -824,17 +939,48 @@ private void connectLoop(Throwable initial, String phase) {
         LOG.warn("cursor I/O loop entering {} loop: {}",
                 phase, initial.getMessage());
         long outageStartNanos = System.nanoTime();
-        long deadlineNanos = outageStartNanos + reconnectMaxDurationMillis * 1_000_000L;
+        // INVARIANT B: a store-and-forward drainer must NEVER terminate on a
+        // wall-clock reconnect budget. A replica-only / all-endpoints-replica
+        // window is TRANSIENT -- a replica gets promoted, a primary reappears --
+        // so this background loop retries for as long as it is running, backing
+        // off between attempts. The ONLY terminal conditions are a genuinely
+        // non-retriable upgrade (auth / non-421 upgrade / durable-ack capability
+        // gap), which return directly below, or the sender being stopped. SF
+        // exhaustion is surfaced to the PRODUCER as append backpressure, never
+        // here. reconnect_max_duration_millis is intentionally NOT consulted: it
+        // bounds only the blocking (non-lazy) initial connect in
+        // QwpWebSocketSender.buildAndConnect, never this background loop.
         long backoffMillis = reconnectInitialBackoffMillis;
         int attempts = 0;
         long lastLogNanos = 0L;
         Throwable lastReconnectError = initial;
-        while (running && System.nanoTime() < deadlineNanos) {
+        while (running) {
             attempts++;
             totalReconnectAttempts.incrementAndGet();
             try {
                 WebSocketClient newClient = reconnectFactory.reconnect();
                 if (newClient != null) {
+                    if (!running) {
+                        // close() ran while this connect attempt was in
+                        // flight. Its latch await may have been interrupted
+                        // (BackgroundDrainerPool.close()'s shutdownNow path)
+                        // and returned already — the owner's teardown,
+                        // including the engine unmap in BackgroundDrainer's
+                        // finally, can be complete. Installing the client now
+                        // would (a) touch engine memory via positionCursorAt
+                        // after a possible unmap and (b) abandon a live socket
+                        // in a loop nothing will revisit: close() has run and
+                        // its read of `client` saw the pre-connect value. The
+                        // attempt owns the client until it is installed, so
+                        // dispose of it here, on the I/O thread, and exit
+                        // through the quiet stopped path below.
+                        try {
+                            newClient.close();
+                        } catch (Throwable ignored) {
+                            // best-effort
+                        }
+                        break;
+                    }
                     swapClient(newClient);
                     totalReconnects.incrementAndGet();
                     long elapsedMs = (System.nanoTime() - outageStartNanos) / 1_000_000L;
@@ -879,6 +1025,13 @@ private void connectLoop(Throwable initial, String phase) {
                 // not SECURITY_ERROR -- this is not an auth failure.
                 LOG.error("durable-ack mismatch during {} -- won't retry: {}",
                         phase, e.getMessage());
+                if (terminalError == null) {
+                    // Mirror recordFatal's first-writer-wins latch: only the
+                    // sweep that owns the terminal may mark the gap, and the
+                    // marker must be visible before the terminalError volatile
+                    // write that checkError() keys on.
+                    capabilityGapTerminal = e;
+                }
                 long fromFsn = engine.ackedFsn() + 1L;
                 long toFsn = Math.max(fromFsn, engine.publishedFsn());
                 SenderError err = new SenderError(
@@ -897,100 +1050,81 @@ private void connectLoop(Throwable initial, String phase) {
                 dispatchError(err);
                 return;
             } catch (QwpRoleMismatchException | QwpIngressRoleRejectedException e) {
-                // Role mismatch: cluster reconfigured during this connect, the
-                // previously-writable endpoint is now read-only. Reset backoff
-                // (don't double on each role reject -- failover usually clears
-                // within seconds) and park for the initial interval before the
-                // next attempt.
-                backoffMillis = reconnectInitialBackoffMillis;
+                // Role mismatch: every reachable endpoint role-rejected the
+                // upgrade -- right now they are all replicas / primary-catchup.
+                // This is a TRANSIENT failover window (a replica is promotable),
+                // so keep retrying with no wall-clock deadline (Invariant B).
+                // Do NOT reset the backoff or pin it at the initial interval:
+                // fall through to the shared capped exponential backoff-with-
+                // jitter block below. Pinning at reconnectInitialBackoffMillis
+                // turned a persistent all-replica window (e.g. an address list
+                // pointing at replicas only, now surfaced here as a retriable
+                // role reject rather than a terminal durable-ack mismatch) into
+                // a fixed ~10/s storm of fresh TLS handshakes -- new
+                // WebSocketClient, new SSLContext, trust-store re-read -- per
+                // endpoint, forever. Growing to reconnectMaxBackoffMillis
+                // mirrors the orphan drainer's role-reject path and honours the
+                // documented capped-exponential-backoff contract.
                 lastReconnectError = e;
-                if (running) {
-                    long remainingNanos = deadlineNanos - System.nanoTime();
-                    if (remainingNanos <= 0L) {
-                        break;
-                    }
-                    long parkNanos = Math.min(reconnectInitialBackoffMillis * 1_000_000L, remainingNanos);
-                    LockSupport.parkNanos(parkNanos);
+                long now = System.nanoTime();
+                if (now - lastLogNanos >= RECONNECT_LOG_THROTTLE_NANOS) {
+                    LOG.warn("{} attempt {}: every reachable endpoint is a replica "
+                                    + "(transient failover window); retrying with capped backoff -- "
+                                    + "if this persists the configured address list may point at replicas only",
+                            phase, attempts);
+                    lastLogNanos = now;
                 }
-                continue;
+                // fall through to the shared capped-backoff block
             } catch (Throwable e) {
+                if (e instanceof Error) {
+                    // JVM/programming failure (OOM, LinkageError): retrying
+                    // cannot clear it -- Invariant B covers transport outages
+                    // only. Latch it as terminal FIRST so a producer parked in
+                    // checkError() observes the failure and `running` flips
+                    // false, then rethrow so the I/O thread dies loudly
+                    // instead of reconnect-looping. The fail() call site sits
+                    // inside ioLoop's catch, so ioLoop's finally still counts
+                    // down the shutdown latch and close() cannot hang.
+                    recordFatal(e);
+                    throw (Error) e;
+                }
                 lastReconnectError = e;
                 long now = System.nanoTime();
                 if (now - lastLogNanos >= RECONNECT_LOG_THROTTLE_NANOS) {
-                    LOG.warn("{} attempt {} failed: {}", phase, attempts, e.getMessage());
+                    if (e instanceof QwpVersionMismatchException) {
+                        // Not a transport failure: the server completed the WS
+                        // upgrade but advertised a QWP version this client cannot
+                        // speak. Retried indefinitely under Invariant B (a rolling
+                        // upgrade clears it once peers converge), but log the real
+                        // condition so a persistent client/cluster version skew is
+                        // diagnosable instead of reading as a generic connect fail.
+                        LOG.warn("{} attempt {}: every reachable endpoint advertises an unsupported "
+                                        + "QWP protocol version ({}); retrying (rolling-upgrade window) -- "
+                                        + "if this persists the client is version-incompatible with the cluster",
+                                phase, attempts, e.getMessage());
+                    } else {
+                        LOG.warn("{} attempt {} failed: {}", phase, attempts, e.getMessage());
+                    }
                     lastLogNanos = now;
                 }
             }
             if (running) {
                 long jitter = ThreadLocalRandom.current().nextLong(backoffMillis);
                 long sleepMillis = backoffMillis + jitter;
-                long remainingMillis = (deadlineNanos - System.nanoTime()) / 1_000_000L;
-                if (remainingMillis <= 0) {
-                    break;
-                }
-                if (sleepMillis > remainingMillis) {
-                    sleepMillis = remainingMillis;
-                }
                 LockSupport.parkNanos(sleepMillis * 1_000_000L);
                 backoffMillis = Math.min(backoffMillis * 2, reconnectMaxBackoffMillis);
             }
         }
+        // The loop exits ONLY because running == false, i.e. the sender is
+        // closing / stopping. Under Invariant B this is NOT a budget give-up
+        // (there is no wall-clock terminal): we retried until asked to stop, so
+        // we return quietly and let close() drive shutdown. Un-acked rows remain
+        // in on-disk SF for this sender's next run or an orphan drainer to ship.
         long elapsedMs = (System.nanoTime() - outageStartNanos) / 1_000_000L;
-        String lastMsg = lastReconnectError.getMessage();
-        LOG.error("cursor I/O loop giving up {} after {}ms, {} attempts; last error: {}",
+        String lastMsg = lastReconnectError == null ? "n/a" : lastReconnectError.getMessage();
+        LOG.info("cursor I/O loop {} stopped after {}ms, {} attempts (sender closing); "
+                        + "un-acked rows remain in SF for retry; last error: {}",
                 phase, elapsedMs, attempts, lastMsg);
-        long fromFsn = engine.ackedFsn() + 1L;
-        long toFsn = Math.max(fromFsn, engine.publishedFsn());
-        // Disambiguate by what the sender saw on the wire: if we never got
-        // a successful upgrade, the user is most likely looking at a config
-        // problem (typo in addr, wrong port, firewall, server not deployed
-        // yet); if we connected at least once and then exhausted the budget,
-        // it's a transient connectivity issue (server down, network flap).
-        // Tag and free-text hint encode the same signal so both grep-the-logs
-        // and read-the-message users get it without parsing.
-        String connectivityTag;
-        String connectivityHint;
-        if (hasEverConnected) {
-            connectivityTag = "connection-lost-budget-exhausted";
-            connectivityHint = "server unreachable since last connect (transient)";
-        } else {
-            connectivityTag = "never-connected-budget-exhausted";
-            connectivityHint = "never reached the server (check addr/port/firewall)";
-        }
-        SenderError err = new SenderError(
-                SenderError.Category.PROTOCOL_VIOLATION,
-                SenderError.Policy.HALT,
-                SenderError.NO_STATUS_BYTE,
-                connectivityTag + ": " + elapsedMs + "ms / " + attempts
-                        + " attempts; " + connectivityHint
-                        + "; last error: " + lastMsg,
-                SenderError.NO_MESSAGE_SEQUENCE,
-                fromFsn,
-                toFsn,
-                null,
-                System.nanoTime()
-        );
-        totalServerErrors.incrementAndGet();
-        // recordFatal MUST run before dispatchError so the producer-observable
-        // terminal error is latched before the handler is invoked.
-        recordFatal(new LineSenderServerException(err));
-        dispatchError(err);
-        // Surface the terminal classification through the connection-event
-        // dispatcher too. Listeners learn about budget exhaustion without
-        // having to also subscribe to SenderError. Fire AFTER recordFatal so
-        // a listener that immediately checks the producer-side terminal state
-        // sees a consistent picture.
-        SenderConnectionDispatcher cd = connectionDispatcher;
-        if (cd != null) {
-            cd.offer(new SenderConnectionEvent(
-                    SenderConnectionEvent.Kind.RECONNECT_BUDGET_EXHAUSTED,
-                    null, SenderConnectionEvent.NO_PORT,
-                    null, SenderConnectionEvent.NO_PORT,
-                    attempts,
-                    SenderConnectionEvent.NO_ROUND_NUMBER,
-                    lastReconnectError,
-                    System.currentTimeMillis()));
-        }
     }
 
     /**
@@ -1064,12 +1198,12 @@ private void enqueuePendingOk(long wireSeq) {
 
     /**
      * Surface a wire failure. With reconnect plumbing wired (factory +
-     * listener both non-null), enters the per-outage retry loop:
-     * exponential backoff with jitter, time-capped at
-     * {@code reconnectMaxDurationMillis}, terminal on auth/upgrade
-     * rejections (so the budget isn't burned on errors that won't fix
-     * themselves). On the first successful reconnect within the budget,
-     * the I/O loop resumes with reset wire state and replays from
+     * listener both non-null), enters the per-outage retry loop: capped
+     * exponential backoff with jitter, retried for as long as the loop is
+     * running -- there is NO wall-clock give-up (Invariant B: a
+     * store-and-forward drainer only terminates on SF exhaustion or a
+     * genuinely non-retriable auth/upgrade reject). On the first successful
+     * reconnect the I/O loop resumes with reset wire state and replays from
      * {@code engine.ackedFsn() + 1}.
      * <p>
      * Without reconnect plumbing, the failure is immediately terminal
@@ -1097,9 +1231,10 @@ private void ioLoop() {
             // a reconnect factory is wired. Drive the very first connect on
             // this thread so the producer thread never blocks on it.
             // attemptInitialConnect either sets `client` (success) or records
-            // a terminal failure and clears `running` (auth/upgrade reject or
-            // budget exhaustion). Either way, the main loop below sees the
-            // outcome via the `running` and `client` fields.
+            // a terminal failure and clears `running` (auth/upgrade reject;
+            // plain connect failures retry indefinitely under Invariant B).
+            // Either way, the main loop below sees the outcome via the
+            // `running` and `client` fields.
             if (client == null && running) {
                 attemptInitialConnect();
             }
@@ -1127,9 +1262,51 @@ private void ioLoop() {
                 }
             }
         } catch (Throwable t) {
+            if (t instanceof Error) {
+                // Never funnel a JVM Error into the reconnect loop: latch it
+                // as terminal so checkError() surfaces it to the producer,
+                // then rethrow so the thread dies loudly. The finally still
+                // counts down the shutdown latch, so close() cannot hang.
+                recordFatal(t);
+                throw (Error) t;
+            }
             fail(t);
         } finally {
+            // Last act of the I/O thread: dispose of whatever client it
+            // holds. This is the airtight half of the close()-vs-reconnect
+            // race — when close()'s latch await is interrupted (drainer pool
+            // shutdownNow), close() returns before this thread has exited,
+            // and its own client close saw the pre-reconnect field. A client
+            // swapped in by the tail of an in-flight connect attempt (running
+            // flipped false between connectLoop's check and swapClient) would
+            // be abandoned live without this. Runs BEFORE the latch countdown
+            // so a non-interrupted close() observes a fully disposed loop.
+            // Duplicate closes — loop.close()'s own, owners' stale references
+            // — stay safe: WebSocketClient.close() is idempotent.
+            WebSocketClient c = client;
+            if (c != null) {
+                try {
+                    c.close();
+                } catch (Throwable ignored) {
+                    // best-effort
+                }
+            }
             shutdownLatch.countDown();
+            // Failed-stop hand-off (see delegateEngineClose): the owner could
+            // not free the engine safely while this thread was alive, so the
+            // engine close — and with it the slot-lock release — happens
+            // here, strictly after this thread's last engine access. The flag
+            // is read only after the countDown: the store/load pairing with
+            // delegateEngineClose's flag-write-then-latch-read guarantees
+            // either this branch or the owner's fallback runs (or both —
+            // engine.close() is idempotent).
+            if (engineCloseDelegated) {
+                try {
+                    engine.close();
+                } catch (Throwable ignored) {
+                    // best-effort
+                }
+            }
         }
     }
 
@@ -1192,7 +1369,7 @@ private void positionCursorInSegment(MmapSegment seg, long targetFsn) {
 
     /**
      * Mark the loop as fatally failed. Caller has decided no reconnect
-     * is possible (or it ran out of budget) — latch the error so
+     * is possible — latch the error so
      * {@link #checkError} can surface it to the producer thread, then
      * stop the loop. First-writer-wins: only the first failure latches.
      * The check-then-latch is unsynchronized and is safe ONLY because
@@ -1279,7 +1456,7 @@ private void swapClient(WebSocketClient newClient) {
         this.client = newClient;
         // Sticky: once the wire is up, we've reached the server at least
         // once for this sender's lifetime. Used downstream to classify a
-        // subsequent budget exhaustion as transient vs config-likely.
+        // subsequent terminal failure as transient vs config-likely.
         this.hasEverConnected = true;
         if (old != null) {
             try {
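
The reconnect hunks above replace a time-budgeted retry with a loop that stops only when `running` flips to false. A minimal sketch of that shape follows. The class `UnboundedReconnectSketch`, its method names, and the `BooleanSupplier` connect hook are hypothetical and not part of the client; only the backoff arithmetic mirrors the diff.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.LockSupport;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a retry loop with capped backoff and no wall-clock deadline. */
final class UnboundedReconnectSketch {
    private volatile boolean running = true;

    /** Doubles the backoff, then clamps it to the configured cap. */
    static long nextBackoffMillis(long currentMillis, long maxMillis) {
        return Math.min(currentMillis * 2, maxMillis);
    }

    /** Sleep time is backoff plus jitter in [0, backoff); backoffMillis must be > 0. */
    static long sleepMillis(long backoffMillis) {
        return backoffMillis + ThreadLocalRandom.current().nextLong(backoffMillis);
    }

    /**
     * Retries tryConnect until it succeeds or stop() is called.
     * Returns the number of attempts on success, or -1 if stopped first.
     * Only RuntimeException is swallowed here; an Error propagates to the caller.
     */
    int connectLoop(BooleanSupplier tryConnect, long initialBackoffMillis, long maxBackoffMillis) {
        long backoffMillis = initialBackoffMillis;
        int attempts = 0;
        while (running) {
            attempts++;
            try {
                if (tryConnect.getAsBoolean()) {
                    return attempts;
                }
            } catch (RuntimeException e) {
                // Treated as a transient failure in this sketch: fall through to the backoff.
            }
            if (running) {
                LockSupport.parkNanos(sleepMillis(backoffMillis) * 1_000_000L);
                backoffMillis = nextBackoffMillis(backoffMillis, maxBackoffMillis);
            }
        }
        return -1; // stopped by the owner, not by a time budget
    }

    void stop() {
        running = false;
    }
}
```

The only exit paths are a successful connect and an explicit `stop()`, which is the property the diff calls Invariant B.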
diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/DefaultSenderConnectionListener.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/DefaultSenderConnectionListener.java
index adfb27f7..07213342 100644
--- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/DefaultSenderConnectionListener.java
+++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/DefaultSenderConnectionListener.java
@@ -36,9 +36,8 @@
  * transition so silence is never the default -- connect-string-only users
  * still see failover and outage signals in their logs.
  *
- * <p>Terminal kinds ({@code AUTH_FAILED}, {@code RECONNECT_BUDGET_EXHAUSTED})
- * and {@code ALL_ENDPOINTS_UNREACHABLE} fire at WARN level; everything else
- * fires at INFO.
+ * <p>The terminal kind {@code AUTH_FAILED} and {@code ALL_ENDPOINTS_UNREACHABLE}
+ * fire at WARN level; everything else fires at INFO.
  */
 public final class DefaultSenderConnectionListener implements SenderConnectionListener {
 
@@ -52,7 +51,6 @@ private DefaultSenderConnectionListener() {
     public void onEvent(@NotNull SenderConnectionEvent e) {
         switch (e.getKind()) {
             case AUTH_FAILED:
-            case RECONNECT_BUDGET_EXHAUSTED:
             case ALL_ENDPOINTS_UNREACHABLE:
                 LOG.warn("connection event {}", e);
                 break;
diff --git a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/SegmentManager.java b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/SegmentManager.java
index 2519a002..d96b8627 100644
--- a/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/SegmentManager.java
+++ b/core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/SegmentManager.java
@@ -143,10 +143,16 @@ public SegmentManager(long segmentSizeBytes, long pollNanos) {
      *                         hold an initial active plus one hot spare.
      */
     public SegmentManager(long segmentSizeBytes, long pollNanos, long maxTotalBytes) {
+        // The pathScratch field initializer has already allocated its native
+        // buffer by the time this body runs, so a validation throw must free
+        // it or every failed construction leaks 256 bytes of native memory
+        // (e.g. a drainer retry loop hitting the same bad config).
         if (segmentSizeBytes < MmapSegment.HEADER_SIZE + MmapSegment.FRAME_HEADER_SIZE + 1) {
+            pathScratch.close();
             throw new IllegalArgumentException("segmentSizeBytes too small: " + segmentSizeBytes);
         }
         if (maxTotalBytes < segmentSizeBytes) {
+            pathScratch.close();
             throw new IllegalArgumentException(
                     "maxTotalBytes (" + maxTotalBytes + ") must allow at least one segment of "
                             + segmentSizeBytes + " bytes");
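
The SegmentManager change relies on a Java ordering rule: field initializers run before the constructor body, so a validation throw in the body must release whatever the initializers allocated. A minimal sketch of the pattern follows. `ScratchSketch` and `ManagerSketch` are hypothetical stand-ins, and the live-buffer counter exists only to make the leak observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a natively backed scratch buffer. */
final class ScratchSketch implements AutoCloseable {
    static final AtomicInteger LIVE = new AtomicInteger(); // buffers currently open
    private boolean closed;

    ScratchSketch() {
        LIVE.incrementAndGet();
    }

    @Override
    public void close() {
        if (!closed) { // idempotent, so a later duplicate close is harmless
            closed = true;
            LIVE.decrementAndGet();
        }
    }
}

/** Hypothetical owner whose constructor validates after the field is already allocated. */
final class ManagerSketch implements AutoCloseable {
    // Allocated by the field initializer, before the constructor body runs.
    private final ScratchSketch pathScratch = new ScratchSketch();

    ManagerSketch(long segmentSizeBytes) {
        if (segmentSizeBytes < 1) {
            pathScratch.close(); // free before throwing; no caller will ever hold this object
            throw new IllegalArgumentException("segmentSizeBytes too small: " + segmentSizeBytes);
        }
    }

    @Override
    public void close() {
        pathScratch.close();
    }
}
```

Without the `close()` before the throw, each failed construction in this sketch would leave `ScratchSketch.LIVE` one higher, which is the leak the diff comment describes.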
diff --git a/core/src/main/java/io/questdb/client/impl/ConfigSchema.java b/core/src/main/java/io/questdb/client/impl/ConfigSchema.java
index b36f3207..0508428e 100644
--- a/core/src/main/java/io/questdb/client/impl/ConfigSchema.java
+++ b/core/src/main/java/io/questdb/client/impl/ConfigSchema.java
@@ -56,6 +56,7 @@ public final class ConfigSchema {
         str("tls_roots", Side.COMMON);
         str("tls_roots_password", Side.COMMON);
         longRange("auth_timeout_ms", Side.COMMON, 0, OPEN_MAX, true, false); // > 0
+        longRange("connect_timeout", Side.COMMON, 0, OPEN_MAX, true, false); // > 0
 
         // INGRESS -- the WebSocket Sender applies. STRING in the registry; the
         // Sender parses suffix/mode values (off/on, 64k, durability) with its
@@ -108,9 +109,11 @@ public final class ConfigSchema {
         intRange("query_pool_min", Side.POOL, OPEN, OPEN_MAX, false, false);
         intRange("query_pool_max", Side.POOL, OPEN, OPEN_MAX, false, false);
         longRange("acquire_timeout_ms", Side.POOL, OPEN, OPEN_MAX, false, false);
+        longRange("query_close_timeout_ms", Side.POOL, OPEN, OPEN_MAX, false, false);
         longRange("idle_timeout_ms", Side.POOL, OPEN, OPEN_MAX, false, false);
         longRange("max_lifetime_ms", Side.POOL, OPEN, OPEN_MAX, false, false);
         longRange("housekeeper_interval_ms", Side.POOL, OPEN, OPEN_MAX, false, false);
+        boolOnOff("lazy_connect", Side.POOL); // facade flag: tolerant non-blocking startup (async ingest + lazy reads)
 
         // RESERVED -- accepted no-op (error-policy keys reserved by the spec).
         str("on_internal_error", Side.RESERVED);
diff --git a/core/src/main/java/io/questdb/client/impl/ConfigView.java b/core/src/main/java/io/questdb/client/impl/ConfigView.java
index 1160c2d6..74621eef 100644
--- a/core/src/main/java/io/questdb/client/impl/ConfigView.java
+++ b/core/src/main/java/io/questdb/client/impl/ConfigView.java
@@ -95,6 +95,25 @@ public static String relocatedHint(String key) {
         return RELOCATED_HINTS.get(key);
     }
 
+    /**
+     * A boolean flag accepting {@code true}/{@code false} (and {@code on}/{@code off}
+     * for consistency with the rest of the connect-string surface). Returns
+     * {@code dflt} when the key is absent; throws on any other value.
+     */
+    public boolean getBool(String key, boolean dflt) {
+        String v = getStr(key);
+        if (v == null) {
+            return dflt;
+        }
+        if ("true".equals(v) || "on".equals(v)) {
+            return true;
+        }
+        if ("false".equals(v) || "off".equals(v)) {
+            return false;
+        }
+        throw new IllegalArgumentException("invalid " + key + ": " + v + " (expected true, false, on, off)");
+    }
+
     public boolean getBoolOnOff(String key, boolean dflt) {
         String v = getStr(key);
         if (v == null) {
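
The `getBool` contract added above (default when the key is absent, four accepted spellings, an exception otherwise) can be exercised in isolation. The sketch below assumes a plain `Map<String, String>` in place of `ConfigView`; `BoolFlagSketch` is a hypothetical name. Matching is case-sensitive, as in the diff.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical standalone version of the flag parsing shown in the diff. */
final class BoolFlagSketch {
    static boolean getBool(Map<String, String> cfg, String key, boolean dflt) {
        String v = cfg.get(key);
        if (v == null) {
            return dflt; // key absent: fall back to the caller's default
        }
        if ("true".equals(v) || "on".equals(v)) {
            return true;
        }
        if ("false".equals(v) || "off".equals(v)) {
            return false;
        }
        throw new IllegalArgumentException("invalid " + key + ": " + v + " (expected true, false, on, off)");
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("lazy_connect", "on");
        System.out.println(getBool(cfg, "lazy_connect", false));
    }
}
```

Under this contract a value such as `TRUE` or `1` is rejected instead of being silently treated as false.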
diff --git a/core/src/main/java/io/questdb/client/impl/PooledSender.java b/core/src/main/java/io/questdb/client/impl/PooledSender.java
index 61d89296..e36a8384 100644
--- a/core/src/main/java/io/questdb/client/impl/PooledSender.java
+++ b/core/src/main/java/io/questdb/client/impl/PooledSender.java
@@ -37,123 +37,112 @@
 import java.time.temporal.ChronoUnit;
 
 /**
- * Decorator that lends a real {@link Sender} from {@link SenderPool}. The
- * decorator is pre-allocated once per pool slot and reused for every borrow.
+ * Thin per-borrow handle returned by {@link SenderPool#borrow()}. A fresh
+ * instance is created on every borrow, capturing the immutable lease
+ * {@code generation} stamped by {@code borrow()}; it forwards every
+ * {@link Sender} call to the reused {@link SenderSlot}'s delegate, validating
+ * that generation first via {@link SenderSlot#live(long)}.
  * <p>
- * Behavior difference from a raw Sender: {@link #close()} on a pooled Sender
- * flushes the buffer and returns the decorator to the pool. The underlying
- * Sender is only truly closed when {@link io.questdb.client.QuestDB#close()}
- * shuts down the pool.
+ * Behavior difference from a raw Sender: {@link #close()} flushes the buffer
+ * and returns the slot to the pool. The underlying Sender is only truly closed
+ * when {@link io.questdb.client.QuestDB#close()} shuts the pool down.
+ * <p>
+ * Because the slot is reused across borrows, this wrapper -- not the slot --
+ * carries the lease identity. A stale handle (held after {@link #close()}, with
+ * the slot since re-borrowed) fails its generation check: data calls throw and
+ * {@link #close()} is a no-op, so it can never flush into, release, or
+ * re-enqueue a slot that a different borrower now owns. This mirrors the
+ * egress {@code QueryLease} guard.
  */
 public final class PooledSender implements Sender {
 
-    private final long createdAtMillis;
-    private final Sender delegate;
-    private final SenderPool pool;
-    // Index of the store-and-forward slot this wrapper owns within the pool,
-    // or -1 when SF is disabled. Stable for the wrapper's whole life; the
-    // pool returns it to the free set only when the wrapper is evicted from
-    // {@code all} (discardBroken / reapIdle). Used to derive a distinct
-    // {@code sender_id} per pooled sender so concurrent SF senders sharing
-    // one {@code sf_dir} never collide on the slot {@code flock}.
-    private final int slotIndex;
-    private volatile long idleSinceMillis;
-    private volatile boolean inUse;
-    private volatile boolean invalidated;
-
-    PooledSender(Sender delegate, SenderPool pool, int slotIndex) {
-        this.delegate = delegate;
-        this.pool = pool;
-        this.slotIndex = slotIndex;
-        this.createdAtMillis = System.currentTimeMillis();
-        this.idleSinceMillis = this.createdAtMillis;
+    private final long generation;
+    private final SenderSlot slot;
+
+    PooledSender(SenderSlot slot, long generation) {
+        this.slot = slot;
+        this.generation = generation;
     }
 
     @Override
     public void at(long timestamp, ChronoUnit unit) {
-        delegate.at(timestamp, unit);
+        slot.live(generation).at(timestamp, unit);
     }
 
     @Override
     public void at(Instant timestamp) {
-        delegate.at(timestamp);
+        slot.live(generation).at(timestamp);
     }
 
     @Override
     public void atNow() {
-        delegate.atNow();
+        slot.live(generation).atNow();
     }
 
     @Override
     public boolean awaitAckedFsn(long targetFsn, long timeoutMillis) {
-        return delegate.awaitAckedFsn(targetFsn, timeoutMillis);
+        return slot.live(generation).awaitAckedFsn(targetFsn, timeoutMillis);
     }
 
     @Override
     public Sender binaryColumn(CharSequence name, byte[] value) {
-        delegate.binaryColumn(name, value);
+        slot.live(generation).binaryColumn(name, value);
         return this;
     }
 
     @Override
     public Sender binaryColumn(CharSequence name, long ptr, long len) {
-        delegate.binaryColumn(name, ptr, len);
+        slot.live(generation).binaryColumn(name, ptr, len);
         return this;
     }
 
     @Override
     public Sender binaryColumn(CharSequence name, DirectByteSlice slice) {
-        delegate.binaryColumn(name, slice);
+        slot.live(generation).binaryColumn(name, slice);
         return this;
     }
 
     @Override
     public Sender boolColumn(CharSequence name, boolean value) {
-        delegate.boolColumn(name, value);
+        slot.live(generation).boolColumn(name, value);
         return this;
     }
 
     @Override
     public DirectByteSlice bufferView() {
-        return delegate.bufferView();
+        return slot.live(generation).bufferView();
     }
 
     @Override
     public Sender byteColumn(CharSequence name, byte value) {
-        delegate.byteColumn(name, value);
+        slot.live(generation).byteColumn(name, value);
         return this;
     }
 
     @Override
     public void cancelRow() {
-        delegate.cancelRow();
+        slot.live(generation).cancelRow();
     }
 
     @Override
     public Sender charColumn(CharSequence name, char value) {
-        delegate.charColumn(name, value);
+        slot.live(generation).charColumn(name, value);
         return this;
     }
 
     /**
-     * Flushes pending rows and returns this decorator to the pool. Does not
-     * actually close the underlying {@link Sender}; that only happens when
-     * the owning {@code QuestDB} is closed.
-     * <p>
-     * Idempotent: a second call after a return is a no-op.
+     * Flushes pending rows and returns this lease's slot to the pool. Does not
+     * actually close the underlying {@link Sender}; that only happens when the
+     * owning {@code QuestDB} is closed.
      * <p>
-     * Clears the current thread's pin (if any) before the slot becomes
-     * borrowable again. Without this step a thread that pinned this
-     * wrapper and then closed it via the public {@link Sender#close()}
-     * (the natural try-with-resources idiom) would still hold the pin
-     * in its {@link ThreadLocal}; a subsequent {@code QuestDB.sender()}
-     * call on that thread would return the cached wrapper even though
-     * another consumer has since borrowed the slot, and the two
-     * consumers would write to the same underlying delegate.
+     * Idempotent: a stale generation (the lease was already returned and the
+     * slot possibly re-borrowed) is a no-op, so a double close cannot flush
+     * into, or re-enqueue, a slot a different borrower now owns. The pool
+     * re-checks the generation under its lock.
      */
     @Override
     public void close() {
-        if (!inUse) {
+        if (generation != slot.generation()) {
             return;
         }
         // Track normal completion rather than catching a specific throwable
@@ -163,257 +152,222 @@ public void close() {
         // abnormal exit as unrecyclable, which is the fail-safe default.
         boolean flushed = false;
         try {
-            delegate.flush();
+            slot.delegate().flush();
             flushed = true;
         } finally {
-            inUse = false;
-            // Clear the pin BEFORE returning the slot. If we cleared
-            // after giveBack(), a concurrent borrower could grab the
-            // slot while this thread's pin still references it, and a
-            // re-pin on this thread would return the (now in-use)
-            // wrapper -- the same race this clear is meant to close.
-            pool.clearPinIfCurrent(this);
             if (flushed) {
-                pool.giveBack(this);
+                slot.pool().giveBack(this);
             } else {
-                // flush() did not complete normally. Sender does not clear
-                // its buffer on flush failure (see Sender Javadoc), and
-                // WebSocket transport latches the failure for good. Either
-                // way the wrapper is unsafe to recycle: the next borrower
-                // would inherit the failed rows or a dead connection. The
-                // original throwable propagates naturally once this finally
-                // returns -- no explicit rethrow needed.
-                pool.discardBroken(this);
+                // flush() did not complete normally. Sender does not clear its
+                // buffer on flush failure (see Sender Javadoc), and WebSocket
+                // transport latches the failure for good. Either way the slot
+                // is unsafe to recycle: the next borrower would inherit the
+                // failed rows or a dead connection. The original throwable
+                // propagates naturally once this finally returns -- no explicit
+                // rethrow needed.
+                slot.pool().discardBroken(this);
             }
         }
     }
 
     @Override
     public Sender decimalColumn(CharSequence name, Decimal256 value) {
-        delegate.decimalColumn(name, value);
+        slot.live(generation).decimalColumn(name, value);
         return this;
     }
 
     @Override
     public Sender decimalColumn(CharSequence name, Decimal128 value) {
-        delegate.decimalColumn(name, value);
+        slot.live(generation).decimalColumn(name, value);
         return this;
     }
 
     @Override
     public Sender decimalColumn(CharSequence name, Decimal64 value) {
-        delegate.decimalColumn(name, value);
+        slot.live(generation).decimalColumn(name, value);
         return this;
     }
 
     @Override
     public Sender decimalColumn(CharSequence name, CharSequence value) {
-        delegate.decimalColumn(name, value);
+        slot.live(generation).decimalColumn(name, value);
         return this;
     }
 
     @Override
     public Sender doubleArray(@NotNull CharSequence name, double[] values) {
-        delegate.doubleArray(name, values);
+        slot.live(generation).doubleArray(name, values);
         return this;
     }
 
     @Override
     public Sender doubleArray(@NotNull CharSequence name, double[][] values) {
-        delegate.doubleArray(name, values);
+        slot.live(generation).doubleArray(name, values);
         return this;
     }
 
     @Override
     public Sender doubleArray(@NotNull CharSequence name, double[][][] values) {
-        delegate.doubleArray(name, values);
+        slot.live(generation).doubleArray(name, values);
         return this;
     }
 
     @Override
     public Sender doubleArray(CharSequence name, DoubleArray array) {
-        delegate.doubleArray(name, array);
+        slot.live(generation).doubleArray(name, array);
         return this;
     }
 
     @Override
     public Sender doubleColumn(CharSequence name, double value) {
-        delegate.doubleColumn(name, value);
+        slot.live(generation).doubleColumn(name, value);
         return this;
     }
 
     @Override
     public boolean drain(long timeoutMillis) {
-        return delegate.drain(timeoutMillis);
+        return slot.live(generation).drain(timeoutMillis);
     }
 
     @Override
     public Sender floatColumn(CharSequence name, float value) {
-        delegate.floatColumn(name, value);
+        slot.live(generation).floatColumn(name, value);
         return this;
     }
 
     @Override
     public void flush() {
-        delegate.flush();
+        slot.live(generation).flush();
     }
 
     @Override
     public long flushAndGetSequence() {
-        return delegate.flushAndGetSequence();
+        return slot.live(generation).flushAndGetSequence();
     }
 
     @Override
     public Sender geoHashColumn(CharSequence name, long bits, int precisionBits) {
-        delegate.geoHashColumn(name, bits, precisionBits);
+        slot.live(generation).geoHashColumn(name, bits, precisionBits);
         return this;
     }
 
     @Override
     public Sender geoHashColumn(CharSequence name, CharSequence value) {
-        delegate.geoHashColumn(name, value);
+        slot.live(generation).geoHashColumn(name, value);
         return this;
     }
 
     @Override
     public long getAckedFsn() {
-        return delegate.getAckedFsn();
+        return slot.live(generation).getAckedFsn();
     }
 
     @Override
     public Sender intColumn(CharSequence name, int value) {
-        delegate.intColumn(name, value);
+        slot.live(generation).intColumn(name, value);
         return this;
     }
 
     @Override
     public Sender ipv4Column(CharSequence name, int address) {
-        delegate.ipv4Column(name, address);
+        slot.live(generation).ipv4Column(name, address);
         return this;
     }
 
     @Override
     public Sender ipv4Column(CharSequence name, CharSequence address) {
-        delegate.ipv4Column(name, address);
+        slot.live(generation).ipv4Column(name, address);
         return this;
     }
 
     @Override
     public Sender long256Column(CharSequence name, long l0, long l1, long l2, long l3) {
-        delegate.long256Column(name, l0, l1, l2, l3);
+        slot.live(generation).long256Column(name, l0, l1, l2, l3);
         return this;
     }
 
     @Override
     public Sender longArray(@NotNull CharSequence name, long[] values) {
-        delegate.longArray(name, values);
+        slot.live(generation).longArray(name, values);
         return this;
     }
 
     @Override
     public Sender longArray(@NotNull CharSequence name, long[][] values) {
-        delegate.longArray(name, values);
+        slot.live(generation).longArray(name, values);
         return this;
     }
 
     @Override
     public Sender longArray(@NotNull CharSequence name, long[][][] values) {
-        delegate.longArray(name, values);
+        slot.live(generation).longArray(name, values);
         return this;
     }
 
     @Override
     public Sender longArray(@NotNull CharSequence name, LongArray values) {
-        delegate.longArray(name, values);
+        slot.live(generation).longArray(name, values);
         return this;
     }
 
     @Override
     public Sender longColumn(CharSequence name, long value) {
-        delegate.longColumn(name, value);
+        slot.live(generation).longColumn(name, value);
         return this;
     }
 
     @Override
     public void reset() {
-        delegate.reset();
+        slot.live(generation).reset();
     }
 
     @Override
     public Sender shortColumn(CharSequence name, short value) {
-        delegate.shortColumn(name, value);
+        slot.live(generation).shortColumn(name, value);
         return this;
     }
 
     @Override
     public Sender stringColumn(CharSequence name, CharSequence value) {
-        delegate.stringColumn(name, value);
+        slot.live(generation).stringColumn(name, value);
         return this;
     }
 
     @Override
     public Sender symbol(CharSequence name, CharSequence value) {
-        delegate.symbol(name, value);
+        slot.live(generation).symbol(name, value);
         return this;
     }
 
     @Override
     public Sender table(CharSequence table) {
-        delegate.table(table);
+        slot.live(generation).table(table);
         return this;
     }
 
     @Override
     public Sender timestampColumn(CharSequence name, long value, ChronoUnit unit) {
-        delegate.timestampColumn(name, value, unit);
+        slot.live(generation).timestampColumn(name, value, unit);
         return this;
     }
 
     @Override
     public Sender timestampColumn(CharSequence name, Instant value) {
-        delegate.timestampColumn(name, value);
+        slot.live(generation).timestampColumn(name, value);
         return this;
     }
 
     @Override
     public Sender uuidColumn(CharSequence name, long lo, long hi) {
-        delegate.uuidColumn(name, lo, hi);
+        slot.live(generation).uuidColumn(name, lo, hi);
         return this;
     }
 
-    long createdAtMillis() {
-        return createdAtMillis;
-    }
-
-    int slotIndex() {
-        return slotIndex;
-    }
-
-    Sender delegate() {
-        return delegate;
-    }
-
-    long idleSinceMillis() {
-        return idleSinceMillis;
-    }
-
-    boolean isInUse() {
-        return inUse;
-    }
-
-    boolean isInvalidated() {
-        return invalidated;
-    }
-
-    void markIdleAt(long nowMillis) {
-        idleSinceMillis = nowMillis;
-    }
-
-    void markInUse() {
-        inUse = true;
+    long generation() {
+        return generation;
     }
 
-    void markInvalidated() {
-        invalidated = true;
+    SenderSlot slot() {
+        return slot;
     }
 }
diff --git a/core/src/main/java/io/questdb/client/impl/QueryClientPool.java b/core/src/main/java/io/questdb/client/impl/QueryClientPool.java
index a6365dfa..cbbc150a 100644
--- a/core/src/main/java/io/questdb/client/impl/QueryClientPool.java
+++ b/core/src/main/java/io/questdb/client/impl/QueryClientPool.java
@@ -26,6 +26,7 @@
 
 import io.questdb.client.QueryException;
 import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
+import org.jetbrains.annotations.TestOnly;
 
 import java.util.ArrayDeque;
 import java.util.ArrayList;
@@ -49,6 +50,12 @@
  */
 public final class QueryClientPool implements AutoCloseable {
 
+    // Default upper bound, in milliseconds, on how long Query.close() waits for
+    // an in-flight query to drain (after issuing a cancel) before discarding the
+    // worker. Mirrors the ingest side's close_flush_timeout_millis default so a
+    // close() can never block the caller unbounded. Tunable per pool via
+    // closeQueryTimeoutMillis(long).
+    static final long DEFAULT_CLOSE_QUERY_TIMEOUT_MILLIS = 5_000;
     private final long acquireTimeoutMillis;
     private final ArrayList<QueryWorker> all;
     private final ArrayDeque<QueryWorker> available;
@@ -75,6 +82,10 @@ public final class QueryClientPool implements AutoCloseable {
     private final AtomicInteger nextSlotIndex = new AtomicInteger();
     private final Condition workerReleased;
     private volatile boolean closed;
+    // Upper bound on the Query.close() drain wait; see
+    // DEFAULT_CLOSE_QUERY_TIMEOUT_MILLIS. Volatile because QuestDBImpl sets it
+    // once at build time on a different thread than the borrowers that read it.
+    private volatile long closeQueryTimeoutMillis = DEFAULT_CLOSE_QUERY_TIMEOUT_MILLIS;
     private int inFlightCreations;
 
     public QueryClientPool(
@@ -89,11 +100,12 @@ public QueryClientPool(
                 idleTimeoutMillis, maxLifetimeMillis, null);
     }
 
-    // Package-private constructor exposing the connectHook test seam: production
-    // passes null (-> the real QwpQueryClient.connect()). White-box tests in
-    // io.questdb.client.test.impl reach this by reflection to inject a hook that
-    // throws a non-RuntimeException Throwable from the native connect path.
-    QueryClientPool(
+    // Constructor exposing the connectHook seam. Production (QuestDBImpl) passes
+    // null -> the real QwpQueryClient.connect(); white-box tests pass a hook that
+    // throws a non-RuntimeException Throwable from the native connect path. This
+    // is the construction path QuestDBImpl uses, so it is a real (public) ctor,
+    // not test-only.
+    public QueryClientPool(
             String configurationString,
             int minSize,
             int maxSize,
@@ -106,13 +118,12 @@ public QueryClientPool(
                 idleTimeoutMillis, maxLifetimeMillis, connectHook, null);
     }
 
-    // Package-private constructor exposing both the connectHook and startHook
-    // test seams: production passes null for each (-> the real
-    // QwpQueryClient.connect() and QueryWorker.start()). White-box tests in
-    // io.questdb.client.test.impl reach this by reflection to inject a hook that
-    // throws a Throwable from either the native connect path (connectHook) or
-    // the worker thread-start path (startHook).
-    QueryClientPool(
+    // Constructor exposing both the connectHook and startHook seams. Production
+    // reaches it via the overload above (both null -> the real
+    // QwpQueryClient.connect() and QueryWorker.start()); white-box tests pass a
+    // hook that throws a Throwable from either the native connect path
+    // (connectHook) or the worker thread-start path (startHook).
+    public QueryClientPool(
             String configurationString,
             int minSize,
             int maxSize,
@@ -197,7 +208,12 @@ public QueryWorker acquire() {
                     throw new QueryException((byte) 0, "QuestDB handle is closed");
                 }
                 if (!available.isEmpty()) {
-                    return available.pollFirst();
+                    QueryWorker w = available.pollFirst();
+                    // Stamp a fresh lease id under the lock so the QueryLease
+                    // about to be handed out can be distinguished from any
+                    // prior, now-stale borrow of the same worker.
+                    w.bumpGeneration();
+                    return w;
                 }
                 if (all.size() + inFlightCreations < maxSize) {
                     inFlightCreations++;
@@ -248,6 +264,8 @@ public QueryWorker acquire() {
                         throw new QueryException((byte) 0, "QuestDB handle is closed");
                     }
                     all.add(created);
+                    // Stamp the first lease id for this freshly built worker.
+                    created.bumpGeneration();
                     return created;
                 }
                 if (remainingNanos <= 0) {
@@ -297,6 +315,87 @@ public void close() {
         }
     }
 
+    /**
+     * Cancels the in-flight query on {@code w} only while its lease generation
+     * still equals {@code gen}, holding the pool lock across both the check and
+     * the wire cancel. acquire(), release() and discard() bump the generation
+     * under this same lock, so while this method holds it the generation is
+     * fixed: a cancel whose lease has gone stale (the worker was released and
+     * re-borrowed) is dropped instead of aborting the new borrower's query. The
+     * cancel itself is non-blocking -- a volatile flag plus an AtomicLong set --
+     * so the lock is held only briefly.
+     */
+    void cancelIfCurrent(QueryWorker w, long gen) {
+        lock.lock();
+        try {
+            if (closed) {
+                return;
+            }
+            if (w.generation() != gen) {
+                return;
+            }
+            w.cancelInFlight();
+        } finally {
+            lock.unlock();
+        }
+    }
+
+    long closeQueryTimeoutMillis() {
+        return closeQueryTimeoutMillis;
+    }
+
+    void closeQueryTimeoutMillis(long millis) {
+        this.closeQueryTimeoutMillis = millis;
+    }
+
+    /**
+     * Evicts a worker whose in-flight query {@link QueryImpl#close(long)} could
+     * not drain within {@link #closeQueryTimeoutMillis} (the cancel was not
+     * honored in time, or the caller was interrupted). The worker's
+     * connection is left in an unknown protocol state -- a late {@code RESULT_*}
+     * frame for the abandoned query could corrupt the next borrower's stream --
+     * so it must NOT return to the pool. Removes it from {@code all} (freeing
+     * capacity for a fresh worker) and tears it down outside the lock via
+     * {@link QueryWorker#shutdown()}, which interrupts the dispatch thread so a
+     * stuck {@code execute()} returns promptly.
+     * <p>
+     * Bails when the pool is already closed: {@link #close()} owns the teardown
+     * of every worker via its snapshot loop, so mutating {@code all} here would
+     * race that iteration on a non-thread-safe {@code ArrayList}. Also bails on a
+     * stale generation -- the worker was already released/discarded and possibly
+     * re-borrowed, so discarding it would evict a worker a different borrower now
+     * owns. Mirrors {@link SenderPool#discardBroken} on the ingest side.
+     */
+    void discard(QueryWorker w, long gen) {
+        lock.lock();
+        try {
+            if (closed) {
+                return;
+            }
+            if (w.generation() != gen) {
+                return;
+            }
+            // Invalidate the lease so a duplicate close()/release with the same
+            // generation is dropped and the in-flight handle can no longer drive
+            // this worker.
+            w.bumpGeneration();
+            all.remove(w);
+            // Capacity freed -- a waiter in acquire() may now create a fresh
+            // worker in this slot's place.
+            workerReleased.signal();
+        } finally {
+            lock.unlock();
+        }
+        // Tear down outside the lock so a slow join doesn't keep the pool
+        // latched. shutdown() is best-effort and idempotent.
+        try {
+            w.shutdown();
+        } catch (Throwable ignored) {
+            // Best-effort: a teardown Error (e.g. an -ea AssertionError) must
+            // not propagate out of Query.close().
+        }
+    }
+
     void reapIdle() {
         if (closed) {
             return;
@@ -340,14 +439,30 @@ void reapIdle() {
         }
     }
 
-    void release(QueryWorker w) {
-        long now = System.currentTimeMillis();
-        w.markIdleAt(now);
+    void release(QueryWorker w, long gen) {
         lock.lock();
         try {
             if (closed) {
                 return;
             }
+            if (w.generation() != gen) {
+                // Stale release: this lease was already returned and the worker
+                // has since been re-borrowed (or this is a duplicate close of an
+                // already-released lease). Dropping it is what makes
+                // Query.close() idempotent even under a concurrent re-borrow --
+                // without this guard a double close would enqueue the worker
+                // twice and hand it to two borrowers at once, corrupting the
+                // whole pool. The flag a stale close() reads is no longer its
+                // own lease's, so a non-validated release path could not catch
+                // this; the generation captured at borrow time can.
+                return;
+            }
+            // Invalidate the just-returned lease so a duplicate release with the
+            // same generation is also dropped and the in-flight handle can no
+            // longer drive this worker.
+            w.bumpGeneration();
+            w.markIdleAt(System.currentTimeMillis());
+            assert !available.contains(w) : "worker already present in available deque on release";
             available.addLast(w);
             workerReleased.signal();
         } finally {
@@ -355,11 +470,12 @@ void release(QueryWorker w) {
         }
     }
 
-    // Package-private white-box accessor for tests: reports the current
-    // in-flight creation count under the pool lock. A non-zero value after a
-    // failed acquire() means the slot reservation was never released -- the
-    // capacity-shrink bug this guards against.
-    int inFlightCreations() {
+    // White-box accessor for tests: reports the current in-flight creation count
+    // under the pool lock. A non-zero value after a failed acquire() means the
+    // slot reservation was never released -- the capacity-shrink bug this guards
+    // against.
+    @TestOnly
+    public int inFlightCreations() {
         lock.lock();
         try {
             return inFlightCreations;
diff --git a/core/src/main/java/io/questdb/client/impl/QueryImpl.java b/core/src/main/java/io/questdb/client/impl/QueryImpl.java
index fc80d263..baf483ea 100644
--- a/core/src/main/java/io/questdb/client/impl/QueryImpl.java
+++ b/core/src/main/java/io/questdb/client/impl/QueryImpl.java
@@ -24,8 +24,6 @@
 
 package io.questdb.client.impl;
 
-import io.questdb.client.Completion;
-import io.questdb.client.Query;
 import io.questdb.client.QueryException;
 import io.questdb.client.cutlass.qwp.client.QwpBindSetter;
 import io.questdb.client.cutlass.qwp.client.QwpBindValues;
@@ -40,39 +38,54 @@
 import java.util.concurrent.locks.ReentrantLock;
 
 /**
- * Per-thread implementation of {@link Query}. Holds the configured query
- * state (SQL, optional binds, handler), an inner {@link Completion}, and a
- * wrapping {@link QwpColumnBatchHandler} that forwards callbacks to the user
- * handler and signals the Completion on terminal events.
+ * Reusable per-{@link QueryWorker} query state: the configured SQL, optional
+ * binds, handler, terminal-event signalling, and a wrapping
+ * {@link QwpColumnBatchHandler} that forwards callbacks to the user handler and
+ * signals completion on terminal events. One instance is pre-allocated per
+ * worker in the constructor and reused across every borrow.
  * <p>
- * Lifecycle: {@link QuestDBImpl#query()} returns a per-thread instance, reset
- * to empty if it was in a terminal state. {@link #submit()} acquires a
- * worker, dispatches, and returns the cached {@link Completion}.
+ * Because the instance is shared across borrows, it must never be handed to a
+ * caller directly -- a stale reference would leak into a later borrow's
+ * lifecycle. Callers instead receive a thin, per-borrow {@link QueryLease} that
+ * carries the lease {@code generation} stamped at borrow time and passes it
+ * into every operation here. Each operation validates that generation against
+ * {@link QueryWorker#generation()}:
+ * <ul>
+ *   <li>builder/await operations on a stale generation throw
+ *       {@code IllegalStateException} ("query handle is not borrowed"),</li>
+ *   <li>{@link #close(long)} and {@link #cancel(long)} on a stale generation are
+ *       no-ops -- this is what makes {@code Query.close()} idempotent and
+ *       prevents a stale handle from releasing, or cancelling the in-flight
+ *       query of, a worker a different borrower now owns.</li>
+ * </ul>
+ * <p>
+ * Lifecycle: {@link QueryWorker#lease()} resets this state and wraps it in a
+ * fresh {@link QueryLease} when {@link QuestDBImpl#borrowQuery()} acquires the
+ * worker. {@link #submit(long)} dispatches on the held worker (single-flight);
+ * {@link #close(long)} returns the worker to the pool.
  */
-final class QueryImpl implements Query {
+final class QueryImpl {
 
-    private final InnerCompletion completion = new InnerCompletion();
     private final Condition doneCondition;
     private final ReentrantLock doneLock = new ReentrantLock();
-    private final QueryClientPool pool;
     private final StringSink sqlBuffer = new StringSink();
+    private final QueryWorker worker;
+    private final QwpBindSetter wireBinds = this::applyBinds;
     private final WrappingHandler wrappingHandler = new WrappingHandler();
-    private volatile QueryWorker currentWorker;
     private volatile boolean done = true;
     private volatile String resultMessage;
     private volatile byte resultStatus;
     private volatile Throwable unexpectedError;
     private QwpBindSetter userBinds;
-    private final QwpBindSetter wireBinds = this::applyBinds;
     private QwpColumnBatchHandler userHandler;
 
-    QueryImpl(QueryClientPool pool) {
-        this.pool = pool;
+    QueryImpl(QueryWorker worker) {
+        this.worker = worker;
         this.doneCondition = doneLock.newCondition();
     }
 
-    @Override
-    public void abandon() {
+    void abandon(long gen) {
+        checkLive(gen);
         if (!done) {
             throw new IllegalStateException("a previous submit() is still in flight; await the Completion first");
         }
@@ -81,27 +94,113 @@ public void abandon() {
         sqlBuffer.clear();
     }
 
-    @Override
-    public Query binds(QwpBindSetter binds) {
+    void await(long gen) throws InterruptedException {
+        rejectHandlerReentry("await");
+        checkLive(gen);
+        doneLock.lock();
+        try {
+            while (!done) {
+                doneCondition.await();
+            }
+        } finally {
+            doneLock.unlock();
+        }
+        throwIfFailed();
+    }
+
+    boolean await(long gen, long timeout, TimeUnit unit) throws InterruptedException {
+        rejectHandlerReentry("await");
+        checkLive(gen);
+        long remaining = unit.toNanos(timeout);
+        doneLock.lock();
+        try {
+            while (!done) {
+                if (remaining <= 0) {
+                    return false;
+                }
+                remaining = doneCondition.awaitNanos(remaining);
+            }
+        } finally {
+            doneLock.unlock();
+        }
+        throwIfFailed();
+        return true;
+    }
+
+    void cancel(long gen) {
+        // Fast-path drop of an obviously-stale or already-finished cancel,
+        // without taking the pool lock. This is only a hint -- the
+        // authoritative re-check runs under the pool lock inside
+        // worker.cancelInFlight(gen).
+        if (gen != worker.generation() || done) {
+            return;
+        }
+        // Re-check the lease generation and issue the wire cancel atomically
+        // under the pool lock (the same lock acquire()/release() bump the
+        // generation under). An unlocked check followed by an unlocked cancel
+        // is a TOCTOU: a cross-thread watchdog can pass the check, get
+        // preempted while this lease is released and the worker re-borrowed by
+        // another caller, then resume and abort that caller's in-flight query.
+        worker.cancelInFlight(gen);
+    }
+
+    void close(long gen) {
+        rejectHandlerReentry("close");
+        // A stale generation means this lease was already released and the
+        // worker may now be owned by another borrower. Dropping the call is
+        // what keeps close() idempotent without releasing someone else's
+        // worker or cancelling their in-flight query. release() re-checks the
+        // generation under the pool lock, so the worker can never be enqueued
+        // twice even if two threads race a close on the same live lease.
+        if (gen != worker.generation()) {
+            return;
+        }
+        // If a submit is still in flight (the caller did not await, or its
+        // await timed out), cancel it and wait for the terminal event so the
+        // leased worker is idle before it returns to the pool -- otherwise the
+        // next borrower would inherit a running execute().
+        //
+        // The wait is bounded (closeQueryTimeoutMillis) and interruptible, so a
+        // caller that bounded its own await() is never pinned to the full
+        // remaining query duration here. If the query does NOT drain in time (a
+        // server slow to honor the cancel, or the caller interrupting), the
+        // worker is still running execute() on a connection whose protocol state
+        // is now uncertain -- a late RESULT_* for the abandoned query could
+        // corrupt the next borrower's stream -- so it is discarded rather than
+        // returned. The pool grows a fresh worker on the next borrow.
+        if (!done) {
+            worker.cancelInFlight(gen);
+            if (!awaitDone(worker.closeQueryTimeoutMillis())) {
+                worker.discardFromPool(gen);
+                return;
+            }
+        }
+        worker.releaseToPool(gen);
+    }
+
+    boolean isDone(long gen) {
+        checkLive(gen);
+        return done;
+    }
+
+    void setBinds(long gen, QwpBindSetter binds) {
+        checkLive(gen);
         this.userBinds = binds;
-        return this;
     }
 
-    @Override
-    public Query handler(QwpColumnBatchHandler handler) {
+    void setHandler(long gen, QwpColumnBatchHandler handler) {
+        checkLive(gen);
         this.userHandler = handler;
-        return this;
     }
 
-    @Override
-    public Query sql(CharSequence sql) {
+    void setSql(long gen, CharSequence sql) {
+        checkLive(gen);
         sqlBuffer.clear();
         sqlBuffer.put(sql);
-        return this;
     }
 
-    @Override
-    public Completion submit() {
+    void submit(long gen) {
+        checkLive(gen);
         if (sqlBuffer.length() == 0) {
             throw new IllegalStateException("sql is required");
         }
@@ -111,7 +210,6 @@ public Completion submit() {
         if (!done) {
             throw new IllegalStateException("a previous submit() is still in flight; await the Completion first");
         }
-        QueryWorker w = pool.acquire();
         // Reset terminal state under the lock so a stale signal from a prior
         // run can't be observed by the upcoming await().
         doneLock.lock();
@@ -120,12 +218,10 @@ public Completion submit() {
             resultStatus = 0;
             resultMessage = null;
             unexpectedError = null;
-            currentWorker = w;
         } finally {
             doneLock.unlock();
         }
-        w.dispatch(this);
-        return completion;
+        worker.dispatch(this);
     }
 
     private void applyBinds(QwpBindValues binds) {
@@ -135,6 +231,56 @@ private void applyBinds(QwpBindValues binds) {
         }
     }
 
+    /**
+     * Waits up to {@code timeoutMillis} for the in-flight query's terminal
+     * event. Returns {@code true} once {@code done} is set, {@code false} on
+     * timeout or interrupt. Unlike an uninterruptible drain, this wait aborts on
+     * interrupt and restores the thread's interrupt flag, so {@code close()}
+     * stays responsive to a caller that wants to give up.
+     */
+    private boolean awaitDone(long timeoutMillis) {
+        long remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
+        doneLock.lock();
+        try {
+            while (!done) {
+                if (remaining <= 0) {
+                    return false;
+                }
+                try {
+                    remaining = doneCondition.awaitNanos(remaining);
+                } catch (InterruptedException e) {
+                    Thread.currentThread().interrupt();
+                    return false;
+                }
+            }
+            return true;
+        } finally {
+            doneLock.unlock();
+        }
+    }
+
+    private void checkLive(long gen) {
+        if (gen != worker.generation()) {
+            throw new IllegalStateException("query handle is not borrowed (closed or never leased)");
+        }
+    }
+
+    private void rejectHandlerReentry(String op) {
+        // Result handlers (onBatch/onEnd/onError) run inline on the worker's
+        // dispatch thread. A blocking lease op called from there would wait for
+        // a terminal event that only this same thread can deliver -- a
+        // permanent, uninterruptible self-deadlock plus a leaked worker. Fail
+        // loudly at the call site instead. cancel() is the non-blocking stop.
+        if (worker.isCurrentThreadWorker()) {
+            throw new IllegalStateException(
+                    op + "() must not be called from a result handler. Handlers "
+                            + "(onBatch/onEnd/onError) run on the worker thread, so " + op
+                            + "() would block forever waiting for a terminal event that only "
+                            + "this same thread can deliver. To stop a query from inside a "
+                            + "handler, call cancel() (non-blocking).");
+        }
+    }
+
     private void signalDone(byte status, String message, Throwable unexpected) {
         doneLock.lock();
         try {
@@ -145,27 +291,38 @@ private void signalDone(byte status, String message, Throwable unexpected) {
             this.resultMessage = message;
             this.unexpectedError = unexpected;
             this.done = true;
-            this.currentWorker = null;
             doneCondition.signalAll();
         } finally {
             doneLock.unlock();
         }
     }
 
+    private void throwIfFailed() {
+        Throwable unexpected = unexpectedError;
+        if (unexpected != null) {
+            throw new QueryException(resultStatus, resultMessage, unexpected);
+        }
+        if (resultStatus != 0) {
+            throw new QueryException(resultStatus, resultMessage);
+        }
+    }
+
     /**
-     * Drops any prior builder state (SQL, binds, handler) if no submit is
-     * currently in flight. {@link QuestDBImpl#query()} invokes this before
-     * returning the per-thread instance so callers see the "reset to empty"
-     * contract documented on {@link io.questdb.client.Query} regardless of
-     * whether the previous use ended at a terminal handler callback or at
-     * {@link #abandon()}.
+     * Resets builder and terminal state to empty. Called by
+     * {@link QueryWorker#lease()} when {@link QuestDBImpl#borrowQuery()} hands a
+     * freshly stamped {@link QueryLease} out, so each borrow starts from the
+     * documented "reset to empty" contract on {@link io.questdb.client.Query}.
+     * The leased worker is idle at this point (just acquired from the pool), so
+     * the reset is unconditional.
      */
-    void resetIfDone() {
-        if (done) {
-            userBinds = null;
-            userHandler = null;
-            sqlBuffer.clear();
-        }
+    void resetForBorrow() {
+        userBinds = null;
+        userHandler = null;
+        sqlBuffer.clear();
+        resultStatus = 0;
+        resultMessage = null;
+        unexpectedError = null;
+        done = true;
     }
 
     void runOn(QwpQueryClient client) {
@@ -185,63 +342,6 @@ void signalUnexpected(Throwable t) {
         signalDone((byte) 0, t.getMessage() != null ? t.getMessage() : t.getClass().getSimpleName(), t);
     }
 
-    private final class InnerCompletion implements Completion {
-
-        @Override
-        public void await() throws InterruptedException {
-            doneLock.lock();
-            try {
-                while (!done) {
-                    doneCondition.await();
-                }
-            } finally {
-                doneLock.unlock();
-            }
-            throwIfFailed();
-        }
-
-        @Override
-        public boolean await(long timeout, TimeUnit unit) throws InterruptedException {
-            long remaining = unit.toNanos(timeout);
-            doneLock.lock();
-            try {
-                while (!done) {
-                    if (remaining <= 0) {
-                        return false;
-                    }
-                    remaining = doneCondition.awaitNanos(remaining);
-                }
-            } finally {
-                doneLock.unlock();
-            }
-            throwIfFailed();
-            return true;
-        }
-
-        @Override
-        public void cancel() {
-            QueryWorker w = currentWorker;
-            if (w != null && !done) {
-                w.cancelInFlight();
-            }
-        }
-
-        @Override
-        public boolean isDone() {
-            return done;
-        }
-
-        private void throwIfFailed() {
-            Throwable unexpected = unexpectedError;
-            if (unexpected != null) {
-                throw new QueryException(resultStatus, resultMessage, unexpected);
-            }
-            if (resultStatus != 0) {
-                throw new QueryException(resultStatus, resultMessage);
-            }
-        }
-    }
-
     private final class WrappingHandler implements QwpColumnBatchHandler {
 
         @Override
diff --git a/core/src/main/java/io/questdb/client/impl/QueryLease.java b/core/src/main/java/io/questdb/client/impl/QueryLease.java
new file mode 100644
index 00000000..6083b802
--- /dev/null
+++ b/core/src/main/java/io/questdb/client/impl/QueryLease.java
@@ -0,0 +1,110 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.impl;
+
+import io.questdb.client.Completion;
+import io.questdb.client.Query;
+import io.questdb.client.cutlass.qwp.client.QwpBindSetter;
+import io.questdb.client.cutlass.qwp.client.QwpColumnBatchHandler;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Thin per-borrow handle returned by {@link QuestDBImpl#borrowQuery()}. A fresh
+ * instance is created on every borrow, capturing the immutable lease
+ * {@code generation} stamped by {@link QueryClientPool#acquire()}; it delegates
+ * every {@link Query} and {@link Completion} operation to the worker's reused
+ * {@link QueryImpl}, threading that generation through so a stale handle cannot
+ * disturb a later borrow on the same worker (see {@link QueryImpl}).
+ * <p>
+ * It implements {@link Completion} as well as {@link Query} so {@link #submit()}
+ * can return {@code this} -- the per-submit path stays allocation-free, and the
+ * single small allocation happens once per borrow (and is routinely
+ * scalar-replaced by the JIT in the common try-with-resources case).
+ */
+final class QueryLease implements Query, Completion {
+
+    private final long generation;
+    private final QueryImpl impl;
+
+    QueryLease(QueryImpl impl, long generation) {
+        this.impl = impl;
+        this.generation = generation;
+    }
+
+    @Override
+    public void abandon() {
+        impl.abandon(generation);
+    }
+
+    @Override
+    public void await() throws InterruptedException {
+        impl.await(generation);
+    }
+
+    @Override
+    public boolean await(long timeout, TimeUnit unit) throws InterruptedException {
+        return impl.await(generation, timeout, unit);
+    }
+
+    @Override
+    public Query binds(QwpBindSetter binds) {
+        impl.setBinds(generation, binds);
+        return this;
+    }
+
+    @Override
+    public void cancel() {
+        impl.cancel(generation);
+    }
+
+    @Override
+    public void close() {
+        impl.close(generation);
+    }
+
+    @Override
+    public Query handler(QwpColumnBatchHandler handler) {
+        impl.setHandler(generation, handler);
+        return this;
+    }
+
+    @Override
+    public boolean isDone() {
+        return impl.isDone(generation);
+    }
+
+    @Override
+    public Query sql(CharSequence sql) {
+        impl.setSql(generation, sql);
+        return this;
+    }
+
+    @Override
+    public Completion submit() {
+        impl.submit(generation);
+        return this;
+    }
+}
diff --git a/core/src/main/java/io/questdb/client/impl/QueryWorker.java b/core/src/main/java/io/questdb/client/impl/QueryWorker.java
index f4f641c8..040ade63 100644
--- a/core/src/main/java/io/questdb/client/impl/QueryWorker.java
+++ b/core/src/main/java/io/questdb/client/impl/QueryWorker.java
@@ -24,6 +24,7 @@
 
 package io.questdb.client.impl;
 
+import io.questdb.client.Query;
 import io.questdb.client.QueryException;
 import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
 
@@ -39,7 +40,11 @@
  * The pooled query client's own I/O thread continues to drive the wire; the
  * worker thread exists only to keep {@code execute()} off the application's
  * submitting thread. Handler callbacks ({@code onBatch}, {@code onEnd},
- * {@code onError}) still run on the client's I/O thread.
+ * {@code onError}) run on this worker's own dispatch thread, which consumes the
+ * I/O thread's event queue inline -- not on the I/O thread itself. A handler
+ * must therefore never call the lease's blocking {@code close()}/{@code await()}
+ * (it would self-deadlock waiting for a terminal event only this thread can
+ * deliver); use the non-blocking {@code cancel()} to stop from inside a handler.
  */
 public final class QueryWorker {
 
@@ -47,16 +52,38 @@ public final class QueryWorker {
     private final QwpQueryClient client;
     private final long createdAtMillis;
     private final QueryClientPool pool;
+    private final QueryImpl query;
     private final Condition signalCondition;
     private final ReentrantLock signalLock = new ReentrantLock();
     private final Thread thread;
     private volatile QueryImpl current;
+    // Test-only deterministic barrier for the busy-worker shutdown-drop race
+    // fixed in df6f7ca (while (!shuttingDown) -> while (true)). Null in
+    // production -- the only cost is the null check in runLoop(). A regression
+    // test installs a hook that runs ON THE WORKER THREAD right after a job
+    // returns from runOn() and before the loop re-enters the strand check, to
+    // re-arm current with a re-dispatched job and flip shuttingDown -- exactly
+    // the window where the old top-of-loop check dropped a pending job. The
+    // classes involved (QueryWorker, QueryImpl) are final and QwpQueryClient
+    // has no test seam, so this is the only race-free reproduction point. See
+    // QueryWorkerTest.testBusyWorkerShutdownStrandsReDispatchedCurrent.
+    volatile Runnable busyWorkerTestHook;
+    // Monotonic lease id. Mutated only under the QueryClientPool lock
+    // (bumped once in acquire() when the worker is handed out and once in
+    // release() when it is returned), so successive borrows of the same
+    // worker get distinct ids. A QueryLease captures the value live during
+    // its borrow; once the worker is released or re-borrowed the captured id
+    // no longer matches, which is how a stale handle's close()/cancel()/
+    // submit() are detected and dropped. Volatile so a stale handle on another
+    // thread observes the latest value without taking the pool lock.
+    private volatile long generation;
     private volatile long idleSinceMillis;
     private volatile boolean shuttingDown;
 
     public QueryWorker(QwpQueryClient client, QueryClientPool pool, int slotIndex) {
         this.client = client;
         this.pool = pool;
+        this.query = new QueryImpl(this);
         this.signalCondition = signalLock.newCondition();
         this.thread = new Thread(this::runLoop, "questdb-query-worker-" + slotIndex);
         this.thread.setDaemon(true);
@@ -68,17 +95,48 @@ long createdAtMillis() {
         return createdAtMillis;
     }
 
+    /**
+     * Advances the lease generation. Called by {@link QueryClientPool} under
+     * the pool lock when this worker is handed out (acquire) and when it is
+     * returned (release).
+     */
+    void bumpGeneration() {
+        generation++;
+    }
+
+    /**
+     * Current lease generation. See {@link #generation} for the visibility and
+     * mutation contract.
+     */
+    long generation() {
+        return generation;
+    }
+
     long idleSinceMillis() {
         return idleSinceMillis;
     }
 
+    /**
+     * True when the calling thread is this worker's own dispatch thread -- i.e.
+     * a reentrant call from inside a result handler, which runs inline on this
+     * thread. Blocking lease operations ({@link QueryImpl#close}/
+     * {@link QueryImpl#await}) use this to fail loudly instead of
+     * self-deadlocking.
+     */
+    boolean isCurrentThreadWorker() {
+        return Thread.currentThread() == thread;
+    }
+
     void markIdleAt(long nowMillis) {
         idleSinceMillis = nowMillis;
     }
 
     /**
-     * Cancels the in-flight query on this worker's client. Safe to call from
-     * any thread; harmless if the worker is idle.
+     * Issues an unconditional wire cancel against whatever query this worker's
+     * client is currently running. Callers must already own the worker for the
+     * current lease -- in practice this runs under the pool lock via
+     * {@link QueryClientPool#cancelIfCurrent}, which validates the lease
+     * generation first. Lease code must use {@link #cancelInFlight(long)}.
      */
     void cancelInFlight() {
         try {
@@ -88,6 +146,18 @@ void cancelInFlight() {
         }
     }
 
+    /**
+     * Cancels the in-flight query only if this worker's lease generation still
+     * equals {@code gen}. Delegates to the pool so the generation re-check and
+     * the wire cancel happen together under the same pool lock that
+     * {@link QueryClientPool#acquire} and {@link QueryClientPool#release} hold
+     * when they bump the generation. That atomicity stops a stale cross-thread
+     * cancel from aborting a later borrower's query on the same worker.
+     */
+    void cancelInFlight(long gen) {
+        pool.cancelIfCurrent(this, gen);
+    }
+
     /**
      * Returns the {@link QwpQueryClient} this worker drives. Exposed for
      * introspection and tests; callers must not invoke {@code execute()} on
@@ -97,6 +167,44 @@ public QwpQueryClient client() {
         return client;
     }
 
+    /**
+     * Resets the worker's reused {@link QueryImpl} and returns a fresh
+     * {@link QueryLease} stamped with the current lease {@link #generation}.
+     * Called by {@link QuestDBImpl#borrowQuery()} right after
+     * {@link QueryClientPool#acquire()} hands this worker out (which bumped the
+     * generation under the pool lock). The lease is a small per-borrow handle;
+     * the heavy state stays on the reused {@link QueryImpl}, and the per-submit
+     * path remains allocation-free.
+     */
+    Query lease() {
+        query.resetForBorrow();
+        return new QueryLease(query, generation);
+    }
+
+    long closeQueryTimeoutMillis() {
+        return pool.closeQueryTimeoutMillis();
+    }
+
+    /**
+     * Discards this worker from the pool instead of returning it. Called by
+     * {@link QueryImpl#close(long)} when the in-flight query could not be
+     * drained within the close budget, leaving the connection in an unknown
+     * protocol state. The captured lease {@code gen} lets the pool reject a
+     * stale discard whose worker has already been re-borrowed.
+     */
+    void discardFromPool(long gen) {
+        pool.discard(this, gen);
+    }
+
+    /**
+     * Returns this worker to the pool. Called by {@link QueryImpl#close(long)}
+     * when the borrowed lease is released; the captured lease {@code gen} lets
+     * the pool reject a stale release whose worker has already been re-borrowed.
+     */
+    void releaseToPool(long gen) {
+        pool.release(this, gen);
+    }
+
     void shutdown() {
         shuttingDown = true;
         signalLock.lock();
@@ -106,10 +214,19 @@ void shutdown() {
             signalLock.unlock();
         }
         try {
-            // If a query is in flight on this worker, ask the client to abort so
-            // execute() returns promptly and the thread can exit before join
-            // times out. cancel() is documented as thread-safe and is a no-op
-            // when idle.
+            // If a query is in flight on this worker, force execute() to return
+            // promptly so the dispatch thread exits before the join below times
+            // out. Two nudges, strongest first:
+            //   1. Interrupt the dispatch thread. takeEvent() (QwpSpscQueue.take)
+            //      is interrupt-aware, and executeOnce() turns the resulting
+            //      InterruptedException into a terminal event -> signalDone. This
+            //      releases a caller parked in Query.close() even when the I/O
+            //      thread is wedged and client.close()'s synthetic terminal
+            //      (closePool()) never runs -- the race that would otherwise
+            //      strand the caller forever.
+            //   2. Ask the client to cancel on the wire so the server stops work.
+            //      Best-effort and a no-op when idle.
+            thread.interrupt();
             try {
                 client.cancel();
             } catch (Throwable ignored) {
@@ -140,8 +257,10 @@ void start() {
     }
 
     /**
-     * Hands a configured {@link QueryImpl} to this worker. The caller must
-     * have just acquired this worker via QueryClientPool#acquire(long).
+     * Hands a configured {@link QueryImpl} to this worker for execution. The
+     * worker is held by an open {@link io.questdb.client.Query} lease (see
+     * {@link #lease()}), so a lease may dispatch repeatedly (single-flight)
+     * until it is closed.
      */
     void dispatch(QueryImpl q) {
         signalLock.lock();
@@ -161,7 +280,18 @@ void dispatch(QueryImpl q) {
     }
 
     private void runLoop() {
-        while (!shuttingDown) {
+        // Loop unconditionally -- do NOT hoist the shuttingDown check up here as
+        // while (!shuttingDown). The sole exit is the "if (shuttingDown) return"
+        // inside the signalLock block below, which strands a pending current
+        // before returning. Exiting at the top instead would skip that strand on
+        // the busy-worker path: when a reused lease's submit() -> dispatch() sets
+        // current between the terminal callback and this check, and shutdown()
+        // then flips shuttingDown, the worker would return straight after
+        // runOn() without re-inspecting current -- the job is dropped, never
+        // run, never signalled, and its caller's await() hangs forever.
+        // Re-entering the lock every lap funnels every shutdown ordering through
+        // the single strand point.
+        while (true) {
             QueryImpl q;
             signalLock.lock();
             try {
@@ -181,6 +311,17 @@ private void runLoop() {
                     return;
                 }
                 q = current;
+                // Clear the hand-off slot under signalLock, at the moment of
+                // consumption -- NOT after runOn() returns. A lease is
+                // single-flight but reused: the user thread loops submit() ->
+                // await() on the same handle. The terminal callback inside
+                // runOn() wakes the user thread, which can call submit() ->
+                // dispatch() (current = q; signal) before this worker thread
+                // returns from runOn(). Clearing current after runOn() would
+                // race that dispatch, clobber the freshly-set job, drop its
+                // already-consumed signal, and park the worker forever while
+                // the user thread waits on a Completion that never fires.
+                current = null;
             } finally {
                 signalLock.unlock();
             }
@@ -188,9 +329,12 @@ private void runLoop() {
                 q.runOn(client);
             } catch (Throwable t) {
                 q.signalUnexpected(t);
-            } finally {
-                current = null;
-                pool.release(this);
+            }
+            // Test-only barrier: deterministically reproduce the busy-worker
+            // shutdown-drop race (df6f7ca) at its exact site. Null in production.
+            Runnable hook = busyWorkerTestHook;
+            if (hook != null) {
+                hook.run();
             }
         }
     }
diff --git a/core/src/main/java/io/questdb/client/impl/QuestDBImpl.java b/core/src/main/java/io/questdb/client/impl/QuestDBImpl.java
index 5bba8d46..e3da539b 100644
--- a/core/src/main/java/io/questdb/client/impl/QuestDBImpl.java
+++ b/core/src/main/java/io/questdb/client/impl/QuestDBImpl.java
@@ -24,27 +24,31 @@
 
 package io.questdb.client.impl;
 
-import io.questdb.client.Completion;
 import io.questdb.client.QuestDB;
 import io.questdb.client.Query;
 import io.questdb.client.Sender;
-import io.questdb.client.cutlass.qwp.client.QwpColumnBatchHandler;
+import io.questdb.client.SenderConnectionListener;
+import io.questdb.client.SenderErrorHandler;
 import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener;
+import org.jetbrains.annotations.TestOnly;
 
 import java.util.function.Consumer;
 import java.util.function.IntFunction;
 
 /**
- * Implementation of {@link QuestDB}. Owns the elastic {@link SenderPool}
- * and {@link QueryClientPool}, a {@link PoolHousekeeper} that reaps idle
- * slots, and a {@link ThreadLocal} of {@link QueryImpl} instances so that
- * {@link #query()} is allocation-free after the first call on each thread.
+ * Implementation of {@link QuestDB}. Owns the elastic {@link SenderPool} and
+ * {@link QueryClientPool} and a {@link PoolHousekeeper} that reaps idle slots.
+ * {@link #borrowQuery()} leases a pooled {@link QueryWorker} and hands back a
+ * thin {@link QueryLease} over its reused {@link QueryImpl}; the heavy per-query
+ * state is pre-allocated on the worker and the per-submit path is
+ * allocation-free, so only the small lease handle is created per borrow (and
+ * the JIT can often scalar-replace it in the try-with-resources case).
  */
 public final class QuestDBImpl implements QuestDB {
 
     private final PoolHousekeeper housekeeper;
     private final QueryClientPool queryPool;
-    private final ThreadLocal<QueryImpl> queryThreadLocal;
     private final SenderPool senderPool;
     private volatile boolean closed;
 
@@ -58,20 +62,26 @@ public QuestDBImpl(
             long acquireTimeoutMillis,
             long idleTimeoutMillis,
             long maxLifetimeMillis,
-            long housekeeperIntervalMillis
+            long housekeeperIntervalMillis,
+            long queryCloseTimeoutMillis,
+            SenderErrorHandler errorHandler,
+            SenderConnectionListener connectionListener,
+            BackgroundDrainerListener drainerListener
     ) {
         this(ingestConfig, queryConfig, senderMin, senderMax, queryMin, queryMax,
                 acquireTimeoutMillis, idleTimeoutMillis, maxLifetimeMillis,
-                housekeeperIntervalMillis, null, null);
+                housekeeperIntervalMillis, queryCloseTimeoutMillis, null, null,
+                errorHandler, connectionListener, drainerListener);
     }
 
-    // Package-private constructor exposing the senderFactory and connectHook test
-    // seams: production passes null for both (-> the real native build/connect
-    // paths). White-box tests in io.questdb.client.test.impl reach this by
-    // reflection (the main module is declared `open`) to make SenderPool prewarm
-    // an observable delegate while QueryClientPool construction throws an Error,
+    // Test-only constructor exposing the senderFactory and connectHook seams:
+    // production uses the public overload above, which passes null for both ->
+    // the real native build/connect paths. White-box error-safety tests in
+    // io.questdb.client.test.impl call this to make SenderPool prewarm an
+    // observable delegate while QueryClientPool construction throws an Error,
     // exercising the cleanup catch below.
-    QuestDBImpl(
+    @TestOnly
+    public QuestDBImpl(
             String ingestConfig,
             String queryConfig,
             int senderMin,
@@ -84,6 +94,35 @@ public QuestDBImpl(
             long housekeeperIntervalMillis,
             IntFunction<Sender> senderFactory,
             Consumer<QwpQueryClient> connectHook
+    ) {
+        this(ingestConfig, queryConfig, senderMin, senderMax, queryMin, queryMax,
+                acquireTimeoutMillis, idleTimeoutMillis, maxLifetimeMillis,
+                housekeeperIntervalMillis, QueryClientPool.DEFAULT_CLOSE_QUERY_TIMEOUT_MILLIS,
+                senderFactory, connectHook, null, null, null);
+    }
+
+    // Full constructor adding the ingest-side errorHandler/connectionListener/
+    // drainerListener, applied by SenderPool to every Sender it builds. The
+    // 12-arg overload above is the unchanged white-box test seam and delegates
+    // here with null callbacks; the public overload delegates here with null
+    // test seams.
+    QuestDBImpl(
+            String ingestConfig,
+            String queryConfig,
+            int senderMin,
+            int senderMax,
+            int queryMin,
+            int queryMax,
+            long acquireTimeoutMillis,
+            long idleTimeoutMillis,
+            long maxLifetimeMillis,
+            long housekeeperIntervalMillis,
+            long queryCloseTimeoutMillis,
+            IntFunction<Sender> senderFactory,
+            Consumer<QwpQueryClient> connectHook,
+            SenderErrorHandler errorHandler,
+            SenderConnectionListener connectionListener,
+            BackgroundDrainerListener drainerListener
     ) {
         SenderPool builtSenderPool = null;
         QueryClientPool builtQueryPool = null;
@@ -95,10 +134,12 @@ public QuestDBImpl(
                     // Defer SF startup recovery to the PoolHousekeeper thread so
                     // build() never blocks on a slow / reachable-but-not-acking
                     // server; the housekeeper drives it via runStartupRecoveryStep().
-                    true);
+                    true,
+                    errorHandler, connectionListener, drainerListener);
             builtQueryPool = new QueryClientPool(
                     queryConfig, queryMin, queryMax, acquireTimeoutMillis,
                     idleTimeoutMillis, maxLifetimeMillis, connectHook);
+            builtQueryPool.closeQueryTimeoutMillis(queryCloseTimeoutMillis);
             builtHousekeeper = new PoolHousekeeper(builtSenderPool, builtQueryPool, housekeeperIntervalMillis);
             builtHousekeeper.start();
         } catch (Throwable e) {
@@ -128,7 +169,11 @@ public QuestDBImpl(
         this.senderPool = builtSenderPool;
         this.queryPool = builtQueryPool;
         this.housekeeper = builtHousekeeper;
-        this.queryThreadLocal = ThreadLocal.withInitial(() -> new QueryImpl(queryPool));
+    }
+
+    @Override
+    public Query borrowQuery() {
+        return queryPool.acquire().lease();
     }
 
     @Override
@@ -182,30 +227,4 @@ private static void closeQuietly(AutoCloseable closeable) {
         }
     }
 
-    @Override
-    public Completion executeSql(CharSequence sql, QwpColumnBatchHandler handler) {
-        return query().sql(sql).handler(handler).submit();
-    }
-
-    @Override
-    public Query newQuery() {
-        return new QueryImpl(queryPool);
-    }
-
-    @Override
-    public Query query() {
-        QueryImpl q = queryThreadLocal.get();
-        q.resetIfDone();
-        return q;
-    }
-
-    @Override
-    public void releaseSender() {
-        senderPool.releaseCurrentThread();
-    }
-
-    @Override
-    public Sender sender() {
-        return senderPool.pinToCurrentThread();
-    }
 }
diff --git a/core/src/main/java/io/questdb/client/impl/SenderPool.java b/core/src/main/java/io/questdb/client/impl/SenderPool.java
index 8c9fda7a..2785f2eb 100644
--- a/core/src/main/java/io/questdb/client/impl/SenderPool.java
+++ b/core/src/main/java/io/questdb/client/impl/SenderPool.java
@@ -25,11 +25,15 @@
 package io.questdb.client.impl;
 
 import io.questdb.client.Sender;
+import io.questdb.client.SenderConnectionListener;
+import io.questdb.client.SenderErrorHandler;
 import io.questdb.client.cutlass.line.LineSenderException;
 import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener;
 import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner;
 import io.questdb.client.std.Files;
 import io.questdb.client.std.IntList;
+import org.jetbrains.annotations.TestOnly;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -93,9 +97,14 @@ public final class SenderPool implements AutoCloseable {
     // transport has no application-level connect timeout to clamp it.
     private static final long RECOVERY_DRAIN_BUDGET_MILLIS = 1_000;
     private final long acquireTimeoutMillis;
-    private final ArrayList<PooledSender> all;
-    private final ArrayDeque<PooledSender> available;
+    private final ArrayList<SenderSlot> all;
+    private final ArrayDeque<SenderSlot> available;
     private final String configurationString;
+    // User-supplied ingest callbacks, shared across every pooled Sender this
+    // pool builds. Null -> each sender keeps its loud-not-silent default.
+    private final SenderConnectionListener connectionListener;
+    private final BackgroundDrainerListener drainerListener;
+    private final SenderErrorHandler errorHandler;
     private final long idleTimeoutMillis;
     // Test seam. Production builds delegates via defaultSender(); white-box
     // tests in io.questdb.client.test.impl reach the package-private
@@ -132,7 +141,6 @@ public final class SenderPool implements AutoCloseable {
     private final Condition slotReleased;
     // True iff the configuration enables store-and-forward (sf_dir set).
     private final boolean storeAndForward;
-    private final ThreadLocal<PooledSender> threadAffine = new ThreadLocal<>();
     // Slots removed from `all` whose delegate is still releasing its flock.
     // They keep reserving capacity (and their slotInUse mark) until the
     // flock drops, so the cap check and the slot allocator stay consistent
@@ -189,16 +197,17 @@ public SenderPool(
             long maxLifetimeMillis
     ) {
         this(configurationString, minSize, maxSize, acquireTimeoutMillis,
-                idleTimeoutMillis, maxLifetimeMillis, null);
+                idleTimeoutMillis, maxLifetimeMillis, null, false, null, null, null);
     }
 
-    // Package-private constructor exposing the senderFactory test seam:
-    // production passes null (-> the real defaultSender()). White-box tests in
-    // io.questdb.client.test.impl reach this by reflection to inject a factory
-    // that throws a non-RuntimeException Throwable mid-prewarm. Recovery runs
-    // inline here (deferStartupRecovery=false); the pooled QuestDB handle uses
-    // the 8-arg overload to defer it to the housekeeper thread.
-    SenderPool(
+    // Test-only constructor exposing the senderFactory seam: production builds
+    // via the full constructor below (senderFactory null -> the real
+    // defaultSender()). White-box tests inject a factory that throws a
+    // non-RuntimeException Throwable mid-prewarm. Recovery runs inline here
+    // (deferStartupRecovery=false); the pooled QuestDB handle uses the full
+    // constructor below to defer it to the housekeeper thread.
+    @TestOnly
+    public SenderPool(
             String configurationString,
             int minSize,
             int maxSize,
@@ -211,14 +220,16 @@ public SenderPool(
                 idleTimeoutMillis, maxLifetimeMillis, senderFactory, false);
     }
 
-    // Full constructor. deferStartupRecovery=true skips the inline,
-    // construction-time SF recovery (recoverOneSlotStep) so
-    // QuestDB.build() never blocks on a slow or reachable-but-not-acking
-    // server; the owner (QuestDBImpl) then drives recovery one slot per tick on
-    // the PoolHousekeeper thread via runStartupRecoveryStep(). The in-range
-    // recovery pass is concurrency-safe against borrow()/return on that
+    // Test-only constructor adding the deferStartupRecovery toggle.
+    // deferStartupRecovery=true skips the inline, construction-time SF recovery
+    // (recoverOneSlotStep) so QuestDB.build() never blocks on a slow or
+    // reachable-but-not-acking server; the owner (QuestDBImpl) then drives
+    // recovery one slot per tick on the PoolHousekeeper thread via
+    // runStartupRecoveryStep(). White-box SF tests call this directly; the
+    // in-range recovery pass is concurrency-safe against borrow()/return on the
     // deferred path -- see recoverOneSlotStep().
-    SenderPool(
+    @TestOnly
+    public SenderPool(
             String configurationString,
             int minSize,
             int maxSize,
@@ -227,10 +238,36 @@ public SenderPool(
             long maxLifetimeMillis,
             IntFunction<Sender> senderFactory,
             boolean deferStartupRecovery
+    ) {
+        this(configurationString, minSize, maxSize, acquireTimeoutMillis,
+                idleTimeoutMillis, maxLifetimeMillis, senderFactory,
+                deferStartupRecovery, null, null, null);
+    }
+
+    // Full constructor adding the user-supplied ingest callbacks (error
+    // handler, connection listener and background-drainer listener), applied
+    // to every Sender the pool builds (see buildManagedSlotSender). The public
+    // 6-arg ctor and the test-only senderFactory overloads above all end up
+    // here with null callbacks; the pooled QuestDB handle calls this directly.
+    SenderPool(
+            String configurationString,
+            int minSize,
+            int maxSize,
+            long acquireTimeoutMillis,
+            long idleTimeoutMillis,
+            long maxLifetimeMillis,
+            IntFunction<Sender> senderFactory,
+            boolean deferStartupRecovery,
+            SenderErrorHandler errorHandler,
+            SenderConnectionListener connectionListener,
+            BackgroundDrainerListener drainerListener
     ) {
         if (minSize < 0 || maxSize < 1 || minSize > maxSize) {
             throw new IllegalArgumentException("invalid pool sizing: min=" + minSize + ", max=" + maxSize);
         }
+        this.errorHandler = errorHandler;
+        this.connectionListener = connectionListener;
+        this.drainerListener = drainerListener;
         this.senderFactory = senderFactory != null ? senderFactory : this::defaultSender;
         // An injected factory (tests) drives recovery too, preserving the
         // white-box recovery seam; production recovery forces OFF-mode connects
@@ -262,7 +299,7 @@ public SenderPool(
                 if (storeAndForward) {
                     slotInUse[i] = true;
                 }
-                PooledSender ps = createUnlocked(storeAndForward ? i : -1);
+                SenderSlot ps = createUnlocked(storeAndForward ? i : -1);
                 all.add(ps);
                 available.add(ps);
                 built++;
@@ -571,7 +608,7 @@ private boolean drainCandidateSlotForRecovery(int slotIndex, String slotPath,
         // createRecoverer() takes the slot flock on <base>-slotIndex, and
         // delegate().close() can early-return with the I/O thread still running
         // (flock still held).
-        PooledSender recoverer = null;
+        SenderSlot recoverer = null;
         boolean stopScan = false;
         try {
             if (!OrphanScanner.isCandidateOrphan(slotPath)) {
@@ -597,7 +634,7 @@ private boolean drainCandidateSlotForRecovery(int slotIndex, String slotPath,
                 // on a timeout: a server that fails to ack within the budget
                 // will very likely do the same for every remaining slot -- the
                 // same reasoning as the build-failure case above.
-                if (!recoverer.drain(remainingMillis)) {
+                if (!recoverer.delegate().drain(remainingMillis)) {
                     LOG.warn("startup SF recovery: drain did not ack slot {} "
                             + "within {}ms; skipping remaining slots",
                             slotPath, remainingMillis);
@@ -636,9 +673,12 @@ public PooledSender borrow() {
                     throw new LineSenderException("QuestDB handle is closed");
                 }
                 if (!available.isEmpty()) {
-                    PooledSender s = available.pollFirst();
-                    s.markInUse();
-                    return s;
+                    SenderSlot s = available.pollFirst();
+                    // Stamp a fresh lease id under the lock so the PooledSender
+                    // wrapper handed out can be told apart from any prior,
+                    // now-stale borrow of the same slot.
+                    s.bumpGeneration();
+                    return new PooledSender(s, s.generation());
                 }
                 if (all.size() + inFlightCreations + closingSlots + leakedSlots + recoveringSlots < maxSize) {
                     inFlightCreations++;
@@ -647,7 +687,7 @@ public PooledSender borrow() {
                     // SF is off (no per-slot identity needed).
                     int slotIndex = storeAndForward ? allocateSlotIndex() : -1;
                     lock.unlock();
-                    PooledSender created;
+                    SenderSlot created;
                     try {
                         created = createUnlocked(slotIndex);
                     } catch (Throwable e) {
@@ -685,8 +725,8 @@ public PooledSender borrow() {
                         throw new LineSenderException("QuestDB handle is closed");
                     }
                     all.add(created);
-                    created.markInUse();
-                    return created;
+                    created.bumpGeneration();
+                    return new PooledSender(created, created.generation());
                 }
                 if (remainingNanos <= 0) {
                     throw new LineSenderException(
@@ -721,7 +761,7 @@ void markClosing() {
 
     @Override
     public void close() {
-        PooledSender[] snapshot;
+        SenderSlot[] snapshot;
         lock.lock();
         try {
             if (closeStarted) {
@@ -731,22 +771,13 @@ public void close() {
             // Raise the shutdown signal too (a direct, non-pooled caller may
             // close() without a prior markClosing()); harmless if already set.
             closed = true;
-            // Mark every pooled wrapper invalidated so pinToCurrentThread()
-            // on other threads -- which never takes this lock -- can detect
-            // that its cached entry no longer wraps a live delegate. Removing
-            // the calling thread's ThreadLocal only clears one slot; other
-            // threads' slots survive until they read the flag.
-            for (int i = 0; i < all.size(); i++) {
-                all.get(i).markInvalidated();
-            }
             // Snapshot under the lock so the delegate-close loop below is
             // immune to concurrent mutation of `all`. discardBroken running
             // on another thread can still bail thanks to the `closed` check
             // it now performs; the snapshot is belt-and-braces for any
             // future code path that mutates `all` outside this lock's
             // happens-before chain.
-            snapshot = all.toArray(new PooledSender[0]);
-            threadAffine.remove();
+            snapshot = all.toArray(new SenderSlot[0]);
             slotReleased.signalAll();
         } finally {
             lock.unlock();
@@ -763,27 +794,11 @@ public void close() {
         }
     }
 
-    /**
-     * Clears the current thread's pin if it currently references {@code s}.
-     * Invoked from {@link PooledSender#close()} before the wrapper is
-     * returned to the pool, so a subsequent {@link #pinToCurrentThread()}
-     * on this thread cannot hand the wrapper back after another consumer
-     * has borrowed the slot. No-op when the caller never pinned, or pinned
-     * a different wrapper.
-     */
-    void clearPinIfCurrent(PooledSender s) {
-        if (threadAffine.get() == s) {
-            threadAffine.remove();
-        }
-    }
-
     /**
      * Evicts a slot whose delegate has failed (typically a {@code flush()}
-     * failure observed in {@link PooledSender#close()}). The wrapper is
-     * marked invalidated so any thread-pinned reference gets rejected on the
-     * next {@link #pinToCurrentThread()} call; the slot is removed from
-     * {@code all} so the pool can grow back into a fresh slot on demand. The
-     * underlying delegate is closed outside the lock so a slow real-close
+     * failure observed in {@link PooledSender#close()}). The slot is removed
+     * from {@code all} so the pool can grow back into a fresh slot on demand.
+     * The underlying delegate is closed outside the lock so a slow real-close
      * does not stall other borrowers.
      * <p>
      * Bails when the pool is already closed: {@link #close()} owns the
@@ -792,14 +807,22 @@ void clearPinIfCurrent(PooledSender s) {
      * {@code ArrayList} and the {@code delegate.close()} below would be a
      * double-close on a delegate {@code close()} has already shut down.
      */
-    void discardBroken(PooledSender s) {
-        s.markInvalidated();
+    void discardBroken(PooledSender ps) {
+        SenderSlot s = ps.slot();
+        long gen = ps.generation();
         boolean reserved = false;
         lock.lock();
         try {
             if (closed) {
                 return;
             }
+            if (s.generation() != gen) {
+                // Stale discard: the slot was already returned/discarded and
+                // possibly re-borrowed. Dropping it avoids evicting a slot a
+                // different borrower now owns and double-closing its delegate.
+                return;
+            }
+            s.bumpGeneration();
             boolean removed = all.remove(s);
             // For an SF slot, keep its index reserved (move the reservation
             // from `all` to `closingSlots`) until the delegate below releases
@@ -844,15 +867,26 @@ void discardBroken(PooledSender s) {
         }
     }
 
-    public void giveBack(PooledSender s) {
-        long now = System.currentTimeMillis();
-        s.markIdleAt(now);
+    public void giveBack(PooledSender ps) {
+        SenderSlot s = ps.slot();
+        long gen = ps.generation();
         lock.lock();
         try {
             if (closed) {
                 // Pool already shut down: don't requeue; let close() finish destroying.
                 return;
             }
+            if (s.generation() != gen) {
+                // Stale return: this lease was already given back and the slot
+                // possibly re-borrowed (or this is a duplicate close). Dropping
+                // it keeps Sender.close() idempotent under a concurrent
+                // re-borrow -- without it a double close would enqueue the slot
+                // twice and hand it to two borrowers writing into one delegate.
+                return;
+            }
+            s.bumpGeneration();
+            s.markIdleAt(System.currentTimeMillis());
+            assert !available.contains(s) : "slot already present in available deque on giveBack";
             available.addLast(s);
             slotReleased.signal();
         } finally {
@@ -860,19 +894,6 @@ public void giveBack(PooledSender s) {
         }
     }
 
-    public PooledSender pinToCurrentThread() {
-        PooledSender pinned = threadAffine.get();
-        if (pinned != null && !pinned.isInvalidated()) {
-            return pinned;
-        }
-        if (pinned != null) {
-            threadAffine.remove();
-        }
-        PooledSender s = borrow();
-        threadAffine.set(s);
-        return s;
-    }
-
     /**
      * Closes idle slots that have exceeded {@code idleTimeoutMillis} or that
      * have aged past {@code maxLifetimeMillis}. Never shrinks below
@@ -883,15 +904,15 @@ public void reapIdle() {
             return;
         }
         long now = System.currentTimeMillis();
-        ArrayList<PooledSender> toClose = null;
+        ArrayList<SenderSlot> toClose = null;
         lock.lock();
         try {
             if (closed) {
                 return;
             }
-            Iterator<PooledSender> it = available.iterator();
+            Iterator<SenderSlot> it = available.iterator();
             while (it.hasNext() && all.size() > minSize) {
-                PooledSender s = it.next();
+                SenderSlot s = it.next();
                 boolean idleExpired = idleTimeoutMillis < Long.MAX_VALUE
                         && (now - s.idleSinceMillis()) >= idleTimeoutMillis;
                 boolean overAge = maxLifetimeMillis < Long.MAX_VALUE
@@ -933,7 +954,7 @@ public void reapIdle() {
                 lock.lock();
                 try {
                     for (int i = 0, n = toClose.size(); i < n; i++) {
-                        PooledSender s = toClose.get(i);
+                        SenderSlot s = toClose.get(i);
                         if (s.slotIndex() >= 0) {
                             reclaimSlot(s, " during idle reaping");
                         }
@@ -983,32 +1004,19 @@ public int leakedSlotCount() {
         }
     }
 
-    public void releaseCurrentThread() {
-        PooledSender pinned = threadAffine.get();
-        if (pinned == null) {
-            return;
-        }
-        threadAffine.remove();
-        if (pinned.isInvalidated()) {
-            // Pool was closed: delegate is already closed, skip flush/giveBack.
-            return;
-        }
-        pinned.close();
-    }
-
-    private PooledSender createUnlocked(int slotIndex) {
-        return new PooledSender(senderFactory.apply(slotIndex), this, slotIndex);
+    private SenderSlot createUnlocked(int slotIndex) {
+        return new SenderSlot(senderFactory.apply(slotIndex), this, slotIndex);
     }
 
     /**
-     * Builds a {@link PooledSender} for startup recovery of one stranded slot.
+     * Builds a {@link SenderSlot} for startup recovery of one stranded slot.
      * Routes through {@link #recoverySenderFactory}, which in production forces
      * a non-blocking initial connect ({@link #defaultRecoverySender}) so a
      * single recovery step stays bounded -- see that method and
      * {@link #drainCandidateSlotForRecovery}.
      */
-    private PooledSender createRecoverer(int slotIndex) {
-        return new PooledSender(recoverySenderFactory.apply(slotIndex), this, slotIndex);
+    private SenderSlot createRecoverer(int slotIndex) {
+        return new SenderSlot(recoverySenderFactory.apply(slotIndex), this, slotIndex);
     }
 
     private Sender defaultSender(int slotIndex) {
@@ -1035,9 +1043,24 @@ private Sender defaultRecoverySender(int slotIndex) {
         return buildManagedSlotSender(slotIndex, true);
     }
 
+    // Applies the user-supplied ingest callbacks to a sender builder. Null
+    // callbacks are skipped so the sender keeps its loud-not-silent default.
+    private Sender.LineSenderBuilder applyUserCallbacks(Sender.LineSenderBuilder builder) {
+        if (errorHandler != null) {
+            builder.errorHandler(errorHandler);
+        }
+        if (connectionListener != null) {
+            builder.connectionListener(connectionListener);
+        }
+        if (drainerListener != null) {
+            builder.drainerListener(drainerListener);
+        }
+        return builder;
+    }
+
     private Sender buildManagedSlotSender(int slotIndex, boolean forRecovery) {
         if (!storeAndForward) {
-            return Sender.fromConfig(configurationString);
+            return applyUserCallbacks(Sender.builder(configurationString)).build();
         }
         // Give this pooled sender its own slot dir <sf_dir>/<base>-<index>
         // so concurrent SF senders sharing one sf_dir never collide on
@@ -1091,7 +1114,9 @@ private Sender buildManagedSlotSender(int slotIndex, boolean forRecovery) {
             // returns).
             builder.drainOrphans(false);
         }
-        return builder.build();
+        // Recovery delegates are internal, short-lived, OFF-mode drain senders;
+        // don't surface their connect/error events to the user's callbacks.
+        return (forRecovery ? builder : applyUserCallbacks(builder)).build();
     }
 
     /**
@@ -1130,7 +1155,7 @@ private void freeSlotIndex(int idx) {
      * {@link QwpWebSocketSender#isSlotLockReleased()} -- false means close()
      * bailed early with the I/O thread still running and the flock still held.
      */
-    private static boolean flockReleased(PooledSender s) {
+    private static boolean flockReleased(SenderSlot s) {
         Sender d = s.delegate();
         return !(d instanceof QwpWebSocketSender) || ((QwpWebSocketSender) d).isSlotLockReleased();
     }
@@ -1153,7 +1178,7 @@ private static boolean flockReleased(PooledSender s) {
      *                path (e.g. {@code ""} or {@code " during idle reaping"})
      * @return {@code true} if the index was freed, {@code false} if retired
      */
-    private boolean reclaimSlot(PooledSender s, String context) {
+    private boolean reclaimSlot(SenderSlot s, String context) {
         closingSlots--;
         if (flockReleased(s)) {
             freeSlotIndex(s.slotIndex());
diff --git a/core/src/main/java/io/questdb/client/impl/SenderSlot.java b/core/src/main/java/io/questdb/client/impl/SenderSlot.java
new file mode 100644
index 00000000..19c93671
--- /dev/null
+++ b/core/src/main/java/io/questdb/client/impl/SenderSlot.java
@@ -0,0 +1,118 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.impl;
+
+import io.questdb.client.Sender;
+
+/**
+ * One reusable {@link SenderPool} slot: owns a real {@link Sender} delegate, its
+ * store-and-forward slot index, and the idle/age bookkeeping the pool needs.
+ * Pre-allocated once per slot and held in the pool's {@code all}/{@code
+ * available} collections across borrows; it is never handed to callers
+ * directly.
+ * <p>
+ * Each borrow wraps the slot in a fresh {@link PooledSender} stamped with the
+ * slot's current lease {@link #generation}. Because the slot is shared across
+ * borrows, a stale handle's {@code close()} or data write must not release, or
+ * write through, a slot a later borrower now owns. The generation -- mutated
+ * only under the pool lock when the slot is handed out and returned -- is what
+ * lets {@link #live(long)} and {@link SenderPool#giveBack}/{@link
+ * SenderPool#discardBroken} detect and drop such stale calls. This is the
+ * ingest-side mirror of the egress {@code QueryWorker} generation guard.
+ */
+final class SenderSlot {
+
+    private final long createdAtMillis;
+    private final Sender delegate;
+    private final SenderPool pool;
+    private final int slotIndex;
+    // Monotonic lease id. Mutated only under the SenderPool lock (bumped in
+    // borrow() when the slot is handed out and in giveBack()/discardBroken()
+    // when it is returned). A PooledSender wrapper captures the value current
+    // at its borrow; once the slot is released or re-borrowed the captured id
+    // no longer matches. Volatile so a stale handle on another thread
+    // observes the latest value without taking the pool lock.
+    private volatile long generation;
+    private volatile long idleSinceMillis;
+
+    SenderSlot(Sender delegate, SenderPool pool, int slotIndex) {
+        this.delegate = delegate;
+        this.pool = pool;
+        this.slotIndex = slotIndex;
+        this.createdAtMillis = System.currentTimeMillis();
+        this.idleSinceMillis = this.createdAtMillis;
+    }
+
+    /**
+     * Advances the lease generation. Called by {@link SenderPool} under the
+     * pool lock when the slot is handed out (borrow) and when it is returned
+     * (giveBack/discardBroken).
+     */
+    void bumpGeneration() {
+        generation++;
+    }
+
+    long createdAtMillis() {
+        return createdAtMillis;
+    }
+
+    Sender delegate() {
+        return delegate;
+    }
+
+    long generation() {
+        return generation;
+    }
+
+    long idleSinceMillis() {
+        return idleSinceMillis;
+    }
+
+    /**
+     * Validates the borrowing lease's {@code gen} and returns the underlying
+     * delegate for a data-plane call. Throws if the lease is stale (the slot
+     * was returned to the pool or re-borrowed), so a stale handle cannot write
+     * into a slot a later borrower owns. Called by {@link PooledSender} on
+     * every operation.
+     */
+    Sender live(long gen) {
+        if (gen != generation) {
+            throw new IllegalStateException("sender handle is closed (returned to the pool)");
+        }
+        return delegate;
+    }
+
+    void markIdleAt(long nowMillis) {
+        idleSinceMillis = nowMillis;
+    }
+
+    SenderPool pool() {
+        return pool;
+    }
+
+    int slotIndex() {
+        return slotIndex;
+    }
+}
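
Review note: the lease-generation guard that SenderSlot introduces can be sketched in isolation. This is a minimal illustration only, not the PR's code: `LeaseGuardSketch`, `Slot`, `Handle`, and `Pool` are hypothetical, simplified stand-ins for `SenderSlot`, `PooledSender`, and `SenderPool`, with no delegate, no store-and-forward bookkeeping, and no blocking borrow.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical, simplified stand-ins; names are assumptions for illustration.
final class LeaseGuardSketch {

    static final class Slot {
        // Written only under the pool lock; volatile so a stale handle on
        // another thread can read it without locking.
        volatile long generation;
    }

    static final class Handle {
        final Slot slot;
        final long gen; // generation captured when this handle was borrowed

        Handle(Slot slot, long gen) {
            this.slot = slot;
            this.gen = gen;
        }

        boolean isLive() {
            return slot.generation == gen;
        }
    }

    static final class Pool {
        private final ReentrantLock lock = new ReentrantLock();
        private final ArrayDeque<Slot> available = new ArrayDeque<>();

        Pool(int size) {
            for (int i = 0; i < size; i++) {
                available.addLast(new Slot());
            }
        }

        Handle borrow() {
            lock.lock();
            try {
                Slot s = available.pollFirst();
                if (s == null) {
                    throw new IllegalStateException("pool exhausted");
                }
                s.generation++; // bump on hand-out
                return new Handle(s, s.generation);
            } finally {
                lock.unlock();
            }
        }

        // Returns false for a stale or duplicate return, leaving the pool untouched.
        boolean giveBack(Handle h) {
            lock.lock();
            try {
                if (h.slot.generation != h.gen) {
                    return false;
                }
                h.slot.generation++; // bump on return, invalidating the handle
                available.addLast(h.slot);
                return true;
            } finally {
                lock.unlock();
            }
        }

        int availableCount() {
            lock.lock();
            try {
                return available.size();
            } finally {
                lock.unlock();
            }
        }
    }
}
```

Bumping on both hand-out and return means a handle goes stale the moment it is returned, not only once someone else re-borrows the slot, which is why a duplicate `giveBack` cannot enqueue the slot twice.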
diff --git a/core/src/main/java/io/questdb/client/network/JavaTlsClientSocket.java b/core/src/main/java/io/questdb/client/network/JavaTlsClientSocket.java
index 4d363fbb..c1b1eec7 100644
--- a/core/src/main/java/io/questdb/client/network/JavaTlsClientSocket.java
+++ b/core/src/main/java/io/questdb/client/network/JavaTlsClientSocket.java
@@ -307,91 +307,13 @@ public int send(long bufferPtr, int bufferLen) {
     }
 
     @Override
-    public void startTlsSession(CharSequence peerName) throws TlsSessionInitFailedException {
+    public void startTlsSession(CharSequence peerName, SocketReadinessWaiter waiter) throws TlsSessionInitFailedException {
         assert state == STATE_PLAINTEXT;
         prepareInternalBuffers();
         try {
             this.sslEngine = createSslEngine(peerName);
             this.sslEngine.beginHandshake();
-            SSLEngineResult.HandshakeStatus handshakeStatus = sslEngine.getHandshakeStatus();
-            while (handshakeStatus != SSLEngineResult.HandshakeStatus.FINISHED) {
-                switch (handshakeStatus) {
-                    case NEED_TASK:
-                        Runnable task;
-                        while ((task = sslEngine.getDelegatedTask()) != null) {
-                            task.run();
-                        }
-                        handshakeStatus = sslEngine.getHandshakeStatus();
-                        break;
-                    case NEED_WRAP: {
-                        SSLEngineResult result = sslEngine.wrap(wrapInputBuffer, wrapOutputBuffer);
-                        handshakeStatus = result.getHandshakeStatus();
-                        switch (result.getStatus()) {
-                            case BUFFER_UNDERFLOW:
-                                // there cannot be underflow since wrap() during handshake does not read from the input buffer at all
-                                throw new AssertionError("Buffer underflow during TLS handshake. This should not happen. please report as a bug");
-                            case BUFFER_OVERFLOW:
-                                if (wrapOutputBuffer.position() != 0) {
-                                    // wrap() left bytes behind without producing a complete record. The OK
-                                    // branch is the only place that drains and clears, so a non-empty
-                                    // buffer here means we would re-enter NEED_WRAP with identical state
-                                    // and spin forever. Fail loudly instead.
-                                    throw new AssertionError("Buffer overflow during TLS handshake with non-empty output buffer. This should not happen, please report as a bug");
-                                }
-                                // in theory, this can happen if the output buffer is too small to fit a single TLS handshake record,
-                                // but that would indicate our starting buffer is too small.
-                                growWrapOutputBuffer();
-                                break;
-                            case OK:
-                                // wrapOutputBuffer: write mode
-                                int written = 0;
-                                int bufferLimit = wrapOutputBuffer.position();
-                                while (written < bufferLimit) {
-                                    int n = delegate.send(wrapOutputBufferPtr + written, bufferLimit - written);
-                                    if (n < 0) {
-                                        throw TlsSessionInitFailedException.instance("socket write error");
-                                    }
-                                    written += n;
-                                }
-                                wrapOutputBuffer.clear();
-                                break;
-                            case CLOSED:
-                                throw TlsSessionInitFailedException.instance("server closed connection unexpectedly");
-                        }
-                        break;
-                    }
-                    case NEED_UNWRAP: {
-                        int n = readFromSocket();
-                        if (n < 0) {
-                            throw TlsSessionInitFailedException.instance("socket read error");
-                        }
-                        SSLEngineResult result = sslEngine.unwrap(unwrapInputBuffer, unwrapOutputBuffer);
-                        handshakeStatus = result.getHandshakeStatus();
-                        switch (result.getStatus()) {
-                            case BUFFER_UNDERFLOW:
-                                // we need to receive more data from a socket, let's try again
-                                break;
-                            case BUFFER_OVERFLOW:
-                                if (unwrapOutputBuffer.position() != 0) {
-                                    // unwrap() produced plaintext but signalled overflow without consuming
-                                    // the next record. Nothing in the handshake loop drains this buffer,
-                                    // so re-entering NEED_UNWRAP would spin forever. Fail loudly.
-                                    throw new AssertionError("Buffer overflow during TLS handshake with non-empty output buffer. This should not happen, please report as a bug");
-                                }
-                                // in theory, this can happen if the output buffer is too small to fit a single TLS handshake record,
-                                // but that would indicate our starting buffer is too small.
-                                growUnwrapOutputBuffer();
-                                break;
-                            case OK:
-                                // good, let's see what we need to do next
-                                break;
-                            case CLOSED:
-                                throw TlsSessionInitFailedException.instance("server closed connection unexpectedly");
-                        }
-                    }
-                    break;
-                }
-            }
+            runHandshake(waiter);
             // unwrap input buffer: read mode and empty
             unwrapInputBuffer.position(0);
             unwrapInputBuffer.limit(0);
@@ -583,6 +505,113 @@ private int readFromSocket() {
         return n;
     }
 
+    /**
+     * Drives the TLS handshake state machine to completion. When the
+     * non-blocking socket would block, hands control to {@code waiter} (which
+     * parks on the event loop bounded by the connect deadline) instead of
+     * busy-spinning on read/write. Extracted from {@link #startTlsSession} so a
+     * stub {@code sslEngine} can exercise the wait paths in isolation.
+     */
+    private void runHandshake(SocketReadinessWaiter waiter) throws SSLException, TlsSessionInitFailedException {
+        SSLEngineResult.HandshakeStatus handshakeStatus = sslEngine.getHandshakeStatus();
+        // Exit on NOT_HANDSHAKING as well as FINISHED: getHandshakeStatus() (used by the NEED_TASK
+        // branch) never returns FINISHED per the JSSE contract -- it returns NOT_HANDSHAKING once the
+        // handshake completes. Without this, a delegated task that is the terminal step would leave
+        // handshakeStatus at NOT_HANDSHAKING, match no case, and busy-spin forever with no deadline escape.
+        while (handshakeStatus != SSLEngineResult.HandshakeStatus.FINISHED
+                && handshakeStatus != SSLEngineResult.HandshakeStatus.NOT_HANDSHAKING) {
+            switch (handshakeStatus) {
+                case NEED_TASK:
+                    Runnable task;
+                    while ((task = sslEngine.getDelegatedTask()) != null) {
+                        task.run();
+                    }
+                    handshakeStatus = sslEngine.getHandshakeStatus();
+                    break;
+                case NEED_WRAP: {
+                    SSLEngineResult result = sslEngine.wrap(wrapInputBuffer, wrapOutputBuffer);
+                    handshakeStatus = result.getHandshakeStatus();
+                    switch (result.getStatus()) {
+                        case BUFFER_UNDERFLOW:
+                            // there cannot be underflow since wrap() during handshake does not read from the input buffer at all
+                            throw new AssertionError("Buffer underflow during TLS handshake. This should not happen, please report as a bug");
+                        case BUFFER_OVERFLOW:
+                            if (wrapOutputBuffer.position() != 0) {
+                                // wrap() left bytes behind without producing a complete record. The OK
+                                // branch is the only place that drains and clears, so a non-empty
+                                // buffer here means we would re-enter NEED_WRAP with identical state
+                                // and spin forever. Fail loudly instead.
+                                throw new AssertionError("Buffer overflow during TLS handshake with non-empty output buffer. This should not happen, please report as a bug");
+                            }
+                            // in theory, this can happen if the output buffer is too small to fit a single TLS handshake record,
+                            // but that would indicate our starting buffer is too small.
+                            growWrapOutputBuffer();
+                            break;
+                        case OK:
+                            // wrapOutputBuffer: write mode
+                            int written = 0;
+                            int bufferLimit = wrapOutputBuffer.position();
+                            while (written < bufferLimit) {
+                                int n = delegate.send(wrapOutputBufferPtr + written, bufferLimit - written);
+                                if (n < 0) {
+                                    throw TlsSessionInitFailedException.instance("socket write error");
+                                }
+                                if (n == 0) {
+                                    // The non-blocking socket's send buffer is full. Wait for it to
+                                    // become writable -- bounded by the connect deadline -- instead of
+                                    // busy-spinning on send().
+                                    waiter.awaitReady(IOOperation.WRITE);
+                                }
+                                written += n;
+                            }
+                            wrapOutputBuffer.clear();
+                            break;
+                        case CLOSED:
+                            throw TlsSessionInitFailedException.instance("server closed connection unexpectedly");
+                    }
+                    break;
+                }
+                case NEED_UNWRAP: {
+                    int n = readFromSocket();
+                    if (n < 0) {
+                        throw TlsSessionInitFailedException.instance("socket read error");
+                    }
+                    SSLEngineResult result = sslEngine.unwrap(unwrapInputBuffer, unwrapOutputBuffer);
+                    handshakeStatus = result.getHandshakeStatus();
+                    switch (result.getStatus()) {
+                        case BUFFER_UNDERFLOW:
+                            // Not enough bytes for a complete TLS record yet. If the last read
+                            // drained the socket (n == 0, would-block on the non-blocking fd), wait
+                            // for it to become readable -- bounded by the connect deadline -- instead
+                            // of busy-spinning. A positive n means we read a partial record, so loop
+                            // immediately and read the rest.
+                            if (n == 0) {
+                                waiter.awaitReady(IOOperation.READ);
+                            }
+                            break;
+                        case BUFFER_OVERFLOW:
+                            if (unwrapOutputBuffer.position() != 0) {
+                                // unwrap() produced plaintext but signalled overflow without consuming
+                                // the next record. Nothing in the handshake loop drains this buffer,
+                                // so re-entering NEED_UNWRAP would spin forever. Fail loudly.
+                                throw new AssertionError("Buffer overflow during TLS handshake with non-empty output buffer. This should not happen, please report as a bug");
+                            }
+                            // in theory, this can happen if the output buffer is too small to fit a single TLS handshake record,
+                            // but that would indicate our starting buffer is too small.
+                            growUnwrapOutputBuffer();
+                            break;
+                        case OK:
+                            // good, let's see what we need to do next
+                            break;
+                        case CLOSED:
+                            throw TlsSessionInitFailedException.instance("server closed connection unexpectedly");
+                    }
+                }
+                break;
+            }
+        }
+    }
+
     private int writeToSocket(int bytesToSend) {
         // wrapOutputBuffer is in the write mode
         int n = delegate.send(wrapOutputBufferPtr, bytesToSend);
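
Review note: the NEED_WRAP send loop in runHandshake follows a "send, and park when the socket would block" pattern. The sketch below isolates that pattern under stated assumptions: `PartialSendSketch`, `Sink`, and `WriteWaiter` are hypothetical stand-ins (the real code uses `delegate.send(...)` and `waiter.awaitReady(IOOperation.WRITE)`), and the wait counter exists only so the behavior is observable.

```java
// Hypothetical stand-ins; names are assumptions for illustration only.
final class PartialSendSketch {

    /** Non-blocking sink: negative on error, 0 when it would block, else bytes accepted. */
    interface Sink {
        int send(int offset, int len);
    }

    /** Parks until the sink is writable; expected to throw once a deadline passes. */
    interface WriteWaiter {
        void awaitWritable();
    }

    /** Sends {@code len} bytes and returns how many times it had to wait. */
    static int sendFully(Sink sink, WriteWaiter waiter, int len) {
        int written = 0;
        int waits = 0;
        while (written < len) {
            int n = sink.send(written, len - written);
            if (n < 0) {
                throw new IllegalStateException("socket write error");
            }
            if (n == 0) {
                // Send buffer full: wait for writability instead of spinning on send().
                waiter.awaitWritable();
                waits++;
            }
            written += n;
        }
        return waits;
    }
}
```

Termination depends on the waiter: if the sink never becomes writable, the loop exits only because the waiter throws at its deadline.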
diff --git a/core/src/main/java/io/questdb/client/network/Net.java b/core/src/main/java/io/questdb/client/network/Net.java
index 040a2cb7..f649d330 100644
--- a/core/src/main/java/io/questdb/client/network/Net.java
+++ b/core/src/main/java/io/questdb/client/network/Net.java
@@ -36,6 +36,11 @@
 
 public final class Net {
 
+    // Sentinel returned by connectAddrInfoTimeout when the connect did not
+    // complete within the supplied budget. Distinct from -1 (generic error) and
+    // the disconnect codes so callers can flag a timeout without decoding errno.
+    @SuppressWarnings("unused")
+    public static final int CONNECT_TIMEOUT = -3;
     @SuppressWarnings("unused")
     public static final int EOTHERDISCONNECT = -2;
     @SuppressWarnings("unused")
@@ -88,6 +93,14 @@ public static void configureKeepAlive(int fd) {
 
     public static native int connectAddrInfo(int fd, long lpAddrInfo);
 
+    /**
+     * Non-blocking connect bounded by {@code timeoutMillis}. Returns 0 on
+     * success, {@link #CONNECT_TIMEOUT} on timeout, or -1 on failure (errno set,
+     * readable via {@link io.questdb.client.std.Os#errno()}). The socket is left
+     * non-blocking on success.
+     */
+    public static native int connectAddrInfoTimeout(int fd, long lpAddrInfo, int timeoutMillis);
+
     public static void freeAddrInfo(long pAddrInfo) {
         if (pAddrInfo != 0) {
             ADDR_INFO_COUNTER.decrementAndGet();
diff --git a/core/src/main/java/io/questdb/client/network/NetworkFacade.java b/core/src/main/java/io/questdb/client/network/NetworkFacade.java
index b2e97dad..d23824a5 100644
--- a/core/src/main/java/io/questdb/client/network/NetworkFacade.java
+++ b/core/src/main/java/io/questdb/client/network/NetworkFacade.java
@@ -27,6 +27,12 @@
 import org.slf4j.Logger;
 
 public interface NetworkFacade {
+    /**
+     * Return value of {@link #connectAddrInfoTimeout(int, long, int)} when the
+     * connect did not complete within the supplied budget.
+     */
+    int CONNECT_TIMEOUT = Net.CONNECT_TIMEOUT;
+
     int close(int fd);
 
     void close(int fd, Logger logger);
@@ -39,6 +45,13 @@ public interface NetworkFacade {
 
     int connectAddrInfo(int fd, long pAddrInfo);
 
+    /**
+     * Non-blocking connect bounded by {@code timeoutMillis}. Returns 0 on
+     * success, {@link #CONNECT_TIMEOUT} on timeout, or -1 on failure (with
+     * {@link #errno()} set). The socket is left non-blocking on success.
+     */
+    int connectAddrInfoTimeout(int fd, long pAddrInfo, int timeoutMillis);
+
     int errno();
 
     void freeAddrInfo(long pAddrInfo);
diff --git a/core/src/main/java/io/questdb/client/network/NetworkFacadeImpl.java b/core/src/main/java/io/questdb/client/network/NetworkFacadeImpl.java
index 11195fc2..64ea0dc7 100644
--- a/core/src/main/java/io/questdb/client/network/NetworkFacadeImpl.java
+++ b/core/src/main/java/io/questdb/client/network/NetworkFacadeImpl.java
@@ -62,6 +62,11 @@ public int connectAddrInfo(int fd, long pAddrInfo) {
         return Net.connectAddrInfo(fd, pAddrInfo);
     }
 
+    @Override
+    public int connectAddrInfoTimeout(int fd, long pAddrInfo, int timeoutMillis) {
+        return Net.connectAddrInfoTimeout(fd, pAddrInfo, timeoutMillis);
+    }
+
     @Override
     public int errno() {
         return Os.errno();
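
Review note: connectAddrInfoTimeout now has three documented outcomes (0, CONNECT_TIMEOUT, -1 with errno set). A caller might branch on them roughly as below. This is a sketch under assumptions: `ConnectResultSketch` and `describe` are hypothetical, the constant is copied from the diff's `Net.CONNECT_TIMEOUT = -3`, and the message wording is invented for illustration.

```java
// Hypothetical helper; only the return-code contract comes from the diff.
final class ConnectResultSketch {

    // Mirrors Net.CONNECT_TIMEOUT as declared in this diff.
    static final int CONNECT_TIMEOUT = -3;

    /** Maps a connectAddrInfoTimeout-style return code to a diagnostic string. */
    static String describe(int rc, int errno, int timeoutMillis) {
        if (rc == 0) {
            return "connected";
        }
        if (rc == CONNECT_TIMEOUT) {
            // Timeout is flagged by the sentinel, so errno does not need decoding.
            return "connect timed out after " + timeoutMillis + " ms";
        }
        return "connect failed [errno=" + errno + "]";
    }
}
```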
diff --git a/core/src/main/java/io/questdb/client/network/PlainSocket.java b/core/src/main/java/io/questdb/client/network/PlainSocket.java
index 06e8c23e..555affd2 100644
--- a/core/src/main/java/io/questdb/client/network/PlainSocket.java
+++ b/core/src/main/java/io/questdb/client/network/PlainSocket.java
@@ -71,7 +71,7 @@ public int send(long bufferPtr, int bufferLen) {
     }
 
     @Override
-    public void startTlsSession(CharSequence peerName) {
+    public void startTlsSession(CharSequence peerName, SocketReadinessWaiter waiter) {
         throw new UnsupportedOperationException();
     }
 
diff --git a/core/src/main/java/io/questdb/client/network/Socket.java b/core/src/main/java/io/questdb/client/network/Socket.java
index dec4db4e..0cdce517 100644
--- a/core/src/main/java/io/questdb/client/network/Socket.java
+++ b/core/src/main/java/io/questdb/client/network/Socket.java
@@ -84,9 +84,12 @@ public interface Socket extends QuietCloseable {
      * on server connections.
      *
      * @param peerName server name to use for SNI and certificate validation.
+     * @param waiter   blocks until the socket is ready for the next handshake
+     *                 read/write (bounded by the connect deadline), so the
+     *                 handshake does not busy-spin on the non-blocking socket.
      * @throws TlsSessionInitFailedException if the call fails.
      */
-    void startTlsSession(@Nullable CharSequence peerName) throws TlsSessionInitFailedException;
+    void startTlsSession(@Nullable CharSequence peerName, SocketReadinessWaiter waiter) throws TlsSessionInitFailedException;
 
     /**
      * @return true if the socket support TLS encryption; false otherwise.
diff --git a/core/src/main/java/io/questdb/client/network/SocketReadinessWaiter.java b/core/src/main/java/io/questdb/client/network/SocketReadinessWaiter.java
new file mode 100644
index 00000000..8543d3e6
--- /dev/null
+++ b/core/src/main/java/io/questdb/client/network/SocketReadinessWaiter.java
@@ -0,0 +1,46 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.network;
+
+/**
+ * Blocks until a non-blocking socket is ready for a given I/O operation, or
+ * throws a timeout-flagged exception once the caller's deadline passes.
+ * <p>
+ * Used to drive the TLS handshake off the client's event loop: instead of
+ * busy-spinning on a non-blocking socket that returns "would block", the
+ * handshake hands control to this waiter, which parks on epoll/kqueue/select
+ * with the remaining connect budget. This bounds the handshake by the same
+ * deadline as the TCP connect and keeps a stalled peer from pinning a CPU.
+ */
+@FunctionalInterface
+public interface SocketReadinessWaiter {
+    /**
+     * Blocks until the socket is ready for {@code ioOperation}, or throws a
+     * timeout-flagged exception when the connect deadline is exceeded.
+     *
+     * @param ioOperation {@link IOOperation#READ} or {@link IOOperation#WRITE}
+     */
+    void awaitReady(int ioOperation);
+}
diff --git a/core/src/main/java/io/questdb/client/std/MemoryTag.java b/core/src/main/java/io/questdb/client/std/MemoryTag.java
index 984f6fdb..643ceb58 100644
--- a/core/src/main/java/io/questdb/client/std/MemoryTag.java
+++ b/core/src/main/java/io/questdb/client/std/MemoryTag.java
@@ -38,4 +38,31 @@ public final class MemoryTag {
     public static final int NATIVE_TLS_RSS = NATIVE_TEXT_PARSER_RSS + 1;
     public static final int NATIVE_ND_ARRAY = NATIVE_TLS_RSS + 1;
     public static final int SIZE = NATIVE_ND_ARRAY + 1;
+
+    public static String nameOf(int tag) {
+        switch (tag) {
+            case MMAP_DEFAULT:
+                return "MMAP_DEFAULT";
+            case NATIVE_PATH:
+                return "NATIVE_PATH";
+            case NATIVE_DEFAULT:
+                return "NATIVE_DEFAULT";
+            case NATIVE_DIRECT_UTF8_SINK:
+                return "NATIVE_DIRECT_UTF8_SINK";
+            case NATIVE_HTTP_CONN:
+                return "NATIVE_HTTP_CONN";
+            case NATIVE_ILP_RSS:
+                return "NATIVE_ILP_RSS";
+            case NATIVE_IO_DISPATCHER_RSS:
+                return "NATIVE_IO_DISPATCHER_RSS";
+            case NATIVE_TEXT_PARSER_RSS:
+                return "NATIVE_TEXT_PARSER_RSS";
+            case NATIVE_TLS_RSS:
+                return "NATIVE_TLS_RSS";
+            case NATIVE_ND_ARRAY:
+                return "NATIVE_ND_ARRAY";
+            default:
+                return "unknown[" + tag + "]";
+        }
+    }
 }
\ No newline at end of file
diff --git a/core/src/main/resources/io/questdb/client/bin/darwin-aarch64/libquestdb.dylib b/core/src/main/resources/io/questdb/client/bin/darwin-aarch64/libquestdb.dylib
deleted file mode 100644
index 82d21e59..00000000
Binary files a/core/src/main/resources/io/questdb/client/bin/darwin-aarch64/libquestdb.dylib and /dev/null differ
diff --git a/core/src/main/resources/io/questdb/client/bin/darwin-x86-64/libquestdb.dylib b/core/src/main/resources/io/questdb/client/bin/darwin-x86-64/libquestdb.dylib
deleted file mode 100644
index 647a12cb..00000000
Binary files a/core/src/main/resources/io/questdb/client/bin/darwin-x86-64/libquestdb.dylib and /dev/null differ
diff --git a/core/src/main/resources/io/questdb/client/bin/linux-aarch64/libquestdb.so b/core/src/main/resources/io/questdb/client/bin/linux-aarch64/libquestdb.so
deleted file mode 100644
index 94ad41c1..00000000
Binary files a/core/src/main/resources/io/questdb/client/bin/linux-aarch64/libquestdb.so and /dev/null differ
diff --git a/core/src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so b/core/src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so
deleted file mode 100644
index 15c0135d..00000000
Binary files a/core/src/main/resources/io/questdb/client/bin/linux-x86-64/libquestdb.so and /dev/null differ
diff --git a/core/src/main/resources/io/questdb/client/bin/windows-x86-64/libquestdb.dll b/core/src/main/resources/io/questdb/client/bin/windows-x86-64/libquestdb.dll
deleted file mode 100755
index e95dcecd..00000000
Binary files a/core/src/main/resources/io/questdb/client/bin/windows-x86-64/libquestdb.dll and /dev/null differ
diff --git a/core/src/test/java/io/questdb/client/test/QuestDBBuilderTest.java b/core/src/test/java/io/questdb/client/test/QuestDBBuilderTest.java
index 1734360b..5b06513c 100644
--- a/core/src/test/java/io/questdb/client/test/QuestDBBuilderTest.java
+++ b/core/src/test/java/io/questdb/client/test/QuestDBBuilderTest.java
@@ -51,150 +51,50 @@ public void testBuilderCallAfterFromConfigOverridesPoolKeysFromString() {
         Assert.assertEquals(150L, b.poolConfigSnapshotForTest().get("acquire_timeout_ms"));
     }
 
-    @Test
-    public void testConflictingIntPoolKeyAcrossSidesRejected() {
-        // Both sides carry sender_pool_max (an int pool key) with different
-        // values -> build fails via resolvePoolInt's conflict check. The long
-        // pool keys are covered by testConflictingPoolKeysAcrossSidesRejected;
-        // this guards the separate int code path.
-        try (QuestDB ignored = QuestDB.builder()
-                .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;sender_pool_max=2;")
-                .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;sender_pool_max=5;")
-                .build()) {
-            Assert.fail("expected conflicting pool config");
-        } catch (IllegalArgumentException e) {
-            Assert.assertTrue(e.getMessage(), e.getMessage().contains("conflicting pool config: sender_pool_max"));
-        }
-    }
-
-    @Test
-    public void testConflictingPoolKeysAcrossSidesRejected() {
-        // Both sides carry acquire_timeout_ms with different values -> build fails.
-        try (QuestDB ignored = QuestDB.builder()
-                .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;acquire_timeout_ms=1000;")
-                .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;acquire_timeout_ms=2000;")
-                .build()) {
-            Assert.fail("expected conflicting pool config");
-        } catch (IllegalArgumentException e) {
-            Assert.assertTrue(e.getMessage(), e.getMessage().contains("conflicting pool config: acquire_timeout_ms"));
-        }
-    }
-
-    @Test
-    public void testConnectRejectsNonWsSchemaOnSingleString() {
-        // QuestDB.connect(single string) must enforce the ws/wss schema, just
-        // like the builder's fromConfig().
-        assertSchemaRejected(() -> QuestDB.connect("http::addr=h:9000;"));
-    }
-
-    @Test
-    public void testConnectRejectsNonWsSchemaOnTwoArg() {
-        // QuestDB.connect(ingest, query) rejects a non-ws schema on either side.
-        assertSchemaRejected(() -> QuestDB.connect("tcp::addr=h:9009;", "ws::addr=h:9000;"));
-        assertSchemaRejected(() -> QuestDB.connect("ws::addr=h:9000;", "udp::addr=h:9009;"));
-    }
-
     @Test
     public void testConnectSingleStringValidatesAndBuilds() {
-        // QuestDB.connect(single string) hands the same ws:: string to both the
-        // ingest and query sides. min=0 on both pools validates both clients
-        // without connecting, so build() returns a live handle.
+        // QuestDB.connect(single string) hands the same ws:: cluster string to
+        // both the ingest and query pools. min=0 on both pools validates both
+        // clients without connecting, so build() returns a live handle.
         try (QuestDB ignored = QuestDB.connect(
                 "ws::addr=127.0.0.1:1;sender_pool_min=0;query_pool_min=0;")) {
             Assert.assertNotNull(ignored);
         }
     }
 
-    @Test
-    public void testConnectStringWithPoolKeysAppliedToBuilder() {
-        // Pool keys supplied via separate ingest/query strings are accepted;
-        // min=0 so nothing connects.
-        try (QuestDB ignored = QuestDB.builder()
-                .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;sender_pool_max=1;")
-                .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;query_pool_max=1;")
-                .build()) {
-            Assert.assertNotNull(ignored);
-        }
-    }
-
-    @Test
-    public void testConnectTwoArgValidatesAndBuilds() {
-        // QuestDB.connect(ingest, query) sets the two sides independently;
-        // min=0 on each validates both clients without connecting.
-        try (QuestDB ignored = QuestDB.connect(
-                "ws::addr=127.0.0.1:1;sender_pool_min=0;",
-                "ws::addr=127.0.0.1:1;query_pool_min=0;")) {
-            Assert.assertNotNull(ignored);
-        }
-    }
-
-    @Test
-    public void testExplicitPoolKeyWinsOverConflictingStrings() {
-        // The two strings disagree on acquire_timeout_ms, but an explicit builder
-        // call sets it: explicit wins and the conflict check is skipped, whether
-        // the explicit call comes after or before the config strings. The resolved
-        // value is the explicit 500, not either string's value.
-        QuestDBBuilder after = QuestDB.builder()
-                .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;acquire_timeout_ms=1000;")
-                .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;acquire_timeout_ms=2000;")
-                .acquireTimeoutMillis(500);
-        try (QuestDB ignored = after.build()) {
-            Assert.assertNotNull(ignored);
-        }
-        Assert.assertEquals(500L, after.poolConfigSnapshotForTest().get("acquire_timeout_ms"));
-
-        QuestDBBuilder before = QuestDB.builder()
-                .acquireTimeoutMillis(500)
-                .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;acquire_timeout_ms=1000;")
-                .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;acquire_timeout_ms=2000;");
-        try (QuestDB ignored = before.build()) {
-            Assert.assertNotNull(ignored);
-        }
-        Assert.assertEquals(500L, before.poolConfigSnapshotForTest().get("acquire_timeout_ms"));
-    }
-
-    @Test
-    public void testHttpIngestConfigRejected() {
-        assertSchemaRejected(() -> QuestDB.builder().ingestConfig("http::addr=h:9000;"));
-    }
-
-    @Test
-    public void testHttpSingleConfigRejected() {
-        assertSchemaRejected(() -> QuestDB.builder().fromConfig("http::addr=h:9000;"));
-    }
-
     @Test
     public void testMalformedEgressConfigRejectedAtBuildWithMinZero() {
         // query_pool_min=0 pre-warms nothing, so build() never constructs a
-        // QwpQueryClient -- yet it must still reject a malformed query config up
-        // front via QwpQueryClient.validateConfig, mirroring the ingress side.
+        // QwpQueryClient -- yet it must still reject a malformed egress key in
+        // the single cluster config up front, mirroring the ingress side.
         // Covers a typed enum (compression) and a bounded int (compression_level).
-        assertEgressBuildRejected(
-                "ws::addr=127.0.0.1:1;compression=gzip;query_pool_min=0;query_pool_max=2;", "compression");
-        assertEgressBuildRejected(
-                "ws::addr=127.0.0.1:1;compression_level=99;query_pool_min=0;query_pool_max=2;", "compression_level");
+        assertBuildRejected(
+                "ws::addr=127.0.0.1:1;compression=gzip;sender_pool_min=0;query_pool_min=0;query_pool_max=2;",
+                "compression");
+        assertBuildRejected(
+                "ws::addr=127.0.0.1:1;compression_level=99;sender_pool_min=0;query_pool_min=0;query_pool_max=2;",
+                "compression_level");
     }
 
     @Test
     public void testMalformedIngressConfigRejectedAtBuildWithMinZero() {
         // sender_pool_min=0 pre-warms nothing, so build() never constructs a
-        // Sender -- yet it must still reject a malformed ingest config up front,
-        // matching the egress side. Covers a typed enum (tls_verify), a
+        // Sender -- yet it must still reject a malformed ingress key in the
+        // single cluster config up front. Covers a typed enum (tls_verify), a
         // registry-STRING value that only the real Sender parse validates
-        // (auto_flush_rows), and WebSocket build-time checks that only the full
-        // no-connect validation reaches: auto_flush=off and auto_flush_interval=off
-        // both disable auto-flush (unsupported on WebSocket), and sf_durability=flush
-        // is not yet supported.
-        assertIngressBuildRejected(
-                "wss::addr=127.0.0.1:1;tls_verify=strict;sender_pool_min=0;sender_pool_max=2;", "tls_verify");
-        assertIngressBuildRejected(
-                "ws::addr=127.0.0.1:1;auto_flush_rows=abc;sender_pool_min=0;sender_pool_max=2;", "auto_flush_rows");
-        assertIngressBuildRejected(
-                "ws::addr=127.0.0.1:1;auto_flush_interval=off;sender_pool_min=0;sender_pool_max=2;", "auto-flush");
-        assertIngressBuildRejected(
-                "ws::addr=127.0.0.1:1;auto_flush=off;sender_pool_min=0;sender_pool_max=2;", "auto-flush");
-        assertIngressBuildRejected(
-                "ws::addr=127.0.0.1:1;sf_durability=flush;sender_pool_min=0;sender_pool_max=2;", "not yet supported");
+        // (auto_flush_rows), and WebSocket build-time checks: auto_flush=off and
+        // auto_flush_interval=off both disable auto-flush (unsupported on
+        // WebSocket), and sf_durability=flush is not yet supported.
+        assertBuildRejected(
+                "wss::addr=127.0.0.1:1;tls_verify=strict;sender_pool_min=0;query_pool_min=0;", "tls_verify");
+        assertBuildRejected(
+                "ws::addr=127.0.0.1:1;auto_flush_rows=abc;sender_pool_min=0;query_pool_min=0;", "auto_flush_rows");
+        assertBuildRejected(
+                "ws::addr=127.0.0.1:1;auto_flush_interval=off;sender_pool_min=0;query_pool_min=0;", "auto-flush");
+        assertBuildRejected(
+                "ws::addr=127.0.0.1:1;auto_flush=off;sender_pool_min=0;query_pool_min=0;", "auto-flush");
+        assertBuildRejected(
+                "ws::addr=127.0.0.1:1;sf_durability=flush;sender_pool_min=0;query_pool_min=0;", "not yet supported");
     }
 
     @Test
@@ -212,22 +112,12 @@ public void testMalformedPoolValueRejectedAtBuild() {
     }
 
     @Test
-    public void testMissingIngestConfigThrows() {
-        try {
-            QuestDB.builder().queryConfig("ws::addr=h:9000;").build().close();
-            Assert.fail();
-        } catch (IllegalStateException e) {
-            Assert.assertTrue(e.getMessage().contains("ingest"));
-        }
-    }
-
-    @Test
-    public void testMissingQueryConfigThrows() {
+    public void testMissingConfigThrows() {
         try {
-            QuestDB.builder().ingestConfig("ws::addr=h:9000;").build().close();
+            QuestDB.builder().build().close();
             Assert.fail();
         } catch (IllegalStateException e) {
-            Assert.assertTrue(e.getMessage().contains("query"));
+            Assert.assertTrue(e.getMessage(), e.getMessage().contains("configuration"));
         }
     }
 
@@ -254,26 +144,37 @@ public void testNegativePoolSizesRejected() {
         }
     }
 
+    @Test
+    public void testNonWsSchemaRejected() {
+        // The single cluster config (and QuestDB.connect) must use ws/wss.
+        assertSchemaRejected(() -> QuestDB.builder().fromConfig("http::addr=h:9000;"));
+        assertSchemaRejected(() -> QuestDB.builder().fromConfig("tcp::addr=h:9009;"));
+        assertSchemaRejected(() -> QuestDB.builder().fromConfig("udp::addr=h:9009;"));
+        assertSchemaRejected(() -> QuestDB.connect("http::addr=h:9000;").close());
+    }
+
     @Test
     public void testQueryPoolBuildFailureUnwindsSenderPool() throws Exception {
-        // Sender pool builds against a healthy ws ingest endpoint; the query
-        // pool fails on a dead address. The handle must close the already-built
-        // sender pool (its connected senders) rather than leak them.
-        try (TestWebSocketServer ingest = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() {
+        // One server, one cluster config: the server accepts ingest write-path
+        // upgrades but rejects egress read-path upgrades, so the sender pool
+        // connects while the query pool's connect fails. The failed build() must
+        // close the already-built sender pool (its connected senders) rather than
+        // leak them.
+        try (TestWebSocketServer server = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() {
         })) {
-            ingest.start();
-            Assert.assertTrue(ingest.awaitStart(5, TimeUnit.SECONDS));
-            int port = ingest.getPort();
+            server.setRejectReadUpgrade(true);
+            server.start();
+            Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+            int port = server.getPort();
             try {
                 QuestDB.builder()
-                        .ingestConfig("ws::addr=localhost:" + port + ";")
-                        .queryConfig("ws::addr=127.0.0.1:1;auth_timeout_ms=200;")
+                        .fromConfig("ws::addr=localhost:" + port + ";auth_timeout_ms=200;")
                         .senderPoolSize(2)
                         .queryPoolSize(2)
                         .acquireTimeoutMillis(500)
                         .build()
                         .close();
-                Assert.fail("expected build to fail when query pool cannot connect");
+                Assert.fail("expected build to fail when the query pool cannot connect");
             } catch (RuntimeException expected) {
                 // The exact exception comes from QwpQueryClient.connect(). The
                 // build failing only proves the query pool gave up; the
@@ -284,75 +185,51 @@ public void testQueryPoolBuildFailureUnwindsSenderPool() throws Exception {
             // saw two ingest handshakes (proving the senders connected and the
             // assertion below is not vacuous)...
             awaitTrue("sender pool should have connected two ingest senders",
-                    () -> ingest.handshakeCount() >= 2);
+                    () -> server.handshakeCount() >= 2);
             // ...and the failed build() must have closed every one of them, so
             // no sender connection is left live on the server. The server
             // observes the client-side socket close asynchronously, so poll.
             awaitTrue("failed build() must close the already-built sender pool, leaving no live connection",
-                    () -> ingest.liveConnectionCount() == 0);
-        }
-    }
-
-    @Test
-    public void testSamePoolKeyValueAcrossSidesOk() {
-        // The same key at the same value on both sides builds cleanly.
-        try (QuestDB ignored = QuestDB.builder()
-                .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;query_pool_min=0;acquire_timeout_ms=1500;")
-                .queryConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;query_pool_min=0;acquire_timeout_ms=1500;")
-                .build()) {
-            Assert.assertNotNull(ignored);
+                    () -> server.liveConnectionCount() == 0);
         }
     }
 
     @Test
     public void testSharedVocabularyConnectsBothPoolsLive() throws Exception {
-        // The headline use case: one connect-string vocabulary carrying BOTH
+        // The headline use case: one cluster connect-string carrying BOTH
         // ingress-only keys (auto_flush_rows, sender_id) and egress-only keys
-        // (compression, max_batch_rows, target, failover) drives both LIVE
-        // clients through the facade -- each side applies the keys it owns and
-        // silently ignores the rest. Other tests cover this validate-only
-        // (min=0) or on a single side; this one pre-warms min=1 so both pools
-        // actually connect.
-        //
-        // The mock serves ingest (ACK) and query (SERVER_INFO) semantics on
-        // separate sockets, so ingest and query connect to separate servers. A
-        // single ws:: address serving both is exercised end-to-end against a
-        // real server in the parent repo.
-        try (TestWebSocketServer ingest = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() {
-        });
-             TestWebSocketServer query = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() {
-             })) {
-            ingest.start();
-            query.setSendServerInfo(true); // the egress client's connect() waits for SERVER_INFO
-            query.start();
-            Assert.assertTrue(ingest.awaitStart(5, TimeUnit.SECONDS));
-            Assert.assertTrue(query.awaitStart(5, TimeUnit.SECONDS));
-
-            // Identical vocabulary on both sides, differing only in addr -- the
-            // same mixed key set a single-string connect() would hand to both
-            // clients. The pool keys carry the same value on both sides, so the
-            // builder's cross-string conflict check passes.
-            String shared = "auto_flush_rows=100;sender_id=probe-1;"                          // ingress-only
-                    + "compression=auto;max_batch_rows=512;target=any;failover=off;"          // egress-only
-                    + "auth_timeout_ms=2000;"                                                 // COMMON
+        // (compression, max_batch_rows, target, failover) drives both LIVE pools
+        // -- each side applies the keys it owns and silently ignores the rest.
+        // One mock server serves both: an ACK stream on the ingest write path and
+        // a SERVER_INFO frame on the egress read path (the read path is gated so
+        // the ingest connection's ACK stream is never disturbed).
+        try (TestWebSocketServer server = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() {
+        })) {
+            server.setSendServerInfo(true); // the egress client's connect() waits for SERVER_INFO
+            server.start();
+            Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+
+            // A single cluster config carrying the mixed key set. The pools
+            // pre-warm min=1, so the shared vocabulary connects a live sender AND
+            // a live query client, not merely validates.
+            String cfg = "ws::addr=localhost:" + server.getPort() + ";"
+                    + "auto_flush_rows=100;sender_id=probe-1;"                          // ingress-only
+                    + "compression=auto;max_batch_rows=512;target=any;failover=off;"    // egress-only
+                    + "auth_timeout_ms=2000;"                                           // common
                     + "sender_pool_min=1;sender_pool_max=2;query_pool_min=1;query_pool_max=2;"; // pool
-            try (QuestDB db = QuestDB.builder()
-                    .ingestConfig("ws::addr=localhost:" + ingest.getPort() + ";" + shared)
-                    .queryConfig("ws::addr=localhost:" + query.getPort() + ";" + shared)
-                    .build()) {
-                // build() returned, so both pools pre-warmed their min=1 slot:
-                // the shared vocabulary connected a live sender AND a live query
-                // client, not merely validated.
+            try (QuestDB db = QuestDB.builder().fromConfig(cfg).build()) {
                 Assert.assertNotNull(db.borrowSender());
-                Assert.assertNotNull(db.query());
+                try (io.questdb.client.Query q = db.borrowQuery()) {
+                    Assert.assertNotNull(q);
+                }
             }
         }
     }
 
     @Test
     public void testSharedWsConfigWithPoolKeys() {
-        // A shared ws:: string carries pool keys; min=0 so build does only
-        // parse-only validation (no connect).
+        // A cluster ws:: string carries pool keys for both pools; min=0, so
+        // build() performs parse-only validation (no connect).
         try (QuestDB ignored = QuestDB.builder()
                 .fromConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;sender_pool_max=3;"
                         + "query_pool_min=0;query_pool_max=2;acquire_timeout_ms=1234;")
@@ -361,41 +238,13 @@ public void testSharedWsConfigWithPoolKeys() {
         }
     }
 
-    @Test
-    public void testTcpIngestConfigRejected() {
-        assertSchemaRejected(() -> QuestDB.builder().ingestConfig("tcp::addr=h:9009;"));
-    }
-
-    @Test
-    public void testUdpIngestConfigRejected() {
-        assertSchemaRejected(() -> QuestDB.builder().queryConfig("udp::addr=h:9009;"));
-    }
-
-    private static void assertEgressBuildRejected(String query, String expectedFragment) {
-        try {
-            QuestDB.builder()
-                    .ingestConfig("ws::addr=127.0.0.1:1;sender_pool_min=0;sender_pool_max=2;")
-                    .queryConfig(query)
-                    .build()
-                    .close();
-            Assert.fail("expected build() to reject the malformed query config: " + query);
-        } catch (RuntimeException e) {
-            Assert.assertNotNull(e.getMessage());
-            Assert.assertTrue(e.getMessage(), e.getMessage().contains(expectedFragment));
-        }
-    }
-
-    private static void assertIngressBuildRejected(String ingest, String expectedFragment) {
+    private static void assertBuildRejected(String config, String expectedFragment) {
         try {
-            QuestDB.builder()
-                    .ingestConfig(ingest)
-                    .queryConfig("ws::addr=127.0.0.1:1;query_pool_min=0;query_pool_max=2;")
-                    .build()
-                    .close();
-            Assert.fail("expected build() to reject the malformed ingest config: " + ingest);
+            QuestDB.builder().fromConfig(config).build().close();
+            Assert.fail("expected build() to reject the malformed config: " + config);
         } catch (RuntimeException e) {
-            // Ingress value errors surface as LineSenderException; both it and the
-            // egress IllegalArgumentException are RuntimeException.
+            // Ingress value errors surface as LineSenderException; egress errors
+            // as IllegalArgumentException -- both are RuntimeException.
             Assert.assertNotNull(e.getMessage());
             Assert.assertTrue(e.getMessage(), e.getMessage().contains(expectedFragment));
         }
diff --git a/core/src/test/java/io/questdb/client/test/QuestDBFacadeCallbacksTest.java b/core/src/test/java/io/questdb/client/test/QuestDBFacadeCallbacksTest.java
new file mode 100644
index 00000000..3a8b96c1
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/QuestDBFacadeCallbacksTest.java
@@ -0,0 +1,138 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test;
+
+import io.questdb.client.QuestDB;
+import io.questdb.client.SenderConnectionEvent;
+import io.questdb.client.SenderConnectionListener;
+import io.questdb.client.SenderError;
+import io.questdb.client.SenderErrorHandler;
+import io.questdb.client.test.cutlass.qwp.client.TestPorts;
+import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer;
+import org.jetbrains.annotations.NotNull;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+
+/**
+ * Proves the ingest-side async callbacks exposed on the {@link QuestDB} facade
+ * ({@link io.questdb.client.QuestDBBuilder#errorHandler}/{@code connectionListener})
+ * actually reach the pooled {@link io.questdb.client.Sender}s -- not merely the
+ * lower-level {@code Sender.builder()}.
+ * <p>
+ * Each test eagerly prewarms one ingest sender ({@code sender_pool_min=1}) in
+ * {@code initial_connect_retry=async} mode with a tight reconnect budget, so
+ * the pool's I/O thread reports the outcome through the facade-wired callback
+ * under test. The listener test targets a dead port (no server required); the
+ * error-handler test targets a mock server that responds with 401 Unauthorized.
+ */
+public class QuestDBFacadeCallbacksTest {
+
+    private static final TestWebSocketServer.WebSocketServerHandler NOOP_HANDLER =
+            new TestWebSocketServer.WebSocketServerHandler() {
+                @Override
+                public void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) {
+                }
+            };
+
+    @Test
+    public void testFacadeConnectionListenerReceivesEvents() throws Exception {
+        int port = TestPorts.findUnusedPort();
+        CountDownLatch sawEvent = new CountDownLatch(1);
+        SenderConnectionListener listener = new SenderConnectionListener() {
+            @Override
+            public void onEvent(@NotNull SenderConnectionEvent event) {
+                sawEvent.countDown();
+            }
+        };
+        try (QuestDB ignored = QuestDB.builder()
+                .fromConfig(config(port))
+                .connectionListener(listener)
+                .build()) {
+            Assert.assertTrue(
+                    "facade-wired connectionListener must observe at least one connection event",
+                    sawEvent.await(5, TimeUnit.SECONDS));
+        }
+    }
+
+    @Test
+    public void testFacadeErrorHandlerReceivesAsyncIngestError() throws Exception {
+        // A server that answers 401 produces a genuine terminal auth error,
+        // which surfaces even in async mode; the facade-wired errorHandler must
+        // receive it. (Under Invariant B a mere connection error would retry
+        // forever and never surface -- only a genuine terminal like auth does.)
+        try (TestWebSocketServer server = new TestWebSocketServer(NOOP_HANDLER)) {
+            server.setRejectWithStatus(401, "Unauthorized");
+            server.start();
+            Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+            ErrorInbox inbox = new ErrorInbox();
+            try (QuestDB ignored = QuestDB.builder()
+                    .fromConfig(config(server.getPort()))
+                    .errorHandler(inbox)
+                    .build()) {
+                Assert.assertTrue(
+                        "facade-wired errorHandler must receive the async auth-terminal SenderError",
+                        inbox.await(5, TimeUnit.SECONDS));
+                Assert.assertNotNull("a SenderError must be delivered", inbox.get());
+            }
+        }
+    }
+
+    // One cluster config drives both pools. Eagerly prewarm one sender
+    // (sender_pool_min=1) so build() exercises the production
+    // buildManagedSlotSender path that applies the facade callbacks; async +
+    // short backoffs -> the I/O thread promptly hits the dead or 401 port.
+    // query_pool_min=0 -> the query pool never connects, so the test is isolated
+    // to the ingest callbacks.
+    private static String config(int port) {
+        return "ws::addr=localhost:" + port + ";sender_pool_min=1;sender_pool_max=1"
+                + ";query_pool_min=0;query_pool_max=1"
+                + ";initial_connect_retry=async;reconnect_max_duration_millis=400"
+                + ";reconnect_initial_backoff_millis=10;reconnect_max_backoff_millis=50"
+                + ";close_flush_timeout_millis=0;";
+    }
+
+    private static final class ErrorInbox implements SenderErrorHandler {
+        private final CountDownLatch latch = new CountDownLatch(1);
+        private final AtomicReference<SenderError> first = new AtomicReference<>();
+
+        boolean await(long timeout, TimeUnit unit) throws InterruptedException {
+            return latch.await(timeout, unit);
+        }
+
+        SenderError get() {
+            return first.get();
+        }
+
+        @Override
+        public void onError(@NotNull SenderError error) {
+            first.compareAndSet(null, error);
+            latch.countDown();
+        }
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/QuestDBFacadeDrainerListenerTest.java b/core/src/test/java/io/questdb/client/test/QuestDBFacadeDrainerListenerTest.java
new file mode 100644
index 00000000..9dbfbc89
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/QuestDBFacadeDrainerListenerTest.java
@@ -0,0 +1,465 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test;
+
+import io.questdb.client.QuestDB;
+import io.questdb.client.Sender;
+import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner;
+import io.questdb.client.std.Files;
+import io.questdb.client.std.MemoryTag;
+import io.questdb.client.std.Unsafe;
+import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer;
+import io.questdb.client.test.tools.TestUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.BooleanSupplier;
+
+/**
+ * Proves the {@link io.questdb.client.QuestDBBuilder#drainerListener} and
+ * {@link Sender.LineSenderBuilder#drainerListener} hooks actually reach the
+ * background orphan-slot drainers, end-to-end against a real
+ * {@link TestWebSocketServer} — and that the M10 stream split holds on the
+ * wire: a durable-ack capability gap (server upgrades but withholds
+ * {@code X-QWP-Durable-Ack}) lands on {@code onDurableAckUnavailable} while a
+ * transient all-replica failover window (421 + {@code X-QuestDB-Role:
+ * REPLICA}) lands on {@code onPrimaryUnavailable}, with the other stream
+ * staying silent.
+ * <p>
+ * Fixture shape: an orphan slot is seeded under {@code sf_dir} with unacked
+ * frames; the config enables {@code drain_orphans} and
+ * {@code request_durable_ack=on}. The server starts in the failure condition
+ * under test (durable-ack header suppressed, or role-rejecting), so the
+ * drainer deterministically observes it — no race against the drainer's first
+ * connect. Once the listener has recorded the scripted attempts, the server
+ * "settles" (header restored / reject cleared) and the drain must run to
+ * completion: no escalation, no {@code .failed} sentinel, slot emptied. The
+ * foreground sender uses {@code initial_connect_retry=async} so build() never
+ * blocks or fails on the same scripted condition.
+ */
+public class QuestDBFacadeDrainerListenerTest {
+
+    private static final int SEEDED_FRAMES = 5;
+    private static final long SEGMENT_SIZE_BYTES = 16384L;
+
+    private String sfDir;
+
+    @Before
+    public void setUp() {
+        sfDir = Paths.get(System.getProperty("java.io.tmpdir"),
+                "qdb-facade-drainer-listener-" + System.nanoTime()).toString();
+        Assert.assertEquals("mkdir sf_dir", 0, Files.mkdir(sfDir, Files.DIR_MODE_DEFAULT));
+    }
+
+    @After
+    public void tearDown() {
+        if (sfDir != null) rmDirRec(sfDir);
+    }
+
+    /**
+     * Facade plumbing E2E: the {@code QuestDB.builder().drainerListener(...)}
+     * hook must observe the pooled senders' drainer events. The server
+     * completes the WS upgrade WITHOUT advertising durable ack for the first
+     * attempts (capability gap), then advertises it; the listener must see
+     * {@code onDurableAckUnavailable} with attempts {@code 1..N} (one
+     * uninterrupted episode) and the drain must then succeed.
+     */
+    @Test
+    public void testFacadeDrainerListenerObservesCapabilityGapThenDrainSucceeds() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            seedOrphanSlot("ghost");
+            DurableAckAllHandler handler = new DurableAckAllHandler();
+            try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) {
+                // Deterministic capability gap: withheld BEFORE the first
+                // drainer connect, restored only after the listener has
+                // recorded the gap episode.
+                server.setSuppressDurableAckHeader(true);
+                server.start();
+                Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+
+                RecordingDrainerListener listener = new RecordingDrainerListener();
+                try (QuestDB ignored = QuestDB.builder()
+                        .fromConfig(facadeConfig(server.getPort()))
+                        .drainerListener(listener)
+                        .build()) {
+                    awaitTrue(10_000, () -> listener.daAttempts.size() >= 3,
+                            "facade-wired drainer listener must observe the capability-gap "
+                                    + "retries via onDurableAckUnavailable");
+                    // Cluster "settles": the next sweep connects and drains.
+                    server.setSuppressDurableAckHeader(false);
+                    awaitDrainedSlot("ghost");
+                }
+                assertSingleGapEpisodeThenSilence(listener);
+            }
+        });
+    }
+
+    /**
+     * Role-reject discrimination E2E: with every handshake answered by 421 +
+     * {@code X-QuestDB-Role: REPLICA} (transient all-replica failover
+     * window), the facade-wired listener must receive
+     * {@code onPrimaryUnavailable} — and {@code onDurableAckUnavailable} must
+     * stay SILENT for the whole window (the released 1.3.4 contract fed both
+     * conditions to the DA callback; this pins the M10 split on the wire).
+     */
+    @Test
+    public void testFacadeDrainerListenerDiscriminatesRoleRejectWindow() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            seedOrphanSlot("ghost");
+            DurableAckAllHandler handler = new DurableAckAllHandler();
+            try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) {
+                // Deterministic all-replica window: rejecting BEFORE the first
+                // drainer connect; the durable-ack header is never withheld,
+                // so no capability gap can ever fire in this test.
+                server.setRejectWithRole("REPLICA");
+                server.start();
+                Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+
+                RecordingDrainerListener listener = new RecordingDrainerListener();
+                try (QuestDB ignored = QuestDB.builder()
+                        .fromConfig(facadeConfig(server.getPort()))
+                        .drainerListener(listener)
+                        .build()) {
+                    awaitTrue(10_000, () -> listener.primaryAttempts.size() >= 3,
+                            "facade-wired drainer listener must observe the all-replica "
+                                    + "window via onPrimaryUnavailable");
+                    Assert.assertEquals("onDurableAckUnavailable must stay SILENT during a "
+                                    + "role-reject window — that is the whole point of the M10 split",
+                            0, listener.daAttempts.size());
+                    // Primary reappears: the next sweep connects and drains.
+                    server.setRejectWithRole(null);
+                    awaitDrainedSlot("ghost");
+                }
+                // Post-close exact assertions on the complete stream.
+                List<Integer> primary = listener.primaryAttemptsSnapshot();
+                Assert.assertTrue("expected at least the awaited role-reject attempts, got "
+                        + primary, primary.size() >= 3);
+                for (int i = 0; i < primary.size(); i++) {
+                    Assert.assertEquals("primary stream must be the uninterrupted 1-based "
+                                    + "role-reject count, got " + primary,
+                            Integer.valueOf(i + 1), primary.get(i));
+                }
+                Assert.assertEquals("no capability gap ever existed: the DA stream must be "
+                                + "empty end-to-end", 0, listener.daAttempts.size());
+                Assert.assertEquals("a role-reject window must NEVER escalate (Invariant B)",
+                        0, listener.persistentFailures.get());
+                Assert.assertFalse("no .failed sentinel for a transient window",
+                        Files.exists(sfDir + "/ghost/" + OrphanScanner.FAILED_SENTINEL_NAME));
+            }
+        });
+    }
+
+    /**
+     * Same capability-gap scenario as the facade test, one level down through
+     * {@code Sender.builder().drainerListener(...)} — pins the plumbing that
+     * the pool path composes (builder field → {@code setDrainerListener} →
+     * drainer pool → drainer), and awaits the drain outcome via the sender's
+     * public drainer counters.
+     */
+    @Test
+    public void testSenderBuilderDrainerListenerObservesCapabilityGapThenDrainSucceeds() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            seedOrphanSlot("ghost");
+            DurableAckAllHandler handler = new DurableAckAllHandler();
+            try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) {
+                server.setSuppressDurableAckHeader(true);
+                server.start();
+                Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+
+                String cfg = "ws::addr=localhost:" + server.getPort()
+                        + ";sf_dir=" + sfDir
+                        + ";sender_id=primary"
+                        + ";request_durable_ack=on"
+                        + ";drain_orphans=true"
+                        + ";max_background_drainers=1"
+                        + ";initial_connect_retry=async"
+                        + ";reconnect_initial_backoff_millis=25"
+                        + ";reconnect_max_backoff_millis=200"
+                        + ";close_flush_timeout_millis=0;";
+                RecordingDrainerListener listener = new RecordingDrainerListener();
+                Sender sender = Sender.builder(cfg)
+                        .drainerListener(listener)
+                        .build();
+                try {
+                    QwpWebSocketSender ws = (QwpWebSocketSender) sender;
+                    awaitTrue(10_000, () -> listener.daAttempts.size() >= 3,
+                            "builder-wired drainer listener must observe the capability-gap "
+                                    + "retries via onDurableAckUnavailable");
+                    server.setSuppressDurableAckHeader(false);
+                    awaitTrue(15_000, () -> ws.getTotalBackgroundDrainersSucceeded() >= 1,
+                            "drainer must drain the slot fully once the gap clears");
+                } finally {
+                    // The FOREGROUND sender's async initial connect hit the
+                    // scripted capability gap and latched a terminal HALT
+                    // before the server settled (durable ack is loud-fail for
+                    // a foreground producer). close() completes its full
+                    // teardown and then rethrows that latched terminal --
+                    // expected here, and orthogonal to the drainer stream
+                    // this test pins. The pool facade swallows the same
+                    // rethrow in SenderPool.close(), which is why the facade
+                    // tests use plain try-with-resources.
+                    try {
+                        sender.close();
+                        Assert.fail("close() must loudly rethrow the foreground's "
+                                + "latched capability-gap terminal");
+                    } catch (io.questdb.client.cutlass.line.LineSenderException expected) {
+                        Assert.assertTrue("expected the foreground durable-ack terminal, got: "
+                                        + expected.getMessage(),
+                                expected.getMessage().contains("durable-ack"));
+                    }
+                }
+                assertSingleGapEpisodeThenSilence(listener);
+            }
+        });
+    }
+
+    // One cluster config drives the facade. sender_pool_min=1 eagerly prewarms
+    // the one sender whose build() dispatches the orphan drainer;
+    // query_pool_min=0 keeps the read pool out of the picture. async initial
+    // connect: the foreground sender must not block or fail build() on the
+    // very condition the drainer is scripted to observe. Small drainer
+    // backoffs make the awaited attempts prompt while leaving plenty of
+    // headroom under the 16-attempt capability-gap settle budget between
+    // "third callback recorded" and "header restored".
+    private String facadeConfig(int port) {
+        return "ws::addr=localhost:" + port
+                + ";sf_dir=" + sfDir
+                + ";sender_id=pool"
+                + ";request_durable_ack=on"
+                + ";drain_orphans=true"
+                + ";max_background_drainers=1"
+                + ";sender_pool_min=1;sender_pool_max=1"
+                + ";query_pool_min=0;query_pool_max=1"
+                + ";initial_connect_retry=async"
+                + ";reconnect_initial_backoff_millis=25"
+                + ";reconnect_max_backoff_millis=200"
+                + ";close_flush_timeout_millis=0;";
+    }
+
+    // The two capability-gap tests end the same way: one uninterrupted gap
+    // episode numbered 1..K (no role reject ever intervened, so no reset and
+    // no primary-stream traffic), then the drain succeeded without escalation.
+    private void assertSingleGapEpisodeThenSilence(RecordingDrainerListener listener) {
+        List<Integer> da = listener.daAttemptsSnapshot();
+        Assert.assertTrue("expected at least the awaited gap attempts, got " + da,
+                da.size() >= 3);
+        for (int i = 0; i < da.size(); i++) {
+            Assert.assertEquals("DA stream must be the 1-based attempt count of a single "
+                            + "uninterrupted capability-gap episode, got " + da,
+                    Integer.valueOf(i + 1), da.get(i));
+        }
+        Assert.assertEquals("expected slot path on every DA delivery",
+                Collections.nCopies(da.size(), sfDir + "/ghost"), listener.daSlotPaths);
+        Assert.assertEquals("no role reject was scripted: the primary stream must be empty",
+                0, listener.primaryAttempts.size());
+        Assert.assertEquals("the gap cleared inside the settle budget: no escalation",
+                0, listener.persistentFailures.get());
+        Assert.assertFalse("no .failed sentinel after a successful drain",
+                Files.exists(sfDir + "/ghost/" + OrphanScanner.FAILED_SENTINEL_NAME));
+    }
+
+    // The drainer unlinks the slot's segment files once fully drained, so the
+    // slot stops being a candidate orphan. Probed per-slot (not via a
+    // whole-dir scan) because the foreground sender's own LIVE slot holds a
+    // pre-created segment file for as long as the sender is up, so a
+    // dir-level scan never reaches zero. A .failed sentinel would ALSO make
+    // the slot a non-candidate, so the sentinel is asserted absent explicitly.
+    private void awaitDrainedSlot(String slotName) throws InterruptedException {
+        String slotPath = sfDir + "/" + slotName;
+        awaitTrue(15_000, () -> !OrphanScanner.isCandidateOrphan(slotPath),
+                "drainer must empty the seeded orphan slot once the server settles");
+        Assert.assertFalse("slot must drain cleanly, not quarantine",
+                Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+    }
+
+    private static void awaitTrue(long timeoutMillis, BooleanSupplier condition, String message)
+            throws InterruptedException {
+        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
+        while (System.nanoTime() - deadline < 0) {
+            if (condition.getAsBoolean()) {
+                return;
+            }
+            Thread.sleep(10);
+        }
+        Assert.assertTrue(message + " (timed out after " + timeoutMillis + "ms)",
+                condition.getAsBoolean());
+    }
+
+    // Seeds <sfDir>/<slotName> with unacked frames — the on-disk shape a
+    // crashed sender leaves behind (same recipe as
+    // BackgroundDrainerMidDrainCapabilityGapTest). The engine creates the
+    // slot dir itself; closing it with unacked data leaves the .sfa segments
+    // in place, so the slot is a candidate orphan.
+    private void seedOrphanSlot(String slotName) {
+        String slotPath = sfDir + "/" + slotName;
+        try (CursorSendEngine engine = new CursorSendEngine(slotPath, SEGMENT_SIZE_BYTES)) {
+            long buf = Unsafe.malloc(16, MemoryTag.NATIVE_DEFAULT);
+            try {
+                byte[] payload = "frame-bytes-padd".getBytes(StandardCharsets.US_ASCII);
+                for (int i = 0; i < payload.length; i++) {
+                    Unsafe.getUnsafe().putByte(buf + i, payload[i]);
+                }
+                for (int i = 0; i < SEEDED_FRAMES; i++) {
+                    engine.appendBlocking(buf, 16);
+                }
+            } finally {
+                Unsafe.free(buf, 16, MemoryTag.NATIVE_DEFAULT);
+            }
+        }
+        Assert.assertEquals("seeded slot must be a candidate orphan",
+                1, OrphanScanner.scan(sfDir, "observer").size());
+    }
+
+    private static void rmDirRec(String dir) {
+        if (!Files.exists(dir)) return;
+        long find = Files.findFirst(dir);
+        if (find > 0) {
+            try {
+                int rc = 1;
+                while (rc > 0) {
+                    String name = Files.utf8ToString(Files.findName(find));
+                    if (name != null && !".".equals(name) && !"..".equals(name)) {
+                        String child = dir + "/" + name;
+                        if (!Files.remove(child)) rmDirRec(child);
+                    }
+                    rc = Files.findNext(find);
+                }
+            } finally {
+                Files.findClose(find);
+            }
+        }
+        Files.remove(dir);
+    }
+
+    /**
+     * Thread-safe recording listener. Snapshot accessors copy under the same
+     * monitor the callbacks append under, so end-of-test assertions never
+     * observe a list mid-append.
+     */
+    private static final class RecordingDrainerListener implements BackgroundDrainerListener {
+        final List<Integer> daAttempts = Collections.synchronizedList(new ArrayList<>());
+        final List<String> daSlotPaths = Collections.synchronizedList(new ArrayList<>());
+        final AtomicInteger persistentFailures = new AtomicInteger();
+        final List<Integer> primaryAttempts = Collections.synchronizedList(new ArrayList<>());
+
+        @Override
+        public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) {
+            persistentFailures.incrementAndGet();
+        }
+
+        @Override
+        public void onDurableAckUnavailable(String slotPath, int attemptNumber) {
+            daSlotPaths.add(slotPath);
+            daAttempts.add(attemptNumber);
+        }
+
+        @Override
+        public void onPrimaryUnavailable(String slotPath, int attemptNumber) {
+            primaryAttempts.add(attemptNumber);
+        }
+
+        List<Integer> daAttemptsSnapshot() {
+            synchronized (daAttempts) {
+                return new ArrayList<>(daAttempts);
+            }
+        }
+
+        List<Integer> primaryAttemptsSnapshot() {
+            synchronized (primaryAttempts) {
+                return new ArrayList<>(primaryAttempts);
+            }
+        }
+    }
+
+    /**
+     * Acks every inbound frame with STATUS_OK + STATUS_DURABLE_ACK on a
+     * per-connection wire sequence, so a durable-ack-mode drain runs to
+     * completion on whichever connection finally gets through (same ack
+     * shape as BackgroundDrainerMidDrainCapabilityGapTest's handler, without
+     * the scripted drop). State is keyed per ClientHandler identity; acks are
+     * best-effort because a connection may be racing its own close.
+     */
+    private static final class DurableAckAllHandler implements TestWebSocketServer.WebSocketServerHandler {
+        private static final String TABLE = "trades";
+        private final java.util.Map<TestWebSocketServer.ClientHandler, long[]> wireSeqByConn =
+                new java.util.IdentityHashMap<>();
+
+        @Override
+        public synchronized void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) {
+            long[] counter = wireSeqByConn.get(client);
+            if (counter == null) {
+                counter = new long[1];
+                wireSeqByConn.put(client, counter);
+            }
+            long seq = counter[0]++;
+            try {
+                client.sendBinary(okFrame(seq, seq));
+                client.sendBinary(durableAckFrame(seq));
+            } catch (IOException ignored) {
+                // best-effort: the drainer replays on its next connection
+            }
+        }
+
+        private static byte[] durableAckFrame(long seqTxn) {
+            byte[] name = TABLE.getBytes(StandardCharsets.UTF_8);
+            ByteBuffer bb = ByteBuffer.allocate(1 + 2 + 2 + name.length + 8)
+                    .order(ByteOrder.LITTLE_ENDIAN);
+            bb.put((byte) 0x02); // STATUS_DURABLE_ACK
+            bb.putShort((short) 1); // tableCount
+            bb.putShort((short) name.length);
+            bb.put(name);
+            bb.putLong(seqTxn);
+            return bb.array();
+        }
+
+        private static byte[] okFrame(long wireSeq, long seqTxn) {
+            byte[] name = TABLE.getBytes(StandardCharsets.UTF_8);
+            ByteBuffer bb = ByteBuffer.allocate(1 + 8 + 2 + 2 + name.length + 8)
+                    .order(ByteOrder.LITTLE_ENDIAN);
+            bb.put((byte) 0x00); // STATUS_OK
+            bb.putLong(wireSeq);
+            bb.putShort((short) 1); // tableCount
+            bb.putShort((short) name.length);
+            bb.put(name);
+            bb.putLong(seqTxn);
+            return bb.array();
+        }
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/QuestDBLazyConnectTest.java b/core/src/test/java/io/questdb/client/test/QuestDBLazyConnectTest.java
new file mode 100644
index 00000000..47dd5fa8
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/QuestDBLazyConnectTest.java
@@ -0,0 +1,150 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test;
+
+import io.questdb.client.QuestDB;
+import io.questdb.client.QuestDBBuilder;
+import io.questdb.client.Sender;
+import io.questdb.client.test.cutlass.qwp.client.TestPorts;
+import org.junit.Assert;
+import org.junit.Test;
+
+/**
+ * {@code lazy_connect=true} makes a {@link QuestDB} facade tolerate the server
+ * being down at startup <em>without</em> disabling reads: the ingest side
+ * connects asynchronously (writes buffer until the wire is up) and the read pool
+ * stays enabled but connects lazily, on first use once the server is up. The
+ * recovery lifecycle is covered end-to-end by
+ * {@link QuestDBServerRecoveryTest}.
+ * <p>
+ * Because both sides must start non-blocking, a knob that forces a blocking /
+ * fail-fast startup ({@code initial_connect_retry} other than {@code async}, or
+ * an explicit {@code query_pool_min > 0}) is a configuration conflict and is
+ * rejected up front with a clear remedy.
+ */
+public class QuestDBLazyConnectTest {
+
+    @Test(timeout = 30_000)
+    public void testLazyConnectStartsAndWritesWhileServerDown() {
+        int port = TestPorts.findUnusedPort();
+        // No server at `port`, sender_pool_min defaults to 1, and the only
+        // resilience knob is lazy_connect=true. (a) build() must return promptly
+        // -- the read pool defaults to min=0 and the ingest side goes async, so
+        // neither side fail-fasts -- and (b) a write must buffer without throwing.
+        try (QuestDB db = QuestDB.connect("ws::addr=localhost:" + port
+                + ";lazy_connect=true;reconnect_max_duration_millis=200"
+                + ";reconnect_initial_backoff_millis=10;reconnect_max_backoff_millis=50"
+                + ";close_flush_timeout_millis=0;")) {
+            Sender sender = db.borrowSender();
+            Assert.assertNotNull("a sender must be available with no server present", sender);
+            sender.table("t").longColumn("v", 1L).atNow();
+        }
+    }
+
+    @Test(timeout = 30_000)
+    public void testLazyConnectKeepsReadsEnabledWhileServerDown() {
+        int port = TestPorts.findUnusedPort();
+        // Reads are ENABLED, just deferred: under lazy_connect the read pool
+        // defaults to min=0, so build() does not eagerly connect or fail-fast
+        // while the server is down. The read client connects lazily on the
+        // first borrowQuery() once the server is up (covered end-to-end by
+        // QuestDBServerRecoveryTest). This is the whole point of lazy_connect
+        // over the old write-only mode, which disabled reads outright.
+        try (QuestDB db = QuestDB.connect("ws::addr=localhost:" + port
+                + ";lazy_connect=true;close_flush_timeout_millis=0;")) {
+            Assert.assertNotNull("the handle must build read-enabled while the server is down", db);
+        }
+    }
+
+    @Test
+    public void testLazyConnectAcceptsOnAndAllowsExplicitAsync() {
+        int port = TestPorts.findUnusedPort();
+        // lazy_connect accepts on/off as well as true/false, and an explicit
+        // initial_connect_retry=async is consistent with it (no conflict).
+        try (QuestDB db = QuestDB.connect("ws::addr=localhost:" + port
+                + ";lazy_connect=on;initial_connect_retry=async;query_pool_min=0"
+                + ";close_flush_timeout_millis=0;")) {
+            Assert.assertNotNull(db);
+        }
+    }
+
+    @Test
+    public void testLazyConnectConflictsWithBlockingInitialConnectRetry() {
+        // off/false (OFF) and on/true/sync (SYNC) all block or fail-fast at
+        // startup, so each conflicts with lazy_connect and must be rejected with
+        // a clear remedy.
+        assertLazyConflict("initial_connect_retry=off", "initial_connect_retry", "async");
+        assertLazyConflict("initial_connect_retry=sync", "initial_connect_retry", "async");
+        assertLazyConflict("initial_connect_retry=on", "initial_connect_retry", "async");
+    }
+
+    @Test
+    public void testLazyConnectConflictsWithExplicitQueryPoolMinInConfig() {
+        // An explicit query_pool_min > 0 makes the read pool eagerly fail-fast at
+        // startup, contradicting lazy_connect.
+        assertLazyConflict("query_pool_min=1", "query_pool_min", "0");
+        assertLazyConflict("query_pool_min=2", "query_pool_min", "0");
+        // query_pool_min=0 is exactly what lazy_connect wants -- no conflict.
+        int port = TestPorts.findUnusedPort();
+        try (QuestDB db = QuestDB.connect("ws::addr=localhost:" + port
+                + ";lazy_connect=true;query_pool_min=0;close_flush_timeout_millis=0;")) {
+            Assert.assertNotNull(db);
+        }
+    }
+
+    @Test
+    public void testLazyConnectConflictsWithExplicitQueryPoolMinFromBuilder() {
+        // The conflict also fires when query_pool_min > 0 comes from an explicit
+        // builder call (queryPoolMin / queryPoolSize), not just the connect string.
+        int port = TestPorts.findUnusedPort();
+        assertLazyConflict(QuestDB.builder()
+                .fromConfig("ws::addr=localhost:" + port + ";lazy_connect=true;close_flush_timeout_millis=0;")
+                .queryPoolMin(1), "query_pool_min", "0");
+        assertLazyConflict(QuestDB.builder()
+                .fromConfig("ws::addr=localhost:" + port + ";lazy_connect=true;close_flush_timeout_millis=0;")
+                .queryPoolSize(2), "query_pool_min", "0");
+    }
+
+    private static void assertLazyConflict(String extraKeys, String... expectedFragments) {
+        int port = TestPorts.findUnusedPort();
+        assertLazyConflict(QuestDB.builder().fromConfig("ws::addr=localhost:" + port
+                + ";lazy_connect=true;" + extraKeys + ";close_flush_timeout_millis=0;"), expectedFragments);
+    }
+
+    private static void assertLazyConflict(QuestDBBuilder builder, String... expectedFragments) {
+        try {
+            builder.build().close();
+            Assert.fail("expected lazy_connect configuration conflict");
+        } catch (IllegalArgumentException e) {
+            String msg = e.getMessage();
+            Assert.assertNotNull(msg);
+            Assert.assertTrue(msg, msg.contains("lazy_connect"));
+            for (String fragment : expectedFragments) {
+                Assert.assertTrue("'" + msg + "' should mention '" + fragment + "'",
+                        msg.contains(fragment));
+            }
+        }
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/QuestDBServerRecoveryTest.java b/core/src/test/java/io/questdb/client/test/QuestDBServerRecoveryTest.java
new file mode 100644
index 00000000..c68be090
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/QuestDBServerRecoveryTest.java
@@ -0,0 +1,114 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test;
+
+import io.questdb.client.QuestDB;
+import io.questdb.client.Sender;
+import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.concurrent.TimeUnit;
+import java.util.function.BooleanSupplier;
+
+/**
+ * End-to-end resilience: the facade starts with the server down, the producer
+ * keeps writing (buffered), and once the server comes up the write side
+ * reconnects and the read side -- previously deferred so it could not fail-fast
+ * the build -- can connect.
+ * <p>
+ * The mock cannot answer a real SELECT (result frames are exercised against a
+ * real server in the parent repo), so the read step asserts the query client
+ * <em>connects</em> once the server is up, not the row contents.
+ */
+public class QuestDBServerRecoveryTest {
+
+    @Test(timeout = 60_000)
+    public void testFacadeStartsWhileServerDownThenWritesAndReaderConnectsOnRecovery() throws Exception {
+        // One mock server (the whole "cluster"), bound so the port is known but
+        // NOT accepting yet: the address is reachable but no WebSocket upgrade
+        // completes, so the server is effectively "down". It serves ingest ACK
+        // on the write path and a SERVER_INFO frame on the read path -- the read
+        // path is gated so the ingest connection's ACK stream is never disturbed.
+        try (TestWebSocketServer server = new TestWebSocketServer(new TestWebSocketServer.WebSocketServerHandler() {
+        })) {
+            server.setSendServerInfo(true); // the egress client's connect() waits for SERVER_INFO
+            // One cluster config drives both pools. lazy_connect=true expands
+            // to exactly this resilience: the ingest side goes async (the
+            // producer never blocks; writes buffer until the wire is up) and
+            // the read pool defaults to min=0 (the otherwise fail-fast reader
+            // never sinks the build while the server is down, and connects
+            // lazily on the first query).
+            String cfg = "ws::addr=localhost:" + server.getPort()
+                    + ";lazy_connect=true"
+                    + ";sender_pool_min=1;sender_pool_max=1;query_pool_max=1"
+                    + ";auth_timeout_ms=2000;reconnect_initial_backoff_millis=20"
+                    + ";reconnect_max_backoff_millis=100;reconnect_max_duration_millis=600000"
+                    + ";close_flush_timeout_millis=1000;";
+
+            // (1) server down + (2) client starts:
+            try (QuestDB db = QuestDB.builder().fromConfig(cfg).build()) {
+                Assert.assertEquals("no handshake while the server is down", 0, server.handshakeCount());
+
+                // lazy_connect keeps reads ENABLED, just deferred: the read pool
+                // defaults to min=0, so nothing connects while the server is
+                // down. The read client connects lazily on the first
+                // borrowQuery() once the server is up (step 5).
+
+                // (3) client writes -> buffers in the cursor SF engine; the call
+                // must not throw even though the server is down.
+                Sender sender = db.borrowSender();
+                sender.table("t").longColumn("v", 1L).atNow();
+
+                // (4) server starts:
+                server.start();
+                Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+
+                // The write side reconnects on its own once the server is up.
+                awaitTrue("ingest must connect after the server comes up",
+                        () -> server.handshakeCount() >= 1);
+
+                // (5) client can now read: the deferred reader connects on the
+                // first borrowQuery() (the mock does not serve rows, so we
+                // assert the connection, not the result).
+                int handshakesBeforeQuery = server.handshakeCount();
+                db.borrowQuery().close();
+                awaitTrue("query client must connect after the server comes up",
+                        () -> server.handshakeCount() >= handshakesBeforeQuery + 1);
+            }
+        }
+    }
+
+    private static void awaitTrue(String message, BooleanSupplier condition) throws InterruptedException {
+        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(15);
+        while (System.nanoTime() < deadline) {
+            if (condition.getAsBoolean()) {
+                return;
+            }
+            Thread.sleep(20);
+        }
+        Assert.assertTrue(message, condition.getAsBoolean());
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/http/client/WebSocketClientTest.java b/core/src/test/java/io/questdb/client/test/cutlass/http/client/WebSocketClientTest.java
index cf121d8c..cefdac35 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/http/client/WebSocketClientTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/http/client/WebSocketClientTest.java
@@ -31,16 +31,61 @@
 import io.questdb.client.cutlass.http.client.WebSocketSendBuffer;
 import io.questdb.client.network.PlainSocketFactory;
 import io.questdb.client.network.Socket;
+import io.questdb.client.network.SocketReadinessWaiter;
 import org.junit.Assert;
 import org.junit.Test;
 
 import java.lang.reflect.Field;
 import java.lang.reflect.Method;
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.atomic.AtomicReference;
 
 import static io.questdb.client.test.tools.TestUtils.assertMemoryLeak;
 
 public class WebSocketClientTest {
 
+    /**
+     * close() frees native memory (recv/fragment buffers, send buffers), so
+     * its guard must be a CAS, not a volatile check-then-act: two concurrent
+     * closers passing the flag check together would both run
+     * disconnect()/Unsafe.free -- a native double-free. Closers can race in
+     * practice: the owner thread's teardown vs the I/O thread's exit path vs
+     * stale duplicate references (see CursorWebSocketSendLoop). The memory
+     * counters checked by assertMemoryLeak flag a double-free as a counter
+     * mismatch.
+     */
+    @Test
+    public void testConcurrentCloseRunsTeardownExactlyOnce() throws Exception {
+        assertMemoryLeak(() -> {
+            final int threads = 4;
+            final int iterations = 200;
+            for (int i = 0; i < iterations; i++) {
+                StubWebSocketClient client = new StubWebSocketClient();
+                CyclicBarrier barrier = new CyclicBarrier(threads);
+                AtomicReference<Throwable> failure = new AtomicReference<>();
+                Thread[] closers = new Thread[threads];
+                for (int t = 0; t < threads; t++) {
+                    closers[t] = new Thread(() -> {
+                        try {
+                            barrier.await();
+                            client.close();
+                        } catch (Throwable e) {
+                            failure.compareAndSet(null, e);
+                        }
+                    });
+                    closers[t].start();
+                }
+                for (Thread closer : closers) {
+                    closer.join();
+                }
+                Throwable t = failure.get();
+                if (t != null) {
+                    throw new AssertionError("concurrent close failed on iteration " + i, t);
+                }
+            }
+        });
+    }
+
     @Test
     public void testExtractMaxBatchSizeAbsentHeaderReturnsZero() throws Exception {
         String response = "HTTP/1.1 101 Switching Protocols\r\n"
@@ -263,7 +308,7 @@ public int send(long bufferPtr, int bufferLen) {
         }
 
         @Override
-        public void startTlsSession(CharSequence peerName) {
+        public void startTlsSession(CharSequence peerName, SocketReadinessWaiter waiter) {
             throw new UnsupportedOperationException();
         }
 
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/BackgroundConnectTimeoutDefaultTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/BackgroundConnectTimeoutDefaultTest.java
new file mode 100644
index 00000000..d5f1d660
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/BackgroundConnectTimeoutDefaultTest.java
@@ -0,0 +1,81 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client;
+
+import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static io.questdb.client.cutlass.qwp.client.QwpWebSocketSender.DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS;
+import static io.questdb.client.cutlass.qwp.client.QwpWebSocketSender.effectiveConnectTimeoutMs;
+
+/**
+ * Background (drainer) connect walks must never inherit the untimed native
+ * connect that connect_timeout=0 (the default) means for the foreground.
+ * <p>
+ * During an outage a drainer is routinely parked inside a blocking native
+ * connect ({@code nf.connectAddrInfo}) that neither unpark nor interrupt
+ * cancels. The drainer pool's close sequence (2.5s graceful drain +
+ * requestStop + 500ms + shutdownNow) then reliably lands on the failed-stop
+ * teardown protocol: the WebSocket client and microbatch buffers are
+ * deliberately leaked and the SF slot lock is held until the OS connect
+ * deadline (SYN retries, 60-130s on Linux) resolves the stuck call. A finite
+ * background default bounds that window to seconds. Foreground semantics are
+ * intentionally untouched: an explicit user value is honoured verbatim on
+ * both paths, and the foreground's unset default stays untimed.
+ */
+public class BackgroundConnectTimeoutDefaultTest {
+
+    @Test
+    public void testBackgroundExplicitValueHonoured() {
+        Assert.assertEquals(500, effectiveConnectTimeoutMs(true, 500));
+        Assert.assertEquals(60_000, effectiveConnectTimeoutMs(true, 60_000));
+    }
+
+    @Test
+    public void testBackgroundUnsetGetsFiniteDefault() {
+        Assert.assertEquals(DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS, effectiveConnectTimeoutMs(true, 0));
+        // Defensive: builder validation rejects negatives, but the resolver
+        // must not turn a bad value back into an untimed background connect.
+        Assert.assertEquals(DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS, effectiveConnectTimeoutMs(true, -1));
+    }
+
+    @Test
+    public void testDefaultIsFinite() {
+        Assert.assertTrue(DEFAULT_BACKGROUND_CONNECT_TIMEOUT_MS > 0);
+    }
+
+    @Test
+    public void testForegroundExplicitValueHonoured() {
+        Assert.assertEquals(500, effectiveConnectTimeoutMs(false, 500));
+    }
+
+    @Test
+    public void testForegroundUnsetStaysUntimed() {
+        // 0 => WebSocketClient falls back to nf.connectAddrInfo (OS-bounded).
+        // Historical foreground behaviour, deliberately preserved.
+        Assert.assertEquals(0, effectiveConnectTimeoutMs(false, 0));
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseDrainTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseDrainTest.java
index ef012229..a233e0e1 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseDrainTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseDrainTest.java
@@ -38,8 +38,7 @@
 import java.util.concurrent.atomic.AtomicLong;
 
 /**
- * Regression tests for the close() drain semantics specified in
- * design/qwp-cursor-durability.md.
+ * Regression tests for the close() drain semantics.
  * <p>
  * Without {@code close_flush_timeout_millis}, close() returned as soon as
  * the cursor I/O loop's {@code running} flag flipped — meaning frames
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseSafetyNetTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseSafetyNetTest.java
index fe3bb059..2a266212 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseSafetyNetTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/CloseSafetyNetTest.java
@@ -30,6 +30,7 @@
 import io.questdb.client.SenderErrorHandler;
 import io.questdb.client.cutlass.line.LineSenderException;
 import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender;
+import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer;
 import org.jetbrains.annotations.NotNull;
 import org.junit.Assert;
 import org.junit.Rule;
@@ -62,47 +63,59 @@ public class CloseSafetyNetTest {
     public final TemporaryFolder sfDir = TemporaryFolder.builder().assureDeletion().build();
 
     @Test(timeout = 30_000)
-    public void testCloseRethrowsUnsurfacedTerminalWithoutCustomHandler() {
-        // No server, no handler, tight reconnect budget: the I/O thread
-        // latches a never-connected budget-exhaustion terminal that nothing
-        // has surfaced to the user. close() must throw it.
-        Sender sender = Sender.fromConfig(cfg());
-        boolean closed = false;
-        try {
-            awaitLatchedTerminal((QwpWebSocketSender) sender);
+    public void testCloseRethrowsUnsurfacedTerminalWithoutCustomHandler() throws Exception {
+        // A 401 server, no handler: the I/O thread latches a genuine auth
+        // terminal (ws-upgrade-failed / SECURITY_ERROR) that nothing has
+        // surfaced to the user. close() must throw it. (Under Invariant B a
+        // mere connection error would retry forever and never latch -- only a
+        // genuine terminal like auth does.)
+        try (TestWebSocketServer server = new TestWebSocketServer(NOOP_HANDLER)) {
+            server.setRejectWithStatus(401, "Unauthorized");
+            server.start();
+            Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+            Sender sender = Sender.fromConfig(cfg(server.getPort()));
+            boolean closed = false;
             try {
-                closed = true;
-                sender.close();
-                Assert.fail("close() must rethrow a terminal error that no synchronous "
-                        + "caller and no custom handler has seen");
-            } catch (LineSenderException e) {
-                String msg = e.getMessage() == null ? "" : e.getMessage();
-                Assert.assertTrue("close() must rethrow the latched terminal: " + msg,
-                        msg.contains("never-connected-budget-exhausted"));
-                Assert.assertTrue("the latched instance is the typed server exception",
-                        e instanceof LineSenderServerException);
-            }
-        } finally {
-            if (!closed) {
-                sender.close();
+                awaitLatchedTerminal((QwpWebSocketSender) sender);
+                try {
+                    closed = true;
+                    sender.close();
+                    Assert.fail("close() must rethrow a terminal error that no synchronous "
+                            + "caller and no custom handler has seen");
+                } catch (LineSenderException e) {
+                    String msg = e.getMessage() == null ? "" : e.getMessage();
+                    Assert.assertTrue("close() must rethrow the latched terminal: " + msg,
+                            msg.contains("ws-upgrade-failed") || msg.contains("401"));
+                    Assert.assertTrue("the latched instance is the typed server exception",
+                            e instanceof LineSenderServerException);
+                }
+            } finally {
+                if (!closed) {
+                    sender.close();
+                }
             }
         }
     }
 
     @Test(timeout = 30_000)
     public void testCloseStaysSilentWhenCustomHandlerAlreadyDelivered() throws Exception {
-        // Same terminal, but the user installed a custom error handler and
+        // Same auth terminal, but the user installed a custom error handler and
         // the dispatcher delivered the error to it. close() must NOT
         // double-signal.
-        ErrorInbox inbox = new ErrorInbox();
-        Sender sender = Sender.builder(cfg())
-                .errorHandler(inbox)
-                .build();
-        Assert.assertTrue("terminal must reach the custom handler within 10s",
-                inbox.await(10, TimeUnit.SECONDS));
-        Assert.assertNotNull(inbox.get());
-        // The handler owns the error now; a rethrow here would double-signal.
-        sender.close();
+        try (TestWebSocketServer server = new TestWebSocketServer(NOOP_HANDLER)) {
+            server.setRejectWithStatus(401, "Unauthorized");
+            server.start();
+            Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+            ErrorInbox inbox = new ErrorInbox();
+            Sender sender = Sender.builder(cfg(server.getPort()))
+                    .errorHandler(inbox)
+                    .build();
+            Assert.assertTrue("terminal must reach the custom handler within 10s",
+                    inbox.await(10, TimeUnit.SECONDS));
+            Assert.assertNotNull(inbox.get());
+            // The handler owns the error now; a rethrow here would double-signal.
+            sender.close();
+        }
     }
 
     /**
@@ -120,8 +133,8 @@ private static void awaitLatchedTerminal(QwpWebSocketSender sender) {
         }
     }
 
-    private String cfg() {
-        return "ws::addr=localhost:" + TestPorts.findUnusedPort()
+    private String cfg(int port) {
+        return "ws::addr=localhost:" + port
                 + ";sf_dir=" + sfDir.getRoot().getAbsolutePath()
                 + ";initial_connect_retry=async"
                 + ";reconnect_max_duration_millis=400"
@@ -130,6 +143,13 @@ private String cfg() {
                 + ";close_flush_timeout_millis=0;";
     }
 
+    private static final TestWebSocketServer.WebSocketServerHandler NOOP_HANDLER =
+            new TestWebSocketServer.WebSocketServerHandler() {
+                @Override
+                public void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) {
+                }
+            };
+
     private static class ErrorInbox implements SenderErrorHandler {
         private final CountDownLatch latch = new CountDownLatch(1);
         private final AtomicReference<SenderError> ref = new AtomicReference<>();
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectAsyncTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectAsyncTest.java
index 0733de8f..fd1c604c 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectAsyncTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectAsyncTest.java
@@ -49,10 +49,11 @@
 /**
  * Behavior of {@code initial_connect_retry=async}: the producer-thread
  * {@code Sender.fromConfig} must return immediately even when no server
- * is reachable; the I/O thread retries connect in the background, and
- * terminal failures (auth/upgrade reject, budget exhaustion) are
- * delivered through the async error inbox rather than thrown at the
- * call site.
+ * is reachable; the I/O thread retries connect in the background. Plain
+ * connect failures are retried indefinitely (Invariant B: no wall-clock
+ * budget give-up); only genuine terminals (auth/upgrade reject,
+ * durable-ack capability gap) are delivered through the async error
+ * inbox rather than thrown at the call site.
  */
 public class InitialConnectAsyncTest {
 
@@ -106,17 +107,19 @@ public void testAsyncAuthFailureDeliversToErrorInbox() throws Exception {
     }
 
     @Test
-    public void testAsyncBudgetExhaustionDeliversToErrorInbox() throws Exception {
-        // No server. With async mode and a tight cap, the I/O thread
-        // exhausts its connect budget and surfaces a SenderError to the
-        // user-supplied handler. fromConfig itself does not throw; only
-        // close() rethrows the latched terminal so a user who never
-        // installed a handler still sees the failure on shutdown.
+    public void testAsyncNoServerRetriesForeverNoTerminal() throws Exception {
+        // INVARIANT B: an SF sender in async mode pointed at a dead port must
+        // NEVER surface a connection-error terminal -- a down server is transient
+        // (it may appear; the data is safe in SF), so the I/O thread retries
+        // forever. reconnect_max_duration_millis is IGNORED as a give-up deadline:
+        // no SenderError lands, the sender stays usable, and wasEverConnected()
+        // stays false. Only a GENUINE terminal (auth/upgrade) or SF exhaustion may
+        // surface -- see testAsyncAuthFailureDeliversToErrorInbox.
         int port = TestPorts.findUnusedPort();
         ErrorInbox inbox = new ErrorInbox();
         String cfg = "ws::addr=localhost:" + port
                 + sfDirOpt() + ";initial_connect_retry=async"
-                + ";reconnect_max_duration_millis=400"
+                + ";reconnect_max_duration_millis=200"
                 + ";reconnect_initial_backoff_millis=10"
                 + ";reconnect_max_backoff_millis=50"
                 + ";close_flush_timeout_millis=0;";
@@ -124,38 +127,25 @@ public void testAsyncBudgetExhaustionDeliversToErrorInbox() throws Exception {
                 .errorHandler(inbox)
                 .build();
         try {
-            // Wait up to 5s for the I/O thread to exhaust its budget.
-            Assert.assertTrue(
-                    "async budget exhaustion must surface a SenderError within 5s",
-                    inbox.await(5, TimeUnit.SECONDS));
-            SenderError err = inbox.get();
-            Assert.assertNotNull(
-                    "async budget exhaustion must surface a SenderError to the inbox",
-                    err);
-            Assert.assertEquals(
-                    "budget exhaustion is a HALT-policy terminal",
-                    SenderError.Policy.HALT, err.getAppliedPolicy());
-            Assert.assertEquals(
-                    "category must be PROTOCOL_VIOLATION for budget exhaustion",
-                    SenderError.Category.PROTOCOL_VIOLATION, err.getCategory());
-            String msg = err.getServerMessage() == null ? "" : err.getServerMessage();
-            Assert.assertTrue(
-                    "error message must use never-connected tag (no successful connect): " + msg,
-                    msg.contains("never-connected-budget-exhausted"));
-            Assert.assertTrue(
-                    "error message must hint at config-likely cause: " + msg,
-                    msg.contains("never reached the server"));
+            // Observe well past the (ignored) 200ms budget: no terminal lands.
             Assert.assertFalse(
-                    "wasEverConnected() must be false when no connect ever succeeded",
+                    "async SF sender must NOT surface a connection-error terminal "
+                            + "(Invariant B: retries forever past the budget)",
+                    inbox.await(1500, TimeUnit.MILLISECONDS));
+            Assert.assertNull("no SenderError may be delivered for a down server", inbox.get());
+            // Sender stays usable -- producer keeps appending to SF.
+            sender.table("foo").longColumn("v", 1L).atNow();
+            sender.flush();
+            Assert.assertFalse(
+                    "wasEverConnected() stays false while no server is reachable",
                     ((QwpWebSocketSender) sender).wasEverConnected());
         } finally {
-            assertCloseRethrowsTerminal(sender,
-                    "never-connected-budget-exhausted");
+            sender.close();
         }
     }
 
     @Test
-    public void testAsyncDeliversBufferedRowsWhenServerArrivesLate() {
+    public void testAsyncDeliversBufferedRowsWhenServerArrivesLate() throws Exception {
         // Sender opens before the server is listening. Frames are
         // appended to the cursor SF engine on the producer thread. The
         // I/O thread retries connect in the background; once the server
@@ -169,7 +159,10 @@ public void testAsyncDeliversBufferedRowsWhenServerArrivesLate() {
                     + ";reconnect_initial_backoff_millis=20"
                     + ";reconnect_max_backoff_millis=200"
                     + ";close_flush_timeout_millis=2000;";
-            try (Sender sender = Sender.fromConfig(cfg)) {
+            // fromConfig/flush/setup failures must fail the test -- only
+            // close() teardown noise is tolerated (see closeQuietly).
+            Sender sender = Sender.fromConfig(cfg);
+            try {
                 QwpWebSocketSender wss = (QwpWebSocketSender) sender;
                 // wasEverConnected starts false in async mode — the I/O
                 // thread has not yet completed an upgrade.
@@ -198,9 +191,9 @@ public void testAsyncDeliversBufferedRowsWhenServerArrivesLate() {
                 Assert.assertTrue(
                         "wasEverConnected() must flip to true after the I/O thread connects",
                         ((QwpWebSocketSender) sender).wasEverConnected());
+            } finally {
+                closeQuietly(sender);
             }
-        } catch (Exception ignored) {
-            // already closed
         }
     }
 
@@ -233,13 +226,12 @@ public void testAsyncReturnsImmediatelyWithNoServer() {
     }
 
     @Test
-    public void testConnectionLostBudgetExhaustionTagsDifferently() {
-        // Server is up at first (initial connect succeeds + ACKs one
-        // batch), then we tear it down. The I/O loop tries to reconnect,
-        // every attempt hits TCP refused, and the budget exhausts.
-        // Because the loop did connect at least once before the outage,
-        // the SenderError must use the connection-lost tag and the sender
-        // must report wasEverConnected()==true.
+    public void testConnectionLostRetriesForeverNoTerminal() throws Exception {
+        // INVARIANT B: after a successful connect, if the server drops, the
+        // mid-stream reconnect must retry FOREVER -- it must NEVER surface a
+        // connection-lost terminal on a wall-clock budget. The rows are safe in
+        // SF and the server may return, so reconnect_max_duration_millis is
+        // ignored as a give-up deadline. wasEverConnected() stays true.
         AckHandler handler = new AckHandler();
         try (TestWebSocketServer server = new TestWebSocketServer(handler)) {
             int port = server.getPort();
@@ -248,7 +240,7 @@ public void testConnectionLostBudgetExhaustionTagsDifferently() {
 
             ErrorInbox inbox = new ErrorInbox();
             String cfg = "ws::addr=localhost:" + port
-                    + ";reconnect_max_duration_millis=400"
+                    + ";reconnect_max_duration_millis=200"
                     + ";reconnect_initial_backoff_millis=10"
                     + ";reconnect_max_backoff_millis=50"
                     + ";close_flush_timeout_millis=0;";
@@ -265,54 +257,48 @@ public void testConnectionLostBudgetExhaustionTagsDifferently() {
                         "wasEverConnected() must be true after a successful connect",
                         ((QwpWebSocketSender) sender).wasEverConnected());
 
-                // Tear the server down. The cursor I/O loop's tryReceiveAcks
-                // polls every 50us and discovers the peer disconnect on its
-                // own, then enters the reconnect loop and exhausts the
-                // 400ms budget — no producer activity required.
+                // Tear the server down. The I/O loop discovers the disconnect and
+                // enters reconnect -- which must retry forever, NOT surface a
+                // terminal on the (ignored) 200ms budget.
                 server.close();
-                Assert.assertTrue("budget exhaustion must surface a SenderError within 5s",
-                        inbox.await(5, TimeUnit.SECONDS));
-                SenderError err = inbox.get();
-                Assert.assertNotNull("budget exhaustion must surface a SenderError", err);
-                String msg = err.getServerMessage() == null ? "" : err.getServerMessage();
-                Assert.assertTrue(
-                        "error message must use connection-lost tag: " + msg,
-                        msg.contains("connection-lost-budget-exhausted"));
-                Assert.assertTrue(
-                        "error message must hint at transient cause: " + msg,
-                        msg.contains("server unreachable since last connect"));
+                Assert.assertFalse(
+                        "mid-stream reconnect must NOT surface a connection-lost terminal "
+                                + "(Invariant B: retries forever past the budget)",
+                        inbox.await(1500, TimeUnit.MILLISECONDS));
+                Assert.assertNull("no terminal may be delivered on a transient outage", inbox.get());
                 Assert.assertTrue(
                         "wasEverConnected() must remain true after the outage",
                         ((QwpWebSocketSender) sender).wasEverConnected());
             } finally {
-                assertCloseRethrowsTerminal(sender, "connection-lost-budget-exhausted");
+                // closeQuietly (not a bare close()) so a close-path exception
+                // cannot replace a pending AssertionError from the contract
+                // assertions above and mask a genuine failure.
+                closeQuietly(sender);
             }
-        } catch (Exception ignored) {
-            // already closed
         }
     }
 
     @Test
-    public void testWasEverConnectedTrueImmediatelyInSyncMode() {
+    public void testWasEverConnectedTrueImmediatelyInSyncMode() throws Exception {
         // Default (OFF) and SYNC modes both connect on the user thread
         // before fromConfig returns. wasEverConnected() must therefore
         // already be true the instant the sender becomes visible to the
         // caller — there is no observable "never connected" window in
-        // those modes, so misclassifying a budget exhaustion as
-        // never-connected is impossible.
+        // those modes.
         try (TestWebSocketServer server = new TestWebSocketServer(new AckHandler())) {
             int port = server.getPort();
             server.start();
             Assert.assertTrue(server.awaitStart(5, java.util.concurrent.TimeUnit.SECONDS));
             String cfg = "ws::addr=localhost:" + port
                     + ";close_flush_timeout_millis=0;";
-            try (Sender sender = Sender.fromConfig(cfg)) {
+            Sender sender = Sender.fromConfig(cfg);
+            try {
                 Assert.assertTrue(
                         "wasEverConnected() must be true immediately in OFF/SYNC mode",
                         ((QwpWebSocketSender) sender).wasEverConnected());
+            } finally {
+                closeQuietly(sender);
             }
-        } catch (Exception ignored) {
-            // already closed
         }
     }
 
@@ -335,6 +321,20 @@ private static void awaitAtLeastOneConnectAttempt(QwpWebSocketSender wss) {
         }
     }
 
+    /**
+     * Closes the sender, tolerating close-path teardown noise only. Used
+     * instead of a broad {@code catch (Exception ignored)} around a whole
+     * test body, which would swallow fromConfig/flush/setup failures and
+     * let the contract assertions pass vacuously.
+     */
+    private static void closeQuietly(Sender sender) {
+        try {
+            sender.close();
+        } catch (Exception ignored) {
+            // close() teardown noise only
+        }
+    }
+
     /**
      * Closes the sender and tolerates either outcome:
      * * close() throws -- the latched terminal must mention the expected
@@ -362,8 +362,11 @@ private static void assertCloseRethrowsTerminal(Sender sender, String expectedSu
 
     /**
      * Returns a unique temp sf_dir snippet for embedding in a config
-     * string. initial_connect_retry on/sync/async requires sf_dir per
-     * spec §3.5; without it the builder rejects construction.
+     * string. The builder does NOT require sf_dir for any
+     * initial_connect_retry mode — without it the sender builds in
+     * memory mode and buffers rows in the in-RAM cursor ring. These
+     * tests set an sf_dir so the rows accumulated before the first
+     * successful connect are disk-backed (the durable SF path).
      */
     private static String sfDirOpt() {
         String dir = java.nio.file.Paths.get(
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectRetryTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectRetryTest.java
index d5c5d5af..2d775773 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectRetryTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/InitialConnectRetryTest.java
@@ -42,9 +42,12 @@
 public class InitialConnectRetryTest {
 
     /**
-     * Temp sf_dir for retry-mode tests. Per spec §3.5,
-     * initial_connect_retry on/sync/async requires sf_dir — memory-mode
-     * senders cannot durably retry across reconnects.
+     * Temp sf_dir for retry-mode tests. The builder does NOT require
+     * sf_dir for any initial_connect_retry mode — memory-mode senders
+     * share the same retry machinery, buffering rows in the in-RAM
+     * cursor ring instead of on disk. These tests use an sf_dir so the
+     * retried rows are disk-backed and the tests exercise the durable
+     * SF path.
      */
     private static String makeSfDir() {
         return java.nio.file.Paths.get(
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/PrReviewRedTestsE2e.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/PrReviewRedTestsE2e.java
index 51da7427..35da304b 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/PrReviewRedTestsE2e.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/PrReviewRedTestsE2e.java
@@ -60,9 +60,9 @@ public class PrReviewRedTestsE2e {
      *   <li>{@code fail()} auth-terminal branch (lines 437-438)</li>
      *   <li>{@code fail()} budget-exhausted branch (lines 484-485)</li>
      * </ul>
-     * The locked spec ({@code design/qwp-cursor-error-api.md} § "Path 2:
-     * producer-side typed throw") requires {@code signal.terminalError = err}
-     * to be written BEFORE {@code errorInbox.offer(err)}.
+     * The error-API contract ("Path 2: producer-side typed throw") requires
+     * {@code signal.terminalError = err} to be written BEFORE
+     * {@code errorInbox.offer(err)}.
      * <p>
      * Concrete consequence the spec calls out: a user-supplied error handler
      * that synchronously calls {@code sender.flush()} from inside
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpColumnBatchViewsTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpColumnBatchViewsTest.java
index 697f3350..21b96af4 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpColumnBatchViewsTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpColumnBatchViewsTest.java
@@ -76,6 +76,29 @@ public void setUp() {
 
     @After
     public void tearDown() {
+        // Safety net for exits that bypass the assertMemoryLeak wrapper; normally
+        // a no-op because its finally block already freed the tracked allocations.
+        freeAllocations();
+    }
+
+    /**
+     * Wraps a test body in {@link TestUtils#assertMemoryLeak} and frees the
+     * tracked allocations BEFORE the leak check fires -- LeakCheck closes at
+     * the end of the wrapped lambda, so freeing only in @After would run too
+     * late and fail every test now that the check asserts strict per-tag
+     * equality.
+     */
+    private void assertMemoryLeak(TestUtils.LeakProneCode code) throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            try {
+                code.run();
+            } finally {
+                freeAllocations();
+            }
+        });
+    }
+
+    private void freeAllocations() {
         for (long[] alloc : allocations) {
             Unsafe.free(alloc[0], alloc[1], MemoryTag.NATIVE_DEFAULT);
         }
@@ -84,7 +107,7 @@ public void tearDown() {
 
     @Test
     public void testColumnViewArrayRowAddr() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             Object l = setupArrayColumnLayout(batch,
                     new boolean[]{false, true, false},
@@ -102,7 +125,7 @@ public void testColumnViewArrayRowAddr() throws Exception {
 
     @Test
     public void testColumnViewBatchAccessorReturnsParent() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 1);
             setupLongColumnLayout(batch, 0, "x", new long[]{42L}, new boolean[]{false});
             Assert.assertSame(batch, batch.column(0).batch());
@@ -111,7 +134,7 @@ public void testColumnViewBatchAccessorReturnsParent() throws Exception {
 
     @Test
     public void testColumnViewBinaryAccessors() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupBinaryColumnLayout(batch,
                     new byte[][]{{0x00, 0x7F, (byte) 0xFF}, null, {0x01}},
@@ -137,7 +160,7 @@ public void testColumnViewBinaryAccessors() throws Exception {
 
     @Test
     public void testColumnViewBoolValue() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 5);
             setupBooleanColumnLayout(batch, 0,
                     new boolean[]{true, false, true, true, false},
@@ -154,7 +177,7 @@ public void testColumnViewBoolValue() throws Exception {
 
     @Test
     public void testColumnViewByteValue() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 4);
             setupByteColumnLayout(batch, 0,
                     new byte[]{Byte.MIN_VALUE, -1, 0, Byte.MAX_VALUE},
@@ -169,7 +192,7 @@ public void testColumnViewByteValue() throws Exception {
 
     @Test
     public void testColumnViewBytesPerValuePerType() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(8, 1);
             setupLongColumnLayout(batch, 0, "l", new long[]{0}, new boolean[]{false});
             setupIntColumnLayout(batch, 1, new int[]{0}, new boolean[]{false});
@@ -193,7 +216,7 @@ public void testColumnViewBytesPerValuePerType() throws Exception {
 
     @Test
     public void testColumnViewCachedPerColumnIndex() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(2, 1);
             setupLongColumnLayout(batch, 0, "a", new long[]{1L}, new boolean[]{false});
             setupLongColumnLayout(batch, 1, "b", new long[]{2L}, new boolean[]{false});
@@ -215,7 +238,7 @@ public void testColumnViewCachedPerColumnIndex() throws Exception {
 
     @Test
     public void testColumnViewCharValue() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupCharColumnLayout(batch, 0,
                     new char[]{'A', 'z', '0'},
@@ -229,7 +252,7 @@ public void testColumnViewCharValue() throws Exception {
 
     @Test
     public void testColumnViewDecimal128Accessors() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             long[] lo = {0xFFEE_DDCC_BBAA_9988L, 0L, 0x1L};
             long[] hi = {0x1122_3344_5566_7788L, 0L, 0x2L};
@@ -246,7 +269,7 @@ public void testColumnViewDecimal128Accessors() throws Exception {
 
     @Test
     public void testColumnViewDelegatesAgreeWithBatchPrimitives() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(5, 4);
             setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L, 0L, 4L}, new boolean[]{false, false, true, false});
             setupIntColumnLayout(batch, 1, new int[]{10, 20, 0, 40}, new boolean[]{false, false, true, false});
@@ -279,7 +302,7 @@ public void testColumnViewDelegatesAgreeWithBatchPrimitives() throws Exception {
 
     @Test
     public void testColumnViewDoubleArrayElements() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupArrayColumnLayout(batch,
                     new boolean[]{false, true, false},
@@ -293,7 +316,7 @@ public void testColumnViewDoubleArrayElements() throws Exception {
 
     @Test
     public void testColumnViewDoubleValue() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 4);
             setupDoubleColumnLayout(batch, 0,
                     new double[]{1.5, -1.5, 0.0, Double.MAX_VALUE},
@@ -308,7 +331,7 @@ public void testColumnViewDoubleValue() throws Exception {
 
     @Test
     public void testColumnViewFloatValue() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupFloatColumnLayout(batch, 0,
                     new float[]{1.5f, -1.5f, 0.0f},
@@ -322,7 +345,7 @@ public void testColumnViewFloatValue() throws Exception {
 
     @Test
     public void testColumnViewGeohashValue() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(3, 2);
             setupGeohashColumnLayout(batch, 0, "g20", new long[]{0xABCDEL, 0L}, new boolean[]{false, true}, 20);
             setupGeohashColumnLayout(batch, 1, "g40", new long[]{0x12345_6789AL, 0L}, new boolean[]{false, true}, 40);
@@ -344,7 +367,7 @@ public void testColumnViewGeohashValue() throws Exception {
 
     @Test
     public void testColumnViewGetColumnIndex() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(3, 1);
             setupLongColumnLayout(batch, 0, "a", new long[]{0}, new boolean[]{false});
             setupLongColumnLayout(batch, 1, "b", new long[]{0}, new boolean[]{false});
@@ -357,7 +380,7 @@ public void testColumnViewGetColumnIndex() throws Exception {
 
     @Test
     public void testColumnViewGetColumnWireType() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(2, 1);
             setupLongColumnLayout(batch, 0, "l", new long[]{0}, new boolean[]{false});
             setupVarcharColumnLayout(batch, 1, "s", new String[]{""}, new boolean[]{false});
@@ -368,7 +391,7 @@ public void testColumnViewGetColumnWireType() throws Exception {
 
     @Test
     public void testColumnViewIntValue() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 4);
             setupIntColumnLayout(batch, 0,
                     new int[]{Integer.MIN_VALUE + 1, -1, 0, Integer.MAX_VALUE},
@@ -383,7 +406,7 @@ public void testColumnViewIntValue() throws Exception {
 
     @Test
     public void testColumnViewLong256AndLong256Word() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             long[][] words = {{0xAAAAL, 0xBBBBL, 0xCCCCL, 0xDDDDL}, {0L, 0L, 0L, 0L}};
             setupLong256ColumnLayout(batch, words, new boolean[]{false, true});
@@ -409,7 +432,7 @@ public void testColumnViewLong256AndLong256Word() throws Exception {
 
     @Test
     public void testColumnViewLongValue() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 4);
             setupLongColumnLayout(batch, 0, "l",
                     new long[]{Long.MIN_VALUE + 1, -1L, 0L, Long.MAX_VALUE},
@@ -424,7 +447,7 @@ public void testColumnViewLongValue() throws Exception {
 
     @Test
     public void testColumnViewNonNullCount() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 5);
             setupLongColumnLayout(batch, 0, "l",
                     new long[]{1L, 2L, 0L, 4L, 0L},
@@ -435,7 +458,7 @@ public void testColumnViewNonNullCount() throws Exception {
 
     @Test
     public void testColumnViewNonNullIndex() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 5);
             // Rows 1 and 3 are NULL; dense indices for non-null rows are 0, 1, 2.
             setupLongColumnLayout(batch, 0, "l",
@@ -452,7 +475,7 @@ public void testColumnViewNonNullIndex() throws Exception {
     public void testColumnViewNonNullIndexNoNulls() throws Exception {
         // When there are no nulls, dense index equals row index (layout skips the
         // nonNullIdx fill; the method just returns the row back).
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupLongColumnLayout(batch, 0, "l",
                     new long[]{1L, 2L, 3L},
@@ -466,7 +489,7 @@ public void testColumnViewNonNullIndexNoNulls() throws Exception {
 
     @Test
     public void testColumnViewNullBitmapAddrNoNulls() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupLongColumnLayout(batch, 0, "l",
                     new long[]{1L, 2L, 3L},
@@ -477,7 +500,7 @@ public void testColumnViewNullBitmapAddrNoNulls() throws Exception {
 
     @Test
     public void testColumnViewNullBitmapAddrWithNulls() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 5);
             setupLongColumnLayout(batch, 0, "l",
                     new long[]{1L, 0L, 3L, 0L, 5L},
@@ -496,7 +519,7 @@ public void testColumnViewNullBitmapAddrWithNulls() throws Exception {
 
     @Test
     public void testColumnViewNullValuesReturnTypeSentinels() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(6, 1);
             setupLongColumnLayout(batch, 0, "l", new long[]{0L}, new boolean[]{true});
             setupIntColumnLayout(batch, 1, new int[]{0}, new boolean[]{true});
@@ -519,7 +542,7 @@ public void testColumnViewNullValuesReturnTypeSentinels() throws Exception {
 
     @Test
     public void testColumnViewOfReturnsThis() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(2, 1);
             setupLongColumnLayout(batch, 0, "a", new long[]{1L}, new boolean[]{false});
             setupLongColumnLayout(batch, 1, "b", new long[]{2L}, new boolean[]{false});
@@ -532,7 +555,7 @@ public void testColumnViewOfReturnsThis() throws Exception {
 
     @Test
     public void testColumnViewRebindingPicksUpFreshLayout() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L}, new boolean[]{false, false});
             ColumnView col = batch.column(0);
@@ -560,7 +583,7 @@ public void testColumnViewRebindingPicksUpFreshLayout() throws Exception {
 
     @Test
     public void testColumnViewShortValue() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 4);
             setupShortColumnLayout(batch, 0,
                     new short[]{Short.MIN_VALUE + 1, -1, 0, Short.MAX_VALUE},
@@ -577,7 +600,7 @@ public void testColumnViewShortValue() throws Exception {
     public void testColumnViewStrBDualHold() throws Exception {
         // strA and strB are independent slots; a call to strB must not invalidate
         // an already-obtained strA view, and vice-versa.
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupVarcharColumnLayout(batch, 0, "s",
                     new String[]{"alpha", "beta", null},
@@ -596,7 +619,7 @@ public void testColumnViewStrBDualHold() throws Exception {
 
     @Test
     public void testColumnViewStringHeapAllocated() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupVarcharColumnLayout(batch, 0, "s",
                     new String[]{"alpha", null, "gamma"},
@@ -610,7 +633,7 @@ public void testColumnViewStringHeapAllocated() throws Exception {
 
     @Test
     public void testColumnViewStringSink() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupVarcharColumnLayout(batch, 0, "s",
                     new String[]{"alpha", null, "gamma"},
@@ -632,7 +655,7 @@ public void testColumnViewStringSink() throws Exception {
 
     @Test
     public void testColumnViewSymbolAccessors() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 5);
             String[] dict = {"AAPL", "MSFT", "GOOG"};
             int[] rowIds = {0, 1, 0, 2, -1};
@@ -662,7 +685,7 @@ public void testColumnViewSymbolAccessors() throws Exception {
 
     @Test
     public void testColumnViewUuidLoHi() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             long[] lo = {0xCAFE_BABEL, 0L};
             long[] hi = {0xDEAD_BEEFL, 0L};
@@ -677,7 +700,7 @@ public void testColumnViewUuidLoHi() throws Exception {
 
     @Test
     public void testColumnViewUuidWithSink() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             long[] lo = {0x1111_1111_1111_1111L, 0L};
             long[] hi = {0x2222_2222_2222_2222L, 0L};
@@ -693,7 +716,7 @@ public void testColumnViewUuidWithSink() throws Exception {
 
     @Test
     public void testColumnViewValuesAddrMatchesLayout() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(2, 1);
             Object lLayout = setupLongColumnLayout(batch, 0, "l", new long[]{1L}, new boolean[]{false});
             Object dLayout = setupDoubleColumnLayout(batch, 1, new double[]{2.0}, new boolean[]{false});
@@ -704,7 +727,7 @@ public void testColumnViewValuesAddrMatchesLayout() throws Exception {
 
     @Test
     public void testColumnViewVarcharAndStringBytesAddr() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupVarcharColumnLayout(batch, 0, "v",
                     new String[]{"hello", "world", null},
@@ -722,7 +745,7 @@ public void testColumnViewVarcharAndStringBytesAddr() throws Exception {
 
     @Test
     public void testForEachRowEmptyBatch() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 0);
             // Register a minimal layout so column() doesn't trip on null, though
             // forEachRow never reaches into it.
@@ -741,7 +764,7 @@ public void testForEachRowEmptyBatch() throws Exception {
 
     @Test
     public void testForEachRowExceptionPropagates() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 5);
             setupLongColumnLayout(batch, 0, "l",
                     new long[]{1L, 2L, 3L, 4L, 5L},
@@ -762,7 +785,7 @@ public void testForEachRowExceptionPropagates() throws Exception {
 
     @Test
     public void testForEachRowReusesSameInstance() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 4);
             setupLongColumnLayout(batch, 0, "l",
                     new long[]{1L, 2L, 3L, 4L},
@@ -775,7 +798,7 @@ public void testForEachRowReusesSameInstance() throws Exception {
 
     @Test
     public void testForEachRowVisitsRowsInOrder() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 5);
             setupLongColumnLayout(batch, 0, "l",
                     new long[]{10L, 20L, 30L, 40L, 50L},
@@ -796,7 +819,7 @@ public void testForEachRowVisitsRowsInOrder() throws Exception {
 
     @Test
     public void testRowViewArrayAccessors() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupArrayColumnLayout(batch,
                     new boolean[]{false, true, false},
@@ -811,7 +834,7 @@ public void testRowViewArrayAccessors() throws Exception {
 
     @Test
     public void testRowViewBatchAccessor() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 1);
             setupLongColumnLayout(batch, 0, "x", new long[]{42L}, new boolean[]{false});
             Assert.assertSame(batch, batch.row(0).batch());
@@ -820,7 +843,7 @@ public void testRowViewBatchAccessor() throws Exception {
 
     @Test
     public void testRowViewBinaryAccessor() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             setupBinaryColumnLayout(batch,
                     new byte[][]{{0x00, 0x7F, (byte) 0xFF}, null},
@@ -841,7 +864,7 @@ public void testRowViewBinaryAccessor() throws Exception {
     @Test
     public void testRowViewBinaryBDualHold() throws Exception {
         // binaryA and binaryB are independent slots, parallel to strA/strB.
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             setupBinaryColumnLayout(batch,
                     new byte[][]{{0x01, 0x02}, {(byte) 0xFE, (byte) 0xFF}},
@@ -860,7 +883,7 @@ public void testRowViewBinaryBDualHold() throws Exception {
 
     @Test
     public void testRowViewByteAndShortAndCharAndFloat() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(4, 2);
             setupByteColumnLayout(batch, 0, new byte[]{(byte) 127, 0}, new boolean[]{false, true});
             setupShortColumnLayout(batch, 1, new short[]{(short) -32000, 0}, new boolean[]{false, true});
@@ -884,7 +907,7 @@ public void testRowViewByteAndShortAndCharAndFloat() throws Exception {
 
     @Test
     public void testRowViewDecimal128() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             long[] lo = {0x1122_3344_5566_7788L, 0L};
             long[] hi = {0x99AA_BBCC_DDEE_FF00L, 0L};
@@ -898,7 +921,7 @@ public void testRowViewDecimal128() throws Exception {
 
     @Test
     public void testRowViewDelegatesAgreeWithBatchPrimitives() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(5, 4);
             setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L, 0L, 4L}, new boolean[]{false, false, true, false});
             setupIntColumnLayout(batch, 1, new int[]{10, 20, 0, 40}, new boolean[]{false, false, true, false});
@@ -921,7 +944,7 @@ public void testRowViewDelegatesAgreeWithBatchPrimitives() throws Exception {
 
     @Test
     public void testRowViewGeohashValue() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             setupGeohashColumnLayout(batch, 0, "g", new long[]{0xDEAD_BEEFL, 0L}, new boolean[]{false, true}, 32);
             Assert.assertEquals(0xDEAD_BEEFL, batch.row(0).getGeohashValue(0));
@@ -931,7 +954,7 @@ public void testRowViewGeohashValue() throws Exception {
 
     @Test
     public void testRowViewGetRowIndex() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 5);
             setupLongColumnLayout(batch, 0, "l",
                     new long[]{0L, 0L, 0L, 0L, 0L},
@@ -947,7 +970,7 @@ public void testRowViewGetRowIndex() throws Exception {
 
     @Test
     public void testRowViewLong256WithSink() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             long[][] words = {{0x1L, 0x2L, 0x3L, 0x4L}, {0L, 0L, 0L, 0L}};
             setupLong256ColumnLayout(batch, words, new boolean[]{false, true});
@@ -965,7 +988,7 @@ public void testRowViewLong256WithSink() throws Exception {
 
     @Test
     public void testRowViewLong256Word() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             long[][] words = {{0x11L, 0x22L, 0x33L, 0x44L}, {0L, 0L, 0L, 0L}};
             setupLong256ColumnLayout(batch, words, new boolean[]{false, true});
@@ -980,7 +1003,7 @@ public void testRowViewLong256Word() throws Exception {
 
     @Test
     public void testRowViewOfReturnsThis() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             setupLongColumnLayout(batch, 0, "l", new long[]{7L, 8L}, new boolean[]{false, false});
             RowView v = batch.row(0);
@@ -991,7 +1014,7 @@ public void testRowViewOfReturnsThis() throws Exception {
 
     @Test
     public void testRowViewSingleSharedInstance() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupLongColumnLayout(batch, 0, "l", new long[]{1L, 2L, 3L}, new boolean[]{false, false, false});
             RowView a = batch.row(0);
@@ -1006,7 +1029,7 @@ public void testRowViewSingleSharedInstance() throws Exception {
 
     @Test
     public void testRowViewStrAStrBDualHold() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             setupVarcharColumnLayout(batch, 0, "s",
                     new String[]{"alpha", "beta"},
@@ -1023,7 +1046,7 @@ public void testRowViewStrAStrBDualHold() throws Exception {
 
     @Test
     public void testRowViewStringAccessors() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 3);
             setupVarcharColumnLayout(batch, 0, "s",
                     new String[]{"alpha", null, "gamma"},
@@ -1046,7 +1069,7 @@ public void testRowViewStringAccessors() throws Exception {
 
     @Test
     public void testRowViewSymbolAccessors() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 4);
             String[] dict = {"AAPL", "MSFT"};
             int[] rowIds = {0, 1, 0, -1};
@@ -1066,7 +1089,7 @@ public void testRowViewSymbolAccessors() throws Exception {
 
     @Test
     public void testRowViewUuidLoHi() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             long[] lo = {0xCAFE_BABEL, 0L};
             long[] hi = {0xDEAD_BEEFL, 0L};
@@ -1080,7 +1103,7 @@ public void testRowViewUuidLoHi() throws Exception {
 
     @Test
     public void testRowViewUuidWithSink() throws Exception {
-        TestUtils.assertMemoryLeak(() -> {
+        assertMemoryLeak(() -> {
             QwpColumnBatch batch = newBatch(1, 2);
             long[] lo = {0xAAAAL, 0L};
             long[] hi = {0xBBBBL, 0L};
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpConnectWalkBackgroundIsolationTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpConnectWalkBackgroundIsolationTest.java
new file mode 100644
index 00000000..51ef603a
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpConnectWalkBackgroundIsolationTest.java
@@ -0,0 +1,224 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client;
+
+import io.questdb.client.DefaultHttpClientConfiguration;
+import io.questdb.client.cutlass.http.client.WebSocketClient;
+import io.questdb.client.cutlass.line.LineSenderException;
+import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop;
+import io.questdb.client.network.PlainSocketFactory;
+import io.questdb.client.test.tools.TestUtils;
+import org.junit.Test;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicReference;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+/**
+ * Coverage of the connect-walk concurrency policy (M11): no network I/O
+ * runs under a sender-wide lock for background work. A FOREGROUND walk
+ * holds the connect-walk lock across its sweep (it owns the shared round
+ * state and the lifecycle commits); BACKGROUND (drainer) walks take no
+ * lock at all — each sweeps a private {@code QwpHostHealthTracker
+ * .RoundCursor} and records health-only results — so a drainer sweep
+ * proceeds CONCURRENTLY with a foreground walk that is parked inside a
+ * blocking connect, and the foreground's reconnect and {@code close()}
+ * paths can never queue behind (or be queued behind by) a drainer's
+ * endpoint walk.
+ * <p>
+ * The proof shape: pin a foreground walk inside {@code connect()} (lock
+ * held, I/O in flight), then run TWO full background sweeps to completion
+ * while the foreground is still parked. Under the old walk-wide lock
+ * (monitor or tryLock-yield) both background calls would have blocked or
+ * yielded; now that the walk is lock-free, they must reach the client factory and fail with the
+ * ordinary end-of-round error.
+ */
+public class QwpConnectWalkBackgroundIsolationTest {
+
+    /** Tracks every stub for defensive close (close() is idempotent). */
+    private static final List<StubClient> LIVE_STUBS =
+            Collections.synchronizedList(new ArrayList<>());
+
+    @Test
+    public void testBackgroundSweepRunsConcurrentlyWithParkedForegroundWalk() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            try (QwpWebSocketSender sender = QwpWebSocketSender.createForTesting("localhost", 19999)) {
+                final CountDownLatch foregroundInConnect = new CountDownLatch(1);
+                final CountDownLatch releaseForeground = new CountDownLatch(1);
+                final AtomicInteger factoryCalls = new AtomicInteger();
+                sender.setClientFactoryOverride(() -> {
+                    int call = factoryCalls.incrementAndGet();
+                    StubClient stub = new StubClient(
+                            call == 1 ? foregroundInConnect : null,
+                            call == 1 ? releaseForeground : null);
+                    LIVE_STUBS.add(stub);
+                    return stub;
+                });
+
+                // Foreground walk on a helper thread: its stub connect()
+                // parks on releaseForeground, so the walk holds the
+                // connect-walk lock with I/O "in flight" for as long as
+                // this test wants.
+                final CursorWebSocketSendLoop.ReconnectFactory foreground =
+                        sender.newReconnectFactory();
+                final AtomicReference<Throwable> foregroundError = new AtomicReference<>();
+                Thread fg = new Thread(() -> {
+                    try {
+                        foreground.reconnect();
+                    } catch (Throwable e) {
+                        foregroundError.set(e);
+                    }
+                }, "test-foreground-walk");
+                fg.setDaemon(true);
+                fg.start();
+                try {
+                    assertTrue("foreground walk must reach its (blocking) connect attempt",
+                            foregroundInConnect.await(5, TimeUnit.SECONDS));
+
+                    // TWO background sweeps run to completion while the
+                    // foreground is parked mid-connect. Each must reach the
+                    // client factory (lock-free walk, no yield, no blocking)
+                    // and fail with the ordinary end-of-round error. Two
+                    // sweeps prove per-walk cursor independence: the second
+                    // sweep gets its own full walk, not the first's
+                    // exhausted cursor.
+                    for (int sweep = 1; sweep <= 2; sweep++) {
+                        final CursorWebSocketSendLoop.ReconnectFactory background =
+                                sender.newBackgroundReconnectFactory(() -> false);
+                        try {
+                            background.reconnect();
+                            fail("stub connect always throws; background sweep " + sweep
+                                    + " must fail its round");
+                        } catch (Exception e) {
+                            assertTrue("background sweep " + sweep + " must fail with the "
+                                            + "ordinary end-of-round error, not a lock artifact "
+                                            + "(got: " + e.getMessage() + ")",
+                                    e instanceof LineSenderException
+                                            && String.valueOf(e.getMessage())
+                                            .contains("Failed to connect"));
+                        }
+                        assertEquals("background sweep " + sweep + " must have reached the "
+                                        + "client factory while the foreground is parked",
+                                1 + sweep, factoryCalls.get());
+                        assertTrue("foreground must still be parked in connect (background "
+                                        + "sweeps must not disturb it)",
+                                fg.isAlive());
+                        assertNull("foreground walk must not have failed while background "
+                                        + "sweeps ran",
+                                foregroundError.get());
+                    }
+                } finally {
+                    releaseForeground.countDown();
+                }
+                fg.join(5_000);
+                assertFalse("foreground walk thread must exit once released", fg.isAlive());
+
+                // The foreground's own outcome is unaffected by the two
+                // background sweeps that ran under it: the ordinary
+                // end-of-round failure for its single-endpoint round.
+                Throwable fgErr = foregroundError.get();
+                assertNotNull("foreground walk fails its (single-endpoint) round once the "
+                        + "stub connect throws", fgErr);
+                assertTrue("foreground failure is the ordinary end-of-round connect error "
+                                + "(got: " + fgErr.getMessage() + ")",
+                        fgErr instanceof LineSenderException
+                                && String.valueOf(fgErr.getMessage()).contains("Failed to connect"));
+            } finally {
+                closeAllStubs();
+            }
+        });
+    }
+
+    private static void closeAllStubs() {
+        synchronized (LIVE_STUBS) {
+            for (StubClient c : LIVE_STUBS) {
+                try {
+                    c.close();
+                } catch (Throwable ignored) {
+                    // best-effort; close() is idempotent
+                }
+            }
+            LIVE_STUBS.clear();
+        }
+    }
+
+    /**
+     * Real-constructor stub (native buffers allocated and freed by the base
+     * class; the walk closes failed-attempt clients itself). {@code connect}
+     * optionally parks on a latch to pin the walk — and, on the foreground
+     * path, the connect-walk lock — then always throws, so no walk ever
+     * "succeeds" and reaches upgrade or lifecycle commits.
+     */
+    private static final class StubClient extends WebSocketClient {
+        private final CountDownLatch entered;
+        private final CountDownLatch release;
+
+        StubClient(CountDownLatch entered, CountDownLatch release) {
+            super(DefaultHttpClientConfiguration.INSTANCE, PlainSocketFactory.INSTANCE);
+            this.entered = entered;
+            this.release = release;
+        }
+
+        @Override
+        public void connect(CharSequence host, int port) {
+            if (entered != null) {
+                entered.countDown();
+            }
+            if (release != null) {
+                try {
+                    if (!release.await(10, TimeUnit.SECONDS)) {
+                        throw new RuntimeException("stub connect never released");
+                    }
+                } catch (InterruptedException e) {
+                    Thread.currentThread().interrupt();
+                    throw new RuntimeException("stub connect interrupted", e);
+                }
+            }
+            throw new RuntimeException("stub: connection refused");
+        }
+
+        @Override
+        protected void ioWait(int timeout, int op) {
+            throw new UnsupportedOperationException("stub: no socket");
+        }
+
+        @Override
+        protected void setupIoWait() {
+            // no-op
+        }
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpHostHealthTrackerTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpHostHealthTrackerTest.java
index 2ae217c4..ffd3a4b9 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpHostHealthTrackerTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpHostHealthTrackerTest.java
@@ -319,4 +319,98 @@ public void testZone_ZoneIdComparisonIsCaseInsensitive() {
         Assert.assertEquals(QwpHostHealthTracker.ZoneTier.SAME, t.getZoneTier(0));
         Assert.assertEquals(QwpHostHealthTracker.ZoneTier.OTHER, t.getZoneTier(1));
     }
+
+    @Test
+    public void testRoundCursor_FullSweepInLivePriorityOrderThenExhausted() {
+        QwpHostHealthTracker t = new QwpHostHealthTracker(3);
+        t.recordSuccess(2);          // HEALTHY -> first
+        t.recordTransportError(0);   // TRANSPORT_ERROR -> last
+        // host 1 stays UNKNOWN -> middle
+        QwpHostHealthTracker.RoundCursor c = t.newRoundCursor();
+        Assert.assertEquals(2, c.next());
+        Assert.assertEquals(1, c.next());
+        Assert.assertEquals(0, c.next());
+        Assert.assertEquals("cursor must be exhausted after a full sweep", -1, c.next());
+        Assert.assertEquals("cursor exhaustion is sticky", -1, c.next());
+    }
+
+    @Test
+    public void testRoundCursor_ReRanksRemainingHostsOnLiveStateChange() {
+        QwpHostHealthTracker t = new QwpHostHealthTracker(3);
+        QwpHostHealthTracker.RoundCursor c = t.newRoundCursor();
+        Assert.assertEquals(0, c.next()); // all UNKNOWN -> idx order
+        // Another walker observes host 2 healthy mid-sweep: it must now
+        // outrank the still-UNKNOWN host 1 for THIS cursor's next pick.
+        t.recordSuccess(2, false);
+        Assert.assertEquals(2, c.next());
+        Assert.assertEquals(1, c.next());
+        Assert.assertEquals(-1, c.next());
+    }
+
+    @Test
+    public void testRoundCursor_DoesNotConsumeOrDependOnSharedRound() {
+        QwpHostHealthTracker t = new QwpHostHealthTracker(2);
+        // Exhaust the SHARED round completely...
+        t.recordTransportError(0);
+        t.recordTransportError(1);
+        Assert.assertTrue(t.isRoundExhausted());
+        Assert.assertEquals(-1, t.pickNext());
+        // ...a fresh cursor still gets a FULL sweep (its attempted set is
+        // private), ordered by the live states.
+        QwpHostHealthTracker.RoundCursor c = t.newRoundCursor();
+        Assert.assertEquals(0, c.next());
+        Assert.assertEquals(1, c.next());
+        Assert.assertEquals(-1, c.next());
+        // ...and the cursor's sweep left the shared round untouched.
+        Assert.assertTrue(t.isRoundExhausted());
+        Assert.assertEquals(-1, t.pickNext());
+    }
+
+    @Test
+    public void testRoundCursors_AreIndependentNoEndpointStealing() {
+        QwpHostHealthTracker t = new QwpHostHealthTracker(2);
+        QwpHostHealthTracker.RoundCursor a = t.newRoundCursor();
+        QwpHostHealthTracker.RoundCursor b = t.newRoundCursor();
+        // Interleaved: each cursor must sweep EVERY host exactly once;
+        // a's claims must not consume b's sweep or vice versa.
+        Assert.assertEquals(0, a.next());
+        Assert.assertEquals(0, b.next());
+        Assert.assertEquals(1, a.next());
+        Assert.assertEquals(1, b.next());
+        Assert.assertEquals(-1, a.next());
+        Assert.assertEquals(-1, b.next());
+    }
+
+    @Test
+    public void testHealthOnlyRecords_UpdateStateButNeverTheSharedRoundBit() {
+        QwpHostHealthTracker t = new QwpHostHealthTracker(1);
+        // Health-only variants (markRoundAttempted=false): state flips...
+        t.recordTransportError(0, false);
+        Assert.assertEquals(QwpHostHealthTracker.HostState.TRANSPORT_ERROR, t.getState(0));
+        t.recordRoleReject(0, true, false);
+        Assert.assertEquals(QwpHostHealthTracker.HostState.TRANSIENT_REJECT, t.getState(0));
+        t.recordSuccess(0, false);
+        Assert.assertEquals(QwpHostHealthTracker.HostState.HEALTHY, t.getState(0));
+        // ...but the shared round never sees an attempt: the host is still
+        // pickable and the round is not exhausted. This is what keeps a
+        // background drainer's sweep invisible to the foreground's round.
+        Assert.assertFalse(t.isRoundExhausted());
+        Assert.assertEquals(0, t.pickNext());
+    }
+
+    @Test
+    public void testHealthOnlySuccess_StillFeedsStickyHealthyRecency() {
+        QwpHostHealthTracker t = new QwpHostHealthTracker(2);
+        // Foreground succeeded on 0 (round-marking), a background walker
+        // later succeeded on 1 (health-only): the background success is the
+        // most RECENT and must win the sticky-Healthy pin across
+        // beginRound(true).
+        t.recordSuccess(0);
+        t.recordSuccess(1, false);
+        t.beginRound(true);
+        Assert.assertEquals("most recent success (health-only or not) is sticky",
+                QwpHostHealthTracker.HostState.HEALTHY, t.getState(1));
+        Assert.assertEquals(QwpHostHealthTracker.HostState.UNKNOWN, t.getState(0));
+        Assert.assertEquals(1, t.pickNext());
+    }
 }
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientConnectTimeoutTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientConnectTimeoutTest.java
new file mode 100644
index 00000000..e0435b72
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientConnectTimeoutTest.java
@@ -0,0 +1,88 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client;
+
+import io.questdb.client.cutlass.http.client.HttpClientException;
+import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
+import org.junit.Assert;
+import org.junit.Assume;
+import org.junit.Test;
+
+public class QwpQueryClientConnectTimeoutTest {
+
+    /**
+     * A connect-phase timeout must be reported as a connect_timeout failure, not
+     * relabeled as an "exceeded auth_timeout" overage.
+     * <p>
+     * {@code QwpQueryClient.runUpgradeWithTimeout} used to wrap the {@code connect()}
+     * and {@code upgrade()} calls in one try block, so the timeout-flagged exception
+     * thrown by the (in-diff) connect_timeout path was caught by the {@code isTimeout()}
+     * branch intended for upgrade() and rewritten with the (much larger, and wrong)
+     * auth_timeout value -- e.g. a connect that bailed after 500 ms reported
+     * "exceeded auth_timeout=15000ms". The ingest side never had this bug because it
+     * routes through {@code QwpUpgradeFailures.classify}, which leaves the
+     * connect-timeout exception unmodified.
+     */
+    @Test(timeout = 30_000)
+    public void testConnectTimeoutNotReportedAsAuthTimeout() {
+        // 192.0.2.0/24 is TEST-NET-1 (RFC 5737): on a normal network the SYN is
+        // silently dropped, so the TCP connect stalls and our application-level
+        // connect_timeout (500 ms) fires -- long before auth_timeout_ms (15000 ms).
+        // The WebSocket upgrade phase is never reached.
+        try (QwpQueryClient client = QwpQueryClient.fromConfig(
+                "ws::addr=192.0.2.1:9009;connect_timeout=500;auth_timeout_ms=15000;failover=off;target=any;")) {
+            long start = System.currentTimeMillis();
+            try {
+                client.connect();
+                Assert.fail("expected connect to fail");
+            } catch (HttpClientException ex) {
+                long elapsed = System.currentTimeMillis() - start;
+                String msg = String.valueOf(ex.getMessage());
+
+                // The connect_timeout path is only exercised when the runner routes
+                // TEST-NET-1 into a black hole (dropped SYN). Skip -- rather than
+                // flake -- on the other two outcomes:
+                //  - no route: a fast ENETUNREACH surfaces as "could not connect".
+                //  - (rare) the host accepts the connect: the upgrade then runs the
+                //    full auth_timeout, so elapsed ~ auth_timeout (>5 s).
+                // Neither gate keys on the connect-vs-auth label, so neither can mask
+                // the regression: a black-holed connect always bails at ~500 ms with
+                // a message that is "connect timed out" (fixed) or "...auth_timeout..."
+                // (the bug) -- both reach the assertions below.
+                Assume.assumeFalse("no route to TEST-NET-1 black hole on this runner: " + msg,
+                        msg.contains("could not connect"));
+                Assume.assumeTrue("TEST-NET-1 is not a black hole on this runner (elapsed=" + elapsed + "ms): " + msg,
+                        elapsed < 5_000);
+
+                // It bailed at connect_timeout=500 ms, nowhere near auth_timeout=15000 ms.
+                // Regression: name the connect phase, never auth_timeout.
+                Assert.assertFalse("connect-phase timeout misreported as auth_timeout: " + msg,
+                        msg.contains("auth_timeout"));
+                Assert.assertTrue("expected a connect-timeout diagnostic, got: " + msg,
+                        msg.contains("connect timed out"));
+            }
+        }
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientWalkTrackerTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientWalkTrackerTest.java
index ee5909ce..4a254402 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientWalkTrackerTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpQueryClientWalkTrackerTest.java
@@ -170,10 +170,14 @@ public void testWalk_AllUnreachableThrowsHttpClientException() {
         // The exception type is HttpClientException (transport-only
         // failure mode) -- distinct from QwpRoleMismatchException which
         // would falsely suggest a topology issue.
-        int port1 = TestPorts.findUnusedPort();
-        int port2 = TestPorts.findUnusedPort();
+        // findUnusedPorts (plural) holds both probe sockets open at once so
+        // the two ports are guaranteed distinct — two separate
+        // findUnusedPort() calls can return the SAME port (bind-close-return
+        // lets the kernel recycle it immediately), which fails the config's
+        // duplicate-addr validation before the walk under test even runs.
+        int[] ports = TestPorts.findUnusedPorts(2);
         try (QwpQueryClient client = QwpQueryClient.fromConfig(
-                "ws::addr=localhost:" + port1 + ",localhost:" + port2 + ";auth_timeout_ms=300;")) {
+                "ws::addr=localhost:" + ports[0] + ",localhost:" + ports[1] + ";auth_timeout_ms=300;")) {
             try {
                 client.connect();
                 Assert.fail("expected HttpClientException on unreachable hosts");
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpRoleRejectBackoffGrowthTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpRoleRejectBackoffGrowthTest.java
new file mode 100644
index 00000000..ff1b858f
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpRoleRejectBackoffGrowthTest.java
@@ -0,0 +1,189 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client;
+
+import io.questdb.client.Sender;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.net.InetAddress;
+import java.net.ServerSocket;
+import java.net.Socket;
+import java.nio.charset.StandardCharsets;
+import java.util.concurrent.CopyOnWriteArrayList;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+/**
+ * Regression guard for the foreground role-reject retry storm.
+ *
+ * <p>When every reachable endpoint role-rejects the {@code /write/v4} upgrade
+ * (a genuine all-replica failover window, or a misconfigured address list that
+ * points at replicas only), the cursor I/O loop MUST retry with the same
+ * capped exponential backoff-with-jitter every other reconnect branch uses --
+ * NOT pin at {@code reconnect_initial_backoff_millis} forever. Pinning turned
+ * this into a fixed ~10/s storm of fresh TLS handshakes (new
+ * {@code WebSocketClient} + new {@code SSLContext} + trust-store re-read) per
+ * endpoint, in breach of the documented capped-exponential-backoff contract and
+ * asymmetric with the orphan drainer, which already grows to
+ * {@code reconnect_max_backoff_millis}.
+ *
+ * <p>The server here is plaintext loopback, so a role-reject upgrade completes
+ * in well under a millisecond and the wall-clock gap between successive attempts
+ * is dominated by the backoff park. Under the old fixed-interval bug every gap
+ * stayed {@code ~= reconnect_initial_backoff_millis}; under capped exponential
+ * backoff a later gap climbs many multiples past it.
+ */
+public class QwpRoleRejectBackoffGrowthTest {
+
+    @Test(timeout = 30_000)
+    public void testRoleRejectRetryUsesCappedExponentialBackoff() throws Exception {
+        try (RoleRejectServer server = new RoleRejectServer()) {
+            server.start();
+
+            final long initialBackoffMillis = 50;
+            String cfg = "ws::addr=127.0.0.1:" + server.port()
+                    + ";reconnect_initial_backoff_millis=" + initialBackoffMillis
+                    + ";reconnect_max_backoff_millis=4000"
+                    + ";auth_timeout_ms=2000"
+                    + ";auto_flush_rows=1"
+                    + ";close_flush_timeout_millis=0"
+                    + ";initial_connect_retry=async;";
+
+            try (Sender sender = Sender.fromConfig(cfg)) {
+                // Kick the I/O thread into the connect/role-reject loop.
+                sender.table("t").longColumn("v", 1L).atNow();
+                // Wait for enough attempts to observe several backoff doublings:
+                // the parked gaps run ~50, ~100, ~200, ~400, ~800, ~1600 ms
+                // (+jitter). Seven attempts give six gaps up to the ~1600 ms step.
+                waitFor(() -> server.attemptNanos.size() >= 7, 25_000);
+            }
+
+            Long[] ts = server.attemptNanos.toArray(new Long[0]);
+            Assert.assertTrue("expected at least 7 upgrade attempts, got " + ts.length, ts.length >= 7);
+
+            long firstGapMs = (ts[1] - ts[0]) / 1_000_000L;
+            long maxGapMs = 0;
+            StringBuilder gaps = new StringBuilder();
+            for (int i = 1; i < ts.length; i++) {
+                long gapMs = (ts[i] - ts[i - 1]) / 1_000_000L;
+                gaps.append(gapMs).append(i < ts.length - 1 ? "," : "");
+                if (gapMs > maxGapMs) {
+                    maxGapMs = gapMs;
+                }
+            }
+
+            // Under the fixed-interval bug every gap stayed ~= 50 ms (no jitter,
+            // no growth) over a sub-millisecond plaintext handshake, so maxGap
+            // could never climb past ~60 ms. Capped exponential backoff drives a
+            // later gap to ~400 ms by the fourth gap (third doubling). Require maxGap to reach
+            // at least 4x the initial interval: unreachable under the old
+            // behaviour, comfortably cleared under the new one.
+            Assert.assertTrue(
+                    "role-reject backoff did not grow (fixed-interval storm): gaps=[" + gaps
+                            + "]ms maxGap=" + maxGapMs + "ms firstGap=" + firstGapMs
+                            + "ms initial=" + initialBackoffMillis + "ms",
+                    maxGapMs >= initialBackoffMillis * 4);
+            // And a later gap must dwarf the first, proving genuine growth rather
+            // than a single anomalous park.
+            Assert.assertTrue(
+                    "role-reject gaps are flat, not exponential: gaps=[" + gaps
+                            + "]ms maxGap=" + maxGapMs + "ms firstGap=" + firstGapMs + "ms",
+                    maxGapMs >= firstGapMs * 3);
+        }
+    }
+
+    private static void waitFor(java.util.function.BooleanSupplier cond, long timeoutMs) throws InterruptedException {
+        long deadline = System.currentTimeMillis() + timeoutMs;
+        while (System.currentTimeMillis() < deadline) {
+            if (cond.getAsBoolean()) {
+                return;
+            }
+            Thread.sleep(20);
+        }
+    }
+
+    private static final class RoleRejectServer implements AutoCloseable {
+        final CopyOnWriteArrayList<Long> attemptNanos = new CopyOnWriteArrayList<>();
+        private final ServerSocket socket;
+        private final AtomicBoolean running = new AtomicBoolean(true);
+
+        RoleRejectServer() throws IOException {
+            this.socket = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
+        }
+
+        int port() {
+            return socket.getLocalPort();
+        }
+
+        void start() {
+            Thread t = new Thread(this::loop, "role-reject-backoff-server");
+            t.setDaemon(true);
+            t.start();
+        }
+
+        @Override
+        public void close() throws IOException {
+            running.set(false);
+            socket.close();
+        }
+
+        private void loop() {
+            while (running.get()) {
+                try {
+                    Socket s = socket.accept();
+                    Thread h = new Thread(() -> handle(s), "role-reject-backoff-handler");
+                    h.setDaemon(true);
+                    h.start();
+                } catch (IOException e) {
+                    if (!running.get()) {
+                        return;
+                    }
+                }
+            }
+        }
+
+        private void handle(Socket s) {
+            try (Socket sock = s) {
+                byte[] discard = new byte[8192];
+                int n = sock.getInputStream().read(discard);
+                if (n < 0) {
+                    return;
+                }
+                // Record the attempt only once we have actually read the upgrade
+                // request, so the timestamp reflects a real handshake attempt.
+                attemptNanos.add(System.nanoTime());
+                String resp = "HTTP/1.1 421 Misdirected Request\r\n"
+                        + "X-QuestDB-Role: REPLICA\r\n"
+                        + "Content-Length: 0\r\nConnection: close\r\n\r\n";
+                OutputStream out = sock.getOutputStream();
+                out.write(resp.getBytes(StandardCharsets.US_ASCII));
+                out.flush();
+            } catch (Exception ignored) {
+            }
+        }
+    }
+}
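Reviewer sketch, not part of the patch: the two assertions in the role-reject test above depend on the retry gap growing geometrically rather than staying flat. A minimal capped exponential schedule shows why a 4x threshold is unreachable at a fixed interval but cleared after a few doublings. The class name `BackoffSketch`, the 50 ms initial interval (taken from the comment's "~= 50 ms") and the 1000 ms cap are assumptions for illustration; the client's real schedule, including any jitter, is not shown in this diff.

```java
final class BackoffSketch {
    // Hypothetical helper: delay before retry number `attempt` (0-based),
    // doubling from initialMs and clamped to capMs. Jitter is not modelled.
    static long delayMillis(int attempt, long initialMs, long capMs) {
        long delay = initialMs;
        for (int i = 0; i < attempt && delay < capMs; i++) {
            delay = Math.min(capMs, delay * 2);
        }
        return delay;
    }

    public static void main(String[] args) {
        // Assumed values: 50 ms initial interval, 1000 ms cap.
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + " -> "
                    + delayMillis(attempt, 50, 1_000) + " ms");
        }
    }
}
```

Under these assumed values the third doubling already reaches 400 ms, well past the test's `initialBackoffMillis * 4` threshold of 200 ms, while a fixed 50 ms interval never gets there.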
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpWebSocketSenderJvmErrorCleanupTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpWebSocketSenderJvmErrorCleanupTest.java
new file mode 100644
index 00000000..4a507e99
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpWebSocketSenderJvmErrorCleanupTest.java
@@ -0,0 +1,277 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client;
+
+import io.questdb.client.cutlass.http.client.WebSocketClient;
+import io.questdb.client.cutlass.line.LineSenderException;
+import io.questdb.client.cutlass.qwp.client.QwpHostHealthTracker;
+import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender;
+import io.questdb.client.std.Unsafe;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.lang.reflect.Constructor;
+import java.lang.reflect.Field;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.function.Supplier;
+
+/**
+ * Regression coverage (M10): {@code buildAndConnect}'s connect/upgrade try
+ * used to catch only {@code HttpClientException} and {@code Exception}, so a
+ * JVM {@link java.lang.Error} (OOM, LinkageError, StackOverflowError) thrown
+ * mid-connect escaped with the half-built {@code WebSocketClient} open -- fd
+ * plus native buffers, unreachable by GC, freed only in {@code close()}. The
+ * fix adds a {@code catch (Error)} arm that closes the client quietly (a
+ * close failure under memory pressure must not mask the original Error) and
+ * rethrows without recording endpoint-health penalties: a JVM failure is not
+ * endpoint health data.
+ * <p>
+ * Uses the same bare-instance pattern as
+ * {@code CursorWebSocketSendLoopJvmErrorTest}: {@code Unsafe.allocateInstance}
+ * plus reflective wiring of the fields the connect walk dereferences, with the
+ * {@code clientFactoryOverride} test seam substituting a stub client whose
+ * {@code connect()} throws.
+ */
+public class QwpWebSocketSenderJvmErrorCleanupTest {
+
+    @Test
+    public void testErrorDuringConnectClosesClientAndStopsWalk() throws Exception {
+        // Two endpoints: an Error on the FIRST connect attempt must close that
+        // attempt's client and propagate immediately -- no walk to endpoint 2,
+        // no health penalty. Contrast with the Exception path (below) which
+        // closes, records a transport error and keeps walking.
+        QwpWebSocketSender sender = newBareSender();
+        QwpHostHealthTracker tracker = wireEndpoints(sender, 2);
+        List<StubClient> built = new ArrayList<>();
+        OutOfMemoryError oom = new OutOfMemoryError("simulated allocation failure");
+        installFactory(sender, () -> {
+            StubClient c = newStubClient();
+            c.connectError = oom;
+            built.add(c);
+            return c;
+        });
+
+        try {
+            invokeBuildAndConnect(sender);
+            Assert.fail("a JVM Error must propagate out of buildAndConnect");
+        } catch (InvocationTargetException ite) {
+            Assert.assertSame("the original Error must surface", oom, ite.getCause());
+        }
+        Assert.assertEquals("Error must stop the walk on the first attempt", 1, built.size());
+        Assert.assertEquals("half-built client must be closed exactly once",
+                1, built.get(0).closeCalls);
+        Assert.assertEquals("a JVM failure is not endpoint health data",
+                QwpHostHealthTracker.HostState.UNKNOWN, tracker.getState(0));
+        Assert.assertEquals("unattempted endpoint must stay untouched",
+                QwpHostHealthTracker.HostState.UNKNOWN, tracker.getState(1));
+    }
+
+    @Test
+    public void testCloseFailureDoesNotMaskOriginalError() throws Exception {
+        // Under OOM, close() itself can throw. The cleanup must be
+        // best-effort: the ORIGINAL Error surfaces, not the close failure.
+        QwpWebSocketSender sender = newBareSender();
+        wireEndpoints(sender, 1);
+        OutOfMemoryError oom = new OutOfMemoryError("simulated allocation failure");
+        StubClient stub = newStubClient();
+        stub.connectError = oom;
+        stub.throwOnClose = true;
+        installFactory(sender, () -> stub);
+
+        try {
+            invokeBuildAndConnect(sender);
+            Assert.fail("a JVM Error must propagate out of buildAndConnect");
+        } catch (InvocationTargetException ite) {
+            Assert.assertSame("close() failure must not mask the original Error",
+                    oom, ite.getCause());
+        }
+        Assert.assertEquals("close must have been attempted", 1, stub.closeCalls);
+    }
+
+    @Test
+    public void testExceptionPathStillClosesAndWalksAllEndpoints() throws Exception {
+        // Seam sanity + behavioral contrast: a plain RuntimeException stays on
+        // the existing path -- close, record a transport penalty, walk the
+        // next endpoint, and surface LineSenderException once the round is
+        // exhausted.
+        QwpWebSocketSender sender = newBareSender();
+        QwpHostHealthTracker tracker = wireEndpoints(sender, 2);
+        List<StubClient> built = new ArrayList<>();
+        installFactory(sender, () -> {
+            StubClient c = newStubClient();
+            c.connectRuntimeError = new IllegalStateException("simulated transport failure");
+            built.add(c);
+            return c;
+        });
+
+        try {
+            invokeBuildAndConnect(sender);
+            Assert.fail("an exhausted round must surface LineSenderException");
+        } catch (InvocationTargetException ite) {
+            Assert.assertTrue("expected LineSenderException, got " + ite.getCause(),
+                    ite.getCause() instanceof LineSenderException);
+        }
+        Assert.assertEquals("an Exception must keep the walk going", 2, built.size());
+        for (StubClient c : built) {
+            Assert.assertEquals("every attempt's client must be closed", 1, c.closeCalls);
+        }
+        Assert.assertEquals("Exception path records the transport penalty",
+                QwpHostHealthTracker.HostState.TRANSPORT_ERROR, tracker.getState(0));
+        Assert.assertEquals("Exception path records the transport penalty",
+                QwpHostHealthTracker.HostState.TRANSPORT_ERROR, tracker.getState(1));
+    }
+
+    /**
+     * Bypasses the real constructor -- no wire client, engine or dispatcher
+     * needed. Field initializers do not run under
+     * {@code Unsafe.allocateInstance}, so the connect walk sees only what is
+     * wired explicitly: the connect-walk lock set here (buildAndConnect
+     * acquires it unconditionally), the fields set by the helpers below, and
+     * fields left at their zero/null defaults, which are valid for this walk.
+     */
+    private static QwpWebSocketSender newBareSender() throws Exception {
+        QwpWebSocketSender sender = (QwpWebSocketSender) Unsafe.getUnsafe()
+                .allocateInstance(QwpWebSocketSender.class);
+        setField(sender, "connectWalkLock", new java.util.concurrent.locks.ReentrantLock());
+        return sender;
+    }
+
+    private static QwpHostHealthTracker wireEndpoints(QwpWebSocketSender sender,
+                                                      int count) throws Exception {
+        QwpWebSocketSender.Endpoint[] eps = new QwpWebSocketSender.Endpoint[count];
+        for (int i = 0; i < count; i++) {
+            eps[i] = new QwpWebSocketSender.Endpoint("localhost", 9000 + i);
+        }
+        setField(sender, "endpoints", Arrays.asList(eps));
+        QwpHostHealthTracker tracker = new QwpHostHealthTracker(count);
+        setField(sender, "hostTracker", tracker);
+        return tracker;
+    }
+
+    private static void installFactory(QwpWebSocketSender sender,
+                                       Supplier<WebSocketClient> factory) throws Exception {
+        setField(sender, "clientFactoryOverride", factory);
+    }
+
+    /**
+     * Drives the private connect walk through its private foreground
+     * {@code ReconnectSupplier} (no-arg: abortCheck null means foreground;
+     * the bare sender's null {@code cursorSendLoop} and false {@code closed}
+     * make {@code isAborted()} false).
+     */
+    private static void invokeBuildAndConnect(QwpWebSocketSender sender) throws Exception {
+        Class<?> supplierClass = Class.forName(
+                "io.questdb.client.cutlass.qwp.client.QwpWebSocketSender$ReconnectSupplier");
+        Constructor<?> ctor = supplierClass.getDeclaredConstructor(QwpWebSocketSender.class);
+        ctor.setAccessible(true);
+        Object ctx = ctor.newInstance(sender);
+        Method m = QwpWebSocketSender.class.getDeclaredMethod("buildAndConnect", supplierClass);
+        m.setAccessible(true);
+        m.invoke(sender, ctx);
+    }
+
+    private static StubClient newStubClient() {
+        try {
+            return (StubClient) Unsafe.getUnsafe().allocateInstance(StubClient.class);
+        } catch (InstantiationException e) {
+            throw new AssertionError(e);
+        }
+    }
+
+    private static void setField(Object target, String name, Object value) throws Exception {
+        Field f = QwpWebSocketSender.class.getDeclaredField(name);
+        f.setAccessible(true);
+        f.set(target, value);
+    }
+
+    /**
+     * Minimal stub: every method the connect walk touches is overridden so no
+     * base-class state (native buffers, socket) is ever dereferenced --
+     * instances come from {@code Unsafe.allocateInstance}, so the base
+     * constructor never ran. Fields rely on zero-defaults; tests assign them
+     * post-allocation.
+     */
+    private static final class StubClient extends WebSocketClient {
+        int closeCalls;
+        Error connectError;
+        RuntimeException connectRuntimeError;
+        boolean throwOnClose;
+
+        private StubClient() {
+            // Never invoked -- instances come from Unsafe.allocateInstance.
+            super(null, null);
+        }
+
+        @Override
+        public void close() {
+            closeCalls++;
+            if (throwOnClose) {
+                throw new IllegalStateException("simulated close failure under memory pressure");
+            }
+        }
+
+        @Override
+        public void connect(CharSequence host, int port) {
+            if (connectError != null) {
+                throw connectError;
+            }
+            if (connectRuntimeError != null) {
+                throw connectRuntimeError;
+            }
+        }
+
+        @Override
+        public void setConnectTimeout(int connectTimeoutMillis) {
+        }
+
+        @Override
+        public void setQwpClientId(String clientId) {
+        }
+
+        @Override
+        public void setQwpMaxVersion(int maxVersion) {
+        }
+
+        @Override
+        public void setQwpRequestDurableAck(boolean enabled) {
+        }
+
+        @Override
+        public void upgrade(CharSequence path, int timeout, CharSequence authorizationHeader) {
+        }
+
+        @Override
+        protected void ioWait(int timeout, int op) {
+        }
+
+        @Override
+        protected void setupIoWait() {
+        }
+    }
+}
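Reviewer sketch, not part of the patch: the class Javadoc above describes a `catch (Error)` arm that closes the half-built client quietly and rethrows the original Error. The shape of that cleanup can be illustrated in isolation. `ErrorCleanupSketch`, `Resource` and `connectOrCleanUp` are hypothetical names; this is not the code of `buildAndConnect`, which this diff does not show.

```java
import java.io.Closeable;
import java.io.IOException;

final class ErrorCleanupSketch {
    // Hypothetical stand-in for a half-built client that owns an fd and native buffers.
    static final class Resource implements Closeable {
        int closeCalls;
        boolean throwOnClose;

        void connect() {
            throw new OutOfMemoryError("simulated allocation failure");
        }

        @Override
        public void close() throws IOException {
            closeCalls++;
            if (throwOnClose) {
                throw new IOException("simulated close failure");
            }
        }
    }

    // On a JVM Error, closes the resource best-effort and rethrows the ORIGINAL Error.
    static void connectOrCleanUp(Resource r) {
        try {
            r.connect();
        } catch (Error e) {
            try {
                r.close();
            } catch (Throwable closeFailure) {
                // Swallowed on purpose: a close failure must not mask `e`.
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        r.throwOnClose = true;
        try {
            connectOrCleanUp(r);
        } catch (OutOfMemoryError e) {
            System.out.println("original error surfaced, closeCalls=" + r.closeCalls);
        }
    }
}
```

The nested `try` around `close()` is the point of the pattern: without it, the close failure would replace the Error that actually explains what went wrong.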
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/ReconnectTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/ReconnectTest.java
index 5c0f9bd2..bd619d14 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/ReconnectTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/ReconnectTest.java
@@ -101,33 +101,37 @@ public void testReconnectAfterServerInducedDisconnect() throws Exception {
     }
 
     @Test
-    public void testReconnectGivesUpAfterCap() throws Exception {
-        // Server is up at first (initial connect succeeds + ACKs batch 1),
-        // then we tear it down — subsequent reconnect attempts get TCP
-        // connection-refused and accumulate against the budget. With a
-        // 500ms cap, the loop should give up well inside the test's 5s
-        // poll window and the next user-thread flush() must throw.
+    public void testReconnectNeverGivesUpInvariantB() throws Exception {
+        // INVARIANT B: server is up at first (initial connect + ACK), then torn
+        // down. The I/O loop enters reconnect and must retry FOREVER -- flush()
+        // must keep succeeding (publishing to on-disk SF), never surface a
+        // give-up / budget terminal. The rows are safe in SF and the server may
+        // return, so reconnect_max_duration_millis is ignored as a give-up
+        // deadline.
         try (TestWebSocketServer server = new TestWebSocketServer(new AckHandler())) {
             int port = server.getPort();
             server.start();
             Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
 
             String cfg = "ws::addr=localhost:" + port
-                    + ";reconnect_max_duration_millis=500"
+                    + ";reconnect_max_duration_millis=300"
                     + ";reconnect_initial_backoff_millis=10"
                     + ";reconnect_max_backoff_millis=50"
                     + ";close_flush_timeout_millis=0;";
-            try (Sender sender = Sender.fromConfig(cfg)) {
+            Throwable observed = null;
+            // fromConfig/first-flush/setup failures must fail the test --
+            // only close() teardown noise is tolerated in the finally below.
+            Sender sender = Sender.fromConfig(cfg);
+            try {
                 sender.table("foo").longColumn("v", 1L).atNow();
                 sender.flush();
 
-                // Tear down the server: existing client connection gets
-                // EOF, the I/O loop tries to reconnect, every attempt
-                // hits TCP refused → budget exhausts.
+                // Tear down the server: the I/O loop gets EOF and enters
+                // reconnect; every attempt hits TCP refused but must keep
+                // retrying past the (ignored) 300ms budget.
                 server.close();
 
-                Throwable observed = null;
-                long deadline = System.currentTimeMillis() + 5_000;
+                long deadline = System.currentTimeMillis() + 2_000;
                 long iter = 0;
                 while (System.currentTimeMillis() < deadline) {
                     iter++;
@@ -140,21 +144,18 @@ public void testReconnectGivesUpAfterCap() throws Exception {
                     }
                     Thread.sleep(50);
                 }
-                Assert.assertNotNull(
-                        "sender should have surfaced the terminal reconnect-cap error",
-                        observed);
-                String msg = observed.getMessage() == null ? "" : observed.getMessage();
-                Assert.assertTrue(
-                        "error message must mention the give-up: " + msg,
-                        msg.contains("reconnect failed")
-                                || msg.contains("I/O thread failed")
-                                || msg.contains("Failed to connect"));
-            } catch (LineSenderException ignored) {
+            } finally {
+                try {
+                    sender.close();
+                } catch (Exception ignored) {
+                    // close() teardown noise -- the contract under test is the
+                    // flush loop above, captured in `observed`.
+                }
             }
-            // close() rethrows the latched terminal reconnect-cap error
-            // (commit 052f6ee). Already observed and asserted above.
-        } catch (Exception ignored) {
-            // already closed
+            Assert.assertNull(
+                    "mid-stream reconnect must retry forever, not surface a terminal "
+                            + "(Invariant B); flush() threw: " + observed,
+                    observed);
         }
     }
 
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/TestPorts.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/TestPorts.java
index 43b3e8e0..ecf10800 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/TestPorts.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/TestPorts.java
@@ -40,4 +40,44 @@ public static int findUnusedPort() {
             throw new RuntimeException("failed to allocate an ephemeral port", e);
         }
     }
+
+    /**
+     * Allocates {@code n} DISTINCT ephemeral ports. All {@code n} probe
+     * sockets are held open simultaneously, so the kernel is forced to hand
+     * out {@code n} different ports; they are closed together only after
+     * every port has been collected.
+     * <p>
+     * Do NOT emulate this with repeated {@link #findUnusedPort()} calls:
+     * that helper is bind-close-return, and once its probe socket closes the
+     * port returns to the kernel's ephemeral pool — Linux readily hands the
+     * just-released port straight back to the next {@code bind(0)}, so two
+     * back-to-back calls can return the SAME port. That exact race made a
+     * multi-addr config fail validation with "duplicate addr entry" in CI.
+     */
+    public static int[] findUnusedPorts(int n) {
+        if (n <= 0) {
+            throw new IllegalArgumentException("n must be > 0: " + n);
+        }
+        ServerSocket[] sockets = new ServerSocket[n];
+        int[] ports = new int[n];
+        try {
+            for (int i = 0; i < n; i++) {
+                sockets[i] = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
+                ports[i] = sockets[i].getLocalPort();
+            }
+            return ports;
+        } catch (IOException e) {
+            throw new RuntimeException("failed to allocate " + n + " ephemeral ports", e);
+        } finally {
+            for (ServerSocket s : sockets) {
+                if (s != null) {
+                    try {
+                        s.close();
+                    } catch (IOException ignored) {
+                        // best-effort; the probe socket carries no state
+                    }
+                }
+            }
+        }
+    }
 }
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/DrainerForegroundEventIsolationTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/DrainerForegroundEventIsolationTest.java
new file mode 100644
index 00000000..f9dd14ab
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/DrainerForegroundEventIsolationTest.java
@@ -0,0 +1,376 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client.sf;
+
+import io.questdb.client.Sender;
+import io.questdb.client.SenderConnectionEvent;
+import io.questdb.client.SenderConnectionListener;
+import io.questdb.client.cutlass.qwp.client.QwpWebSocketSender;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner;
+import io.questdb.client.std.Files;
+import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer;
+import io.questdb.client.test.tools.TestUtils;
+import org.jetbrains.annotations.NotNull;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Contract: background orphan-slot drainers are invisible in the foreground
+ * sender's connection-event stream. {@link SenderConnectionEvent}s describe
+ * the FOREGROUND connection's lifecycle — the documented meaning a monitoring
+ * integration depends on: {@code CONNECTED} fires once when the sender first
+ * comes up, {@code RECONNECTED}/{@code FAILED_OVER} fire when the sender's own
+ * connection was re-established, {@code DISCONNECTED} fires when the sender's
+ * own connection dropped. A drainer connecting, reconnecting after a wire
+ * drop, or failing over is background bookkeeping for an orphan slot and must
+ * not masquerade as foreground lifecycle transitions.
+ * <p>
+ * Both tests are black-box: real {@code Sender} built from config, real
+ * {@link TestWebSocketServer}, events captured through the public
+ * {@code connectionListener} builder hook. They do not care HOW drainer
+ * connects are isolated from foreground state — any implementation that keeps
+ * drainer activity out of the user-visible event stream passes.
+ * <p>
+ * Barriers: the drain outcome is awaited via the public drainer counters
+ * before close; sender close drains the event-dispatcher inbox before
+ * returning, so post-close assertions observe the complete delivered stream;
+ * {@code getDroppedConnectionNotifications() == 0} guards the
+ * absence-assertions against inbox-overflow false greens.
+ */
+public class DrainerForegroundEventIsolationTest {
+
+    private static final int GHOST_ROWS = 5;
+
+    private String sfDir;
+
+    @Before
+    public void setUp() {
+        sfDir = Paths.get(System.getProperty("java.io.tmpdir"),
+                "qdb-drainer-event-iso-" + System.nanoTime()).toString();
+    }
+
+    @After
+    public void tearDown() {
+        if (sfDir != null) rmDirRec(sfDir);
+    }
+
+    /**
+     * A drainer's successful connect must not fire a foreground success event.
+     * The foreground connects exactly once against a healthy server and never
+     * drops, so the event stream must contain exactly one success-kind event:
+     * the initial {@code CONNECTED}. A second success-kind event means the
+     * drainer's connect leaked into the foreground lifecycle stream (today it
+     * surfaces as a fabricated {@code RECONNECTED}/{@code FAILED_OVER} while
+     * the foreground connection never went away).
+     */
+    @Test
+    public void testDrainerConnectMustNotFireForegroundSuccessEvents() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            seedGhostSlot();
+
+            RecordingListener events = new RecordingListener();
+            AckAllHandler handler = new AckAllHandler();
+            try (TestWebSocketServer server = new TestWebSocketServer(handler)) {
+                server.start();
+                Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+
+                String cfg = "ws::addr=localhost:" + server.getPort()
+                        + ";sf_dir=" + sfDir
+                        + ";sender_id=primary"
+                        + ";drain_orphans=true"
+                        + ";max_background_drainers=1;";
+                try (Sender sender = Sender.builder(cfg)
+                        .connectionListener(events)
+                        .build()) {
+                    QwpWebSocketSender ws = (QwpWebSocketSender) sender;
+                    awaitDrainSuccess(ws, handler.distinctPayloads, 10_000);
+                    Assert.assertEquals(
+                            "absence-assertions require a lossless event stream",
+                            0, ws.getDroppedConnectionNotifications());
+                }
+                // Sender is closed: the dispatcher inbox has been drained, the
+                // captured list is the complete delivered stream.
+                List<SenderConnectionEvent> successes = events.ofKinds(
+                        SenderConnectionEvent.Kind.CONNECTED,
+                        SenderConnectionEvent.Kind.RECONNECTED,
+                        SenderConnectionEvent.Kind.FAILED_OVER);
+                Assert.assertEquals(
+                        "background drainer connects must be invisible in the "
+                                + "foreground connection-event stream; expected the "
+                                + "initial CONNECTED only, got: " + successes,
+                        1, successes.size());
+                Assert.assertEquals(
+                        "the single success event must be the foreground's "
+                                + "first-connect CONNECTED",
+                        SenderConnectionEvent.Kind.CONNECTED,
+                        successes.get(0).getKind());
+            }
+        });
+    }
+
+    /**
+     * A drainer's mid-drain wire drop must not fire a foreground
+     * {@code DISCONNECTED}. The server deterministically drops the drainer's
+     * first connection after acking one frame; the drainer reconnects and
+     * finishes the slot. The foreground connection is healthy for the whole
+     * test (it never sends and is never dropped), so a {@code DISCONNECTED}
+     * in the stream is a phantom: it reports an outage the foreground never
+     * had, against an endpoint the foreground is using without trouble.
+     */
+    @Test
+    public void testDrainerWireDropMustNotFirePhantomForegroundDisconnect() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            seedGhostSlot();
+
+            RecordingListener events = new RecordingListener();
+            DropFirstDataConnectionHandler handler = new DropFirstDataConnectionHandler();
+            try (TestWebSocketServer server = new TestWebSocketServer(handler)) {
+                server.start();
+                Assert.assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+
+                String cfg = "ws::addr=localhost:" + server.getPort()
+                        + ";sf_dir=" + sfDir
+                        + ";sender_id=primary"
+                        + ";drain_orphans=true"
+                        + ";max_background_drainers=1;";
+                try (Sender sender = Sender.builder(cfg)
+                        .connectionListener(events)
+                        .build()) {
+                    QwpWebSocketSender ws = (QwpWebSocketSender) sender;
+                    awaitDrainSuccess(ws, handler.distinctPayloads, 15_000);
+                    // Fixture sanity: the drain really did span a wire drop —
+                    // at least two distinct data connections served frames.
+                    Assert.assertTrue(
+                            "expected the drainer to reconnect after the scripted "
+                                    + "drop; data connections=" + handler.dataConnections(),
+                            handler.dataConnections() >= 2);
+                    Assert.assertEquals(
+                            "absence-assertions require a lossless event stream",
+                            0, ws.getDroppedConnectionNotifications());
+                }
+                List<SenderConnectionEvent> disconnects = events.ofKinds(
+                        SenderConnectionEvent.Kind.DISCONNECTED);
+                Assert.assertEquals(
+                        "a background drainer's wire drop must not surface as a "
+                                + "foreground DISCONNECTED — the foreground connection "
+                                + "never dropped; got: " + disconnects,
+                        0, disconnects.size());
+            }
+        });
+    }
+
+    // Ghost sender against a silent server leaves an unacked orphan slot with
+    // GHOST_ROWS frames under the group root (same recipe as
+    // BackgroundDrainerEndToEndTest).
+    private void seedGhostSlot() throws Exception {
+        try (TestWebSocketServer silent = new TestWebSocketServer(new SilentHandler())) {
+            silent.start();
+            Assert.assertTrue(silent.awaitStart(5, TimeUnit.SECONDS));
+            String cfg = "ws::addr=localhost:" + silent.getPort()
+                    + ";sf_dir=" + sfDir
+                    + ";sender_id=ghost"
+                    + ";close_flush_timeout_millis=0;";
+            try (Sender g = Sender.fromConfig(cfg)) {
+                for (int i = 0; i < GHOST_ROWS; i++) {
+                    g.table("foo").longColumn("v", i).atNow();
+                    g.flush();
+                }
+            }
+        }
+        Assert.assertEquals("ghost slot must be a candidate orphan",
+                1, OrphanScanner.scan(sfDir, "primary").size());
+    }
+
+    private static void awaitDrainSuccess(
+            QwpWebSocketSender ws,
+            java.util.Set<String> distinctPayloads,
+            long timeoutMillis
+    ) throws InterruptedException {
+        long deadline = System.currentTimeMillis() + timeoutMillis;
+        while (System.currentTimeMillis() < deadline
+                && (distinctPayloads.size() < GHOST_ROWS
+                || ws.getTotalBackgroundDrainersSucceeded() < 1)) {
+            Thread.sleep(20);
+        }
+        Assert.assertEquals("drainer must replay every ghost-slot row",
+                GHOST_ROWS, distinctPayloads.size());
+        Assert.assertEquals("drainer must drain the slot fully and exit cleanly",
+                1, ws.getTotalBackgroundDrainersSucceeded());
+    }
+
+    private static void rmDirRec(String dir) {
+        if (!Files.exists(dir)) return;
+        long find = Files.findFirst(dir);
+        if (find > 0) {
+            try {
+                int rc = 1;
+                while (rc > 0) {
+                    String name = Files.utf8ToString(Files.findName(find));
+                    if (name != null && !".".equals(name) && !"..".equals(name)) {
+                        String child = dir + "/" + name;
+                        if (!Files.remove(child)) rmDirRec(child);
+                    }
+                    rc = Files.findNext(find);
+                }
+            } finally {
+                Files.findClose(find);
+            }
+        }
+        Files.remove(dir);
+    }
+
+    // status OK + wire seq + tableCount 0 — the minimal ack the non-durable
+    // drain path consumes (same shape as BackgroundDrainerEndToEndTest).
+    private static byte[] buildAck(long wireSeq) {
+        byte[] buf = new byte[1 + 8 + 2];
+        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
+        bb.put((byte) 0x00);
+        bb.putLong(wireSeq);
+        bb.putShort((short) 0);
+        return buf;
+    }
+
+    /** Captures every delivered event for post-close exact assertions. */
+    private static final class RecordingListener implements SenderConnectionListener {
+        private final List<SenderConnectionEvent> captured = new ArrayList<>();
+
+        @Override
+        public synchronized void onEvent(@NotNull SenderConnectionEvent event) {
+            captured.add(event);
+        }
+
+        synchronized List<SenderConnectionEvent> ofKinds(SenderConnectionEvent.Kind... kinds) {
+            List<SenderConnectionEvent> out = new ArrayList<>();
+            for (int i = 0, n = captured.size(); i < n; i++) {
+                SenderConnectionEvent e = captured.get(i);
+                for (SenderConnectionEvent.Kind k : kinds) {
+                    if (e.getKind() == k) {
+                        out.add(e);
+                        break;
+                    }
+                }
+            }
+            return out;
+        }
+    }
+
+    private static class SilentHandler implements TestWebSocketServer.WebSocketServerHandler {
+        @Override
+        public void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) {
+            // intentionally no ack
+        }
+    }
+
+    /**
+     * Acks every frame with a per-connection wire sequence. The foreground
+     * connection never sends data in these tests, so only drainer connections
+     * show up here.
+     */
+    private static class AckAllHandler implements TestWebSocketServer.WebSocketServerHandler {
+        final java.util.Set<String> distinctPayloads =
+                java.util.Collections.synchronizedSet(new java.util.HashSet<>());
+        private final java.util.Map<TestWebSocketServer.ClientHandler, long[]> wireSeqByConn =
+                new java.util.IdentityHashMap<>();
+
+        @Override
+        public synchronized void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) {
+            distinctPayloads.add(java.util.Arrays.toString(data));
+            long[] counter = wireSeqByConn.get(client);
+            if (counter == null) {
+                counter = new long[1];
+                wireSeqByConn.put(client, counter);
+            }
+            try {
+                client.sendBinary(buildAck(counter[0]++));
+            } catch (IOException ignored) {
+                // best-effort: connection may be racing its own close
+            }
+        }
+    }
+
+    /**
+     * Deterministic mid-drain wire drop. The first connection that sends a
+     * binary frame (the drainer — the foreground never sends in these tests)
+     * gets exactly one frame acked, then the server closes its socket on the
+     * next frame. Every later connection acks all traffic with a
+     * per-connection wire sequence, so the reconnected drain runs to
+     * completion. State is keyed per {@code ClientHandler} identity: a dead
+     * connection's reader can deliver late buffered frames after a newer
+     * connection started, and those must neither ack with a stale counter nor
+     * disturb the live connection (same discipline as
+     * BackgroundDrainerMidDrainCapabilityGapTest's GapScenarioHandler).
+     */
+    private static class DropFirstDataConnectionHandler
+            implements TestWebSocketServer.WebSocketServerHandler {
+        final java.util.Set<String> distinctPayloads =
+                java.util.Collections.synchronizedSet(new java.util.HashSet<>());
+        private final List<TestWebSocketServer.ClientHandler> arrivalOrder = new ArrayList<>();
+        private final java.util.Map<TestWebSocketServer.ClientHandler, long[]> wireSeqByConn =
+                new java.util.IdentityHashMap<>();
+
+        synchronized int dataConnections() {
+            return arrivalOrder.size();
+        }
+
+        @Override
+        public synchronized void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) {
+            distinctPayloads.add(java.util.Arrays.toString(data));
+            long[] counter = wireSeqByConn.get(client);
+            if (counter == null) {
+                counter = new long[1];
+                wireSeqByConn.put(client, counter);
+                arrivalOrder.add(client);
+            }
+            boolean firstConnection = arrivalOrder.get(0) == client;
+            long seq = counter[0]++;
+            try {
+                if (firstConnection) {
+                    if (seq == 0) {
+                        client.sendBinary(buildAck(seq));
+                    } else if (seq == 1) {
+                        client.close(); // mid-drain wire drop
+                    }
+                    // seq > 1: late buffered frames from the condemned
+                    // connection; ignore.
+                } else {
+                    client.sendBinary(buildAck(seq));
+                }
+            } catch (IOException ignored) {
+                // best-effort: the connection died under us; the drainer
+                // replays on its next connection
+            }
+        }
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/OrphanScanIntegrationTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/OrphanScanIntegrationTest.java
index 2ec7b836..eccc030b 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/OrphanScanIntegrationTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/OrphanScanIntegrationTest.java
@@ -39,16 +39,17 @@
 import java.nio.ByteBuffer;
 import java.nio.ByteOrder;
 import java.nio.file.Paths;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.CountDownLatch;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicLong;
 
 /**
  * Integration check: with {@code drain_orphans=true} the foreground sender
- * sees sibling slots holding unacked data and a follow-up call to
- * {@link OrphanScanner#scan} from outside the sender returns the same.
- * <p>
- * The drainer runtime that actually empties orphan slots is a follow-up;
- * this test pins down the visibility/scan piece.
+ * sees sibling slots holding unacked data, adopts them via the background
+ * drainer pool, and replays their unacked frames — after which
+ * {@link OrphanScanner#scan} reports no candidates, both while the adopting
+ * sender is still open and after it closes.
  */
 public class OrphanScanIntegrationTest {
 
@@ -75,7 +76,8 @@ public void testScanFindsOrphanFromPriorSenderUnderSameGroupRoot() throws Except
             // with sender_id=primary and drain_orphans=true.
 
             // Phase 1: ghost writes + closes; never acked.
-            try (TestWebSocketServer ghostServer = new TestWebSocketServer(new SilentHandler())) {
+            SilentHandler silent = new SilentHandler();
+            try (TestWebSocketServer ghostServer = new TestWebSocketServer(silent)) {
                 ghostServer.start();
                 Assert.assertTrue(ghostServer.awaitStart(5, TimeUnit.SECONDS));
 
@@ -85,21 +87,29 @@ public void testScanFindsOrphanFromPriorSenderUnderSameGroupRoot() throws Except
                 try (Sender ghost = Sender.fromConfig(ghostCfg)) {
                     ghost.table("foo").longColumn("v", 7L).atNow();
                     ghost.flush();
+                    // The frame must reach the wire before we close: on-the-wire
+                    // implies the I/O loop read it back from the slot's .sfa, so
+                    // the recovered slot holds publishedFsn >= 1 and the drain in
+                    // phase 2 proves something. Without this await,
+                    // close_flush_timeout=0 can close before the async publish
+                    // lands and the "drain" would trivially succeed on an empty
+                    // slot (observed as "fully drained (target=0)").
+                    Assert.assertTrue("ghost frame must reach the wire before close",
+                            silent.awaitFrame(5, TimeUnit.SECONDS));
                     // No wait for ACK — close right away; close_flush_timeout=0
                     // means we don't drain.
                 }
-            } catch (Exception ignored) {
-                // best-effort
             }
             // Independent verification: the scanner sees the ghost slot.
             ObjList<String> seen = OrphanScanner.scan(sfDir, "primary");
             Assert.assertEquals("ghost slot must be a candidate orphan", 1, seen.size());
             Assert.assertEquals(sfDir + "/ghost", seen.get(0));
 
-            // Phase 2: open the primary sender with drain_orphans=true. We
-            // can't directly assert the log output in this test, but the
-            // call must not throw, and the primary's own slot must NOT
-            // appear in a fresh scan (sender_id-filtered).
+            // Phase 2: open the primary sender with drain_orphans=true. The
+            // background drainer pool adopts the ghost slot, replays its
+            // unacked frames against the ACKing primaryServer, and the
+            // drained slot's .sfa files are removed when the drainer's
+            // engine closes fully drained.
             try (TestWebSocketServer primaryServer = new TestWebSocketServer(new AckHandler())) {
                 primaryServer.start();
                 Assert.assertTrue(primaryServer.awaitStart(5, TimeUnit.SECONDS));
@@ -112,19 +122,28 @@ public void testScanFindsOrphanFromPriorSenderUnderSameGroupRoot() throws Except
                 try (Sender primary = Sender.fromConfig(primaryCfg)) {
                     primary.table("foo").longColumn("v", 8L).atNow();
                     primary.flush();
+                    // Await the drain while the primary is still open so this
+                    // assertion exercises the drainer runtime itself and does
+                    // not depend on close()'s bounded graceful-drain window.
+                    long deadlineNanos = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
+                    while (OrphanScanner.scan(sfDir, "primary").size() > 0
+                            && System.nanoTime() < deadlineNanos) {
+                        Thread.sleep(10);
+                    }
+                    Assert.assertEquals(
+                            "drainer should have adopted + drained the ghost slot "
+                                    + "while the primary sender is open",
+                            0, OrphanScanner.scan(sfDir, "primary").size());
                 }
-                // With drain_orphans=true, the background drainer pool adopts
-                // the ghost slot, replays its unacked frames against the now-
-                // ACKing primaryServer, and removes the drained slot dir.
                 // Primary's own slot drains cleanly on close() and is filtered
-                // out by sender_id. Net: scanner sees neither.
+                // out by sender_id; the drained ghost slot must not resurface
+                // (e.g. as a spurious .failed quarantine). Net: scanner sees
+                // neither.
                 ObjList<String> postRun = OrphanScanner.scan(sfDir, "primary");
                 Assert.assertEquals(
                         "drain_orphans=true should have drained + removed the "
                                 + "ghost slot; primary's own slot is sender_id-filtered",
                         0, postRun.size());
-            } catch (Exception ignored) {
-                // best-effort
             }
         });
     }
@@ -154,20 +173,38 @@ private static void touchFile(String path) {
     /** Receives binary frames but never acks. Causes the sender to
      *  leave unacked data on disk on close. */
     private static class SilentHandler implements TestWebSocketServer.WebSocketServerHandler {
+        private final CountDownLatch frameReceived = new CountDownLatch(1);
+
+        boolean awaitFrame(long timeout, TimeUnit unit) throws InterruptedException {
+            return frameReceived.await(timeout, unit);
+        }
+
         @Override
         public void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) {
-            // Drop on the floor — no ACK.
+            // Drop on the floor — no ACK. Record receipt so the test can
+            // prove the frame reached the wire (hence the slot's .sfa)
+            // before the ghost sender closes.
+            frameReceived.countDown();
         }
     }
 
-    /** Acks every binary frame. */
+    /**
+     * Acks every binary frame. Sequence numbers are per-connection: the
+     * primary sender and the orphan drainer each open their own WebSocket,
+     * and each connection numbers its frames from 0. A single shared
+     * counter would hand the second connection an ack seq it never sent
+     * ("ACK wire seq N exceeds highest sent 0"), making the drain succeed
+     * only via the client's clamping fallback.
+     */
     private static class AckHandler implements TestWebSocketServer.WebSocketServerHandler {
-        private final AtomicLong nextSeq = new AtomicLong(0);
+        private final ConcurrentHashMap<TestWebSocketServer.ClientHandler, AtomicLong> seqByClient =
+                new ConcurrentHashMap<>();
 
         @Override
         public void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) {
+            long seq = seqByClient.computeIfAbsent(client, k -> new AtomicLong()).getAndIncrement();
             try {
-                client.sendBinary(buildAck(nextSeq.getAndIncrement()));
+                client.sendBinary(buildAck(seq));
             } catch (IOException e) {
                 throw new RuntimeException(e);
             }
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerDurableAckRetryTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerDurableAckRetryTest.java
index 5be1b75a..0a6b69f8 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerDurableAckRetryTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerDurableAckRetryTest.java
@@ -26,21 +26,26 @@
 
 import io.questdb.client.DefaultHttpClientConfiguration;
 import io.questdb.client.cutlass.http.client.WebSocketClient;
+import io.questdb.client.cutlass.http.client.WebSocketUpgradeException;
 import io.questdb.client.network.PlainSocketFactory;
+import io.questdb.client.cutlass.line.LineSenderException;
 import io.questdb.client.cutlass.qwp.client.QwpDurableAckMismatchException;
+import io.questdb.client.cutlass.qwp.client.QwpIngressRoleRejectedException;
 import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer;
 import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener;
 import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop;
 import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner;
 import io.questdb.client.std.Files;
+import io.questdb.client.test.tools.TestUtils;
 import org.junit.After;
 import org.junit.Assert;
 import org.junit.Before;
 import org.junit.Test;
 
-import java.io.IOException;
 import java.nio.file.Paths;
 import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
 import java.util.concurrent.CountDownLatch;
 import java.util.concurrent.TimeUnit;
@@ -68,6 +73,14 @@
  */
 public class BackgroundDrainerDurableAckRetryTest {
 
+    /**
+     * Every {@link StubWebSocketClient} allocated by a test, so its ~128 KB
+     * of eagerly-malloced native buffers (recv + send + control) can be
+     * released before the leak check fires (M7).
+     */
+    private static final List<StubWebSocketClient> LIVE_STUBS =
+            Collections.synchronizedList(new ArrayList<>());
+
     private static final long FAST_BACKOFF_MAX_MILLIS = 4L;
     private static final long FAST_BACKOFF_MILLIS = 1L;
     private static final long FAST_RECONNECT_MAX_DURATION_MILLIS = 60_000L;
@@ -85,6 +98,10 @@ public void setUp() {
 
     @After
     public void tearDown() {
+        // Safety net for exits that bypass the assertMemoryLeak wrapper;
+        // normally a no-op because the wrapper's finally already closed
+        // and cleared the stubs (close() is idempotent).
+        closeAllStubs();
         if (slotPath == null) return;
         long find = Files.findFirst(slotPath);
         if (find > 0) {
@@ -105,226 +122,787 @@ public void tearDown() {
     }
 
     @Test
-    public void testCallbackArgumentsCarrySlotPathAndAttemptNumber() {
-        CountingListener listener = new CountingListener();
-        ScriptedFactory factory = ScriptedFactory.failingTimes(3,
-                () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
-        BackgroundDrainer drainer = newDrainer(factory);
-        drainer.setListener(listener);
-        WebSocketClient out = drainer.connectWithDurableAckRetry();
-        assertSame(factory.successSentinel(), out);
-        assertEquals(3, listener.unavailableSlotPaths.size());
-        for (int i = 0; i < 3; i++) {
-            assertEquals(slotPath, listener.unavailableSlotPaths.get(i));
-            assertEquals(Integer.valueOf(i + 1), listener.unavailableAttempts.get(i));
-        }
-        assertEquals(0, listener.persistentFailures.get());
+    public void testCallbackArgumentsCarrySlotPathAndAttemptNumber() throws Exception {
+        assertMemoryLeak(() -> {
+            CountingListener listener = new CountingListener();
+            ScriptedFactory factory = ScriptedFactory.failingTimes(3,
+                    () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertSame(factory.successSentinel(), out);
+            assertEquals(3, listener.unavailableSlotPaths.size());
+            for (int i = 0; i < 3; i++) {
+                assertEquals(slotPath, listener.unavailableSlotPaths.get(i));
+                assertEquals(Integer.valueOf(i + 1), listener.unavailableAttempts.get(i));
+            }
+            assertEquals(0, listener.persistentFailures.get());
+        });
     }
 
     @Test
-    public void testEscalatesAfterMaxAttemptsAndDropsSentinel() {
-        CountingListener listener = new CountingListener();
-        ScriptedFactory factory = ScriptedFactory.alwaysFailing(
-                () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
-        BackgroundDrainer drainer = newDrainer(factory);
-        drainer.setListener(listener);
-        WebSocketClient out = drainer.connectWithDurableAckRetry();
-        assertNull("escalation must signal failure to caller", out);
-        assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
-        // The escalation attempt itself must not also fire onDurableAckUnavailable.
-        // Threshold attempts trigger one persistent-failure callback and
-        // (threshold - 1) unavailable callbacks.
-        int threshold = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS;
-        assertEquals(threshold - 1, listener.unavailableAttempts.size());
-        assertEquals(1, listener.persistentFailures.get());
-        assertEquals(threshold, listener.lastPersistentTotalAttempts.get());
-        assertTrue("elapsed >= 0", listener.lastPersistentElapsedMs.get() >= 0);
-        // Sentinel dropped with the right reason prefix.
-        String sentinel = slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME;
-        assertTrue("expected .failed sentinel at " + sentinel, Files.exists(sentinel));
-        assertNotNull("lastErrorMessage populated", drainer.getLastErrorMessage());
+    public void testEscalatesAfterMaxAttemptsAndDropsSentinel() throws Exception {
+        assertMemoryLeak(() -> {
+            CountingListener listener = new CountingListener();
+            ScriptedFactory factory = ScriptedFactory.alwaysFailing(
+                    () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertNull("escalation must signal failure to caller", out);
+            assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
+            // The escalation attempt itself must not also fire onDurableAckUnavailable.
+            // Threshold attempts trigger one persistent-failure callback and
+            // (threshold - 1) unavailable callbacks.
+            int threshold = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS;
+            assertEquals(threshold - 1, listener.unavailableAttempts.size());
+            assertEquals(1, listener.persistentFailures.get());
+            assertEquals(threshold, listener.lastPersistentTotalAttempts.get());
+            assertTrue("elapsed >= 0", listener.lastPersistentElapsedMs.get() >= 0);
+            // Sentinel dropped and the failure reason recorded on the drainer.
+            String sentinel = slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME;
+            assertTrue("expected .failed sentinel at " + sentinel, Files.exists(sentinel));
+            assertNotNull("lastErrorMessage populated", drainer.getLastErrorMessage());
+        });
+    }
+
+    @Test
+    public void testListenerThrowingOnPersistentFailureStillMarksFailed() throws Exception {
+        assertMemoryLeak(() -> {
+            BackgroundDrainerListener throwing = new BackgroundDrainerListener() {
+                @Override
+                public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) {
+                    throw new RuntimeException("listener boom (persistent)");
+                }
+
+                @Override
+                public void onDurableAckUnavailable(String slotPath, int attemptNumber) {
+                    // no-op
+                }
+            };
+            ScriptedFactory factory = ScriptedFactory.alwaysFailing(
+                    () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(throwing);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertNull(out);
+            assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
+            // Sentinel must be dropped even though the listener threw.
+            assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    @Test
+    public void testListenerThrowingOnUnavailableContinuesRetrying() throws Exception {
+        assertMemoryLeak(() -> {
+            AtomicInteger unavailableCalls = new AtomicInteger();
+            BackgroundDrainerListener throwing = new BackgroundDrainerListener() {
+                @Override
+                public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) {
+                    Assert.fail("must not escalate");
+                }
+
+                @Override
+                public void onDurableAckUnavailable(String slotPath, int attemptNumber) {
+                    unavailableCalls.incrementAndGet();
+                    throw new RuntimeException("listener boom (transient)");
+                }
+            };
+            ScriptedFactory factory = ScriptedFactory.failingTimes(3,
+                    () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(throwing);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertSame(factory.successSentinel(), out);
+            assertEquals(3, unavailableCalls.get());
+            assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome());
+            // No sentinel dropped on success.
+            assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    @Test
+    public void testNoListenerNoNullPointerOnEscalation() throws Exception {
+        assertMemoryLeak(() -> {
+            ScriptedFactory factory = ScriptedFactory.alwaysFailing(
+                    () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
+            BackgroundDrainer drainer = newDrainer(factory);
+            // Intentionally leave listener null.
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertNull(out);
+            assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
+            assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    @Test
+    public void testTerminalUpgradeMarksFailedImmediately() throws Exception {
+        assertMemoryLeak(() -> {
+            CountingListener listener = new CountingListener();
+            // A genuinely non-retriable upgrade error (non-421 5xx upgrade reject) is
+            // terminal -- waiting will not fix it -- so the drainer quarantines on the
+            // first attempt, exactly like the live sender's background loop halts on
+            // auth/upgrade. A TRANSPORT error, by contrast, is transient and is
+            // retried (see testTransportErrorNeverQuarantinesInvariantB).
+            ScriptedFactory factory = ScriptedFactory.alwaysFailing(
+                    () -> new WebSocketUpgradeException(500, null, "server error during upgrade"));
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertNull(out);
+            assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
+            // Listener must not have been touched — this path doesn't fire either callback.
+            assertEquals(0, listener.unavailableAttempts.size());
+            assertEquals(0, listener.persistentFailures.get());
+            // Sentinel dropped for a genuine terminal.
+            String sentinel = slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME;
+            assertTrue(Files.exists(sentinel));
+            // The factory must have been invoked exactly once — no retry on a terminal.
+            assertEquals(1, factory.attempts());
+        });
     }
 
     @Test
-    public void testListenerThrowingOnPersistentFailureStillMarksFailed() {
-        BackgroundDrainerListener throwing = new BackgroundDrainerListener() {
-            @Override
-            public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) {
-                throw new RuntimeException("listener boom (persistent)");
+    public void testReturnsClientOnSuccessFirstAttempt() throws Exception {
+        assertMemoryLeak(() -> {
+            CountingListener listener = new CountingListener();
+            ScriptedFactory factory = ScriptedFactory.alwaysSucceeding();
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertSame(factory.successSentinel(), out);
+            assertEquals(1, factory.attempts());
+            assertEquals(0, listener.unavailableAttempts.size());
+            assertEquals(0, listener.persistentFailures.get());
+            assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome());
+            assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    @Test
+    public void testRetriesOnDurableAckMismatchThenSucceeds() throws Exception {
+        assertMemoryLeak(() -> {
+            CountingListener listener = new CountingListener();
+            int failTimes = 5;
+            ScriptedFactory factory = ScriptedFactory.failingTimes(failTimes,
+                    () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertSame(factory.successSentinel(), out);
+            assertEquals(failTimes + 1, factory.attempts());
+            assertEquals(failTimes, listener.unavailableAttempts.size());
+            for (int i = 0; i < failTimes; i++) {
+                assertEquals(Integer.valueOf(i + 1), listener.unavailableAttempts.get(i));
+            }
+            assertEquals(0, listener.persistentFailures.get());
+            assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome());
+            assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    @Test
+    public void testStopRequestedDuringRetryAbortsWithStoppedOutcome() throws Exception {
+        assertMemoryLeak(() -> {
+            CountingListener listener = new CountingListener();
+            // Always-failing factory: every attempt throws a DA mismatch, paced
+            // by the 5-10ms backoff. The reconnect budget is deliberately long
+            // (60s), so requestStop() is what breaks the retry loop.
+            CountDownLatch firstFailureSeen = new CountDownLatch(1);
+            ScriptedFactory factory = new ScriptedFactory(
+                    /* successSentinel */ stubClient(),
+                    /* throwingTimes */ Integer.MAX_VALUE,
+                    /* throwSupplier */ () -> {
+                firstFailureSeen.countDown();
+                return new QwpDurableAckMismatchException("h", 1234, "primary");
+            });
+            BackgroundDrainer drainer = newDrainerWithBudgets(
+                    factory, /*reconnectMaxDurationMillis*/ 60_000L,
+                    /*backoffInit*/ 5L, /*backoffMax*/ 10L);
+            drainer.setListener(listener);
+            Thread t = new Thread(drainer::connectWithDurableAckRetry, "test-helper");
+            t.setDaemon(true);
+            t.start();
+            // Wait until at least one attempt has fired, then signal stop.
+            Assert.assertTrue("first failure must occur promptly",
+                    firstFailureSeen.await(2, TimeUnit.SECONDS));
+            drainer.requestStop();
+            t.join(5_000);
+            Assert.assertFalse("helper must exit after stop", t.isAlive());
+            assertEquals(BackgroundDrainer.DrainOutcome.STOPPED, drainer.outcome());
+            // No persistent-failure callback on stop; no sentinel dropped.
+            assertEquals(0, listener.persistentFailures.get());
+            assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    @Test
+    public void testWallTimeBudgetEscalatesBeforeAttemptCap() throws Exception {
+        assertMemoryLeak(() -> {
+            CountingListener listener = new CountingListener();
+            // Each failure sleeps 12ms and the budget is 25ms, so the deadline is
+            // crossed within a few iterations, far short of the 16-attempt cap.
+            ScriptedFactory factory = new ScriptedFactory(
+                    /* successSentinel */ stubClient(),
+                    /* throwingTimes */ Integer.MAX_VALUE,
+                    /* throwSupplier */ () -> {
+                try {
+                    Thread.sleep(12);
+                } catch (InterruptedException ignored) {
+                    Thread.currentThread().interrupt();
+                }
+                return new QwpDurableAckMismatchException("h", 1234, "primary");
+            });
+            BackgroundDrainer drainer = newDrainerWithBudgets(
+                    factory, /*reconnectMaxDurationMillis*/ 25L,
+                    /*backoffInit*/ 1L, /*backoffMax*/ 1L);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertNull(out);
+            assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
+            assertEquals("must escalate by wall time, not attempts", 1, listener.persistentFailures.get());
+            int total = listener.lastPersistentTotalAttempts.get();
+            assertTrue("escalated before reaching attempt cap (got " + total + ")",
+                    total < BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS);
+            assertTrue(total >= 1);
+            assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    @Test
+    public void testAllReplicaWindowNeverEscalatesInvariantB() throws Exception {
+        assertMemoryLeak(() -> {
+            // INVARIANT B (orphan drainer): a store-and-forward drainer must NEVER
+            // quarantine a slot just because every reachable endpoint is a REPLICA.
+            // A replica is promotable and a primary will reappear, so an all-replica
+            // window is a TRANSIENT failover state -- the drainer must keep retrying
+            // (capped backoff) until a primary is reachable, stopRequested, or SF
+            // exhaustion. NEITHER the 16-attempt cap NOR the wall-clock reconnect
+            // budget may escalate it to a .failed sentinel.
+            //
+            // Distinct from testEscalatesAfterMaxAttemptsAndDropsSentinel /
+            // testWallTimeBudgetEscalatesBeforeAttemptCap, which use a genuine
+            // durable-ack CAPABILITY gap (QwpDurableAckMismatchException -- a server
+            // upgrades but does not advertise durable ack): that is a real config
+            // problem and stays terminal. This test uses a role reject (every
+            // endpoint is a replica right now), which must NOT be terminal.
+            //
+            // Red-first: connectWithDurableAckRetry() currently lumps role rejects in
+            // with the durable-ack-mismatch give-up, so after the 16-attempt cap /
+            // the budget it markFailed()s and returns -> the helper thread dies. Goes
+            // green once the drainer treats an all-replica window as retry-forever
+            // (split the catch: role reject -> retry; capability gap -> quarantine).
+            CountingListener listener = new CountingListener();
+            AtomicInteger attempts = new AtomicInteger();
+            ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> {
+                attempts.incrementAndGet();
+                return new QwpIngressRoleRejectedException(
+                        QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000);
+            });
+            // SHORT budget + tiny backoff so BOTH give-up triggers (the 16-attempt
+            // cap and the 200ms wall clock) would fire promptly under the bug.
+            BackgroundDrainer drainer = newDrainerWithBudgets(
+                    factory, /*reconnectMaxDurationMillis*/ 200L, /*backoffInit*/ 1L, /*backoffMax*/ 2L);
+            drainer.setListener(listener);
+            Thread t = new Thread(drainer::connectWithDurableAckRetry, "invariant-b-orphan-drainer");
+            t.setDaemon(true);
+            t.start();
+
+            // Observe well past BOTH the 200ms budget and the 16-attempt cap. Under
+            // the bug the drainer escalates (within the cap time) and the helper
+            // thread dies; a contract-honoring drainer is still retrying here.
+            long observeUntilNanos = System.nanoTime() + 600_000_000L; // 600ms >> 200ms budget
+            while (System.nanoTime() < observeUntilNanos && t.isAlive()) {
+                Thread.sleep(10);
             }
 
-            @Override
-            public void onDurableAckUnavailable(String slotPath, int attemptNumber) {
-                // no-op
+            try {
+                assertTrue("orphan drainer gave up on a transient all-replica window (attempts="
+                                + attempts.get() + ", outcome=" + drainer.outcome() + "): Invariant B "
+                                + "forbids quarantining a slot on the 16-attempt cap or the wall-clock "
+                                + "reconnect budget -- a replica is promotable, so the drainer must keep "
+                                + "retrying until a primary reappears or SF is exhausted",
+                        t.isAlive());
+                assertEquals("must not escalate a transient all-replica window to FAILED",
+                        BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome());
+                assertEquals("must not fire persistent-failure on an all-replica window",
+                        0, listener.persistentFailures.get());
+                assertFalse("must not quarantine (.failed sentinel) an all-replica window",
+                        Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+                assertTrue("must have retried past the 16-attempt cap (got " + attempts.get() + ")",
+                        attempts.get() > BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS);
+            } finally {
+                drainer.requestStop();
+                t.join(5_000);
             }
-        };
-        ScriptedFactory factory = ScriptedFactory.alwaysFailing(
-                () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
-        BackgroundDrainer drainer = newDrainer(factory);
-        drainer.setListener(throwing);
-        WebSocketClient out = drainer.connectWithDurableAckRetry();
-        assertNull(out);
-        assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
-        // Sentinel must be dropped even though the listener threw.
-        assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+            assertFalse("helper must exit after stop", t.isAlive());
+            assertEquals(BackgroundDrainer.DrainOutcome.STOPPED, drainer.outcome());
+        });
     }
 
     @Test
-    public void testListenerThrowingOnUnavailableContinuesRetrying() {
-        AtomicInteger unavailableCalls = new AtomicInteger();
-        BackgroundDrainerListener throwing = new BackgroundDrainerListener() {
-            @Override
-            public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) {
-                Assert.fail("must not escalate");
+    public void testTransportErrorNeverQuarantinesInvariantB() throws Exception {
+        assertMemoryLeak(() -> {
+            // INVARIANT B (orphan drainer): a fully-unreachable cluster (server down,
+            // network partition -- every endpoint refuses / times out) is TRANSIENT,
+            // not terminal. The server will come back; the drainer must keep retrying
+            // (capped backoff) until it does, stopRequested, or SF exhaustion -- it
+            // must NEVER quarantine the slot on the first failed sweep. This is the
+            // exact behaviour of the live sender's background loop
+            // (CursorWebSocketSendLoop.connectLoop: a transport error backs off and
+            // retries), which the orphan drainer must match.
+            //
+            // Red-first: connectWithDurableAckRetry() currently routes any non-role,
+            // non-durable-ack Throwable (including "all endpoints unreachable") to an
+            // IMMEDIATE markFailed / .failed sentinel on the first attempt. Green once
+            // transport errors are retried indefinitely like connectLoop. (Genuine
+            // terminals -- auth / non-421 upgrade -- must still fail fast.)
+            CountingListener listener = new CountingListener();
+            AtomicInteger attempts = new AtomicInteger();
+            ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> {
+                attempts.incrementAndGet();
+                return new LineSenderException(
+                        "Failed to connect: all 2 endpoint(s) unreachable; last=127.0.0.1:9000");
+            });
+            BackgroundDrainer drainer = newDrainerWithBudgets(
+                    factory, /*reconnectMaxDurationMillis*/ 200L, /*backoffInit*/ 1L, /*backoffMax*/ 2L);
+            drainer.setListener(listener);
+            Thread t = new Thread(drainer::connectWithDurableAckRetry, "invariant-b-transport-drainer");
+            t.setDaemon(true);
+            t.start();
+
+            // Observe well past the 200ms budget: the drainer must still be retrying.
+            long observeUntilNanos = System.nanoTime() + 600_000_000L; // 600ms >> 200ms budget
+            while (System.nanoTime() < observeUntilNanos && t.isAlive()) {
+                Thread.sleep(10);
             }
 
-            @Override
-            public void onDurableAckUnavailable(String slotPath, int attemptNumber) {
-                unavailableCalls.incrementAndGet();
-                throw new RuntimeException("listener boom (transient)");
+            try {
+                assertTrue("orphan drainer quarantined a fully-unreachable (server-down) cluster "
+                                + "(attempts=" + attempts.get() + ", outcome=" + drainer.outcome()
+                                + "): Invariant B says a down server is transient -- the drainer must "
+                                + "retry indefinitely (exactly like the live background loop), never "
+                                + "quarantine on a transport error",
+                        t.isAlive());
+                assertEquals("must not escalate a transient transport error to FAILED",
+                        BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome());
+                assertEquals("transport retry must not fire a persistent-failure escalation",
+                        0, listener.persistentFailures.get());
+                assertFalse("must not quarantine (.failed sentinel) a down server",
+                        Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+                assertTrue("must have retried the down server well past the first sweep (got "
+                                + attempts.get() + ")",
+                        attempts.get() > BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS);
+            } finally {
+                drainer.requestStop();
+                t.join(5_000);
             }
-        };
-        ScriptedFactory factory = ScriptedFactory.failingTimes(3,
-                () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
-        BackgroundDrainer drainer = newDrainer(factory);
-        drainer.setListener(throwing);
-        WebSocketClient out = drainer.connectWithDurableAckRetry();
-        assertSame(factory.successSentinel(), out);
-        assertEquals(3, unavailableCalls.get());
-        assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome());
-        // No sentinel dropped on success.
-        assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+            assertFalse("helper must exit after stop", t.isAlive());
+            assertEquals(BackgroundDrainer.DrainOutcome.STOPPED, drainer.outcome());
+        });
     }
 
     @Test
-    public void testNoListenerNoNullPointerOnEscalation() {
-        ScriptedFactory factory = ScriptedFactory.alwaysFailing(
-                () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
-        BackgroundDrainer drainer = newDrainer(factory);
-        // Intentionally leave listener null.
-        WebSocketClient out = drainer.connectWithDurableAckRetry();
-        assertNull(out);
-        assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
-        assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+    public void testJvmErrorEscapesConnectRetryLoop() throws Exception {
+        assertMemoryLeak(() -> {
+            // Regression (M3): catch (Throwable) in connectWithDurableAckRetry used
+            // to swallow java.lang.Error (OOM, LinkageError, StackOverflowError)
+            // into the indefinite "cluster unreachable" retry -- pinning the slot
+            // .lock forever with no .failed sentinel and only a throttled WARN as
+            // a trace. A JVM/programming failure is not a transport outage:
+            // retrying cannot clear it, so it must escape the loop on the FIRST
+            // sweep. run()'s outer catch then quarantines the slot (markFailed +
+            // FAILED) and its finally releases the lock -- quarantine-and-exit.
+            CountingListener listener = new CountingListener();
+            ScriptedFactory factory = ScriptedFactory.alwaysFailing(
+                    () -> new LinkageError("simulated JVM failure"));
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(listener);
+            try {
+                drainer.connectWithDurableAckRetry();
+                Assert.fail("a JVM Error must escape the retry loop, "
+                        + "not spin as a transport outage");
+            } catch (LinkageError expected) {
+                assertEquals("simulated JVM failure", expected.getMessage());
+            }
+            // No retry: the Error propagated on the very first attempt.
+            assertEquals(1, factory.attempts());
+            // Neither observability callback fires -- this is not a durable-ack
+            // episode, and no escalation decision was made inside the loop.
+            assertEquals(0, listener.unavailableAttempts.size());
+            assertEquals(0, listener.persistentFailures.get());
+        });
     }
 
     @Test
-    public void testNonDurableAckExceptionMarksFailedImmediately() {
-        CountingListener listener = new CountingListener();
-        ScriptedFactory factory = ScriptedFactory.alwaysFailing(
-                () -> new IOException("transport down"));
-        BackgroundDrainer drainer = newDrainer(factory);
-        drainer.setListener(listener);
-        WebSocketClient out = drainer.connectWithDurableAckRetry();
-        assertNull(out);
-        assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
-        // Listener must not have been touched — this path doesn't fire either callback.
-        assertEquals(0, listener.unavailableAttempts.size());
-        assertEquals(0, listener.persistentFailures.get());
-        // Sentinel reason should reflect the non-DA path (initial connect: ...).
-        String sentinel = slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME;
-        assertTrue(Files.exists(sentinel));
-        // The factory must have been invoked exactly once — no retry on this path.
-        assertEquals(1, factory.attempts());
+    public void testRoleRejectChurnDoesNotConsumeCapabilityGapBudgetInvariantB() throws Exception {
+        assertMemoryLeak(() -> {
+            // Rolling-upgrade interleave: a long all-replica window (role rejects),
+            // then an old-build node is promoted and upgrades WITHOUT durable ack
+            // (genuine capability gap). The transient window must not consume the
+            // 16-attempt settle budget -- the gap phase gets the full budget.
+            int roleRejects = 20; // > the attempt cap: under the bug the first gap attempt escalates
+            int cap = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS;
+            CountingListener listener = new CountingListener();
+            AtomicInteger sweeps = new AtomicInteger();
+            ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> {
+                if (sweeps.incrementAndGet() <= roleRejects) {
+                    return new QwpIngressRoleRejectedException(
+                            QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000);
+                }
+                return new QwpDurableAckMismatchException("h", 1234, "primary");
+            });
+            // 60s wall budget: only the attempt cap can fire in this test.
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertNull(out);
+            assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
+            assertEquals(1, listener.persistentFailures.get());
+            assertEquals("escalation must count capability-gap attempts only",
+                    cap, listener.lastPersistentTotalAttempts.get());
+            assertEquals("full settle budget must be granted after the transient window",
+                    roleRejects + cap, factory.attempts());
+            // M10 split: the transient all-replica window lands on the
+            // onPrimaryUnavailable stream (1..20), the capability-gap episode
+            // on onDurableAckUnavailable (1..15; the 16th fires
+            // persistent-failure instead). Neither stream sees the other's
+            // counter, so a listener alerting on "attemptNumber approaching
+            // the cap" no longer false-positives on role-reject churn.
+            assertEquals(roleRejects, listener.primaryUnavailableAttempts.size());
+            for (int i = 0; i < roleRejects; i++) {
+                assertEquals(Integer.valueOf(i + 1), listener.primaryUnavailableAttempts.get(i));
+            }
+            assertEquals(cap - 1, listener.unavailableAttempts.size());
+            for (int i = 0; i < cap - 1; i++) {
+                assertEquals(Integer.valueOf(i + 1), listener.unavailableAttempts.get(i));
+            }
+            assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
     }
 
     @Test
-    public void testReturnsClientOnSuccessFirstAttempt() {
-        CountingListener listener = new CountingListener();
-        ScriptedFactory factory = ScriptedFactory.alwaysSucceeding();
-        BackgroundDrainer drainer = newDrainer(factory);
-        drainer.setListener(listener);
-        WebSocketClient out = drainer.connectWithDurableAckRetry();
-        assertSame(factory.successSentinel(), out);
-        assertEquals(1, factory.attempts());
-        assertEquals(0, listener.unavailableAttempts.size());
-        assertEquals(0, listener.persistentFailures.get());
-        assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome());
-        assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+    public void testFailoverWindowDoesNotBurnCapabilityGapWallClockInvariantB() throws Exception {
+        assertMemoryLeak(() -> {
+            // The wall-clock half of the settle budget must be anchored at the
+            // FIRST capability-gap error, not at connect entry: an all-replica
+            // window that outlives reconnectMaxDurationMillis must not cause the
+            // first genuine capability-gap attempt to escalate on an already-
+            // expired deadline. Catches the partial fix (separate counter but
+            // entry-anchored deadline) that the attempt-cap test cannot see.
+            int roleRejects = 20;
+            long budgetMillis = 1_000L;
+            int cap = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS;
+            CountingListener listener = new CountingListener();
+            AtomicInteger sweeps = new AtomicInteger();
+            ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> {
+                if (sweeps.incrementAndGet() <= roleRejects) {
+                    // Burn past the wall-clock budget inside the transient
+                    // window: 20 * 60ms = 1200ms of sleep alone > 1000ms budget.
+                    try {
+                        Thread.sleep(60);
+                    } catch (InterruptedException ignored) {
+                        Thread.currentThread().interrupt();
+                    }
+                    return new QwpIngressRoleRejectedException(
+                            QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000);
+                }
+                return new QwpDurableAckMismatchException("h", 1234, "primary");
+            });
+            BackgroundDrainer drainer = newDrainerWithBudgets(
+                    factory, budgetMillis, /*backoffInit*/ 1L, /*backoffMax*/ 2L);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertNull(out);
+            assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
+            assertEquals(1, listener.persistentFailures.get());
+            assertEquals("first gap attempt must not observe a deadline burned by the "
+                            + "transient window -- full attempt budget expected",
+                    cap, listener.lastPersistentTotalAttempts.get());
+            assertEquals(roleRejects + cap, factory.attempts());
+            assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    @Test
+    public void testRoleRejectResetsCapabilityGapEpisode() throws Exception {
+        assertMemoryLeak(() -> {
+            // An intervening role reject proves the topology changed (the node
+            // that produced earlier gap errors is gone), so the settle budget
+            // restarts: 15 gap errors, one role reject, then gaps again -- the
+            // second episode gets the full 16 attempts, it does not inherit the
+            // first episode's 15.
+            int cap = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS;
+            CountingListener listener = new CountingListener();
+            AtomicInteger sweeps = new AtomicInteger();
+            ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> {
+                if (sweeps.incrementAndGet() == cap) { // 16th sweep: role reject between the gap runs
+                    return new QwpIngressRoleRejectedException(
+                            QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000);
+                }
+                return new QwpDurableAckMismatchException("h", 1234, "primary");
+            });
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertNull(out);
+            assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
+            assertEquals(1, listener.persistentFailures.get());
+            assertEquals("second episode must get the full budget after the reset",
+                    cap, listener.lastPersistentTotalAttempts.get());
+            // 15 gap + 1 role reject + 16 gap = 32 sweeps total.
+            assertEquals(2 * cap, factory.attempts());
+            // M10 split, per-stream: the DA stream carries both episodes'
+            // per-episode numbering (1..15, then 1..15 again -- the second
+            // episode's 16th attempt fires persistent-failure instead), and
+            // the reset between them is attributable: exactly one role reject
+            // on the primary stream. Before the split the reset was an
+            // ambiguous non-monotonic drop in a single stream.
+            List<Integer> expectedDaStream = new ArrayList<>();
+            for (int episode = 0; episode < 2; episode++) {
+                for (int i = 1; i <= cap - 1; i++) {
+                    expectedDaStream.add(i);
+                }
+            }
+            assertEquals(expectedDaStream, listener.unavailableAttempts);
+            assertEquals(Collections.singletonList(1), listener.primaryUnavailableAttempts);
+            assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    @Test
+    public void testRoleRejectAndCapabilityGapLandOnSeparateStreams() throws Exception {
+        assertMemoryLeak(() -> {
+            // M10 discriminator: gap -> role reject -> gap -> success. The
+            // released 1.3.4 contract fed BOTH conditions to
+            // onDurableAckUnavailable, so this script produced the ambiguous
+            // stream [1, 1, 1] -- a listener could not tell a budget-bound
+            // capability-gap episode from a never-escalating role-reject
+            // window, and could not see WHY the episode counter reset. With
+            // the split, the DA stream carries only the two one-attempt gap
+            // episodes ([1, 1] -- the reset stays visible) and the role
+            // reject that caused the reset lands on the primary stream ([1]).
+            CountingListener listener = new CountingListener();
+            AtomicInteger sweeps = new AtomicInteger();
+            ScriptedFactory factory = ScriptedFactory.failingTimes(3, () -> {
+                if (sweeps.incrementAndGet() == 2) {
+                    return new QwpIngressRoleRejectedException(
+                            QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000);
+                }
+                return new QwpDurableAckMismatchException("h", 1234, "primary");
+            });
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertSame(factory.successSentinel(), out);
+            assertEquals(4, factory.attempts());
+            assertEquals("DA stream must carry only the gap episodes, each"
+                            + " restarting at 1 after the role-reject reset",
+                    Arrays.asList(1, 1), listener.unavailableAttempts);
+            assertEquals("role reject must land on the primary stream",
+                    Collections.singletonList(1), listener.primaryUnavailableAttempts);
+            assertEquals(Collections.singletonList(slotPath), listener.primaryUnavailableSlotPaths);
+            assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome());
+            assertEquals(0, listener.persistentFailures.get());
+            assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    @Test
+    public void testTransportErrorDoesNotResetCapabilityGapEpisode() throws Exception {
+        assertMemoryLeak(() -> {
+            // A transport blip between gap attempts does not prove promotion
+            // churn: it must neither consume the budget (no increment) nor
+            // restart it (no reset) -- otherwise a flaky-but-misconfigured
+            // cluster would evade the cap forever. 15 gaps, one transport error,
+            // one gap: escalates on that 16th gap attempt.
+            int cap = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS;
+            CountingListener listener = new CountingListener();
+            AtomicInteger sweeps = new AtomicInteger();
+            ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> {
+                if (sweeps.incrementAndGet() == cap) { // 16th sweep: transport error between the gap runs
+                    return new LineSenderException("Failed to connect: all 2 endpoint(s) "
+                            + "unreachable; last=127.0.0.1:9000");
+                }
+                return new QwpDurableAckMismatchException("h", 1234, "primary");
+            });
+            BackgroundDrainer drainer = newDrainer(factory);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertNull(out);
+            assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
+            assertEquals(1, listener.persistentFailures.get());
+            assertEquals("transport blip must not restart the episode",
+                    cap, listener.lastPersistentTotalAttempts.get());
+            // 15 gap + 1 transport + 1 gap = 17 sweeps total.
+            assertEquals(cap + 1, factory.attempts());
+            assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    @Test
+    public void testTransportWindowDoesNotBurnCapabilityGapWallClock() throws Exception {
+        assertMemoryLeak(() -> {
+            // Red-first: the wall-clock half of the settle budget is anchored at
+            // gap #1, and a transport window BETWEEN gap sweeps must PAUSE it --
+            // only gap-to-gap time is the cluster "failing to settle". Under the
+            // bug the deadline keeps ticking while the cluster is unreachable:
+            // gap #1 anchors the deadline, the cluster then drops off the network
+            // for longer than the entire budget (transport errors are retried
+            // "forever" and charge nothing else), and when it comes back still
+            // briefly gapped, gap #2 observes an expired deadline and quarantines
+            // the slot after just 2 gap sweeps -- contradicting both the
+            // 16-attempt settle intent and Invariant B's "transients never
+            // consume the budget". Evasion is not a concern: the attempt counter
+            // survives the window untouched, which
+            // testTransportErrorDoesNotResetCapabilityGapEpisode pins.
+            // Here the cluster actually settles after the outage (two more gap
+            // sweeps, then durable-ack-capable), so the drain must proceed --
+            // no escalation, no sentinel.
+            long budgetMillis = 250L;
+            CountingListener listener = new CountingListener();
+            AtomicInteger sweeps = new AtomicInteger();
+            ScriptedFactory factory = ScriptedFactory.failingTimes(4, () -> {
+                if (sweeps.incrementAndGet() == 2) {
+                    // Cluster fully unreachable for 600ms, ~2.4x the 250ms wall-clock budget.
+                    // A real outage is time spent inside reconnect() walking
+                    // unreachable endpoints, so model it inside the factory.
+                    try {
+                        Thread.sleep(budgetMillis * 2 + 100);
+                    } catch (InterruptedException ignored) {
+                        Thread.currentThread().interrupt();
+                    }
+                    return new LineSenderException("Failed to connect: all 2 endpoint(s) "
+                            + "unreachable; last=127.0.0.1:9000");
+                }
+                return new QwpDurableAckMismatchException("h", 1234, "primary");
+            });
+            BackgroundDrainer drainer = newDrainerWithBudgets(
+                    factory, budgetMillis, FAST_BACKOFF_MILLIS, FAST_BACKOFF_MAX_MILLIS);
+            drainer.setListener(listener);
+            WebSocketClient out = drainer.connectWithDurableAckRetry();
+            assertSame("cluster recovered after the outage -- the drain must proceed, not "
+                            + "quarantine on a wall clock burned by the transport window",
+                    factory.successSentinel(), out);
+            // gap #1 + outage + gap #2 + gap #3 + success = 5 sweeps.
+            assertEquals(5, factory.attempts());
+            assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome());
+            assertEquals("transport window must not trigger persistent-failure escalation",
+                    0, listener.persistentFailures.get());
+            assertFalse("no .failed sentinel: the slot was never in a terminal state",
+                    Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
     }
 
     @Test
-    public void testRetriesOnDurableAckMismatchThenSucceeds() {
+    public void testRoleRejectGrantsFreshWallClockToNextGapEpisode() {
+        // Companion to testRoleRejectResetsCapabilityGapEpisode, which pins the
+        // ATTEMPT-counter half of the episode reset but runs under a 60s budget
+        // where the wall-clock half is unobservable: a mutant that resets only
+        // capabilityGapAttempts (leaving capabilityGapElapsedNanos /
+        // lastCapabilityGapNanos ticking) passes it. This test pins the
+        // WALL-CLOCK half: gap sweeps burn most of the budget, a role reject
+        // proves the topology churned, and the next gap episode must start
+        // from a zero wall clock -- under the counter-only mutant the stale
+        // elapsed (plus the still-anchored lastCapabilityGapNanos charging
+        // straight across the role-reject window) exhausts the budget and
+        // quarantines a cluster that was about to settle.
+        long budgetMillis = 800L;
         CountingListener listener = new CountingListener();
-        int failTimes = 5;
-        ScriptedFactory factory = ScriptedFactory.failingTimes(failTimes,
-                () -> new QwpDurableAckMismatchException("h", 1234, "primary"));
-        BackgroundDrainer drainer = newDrainer(factory);
+        AtomicInteger sweeps = new AtomicInteger();
+        ScriptedFactory factory = ScriptedFactory.failingTimes(5, () -> {
+            switch (sweeps.incrementAndGet()) {
+                case 2:
+                    // Burn ~600ms of the 800ms budget inside the first gap
+                    // episode (charged by this sweep's gap-to-gap interval).
+                    sleepQuietly(600);
+                    return new QwpDurableAckMismatchException("h", 1234, "primary");
+                case 3:
+                    // Topology churn: the settle budget must restart in full.
+                    return new QwpIngressRoleRejectedException(
+                            QwpIngressRoleRejectedException.ROLE_REPLICA, "127.0.0.1", 9000);
+                case 5:
+                    // Second episode burns ~350ms -- well inside a fresh 800ms
+                    // budget, but 600 + 350 > 800 under the mutant's carried-over
+                    // wall clock.
+                    sleepQuietly(350);
+                    return new QwpDurableAckMismatchException("h", 1234, "primary");
+                default:
+                    return new QwpDurableAckMismatchException("h", 1234, "primary");
+            }
+        });
+        BackgroundDrainer drainer = newDrainerWithBudgets(
+                factory, budgetMillis, FAST_BACKOFF_MILLIS, FAST_BACKOFF_MAX_MILLIS);
         drainer.setListener(listener);
         WebSocketClient out = drainer.connectWithDurableAckRetry();
-        assertSame(factory.successSentinel(), out);
-        assertEquals(failTimes + 1, factory.attempts());
-        assertEquals(failTimes, listener.unavailableAttempts.size());
-        for (int i = 0; i < failTimes; i++) {
-            assertEquals(Integer.valueOf(i + 1), listener.unavailableAttempts.get(i));
-        }
-        assertEquals(0, listener.persistentFailures.get());
+        assertSame("role reject restarts the episode wall clock -- the second gap "
+                        + "episode must get the full settle budget, not the first "
+                        + "episode's leftovers",
+                factory.successSentinel(), out);
+        // gap, gap(+600ms), roleReject, gap, gap(+350ms), success = 6 sweeps.
+        assertEquals(6, factory.attempts());
         assertEquals(BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome());
-        assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        assertEquals("a settling cluster must never see a persistent-failure escalation",
+                0, listener.persistentFailures.get());
+        assertFalse("no .failed sentinel: both gap episodes stayed inside their budgets",
+                Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        // Per-stream attempt numbering across the reset (M10 split): the DA
+        // stream carries gaps 1,2 then the fresh episode's 1,2; the role
+        // reject that restarted the episode lands on the primary stream.
+        assertEquals(Arrays.asList(1, 2, 1, 2), listener.unavailableAttempts);
+        assertEquals(Collections.singletonList(1), listener.primaryUnavailableAttempts);
     }
 
     @Test
-    public void testStopRequestedDuringRetryAbortsWithStoppedOutcome() throws Exception {
-        CountingListener listener = new CountingListener();
-        // Slow factory: each attempt blocks for ~30ms throwing DA mismatch.
-        // Combined with a 50ms reconnectMaxDuration we'd hit budget too,
-        // so set a long budget and rely on requestStop() to break the loop.
+    public void testRequestStopInterruptsLongBackoffParkPromptly() throws Exception {
+        // Pins the stop-promptness contract of the backoff park: requestStop()
+        // must break the drainer out of a LONG park (unpark, backstopped by
+        // the 50ms STOP_CHECK_PARK_CHUNK_NANOS chunking) instead of sleeping
+        // out the remainder. testStopRequestedDuringRetryAbortsWithStoppedOutcome
+        // cannot see this: its 5-10ms backoffs complete faster than any
+        // reasonable join timeout, so a monolithic park with no unpark passes
+        // it. Here the backoff is 5s and the exit bound is 2s -- an
+        // implementation that parks the full backoff in one shot fails.
         CountDownLatch firstFailureSeen = new CountDownLatch(1);
-        ScriptedFactory factory = new ScriptedFactory(
-                /* successSentinel */ stubClient(),
-                /* throwingTimes */ Integer.MAX_VALUE,
-                /* throwSupplier */ () -> {
+        ScriptedFactory factory = ScriptedFactory.alwaysFailing(() -> {
             firstFailureSeen.countDown();
-            return new QwpDurableAckMismatchException("h", 1234, "primary");
+            // Transport error: the un-clamped (boundedByBudget=false) sleep
+            // path, so the park is backoff+jitter (5-10s), never trimmed to
+            // the wall-clock budget.
+            return new LineSenderException(
+                    "Failed to connect: all 2 endpoint(s) unreachable; last=127.0.0.1:9000");
         });
         BackgroundDrainer drainer = newDrainerWithBudgets(
                 factory, /*reconnectMaxDurationMillis*/ 60_000L,
-                /*backoffInit*/ 5L, /*backoffMax*/ 10L);
-        drainer.setListener(listener);
-        Thread t = new Thread(drainer::connectWithDurableAckRetry, "test-helper");
+                /*backoffInit*/ 5_000L, /*backoffMax*/ 5_000L);
+        Thread t = new Thread(drainer::connectWithDurableAckRetry, "long-park-stop-drainer");
         t.setDaemon(true);
         t.start();
-        // Wait until at least one attempt has fired, then signal stop.
         Assert.assertTrue("first failure must occur promptly",
                 firstFailureSeen.await(2, TimeUnit.SECONDS));
+        // Give the drainer a moment to enter the 5-10s park. If requestStop()
+        // instead lands before the park, the pre-park stopRequested check
+        // skips it entirely -- either way the exit must be prompt.
+        Thread.sleep(100);
+        long stopNanos = System.nanoTime();
         drainer.requestStop();
-        t.join(5_000);
-        Assert.assertFalse("helper must exit after stop", t.isAlive());
+        t.join(2_000);
+        long exitMillis = (System.nanoTime() - stopNanos) / 1_000_000L;
+        Assert.assertFalse("requestStop() must break the drainer out of a 5-10s "
+                        + "backoff park promptly (exit took >" + exitMillis + "ms); "
+                        + "a monolithic park with no unpark sleeps out the full backoff",
+                t.isAlive());
         assertEquals(BackgroundDrainer.DrainOutcome.STOPPED, drainer.outcome());
-        // No persistent-failure callback on stop; no sentinel dropped.
-        assertEquals(0, listener.persistentFailures.get());
-        assertFalse(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        assertFalse("stop is not a failure: no .failed sentinel",
+                Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
     }
 
-    @Test
-    public void testWallTimeBudgetEscalatesBeforeAttemptCap() {
-        CountingListener listener = new CountingListener();
-        // Each failure sleeps 12ms; budget is 25ms — second iteration must
-        // observe deadline crossed without reaching the 16-attempt cap.
-        ScriptedFactory factory = new ScriptedFactory(
-                /* successSentinel */ stubClient(),
-                /* throwingTimes */ Integer.MAX_VALUE,
-                /* throwSupplier */ () -> {
-            try {
-                Thread.sleep(12);
-            } catch (InterruptedException ignored) {
-                Thread.currentThread().interrupt();
-            }
-            return new QwpDurableAckMismatchException("h", 1234, "primary");
-        });
-        BackgroundDrainer drainer = newDrainerWithBudgets(
-                factory, /*reconnectMaxDurationMillis*/ 25L,
-                /*backoffInit*/ 1L, /*backoffMax*/ 1L);
-        drainer.setListener(listener);
-        WebSocketClient out = drainer.connectWithDurableAckRetry();
-        assertNull(out);
-        assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
-        assertEquals("must escalate by wall time, not attempts", 1, listener.persistentFailures.get());
-        int total = listener.lastPersistentTotalAttempts.get();
-        assertTrue("escalated before reaching attempt cap (got " + total + ")",
-                total < BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS);
-        assertTrue(total >= 1);
-        assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+    private static void sleepQuietly(long millis) {
+        try {
+            Thread.sleep(millis);
+        } catch (InterruptedException ignored) {
+            Thread.currentThread().interrupt();
+        }
     }
 
     private BackgroundDrainer newDrainer(ScriptedFactory factory) {
@@ -350,8 +928,35 @@ private BackgroundDrainer newDrainerWithBudgets(
                 /* durableAckKeepaliveIntervalMillis */ 200L);
     }
 
+    /**
+     * Wraps a test body in {@link TestUtils#assertMemoryLeak} and closes every
+     * stub the body allocated BEFORE the leak check fires -- LeakCheck closes
+     * at the end of the wrapped lambda, so an @After-only close would run too
+     * late and fail every wrapped test.
+     */
+    private static void assertMemoryLeak(TestUtils.LeakProneCode code) throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            try {
+                code.run();
+            } finally {
+                closeAllStubs();
+            }
+        });
+    }
+
+    private static void closeAllStubs() {
+        synchronized (LIVE_STUBS) {
+            for (int i = 0, n = LIVE_STUBS.size(); i < n; i++) {
+                LIVE_STUBS.get(i).close();
+            }
+            LIVE_STUBS.clear();
+        }
+    }
+
     private static StubWebSocketClient stubClient() {
-        return new StubWebSocketClient();
+        StubWebSocketClient client = new StubWebSocketClient();
+        LIVE_STUBS.add(client);
+        return client;
     }
 
     /**
@@ -362,6 +967,8 @@ private static final class CountingListener implements BackgroundDrainerListener
         final AtomicInteger lastPersistentElapsedMs = new AtomicInteger(-1);
         final AtomicInteger lastPersistentTotalAttempts = new AtomicInteger(-1);
         final AtomicInteger persistentFailures = new AtomicInteger();
+        final List<Integer> primaryUnavailableAttempts = new ArrayList<>();
+        final List<String> primaryUnavailableSlotPaths = new ArrayList<>();
         final List<Integer> unavailableAttempts = new ArrayList<>();
         final List<String> unavailableSlotPaths = new ArrayList<>();
 
@@ -377,6 +984,12 @@ public synchronized void onDurableAckUnavailable(String slotPath, int attemptNum
             unavailableSlotPaths.add(slotPath);
             unavailableAttempts.add(attemptNumber);
         }
+
+        @Override
+        public synchronized void onPrimaryUnavailable(String slotPath, int attemptNumber) {
+            primaryUnavailableSlotPaths.add(slotPath);
+            primaryUnavailableAttempts.add(attemptNumber);
+        }
     }
 
     /**
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerInterruptedTeardownTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerInterruptedTeardownTest.java
new file mode 100644
index 00000000..d1526146
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerInterruptedTeardownTest.java
@@ -0,0 +1,287 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client.sf.cursor;
+
+import io.questdb.client.DefaultHttpClientConfiguration;
+import io.questdb.client.cutlass.http.client.WebSocketClient;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop;
+import io.questdb.client.network.PlainSocketFactory;
+import io.questdb.client.std.Compat;
+import io.questdb.client.std.Files;
+import io.questdb.client.std.MemoryTag;
+import io.questdb.client.std.Unsafe;
+import io.questdb.client.test.tools.TestUtils;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.nio.file.Paths;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicReference;
+
+/**
+ * Red test for the SEGV half of finding C5 — interrupted drainer teardown
+ * must not unmap the engine under a live I/O thread.
+ * <p>
+ * Production sequence: during an outage the drainer's loop I/O thread sits
+ * inside a blocking native connect ({@code connect_timeout} defaults to 0 =
+ * OS timeout; neither unpark nor interrupt cancels {@code connect(2)}).
+ * {@code BackgroundDrainerPool.close()} escalates — graceful drain,
+ * {@code requestStop()}, then {@code shutdownNow()} — and the interrupt
+ * lands in {@code loop.close()}'s {@code shutdownLatch.await()}. Pre-fix,
+ * {@code close()} swallows it and returns while the I/O thread is alive;
+ * {@code BackgroundDrainer.run()}'s finally then closes the engine —
+ * {@code munmap}/{@code Unsafe.free} on segment memory a thread that is
+ * still alive may touch with raw {@code Unsafe} reads.
+ * <p>
+ * The invariant pinned here: <b>at the moment {@code run()} returns, NOT
+ * (loop I/O thread alive AND slot lock released)</b>. The slot lock is an
+ * on-disk protocol shared with other processes and scanners, and
+ * {@code engine.close()} releases it strictly after unmapping — so
+ * "lock released" is the public, behavioral witness of "engine torn down".
+ * Either valid fix shape satisfies the invariant: block until the thread
+ * exits (re-await), or keep the lock/engine alive past {@code run()} by
+ * delegating engine teardown to the I/O thread's exit path. The tail of the
+ * test additionally requires that the slot lock is EVENTUALLY released once
+ * the stuck connect resolves — a fix may defer teardown, not abandon it.
+ * <p>
+ * NOTE: this test is a proxy for the memory-safety property ("no engine
+ * access after unmap"), which cannot be asserted in-process — a SEGV kills
+ * the JVM, and {@code Unsafe.free}'d memory is not guaranteed to fault. The
+ * invariant is a sufficient teardown discipline, deliberately stricter than
+ * the minimal property; see the C5 review discussion.
+ * <p>
+ * Determinism: no sleeps. The interrupt is delivered after
+ * {@code requestStop()} while the runner is either in an interrupt-immune
+ * park ({@code LockSupport.parkNanos} preserves the flag) or already in the
+ * latch await — both routes arrive at the await with the flag set, which
+ * throws before parking. The "stuck connect" is a latch-gated factory
+ * (unpark-immune; {@code close()} never interrupts the I/O thread). The
+ * test only runs safely on pre-fix code because the already-landed
+ * discard-when-stopped fix keeps the post-teardown I/O thread away from
+ * engine memory — the hazard this test guards is real on any HEAD without
+ * that commit.
+ */
+public class BackgroundDrainerInterruptedTeardownTest {
+
+    private static final long SEGMENT_BYTES = 64 * 1024;
+    private String tmpDir;
+
+    @Before
+    public void setUp() {
+        tmpDir = Paths.get(System.getProperty("java.io.tmpdir"),
+                "qdb-c5-teardown-" + System.nanoTime()).toString();
+        Assert.assertEquals(0, Files.mkdir(tmpDir, Files.DIR_MODE_DEFAULT));
+    }
+
+    @After
+    public void tearDown() {
+        if (tmpDir == null) return;
+        long find = Files.findFirst(tmpDir);
+        if (find > 0) {
+            try {
+                int rc = 1;
+                while (rc > 0) {
+                    String name = Files.utf8ToString(Files.findName(find));
+                    if (name != null && !".".equals(name) && !"..".equals(name)) {
+                        Files.remove(tmpDir + "/" + name);
+                    }
+                    rc = Files.findNext(find);
+                }
+            } finally {
+                Files.findClose(find);
+            }
+        }
+        Files.remove(tmpDir);
+    }
+
+    @Test
+    public void testC5_interruptedTeardownMustNotReleaseSlotUnderLiveIoThread() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            // 1. Slot with one published, unacked frame so the drainer opens
+            //    a real engine and spins up a send loop.
+            long buf = Unsafe.malloc(16, MemoryTag.NATIVE_DEFAULT);
+            try {
+                CursorSendEngine prep = new CursorSendEngine(tmpDir, SEGMENT_BYTES);
+                try {
+                    for (int i = 0; i < 16; i++) {
+                        Unsafe.getUnsafe().putByte(buf + i, (byte) i);
+                    }
+                    Assert.assertEquals(0L, prep.appendBlocking(buf, 16));
+                } finally {
+                    // Unacked data on disk -> close() keeps the .sfa files.
+                    prep.close();
+                }
+            } finally {
+                Unsafe.free(buf, 16, MemoryTag.NATIVE_DEFAULT);
+            }
+
+            final CountDownLatch enteredReconnect = new CountDownLatch(1);
+            final CountDownLatch releaseConnect = new CountDownLatch(1);
+            final AtomicInteger connects = new AtomicInteger();
+            final AtomicReference<Thread> ioThreadRef = new AtomicReference<>();
+            // Client #1: initial connect succeeds (on the runner thread), but
+            // every send throws -- driving the I/O thread into its reconnect
+            // loop. Client #2: handed back by the gated "connect" after the
+            // teardown has already run.
+            final StubWebSocketClient wireDownClient = new StubWebSocketClient();
+            final StubWebSocketClient postTeardownClient = new StubWebSocketClient();
+
+            final CursorWebSocketSendLoop.ReconnectFactory factory = () -> {
+                if (connects.incrementAndGet() == 1) {
+                    // Initial connect: runs on the drainer's runner thread.
+                    return wireDownClient;
+                }
+                // Wire-failure reconnect: runs on the loop's I/O thread.
+                // Stand-in for a blocking native connect(2): unpark-immune
+                // (a latch await re-parks after a spurious wake) and never
+                // interrupted (loop.close() only unparks).
+                ioThreadRef.set(Thread.currentThread());
+                enteredReconnect.countDown();
+                releaseConnect.await();
+                return postTeardownClient;
+            };
+
+            final BackgroundDrainer drainer = new BackgroundDrainer(
+                    tmpDir, SEGMENT_BYTES, Long.MAX_VALUE, factory,
+                    5_000L, 10L, 50L, false, 0L);
+
+            Thread runner = new Thread(drainer::run, "drainer-runner");
+            runner.setDaemon(true);
+            runner.start();
+            try {
+                Assert.assertTrue("I/O thread never reached the reconnect factory",
+                        enteredReconnect.await(10, TimeUnit.SECONDS));
+
+                // Pool-shutdown stand-in: requestStop, then the shutdownNow
+                // interrupt. Wherever the runner is at this instant -- the
+                // poll park (flag-preserving) or already in the latch await --
+                // the flag arrives at the await and throws before parking.
+                drainer.requestStop();
+                runner.interrupt();
+                runner.join(10_000L);
+                Assert.assertFalse("drainer did not return after stop + interrupt",
+                        runner.isAlive());
+                Assert.assertEquals(BackgroundDrainer.DrainOutcome.STOPPED,
+                        drainer.outcome());
+
+                Thread ioThread = ioThreadRef.get();
+                Assert.assertNotNull(ioThread);
+                boolean ioThreadAliveAtReturn = ioThread.isAlive();
+                boolean slotLockFreeAtReturn = isSlotLockFree();
+                Assert.assertFalse(
+                        "C5 (SEGV): BackgroundDrainer.run() returned with the slot lock "
+                                + "released (engine closed -- segments munmap'd/freed) while "
+                                + "the loop's I/O thread was still alive inside a blocking "
+                                + "connect. loop.close() swallowed the InterruptedException "
+                                + "from shutdownLatch.await() and returned; the finally then "
+                                + "unmapped memory a live thread may touch with raw Unsafe "
+                                + "reads. Teardown must either wait for the thread or be "
+                                + "delegated to its exit path.",
+                        ioThreadAliveAtReturn && slotLockFreeAtReturn);
+            } finally {
+                // Unblock the "connect" and quiesce regardless of verdict so
+                // the memory-leak wrapper sees a fully wound-down world.
+                releaseConnect.countDown();
+                Thread ioThread = ioThreadRef.get();
+                if (ioThread != null) {
+                    ioThread.join(10_000L);
+                    Assert.assertFalse("I/O thread did not exit after the connect returned",
+                            ioThread.isAlive());
+                }
+                wireDownClient.close();
+                postTeardownClient.close();
+            }
+
+            // Deferred is fine; abandoned is not: once the stuck connect
+            // resolved and the I/O thread exited, the slot lock must be
+            // released (engine closed by whoever ended up owning teardown),
+            // or no scanner can ever adopt the slot's remaining data.
+            long deadlineNanos = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
+            while (!isSlotLockFree()) {
+                Assert.assertTrue(
+                        "slot lock never released after the I/O thread exited -- "
+                                + "engine teardown was abandoned, not deferred",
+                        System.nanoTime() < deadlineNanos);
+                Compat.onSpinWait();
+            }
+        });
+    }
+
+    /**
+     * Public, behavioral probe of the slot lock: opening an engine on the
+     * slot succeeds iff no other engine holds the on-disk lock. The probe
+     * engine is closed immediately; the slot's unacked data keeps its files
+     * on disk, so probing is observation-only.
+     */
+    private boolean isSlotLockFree() {
+        try {
+            new CursorSendEngine(tmpDir, SEGMENT_BYTES).close();
+            return true;
+        } catch (IllegalStateException e) {
+            String msg = e.getMessage();
+            if (msg != null && msg.contains("already in use")) {
+                return false;
+            }
+            throw e;
+        }
+    }
+
+    /**
+     * Minimal concrete {@link WebSocketClient}: connect-level collaborator
+     * only. Every send throws, so handing it to a live loop deterministically
+     * drives the I/O thread into its reconnect path without native I/O.
+     */
+    private static final class StubWebSocketClient extends WebSocketClient {
+        StubWebSocketClient() {
+            super(DefaultHttpClientConfiguration.INSTANCE, PlainSocketFactory.INSTANCE);
+        }
+
+        @Override
+        public void sendBinary(long dataPtr, int length) {
+            throw new IllegalStateException("stub: wire down");
+        }
+
+        @Override
+        public void sendBinary(long dataPtr, int length, int timeout) {
+            throw new IllegalStateException("stub: wire down");
+        }
+
+        @Override
+        protected void ioWait(int timeout, int op) {
+            throw new UnsupportedOperationException("stub: no socket");
+        }
+
+        @Override
+        protected void setupIoWait() {
+            // no-op
+        }
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerMidDrainCapabilityGapTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerMidDrainCapabilityGapTest.java
new file mode 100644
index 00000000..889fd3e5
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerMidDrainCapabilityGapTest.java
@@ -0,0 +1,426 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client.sf.cursor;
+
+import io.questdb.client.cutlass.http.client.WebSocketClient;
+import io.questdb.client.cutlass.http.client.WebSocketClientFactory;
+import io.questdb.client.cutlass.http.client.WebSocketUpgradeException;
+import io.questdb.client.cutlass.qwp.client.QwpDurableAckMismatchException;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner;
+import io.questdb.client.std.Files;
+import io.questdb.client.std.MemoryTag;
+import io.questdb.client.std.Unsafe;
+import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer;
+import io.questdb.client.test.tools.TestUtils;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Paths;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+/**
+ * Mid-drain durable-ack capability-gap coverage for {@link BackgroundDrainer}.
+ * <p>
+ * The initial-connect path ({@code connectWithDurableAckRetry}) gives a
+ * cluster-wide durable-ack capability gap a bounded settle budget (16
+ * consecutive sweeps / wall clock) before quarantining the slot — the budget
+ * exists precisely for rolling-upgrade transients. The same condition hit
+ * <i>mid-drain</i> (wire drops, the loop's reconnect sweep lands on a node
+ * that upgrades but does not advertise durable ack) must get the same budget:
+ * the drainer re-enters the budgeted connect instead of dropping a
+ * {@code .failed} sentinel on the first sweep. Genuine terminals (auth,
+ * non-421 upgrade reject) still quarantine immediately — the sanctioned
+ * terminal set is unchanged.
+ * <p>
+ * Wire realism: a real {@link TestWebSocketServer} acks over a live socket;
+ * the scripted {@link CursorWebSocketSendLoop.ReconnectFactory} decides,
+ * per connect attempt, whether the sweep sees a healthy node or the
+ * capability gap. The mid-drain drop is deterministic — the server closes
+ * the first connection after durably acking exactly one frame.
+ */
+public class BackgroundDrainerMidDrainCapabilityGapTest {
+
+    private static final long FAST_BACKOFF_MAX_MILLIS = 4L;
+    private static final long FAST_BACKOFF_MILLIS = 1L;
+    private static final long RECONNECT_MAX_DURATION_MILLIS = 60_000L;
+    private static final int SEEDED_FRAMES = 5;
+    private static final long SEGMENT_SIZE_BYTES = 16384L;
+    private static final long SF_MAX_TOTAL_BYTES = 1L << 20;
+
+    private String slotPath;
+
+    @Before
+    public void setUp() {
+        slotPath = Paths.get(System.getProperty("java.io.tmpdir"),
+                "qdb-mid-drain-gap-" + System.nanoTime()).toString();
+        assertEquals("mkdir slot dir", 0, Files.mkdir(slotPath, Files.DIR_MODE_DEFAULT));
+    }
+
+    @After
+    public void tearDown() {
+        rmDirRec(slotPath);
+    }
+
+    @Test
+    public void testMidDrainCapabilityGapGetsSettleBudgetNotQuarantine() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            seedSlot(SEEDED_FRAMES);
+            GapScenarioHandler handler = new GapScenarioHandler(/* dropFirstConnection */ true);
+            try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) {
+                server.start();
+                assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+                // Call 1: healthy connect (drain starts). The server durably
+                // acks one frame, then drops the wire. Calls 2-4: the
+                // reconnect sweep finds only the capability-gap node. Call 5+:
+                // the rolling upgrade settled; a capable node is back.
+                ScriptedWireFactory factory =
+                        new ScriptedWireFactory(server.getPort(), 2, 4);
+                BackgroundDrainer drainer = newDrainer(factory);
+                CountingListener listener = new CountingListener();
+                drainer.setListener(listener);
+
+                runToCompletion(drainer);
+
+                assertEquals("a transient capability gap inside the settle budget "
+                                + "must not quarantine the slot",
+                        BackgroundDrainer.DrainOutcome.SUCCESS, drainer.outcome());
+                assertFalse("no .failed sentinel after a successful drain",
+                        Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+                // 1 healthy + 3 gap sweeps + 1 healthy at minimum. Stopping at
+                // 2 means the drainer latched terminal on the first gap sweep.
+                assertTrue("expected the drainer to retry through the gap, attempts="
+                        + factory.attempts(), factory.attempts() >= 5);
+                // The loop's own failed sweep (call 2) latches the loop; budget
+                // attempts 1 and 2 (calls 3, 4) fire the observability callback.
+                assertEquals(Arrays.asList(1, 2), listener.unavailableAttempts);
+                assertEquals(0, listener.persistentFailures.get());
+            }
+        });
+    }
+
+    @Test
+    public void testMidDrainPersistentCapabilityGapExhaustsBudgetThenQuarantines() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            seedSlot(SEEDED_FRAMES);
+            GapScenarioHandler handler = new GapScenarioHandler(/* dropFirstConnection */ true);
+            try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) {
+                server.start();
+                assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+                // Gap never clears: every sweep after the drop throws.
+                ScriptedWireFactory factory =
+                        new ScriptedWireFactory(server.getPort(), 2, Integer.MAX_VALUE);
+                BackgroundDrainer drainer = newDrainer(factory);
+                CountingListener listener = new CountingListener();
+                drainer.setListener(listener);
+
+                runToCompletion(drainer);
+
+                int budget = BackgroundDrainer.DEFAULT_MAX_DURABLE_ACK_MISMATCH_ATTEMPTS;
+                assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
+                assertTrue("persistent gap must quarantine after the budget",
+                        Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+                // Escalation goes through the settle budget, not the generic
+                // wire-error path: the persistent-failure callback fires once
+                // with the full budget consumed.
+                assertEquals(1, listener.persistentFailures.get());
+                assertEquals(budget, listener.lastPersistentTotalAttempts.get());
+                // 1 healthy connect + 1 loop reconnect sweep (latches the loop)
+                // + the full budget of re-entered sweeps.
+                assertEquals(2 + budget, factory.attempts());
+            }
+        });
+    }
+
+    @Test
+    public void testMidDrainTerminalUpgradeErrorStillQuarantinesImmediately() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            seedSlot(SEEDED_FRAMES);
+            GapScenarioHandler handler = new GapScenarioHandler(/* dropFirstConnection */ true);
+            try (TestWebSocketServer server = new TestWebSocketServer(handler, true)) {
+                server.start();
+                assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+                // Non-421 upgrade reject mid-drain: sanctioned terminal, no
+                // settle budget — the drainer must quarantine on the first
+                // sweep exactly as before.
+                ScriptedWireFactory factory = new ScriptedWireFactory(
+                        server.getPort(), 2, Integer.MAX_VALUE,
+                        () -> new WebSocketUpgradeException(500, null, "server error during upgrade"));
+                BackgroundDrainer drainer = newDrainer(factory);
+                CountingListener listener = new CountingListener();
+                drainer.setListener(listener);
+
+                runToCompletion(drainer);
+
+                assertEquals(BackgroundDrainer.DrainOutcome.FAILED, drainer.outcome());
+                assertTrue(Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+                assertEquals("terminal upgrade error must not consume gap sweeps",
+                        2, factory.attempts());
+                assertEquals(0, listener.unavailableAttempts.size());
+                assertEquals(0, listener.persistentFailures.get());
+            }
+        });
+    }
+
+    private BackgroundDrainer newDrainer(ScriptedWireFactory factory) {
+        return new BackgroundDrainer(
+                slotPath,
+                SEGMENT_SIZE_BYTES,
+                SF_MAX_TOTAL_BYTES,
+                factory,
+                RECONNECT_MAX_DURATION_MILLIS,
+                FAST_BACKOFF_MILLIS,
+                FAST_BACKOFF_MAX_MILLIS,
+                /* requestDurableAck */ true,
+                /* durableAckKeepaliveIntervalMillis */ 200L);
+    }
+
+    private static void rmDirRec(String dir) {
+        if (dir == null || !Files.exists(dir)) return;
+        long find = Files.findFirst(dir);
+        if (find > 0) {
+            try {
+                int rc = 1;
+                while (rc > 0) {
+                    String name = Files.utf8ToString(Files.findName(find));
+                    if (name != null && !".".equals(name) && !"..".equals(name)) {
+                        String child = dir + "/" + name;
+                        if (!Files.remove(child)) rmDirRec(child);
+                    }
+                    rc = Files.findNext(find);
+                }
+            } finally {
+                Files.findClose(find);
+            }
+        }
+        Files.remove(dir);
+    }
+
+    private static void runToCompletion(BackgroundDrainer drainer) throws InterruptedException {
+        Thread t = new Thread(drainer, "test-mid-drain-drainer");
+        t.setDaemon(true);
+        t.start();
+        t.join(20_000);
+        if (t.isAlive()) {
+            drainer.requestStop();
+            t.join(5_000);
+            fail("drainer did not finish within 20s (outcome=" + drainer.outcome() + ")");
+        }
+    }
+
+    private void seedSlot(int frames) {
+        try (CursorSendEngine engine = new CursorSendEngine(slotPath, SEGMENT_SIZE_BYTES)) {
+            long buf = Unsafe.malloc(16, MemoryTag.NATIVE_DEFAULT);
+            try {
+                byte[] payload = "frame-bytes-padd".getBytes(StandardCharsets.US_ASCII);
+                for (int i = 0; i < payload.length; i++) {
+                    Unsafe.getUnsafe().putByte(buf + i, payload[i]);
+                }
+                for (int i = 0; i < frames; i++) {
+                    engine.appendBlocking(buf, 16);
+                }
+            } finally {
+                Unsafe.free(buf, 16, MemoryTag.NATIVE_DEFAULT);
+            }
+        }
+    }
+
+    /**
+     * Records listener invocations for exact-count assertions.
+     */
+    private static final class CountingListener implements BackgroundDrainerListener {
+        final AtomicInteger lastPersistentTotalAttempts = new AtomicInteger(-1);
+        final AtomicInteger persistentFailures = new AtomicInteger();
+        final List<Integer> unavailableAttempts = new ArrayList<>();
+
+        @Override
+        public synchronized void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) {
+            persistentFailures.incrementAndGet();
+            lastPersistentTotalAttempts.set(totalAttempts);
+        }
+
+        @Override
+        public synchronized void onDurableAckUnavailable(String slotPath, int attemptNumber) {
+            unavailableAttempts.add(attemptNumber);
+        }
+    }
+
+    /**
+     * Server-side script. Connection #1 durably acks exactly one frame, then
+     * closes the socket — a deterministic mid-drain wire drop. Every later
+     * connection acks all traffic (OK + durable-ack per frame, per-connection
+     * wire sequence), so a reconnected loop drains to completion.
+     * <p>
+     * State is keyed per {@code ClientHandler} identity. A dead connection's
+     * reader can still deliver late buffered frames AFTER a newer connection
+     * started (the server reads ahead of the socket close), so any
+     * "latest-connection" flip-flop bookkeeping desyncs the per-connection
+     * wire sequence and produces phantom connections. Acks are best-effort:
+     * a late frame from a dead connection must neither ack with a stale
+     * counter nor kill the reader thread of a live one.
+     */
+    private static final class GapScenarioHandler implements TestWebSocketServer.WebSocketServerHandler {
+        private static final String TABLE = "trades";
+        private final boolean dropFirstConnection;
+        private final List<TestWebSocketServer.ClientHandler> arrivalOrder = new ArrayList<>();
+        private final java.util.Map<TestWebSocketServer.ClientHandler, long[]> wireSeqByConn =
+                new java.util.IdentityHashMap<>();
+
+        GapScenarioHandler(boolean dropFirstConnection) {
+            this.dropFirstConnection = dropFirstConnection;
+        }
+
+        @Override
+        public synchronized void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) {
+            long[] counter = wireSeqByConn.get(client);
+            if (counter == null) {
+                counter = new long[1];
+                wireSeqByConn.put(client, counter);
+                arrivalOrder.add(client);
+            }
+            int connectionIndex = arrivalOrder.indexOf(client) + 1;
+            long seq = counter[0]++;
+            try {
+                if (dropFirstConnection && connectionIndex == 1) {
+                    if (seq == 0) {
+                        client.sendBinary(okFrame(seq, seq));
+                        client.sendBinary(durableAckFrame(seq));
+                    } else if (seq == 1) {
+                        client.close(); // mid-drain wire drop
+                    }
+                    // seq > 1: late buffered frames from the condemned
+                    // connection; ignore.
+                } else {
+                    client.sendBinary(okFrame(seq, seq));
+                    client.sendBinary(durableAckFrame(seq));
+                }
+            } catch (IOException ignored) {
+                // Best-effort ack: the connection died under us (e.g. racing
+                // its own close). The client replays on its next connection.
+            }
+        }
+
+        private static byte[] durableAckFrame(long seqTxn) {
+            byte[] name = TABLE.getBytes(StandardCharsets.UTF_8);
+            ByteBuffer bb = ByteBuffer.allocate(1 + 2 + 2 + name.length + 8)
+                    .order(ByteOrder.LITTLE_ENDIAN);
+            bb.put((byte) 0x02); // STATUS_DURABLE_ACK
+            bb.putShort((short) 1); // tableCount
+            bb.putShort((short) name.length);
+            bb.put(name);
+            bb.putLong(seqTxn);
+            return bb.array();
+        }
+
+        private static byte[] okFrame(long wireSeq, long seqTxn) {
+            byte[] name = TABLE.getBytes(StandardCharsets.UTF_8);
+            ByteBuffer bb = ByteBuffer.allocate(1 + 8 + 2 + 2 + name.length + 8)
+                    .order(ByteOrder.LITTLE_ENDIAN);
+            bb.put((byte) 0x00); // STATUS_OK
+            bb.putLong(wireSeq);
+            bb.putShort((short) 1); // tableCount
+            bb.putShort((short) name.length);
+            bb.put(name);
+            bb.putLong(seqTxn);
+            return bb.array();
+        }
+    }
+
+    /**
+     * Per-call-index scripted factory over a real wire. Call indexes inside
+     * {@code [throwFrom, throwTo]} (1-based, inclusive) throw the scripted
+     * exception; every other call returns a live upgraded client against the
+     * test server, with durable ack requested — exactly the client the
+     * production connect walk would hand back.
+     */
+    private static final class ScriptedWireFactory implements CursorWebSocketSendLoop.ReconnectFactory {
+        private final AtomicInteger calls = new AtomicInteger();
+        private final int port;
+        private final ThrowableSupplier throwSupplier;
+        private final int throwFrom;
+        private final int throwTo;
+
+        ScriptedWireFactory(int port, int throwFrom, int throwTo) {
+            this(port, throwFrom, throwTo,
+                    () -> new QwpDurableAckMismatchException("localhost", port, "primary"));
+        }
+
+        ScriptedWireFactory(int port, int throwFrom, int throwTo, ThrowableSupplier throwSupplier) {
+            this.port = port;
+            this.throwFrom = throwFrom;
+            this.throwTo = throwTo;
+            this.throwSupplier = throwSupplier;
+        }
+
+        int attempts() {
+            return calls.get();
+        }
+
+        @Override
+        public WebSocketClient reconnect() throws Exception {
+            int n = calls.incrementAndGet();
+            if (n >= throwFrom && n <= throwTo) {
+                Throwable t = throwSupplier.get();
+                if (t instanceof RuntimeException) throw (RuntimeException) t;
+                if (t instanceof Exception) throw (Exception) t;
+                throw new RuntimeException(t);
+            }
+            WebSocketClient c = WebSocketClientFactory.newPlainTextInstance();
+            try {
+                c.setQwpMaxVersion(1);
+                c.setQwpRequestDurableAck(true);
+                c.setConnectTimeout(5_000);
+                c.connect("localhost", port);
+                c.upgrade("/write/v4", 5_000, null);
+            } catch (Throwable t) {
+                c.close();
+                throw t;
+            }
+            return c;
+        }
+    }
+
+    @FunctionalInterface
+    private interface ThrowableSupplier {
+        Throwable get();
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerPoolConnectPhaseCloseTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerPoolConnectPhaseCloseTest.java
new file mode 100644
index 00000000..d31f60d4
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerPoolConnectPhaseCloseTest.java
@@ -0,0 +1,189 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client.sf.cursor;
+
+import io.questdb.client.cutlass.line.LineSenderException;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerPool;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner;
+import io.questdb.client.std.Files;
+import io.questdb.client.std.MemoryTag;
+import io.questdb.client.std.Unsafe;
+import io.questdb.client.test.tools.TestUtils;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Paths;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+/**
+ * Coverage of {@link BackgroundDrainerPool#close()}'s split stop policy
+ * (M11): a drainer that never started draining — still inside its
+ * connect-retry loop, e.g. during a cluster outage — is stop-signaled
+ * BEFORE the graceful-drain window, so {@code close()} returns in roughly
+ * one stop-check park chunk (~50ms) instead of burning the full
+ * {@code GRACEFUL_DRAIN_MILLIS + STOP_GRACE_MILLIS} (~3s) on a drainer
+ * that cannot possibly finish.
+ * <p>
+ * The factory throws a plain transport-shaped {@link LineSenderException}
+ * (the shape of every outage-time connect failure), so this test doubles
+ * as the contract check that such failures are retried under
+ * Invariant B: the outcome stays PENDING while running, becomes STOPPED on
+ * close, and the slot NEVER gets a {@code .failed} sentinel.
+ */
+public class BackgroundDrainerPoolConnectPhaseCloseTest {
+
+    /**
+     * Below the pool's 2.5s graceful window: generous enough for CI
+     * scheduling jitter, tight enough that a regression to
+     * "graceful-wait-first" (>= 2500ms) fails loudly.
+     */
+    private static final long CLOSE_BUDGET_MILLIS = 2_000L;
+    /** Longer than the close budget: close() must not wait out a backoff sleep. */
+    private static final long LONG_BACKOFF_MILLIS = 30_000L;
+    private static final long SEGMENT_SIZE_BYTES = 16384L;
+    private static final long SF_MAX_TOTAL_BYTES = 1L << 20;
+
+    private String slotPath;
+
+    @Before
+    public void setUp() {
+        slotPath = Paths.get(System.getProperty("java.io.tmpdir"),
+                "qdb-pool-connect-close-" + System.nanoTime()).toString();
+        assertEquals("mkdir slot dir", 0, Files.mkdir(slotPath, Files.DIR_MODE_DEFAULT));
+    }
+
+    @After
+    public void tearDown() {
+        rmDirRec(slotPath);
+    }
+
+    @Test
+    public void testCloseStopsConnectPhaseDrainerWithoutBurningGracefulWindow() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            // Unacked data on disk: without it run() exits SUCCESS before
+            // ever entering the connect-retry loop this test needs.
+            seedSlot(3);
+            final CountDownLatch firstAttempt = new CountDownLatch(1);
+            final AtomicInteger attempts = new AtomicInteger();
+            final CursorWebSocketSendLoop.ReconnectFactory factory = () -> {
+                attempts.incrementAndGet();
+                firstAttempt.countDown();
+                // Plain transport-shaped failure: the shape of every
+                // outage-time connect error. Must be retried, never
+                // quarantined.
+                throw new LineSenderException(
+                        "Failed to connect: all endpoints unreachable (simulated outage)");
+            };
+            final BackgroundDrainer drainer = new BackgroundDrainer(
+                    slotPath,
+                    SEGMENT_SIZE_BYTES,
+                    SF_MAX_TOTAL_BYTES,
+                    factory,
+                    /* reconnectMaxDurationMillis */ 60_000L,
+                    /* reconnectInitialBackoffMillis */ LONG_BACKOFF_MILLIS,
+                    /* reconnectMaxBackoffMillis */ LONG_BACKOFF_MILLIS,
+                    /* requestDurableAck */ false,
+                    /* durableAckKeepaliveIntervalMillis */ 0L);
+            final BackgroundDrainerPool pool = new BackgroundDrainerPool(1);
+            pool.submit(drainer);
+            assertTrue("drainer must enter its connect-retry loop",
+                    firstAttempt.await(5, TimeUnit.SECONDS));
+
+            final long startNanos = System.nanoTime();
+            pool.close();
+            final long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000L;
+
+            assertTrue("close() must stop a connect-phase drainer immediately (split stop "
+                            + "policy), not wait out the graceful-drain window; took "
+                            + elapsedMillis + "ms with a " + LONG_BACKOFF_MILLIS
+                            + "ms drainer backoff in flight",
+                    elapsedMillis < CLOSE_BUDGET_MILLIS);
+            assertEquals("a stop-signaled connect-phase drainer exits STOPPED (slot stays "
+                            + "adoptable), never FAILED",
+                    BackgroundDrainer.DrainOutcome.STOPPED, drainer.outcome());
+            assertEquals("a connect-phase drainer must never have advanced ackedFsn",
+                    -1L, drainer.getAckedFsn());
+            assertTrue("the drainer must have attempted at least one connect sweep",
+                    attempts.get() >= 1);
+            assertFalse("outage-shaped connect failures must never quarantine the slot "
+                            + "(.failed sentinel)",
+                    Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    /**
+     * Seeds {@code frames} unacked frames on disk. The test needs only that
+     * on-disk state, so nothing is returned; mirrors
+     * {@code BackgroundDrainerTransportOutageRecoveryTest.seedSlot}.
+     */
+    private void seedSlot(int frames) {
+        try (CursorSendEngine engine = new CursorSendEngine(slotPath, SEGMENT_SIZE_BYTES)) {
+            long buf = Unsafe.malloc(16, MemoryTag.NATIVE_DEFAULT);
+            try {
+                byte[] payload = "frame-bytes-padd".getBytes(StandardCharsets.US_ASCII);
+                for (int i = 0; i < payload.length; i++) {
+                    Unsafe.getUnsafe().putByte(buf + i, payload[i]);
+                }
+                for (int i = 0; i < frames; i++) {
+                    engine.appendBlocking(buf, 16);
+                }
+            } finally {
+                Unsafe.free(buf, 16, MemoryTag.NATIVE_DEFAULT);
+            }
+        }
+    }
+
+    private static void rmDirRec(String dir) {
+        if (dir == null || !Files.exists(dir)) return;
+        long find = Files.findFirst(dir);
+        if (find > 0) {
+            try {
+                int rc = 1;
+                while (rc > 0) {
+                    String name = Files.utf8ToString(Files.findName(find));
+                    if (name != null && !".".equals(name) && !"..".equals(name)) {
+                        String child = dir + "/" + name;
+                        if (!Files.remove(child)) rmDirRec(child);
+                    }
+                    rc = Files.findNext(find);
+                }
+            } finally {
+                Files.findClose(find);
+            }
+        }
+        Files.remove(dir);
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerTransportOutageRecoveryTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerTransportOutageRecoveryTest.java
new file mode 100644
index 00000000..86a3decc
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/BackgroundDrainerTransportOutageRecoveryTest.java
@@ -0,0 +1,319 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client.sf.cursor;
+
+import io.questdb.client.cutlass.http.client.WebSocketClient;
+import io.questdb.client.cutlass.http.client.WebSocketClientFactory;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainer;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.BackgroundDrainerListener;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.OrphanScanner;
+import io.questdb.client.std.Files;
+import io.questdb.client.std.MemoryTag;
+import io.questdb.client.std.Unsafe;
+import io.questdb.client.test.cutlass.qwp.client.TestPorts;
+import io.questdb.client.test.cutlass.qwp.websocket.TestWebSocketServer;
+import io.questdb.client.test.tools.TestUtils;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Paths;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+/**
+ * Down-then-up transport-outage recovery for {@link BackgroundDrainer},
+ * end-to-end over a real wire (M8 conjunction gap).
+ * <p>
+ * Invariant B's two halves were previously pinned only in isolation:
+ * "a transport outage longer than the settle budget never quarantines"
+ * (ScriptedFactory unit level, ends with {@code requestStop()}) and
+ * "the drainer recovers once errors clear" (scripted throws, no real
+ * outage). This test conjoins them on ONE endpoint: the server is DOWN at
+ * drainer start (every connect is a genuine ECONNREFUSED through the real
+ * {@link WebSocketClient} connect/upgrade path), stays down for several
+ * multiples of {@code reconnect_max_duration_millis} while the drainer
+ * sweeps, then comes back UP on the SAME port — and the drainer must
+ * complete the drain, having never dropped a {@code .failed} sentinel or
+ * fired a persistent-failure escalation during the outage.
+ */
+public class BackgroundDrainerTransportOutageRecoveryTest {
+
+    private static final long FAST_BACKOFF_MAX_MILLIS = 4L;
+    private static final long FAST_BACKOFF_MILLIS = 1L;
+    /** Deliberately tiny: the outage below outlives it several times over. */
+    private static final long RECONNECT_MAX_DURATION_MILLIS = 200L;
+    private static final int SEEDED_FRAMES = 5;
+    private static final long SEGMENT_SIZE_BYTES = 16384L;
+    private static final long SF_MAX_TOTAL_BYTES = 1L << 20;
+
+    private String slotPath;
+
+    @Before
+    public void setUp() {
+        slotPath = Paths.get(System.getProperty("java.io.tmpdir"),
+                "qdb-outage-recovery-" + System.nanoTime()).toString();
+        assertEquals("mkdir slot dir", 0, Files.mkdir(slotPath, Files.DIR_MODE_DEFAULT));
+    }
+
+    @After
+    public void tearDown() {
+        rmDirRec(slotPath);
+    }
+
+    @Test
+    public void testDrainerSurvivesOutageLongerThanBudgetThenDrainsWhenServerReturns() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            long targetFsn = seedSlot(SEEDED_FRAMES);
+            int port = TestPorts.findUnusedPort();
+            WireFactory factory = new WireFactory(port);
+            BackgroundDrainer drainer = new BackgroundDrainer(
+                    slotPath,
+                    SEGMENT_SIZE_BYTES,
+                    SF_MAX_TOTAL_BYTES,
+                    factory,
+                    RECONNECT_MAX_DURATION_MILLIS,
+                    FAST_BACKOFF_MILLIS,
+                    FAST_BACKOFF_MAX_MILLIS,
+                    /* requestDurableAck */ true,
+                    /* durableAckKeepaliveIntervalMillis */ 200L);
+            CountingListener listener = new CountingListener();
+            drainer.setListener(listener);
+
+            Thread t = new Thread(drainer, "outage-recovery-drainer");
+            t.setDaemon(true);
+            t.start();
+            try {
+                // OUTAGE PHASE: nothing listens on the port, so every sweep is a
+                // real refused connect. Hold the outage for 3x the wall-clock
+                // budget AND at least 8 connect attempts, whichever is later --
+                // under an Invariant B breach (transport errors charged to the
+                // budget / attempt cap) the drainer escalates well within this
+                // window and the thread dies.
+                long outageUntilNanos = System.nanoTime()
+                        + 3 * RECONNECT_MAX_DURATION_MILLIS * 1_000_000L;
+                while ((System.nanoTime() < outageUntilNanos || factory.attempts() < 8)
+                        && t.isAlive()) {
+                    Thread.sleep(10);
+                }
+                assertTrue("drainer gave up during a transport outage (attempts="
+                                + factory.attempts() + ", outcome=" + drainer.outcome()
+                                + "): Invariant B says a down server is transient -- the "
+                                + "drainer must still be retrying 3x past the settle budget",
+                        t.isAlive());
+                assertEquals("outage must not escalate past PENDING",
+                        BackgroundDrainer.DrainOutcome.PENDING, drainer.outcome());
+                assertEquals("outage must not fire a persistent-failure escalation",
+                        0, listener.persistentFailures.get());
+                assertFalse("outage must not quarantine (.failed sentinel) the slot",
+                        Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+
+                // RECOVERY PHASE: the server comes back on the SAME port. The
+                // drainer's next sweep connects for real and ships the slot.
+                try (TestWebSocketServer server = new TestWebSocketServer(
+                        new AckAllHandler(), true, null, port)) {
+                    server.start();
+                    assertTrue(server.awaitStart(5, TimeUnit.SECONDS));
+                    t.join(20_000);
+                    if (t.isAlive()) {
+                        drainer.requestStop();
+                        t.join(5_000);
+                        fail("drainer did not drain within 20s of the server returning "
+                                + "(outcome=" + drainer.outcome()
+                                + ", attempts=" + factory.attempts()
+                                + ", lastError=" + drainer.getLastErrorMessage() + ")");
+                    }
+                }
+            } finally {
+                drainer.requestStop();
+                t.join(5_000);
+            }
+
+            assertEquals("server recovery must complete the drain",
+                    BackgroundDrainer.DrainOutcome.SUCCESS, drainer.outcome());
+            assertEquals("every seeded frame must be durably acked",
+                    targetFsn, drainer.getAckedFsn());
+            assertEquals(0, listener.persistentFailures.get());
+            assertFalse("no .failed sentinel after a successful drain",
+                    Files.exists(slotPath + "/" + OrphanScanner.FAILED_SENTINEL_NAME));
+        });
+    }
+
+    private static void rmDirRec(String dir) {
+        if (dir == null || !Files.exists(dir)) return;
+        long find = Files.findFirst(dir);
+        if (find > 0) {
+            try {
+                int rc = 1;
+                while (rc > 0) {
+                    String name = Files.utf8ToString(Files.findName(find));
+                    if (name != null && !".".equals(name) && !"..".equals(name)) {
+                        String child = dir + "/" + name;
+                        if (!Files.remove(child)) rmDirRec(child);
+                    }
+                    rc = Files.findNext(find);
+                }
+            } finally {
+                Files.findClose(find);
+            }
+        }
+        Files.remove(dir);
+    }
+
+    /** Seeds {@code frames} frames and returns the slot's published fsn --
+     * the drain target the drainer must ack up to. */
+    private long seedSlot(int frames) {
+        try (CursorSendEngine engine = new CursorSendEngine(slotPath, SEGMENT_SIZE_BYTES)) {
+            long buf = Unsafe.malloc(16, MemoryTag.NATIVE_DEFAULT);
+            try {
+                byte[] payload = "frame-bytes-padd".getBytes(StandardCharsets.US_ASCII);
+                for (int i = 0; i < payload.length; i++) {
+                    Unsafe.getUnsafe().putByte(buf + i, payload[i]);
+                }
+                for (int i = 0; i < frames; i++) {
+                    engine.appendBlocking(buf, 16);
+                }
+            } finally {
+                Unsafe.free(buf, 16, MemoryTag.NATIVE_DEFAULT);
+            }
+            return engine.publishedFsn();
+        }
+    }
+
+    /**
+     * Acks every frame (OK + durable-ack, per-connection wire sequence) so a
+     * reconnected drainer drains to completion. Trimmed-down clone of the
+     * mid-drain test's healthy-server behaviour.
+     */
+    private static final class AckAllHandler implements TestWebSocketServer.WebSocketServerHandler {
+        private static final String TABLE = "trades";
+        private final java.util.Map<TestWebSocketServer.ClientHandler, long[]> wireSeqByConn =
+                new java.util.IdentityHashMap<>();
+
+        @Override
+        public synchronized void onBinaryMessage(TestWebSocketServer.ClientHandler client, byte[] data) {
+            long[] counter = wireSeqByConn.get(client);
+            if (counter == null) {
+                counter = new long[1];
+                wireSeqByConn.put(client, counter);
+            }
+            long seq = counter[0]++;
+            try {
+                client.sendBinary(okFrame(seq, seq));
+                client.sendBinary(durableAckFrame(seq));
+            } catch (IOException ignored) {
+                // Best-effort ack: the connection died under us; the client
+                // replays on its next connection.
+            }
+        }
+
+        private static byte[] durableAckFrame(long seqTxn) {
+            byte[] name = TABLE.getBytes(StandardCharsets.UTF_8);
+            ByteBuffer bb = ByteBuffer.allocate(1 + 2 + 2 + name.length + 8)
+                    .order(ByteOrder.LITTLE_ENDIAN);
+            bb.put((byte) 0x02); // STATUS_DURABLE_ACK
+            bb.putShort((short) 1); // tableCount
+            bb.putShort((short) name.length);
+            bb.put(name);
+            bb.putLong(seqTxn);
+            return bb.array();
+        }
+
+        private static byte[] okFrame(long wireSeq, long seqTxn) {
+            byte[] name = TABLE.getBytes(StandardCharsets.UTF_8);
+            ByteBuffer bb = ByteBuffer.allocate(1 + 8 + 2 + 2 + name.length + 8)
+                    .order(ByteOrder.LITTLE_ENDIAN);
+            bb.put((byte) 0x00); // STATUS_OK
+            bb.putLong(wireSeq);
+            bb.putShort((short) 1); // tableCount
+            bb.putShort((short) name.length);
+            bb.put(name);
+            bb.putLong(seqTxn);
+            return bb.array();
+        }
+    }
+
+    /** Records persistent-failure escalations; the outage must produce none. */
+    private static final class CountingListener implements BackgroundDrainerListener {
+        final AtomicInteger persistentFailures = new AtomicInteger();
+
+        @Override
+        public void onDurableAckPersistentFailure(String slotPath, int totalAttempts, long elapsedMillis) {
+            persistentFailures.incrementAndGet();
+        }
+
+        @Override
+        public void onDurableAckUnavailable(String slotPath, int attemptNumber) {
+            // transport errors never fire this; nothing to record
+        }
+    }
+
+    /**
+     * Real-wire connect factory: every call performs a genuine TCP connect +
+     * WebSocket upgrade against the fixed loopback port -- refused while the
+     * server is down, a live upgraded client once it is up. Exactly the client
+     * the production connect walk would hand back.
+     */
+    private static final class WireFactory implements CursorWebSocketSendLoop.ReconnectFactory {
+        private final AtomicInteger calls = new AtomicInteger();
+        private final int port;
+
+        WireFactory(int port) {
+            this.port = port;
+        }
+
+        int attempts() {
+            return calls.get();
+        }
+
+        @Override
+        public WebSocketClient reconnect() throws Exception {
+            calls.incrementAndGet();
+            WebSocketClient c = WebSocketClientFactory.newPlainTextInstance();
+            try {
+                c.setQwpMaxVersion(1);
+                c.setQwpRequestDurableAck(true);
+                c.setConnectTimeout(5_000);
+                c.connect("localhost", port);
+                c.upgrade("/write/v4", 5_000, null);
+            } catch (Throwable t) {
+                c.close();
+                throw t;
+            }
+            return c;
+        }
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CloseOwnershipRaceTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CloseOwnershipRaceTest.java
index f4cbffd1..e04bf15c 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CloseOwnershipRaceTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CloseOwnershipRaceTest.java
@@ -24,6 +24,7 @@
 
 package io.questdb.client.test.cutlass.qwp.client.sf.cursor;
 
+import io.questdb.client.cutlass.qwp.client.QwpAuthFailedException;
 import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine;
 import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop;
 import org.junit.Assert;
@@ -59,16 +60,19 @@ public void closeOwnershipSnapshotNeverClaimsAnUnsurfacedError() {
                 sfDir.getRoot().getAbsolutePath(), 16_384)) {
             Throwable leaked = null;
             for (int i = 0; i < ROUNDS && leaked == null; i++) {
-                // A null client, a reconnect factory that never produces one,
-                // and a zero reconnect budget: start()'s real I/O thread walks
-                // the production async-initial-connect path and latches a
-                // genuine RECONNECT_BUDGET_EXHAUSTED terminal within
-                // microseconds. One authentic null->error latch transition
-                // per round.
+                // A null client and a reconnect factory that throws a genuine
+                // terminal auth rejection: start()'s real I/O thread walks the
+                // production async-initial-connect path and latches a genuine
+                // SECURITY_ERROR terminal within microseconds. One authentic
+                // null->error latch transition per round. (Under Invariant B a
+                // connection error or an exhausted budget retries forever and
+                // never latches; only a genuine terminal such as auth does.)
                 CursorWebSocketSendLoop loop = new CursorWebSocketSendLoop(
                         null, engine, 0, 1_000_000L,
-                        () -> null,
-                        0,  // reconnect budget: exhausted on arrival
+                        () -> {
+                            throw new QwpAuthFailedException(401, "localhost", 1);
+                        },
+                        0,
                         1, 1);
                 loop.start();
                 // Race close()'s exact ownership snapshot against the latch
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopInterruptedCloseLeakTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopInterruptedCloseLeakTest.java
new file mode 100644
index 00000000..d0e1a8d2
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopInterruptedCloseLeakTest.java
@@ -0,0 +1,208 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client.sf.cursor;
+
+import io.questdb.client.DefaultHttpClientConfiguration;
+import io.questdb.client.cutlass.http.client.WebSocketClient;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorSendEngine;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop;
+import io.questdb.client.network.PlainSocketFactory;
+import io.questdb.client.std.Compat;
+import io.questdb.client.test.tools.TestUtils;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicReference;
+
+/**
+ * Red test for finding C5 — interrupted drainer teardown abandons a client
+ * installed by an in-flight reconnect.
+ * <p>
+ * Production sequence being modeled: during a server outage an orphan
+ * drainer's {@link CursorWebSocketSendLoop} I/O thread sits inside a
+ * blocking native connect ({@code connect_timeout} defaults to 0 = OS
+ * timeout, tens of seconds; neither {@code unpark} nor interrupt cancels
+ * {@code connect(2)}). Under Invariant B the drainer no longer exits on a
+ * wall-clock budget, so {@code BackgroundDrainerPool.close()} routinely
+ * escalates: 2500&nbsp;ms graceful drain &rarr; {@code requestStop()} &rarr;
+ * 500&nbsp;ms grace &rarr; {@code shutdownNow()}. The {@code shutdownNow()}
+ * interrupt lands in {@code loop.close()}'s {@code shutdownLatch.await()};
+ * pre-fix, {@code close()} swallows the {@link InterruptedException},
+ * re-interrupts, and returns while the I/O thread is still alive. When the
+ * in-flight {@code reconnect()} subsequently succeeds, {@code swapClient}
+ * installs the live client into the abandoned loop — and no code path ever
+ * closes it: {@code loop.close()} already ran (its {@code client} read saw
+ * null), and {@code ioLoop}'s exit path only counts down the latch. The
+ * client's native socket, fds and buffers leak for the life of the process.
+ * <p>
+ * The test pins the fix-agnostic ownership contract, not a fix strategy:
+ * <b>every {@code WebSocketClient} the loop obtains — via constructor or
+ * factory — must be closed by the time the loop is quiescent (I/O thread
+ * exited, {@code close()} completed or failed loudly).</b> Any of the
+ * candidate fixes satisfies it: (a) re-awaiting the shutdown latch in a
+ * loop (close() then picks up the swapped client), (b) closing the current
+ * client in {@code ioLoop}'s exit path, or (c) {@code connectLoop}
+ * discarding-and-closing a factory client obtained after {@code running}
+ * went false. A guard-only fix that merely skips engine teardown (the
+ * SEGV half of C5) correctly leaves this test red — the leak is a distinct
+ * defect.
+ * <p>
+ * Determinism notes: no sleeps or timing races. The interrupt is injected
+ * by pre-setting the closer thread's interrupt flag —
+ * {@code CountDownLatch.await()} checks {@code Thread.interrupted()} before
+ * parking, so the swallow path is entered on the first call. The "stuck
+ * native connect" is a factory blocked on a test latch, which is faithful:
+ * a latch await re-parks after {@code close()}'s spurious {@code unpark},
+ * and {@code close()} never interrupts the I/O thread.
+ */
+public class CursorWebSocketSendLoopInterruptedCloseLeakTest {
+
+    @Test
+    public void testC5_interruptedCloseMustNotLeakClientInstalledByInFlightReconnect() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            final CountDownLatch enteredReconnect = new CountDownLatch(1);
+            final CountDownLatch releaseConnect = new CountDownLatch(1);
+            final AtomicReference<Thread> ioThreadRef = new AtomicReference<>();
+            final TrackingStubWebSocketClient liveClient = new TrackingStubWebSocketClient();
+
+            // Stand-in for a blocking native connect(2): entered by the loop's
+            // I/O thread, immune to unpark, never interrupted by loop.close().
+            // Returns a live client once released — the "reconnect succeeds
+            // mid-teardown" arm of C5.
+            final CursorWebSocketSendLoop.ReconnectFactory stuckConnect = () -> {
+                ioThreadRef.set(Thread.currentThread());
+                enteredReconnect.countDown();
+                releaseConnect.await();
+                return liveClient;
+            };
+
+            final CursorSendEngine engine = new CursorSendEngine(null, 64 * 1024);
+            try {
+                CursorWebSocketSendLoop loop = new CursorWebSocketSendLoop(
+                        null /* async-initial-connect: the I/O thread drives the connect */,
+                        engine, 0L, 1_000L,
+                        stuckConnect,
+                        5_000L, 100L, 5_000L, false);
+                loop.start();
+                Assert.assertTrue("I/O thread never reached the reconnect factory",
+                        enteredReconnect.await(5, TimeUnit.SECONDS));
+
+                // Drainer-thread stand-in: BackgroundDrainer.run()'s finally calls
+                // loop.close(), and shutdownNow()'s interrupt lands in the latch
+                // await. Pre-setting the flag makes that deterministic.
+                final AtomicReference<Throwable> closeFailure = new AtomicReference<>();
+                Thread closer = new Thread(() -> {
+                    Thread.currentThread().interrupt();
+                    try {
+                        loop.close();
+                    } catch (Throwable t) {
+                        // A close() that THROWS to signal the failed stop is a
+                        // valid fix shape (QwpWebSocketSender.close()'s
+                        // ioThreadStopped guard consumes exactly that signal).
+                        // The ownership assertion below is what must hold.
+                        closeFailure.set(t);
+                    }
+                }, "drainer-close-stand-in");
+                closer.setDaemon(true);
+                closer.start();
+
+                // close()'s first action is running=false. Once observable, the
+                // teardown is underway and the "connect" may complete. Under a
+                // re-await fix the closer is still blocked inside close() here,
+                // so the gate must open before joining it (fix-agnostic order).
+                long deadlineNanos = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
+                while (loop.isRunning()) {
+                    Assert.assertTrue("close() never started", System.nanoTime() < deadlineNanos);
+                    Compat.onSpinWait();
+                }
+                releaseConnect.countDown();
+
+                closer.join(5_000L);
+                Assert.assertFalse("closer thread did not finish", closer.isAlive());
+                Thread ioThread = ioThreadRef.get();
+                Assert.assertNotNull(ioThread);
+                ioThread.join(5_000L);
+                Assert.assertFalse("I/O thread did not exit after the connect returned",
+                        ioThread.isAlive());
+
+                // Loop is quiescent. Capture the verdict BEFORE any cleanup so
+                // the test's own close calls cannot mask the leak.
+                boolean closedByLoop = liveClient.closeCount() > 0;
+
+                Assert.assertTrue(
+                        "C5: the WebSocketClient handed to the loop by an in-flight "
+                                + "reconnect() was never closed. loop.close() swallowed the "
+                                + "InterruptedException from shutdownLatch.await() and returned "
+                                + "while the I/O thread was still inside the blocking connect; "
+                                + "swapClient then installed the live client into the abandoned "
+                                + "loop where nothing closes it — its native socket and fds leak "
+                                + "past drainer teardown. Every client the loop obtains "
+                                + "(constructor or factory) must be closed by the time the loop "
+                                + "is quiescent.",
+                        closedByLoop);
+            } finally {
+                liveClient.close();
+                engine.close();
+            }
+        });
+    }
+
+    /**
+     * Minimal concrete {@link WebSocketClient} — never performs I/O; counts
+     * {@code close()} calls so the test can assert ownership at quiescence.
+     * Close remains idempotent via the superclass, matching the production
+     * contract owners rely on.
+     */
+    private static final class TrackingStubWebSocketClient extends WebSocketClient {
+        private final AtomicInteger closeCount = new AtomicInteger();
+
+        TrackingStubWebSocketClient() {
+            super(DefaultHttpClientConfiguration.INSTANCE, PlainSocketFactory.INSTANCE);
+        }
+
+        @Override
+        public void close() {
+            closeCount.incrementAndGet();
+            super.close();
+        }
+
+        int closeCount() {
+            return closeCount.get();
+        }
+
+        @Override
+        protected void ioWait(int timeout, int op) {
+            throw new UnsupportedOperationException("stub: no socket");
+        }
+
+        @Override
+        protected void setupIoWait() {
+            // no-op
+        }
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopJvmErrorTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopJvmErrorTest.java
new file mode 100644
index 00000000..42e91748
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoopJvmErrorTest.java
@@ -0,0 +1,184 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.cutlass.qwp.client.sf.cursor;
+
+import io.questdb.client.cutlass.line.LineSenderException;
+import io.questdb.client.cutlass.qwp.client.sf.cursor.CursorWebSocketSendLoop;
+import io.questdb.client.std.Unsafe;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.lang.reflect.Field;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+/**
+ * Regression coverage (M3): {@code catch (Throwable)} in the reconnect
+ * machinery used to swallow {@link java.lang.Error} (OOM, LinkageError,
+ * StackOverflowError) into an indefinite "transport outage" retry with only
+ * a throttled, possibly-null-message WARN as a trace. A JVM/programming
+ * failure is not a transport outage -- retrying cannot clear it -- so every
+ * retry loop must rethrow {@code Error}, after latching it as terminal where
+ * a producer could otherwise hang in {@code checkError()}.
+ * <p>
+ * Uses the same {@code Unsafe.allocateInstance} bare-loop pattern as
+ * {@link CursorWebSocketSendLoopErrorLatchTest}: the retry loops only touch
+ * the fields wired below, so no live wire client or engine is needed.
+ */
+public class CursorWebSocketSendLoopJvmErrorTest {
+
+    @Test
+    public void testConnectWithRetryPropagatesJvmError() {
+        // The budgeted blocking initial-connect helper must not burn the
+        // connect budget retrying a JVM Error; it propagates to the caller
+        // (the producer thread in buildAndConnect) on the first attempt.
+        AtomicInteger attempts = new AtomicInteger();
+        try {
+            CursorWebSocketSendLoop.connectWithRetry(
+                    () -> {
+                        attempts.incrementAndGet();
+                        throw new LinkageError("simulated JVM failure");
+                    },
+                    /* maxDurationMillis */ 60_000L,
+                    /* initialBackoffMillis */ 1L,
+                    /* maxBackoffMillis */ 4L,
+                    "test initial connect");
+            Assert.fail("a JVM Error must propagate, not consume the connect budget");
+        } catch (LinkageError expected) {
+            Assert.assertEquals("simulated JVM failure", expected.getMessage());
+        }
+        Assert.assertEquals("no retry on a JVM Error", 1, attempts.get());
+    }
+
+    @Test
+    public void testConnectLoopPropagatesJvmErrorAndLatchesTerminal() throws Exception {
+        // The background per-outage reconnect loop must (1) latch the Error
+        // as terminal FIRST -- a producer parked in checkError() would
+        // otherwise never observe the failure -- and (2) rethrow so the I/O
+        // thread dies loudly instead of reconnect-looping forever.
+        CursorWebSocketSendLoop loop = newBareLoop();
+        AtomicInteger attempts = new AtomicInteger();
+        wireReconnectPlumbing(loop, attempts);
+
+        Method connectLoop = CursorWebSocketSendLoop.class.getDeclaredMethod(
+                "connectLoop", Throwable.class, String.class);
+        connectLoop.setAccessible(true);
+        try {
+            connectLoop.invoke(loop, new LineSenderException("initial wire failure"), "reconnect");
+            Assert.fail("a JVM Error must escape connectLoop, not be retried");
+        } catch (InvocationTargetException ite) {
+            Assert.assertTrue("expected LinkageError, got " + ite.getCause(),
+                    ite.getCause() instanceof LinkageError);
+        }
+        Assert.assertEquals("no retry on a JVM Error", 1, attempts.get());
+        assertErrorLatchedAndStopped(loop);
+    }
+
+    @Test
+    public void testIoLoopDoesNotFunnelJvmErrorIntoReconnect() throws Exception {
+        // ioLoop's catch (Throwable) used to funnel EVERYTHING into
+        // fail(t) -> connectLoop(t, "reconnect"). An Error must instead be
+        // latched as terminal and rethrown; the finally still counts down
+        // the shutdown latch so close() cannot hang on the dead thread.
+        CursorWebSocketSendLoop loop = newBareLoop();
+        AtomicInteger attempts = new AtomicInteger();
+        wireReconnectPlumbing(loop, attempts);
+        CountDownLatch shutdownLatch = new CountDownLatch(1);
+        setField(loop, "shutdownLatch", shutdownLatch);
+        // client == null + running routes ioLoop into attemptInitialConnect
+        // -> connectLoop -> the throwing factory, exercising the full funnel.
+
+        Method ioLoop = CursorWebSocketSendLoop.class.getDeclaredMethod("ioLoop");
+        ioLoop.setAccessible(true);
+        try {
+            ioLoop.invoke(loop);
+            Assert.fail("a JVM Error must escape ioLoop, not re-enter the reconnect loop");
+        } catch (InvocationTargetException ite) {
+            Assert.assertTrue("expected LinkageError, got " + ite.getCause(),
+                    ite.getCause() instanceof LinkageError);
+        }
+        Assert.assertEquals("no retry on a JVM Error", 1, attempts.get());
+        Assert.assertEquals("shutdown latch must count down so close() cannot hang",
+                0L, shutdownLatch.getCount());
+        assertErrorLatchedAndStopped(loop);
+    }
+
+    private static void assertErrorLatchedAndStopped(CursorWebSocketSendLoop loop)
+            throws Exception {
+        Throwable terminal = loop.getTerminalError();
+        Assert.assertNotNull("Error must be latched as terminal for checkError()", terminal);
+        Assert.assertTrue("latch wraps the raw Error once",
+                terminal instanceof LineSenderException);
+        Assert.assertTrue("latched cause must be the original Error",
+                terminal.getCause() instanceof LinkageError);
+        Assert.assertFalse("recordFatal must stop the loop",
+                (Boolean) getField(loop, "running"));
+        try {
+            loop.checkError();
+            Assert.fail("producer-facing checkError must surface the latched terminal");
+        } catch (LineSenderException thrown) {
+            Assert.assertSame(terminal, thrown);
+        }
+    }
+
+    /**
+     * Wires the minimal state the reconnect paths dereference: a factory
+     * throwing {@link LinkageError} on every attempt, live {@code running},
+     * and the attempt counters (field initializers do not run under
+     * {@code Unsafe.allocateInstance}).
+     */
+    private static void wireReconnectPlumbing(CursorWebSocketSendLoop loop,
+                                              AtomicInteger attempts) throws Exception {
+        CursorWebSocketSendLoop.ReconnectFactory factory = () -> {
+            attempts.incrementAndGet();
+            throw new LinkageError("simulated JVM failure");
+        };
+        setField(loop, "reconnectFactory", factory);
+        setField(loop, "running", true);
+        setField(loop, "totalReconnectAttempts", new AtomicLong());
+        setField(loop, "totalReconnects", new AtomicLong());
+    }
+
+    private static CursorWebSocketSendLoop newBareLoop() throws Exception {
+        // Bypass the real constructor -- no wire client or engine needed.
+        return (CursorWebSocketSendLoop) Unsafe.getUnsafe()
+                .allocateInstance(CursorWebSocketSendLoop.class);
+    }
+
+    private static Object getField(Object target, String name) throws Exception {
+        Field f = CursorWebSocketSendLoop.class.getDeclaredField(name);
+        f.setAccessible(true);
+        return f.get(target);
+    }
+
+    private static void setField(Object target, String name, Object value) throws Exception {
+        Field f = CursorWebSocketSendLoop.class.getDeclaredField(name);
+        f.setAccessible(true);
+        f.set(target, value);
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/EngineCloseSlotLockReleaseTest.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/EngineCloseSlotLockReleaseTest.java
index 19b51848..804ab6d3 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/EngineCloseSlotLockReleaseTest.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/client/sf/cursor/EngineCloseSlotLockReleaseTest.java
@@ -135,6 +135,14 @@ public void testSlotLockReleasedEvenIfRingCloseThrows() throws Exception {
             managerField.setAccessible(true);
             SegmentManager capturedManager = (SegmentManager) managerField.get(engine);
 
+            // The watermark's 16-byte mmap is also unreachable to the sabotaged
+            // close() (it NPEs before getting there), so capture and free it
+            // manually too or the leak check trips on MMAP_DEFAULT.
+            Field watermarkField = CursorSendEngine.class.getDeclaredField("watermark");
+            watermarkField.setAccessible(true);
+            io.questdb.client.cutlass.qwp.client.sf.cursor.AckWatermark capturedWatermark =
+                    (io.questdb.client.cutlass.qwp.client.sf.cursor.AckWatermark) watermarkField.get(engine);
+
             ringField.set(engine, null);
 
             try {
@@ -150,6 +158,9 @@ public void testSlotLockReleasedEvenIfRingCloseThrows() throws Exception {
             // are an artifact of the sabotage.
             capturedRing.close();
             capturedManager.close();
+            if (capturedWatermark != null) {
+                capturedWatermark.close();
+            }
 
             // The user-visible test: can a fresh SlotLock acquire the
             // same slot? If the original lock fd is still held, the
diff --git a/core/src/test/java/io/questdb/client/test/cutlass/qwp/websocket/TestWebSocketServer.java b/core/src/test/java/io/questdb/client/test/cutlass/qwp/websocket/TestWebSocketServer.java
index 806d3750..e9f380d2 100644
--- a/core/src/test/java/io/questdb/client/test/cutlass/qwp/websocket/TestWebSocketServer.java
+++ b/core/src/test/java/io/questdb/client/test/cutlass/qwp/websocket/TestWebSocketServer.java
@@ -83,12 +83,25 @@ public class TestWebSocketServer implements Closeable {
     // QwpQueryClient tests enable this; ingress sender tests leave it off so their
     // connections carry only ACK frames.
     private volatile boolean sendServerInfo;
+    // When true, the server fails the WebSocket upgrade on the egress read path
+    // (/read...) by dropping the connection before the 101, while still serving
+    // the ingest write path (/write...) normally. Lets one server + one cluster
+    // config drive a build where the sender pool connects but the query pool
+    // cannot. Set via setRejectReadUpgrade().
+    private volatile boolean rejectReadUpgrade;
     // When non-null the next handshake responds with HTTP 421 Misdirected
     // Request + X-QuestDB-Role: <rejectingRole>, mimicking a server whose
     // QwpServerInfoProvider reports REPLICA / PRIMARY_CATCHUP. Set after
     // construction via setRejectWithRole().
     private volatile String rejectingRole;
     private volatile int rejectingStatusCode;
+    // When true, 101 upgrade responses omit the X-QWP-Durable-Ack header even
+    // though the server was constructed with emitDurableAckHeader=true --
+    // simulating a rolling-upgrade window where an endpoint upgrades but does
+    // not advertise durable ack (the drainer's capability-gap condition).
+    // Live-updatable via setSuppressDurableAckHeader(), so a test can start
+    // in the gap and later let the cluster "settle".
+    private volatile boolean suppressDurableAckHeader;
     // When > 0, the next handshake responds with this status code + the
     // reason phrase from {@link #rejectingStatusReason}. Used to simulate
     // 401, 403, 404, 426, 503, etc. that the failover loop should
@@ -120,6 +133,23 @@ public TestWebSocketServer(WebSocketServerHandler handler, boolean emitDurableAc
      */
     public TestWebSocketServer(WebSocketServerHandler handler,
                                boolean emitDurableAckHeader, String advertisedRole) throws IOException {
+        this(handler, emitDurableAckHeader, advertisedRole, 0);
+    }
+
+    /**
+     * @param requestedPort loopback port to bind, or {@code 0} for an
+     *                      OS-assigned ephemeral port. A caller-chosen port
+     *                      lets a test model a server that goes DOWN and later
+     *                      comes back UP on the SAME endpoint (down-then-up
+     *                      outage realism): allocate via
+     *                      {@code TestPorts.findUnusedPort()}, let the client
+     *                      bang on the refused port, then bind here. Carries
+     *                      the standard bind-close-reuse exposure every
+     *                      pre-selected-port test in this suite accepts.
+     */
+    public TestWebSocketServer(WebSocketServerHandler handler,
+                               boolean emitDurableAckHeader, String advertisedRole,
+                               int requestedPort) throws IOException {
         this.handler = handler;
         this.emitDurableAckHeader = emitDurableAckHeader;
         this.advertisedRole = advertisedRole;
@@ -129,7 +159,7 @@ public TestWebSocketServer(WebSocketServerHandler handler,
         // which another process could grab a pre-selected port before start()
         // binds it. Pinning to loopback keeps client "localhost" connections
         // routed here rather than to a wildcard listener on the same port.
-        serverSocket = new ServerSocket(0, 50, java.net.InetAddress.getLoopbackAddress());
+        serverSocket = new ServerSocket(requestedPort, 50, java.net.InetAddress.getLoopbackAddress());
         serverSocket.setSoTimeout(100);
         this.port = serverSocket.getLocalPort();
     }
@@ -208,6 +238,18 @@ public void setRejectWithRole(String role) {
         this.rejectingRole = role;
     }
 
+    /**
+     * When enabled, the server fails the WebSocket upgrade on the egress read
+     * path ({@code /read/...}) while still serving the ingest write path
+     * ({@code /write/...}) normally. This lets a single server, addressed by a
+     * single cluster config, accept ingest senders but reject query clients --
+     * e.g. to exercise build()'s unwind of an already-built sender pool when the
+     * query pool fails.
+     */
+    public void setRejectReadUpgrade(boolean rejectReadUpgrade) {
+        this.rejectReadUpgrade = rejectReadUpgrade;
+    }
+
     /**
      * Configure the server to reject the next handshake with an arbitrary
      * HTTP status code (e.g. 401, 403, 404, 426, 503). Pass {@code 0} to
@@ -219,11 +261,26 @@ public void setRejectWithStatus(int statusCode, String reasonPhrase) {
         this.rejectingStatusReason = reasonPhrase;
     }
 
+    /**
+     * When enabled, 101 upgrade responses omit the {@code X-QWP-Durable-Ack}
+     * header even on a server constructed with {@code emitDurableAckHeader=true},
+     * so each new opted-in connect ({@code request_durable_ack=on}) observes a
+     * durable-ack capability gap. Pass {@code false} to clear and resume
+     * advertising, the way a rolling upgrade eventually settles. The setting
+     * applies to every new handshake until cleared.
+     */
+    public void setSuppressDurableAckHeader(boolean suppressDurableAckHeader) {
+        this.suppressDurableAckHeader = suppressDurableAckHeader;
+    }
+
     /**
      * When enabled, the server sends a {@code SERVER_INFO} frame immediately
-     * after a successful 101 upgrade, the way a real egress endpoint does. The
-     * advertised role follows {@link #setAdvertisedRole}, defaulting to
-     * {@code STANDALONE}. Leave disabled for ingress (Sender) tests.
+     * after a successful 101 upgrade on the egress read path ({@code /read/...}),
+     * the way a real egress endpoint does. Ingest write-path ({@code /write/...})
+     * connections never receive it -- their ACK-only response stream would choke
+     * on an unexpected frame -- so one server can serve both an ingest and a
+     * query pool from a single cluster config. The advertised role follows
+     * {@link #setAdvertisedRole}, defaulting to {@code STANDALONE}.
      */
     public void setSendServerInfo(boolean sendServerInfo) {
         this.sendServerInfo = sendServerInfo;
@@ -251,6 +308,10 @@ private static byte[] buildServerInfoFrame(byte role) {
         return bb.array();
     }
 
+    private static boolean isReadPath(String path) {
+        return path != null && path.startsWith("/read");
+    }
+
     private static byte roleByte(String role) {
         if (role == null) {
             return 0; // ROLE_STANDALONE
@@ -313,6 +374,10 @@ public class ClientHandler implements Closeable {
         private boolean isClosed;
         private OutputStream out;
         private Thread readThread;
+        // Request path from the WebSocket upgrade GET line (e.g. /write/v4,
+        // /read/v1). Captured during the handshake so the post-upgrade logic can
+        // distinguish ingest from egress connections.
+        private String requestPath = "";
 
         ClientHandler(Socket socket) {
             this.socket = socket;
@@ -459,7 +524,15 @@ private boolean performHandshake() throws IOException {
             }
 
             String key = null;
-            for (String line : request.toString().split("\r\n")) {
+            String[] lines = request.toString().split("\r\n");
+            if (lines.length > 0) {
+                // GET <path> HTTP/1.1
+                String[] parts = lines[0].split(" ");
+                if (parts.length >= 2) {
+                    requestPath = parts[1];
+                }
+            }
+            for (String line : lines) {
                 if (line.toLowerCase().startsWith("sec-websocket-key:")) {
                     key = line.substring(18).trim();
                     break;
@@ -470,6 +543,13 @@ private boolean performHandshake() throws IOException {
                 return false;
             }
 
+            // Read-path reject: drop the egress upgrade before the 101 so the
+            // query pool's connect fails fast, while ingest write-path upgrades
+            // still complete on this same server.
+            if (rejectReadUpgrade && isReadPath(requestPath)) {
+                return false;
+            }
+
             // Arbitrary-status reject path: tests use setRejectWithStatus
             // to drive the failover loop's terminal-vs-transient
             // classification (failover.md §6).
@@ -509,7 +589,7 @@ private boolean performHandshake() throws IOException {
                     .append("Upgrade: websocket\r\n")
                     .append("Connection: Upgrade\r\n")
                     .append("Sec-WebSocket-Accept: ").append(acceptKey).append("\r\n");
-            if (emitDurableAckHeader) {
+            if (emitDurableAckHeader && !suppressDurableAckHeader) {
                 sb.append("X-QWP-Durable-Ack: enabled\r\n");
             }
             String role = advertisedRole;
@@ -566,7 +646,11 @@ void start() {
                     liveConnections.incrementAndGet();
 
                     try {
-                        if (sendServerInfo) {
+                        // SERVER_INFO is an egress-only frame: send it only on a
+                        // read-path (query) connection. An ingest write-path
+                        // connection parses every inbound frame as an ACK and
+                        // would fail on it.
+                        if (sendServerInfo && isReadPath(requestPath)) {
                             sendBinary(buildServerInfoFrame(roleByte(advertisedRole)));
                         }
 
diff --git a/core/src/test/java/io/questdb/client/test/example/QuestDBExamples.java b/core/src/test/java/io/questdb/client/test/example/QuestDBExamples.java
index bd3e944a..1aa681f4 100644
--- a/core/src/test/java/io/questdb/client/test/example/QuestDBExamples.java
+++ b/core/src/test/java/io/questdb/client/test/example/QuestDBExamples.java
@@ -44,11 +44,11 @@
 public class QuestDBExamples {
 
     public static void main(String[] args) throws Exception {
-        // 1. Connect with a single configuration string. Both sides run over
-        //    QWP/WebSocket, so one ws:: string configures ingest and egress.
-        try (QuestDB db = QuestDB.connect("ws::addr=localhost:9000;")) {
+        // 1. Connect with a single configuration string for the whole cluster.
+        //    Both sides run over QWP/WebSocket, so one ws:: string configures
+        //    ingest and egress; list every node in one addr server list.
+        try (QuestDB db = QuestDB.connect("ws::addr=node1:9000,node2:9000,node3:9000;")) {
             ingestWithBorrowedSender(db);
-            ingestWithThreadAffineSender(db);
             queryOneShot(db);
             queryWithBinds(db);
             cancelExample(db);
@@ -59,21 +59,24 @@ public static void main(String[] args) throws Exception {
         try (QuestDB db = QuestDB.connect(
                 "wss::addr=db.questdb.cloud:9000;token=YOUR_TOKEN_HERE;")) {
             // ... use db ...
-            db.executeSql("SELECT 1", new PrintingHandler()).await();
+            try (Query q = db.borrowQuery()) {
+                q.sql("SELECT 1").handler(new PrintingHandler()).submit().await();
+            }
         }
 
-        // 3. Custom pool sizing and timeouts via the builder. Use this when
-        //    ingest and egress use separate address lists, or when you need to
-        //    override defaults.
+        // 3. Custom pool sizing and timeouts via the builder. One cluster config
+        //    (a single addr server list) drives both pools; use the builder to
+        //    override pool/timeout defaults.
         try (QuestDB db = QuestDB.builder()
-                .ingestConfig("ws::addr=ingest.cluster:9000;")
-                .queryConfig("ws::addr=read-replica.cluster:9000;")
+                .fromConfig("ws::addr=node1.cluster:9000,node2.cluster:9000;")
                 .senderPoolSize(8)
                 .queryPoolSize(4)
                 .acquireTimeoutMillis(10_000)
                 .build()) {
             // ... use db ...
-            db.executeSql("SELECT 1", new PrintingHandler()).await();
+            try (Query q = db.borrowQuery()) {
+                q.sql("SELECT 1").handler(new PrintingHandler()).submit().await();
+            }
         }
     }
 
@@ -84,15 +87,17 @@ public static void main(String[] args) throws Exception {
      * returns normally; either way the Completion reaches a terminal state.
      */
     static void cancelExample(QuestDB db) {
-        Completion c = db.executeSql(
-                "SELECT * FROM big_table ORDER BY ts",
-                new PrintingHandler());
-        // ... some condition decides to abort ...
-        c.cancel();
-        try {
-            c.await();
-        } catch (Exception cancelled) {
-            // expected when cancel won the race
+        try (Query q = db.borrowQuery()) {
+            Completion c = q.sql("SELECT * FROM big_table ORDER BY ts")
+                    .handler(new PrintingHandler())
+                    .submit();
+            // ... some condition decides to abort ...
+            c.cancel();
+            try {
+                c.await();
+            } catch (Exception cancelled) {
+                // expected when cancel won the race
+            }
         }
     }
 
@@ -113,62 +118,42 @@ static void ingestWithBorrowedSender(QuestDB db) {
     }
 
     /**
-     * Thread-affine Sender: the first call on a thread leases one and pins it;
-     * subsequent calls on the same thread return the same instance with zero
-     * borrow overhead. Best for long-lived dedicated producer threads.
-     * <p>
-     * Call {@link QuestDB#releaseSender()} on threads borrowed from pools you
-     * don't own (Netty event loops, etc.) before they're recycled.
-     */
-    static void ingestWithThreadAffineSender(QuestDB db) {
-        Sender s = db.sender();
-        for (int i = 0; i < 1_000; i++) {
-            s.table("trades")
-                    .symbol("symbol", "BTC-USD")
-                    .doubleColumn("price", 42_500.50 + i)
-                    .longColumn("size", 100)
-                    .atNow();
-        }
-        s.flush();
-        // Not strictly required: db.close() reaps pinned Senders. Call it
-        // only when handing this thread back to a foreign pool.
-        // db.releaseSender();
-    }
-
-    /**
-     * One-shot query, no bind parameters. {@link QuestDB#executeSql} returns
-     * a {@link Completion} that you can {@code await()} synchronously, time
+     * One-shot query, no bind parameters. Borrow a {@link Query} handle,
+     * submit, await, and close it (try-with-resources). {@code submit()}
+     * returns a {@link Completion} you can {@code await()} synchronously, time
      * out on, or cancel.
      */
     static void queryOneShot(QuestDB db) throws InterruptedException {
-        Completion c = db.executeSql(
-                "SELECT price FROM trades WHERE symbol = 'BTC-USD' LIMIT 10",
-                new PrintingHandler());
-        c.await();
+        try (Query q = db.borrowQuery()) {
+            q.sql("SELECT price FROM trades WHERE symbol = 'BTC-USD' LIMIT 10")
+                    .handler(new PrintingHandler())
+                    .submit()
+                    .await();
+        }
     }
 
     /**
-     * Query with bind parameters. Use {@link QuestDB#query()} to get the
-     * per-thread Query builder, then set SQL, binds (via QwpBindSetter), and
-     * handler.
+     * Query with bind parameters. Borrow a {@link Query} handle, then set SQL,
+     * binds (via QwpBindSetter), and handler.
      * <p>
      * The same SQL text reuses the server's compiled-factory cache -- bind
      * values supply the per-call inputs. Interpolating values into the SQL
      * string defeats that cache.
      */
     static void queryWithBinds(QuestDB db) throws InterruptedException {
-        Query q = db.query()
-                .sql("SELECT price FROM trades WHERE symbol = $1 LIMIT $2")
-                .binds(binds -> {
-                    binds.setVarchar(0, "BTC-USD");
-                    binds.setLong(1, 10L);
-                })
-                .handler(new PrintingHandler());
-        Completion c = q.submit();
-        // Optional timeout: returns false if the query is still in flight.
-        if (!c.await(5, TimeUnit.SECONDS)) {
-            c.cancel();
-            c.await();
+        try (Query q = db.borrowQuery()) {
+            q.sql("SELECT price FROM trades WHERE symbol = $1 LIMIT $2")
+                    .binds(binds -> {
+                        binds.setVarchar(0, "BTC-USD");
+                        binds.setLong(1, 10L);
+                    })
+                    .handler(new PrintingHandler());
+            Completion c = q.submit();
+            // Optional timeout: returns false if the query is still in flight.
+            if (!c.await(5, TimeUnit.SECONDS)) {
+                c.cancel();
+                c.await();
+            }
         }
     }
 
diff --git a/core/src/test/java/io/questdb/client/test/impl/ConfigViewTest.java b/core/src/test/java/io/questdb/client/test/impl/ConfigViewTest.java
index 38891719..d8258c3b 100644
--- a/core/src/test/java/io/questdb/client/test/impl/ConfigViewTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/ConfigViewTest.java
@@ -129,6 +129,35 @@ public void testGetLongNonNumericRejected() {
                 "invalid auth_timeout_ms: abc");
     }
 
+    @Test
+    public void testGetBoolAcceptsTrueFalseOnOff() {
+        Assert.assertTrue(view("ws::addr=h:9000;lazy_connect=true;").getBool("lazy_connect", false));
+        Assert.assertTrue(view("ws::addr=h:9000;lazy_connect=on;").getBool("lazy_connect", false));
+        Assert.assertFalse(view("ws::addr=h:9000;lazy_connect=false;").getBool("lazy_connect", true));
+        Assert.assertFalse(view("ws::addr=h:9000;lazy_connect=off;").getBool("lazy_connect", true));
+        // absent key -> caller's default, both polarities
+        Assert.assertTrue(view("ws::addr=h:9000;").getBool("lazy_connect", true));
+        Assert.assertFalse(view("ws::addr=h:9000;").getBool("lazy_connect", false));
+    }
+
+    @Test
+    public void testGetBoolInvalidRejected() {
+        assertParseError("ws::addr=h:9000;lazy_connect=maybe;",
+                v -> v.getBool("lazy_connect", false),
+                "invalid lazy_connect: maybe (expected true, false, on, off)");
+    }
+
+    @Test
+    public void testGetBoolIsCaseSensitive() {
+        // The connect-string value surface is exact-match lowercase: the
+        // tokenizer preserves value case and getBool accepts only
+        // true/false/on/off, so TRUE is rejected loudly rather than silently
+        // coerced (or worse, silently treated as the default).
+        assertParseError("ws::addr=h:9000;lazy_connect=TRUE;",
+                v -> v.getBool("lazy_connect", false),
+                "invalid lazy_connect: TRUE (expected true, false, on, off)");
+    }
+
     @Test
     public void testGetBoolOnOffInvalidRejected() {
         assertParseError("ws::addr=h:9000;failover=maybe;",
diff --git a/core/src/test/java/io/questdb/client/test/impl/PoolConfigHonoredTest.java b/core/src/test/java/io/questdb/client/test/impl/PoolConfigHonoredTest.java
index 34ba4d1a..a8c94e40 100644
--- a/core/src/test/java/io/questdb/client/test/impl/PoolConfigHonoredTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/PoolConfigHonoredTest.java
@@ -28,6 +28,7 @@
 import io.questdb.client.QuestDBBuilder;
 import io.questdb.client.impl.ConfigSchema;
 import io.questdb.client.impl.Side;
+import io.questdb.client.test.tools.TestUtils;
 import org.junit.Assert;
 import org.junit.Test;
 
@@ -43,40 +44,49 @@
 public class PoolConfigHonoredTest {
 
     @Test
-    public void testEveryPoolKeyIsHonored() {
-        // Drive both the value assertions and the drift guard from one map, so the
-        // coverage check cannot drift from what is actually asserted. min=0 keys
-        // let build() resolve the pool keys without pre-warming/connecting. Pool
-        // sizes resolve to int, the timeouts to long (the snapshot's boxed types).
-        Map<String, Object> expected = new LinkedHashMap<>();
-        expected.put("sender_pool_min", 0);
-        expected.put("sender_pool_max", 7);
-        expected.put("query_pool_min", 0);
-        expected.put("query_pool_max", 5);
-        expected.put("acquire_timeout_ms", 1234L);
-        expected.put("idle_timeout_ms", 4321L);
-        expected.put("max_lifetime_ms", 98765L);
-        expected.put("housekeeper_interval_ms", 222L);
+    public void testEveryPoolKeyIsHonored() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            // Drive both the value assertions and the drift guard from one map, so the
+            // coverage check cannot drift from what is actually asserted. min=0 keys
+            // let build() resolve the pool keys without pre-warming/connecting. Pool
+            // sizes resolve to int, the timeouts to long (the snapshot's boxed types).
+            Map<String, Object> expected = new LinkedHashMap<>();
+            expected.put("sender_pool_min", 0);
+            expected.put("sender_pool_max", 7);
+            expected.put("query_pool_min", 0);
+            expected.put("query_pool_max", 5);
+            expected.put("acquire_timeout_ms", 1234L);
+            expected.put("query_close_timeout_ms", 2468L);
+            expected.put("idle_timeout_ms", 4321L);
+            expected.put("max_lifetime_ms", 98765L);
+            expected.put("housekeeper_interval_ms", 222L);
 
-        StringBuilder cfg = new StringBuilder("ws::addr=127.0.0.1:1;");
-        for (Map.Entry<String, Object> e : expected.entrySet()) {
-            cfg.append(e.getKey()).append('=').append(e.getValue()).append(';');
-        }
-        QuestDBBuilder b = QuestDB.builder().fromConfig(cfg.toString());
-        b.build().close();
+            StringBuilder cfg = new StringBuilder("ws::addr=127.0.0.1:1;");
+            for (Map.Entry<String, Object> e : expected.entrySet()) {
+                cfg.append(e.getKey()).append('=').append(e.getValue()).append(';');
+            }
+            QuestDBBuilder b = QuestDB.builder().fromConfig(cfg.toString());
+            b.build().close();
 
-        Map<String, Object> snap = b.poolConfigSnapshotForTest();
-        for (Map.Entry<String, Object> e : expected.entrySet()) {
-            Assert.assertEquals("pool key '" + e.getKey() + "' not honored", e.getValue(), snap.get(e.getKey()));
-        }
+            Map<String, Object> snap = b.poolConfigSnapshotForTest();
+            for (Map.Entry<String, Object> e : expected.entrySet()) {
+                Assert.assertEquals("pool key '" + e.getKey() + "' not honored", e.getValue(), snap.get(e.getKey()));
+            }
 
-        // Drift guard: every POOL registry key must appear in the map that drove
-        // the assertions above, so a new pool key with no assertion trips this.
-        for (ConfigSchema.KeySpec spec : ConfigSchema.all()) {
-            if (spec.side() == Side.POOL) {
-                Assert.assertTrue("registry pool key '" + spec.name() + "' has no honored assertion",
-                        expected.containsKey(spec.name()));
+            // Drift guard: every POOL registry key must appear in the map that drove
+            // the assertions above, so a new pool key with no assertion trips this.
+            for (ConfigSchema.KeySpec spec : ConfigSchema.all()) {
+                if (spec.side() == Side.POOL) {
+                    // lazy_connect is a facade flag (build()'s tolerant-startup
+                    // branch, covered by QuestDBLazyConnectTest), not a numeric
+                    // pool-sizing knob resolved into the snapshot.
+                    if ("lazy_connect".equals(spec.name())) {
+                        continue;
+                    }
+                    Assert.assertTrue("registry pool key '" + spec.name() + "' has no honored assertion",
+                            expected.containsKey(spec.name()));
+                }
             }
-        }
+        });
     }
 }
diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolErrorSafetyTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolErrorSafetyTest.java
index 3994a1d2..3ef9a1b0 100644
--- a/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolErrorSafetyTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolErrorSafetyTest.java
@@ -30,11 +30,10 @@
 import io.questdb.client.impl.QueryWorker;
 import io.questdb.client.std.MemoryTag;
 import io.questdb.client.std.Unsafe;
+import io.questdb.client.test.tools.TestUtils;
 import org.junit.Assert;
 import org.junit.Test;
 
-import java.lang.reflect.Constructor;
-import java.lang.reflect.Method;
 import java.util.concurrent.atomic.AtomicInteger;
 import java.util.function.Consumer;
 
@@ -44,8 +43,8 @@
 // OutOfMemoryError); the old catches let that Error skip cleanup.
 //
 // QwpQueryClient is a concrete class with no fake seam, so these tests inject an
-// Error at the real connect step via the package-private connectHook constructor
-// (reached by reflection -- the main module is declared `open`). fromConfig()
+// Error at the real connect step via the public connectHook
+// constructor. fromConfig()
 // still runs for real, committing the NATIVE_DEFAULT scratch the cleanup must
 // reclaim, so the memory assertions are meaningful.
 public class QueryClientPoolErrorSafetyTest {
@@ -61,22 +60,24 @@ public class QueryClientPoolErrorSafetyTest {
     // GREEN: catch (Throwable) -> client.close() runs -> no leak.
     @Test(timeout = 30_000)
     public void acquireDoesNotLeakNativeScratchOnErrorFromConnect() throws Exception {
-        QueryClientPool pool = newPool(CFG, 0, 1, 250, alwaysThrow());
-        try {
-            long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+        TestUtils.assertMemoryLeak(() -> {
+            QueryClientPool pool = newPool(CFG, 0, 1, 250, alwaysThrow());
             try {
-                pool.acquire();
-                Assert.fail("expected acquire() to propagate the injected Error");
-            } catch (Throwable expected) {
-                // wrapped or raw -- the leak check is the discriminator
+                long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+                try {
+                    pool.acquire();
+                    Assert.fail("expected acquire() to propagate the injected Error");
+                } catch (Throwable expected) {
+                    // wrapped or raw -- the leak check is the discriminator
+                }
+                long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+                Assert.assertEquals(
+                        "acquire() leaked NATIVE_DEFAULT scratch on an Error from connect()",
+                        baseline, after);
+            } finally {
+                pool.close();
             }
-            long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
-            Assert.assertEquals(
-                    "acquire() leaked NATIVE_DEFAULT scratch on an Error from connect()",
-                    baseline, after);
-        } finally {
-            pool.close();
-        }
+        });
     }
 
     // Site: acquire() outer catch around createUnlocked()/start(). An Error must
@@ -86,35 +87,37 @@ public void acquireDoesNotLeakNativeScratchOnErrorFromConnect() throws Exception
     // GREEN: catch (Throwable) -> inFlightCreations restored to 0.
     @Test(timeout = 30_000)
     public void acquireRestoresInFlightCreationsOnErrorFromConnect() throws Exception {
-        QueryClientPool pool = newPool(CFG, 0, 1, 250, alwaysThrow());
-        try {
+        TestUtils.assertMemoryLeak(() -> {
+            QueryClientPool pool = newPool(CFG, 0, 1, 250, alwaysThrow());
             try {
-                pool.acquire();
-                Assert.fail("expected acquire() to propagate the injected Error");
-            } catch (Throwable expected) {
-                // expected
-            }
+                try {
+                    pool.acquire();
+                    Assert.fail("expected acquire() to propagate the injected Error");
+                } catch (Throwable expected) {
+                    // expected
+                }
 
-            Assert.assertEquals(
-                    "acquire() leaked an in-flight creation slot on an Error from connect()",
-                    0, inFlightCreations(pool));
+                Assert.assertEquals(
+                        "acquire() leaked an in-flight creation slot on an Error from connect()",
+                        0, inFlightCreations(pool));
 
-            // Corollary: capacity is usable again -- the next acquire() must
-            // reach the creation path (and fail there) rather than time out.
-            try {
-                pool.acquire();
-                Assert.fail("expected second acquire() to re-attempt creation");
-            } catch (QueryException e) {
-                Assert.assertFalse(
-                        "pool wedged: second acquire() timed out -> capacity permanently lost ("
-                                + e.getMessage() + ")",
-                        e.getMessage() != null && e.getMessage().contains("timed out"));
-            } catch (Throwable injectedAgain) {
-                // also fine: the Error surfaced again from the re-attempt
+                // Corollary: capacity is usable again -- the next acquire() must
+                // reach the creation path (and fail there) rather than time out.
+                try {
+                    pool.acquire();
+                    Assert.fail("expected second acquire() to re-attempt creation");
+                } catch (QueryException e) {
+                    Assert.assertFalse(
+                            "pool wedged: second acquire() timed out -> capacity permanently lost ("
+                                    + e.getMessage() + ")",
+                            e.getMessage() != null && e.getMessage().contains("timed out"));
+                } catch (Throwable injectedAgain) {
+                    // also fine: the Error surfaced again from the re-attempt
+                }
+            } finally {
+                pool.close();
             }
-        } finally {
-            pool.close();
-        }
+        });
     }
 
     // Site: constructor prewarm outer catch. An Error mid-prewarm must run the
@@ -124,25 +127,27 @@ public void acquireRestoresInFlightCreationsOnErrorFromConnect() throws Exceptio
     // GREEN: catch (Throwable) -> cleanup loop closes it -> no leak.
     @Test(timeout = 30_000)
     public void preWarmDoesNotLeakNativeScratchOnErrorFromConnect() throws Exception {
-        long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
-        // First connect() succeeds (no-op, leaves the client unconnected but
-        // built); the second throws an Error mid-prewarm.
-        AtomicInteger calls = new AtomicInteger();
-        Consumer<QwpQueryClient> hook = client -> {
-            if (calls.incrementAndGet() >= 2) {
-                throw new AssertionError("injected native connect failure");
+        TestUtils.assertMemoryLeak(() -> {
+            long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+            // First connect() succeeds (no-op, leaves the client unconnected but
+            // built); the second throws an Error mid-prewarm.
+            AtomicInteger calls = new AtomicInteger();
+            Consumer<QwpQueryClient> hook = client -> {
+                if (calls.incrementAndGet() >= 2) {
+                    throw new AssertionError("injected native connect failure");
+                }
+            };
+            try {
+                newPool(CFG, 2, 2, 250, hook);
+                Assert.fail("expected prewarm to propagate the injected Error");
+            } catch (Throwable expected) {
+                // expected -- construction aborts
             }
-        };
-        try {
-            newPool(CFG, 2, 2, 250, hook);
-            Assert.fail("expected prewarm to propagate the injected Error");
-        } catch (Throwable expected) {
-            // expected -- construction aborts
-        }
-        long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
-        Assert.assertEquals(
-                "prewarm leaked NATIVE_DEFAULT scratch of an already-built worker on an Error",
-                baseline, after);
+            long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+            Assert.assertEquals(
+                    "prewarm leaked NATIVE_DEFAULT scratch of an already-built worker on an Error",
+                    baseline, after);
+        });
     }
 
     // Site: acquire() outer catch around createUnlocked()/start(). When start()
@@ -155,26 +160,28 @@ public void preWarmDoesNotLeakNativeScratchOnErrorFromConnect() throws Exception
     // GREEN: catch calls created.shutdown() -> client.close() -> no leak.
     @Test(timeout = 30_000)
     public void acquireDoesNotLeakNativeScratchOnErrorFromStart() throws Exception {
-        QueryClientPool pool = newPool(CFG, 0, 1, 250, noConnect(), alwaysThrowStart());
-        try {
-            long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+        TestUtils.assertMemoryLeak(() -> {
+            QueryClientPool pool = newPool(CFG, 0, 1, 250, noConnect(), alwaysThrowStart());
             try {
-                pool.acquire();
-                Assert.fail("expected acquire() to propagate the injected start Error");
-            } catch (Throwable expected) {
-                // wrapped or raw -- the leak check is the discriminator
+                long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+                try {
+                    pool.acquire();
+                    Assert.fail("expected acquire() to propagate the injected start Error");
+                } catch (Throwable expected) {
+                    // wrapped or raw -- the leak check is the discriminator
+                }
+                long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+                Assert.assertEquals(
+                        "acquire() leaked NATIVE_DEFAULT scratch on an Error from start()",
+                        baseline, after);
+                // The reservation must also be restored so the pool is not wedged.
+                Assert.assertEquals(
+                        "acquire() leaked an in-flight creation slot on an Error from start()",
+                        0, inFlightCreations(pool));
+            } finally {
+                pool.close();
             }
-            long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
-            Assert.assertEquals(
-                    "acquire() leaked NATIVE_DEFAULT scratch on an Error from start()",
-                    baseline, after);
-            // The reservation must also be restored so the pool is not wedged.
-            Assert.assertEquals(
-                    "acquire() leaked an in-flight creation slot on an Error from start()",
-                    0, inFlightCreations(pool));
-        } finally {
-            pool.close();
-        }
+        });
     }
 
     // Site: constructor prewarm. When start() throws after createUnlocked()
@@ -186,30 +193,32 @@ public void acquireDoesNotLeakNativeScratchOnErrorFromStart() throws Exception {
     // GREEN: the pending-worker teardown closes it -> no leak.
     @Test(timeout = 30_000)
     public void preWarmDoesNotLeakNativeScratchOnErrorFromStart() throws Exception {
-        long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
-        // First worker is admitted to `all` (start is a no-op here -- the test
-        // never runs queries, and an unstarted thread keeps the later
-        // shutdown()'s join() instant); the second throws at start() after its
-        // client is fully built. That second worker is the stranded one: it
-        // never made it into `all`, so only the new pending-worker teardown
-        // closes it. The assertion catches a leak of EITHER worker's scratch.
-        AtomicInteger calls = new AtomicInteger();
-        Consumer<QueryWorker> startHook = w -> {
-            if (calls.incrementAndGet() >= 2) {
-                throw new AssertionError("injected thread-start failure");
+        TestUtils.assertMemoryLeak(() -> {
+            long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+            // First worker is admitted to `all` (start is a no-op here -- the test
+            // never runs queries, and an unstarted thread keeps the later
+            // shutdown()'s join() instant); the second throws at start() after its
+            // client is fully built. That second worker is the stranded one: it
+            // never made it into `all`, so only the new pending-worker teardown
+            // closes it. The assertion catches a leak of EITHER worker's scratch.
+            AtomicInteger calls = new AtomicInteger();
+            Consumer<QueryWorker> startHook = w -> {
+                if (calls.incrementAndGet() >= 2) {
+                    throw new AssertionError("injected thread-start failure");
+                }
+                // first worker: leave the dispatch thread unstarted (see above)
+            };
+            try {
+                newPool(CFG, 2, 2, 250, noConnect(), startHook);
+                Assert.fail("expected prewarm to propagate the injected start Error");
+            } catch (Throwable expected) {
+                // expected -- construction aborts
             }
-            // first worker: leave the dispatch thread unstarted (see above)
-        };
-        try {
-            newPool(CFG, 2, 2, 250, noConnect(), startHook);
-            Assert.fail("expected prewarm to propagate the injected start Error");
-        } catch (Throwable expected) {
-            // expected -- construction aborts
-        }
-        long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
-        Assert.assertEquals(
-                "prewarm leaked NATIVE_DEFAULT scratch of a start()-failed worker",
-                baseline, after);
+            long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+            Assert.assertEquals(
+                    "prewarm leaked NATIVE_DEFAULT scratch of a start()-failed worker",
+                    baseline, after);
+        });
     }
 
     private static Consumer<QwpQueryClient> alwaysThrow() {
@@ -232,30 +241,21 @@ private static Consumer<QueryWorker> alwaysThrowStart() {
         };
     }
 
-    private static int inFlightCreations(QueryClientPool pool) throws Exception {
-        Method m = QueryClientPool.class.getDeclaredMethod("inFlightCreations");
-        m.setAccessible(true);
-        return (int) m.invoke(pool);
+    private static int inFlightCreations(QueryClientPool pool) {
+        return pool.inFlightCreations();
     }
 
     private static QueryClientPool newPool(
             String cfg, int min, int max, long acquireMs, Consumer<QwpQueryClient> connectHook
-    ) throws Exception {
-        Constructor<QueryClientPool> c = QueryClientPool.class.getDeclaredConstructor(
-                String.class, int.class, int.class, long.class, long.class, long.class, Consumer.class);
-        c.setAccessible(true);
-        return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, connectHook);
+    ) {
+        return new QueryClientPool(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, connectHook);
     }
 
     private static QueryClientPool newPool(
             String cfg, int min, int max, long acquireMs,
             Consumer<QwpQueryClient> connectHook, Consumer<QueryWorker> startHook
-    ) throws Exception {
-        Constructor<QueryClientPool> c = QueryClientPool.class.getDeclaredConstructor(
-                String.class, int.class, int.class, long.class, long.class, long.class,
-                Consumer.class, Consumer.class);
-        c.setAccessible(true);
-        return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE,
+    ) {
+        return new QueryClientPool(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE,
                 connectHook, startHook);
     }
 }
diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolLeakTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolLeakTest.java
index 53def4b5..a32a488d 100644
--- a/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolLeakTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/QueryClientPoolLeakTest.java
@@ -27,6 +27,7 @@
 import io.questdb.client.impl.QueryClientPool;
 import io.questdb.client.std.MemoryTag;
 import io.questdb.client.std.Unsafe;
+import io.questdb.client.test.tools.TestUtils;
 import org.junit.Assert;
 import org.junit.Test;
 
@@ -57,51 +58,55 @@ public class QueryClientPoolLeakTest {
 
     @Test(timeout = 10_000)
     public void acquireDoesNotLeakNativeScratchOnConnectFailure() throws Exception {
-        try (FakeStatusServer rejecter = new FakeStatusServer(421, "X-QuestDB-Role: REPLICA")) {
-            rejecter.start();
-            String cfg = "ws::addr=127.0.0.1:" + rejecter.port()
-                    + ";target=primary;failover=off;auth_timeout_ms=1000;";
-
-            QueryClientPool pool = new QueryClientPool(
-                    cfg, 0, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE);
-            try {
-                long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+        TestUtils.assertMemoryLeak(() -> {
+            try (FakeStatusServer rejecter = new FakeStatusServer(421, "X-QuestDB-Role: REPLICA")) {
+                rejecter.start();
+                String cfg = "ws::addr=127.0.0.1:" + rejecter.port()
+                        + ";target=primary;failover=off;auth_timeout_ms=1000;";
+
+                QueryClientPool pool = new QueryClientPool(
+                        cfg, 0, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE);
                 try {
-                    pool.acquire();
-                    Assert.fail("expected acquire() to throw on connect rejection");
-                } catch (RuntimeException expected) {
-                    // QueryException wrapping the underlying connect failure.
+                    long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+                    try {
+                        pool.acquire();
+                        Assert.fail("expected acquire() to throw on connect rejection");
+                    } catch (RuntimeException expected) {
+                        // QueryException wrapping the underlying connect failure.
+                    }
+                    long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+                    Assert.assertEquals(
+                            "acquire() leaked NATIVE_DEFAULT bytes on connect failure",
+                            baseline, after);
+                } finally {
+                    pool.close();
                 }
-                long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
-                Assert.assertEquals(
-                        "acquire() leaked NATIVE_DEFAULT bytes on connect failure",
-                        baseline, after);
-            } finally {
-                pool.close();
             }
-        }
+        });
     }
 
     @Test(timeout = 10_000)
     public void preWarmDoesNotLeakNativeScratchOnConnectFailure() throws Exception {
-        try (FakeStatusServer rejecter = new FakeStatusServer(421, "X-QuestDB-Role: REPLICA")) {
-            rejecter.start();
-            String cfg = "ws::addr=127.0.0.1:" + rejecter.port()
-                    + ";target=primary;failover=off;auth_timeout_ms=1000;";
-
-            long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
-            try {
-                new QueryClientPool(cfg, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE);
-                Assert.fail("expected QueryClientPool ctor to throw on connect rejection");
-            } catch (RuntimeException expected) {
-                // target=primary against role=REPLICA yields a connect failure
-                // out of createUnlocked().
+        TestUtils.assertMemoryLeak(() -> {
+            try (FakeStatusServer rejecter = new FakeStatusServer(421, "X-QuestDB-Role: REPLICA")) {
+                rejecter.start();
+                String cfg = "ws::addr=127.0.0.1:" + rejecter.port()
+                        + ";target=primary;failover=off;auth_timeout_ms=1000;";
+
+                long baseline = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+                try {
+                    new QueryClientPool(cfg, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE);
+                    Assert.fail("expected QueryClientPool ctor to throw on connect rejection");
+                } catch (RuntimeException expected) {
+                    // target=primary against role=REPLICA yields a connect failure
+                    // out of createUnlocked().
+                }
+                long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
+                Assert.assertEquals(
+                        "pool ctor leaked NATIVE_DEFAULT bytes on connect failure",
+                        baseline, after);
             }
-            long after = Unsafe.getMemUsedByTag(MemoryTag.NATIVE_DEFAULT);
-            Assert.assertEquals(
-                    "pool ctor leaked NATIVE_DEFAULT bytes on connect failure",
-                    baseline, after);
-        }
+        });
     }
 
     private static final class FakeStatusServer implements AutoCloseable {
diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryCloseDrainTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryCloseDrainTest.java
new file mode 100644
index 00000000..76a4adb7
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/impl/QueryCloseDrainTest.java
@@ -0,0 +1,174 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.impl;
+
+import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
+import io.questdb.client.impl.QueryClientPool;
+import io.questdb.client.impl.QueryWorker;
+import io.questdb.client.test.tools.TestUtils;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.lang.reflect.Field;
+import java.lang.reflect.Method;
+import java.util.ArrayList;
+import java.util.function.Consumer;
+
+/**
+ * Regression tests for the bounded, interruptible {@code Query.close()} drain.
+ * When a submit is still in flight at close() time, the old drain blocked the
+ * caller indefinitely and uninterruptibly on the terminal event (and could hang
+ * forever if a racing {@code QuestDB.close()} stranded it). The drain now waits
+ * at most {@code closeQueryTimeoutMillis}, an interrupt aborts it, and a worker
+ * that fails to drain in time is discarded -- its connection may still carry
+ * late frames for the abandoned query -- rather than returned to the pool.
+ * <p>
+ * White-box style: a no-op connect hook builds workers without a network, and
+ * the in-flight state is simulated by setting {@code QueryImpl.done=false}
+ * reflectively, so no server or real {@code execute()} is needed to exercise
+ * the close() drain logic.
+ */
+public class QueryCloseDrainTest {
+
+    private static final String CFG = "ws::addr=127.0.0.1:1;";
+    private static final Consumer<QwpQueryClient> NO_CONNECT = c -> {
+    };
+
+    @Test(timeout = 30_000)
+    public void testCloseDiscardsWorkerWhenDrainTimesOut() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            try (QueryClientPool pool = new QueryClientPool(
+                    CFG, 0, 2, 1_000L, Long.MAX_VALUE, Long.MAX_VALUE, NO_CONNECT)) {
+                setCloseQueryTimeout(pool, 150L);
+                QueryWorker w = pool.acquire();
+                long gen = generation(w);
+                setDone(w, false); // pretend a submit is in flight; nothing will ever signal done
+
+                long startNanos = System.nanoTime();
+                closeQuery(w, gen);
+                long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
+
+                Assert.assertTrue("close() must wait for roughly the close budget, elapsed=" + elapsedMs,
+                        elapsedMs >= 120);
+                Assert.assertTrue("close() must be bounded, not block indefinitely, elapsed=" + elapsedMs,
+                        elapsedMs < 5_000);
+                Assert.assertFalse("a worker that did not drain must be discarded, not returned to the pool",
+                        allWorkers(pool).contains(w));
+                Assert.assertEquals("the discarded worker must leave the pool so it can grow a fresh one",
+                        0, allWorkers(pool).size());
+                Assert.assertFalse("the discarded worker's dispatch thread must have exited",
+                        dispatchThread(w).isAlive());
+            }
+        });
+    }
+
+    @Test(timeout = 30_000)
+    public void testCloseIsInterruptible() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            try (QueryClientPool pool = new QueryClientPool(
+                    CFG, 0, 2, 1_000L, Long.MAX_VALUE, Long.MAX_VALUE, NO_CONNECT)) {
+                // A long budget: the only way close() can return promptly is by
+                // honoring the caller's interrupt.
+                setCloseQueryTimeout(pool, 60_000L);
+                QueryWorker w = pool.acquire();
+                long gen = generation(w);
+                setDone(w, false);
+
+                Thread.currentThread().interrupt();
+                long startNanos = System.nanoTime();
+                closeQuery(w, gen);
+                long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
+
+                Assert.assertTrue("close() must preserve the caller's interrupt flag", Thread.interrupted());
+                Assert.assertTrue("interrupt must abort the drain promptly, elapsed=" + elapsedMs,
+                        elapsedMs < 5_000);
+                Assert.assertFalse("an interrupted close() must discard the worker",
+                        allWorkers(pool).contains(w));
+            }
+        });
+    }
+
+    @Test(timeout = 30_000)
+    public void testCloseReturnsWorkerWhenAlreadyDrained() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            try (QueryClientPool pool = new QueryClientPool(
+                    CFG, 0, 2, 1_000L, Long.MAX_VALUE, Long.MAX_VALUE, NO_CONNECT)) {
+                setCloseQueryTimeout(pool, 150L);
+                QueryWorker w = pool.acquire();
+                long gen = generation(w);
+                // done stays true (no in-flight submit): close() must take the fast
+                // path and return the worker to the pool for reuse, not discard it.
+                closeQuery(w, gen);
+                Assert.assertTrue("an already-drained worker must be returned to the pool, not discarded",
+                        allWorkers(pool).contains(w));
+            }
+        });
+    }
+
+    @SuppressWarnings("unchecked")
+    private static ArrayList<QueryWorker> allWorkers(QueryClientPool pool) throws Exception {
+        Field f = QueryClientPool.class.getDeclaredField("all");
+        f.setAccessible(true);
+        return (ArrayList<QueryWorker>) f.get(pool);
+    }
+
+    private static void closeQuery(QueryWorker w, long gen) throws Exception {
+        Object impl = queryImpl(w);
+        Method close = impl.getClass().getDeclaredMethod("close", long.class);
+        close.setAccessible(true);
+        close.invoke(impl, gen);
+    }
+
+    private static Thread dispatchThread(QueryWorker w) throws Exception {
+        Field f = QueryWorker.class.getDeclaredField("thread");
+        f.setAccessible(true);
+        return (Thread) f.get(w);
+    }
+
+    private static long generation(QueryWorker w) throws Exception {
+        Method m = QueryWorker.class.getDeclaredMethod("generation");
+        m.setAccessible(true);
+        return (long) m.invoke(w);
+    }
+
+    private static Object queryImpl(QueryWorker w) throws Exception {
+        Field queryF = QueryWorker.class.getDeclaredField("query");
+        queryF.setAccessible(true);
+        return queryF.get(w);
+    }
+
+    private static void setCloseQueryTimeout(QueryClientPool pool, long millis) throws Exception {
+        Field f = QueryClientPool.class.getDeclaredField("closeQueryTimeoutMillis");
+        f.setAccessible(true);
+        f.setLong(pool, millis);
+    }
+
+    private static void setDone(QueryWorker w, boolean done) throws Exception {
+        Object impl = queryImpl(w);
+        Field doneF = impl.getClass().getDeclaredField("done");
+        doneF.setAccessible(true);
+        doneF.setBoolean(impl, done);
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryImplResetTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryImplResetTest.java
index 1ff33b76..bfe3c24f 100644
--- a/core/src/test/java/io/questdb/client/test/impl/QueryImplResetTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/QueryImplResetTest.java
@@ -24,11 +24,12 @@
 
 package io.questdb.client.test.impl;
 
-import io.questdb.client.Query;
 import io.questdb.client.cutlass.qwp.client.QwpBindSetter;
 import io.questdb.client.cutlass.qwp.client.QwpColumnBatch;
 import io.questdb.client.cutlass.qwp.client.QwpColumnBatchHandler;
 import io.questdb.client.cutlass.qwp.client.QwpServerInfo;
+import io.questdb.client.std.str.StringSink;
+import io.questdb.client.test.tools.TestUtils;
 import org.junit.Assert;
 import org.junit.Test;
 
@@ -39,113 +40,101 @@
 public class QueryImplResetTest {
 
     /**
-     * Regression test for the state-carryover bug between consecutive
-     * submits on the per-thread {@code QuestDB#query()} handle.
+     * The Javadoc on both {@code Query} and {@code QuestDB#borrowQuery()}
+     * promises the leased handle is handed out "reset to empty". The reset is
+     * {@code QueryImpl.resetForBorrow()}, invoked from {@code QueryWorker.lease()}
+     * when {@code borrowQuery()} hands the pre-allocated handle out. It must
+     * clear the builder state (SQL, binds, handler) so a follow-up
+     * {@code submit()} cannot silently reuse a prior borrow's handler/binds,
+     * and it must leave the handle idle (done).
      * <p>
-     * The Javadoc on both {@code Query} and {@code QuestDB#query()} promises
-     * that the returned instance is "reset to empty" / "in a reset state".
-     * Before the fix, {@code QuestDBImpl.query()} returned the bare
-     * thread-local without nulling {@code userHandler} / {@code userBinds},
-     * so the second call below would silently reuse {@code h1}:
-     * <pre>
-     *   db.query().sql("SELECT 1").handler(h1).submit().await();
-     *   db.query().sql("SELECT 2").submit();    // no .handler() -- reuses h1
-     * </pre>
-     * The {@code if (userHandler == null)} check in {@code submit()} could
-     * not catch the misuse because the field was still set from the prior
-     * submit.
-     * <p>
-     * The fix is {@code QueryImpl.resetIfDone()}, invoked from
-     * {@code QuestDBImpl.query()} before the per-thread handle is returned.
-     * This test reaches into {@code QueryImpl} via reflection (the class is
-     * package-private and lives in a different package from this test) and
-     * asserts the reset clears all three configured fields when the prior
-     * run is in a terminal state.
+     * The reset is unconditional: the leased worker was just acquired from the
+     * pool, so it is always idle (done) at borrow time. This test reaches into
+     * {@code QueryImpl} by reflection (the class is package-private and lives
+     * in a different package from this test). Builder state is seeded directly
+     * via reflection rather than through the {@code Query} API because the
+     * lease-generation guard on the setters would dereference the (null) worker.
      */
     @Test
-    public void testResetIfDoneClearsBuilderStateInTerminalState() throws Exception {
-        Class<?> queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl");
-        Class<?> poolClass = Class.forName("io.questdb.client.impl.QueryClientPool");
-
-        Constructor<?> ctor = queryImplClass.getDeclaredConstructor(poolClass);
-        ctor.setAccessible(true);
-        // QueryImpl never dereferences the pool outside of submit(); a null
-        // pool is fine for this state-only test.
-        Query q = (Query) ctor.newInstance(new Object[]{null});
-
-        // Mirror the post-submit().await() state: builder fields set,
-        // done flag true (the constructor default).
-        QwpColumnBatchHandler h = new NoopHandler();
-        QwpBindSetter b = values -> {
-            // no-op
-        };
-        q.sql("SELECT 1").binds(b).handler(h);
-
-        Method reset = queryImplClass.getDeclaredMethod("resetIfDone");
-        reset.setAccessible(true);
-        reset.invoke(q);
-
-        Field handlerF = queryImplClass.getDeclaredField("userHandler");
-        Field bindsF = queryImplClass.getDeclaredField("userBinds");
-        Field sqlBufF = queryImplClass.getDeclaredField("sqlBuffer");
-        handlerF.setAccessible(true);
-        bindsF.setAccessible(true);
-        sqlBufF.setAccessible(true);
-
-        Assert.assertNull("userHandler must be cleared so a follow-up submit() without .handler() fails fast",
-                handlerF.get(q));
-        Assert.assertNull("userBinds must be cleared so a follow-up submit() without .binds() does not reuse the prior setter",
-                bindsF.get(q));
-        CharSequence sqlBuffer = (CharSequence) sqlBufF.get(q);
-        Assert.assertEquals("sqlBuffer must be empty so a follow-up submit() without .sql() throws 'sql is required'",
-                0, sqlBuffer.length());
+    public void testResetForBorrowClearsBuilderState() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            Class<?> queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl");
+            Class<?> workerClass = Class.forName("io.questdb.client.impl.QueryWorker");
+
+            Constructor<?> ctor = queryImplClass.getDeclaredConstructor(workerClass);
+            ctor.setAccessible(true);
+            // resetForBorrow() never dereferences the worker; a null worker is fine
+            // for this state-only test.
+            Object q = ctor.newInstance(new Object[]{null});
+
+            Field handlerF = queryImplClass.getDeclaredField("userHandler");
+            Field bindsF = queryImplClass.getDeclaredField("userBinds");
+            Field sqlBufF = queryImplClass.getDeclaredField("sqlBuffer");
+            Field doneF = queryImplClass.getDeclaredField("done");
+            handlerF.setAccessible(true);
+            bindsF.setAccessible(true);
+            sqlBufF.setAccessible(true);
+            doneF.setAccessible(true);
+
+            // Seed builder state as a prior borrow would have left it.
+            handlerF.set(q, new NoopHandler());
+            bindsF.set(q, (QwpBindSetter) values -> {
+                // no-op
+            });
+            ((StringSink) sqlBufF.get(q)).put("SELECT 1");
+            doneF.setBoolean(q, false);
+
+            Method reset = queryImplClass.getDeclaredMethod("resetForBorrow");
+            reset.setAccessible(true);
+            reset.invoke(q);
+
+            Assert.assertNull("userHandler must be cleared so a follow-up submit() without .handler() fails fast",
+                    handlerF.get(q));
+            Assert.assertNull("userBinds must be cleared so a follow-up submit() without .binds() does not reuse the prior setter",
+                    bindsF.get(q));
+            CharSequence sqlBuffer = (CharSequence) sqlBufF.get(q);
+            Assert.assertEquals("sqlBuffer must be empty so a follow-up submit() without .sql() throws 'sql is required'",
+                    0, sqlBuffer.length());
+            Assert.assertTrue("done must be true so the handle starts idle, not in flight",
+                    doneF.getBoolean(q));
+        });
     }
 
     /**
-     * Symmetric guard: when a submit is in flight ({@code done == false}),
-     * {@code resetIfDone()} must NOT touch the configured fields. The
-     * dispatched worker thread is reading {@code sqlBuffer} in
-     * {@code runOn()} and {@code userHandler} via the wrapping handler;
-     * clearing them mid-flight would race.
+     * {@code QuestDB#borrowQuery()} returns a thin lease that is freshly
+     * allocated per borrow, but the heavy state it wraps -- the per-worker
+     * {@code QueryImpl} -- is pre-allocated once and reused across borrows. This
+     * pins that contract: two {@code lease()} calls on the same worker return
+     * distinct lease wrappers that delegate to the same pooled {@code QueryImpl}.
+     * Reaches both package-private classes by reflection.
      */
     @Test
-    public void testResetIfDoneIsNoOpWhileSubmitInFlight() throws Exception {
-        Class<?> queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl");
-        Class<?> poolClass = Class.forName("io.questdb.client.impl.QueryClientPool");
-
-        Constructor<?> ctor = queryImplClass.getDeclaredConstructor(poolClass);
-        ctor.setAccessible(true);
-        Query q = (Query) ctor.newInstance(new Object[]{null});
-
-        QwpColumnBatchHandler h = new NoopHandler();
-        QwpBindSetter b = values -> {
-            // no-op
-        };
-        q.sql("SELECT 1").binds(b).handler(h);
-
-        // Flip the in-flight flag by setting done=false directly.
-        Field doneF = queryImplClass.getDeclaredField("done");
-        doneF.setAccessible(true);
-        doneF.setBoolean(q, false);
-
-        Method reset = queryImplClass.getDeclaredMethod("resetIfDone");
-        reset.setAccessible(true);
-        reset.invoke(q);
-
-        Field handlerF = queryImplClass.getDeclaredField("userHandler");
-        Field bindsF = queryImplClass.getDeclaredField("userBinds");
-        Field sqlBufF = queryImplClass.getDeclaredField("sqlBuffer");
-        handlerF.setAccessible(true);
-        bindsF.setAccessible(true);
-        sqlBufF.setAccessible(true);
-
-        Assert.assertSame("userHandler must survive resetIfDone() while a submit is in flight",
-                h, handlerF.get(q));
-        Assert.assertSame("userBinds must survive resetIfDone() while a submit is in flight",
-                b, bindsF.get(q));
-        CharSequence sqlBuffer = (CharSequence) sqlBufF.get(q);
-        Assert.assertEquals("sqlBuffer must survive resetIfDone() while a submit is in flight",
-                "SELECT 1", sqlBuffer.toString());
+    public void testLeaseWrapsSamePooledQueryImpl() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            Class<?> workerClass = Class.forName("io.questdb.client.impl.QueryWorker");
+            Class<?> poolClass = Class.forName("io.questdb.client.impl.QueryClientPool");
+            Class<?> clientClass = Class.forName("io.questdb.client.cutlass.qwp.client.QwpQueryClient");
+            Class<?> leaseClass = Class.forName("io.questdb.client.impl.QueryLease");
+
+            // lease() never dereferences the client or pool (it only resets the
+            // reused QueryImpl and stamps the current generation), so nulls are fine
+            // for this structure-only test -- mirrors the null-worker shortcut above.
+            Constructor<?> ctor = workerClass.getDeclaredConstructor(clientClass, poolClass, int.class);
+            ctor.setAccessible(true);
+            Object worker = ctor.newInstance(null, null, 0);
+
+            Method leaseM = workerClass.getDeclaredMethod("lease");
+            leaseM.setAccessible(true);
+            Object leaseA = leaseM.invoke(worker);
+            Object leaseB = leaseM.invoke(worker);
+
+            Assert.assertNotSame("each borrow must hand back a fresh lease wrapper", leaseA, leaseB);
+
+            Field implF = leaseClass.getDeclaredField("impl");
+            implF.setAccessible(true);
+            Assert.assertSame("both leases must wrap the same pooled QueryImpl (zero-allocation reuse of the heavy state)",
+                    implF.get(leaseA), implF.get(leaseB));
+        });
     }
 
     private static final class NoopHandler implements QwpColumnBatchHandler {
diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryLeaseGenerationTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryLeaseGenerationTest.java
new file mode 100644
index 00000000..f878ccd0
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/impl/QueryLeaseGenerationTest.java
@@ -0,0 +1,280 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.impl;
+
+import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
+import io.questdb.client.impl.QueryClientPool;
+import io.questdb.client.impl.QueryWorker;
+import io.questdb.client.test.tools.TestUtils;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.lang.reflect.Field;
+import java.lang.reflect.Method;
+import java.util.ArrayDeque;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * Regression tests for M1: a stale {@code Query} lease (held after close, or a
+ * cached {@code Completion}) must not disturb a later borrow of the same
+ * worker. The reused per-worker {@code QueryImpl} alone cannot distinguish a
+ * stale handle from a live one -- the fix stamps each borrow with a monotonic
+ * generation under the pool lock and validates it on close/cancel/release.
+ * <p>
+ * These exercise the package-private internals by reflection (the same
+ * white-box style as the other tests in this package). They construct workers
+ * with a non-connected {@code newPlainText} client and never start the worker
+ * thread, so no network or I/O thread is involved.
+ */
+public class QueryLeaseGenerationTest {
+
+    /**
+     * A stale {@code Completion.cancel()} (its lease long since released and the
+     * worker re-borrowed) must NOT reach the worker's client -- otherwise it
+     * would cancel whatever query the current borrower is running. We observe
+     * "reached the client" via the client's pending-cancel latch, which
+     * {@code QwpQueryClient.cancel()} sets first thing.
+     */
+    @Test
+    public void testStaleCancelDoesNotReachClient() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            Class<?> workerClass = Class.forName("io.questdb.client.impl.QueryWorker");
+            Class<?> queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl");
+            Method bump = workerClass.getDeclaredMethod("bumpGeneration");
+            bump.setAccessible(true);
+            Field queryF = workerClass.getDeclaredField("query");
+            queryF.setAccessible(true);
+            Field doneF = queryImplClass.getDeclaredField("done");
+            doneF.setAccessible(true);
+            Method cancel = queryImplClass.getDeclaredMethod("cancel", long.class);
+            cancel.setAccessible(true);
+
+            // cancel(gen) validates the generation under the pool lock, so the
+            // worker needs a real pool to lock on (the worker thread is never
+            // started, so no network or I/O thread is involved).
+            QueryClientPool pool = new QueryClientPool(
+                    "ws::addr=localhost:9000;",
+                    /*minSize*/ 0, /*maxSize*/ 2,
+                    /*acquireTimeoutMillis*/ 1_000L,
+                    /*idleTimeoutMillis*/ Long.MAX_VALUE,
+                    /*maxLifetimeMillis*/ Long.MAX_VALUE);
+            try {
+                // Live lease: generation 1 (one acquire), query in flight -> cancel(1)
+                // must reach the client.
+                try (QwpQueryClient live = QwpQueryClient.newPlainText("localhost", 9000)) {
+                    QueryWorker w = new QueryWorker(live, pool, 0);
+                    bump.invoke(w); // generation -> 1 (acquire stamp)
+                    Object impl = queryF.get(w);
+                    doneF.setBoolean(impl, false); // pretend a submit is in flight
+                    cancel.invoke(impl, 1L);
+                    Assert.assertTrue("cancel() on the live lease must reach the client",
+                            live.isPendingCancelForTest());
+                }
+
+                // Stale lease: the worker was borrowed (gen 1), released and re-borrowed
+                // (gen now 3). A cancel from the old lease (gen 1) must be dropped, even
+                // though the current query is in flight.
+                try (QwpQueryClient reused = QwpQueryClient.newPlainText("localhost", 9000)) {
+                    QueryWorker w = new QueryWorker(reused, pool, 0);
+                    bump.invoke(w); // -> 1 (first acquire)
+                    bump.invoke(w); // -> 2 (release)
+                    bump.invoke(w); // -> 3 (second acquire by a new borrower)
+                    Object impl = queryF.get(w);
+                    doneF.setBoolean(impl, false); // the new borrower's query is in flight
+                    cancel.invoke(impl, 1L); // stale lease cancels
+                    Assert.assertFalse("a stale lease's cancel() must NOT reach the client and "
+                                    + "cancel a different borrower's in-flight query",
+                            reused.isPendingCancelForTest());
+                }
+            } finally {
+                pool.close();
+            }
+        });
+    }
+
+    /**
+     * The TOCTOU window that the locked cancel closes: a cross-thread watchdog calls
+     * {@code cancel(gen)} while its lease is live, but the lease goes stale (the
+     * worker is released and re-borrowed) before the wire cancel fires. The
+     * cancel must re-validate the generation atomically with the cancel, under
+     * the pool lock, or it would abort the new borrower's query.
+     * <p>
+     * Driven deterministically: the test thread holds the pool lock, so the
+     * watchdog's cancel parks inside the pool's generation re-check. We then
+     * advance the generation (release + re-borrow) under the lock and release
+     * it. The parked cancel must observe the new generation and drop. An
+     * unlocked check-then-cancel would not park, would pass its check at the
+     * still-live generation, and would fire the wire cancel.
+     */
+    @Test
+    public void testConcurrentCancelDoesNotReachClientAfterReborrow() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            Method bump = QueryWorker.class.getDeclaredMethod("bumpGeneration");
+            bump.setAccessible(true);
+            Field queryF = QueryWorker.class.getDeclaredField("query");
+            queryF.setAccessible(true);
+            Class<?> queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl");
+            Field doneF = queryImplClass.getDeclaredField("done");
+            doneF.setAccessible(true);
+            Method cancel = queryImplClass.getDeclaredMethod("cancel", long.class);
+            cancel.setAccessible(true);
+            Field poolLockF = QueryClientPool.class.getDeclaredField("lock");
+            poolLockF.setAccessible(true);
+
+            QueryClientPool pool = new QueryClientPool(
+                    "ws::addr=localhost:9000;",
+                    /*minSize*/ 0, /*maxSize*/ 2,
+                    /*acquireTimeoutMillis*/ 1_000L,
+                    /*idleTimeoutMillis*/ Long.MAX_VALUE,
+                    /*maxLifetimeMillis*/ Long.MAX_VALUE);
+            QwpQueryClient client = QwpQueryClient.newPlainText("localhost", 9000);
+            try {
+                final QueryWorker w = new QueryWorker(client, pool, 0);
+                bump.invoke(w); // generation -> 1; the watchdog's lease captured 1
+                final Object impl = queryF.get(w);
+                doneF.setBoolean(impl, false); // a query is in flight
+
+                ReentrantLock poolLock = (ReentrantLock) poolLockF.get(pool);
+                final CountDownLatch atCancel = new CountDownLatch(1);
+                final CountDownLatch cancelReturned = new CountDownLatch(1);
+                final AtomicReference<Throwable> err = new AtomicReference<>();
+
+                // Hold the pool lock so the watchdog's cancel cannot finish its
+                // generation re-check + wire cancel until we let go.
+                poolLock.lock();
+                Thread watchdog = new Thread(() -> {
+                    atCancel.countDown();
+                    try {
+                        cancel.invoke(impl, 1L); // lease generation captured at borrow = 1
+                    } catch (Throwable t) {
+                        err.set(t);
+                    } finally {
+                        cancelReturned.countDown();
+                    }
+                }, "watchdog-cancel");
+                watchdog.start();
+                Assert.assertTrue("watchdog must start", atCancel.await(5, TimeUnit.SECONDS));
+
+                // With the locked cancel, cancel() parks on the pool lock and cannot
+                // return while we hold it. An unlocked check-then-cancel would have
+                // already fired the wire cancel and returned.
+                Assert.assertFalse("cancel() must re-check the generation under the pool "
+                                + "lock, so it cannot complete while the lock is held",
+                        cancelReturned.await(200, TimeUnit.MILLISECONDS));
+
+                // The lease goes stale underneath the parked cancel: released (-> 2)
+                // and re-borrowed by a new owner (-> 3).
+                bump.invoke(w);
+                bump.invoke(w);
+                poolLock.unlock();
+
+                Assert.assertTrue("cancel() must return once the pool lock is free",
+                        cancelReturned.await(5, TimeUnit.SECONDS));
+                if (err.get() != null) {
+                    throw new AssertionError("cancel() threw", err.get());
+                }
+                Assert.assertFalse("a cancel whose lease went stale while parked on the pool "
+                                + "lock must NOT reach the client and abort the new borrower's query",
+                        client.isPendingCancelForTest());
+            } finally {
+                client.close();
+                pool.close();
+            }
+        });
+    }
+
+    /**
+     * The pool-wide blast radius of M1: a stale (duplicate / post-reborrow)
+     * release must never enqueue a worker that a live borrower owns, otherwise
+     * the worker sits in {@code available} twice and is handed to two borrowers
+     * at once. The generation captured at borrow time, re-checked under the pool
+     * lock, makes this impossible.
+     */
+    @Test
+    @SuppressWarnings("unchecked")
+    public void testStaleReleaseDoesNotEnqueueWorkerTwice() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            Class<?> poolClass = Class.forName("io.questdb.client.impl.QueryClientPool");
+            Method release = poolClass.getDeclaredMethod("release", QueryWorker.class, long.class);
+            release.setAccessible(true);
+            Field availableF = poolClass.getDeclaredField("available");
+            availableF.setAccessible(true);
+            Method bump = QueryWorker.class.getDeclaredMethod("bumpGeneration");
+            bump.setAccessible(true);
+            Method generation = QueryWorker.class.getDeclaredMethod("generation");
+            generation.setAccessible(true);
+
+            QueryClientPool pool = new QueryClientPool(
+                    "ws::addr=localhost:9000;",
+                    /*minSize*/ 0, /*maxSize*/ 2,
+                    /*acquireTimeoutMillis*/ 1_000L,
+                    /*idleTimeoutMillis*/ Long.MAX_VALUE,
+                    /*maxLifetimeMillis*/ Long.MAX_VALUE);
+            QwpQueryClient client = QwpQueryClient.newPlainText("localhost", 9000);
+            try {
+                ArrayDeque<QueryWorker> available = (ArrayDeque<QueryWorker>) availableF.get(pool);
+                QueryWorker w = new QueryWorker(client, pool, 0);
+
+                // acquire #1 stamps generation 1; the lease (A) captures 1.
+                bump.invoke(w);
+                Assert.assertEquals(1L, generation.invoke(w));
+
+                // close A -> release(w, 1): matches, enqueues once.
+                release.invoke(pool, w, 1L);
+                Assert.assertEquals("valid release must enqueue the worker once", 1, available.size());
+
+                // close A again (duplicate, e.g. explicit close + try-with-resources)
+                // -> release(w, 1): generation already bumped to 2, so it is dropped.
+                release.invoke(pool, w, 1L);
+                Assert.assertEquals("duplicate release of the same lease must be dropped",
+                        1, available.size());
+
+                // acquire #2 hands the worker to a new borrower (B): pull it out and
+                // stamp generation 3.
+                available.pollFirst();
+                bump.invoke(w);
+                Assert.assertEquals(3L, generation.invoke(w));
+
+                // A stray close from the long-dead lease A -> release(w, 1): dropped,
+                // so B's worker is NOT re-enqueued while B still owns it.
+                release.invoke(pool, w, 1L);
+                Assert.assertEquals("a post-reborrow stale release must NOT enqueue the "
+                                + "worker while another borrower owns it",
+                        0, available.size());
+
+                // B's own close -> release(w, 3): matches, enqueues legitimately.
+                release.invoke(pool, w, 3L);
+                Assert.assertEquals("the current borrower's release must still work",
+                        1, available.size());
+            } finally {
+                client.close();
+                pool.close();
+            }
+        });
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/impl/QueryWorkerTest.java b/core/src/test/java/io/questdb/client/test/impl/QueryWorkerTest.java
index e9041448..0dd6ee75 100644
--- a/core/src/test/java/io/questdb/client/test/impl/QueryWorkerTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/QueryWorkerTest.java
@@ -26,16 +26,37 @@
 
 import io.questdb.client.Completion;
 import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
+import io.questdb.client.impl.QueryClientPool;
 import io.questdb.client.impl.QueryWorker;
+import io.questdb.client.test.tools.TestUtils;
 import org.junit.Assert;
 import org.junit.Test;
 
 import java.lang.reflect.Constructor;
 import java.lang.reflect.Field;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
 import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.locks.Condition;
 import java.util.concurrent.locks.ReentrantLock;
 
+/**
+ * Unit tests for {@link QueryWorker}.
+ * <p>
+ * Coverage boundary: the lost-dispatch fix for the single-flight-reuse race
+ * (clearing {@code current} under {@code signalLock} at the moment of
+ * consumption rather than in a post-{@code runOn()} finally) has no
+ * deterministic unit reproduction here. Reproducing the clobber needs the
+ * worker to be mid-{@code runOn(client)} when the user thread re-dispatches on
+ * the same lease, which requires a live query client to drive
+ * {@code client.execute(...)} to its terminal callback. That regression is
+ * guarded end-to-end by {@code QuestDBFacadeE2ETest.testSustainedMixedConcurrency}
+ * in the parent questdb repo (more threads than pool slots, repeated
+ * submit/await per lease). {@link #testShutdownRacingDispatchMustNotStrandCaller()}
+ * below covers the adjacent but distinct shutdown-vs-dispatch branch only --
+ * reverting the lost-dispatch hunk would not fail it.
+ */
 public class QueryWorkerTest {
 
     /**
@@ -44,14 +65,16 @@ public class QueryWorkerTest {
      * connect is needed; {@code newPlainText} only allocates the client.
      */
     @Test
-    public void testClientGetterReturnsConstructorInstance() {
-        try (QwpQueryClient client = QwpQueryClient.newPlainText("localhost", 9000)) {
-            QueryWorker worker = new QueryWorker(client, null, 0);
-            Assert.assertSame("client() must return the instance passed to the constructor",
-                    client, worker.client());
-            // Idempotent across calls -- the field is final.
-            Assert.assertSame(worker.client(), worker.client());
-        }
+    public void testClientGetterReturnsConstructorInstance() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            try (QwpQueryClient client = QwpQueryClient.newPlainText("localhost", 9000)) {
+                QueryWorker worker = new QueryWorker(client, null, 0);
+                Assert.assertSame("client() must return the instance passed to the constructor",
+                        client, worker.client());
+                // Idempotent across calls -- the field is final.
+                Assert.assertSame(worker.client(), worker.client());
+            }
+        });
     }
 
     /**
@@ -68,97 +91,283 @@ public void testClientGetterReturnsConstructorInstance() {
      * state directly: it parks the worker on its condition, then takes the
      * worker's own {@code signalLock} and atomically sets both
      * {@code current} and {@code shuttingDown} before signalling. After the
-     * worker thread exits, the test asserts the {@link Completion} has been
-     * signalled. Today the assertion fails because the run loop's early
-     * return strands the {@code QueryImpl}.
+     * worker thread exits, the test asserts the {@code QueryImpl} was signalled
+     * to done. Without the fix the assertion fails because the run loop's early
+     * return strands the {@code QueryImpl} with {@code done==false}, so any
+     * caller blocked in {@code Completion.await()} would hang forever.
      */
     @Test(timeout = 30_000)
     public void testShutdownRacingDispatchMustNotStrandCaller() throws Exception {
-        Class<?> queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl");
-        Class<?> poolClass = Class.forName("io.questdb.client.impl.QueryClientPool");
-
-        Field lockF = QueryWorker.class.getDeclaredField("signalLock");
-        Field condF = QueryWorker.class.getDeclaredField("signalCondition");
-        Field currentF = QueryWorker.class.getDeclaredField("current");
-        Field shuttingF = QueryWorker.class.getDeclaredField("shuttingDown");
-        Field threadF = QueryWorker.class.getDeclaredField("thread");
-        for (Field f : new Field[]{lockF, condF, currentF, shuttingF, threadF}) {
-            f.setAccessible(true);
-        }
+        TestUtils.assertMemoryLeak(() -> {
+            Class<?> queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl");
+
+            Field lockF = QueryWorker.class.getDeclaredField("signalLock");
+            Field condF = QueryWorker.class.getDeclaredField("signalCondition");
+            Field currentF = QueryWorker.class.getDeclaredField("current");
+            Field shuttingF = QueryWorker.class.getDeclaredField("shuttingDown");
+            Field threadF = QueryWorker.class.getDeclaredField("thread");
+            for (Field f : new Field[]{lockF, condF, currentF, shuttingF, threadF}) {
+                f.setAccessible(true);
+            }
+
+            Field doneF = queryImplClass.getDeclaredField("done");
+            Field unexpectedF = queryImplClass.getDeclaredField("unexpectedError");
+            doneF.setAccessible(true);
+            unexpectedF.setAccessible(true);
 
-        Field doneF = queryImplClass.getDeclaredField("done");
-        Field completionF = queryImplClass.getDeclaredField("completion");
-        doneF.setAccessible(true);
-        completionF.setAccessible(true);
-
-        // No QwpQueryClient is constructed here: runLoop exits at the
-        // shuttingDown check before reaching the first reference to
-        // {@code client} or {@code pool}, so passing null for both is fine
-        // and keeps the test cleanly isolated from any network or socket state.
-        QueryWorker worker = new QueryWorker(null, null, 0);
-        Thread t = (Thread) threadF.get(worker);
-        t.start();
-
-        ReentrantLock lock = (ReentrantLock) lockF.get(worker);
-        Condition cond = (Condition) condF.get(worker);
-
-        // Wait until the worker thread is parked on its signalCondition.
-        long deadlineNanos = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
-        while (true) {
-            boolean parked;
+            // No QwpQueryClient is constructed here: runLoop exits at the
+            // shuttingDown check before reaching the first reference to
+            // {@code client} or {@code pool}, so passing null for both is fine
+            // and keeps the test cleanly isolated from any network or socket state.
+            QueryWorker worker = new QueryWorker(null, null, 0);
+            Thread t = (Thread) threadF.get(worker);
+            t.start();
+
+            ReentrantLock lock = (ReentrantLock) lockF.get(worker);
+            Condition cond = (Condition) condF.get(worker);
+
+            // Wait until the worker thread is parked on its signalCondition.
+            long deadlineNanos = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
+            while (true) {
+                boolean parked;
+                lock.lock();
+                try {
+                    parked = lock.hasWaiters(cond);
+                } finally {
+                    lock.unlock();
+                }
+                if (parked) {
+                    break;
+                }
+                if (System.nanoTime() > deadlineNanos) {
+                    Assert.fail("worker thread never parked on its signalCondition");
+                }
+                Thread.sleep(1);
+            }
+
+            // Construct a QueryImpl with done=false, mimicking the state set up
+            // by QueryImpl.submit() just before it calls worker.dispatch().
+            Constructor<?> ctor = queryImplClass.getDeclaredConstructor(QueryWorker.class);
+            ctor.setAccessible(true);
+            Object queryImpl = ctor.newInstance(new Object[]{null});
+            doneF.setBoolean(queryImpl, false);
+
+            // Atomically force the racy state under the worker's own lock:
+            // current set AND shuttingDown set before the worker wakes.
             lock.lock();
             try {
-                parked = lock.hasWaiters(cond);
+                currentF.set(worker, queryImpl);
+                shuttingF.setBoolean(worker, true);
+                cond.signalAll();
             } finally {
                 lock.unlock();
             }
-            if (parked) {
-                break;
-            }
-            if (System.nanoTime() > deadlineNanos) {
-                Assert.fail("worker thread never parked on its signalCondition");
+
+            // The worker thread must exit (it has observed shuttingDown).
+            t.join(5_000);
+            Assert.assertFalse("worker thread did not exit after shuttingDown=true",
+                    t.isAlive());
+
+            // The QueryImpl must have been signalled to done. Without the fix,
+            // done stays false because signalDone is never called, so a caller in
+            // Completion.await() would hang forever. The worker reaches the
+            // shutdown-race branch and calls signalUnexpected("QuestDB handle is
+            // closed"), which sets done=true and records the unexpected error.
+            Assert.assertTrue("BUG: QueryWorker.runLoop returned with shuttingDown=true "
+                    + "while current!=null, never invoking runOn or signalUnexpected. "
+                    + "The caller's Completion.await() hangs forever.", doneF.getBoolean(queryImpl));
+            Assert.assertNotNull("signalUnexpected must record the closed-handle error",
+                    unexpectedF.get(queryImpl));
+        });
+    }
+
+    /**
+     * Busy-worker variant of the shutdown-drop race fixed in df6f7ca
+     * ({@code while (!shuttingDown)} -> {@code while (true)} in
+     * {@link QueryWorker}'s run loop). Unlike
+     * {@link #testShutdownRacingDispatchMustNotStrandCaller()} -- which only
+     * drives the PARKED-worker branch (worker blocked in
+     * {@code awaitUninterruptibly} before {@code shuttingDown} flips) and stays
+     * green even with the fix reverted -- this test forces the worker THROUGH a
+     * job's {@code runOn()} and then, on the worker thread at the exact instant
+     * that job returns, reproduces a reused lease re-dispatching
+     * ({@code current = q2}) racing a shutdown ({@code shuttingDown = true}),
+     * both set before the loop re-enters the strand check.
+     * <p>
+     * With the fix the loop re-enters the {@code signalLock} block, observes
+     * {@code shuttingDown}, and signals q2's caller instead of dropping it. With the bug
+     * the loop exits at the top without re-reading {@code current}, so q2 is
+     * dropped -- never run, never signalled -- and its caller's
+     * {@code Completion.await()} would hang forever. The assertion on
+     * {@code q2.done} fails if the fix is reverted.
+     * <p>
+     * The interleaving is made deterministic with a test-only worker-thread
+     * barrier ({@code QueryWorker.busyWorkerTestHook}) instead of a sleep:
+     * {@link QueryWorker} and {@code QueryImpl} are final and
+     * {@code QwpQueryClient} has no test seam, so pausing between
+     * {@code runOn()} and the loop check is the only race-free reproduction.
+     * {@code client}/{@code pool} are null -- {@code q1.runOn(null)} throws an
+     * NPE that {@code runLoop} catches and turns into q1's terminal signal, a
+     * fast stand-in for a real job returning from {@code runOn()}.
+     */
+    @Test(timeout = 30_000)
+    public void testBusyWorkerShutdownStrandsReDispatchedCurrent() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            Class<?> queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl");
+
+            Field lockF = QueryWorker.class.getDeclaredField("signalLock");
+            Field currentF = QueryWorker.class.getDeclaredField("current");
+            Field shuttingF = QueryWorker.class.getDeclaredField("shuttingDown");
+            Field threadF = QueryWorker.class.getDeclaredField("thread");
+            Field hookF = QueryWorker.class.getDeclaredField("busyWorkerTestHook");
+            for (Field f : new Field[]{lockF, currentF, shuttingF, threadF, hookF}) {
+                f.setAccessible(true);
             }
-            Thread.sleep(1);
-        }
 
-        // Construct a QueryImpl with done=false, mimicking the state set up
-        // by QueryImpl.submit() just before it calls worker.dispatch().
-        Constructor<?> ctor = queryImplClass.getDeclaredConstructor(poolClass);
-        ctor.setAccessible(true);
-        Object queryImpl = ctor.newInstance(new Object[]{null});
-        doneF.setBoolean(queryImpl, false);
-        Completion completion = (Completion) completionF.get(queryImpl);
-
-        // Atomically force the racy state under the worker's own lock:
-        // current set AND shuttingDown set before the worker wakes.
-        lock.lock();
-        try {
-            currentF.set(worker, queryImpl);
-            shuttingF.setBoolean(worker, true);
-            cond.signalAll();
-        } finally {
-            lock.unlock();
-        }
+            Field doneF = queryImplClass.getDeclaredField("done");
+            Field unexpectedF = queryImplClass.getDeclaredField("unexpectedError");
+            doneF.setAccessible(true);
+            unexpectedF.setAccessible(true);
+
+            // client == null: q1.runOn(null) throws NPE, which runLoop catches and
+            // turns into q1's terminal signal -- a fast, deterministic stand-in for
+            // a real job returning from runOn(). pool == null is never touched here.
+            QueryWorker worker = new QueryWorker(null, null, 0);
+
+            Constructor<?> ctor = queryImplClass.getDeclaredConstructor(QueryWorker.class);
+            ctor.setAccessible(true);
+            Object q1 = ctor.newInstance(new Object[]{worker});
+            Object q2 = ctor.newInstance(new Object[]{worker});
+            doneF.setBoolean(q1, false);
+            doneF.setBoolean(q2, false);
+
+            ReentrantLock lock = (ReentrantLock) lockF.get(worker);
+            AtomicBoolean fired = new AtomicBoolean(false);
+
+            // The busy-worker barrier: the FIRST time the worker returns from a
+            // job's runOn(), simulate submit() -> dispatch() re-arming current with
+            // q2 while shutdown() flips shuttingDown -- both set, under signalLock,
+            // before the loop re-checks. Runs on the worker thread.
+            Runnable hook = () -> {
+                if (fired.compareAndSet(false, true)) {
+                    lock.lock();
+                    try {
+                        currentF.set(worker, q2);
+                        shuttingF.setBoolean(worker, true);
+                    } catch (IllegalAccessException e) {
+                        throw new RuntimeException(e);
+                    } finally {
+                        lock.unlock();
+                    }
+                }
+            };
+            hookF.set(worker, hook);
+
+            // Pre-arm current with q1 so the worker consumes it immediately on
+            // start (no need to wait for the await park); start() establishes the
+            // happens-before that publishes current and the hook to the worker.
+            currentF.set(worker, q1);
+
+            Thread t = (Thread) threadF.get(worker);
+            t.start();
+
+            t.join(5_000);
+            Assert.assertFalse("worker thread must exit after shuttingDown=true", t.isAlive());
+
+            Assert.assertTrue(
+                    "BUG (df6f7ca regressed): the busy worker returned from runOn() with a "
+                            + "re-dispatched current!=null and shuttingDown=true, then exited the loop "
+                            + "without stranding it. q2 was never signalled; its caller's await() hangs "
+                            + "forever.",
+                    doneF.getBoolean(q2));
+            Assert.assertNotNull("the stranded busy-path job must record the closed-handle error",
+                    unexpectedF.get(q2));
+        });
+    }
+
+    /**
+     * Result handlers (onBatch/onEnd/onError) run inline on the worker's
+     * dispatch thread. Called there, the blocking lease ops -- {@code close()}
+     * and the two {@code await()} variants -- would wait on a terminal event
+     * that only this same thread can deliver: a permanent self-deadlock. The
+     * reentrancy guard must turn that into an immediate IllegalStateException.
+     * <p>
+     * The guard compares {@code Thread.currentThread()} to the worker's
+     * dispatch thread, so this test points that field at the test thread (the
+     * worker is never started) to stand in for a reentrant in-handler call.
+     * Without the guard, {@code close()}/{@code await()} would park forever and
+     * the method-level timeout would fail the test.
+     */
+    @Test(timeout = 30_000)
+    public void testCloseAndAwaitFromWorkerThreadThrowInsteadOfDeadlocking() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            Class<?> queryImplClass = Class.forName("io.questdb.client.impl.QueryImpl");
+            Field queryF = QueryWorker.class.getDeclaredField("query");
+            queryF.setAccessible(true);
+            Field threadF = QueryWorker.class.getDeclaredField("thread");
+            threadF.setAccessible(true);
+            Field doneF = queryImplClass.getDeclaredField("done");
+            doneF.setAccessible(true);
+            Method bump = QueryWorker.class.getDeclaredMethod("bumpGeneration");
+            bump.setAccessible(true);
+            Method isWorker = QueryWorker.class.getDeclaredMethod("isCurrentThreadWorker");
+            isWorker.setAccessible(true);
+            Method close = queryImplClass.getDeclaredMethod("close", long.class);
+            close.setAccessible(true);
+            Method awaitNoTimeout = queryImplClass.getDeclaredMethod("await", long.class);
+            awaitNoTimeout.setAccessible(true);
+            Method awaitTimed = queryImplClass.getDeclaredMethod("await", long.class, long.class, TimeUnit.class);
+            awaitTimed.setAccessible(true);
+
+            QueryClientPool pool = new QueryClientPool(
+                    "ws::addr=localhost:9000;",
+                    /*minSize*/ 0, /*maxSize*/ 2,
+                    /*acquireTimeoutMillis*/ 1_000L,
+                    /*idleTimeoutMillis*/ Long.MAX_VALUE,
+                    /*maxLifetimeMillis*/ Long.MAX_VALUE);
+            QwpQueryClient client = QwpQueryClient.newPlainText("localhost", 9000);
+            try {
+                QueryWorker w = new QueryWorker(client, pool, 0);
+                bump.invoke(w); // generation -> 1: a live lease
+                Object impl = queryF.get(w);
+                doneF.setBoolean(impl, false); // a submit is in flight, as during a handler
+
+                // Off the worker thread the guard must NOT fire.
+                Assert.assertFalse("guard must not fire on a normal caller thread",
+                        (Boolean) isWorker.invoke(w));
 
-        // The worker thread must exit (it has observed shuttingDown).
-        t.join(5_000);
-        Assert.assertFalse("worker thread did not exit after shuttingDown=true",
-                t.isAlive());
+                // Stand in for a reentrant call from inside a result handler: the
+                // guard compares Thread.currentThread() to the worker's dispatch
+                // thread, so point that field at this thread.
+                threadF.set(w, Thread.currentThread());
+                Assert.assertTrue("guard must fire once the dispatch thread is this thread",
+                        (Boolean) isWorker.invoke(w));
 
-        // The Completion must have been signalled. Without the fix, await(2s)
-        // returns false because signalDone is never called.
-        boolean completed;
+                assertThrowsHandlerReentry("close", () -> close.invoke(impl, 1L));
+                assertThrowsHandlerReentry("await", () -> awaitNoTimeout.invoke(impl, 1L));
+                assertThrowsHandlerReentry("timed await",
+                        () -> awaitTimed.invoke(impl, 1L, 5L, TimeUnit.SECONDS));
+            } finally {
+                client.close();
+                pool.close();
+            }
+        });
+    }
+
+    private static void assertThrowsHandlerReentry(String op, ReflectiveCall call) throws Exception {
         try {
-            completed = completion.await(2, TimeUnit.SECONDS);
-        } catch (RuntimeException expectedAfterFix) {
-            // Once fixed, the worker is expected to call signalUnexpected
-            // with a QueryException("QuestDB handle is closed") which
-            // await() rethrows. Either form of "completed" is acceptable;
-            // the bug is the silent hang.
-            completed = true;
+            call.run();
+            Assert.fail(op + "() from the worker thread must throw, not block/deadlock");
+        } catch (InvocationTargetException e) {
+            Throwable cause = e.getCause();
+            Assert.assertTrue(op + "(): expected IllegalStateException, was " + cause,
+                    cause instanceof IllegalStateException);
+            Assert.assertTrue(op + "(): message must point at cancel(), was: " + cause.getMessage(),
+                    cause.getMessage().contains("cancel()"));
         }
-        Assert.assertTrue("BUG: QueryWorker.runLoop returned with shuttingDown=true "
-                + "while current!=null, never invoking runOn or signalUnexpected. "
-                + "The caller's Completion.await() hangs forever.", completed);
+    }
+
+    @FunctionalInterface
+    private interface ReflectiveCall {
+        void run() throws Exception;
     }
 }
diff --git a/core/src/test/java/io/questdb/client/test/impl/QuestDBImplErrorSafetyTest.java b/core/src/test/java/io/questdb/client/test/impl/QuestDBImplErrorSafetyTest.java
index 93b10301..75f89c3a 100644
--- a/core/src/test/java/io/questdb/client/test/impl/QuestDBImplErrorSafetyTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/QuestDBImplErrorSafetyTest.java
@@ -27,11 +27,10 @@
 import io.questdb.client.Sender;
 import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
 import io.questdb.client.impl.QuestDBImpl;
+import io.questdb.client.test.tools.TestUtils;
 import org.junit.Assert;
 import org.junit.Test;
 
-import java.lang.reflect.Constructor;
-import java.lang.reflect.InvocationTargetException;
 import java.lang.reflect.Proxy;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.function.Consumer;
@@ -48,9 +47,9 @@
 //
 // Sender is an interface, faked with a Proxy whose close() flips a flag, injected
 // via the SenderPool senderFactory seam. The connect Error is injected via the
-// QueryClientPool connectHook seam. Both are passed through the package-private
-// QuestDBImpl seam constructor (reached by reflection -- the main module is
-// declared `open`); production callers pass null for both.
+// QueryClientPool connectHook seam. Both are passed through the @TestOnly public
+// QuestDBImpl seam constructor; production uses the public overload that passes
+// null for both.
 public class QuestDBImplErrorSafetyTest {
 
     // Non-SF http config: the SenderPool factory replaces the build, but the
@@ -67,25 +66,27 @@ public class QuestDBImplErrorSafetyTest {
     //      delegate's close() runs.
     @Test(timeout = 30_000)
     public void ctorClosesBuiltSenderPoolWhenQueryPoolConstructionThrowsError() throws Exception {
-        AtomicBoolean senderClosed = new AtomicBoolean(false);
-        // senderMin = 1 -> SenderPool prewarms one observable delegate.
-        IntFunction<Sender> senderFactory = slotIndex -> fakeSender(senderClosed);
-        // queryMin = 1 -> QueryClientPool prewarm reaches connect(), which throws.
-        Consumer<QwpQueryClient> connectHook = client -> {
-            throw new AssertionError("injected native connect failure");
-        };
+        TestUtils.assertMemoryLeak(() -> {
+            AtomicBoolean senderClosed = new AtomicBoolean(false);
+            // senderMin = 1 -> SenderPool prewarms one observable delegate.
+            IntFunction<Sender> senderFactory = slotIndex -> fakeSender(senderClosed);
+            // queryMin = 1 -> QueryClientPool prewarm reaches connect(), which throws.
+            Consumer<QwpQueryClient> connectHook = client -> {
+                throw new AssertionError("injected native connect failure");
+            };
 
-        try {
-            newQuestDB(senderFactory, connectHook);
-            Assert.fail("expected QuestDBImpl construction to propagate the injected Error");
-        } catch (Throwable expected) {
-            // expected -- construction aborts
-        }
+            try {
+                newQuestDB(senderFactory, connectHook);
+                Assert.fail("expected QuestDBImpl construction to propagate the injected Error");
+            } catch (Throwable expected) {
+                // expected -- construction aborts (this also swallows the Assert.fail() above; senderClosed below is the real check)
+            }
 
-        Assert.assertTrue(
-                "QuestDBImpl ctor leaked the already-built SenderPool on an Error from "
-                        + "QueryClientPool construction: the prewarmed delegate's close() was never called",
-                senderClosed.get());
+            Assert.assertTrue(
+                    "QuestDBImpl ctor leaked the already-built SenderPool on an Error from "
+                            + "QueryClientPool construction: the prewarmed delegate's close() was never called",
+                    senderClosed.get());
+        });
     }
 
     private static Sender fakeSender(AtomicBoolean closedFlag) {
@@ -122,33 +123,15 @@ private static Sender fakeSender(AtomicBoolean closedFlag) {
 
     private static QuestDBImpl newQuestDB(
             IntFunction<Sender> senderFactory, Consumer<QwpQueryClient> connectHook
-    ) throws Exception {
-        Constructor<QuestDBImpl> c = QuestDBImpl.class.getDeclaredConstructor(
-                String.class, String.class, int.class, int.class, int.class, int.class,
-                long.class, long.class, long.class, long.class,
-                IntFunction.class, Consumer.class);
-        c.setAccessible(true);
-        try {
-            return c.newInstance(
-                    SENDER_CFG, QUERY_CFG,
-                    /*senderMin*/ 1, /*senderMax*/ 1,
-                    /*queryMin*/ 1, /*queryMax*/ 1,
-                    /*acquireTimeoutMillis*/ 250L,
-                    /*idleTimeoutMillis*/ Long.MAX_VALUE,
-                    /*maxLifetimeMillis*/ Long.MAX_VALUE,
-                    /*housekeeperIntervalMillis*/ Long.MAX_VALUE,
-                    senderFactory, connectHook);
-        } catch (InvocationTargetException e) {
-            // Unwrap so the caller sees the real construction failure (Error or
-            // RuntimeException), matching a direct constructor invocation.
-            Throwable cause = e.getCause();
-            if (cause instanceof RuntimeException) {
-                throw (RuntimeException) cause;
-            }
-            if (cause instanceof Error) {
-                throw (Error) cause;
-            }
-            throw e;
-        }
+    ) {
+        return new QuestDBImpl(
+                SENDER_CFG, QUERY_CFG,
+                /*senderMin*/ 1, /*senderMax*/ 1,
+                /*queryMin*/ 1, /*queryMax*/ 1,
+                /*acquireTimeoutMillis*/ 250L,
+                /*idleTimeoutMillis*/ Long.MAX_VALUE,
+                /*maxLifetimeMillis*/ Long.MAX_VALUE,
+                /*housekeeperIntervalMillis*/ Long.MAX_VALUE,
+                senderFactory, connectHook);
     }
 }
diff --git a/core/src/test/java/io/questdb/client/test/impl/QwpConfigKeysTest.java b/core/src/test/java/io/questdb/client/test/impl/QwpConfigKeysTest.java
index b0706189..526f1d74 100644
--- a/core/src/test/java/io/questdb/client/test/impl/QwpConfigKeysTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/QwpConfigKeysTest.java
@@ -28,6 +28,7 @@
 import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
 import io.questdb.client.impl.ConfigSchema;
 import io.questdb.client.impl.ConfigView;
+import io.questdb.client.test.tools.TestUtils;
 import org.junit.Assert;
 import org.junit.Test;
 
@@ -40,71 +41,81 @@
 public class QwpConfigKeysTest {
 
     @Test
-    public void testEverySchemaKeyIsRecognizedByBothClients() {
-        for (ConfigSchema.KeySpec spec : ConfigSchema.all()) {
-            String cfg = "ws::addr=h:9000;" + spec.name() + "=" + sampleValue(spec) + ";";
-            // A key may still fail a cross-key or range check; it must NOT fail
-            // as an unknown key -- that would mean it is missing from the
-            // registry (or that a consumer rejects a key it should ignore).
-            assertNotUnknown(spec.name(), () -> Sender.builder(cfg));
-            assertNotUnknown(spec.name(), () -> QwpQueryClient.fromConfig(cfg).close());
-        }
+    public void testEverySchemaKeyIsRecognizedByBothClients() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            for (ConfigSchema.KeySpec spec : ConfigSchema.all()) {
+                String cfg = "ws::addr=h:9000;" + spec.name() + "=" + sampleValue(spec) + ";";
+                // A key may still fail a cross-key or range check; it must NOT fail
+                // as an unknown key -- that would mean it is missing from the
+                // registry (or that a consumer rejects a key it should ignore).
+                assertNotUnknown(spec.name(), () -> Sender.builder(cfg));
+                assertNotUnknown(spec.name(), () -> QwpQueryClient.fromConfig(cfg).close());
+            }
+        });
     }
 
     @Test
-    public void testJunkKeyRejectedOnBoth() {
-        assertRejected("ws::addr=h:9000;not_a_real_key=foo;",
-                "unknown configuration key: not_a_real_key");
+    public void testJunkKeyRejectedOnBoth() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            assertRejected("ws::addr=h:9000;not_a_real_key=foo;",
+                    "unknown configuration key: not_a_real_key");
+        });
     }
 
     @Test
-    public void testLegacyKeysRejectedWithHintOnBoth() {
-        String legacyHint = "(applies to legacy http/tcp/udp transports only)";
-        assertRejected("ws::addr=h:9000;init_buf_size=1024;",
-                "unknown configuration key: init_buf_size", legacyHint);
-        assertRejected("ws::addr=h:9000;max_buf_size=1024;",
-                "unknown configuration key: max_buf_size", legacyHint);
-        assertRejected("ws::addr=h:9000;request_timeout=1000;",
-                "unknown configuration key: request_timeout", legacyHint);
-        assertRejected("ws::addr=h:9000;request_min_throughput=1000;",
-                "unknown configuration key: request_min_throughput", legacyHint);
-        assertRejected("ws::addr=h:9000;max_datagram_size=1400;",
-                "unknown configuration key: max_datagram_size", legacyHint);
-        assertRejected("ws::addr=h:9000;multicast_ttl=4;",
-                "unknown configuration key: multicast_ttl", legacyHint);
-        assertRejected("ws::addr=h:9000;retry_timeout=1000;",
-                "unknown configuration key: retry_timeout", "(use reconnect_max_duration_millis on ws/wss)");
-        assertRejected("ws::addr=h:9000;protocol_version=2;",
-                "unknown configuration key: protocol_version", "(QWP negotiates the protocol version during the WebSocket upgrade)");
+    public void testLegacyKeysRejectedWithHintOnBoth() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            String legacyHint = "(applies to legacy http/tcp/udp transports only)";
+            assertRejected("ws::addr=h:9000;init_buf_size=1024;",
+                    "unknown configuration key: init_buf_size", legacyHint);
+            assertRejected("ws::addr=h:9000;max_buf_size=1024;",
+                    "unknown configuration key: max_buf_size", legacyHint);
+            assertRejected("ws::addr=h:9000;request_timeout=1000;",
+                    "unknown configuration key: request_timeout", legacyHint);
+            assertRejected("ws::addr=h:9000;request_min_throughput=1000;",
+                    "unknown configuration key: request_min_throughput", legacyHint);
+            assertRejected("ws::addr=h:9000;max_datagram_size=1400;",
+                    "unknown configuration key: max_datagram_size", legacyHint);
+            assertRejected("ws::addr=h:9000;multicast_ttl=4;",
+                    "unknown configuration key: multicast_ttl", legacyHint);
+            assertRejected("ws::addr=h:9000;retry_timeout=1000;",
+                    "unknown configuration key: retry_timeout", "(use reconnect_max_duration_millis on ws/wss)");
+            assertRejected("ws::addr=h:9000;protocol_version=2;",
+                    "unknown configuration key: protocol_version", "(QWP negotiates the protocol version during the WebSocket upgrade)");
+        });
     }
 
     @Test
-    public void testRelocatedHintTableIsExactlyTheLegacyKeys() {
-        String legacyHint = "(applies to legacy http/tcp/udp transports only)";
-        Assert.assertEquals(legacyHint, ConfigView.relocatedHint("init_buf_size"));
-        Assert.assertEquals(legacyHint, ConfigView.relocatedHint("max_buf_size"));
-        Assert.assertEquals(legacyHint, ConfigView.relocatedHint("request_timeout"));
-        Assert.assertEquals(legacyHint, ConfigView.relocatedHint("request_min_throughput"));
-        Assert.assertEquals(legacyHint, ConfigView.relocatedHint("max_datagram_size"));
-        Assert.assertEquals(legacyHint, ConfigView.relocatedHint("multicast_ttl"));
-        Assert.assertEquals("(use reconnect_max_duration_millis on ws/wss)", ConfigView.relocatedHint("retry_timeout"));
-        Assert.assertEquals("(QWP negotiates the protocol version during the WebSocket upgrade)", ConfigView.relocatedHint("protocol_version"));
+    public void testRelocatedHintTableIsExactlyTheLegacyKeys() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            String legacyHint = "(applies to legacy http/tcp/udp transports only)";
+            Assert.assertEquals(legacyHint, ConfigView.relocatedHint("init_buf_size"));
+            Assert.assertEquals(legacyHint, ConfigView.relocatedHint("max_buf_size"));
+            Assert.assertEquals(legacyHint, ConfigView.relocatedHint("request_timeout"));
+            Assert.assertEquals(legacyHint, ConfigView.relocatedHint("request_min_throughput"));
+            Assert.assertEquals(legacyHint, ConfigView.relocatedHint("max_datagram_size"));
+            Assert.assertEquals(legacyHint, ConfigView.relocatedHint("multicast_ttl"));
+            Assert.assertEquals("(use reconnect_max_duration_millis on ws/wss)", ConfigView.relocatedHint("retry_timeout"));
+            Assert.assertEquals("(QWP negotiates the protocol version during the WebSocket upgrade)", ConfigView.relocatedHint("protocol_version"));
 
-        // No registry key (including POOL keys) carries a relocated hint.
-        for (ConfigSchema.KeySpec spec : ConfigSchema.all()) {
-            Assert.assertNull("registry key '" + spec.name() + "' must not be in the hint table",
-                    ConfigView.relocatedHint(spec.name()));
-        }
-        // ECDSA keys are plain unknowns (only the C client handles them).
-        Assert.assertNull(ConfigView.relocatedHint("token_x"));
-        Assert.assertNull(ConfigView.relocatedHint("token_y"));
-        Assert.assertNull(ConfigView.relocatedHint("not_a_real_key"));
+            // No registry key (including POOL keys) carries a relocated hint.
+            for (ConfigSchema.KeySpec spec : ConfigSchema.all()) {
+                Assert.assertNull("registry key '" + spec.name() + "' must not be in the hint table",
+                        ConfigView.relocatedHint(spec.name()));
+            }
+            // ECDSA keys are plain unknowns (only the C client handles them).
+            Assert.assertNull(ConfigView.relocatedHint("token_x"));
+            Assert.assertNull(ConfigView.relocatedHint("token_y"));
+            Assert.assertNull(ConfigView.relocatedHint("not_a_real_key"));
+        });
     }
 
     @Test
-    public void testTokenXYRejectedWithoutHintOnBoth() {
-        assertRejectedNoHint("ws::addr=h:9000;token_x=abc;", "token_x");
-        assertRejectedNoHint("ws::addr=h:9000;token_y=def;", "token_y");
+    public void testTokenXYRejectedWithoutHintOnBoth() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            assertRejectedNoHint("ws::addr=h:9000;token_x=abc;", "token_x");
+            assertRejectedNoHint("ws::addr=h:9000;token_y=def;", "token_y");
+        });
     }
 
     private static void assertNotUnknown(String key, Runnable action) {
diff --git a/core/src/test/java/io/questdb/client/test/impl/QwpQueryClientConfigHonoredTest.java b/core/src/test/java/io/questdb/client/test/impl/QwpQueryClientConfigHonoredTest.java
index c5c5edb7..00313004 100644
--- a/core/src/test/java/io/questdb/client/test/impl/QwpQueryClientConfigHonoredTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/QwpQueryClientConfigHonoredTest.java
@@ -28,6 +28,7 @@
 import io.questdb.client.cutlass.qwp.client.QwpQueryClient;
 import io.questdb.client.impl.ConfigSchema;
 import io.questdb.client.impl.Side;
+import io.questdb.client.test.tools.TestUtils;
 import org.junit.Assert;
 import org.junit.Test;
 
@@ -51,59 +52,62 @@ public class QwpQueryClientConfigHonoredTest {
     private final Set<String> honored = new HashSet<>();
 
     @Test
-    public void testEveryEgressKeyIsHonored() {
-        assertHonored("target=primary", "target", "primary");
-        assertHonored("failover=off", "failover", false);
-        assertHonored("failover_max_attempts=9", "failover_max_attempts", 9);
-        assertHonored("failover_backoff_initial_ms=120", "failover_backoff_initial_ms", 120L);
-        assertHonored("failover_backoff_max_ms=99999", "failover_backoff_max_ms", 99999L);
-        assertHonored("failover_max_duration_ms=56000", "failover_max_duration_ms", 56000L);
-        assertHonored("max_batch_rows=512", "max_batch_rows", 512);
-        assertHonored("initial_credit=65536", "initial_credit", 65536L);
-        assertHonored("buffer_pool_size=3", "buffer_pool_size", 3);
-        assertHonored("compression=zstd", "compression", "zstd");
-        assertHonored("compression_level=9", "compression_level", 9);
-        assertHonored("client_id=probe/1.0", "client_id", "probe/1.0");
-        assertHonored("zone=us-east", "zone", "us-east");
-        // COMMON applied by egress.
-        assertHonored("auth_timeout_ms=7777", "auth_timeout_ms", 7777L);
+    public void testEveryEgressKeyIsHonored() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            assertHonored("target=primary", "target", "primary");
+            assertHonored("failover=off", "failover", false);
+            assertHonored("failover_max_attempts=9", "failover_max_attempts", 9);
+            assertHonored("failover_backoff_initial_ms=120", "failover_backoff_initial_ms", 120L);
+            assertHonored("failover_backoff_max_ms=99999", "failover_backoff_max_ms", 99999L);
+            assertHonored("failover_max_duration_ms=56000", "failover_max_duration_ms", 56000L);
+            assertHonored("max_batch_rows=512", "max_batch_rows", 512);
+            assertHonored("initial_credit=65536", "initial_credit", 65536L);
+            assertHonored("buffer_pool_size=3", "buffer_pool_size", 3);
+            assertHonored("compression=zstd", "compression", "zstd");
+            assertHonored("compression_level=9", "compression_level", 9);
+            assertHonored("client_id=probe/1.0", "client_id", "probe/1.0");
+            assertHonored("zone=us-east", "zone", "us-east");
+            // COMMON applied by egress.
+            assertHonored("auth_timeout_ms=7777", "auth_timeout_ms", 7777L);
+            assertHonored("connect_timeout=6000", "connect_timeout", 6000);
 
-        // Credentials become the Authorization header, including the user/pass aliases.
-        String basic = "Basic " + Base64.getEncoder()
-                .encodeToString("alice:secret".getBytes(StandardCharsets.UTF_8));
-        Assert.assertEquals(basic, snapshot("ws::addr=h:9000;username=alice;password=secret;").get("authorization_header"));
-        Assert.assertEquals(basic, snapshot("ws::addr=h:9000;user=alice;pass=secret;").get("authorization_header"));
-        Assert.assertEquals("Bearer ey.abc", snapshot("ws::addr=h:9000;token=ey.abc;").get("authorization_header"));
-        markHonored("username", "password", "token");
+            // Credentials become the Authorization header, including the user/pass aliases.
+            String basic = "Basic " + Base64.getEncoder()
+                    .encodeToString("alice:secret".getBytes(StandardCharsets.UTF_8));
+            Assert.assertEquals(basic, snapshot("ws::addr=h:9000;username=alice;password=secret;").get("authorization_header"));
+            Assert.assertEquals(basic, snapshot("ws::addr=h:9000;user=alice;pass=secret;").get("authorization_header"));
+            Assert.assertEquals("Bearer ey.abc", snapshot("ws::addr=h:9000;token=ey.abc;").get("authorization_header"));
+            markHonored("username", "password", "token");
 
-        // COMMON TLS keys applied by egress (require the wss schema). tls_verify
-        // drives the validation mode; tls_roots/tls_roots_password set the trust
-        // store. All three read back from the snapshot.
-        Assert.assertEquals(ClientTlsConfiguration.TLS_VALIDATION_MODE_NONE,
-                snapshot("wss::addr=h:9000;tls_verify=unsafe_off;").get("tls_verify"));
-        Map<String, Object> tls = snapshot("wss::addr=h:9000;tls_roots=/ca.p12;tls_roots_password=pw;");
-        Assert.assertEquals("/ca.p12", tls.get("tls_roots"));
-        Assert.assertEquals("pw", tls.get("tls_roots_password"));
-        markHonored("tls_verify", "tls_roots", "tls_roots_password");
+            // COMMON TLS keys applied by egress (require the wss schema). tls_verify
+            // drives the validation mode; tls_roots/tls_roots_password set the trust
+            // store. All three read back from the snapshot.
+            Assert.assertEquals(ClientTlsConfiguration.TLS_VALIDATION_MODE_NONE,
+                    snapshot("wss::addr=h:9000;tls_verify=unsafe_off;").get("tls_verify"));
+            Map<String, Object> tls = snapshot("wss::addr=h:9000;tls_roots=/ca.p12;tls_roots_password=pw;");
+            Assert.assertEquals("/ca.p12", tls.get("tls_roots"));
+            Assert.assertEquals("pw", tls.get("tls_roots_password"));
+            markHonored("tls_verify", "tls_roots", "tls_roots_password");
 
-        // Drift guard: every egress-applied registry key must have an assertion
-        // above. The honored set is populated by the assertions themselves, so
-        // deleting one trips this -- unlike a hand-maintained list, it cannot
-        // silently drift from what is actually asserted.
-        for (ConfigSchema.KeySpec spec : ConfigSchema.all()) {
-            if (!spec.name().equals(spec.canonical())) {
-                continue; // alias (user/pass) -- covered via its canonical key
+            // Drift guard: every egress-applied registry key must have an assertion
+            // above. The honored set is populated by the assertions themselves, so
+            // deleting one trips this -- unlike a hand-maintained list, it cannot
+            // silently drift from what is actually asserted.
+            for (ConfigSchema.KeySpec spec : ConfigSchema.all()) {
+                if (!spec.name().equals(spec.canonical())) {
+                    continue; // alias (user/pass) -- covered via its canonical key
+                }
+                // The egress client applies its own EGRESS keys plus the COMMON keys
+                // (credentials, TLS, auth_timeout_ms, connect_timeout). addr is the
+                // endpoint list (the connection target), not a snapshot value, so it is excluded.
+                boolean egressApplied = spec.side() == Side.EGRESS
+                        || (spec.side() == Side.COMMON && !spec.name().equals("addr"));
+                if (egressApplied) {
+                    Assert.assertTrue("registry egress key '" + spec.name() + "' has no honored assertion",
+                            honored.contains(spec.name()));
+                }
             }
-            // The egress client applies its own EGRESS keys plus the COMMON keys
-            // (credentials, TLS, auth_timeout_ms). addr is the endpoint list (the
-            // connection target), not a snapshot value, so it is excluded.
-            boolean egressApplied = spec.side() == Side.EGRESS
-                    || (spec.side() == Side.COMMON && !spec.name().equals("addr"));
-            if (egressApplied) {
-                Assert.assertTrue("registry egress key '" + spec.name() + "' has no honored assertion",
-                        honored.contains(spec.name()));
-            }
-        }
+        });
     }
 
     private void assertHonored(String kv, String snapKey, Object expected) {
diff --git a/core/src/test/java/io/questdb/client/test/impl/SenderLeaseGenerationTest.java b/core/src/test/java/io/questdb/client/test/impl/SenderLeaseGenerationTest.java
new file mode 100644
index 00000000..7b5b627a
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/impl/SenderLeaseGenerationTest.java
@@ -0,0 +1,152 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.impl;
+
+import io.questdb.client.Sender;
+import io.questdb.client.impl.PooledSender;
+import io.questdb.client.impl.SenderPool;
+import io.questdb.client.test.tools.TestUtils;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.lang.reflect.Constructor;
+import java.lang.reflect.Field;
+import java.lang.reflect.Method;
+import java.util.ArrayDeque;
+
+/**
+ * Ingest-side mirror of {@code QueryLeaseGenerationTest}: a stale pooled-Sender
+ * handle (held after close, with the slot since re-borrowed) must not disturb a
+ * later borrow of the same slot. {@code PooledSender} is now a fresh per-borrow
+ * wrapper carrying the lease generation; the reused {@code SenderSlot} validates
+ * it under the pool lock so a stale close/write is dropped.
+ * <p>
+ * Reaches package-private internals by reflection (same white-box style as the
+ * other tests here); {@code SenderSlot} is constructed with a {@code null}
+ * delegate, which the paths under test never dereference.
+ */
+public class SenderLeaseGenerationTest {
+
+    private static final String DEAD_HTTP_CONFIG =
+            "http::addr=127.0.0.1:1;protocol_version=2;auto_flush=off;";
+
+    /**
+     * The pool-wide blast radius: a stale (duplicate / post-reborrow) close must
+     * never enqueue a slot a live borrower owns, or two borrowers would write
+     * into one delegate's buffer at once. {@code giveBack} validates the lease
+     * generation under the pool lock, so this is impossible.
+     */
+    @Test
+    @SuppressWarnings("unchecked")
+    public void testStaleGiveBackDoesNotEnqueueSlotTwice() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            Class<?> slotClass = Class.forName("io.questdb.client.impl.SenderSlot");
+            Constructor<?> slotCtor = slotClass.getDeclaredConstructor(Sender.class, SenderPool.class, int.class);
+            slotCtor.setAccessible(true);
+            Method bump = slotClass.getDeclaredMethod("bumpGeneration");
+            bump.setAccessible(true);
+            Method generation = slotClass.getDeclaredMethod("generation");
+            generation.setAccessible(true);
+            Constructor<PooledSender> leaseCtor =
+                    PooledSender.class.getDeclaredConstructor(slotClass, long.class);
+            leaseCtor.setAccessible(true);
+            Field availableF = SenderPool.class.getDeclaredField("available");
+            availableF.setAccessible(true);
+
+            try (SenderPool pool = new SenderPool(
+                    DEAD_HTTP_CONFIG, /*minSize*/ 0, /*maxSize*/ 2,
+                    /*acquireTimeoutMillis*/ 1_000L,
+                    /*idleTimeoutMillis*/ Long.MAX_VALUE,
+                    /*maxLifetimeMillis*/ Long.MAX_VALUE)) {
+                ArrayDeque<Object> available = (ArrayDeque<Object>) availableF.get(pool);
+                Object slot = slotCtor.newInstance(null, pool, -1);
+
+                // borrow #1 stamps generation 1; lease A captures 1.
+                bump.invoke(slot);
+                Assert.assertEquals(1L, generation.invoke(slot));
+                PooledSender leaseA = leaseCtor.newInstance(slot, 1L);
+
+                // close A -> giveBack(A): matches, enqueues once.
+                pool.giveBack(leaseA);
+                Assert.assertEquals("valid close must enqueue the slot once", 1, available.size());
+
+                // duplicate close A (e.g. explicit close + try-with-resources)
+                // -> giveBack(A): generation already bumped to 2, so it is dropped.
+                pool.giveBack(leaseA);
+                Assert.assertEquals("duplicate close of the same lease must be dropped",
+                        1, available.size());
+
+                // borrow #2 hands the slot to a new borrower B: pull it out, stamp 3.
+                available.pollFirst();
+                bump.invoke(slot);
+                Assert.assertEquals(3L, generation.invoke(slot));
+                PooledSender leaseB = leaseCtor.newInstance(slot, 3L);
+
+                // A stray close from the long-dead lease A -> dropped, so B's slot is
+                // NOT re-enqueued while B still owns it.
+                pool.giveBack(leaseA);
+                Assert.assertEquals("a post-reborrow stale close must NOT enqueue the slot "
+                        + "while another borrower owns it", 0, available.size());
+
+                // B's own close -> giveBack(B): matches, enqueues legitimately.
+                pool.giveBack(leaseB);
+                Assert.assertEquals("the current borrower's close must still work",
+                        1, available.size());
+            }
+        });
+    }
+
+    /**
+     * A stale lease's data write must be rejected (not silently land in a slot a
+     * later borrower now owns). The generation guard in
+     * {@code SenderSlot.live()} throws before the delegate is touched.
+     */
+    @Test
+    public void testStaleWriteIsRejected() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            Class<?> slotClass = Class.forName("io.questdb.client.impl.SenderSlot");
+            Constructor<?> slotCtor = slotClass.getDeclaredConstructor(Sender.class, SenderPool.class, int.class);
+            slotCtor.setAccessible(true);
+            Method bump = slotClass.getDeclaredMethod("bumpGeneration");
+            bump.setAccessible(true);
+            Constructor<PooledSender> leaseCtor =
+                    PooledSender.class.getDeclaredConstructor(slotClass, long.class);
+            leaseCtor.setAccessible(true);
+
+            Object slot = slotCtor.newInstance(null, null, -1);
+            bump.invoke(slot); // generation -> 1, lease A captures 1
+            PooledSender leaseA = leaseCtor.newInstance(slot, 1L);
+            bump.invoke(slot); // simulated release -> generation 2
+            bump.invoke(slot); // re-borrowed -> generation 3
+
+            try {
+                leaseA.table("x");
+                Assert.fail("a stale lease's write must throw, not reach the re-borrowed slot");
+            } catch (IllegalStateException expected) {
+                Assert.assertTrue(expected.getMessage(), expected.getMessage().contains("closed"));
+            }
+        });
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/impl/SenderPoolErrorSafetyTest.java b/core/src/test/java/io/questdb/client/test/impl/SenderPoolErrorSafetyTest.java
index b7b56e7a..81055bb6 100644
--- a/core/src/test/java/io/questdb/client/test/impl/SenderPoolErrorSafetyTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/SenderPoolErrorSafetyTest.java
@@ -25,11 +25,13 @@
 package io.questdb.client.test.impl;
 
 import io.questdb.client.Sender;
+import io.questdb.client.impl.PooledSender;
 import io.questdb.client.impl.SenderPool;
+import io.questdb.client.test.tools.TestUtils;
 import org.junit.Assert;
 import org.junit.Test;
 
-import java.lang.reflect.Constructor;
+import java.lang.reflect.Field;
 import java.lang.reflect.Proxy;
 import java.nio.file.Paths;
 import java.util.concurrent.atomic.AtomicBoolean;
@@ -58,25 +60,27 @@ public class SenderPoolErrorSafetyTest {
     // GREEN: catch (Throwable) -> the cleanup loop closes the 1st delegate.
     @Test(timeout = 30_000)
     public void preWarmClosesBuiltDelegatesWhenBuildThrowsError() throws Exception {
-        AtomicBoolean firstClosed = new AtomicBoolean(false);
-        AtomicInteger calls = new AtomicInteger();
-        IntFunction<Sender> factory = slotIndex -> {
-            if (calls.incrementAndGet() >= 2) {
-                throw new AssertionError("injected native build failure");
+        TestUtils.assertMemoryLeak(() -> {
+            AtomicBoolean firstClosed = new AtomicBoolean(false);
+            AtomicInteger calls = new AtomicInteger();
+            IntFunction<Sender> factory = slotIndex -> {
+                if (calls.incrementAndGet() >= 2) {
+                    throw new AssertionError("injected native build failure");
+                }
+                return fakeSender(firstClosed);
+            };
+
+            try {
+                newPool(CFG, 2, 2, 250, factory);
+                Assert.fail("expected prewarm to propagate the injected Error");
+            } catch (Throwable expected) {
+                // expected -- construction aborts
             }
-            return fakeSender(firstClosed);
-        };
-
-        try {
-            newPool(CFG, 2, 2, 250, factory);
-            Assert.fail("expected prewarm to propagate the injected Error");
-        } catch (Throwable expected) {
-            // expected -- construction aborts
-        }
-
-        Assert.assertTrue(
-                "prewarm leaked an already-built delegate: its close() was never called on an Error",
-                firstClosed.get());
+
+            Assert.assertTrue(
+                    "prewarm leaked an already-built delegate: its close() was never called on an Error",
+                    firstClosed.get());
+        });
     }
 
     // Companion to the catch (RuntimeException) -> track-normal-completion fix in
@@ -92,31 +96,46 @@ public void preWarmClosesBuiltDelegatesWhenBuildThrowsError() throws Exception {
     //      discardBroken() -> the next borrow() builds a fresh wrapper.
     @Test(timeout = 30_000)
     public void flushErrorDiscardsBrokenSenderInsteadOfRecycling() throws Exception {
-        IntFunction<Sender> factory = slotIndex -> flushThrowingSender();
+        TestUtils.assertMemoryLeak(() -> {
+            IntFunction<Sender> factory = slotIndex -> flushThrowingSender();
 
-        try (SenderPool pool = newPool(CFG, 1, 1, 1_000, factory)) {
-            Sender first = pool.borrow();
-            try {
-                first.close();
-                Assert.fail("close() must propagate the Error thrown by flush()");
-            } catch (AssertionError expected) {
-                // expected: the original throwable propagates naturally
-            }
+            try (SenderPool pool = newPool(CFG, 1, 1, 1_000, factory)) {
+                Sender first = pool.borrow();
+                // Capture the underlying slot before close(): borrow() always hands
+                // out a FRESH PooledSender wrapper, so assertNotSame(first, second)
+                // on the wrappers is vacuously true and proves nothing -- it stays
+                // true whether or not the broken slot was discarded. The pool
+                // recycles slots, not wrappers, so a broken slot leaking back to
+                // the next borrower shows up as the SAME slot. Assert on the slot.
+                Object firstSlot = slotOf(first);
+                try {
+                    first.close();
+                    Assert.fail("close() must propagate the Error thrown by flush()");
+                } catch (AssertionError expected) {
+                    // expected: the original throwable propagates naturally
+                }
 
-            Sender second = pool.borrow();
-            try {
-                Assert.assertNotSame(
-                        "a sender whose flush() exited with an Error must be discarded, not recycled",
-                        first, second);
-            } finally {
-                // second's flush() also throws on close(); swallow on teardown.
+                Sender second = pool.borrow();
                 try {
-                    second.close();
-                } catch (AssertionError ignored) {
-                    // expected
+                    Assert.assertNotSame(
+                            "a sender whose flush() exited with an Error must be discarded, not recycled",
+                            firstSlot, slotOf(second));
+                } finally {
+                    // second's flush() also throws on close(); swallow on teardown.
+                    try {
+                        second.close();
+                    } catch (AssertionError ignored) {
+                        // expected
+                    }
                 }
             }
-        }
+        });
+    }
+
+    private static Object slotOf(Sender pooledWrapper) throws Exception {
+        Field f = PooledSender.class.getDeclaredField("slot");
+        f.setAccessible(true);
+        return f.get(pooledWrapper);
     }
 
     // Like fakeSender(), but flush() throws an Error to drive the
@@ -173,42 +192,44 @@ private static Sender flushThrowingSender() {
     //      succeeds, proving capacity survived the failed grow.
     @Test(timeout = 30_000)
     public void borrowReleasesSfSlotIndexWhenCreationFails() throws Exception {
-        // Unique, non-existent sf_dir: minSize=0 means no pre-warm, so the dir
-        // is never created and the constructor's startup SF recovery is a no-op.
-        // The factory replaces createUnlocked(), so localhost:1 is never dialed.
-        String sfDir = Paths.get(System.getProperty("java.io.tmpdir"),
-                "qdb-sf-borrowfail-" + System.nanoTime()).toString();
-        String sfCfg = "ws::addr=localhost:1;sf_dir=" + sfDir + ";";
-
-        AtomicInteger calls = new AtomicInteger();
-        IntFunction<Sender> factory = slotIndex -> {
-            // First borrow-triggered build fails (the slot index reserved for
-            // it must be released); later builds succeed.
-            if (calls.getAndIncrement() == 0) {
-                throw new AssertionError("injected native build failure on first grow");
-            }
-            return fakeSender(new AtomicBoolean());
-        };
+        TestUtils.assertMemoryLeak(() -> {
+            // Unique, non-existent sf_dir: minSize=0 means no pre-warm, so the dir
+            // is never created and the constructor's startup SF recovery is a no-op.
+            // The factory replaces createUnlocked(), so localhost:1 is never dialed.
+            String sfDir = Paths.get(System.getProperty("java.io.tmpdir"),
+                    "qdb-sf-borrowfail-" + System.nanoTime()).toString();
+            String sfCfg = "ws::addr=localhost:1;sf_dir=" + sfDir + ";";
 
-        try (SenderPool pool = newPool(sfCfg, 0, 1, 2_000, factory)) {
-            try {
-                pool.borrow();
-                Assert.fail("borrow() must propagate the Error from the failed build");
-            } catch (AssertionError expected) {
-                // expected: the original throwable propagates out of borrow()
-            }
+            AtomicInteger calls = new AtomicInteger();
+            IntFunction<Sender> factory = slotIndex -> {
+                // First borrow-triggered build fails (the slot index reserved for
+                // it must be released); later builds succeed.
+                if (calls.getAndIncrement() == 0) {
+                    throw new AssertionError("injected native build failure on first grow");
+                }
+                return fakeSender(new AtomicBoolean());
+            };
 
-            // The single SF slot index must have been returned to the free set.
-            // If it leaked, this borrow() trips the capacity invariant (or, in
-            // the timeout-only variant, exhausts the acquire budget).
-            Sender second = pool.borrow();
-            try {
-                Assert.assertNotNull(
-                        "after a failed grow the SF slot index must be reusable", second);
-            } finally {
-                second.close();
+            try (SenderPool pool = newPool(sfCfg, 0, 1, 2_000, factory)) {
+                try {
+                    pool.borrow();
+                    Assert.fail("borrow() must propagate the Error from the failed build");
+                } catch (AssertionError expected) {
+                    // expected: the original throwable propagates out of borrow()
+                }
+
+                // The single SF slot index must have been returned to the free set.
+                // If it leaked, this borrow() trips the capacity invariant (or, in
+                // the timeout-only variant, exhausts the acquire budget).
+                Sender second = pool.borrow();
+                try {
+                    Assert.assertNotNull(
+                            "after a failed grow the SF slot index must be reusable", second);
+                } finally {
+                    second.close();
+                }
             }
-        }
+        });
     }
 
     private static Sender fakeSender(AtomicBoolean closedFlag) {
@@ -246,10 +267,7 @@ private static Sender fakeSender(AtomicBoolean closedFlag) {
 
     private static SenderPool newPool(
             String cfg, int min, int max, long acquireMs, IntFunction<Sender> senderFactory
-    ) throws Exception {
-        Constructor<SenderPool> c = SenderPool.class.getDeclaredConstructor(
-                String.class, int.class, int.class, long.class, long.class, long.class, IntFunction.class);
-        c.setAccessible(true);
-        return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, senderFactory);
+    ) {
+        return new SenderPool(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, senderFactory);
     }
 }
diff --git a/core/src/test/java/io/questdb/client/test/impl/SenderPoolSfTest.java b/core/src/test/java/io/questdb/client/test/impl/SenderPoolSfTest.java
index e4b2b49a..2c76997d 100644
--- a/core/src/test/java/io/questdb/client/test/impl/SenderPoolSfTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/SenderPoolSfTest.java
@@ -43,7 +43,6 @@
 import org.slf4j.LoggerFactory;
 
 import java.io.IOException;
-import java.lang.reflect.Constructor;
 import java.lang.reflect.Field;
 import java.lang.reflect.Method;
 import java.nio.ByteBuffer;
@@ -207,7 +206,10 @@ public void testReturnedSenderReusesSameSlot() throws Exception {
                     first.close();
                     PooledSender second = pool.borrow();
                     try {
-                        Assert.assertSame("returned slot must be recycled", first, second);
+                        // borrow() now returns a fresh wrapper each time; what
+                        // the pool recycles is the underlying slot.
+                        Assert.assertSame("returned slot must be recycled",
+                                getField(first, "slot"), getField(second, "slot"));
                         Assert.assertEquals("no new slot dir on recycle", 1, countSlotDirs());
                         Assert.assertTrue(Files.exists(slot("default-0")));
                     } finally {
@@ -1883,9 +1885,12 @@ private static void rmDir(String dir) {
     }
 
     private static Sender getDelegate(PooledSender ps) throws Exception {
-        Field f = PooledSender.class.getDeclaredField("delegate");
+        Field slotF = PooledSender.class.getDeclaredField("slot");
+        slotF.setAccessible(true);
+        Object slot = slotF.get(ps);
+        Field f = slot.getClass().getDeclaredField("delegate");
         f.setAccessible(true);
-        return (Sender) f.get(ps);
+        return (Sender) f.get(slot);
     }
 
     // Invokes one of the pool's private managed-slot delegate factories
@@ -1931,27 +1936,20 @@ private static void invokeDiscardBroken(SenderPool pool, PooledSender ps) throws
         m.invoke(pool, ps);
     }
 
-    // Reaches the package-private senderFactory test seam by reflection so a
-    // test can inject a fake/forged delegate (mirrors SenderPoolErrorSafetyTest).
+    // Uses the @TestOnly senderFactory seam so a test can inject a fake/forged
+    // delegate (mirrors SenderPoolErrorSafetyTest).
     private static SenderPool newPoolWithFactory(
             String cfg, int min, int max, long acquireMs, IntFunction<Sender> senderFactory
-    ) throws Exception {
-        Constructor<SenderPool> c = SenderPool.class.getDeclaredConstructor(
-                String.class, int.class, int.class, long.class, long.class, long.class, IntFunction.class);
-        c.setAccessible(true);
-        return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, senderFactory);
+    ) {
+        return new SenderPool(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, senderFactory);
     }
 
-    // Reaches the package-private 8-arg constructor (deferStartupRecovery=true)
-    // by reflection so a test can build a pool whose SF startup recovery is NOT
-    // run inline -- mirroring the pooled QuestDB handle, which defers it to the
-    // housekeeper. senderFactory=null -> the real defaultSender().
-    private static SenderPool newDeferredPool(String cfg, int min, int max, long acquireMs) throws Exception {
-        Constructor<SenderPool> c = SenderPool.class.getDeclaredConstructor(
-                String.class, int.class, int.class, long.class, long.class, long.class,
-                IntFunction.class, boolean.class);
-        c.setAccessible(true);
-        return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, null, true);
+    // Uses the @TestOnly 8-arg constructor (deferStartupRecovery=true) so a test
+    // can build a pool whose SF startup recovery is NOT run inline -- mirroring
+    // the pooled QuestDB handle, which defers it to the housekeeper.
+    // senderFactory=null -> the real defaultSender().
+    private static SenderPool newDeferredPool(String cfg, int min, int max, long acquireMs) {
+        return new SenderPool(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, null, true);
     }
 
     // Drives a deferred pool's startup recovery to completion (the housekeeper
@@ -1982,12 +1980,8 @@ private static void invokeMarkClosing(SenderPool pool) throws Exception {
     // test can drive the housekeeper recovery path against fully controlled
     // (fake) recoverers.
     private static SenderPool newDeferredPoolWithFactory(
-            String cfg, int min, int max, long acquireMs, IntFunction<Sender> factory) throws Exception {
-        Constructor<SenderPool> c = SenderPool.class.getDeclaredConstructor(
-                String.class, int.class, int.class, long.class, long.class, long.class,
-                IntFunction.class, boolean.class);
-        c.setAccessible(true);
-        return c.newInstance(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, factory, true);
+            String cfg, int min, int max, long acquireMs, IntFunction<Sender> factory) {
+        return new SenderPool(cfg, min, max, acquireMs, Long.MAX_VALUE, Long.MAX_VALUE, factory, true);
     }
 
     // Fake Sender whose drain() (for slot 0 only) parks until released, opening a
diff --git a/core/src/test/java/io/questdb/client/test/impl/SenderPoolTest.java b/core/src/test/java/io/questdb/client/test/impl/SenderPoolTest.java
index 85952f85..3f16b965 100644
--- a/core/src/test/java/io/questdb/client/test/impl/SenderPoolTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/SenderPoolTest.java
@@ -34,10 +34,7 @@
 
 import java.lang.reflect.Field;
 import java.lang.reflect.Proxy;
-import java.util.concurrent.CountDownLatch;
-import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
-import java.util.concurrent.atomic.AtomicReference;
 
 /**
  * Unit tests for the {@link SenderPool} borrow/return semantics. Uses the
@@ -57,26 +54,36 @@ public class SenderPoolTest {
             "http::addr=127.0.0.1:1;protocol_version=2;auto_flush=off;";
 
     @Test
-    public void testBorrowReturnRecyclesSameDecorator() {
+    public void testBorrowReturnRecyclesSameDecorator() throws Exception {
         try (SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE)) {
             Sender first = pool.borrow();
             first.close();
             Sender second = pool.borrow();
-            Assert.assertSame("returned decorator should be reused after close()", first, second);
+            // Each borrow is a fresh PooledSender wrapper; what the pool recycles
+            // is the underlying slot, so compare those rather than the handles.
+            Assert.assertSame("returned slot should be recycled after close()",
+                    slotOf(first), slotOf(second));
             second.close();
         }
     }
 
+    private static Object slotOf(Sender pooledWrapper) throws Exception {
+        Field f = PooledSender.class.getDeclaredField("slot");
+        f.setAccessible(true);
+        return f.get(pooledWrapper);
+    }
+
     @Test
-    public void testBrokenSenderIsNotReturnedToPool() {
+    public void testBrokenSenderIsNotReturnedToPool() throws Exception {
         // Borrowing, buffering a row, and then closing forces flush() against
-        // the unreachable address, which throws. The broken wrapper must not
-        // be returned to the pool: its delegate's buffer still holds the
-        // failed row, and on transports with terminal-failure semantics the
-        // delegate is also unusable. Either way, the next borrower must get
-        // a fresh wrapper.
+        // the unreachable address, which throws. The broken slot must not be
+        // returned to the pool: its delegate's buffer still holds the failed
+        // row, and on transports with terminal-failure semantics the delegate
+        // is also unusable. Either way, the next borrower must get a fresh
+        // slot, not the broken one.
         try (SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE)) {
             Sender first = pool.borrow();
+            Object firstSlot = slotOf(first);
             first.table("t").longColumn("v", 1).atNow();
             try {
                 first.close();
@@ -86,11 +93,23 @@ public void testBrokenSenderIsNotReturnedToPool() {
             }
             Sender second = pool.borrow();
             try {
-                Assert.assertNotSame("broken sender must not be handed back to next borrower",
-                        first, second);
+                // borrow() always hands out a FRESH PooledSender wrapper, so
+                // assertNotSame(first, second) on the wrappers is vacuously
+                // true and proves nothing -- it stays true whether or not the
+                // broken slot was discarded. What the pool recycles is the
+                // underlying slot, so a broken slot leaking back to the next
+                // borrower shows up as the SAME slot. Assert the slot differs.
+                Assert.assertNotSame("broken slot must not be handed back to next borrower",
+                        firstSlot, slotOf(second));
             } finally {
-                if (second != first) {
+                // On the failing path (broken slot recycled) second.close()
+                // re-throws, since its delegate's buffer still holds the
+                // failed row; swallow it so the assertion above is what
+                // surfaces rather than this incidental close() failure.
+                try {
                     second.close();
+                } catch (LineSenderException ignored) {
+                    // expected only when the regression is present
                 }
             }
         }
@@ -319,180 +338,6 @@ public void testReapIdleRespectsMinSize() throws InterruptedException {
         }
     }
 
-    @Test
-    public void testPinAfterCloseRejectsStaleEntry() throws Exception {
-        // Pin from a worker thread, close the pool from main. The worker's
-        // ThreadLocal still references its PooledSender, but the underlying
-        // delegate has been closed. The next pinToCurrentThread() on the
-        // worker must reject the stale entry instead of handing it back.
-        SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE);
-        CountDownLatch pinned = new CountDownLatch(1);
-        CountDownLatch closed = new CountDownLatch(1);
-        AtomicReference<Throwable> secondCallError = new AtomicReference<>();
-        Thread worker = new Thread(() -> {
-            try {
-                pool.pinToCurrentThread();
-                pinned.countDown();
-                Assert.assertTrue(closed.await(2, TimeUnit.SECONDS));
-                try {
-                    pool.pinToCurrentThread();
-                    secondCallError.set(new AssertionError("pinToCurrentThread after close must throw"));
-                } catch (LineSenderException e) {
-                    // expected
-                }
-            } catch (Throwable t) {
-                secondCallError.set(t);
-            }
-        });
-        worker.start();
-        Assert.assertTrue(pinned.await(2, TimeUnit.SECONDS));
-        pool.close();
-        closed.countDown();
-        worker.join(2_000);
-        if (secondCallError.get() != null) {
-            throw new AssertionError(secondCallError.get());
-        }
-    }
-
-    @Test
-    public void testPinAfterUserCloseDoesNotShareWrapper() {
-        // Same-thread reproducer for the pinToCurrentThread() sharing bug.
-        // The user closes a pinned Sender (the natural try-with-resources
-        // idiom on the public Sender API), then another consumer borrows
-        // the slot. pinToCurrentThread() must not hand that wrapper back:
-        // it is now owned by the second consumer.
-        //
-        // Pool size 1 collapses the race window into a linear sequence:
-        // the second borrower deterministically receives the same slot
-        // that was just returned, so the bug is observable at the
-        // wrapper-identity level without timing.
-        try (SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 100, Long.MAX_VALUE, Long.MAX_VALUE)) {
-            Sender pinned = pool.pinToCurrentThread();
-            pinned.close();                                   // pool slot returned; ThreadLocal still points at it
-            Sender stolen = pool.borrow();                    // pollFirst hands the same wrapper to a new consumer
-            try {
-                Sender repinned = pool.pinToCurrentThread();
-                Assert.fail("pinToCurrentThread() returned wrapper " + repinned
-                        + " already borrowed by another consumer " + stolen);
-            } catch (LineSenderException expected) {
-                // After fix: TL cleared (or owner-thread invalidated) on close;
-                // re-pin tries to borrow, pool is empty, acquireTimeout fires.
-            } finally {
-                stolen.close();
-            }
-        }
-    }
-
-    @Test
-    public void testPinAfterUserCloseDoesNotShareWrapperCrossThread() throws InterruptedException {
-        // Cross-thread variant of the same bug, mirroring the originally
-        // reported trigger: Thread A pins, closes, then re-pins while
-        // Thread B has borrowed the slot in between. A's ThreadLocal still
-        // references the wrapper, and pinToCurrentThread() hands it back --
-        // so A and B end up writing to the same underlying Sender.
-        try (SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 100, Long.MAX_VALUE, Long.MAX_VALUE)) {
-            CountDownLatch aClosed = new CountDownLatch(1);
-            CountDownLatch bBorrowed = new CountDownLatch(1);
-            AtomicReference<Sender> bSender = new AtomicReference<>();
-            AtomicReference<Throwable> failure = new AtomicReference<>();
-
-            Thread a = new Thread(() -> {
-                try {
-                    Sender s = pool.pinToCurrentThread();
-                    s.close();
-                    aClosed.countDown();
-                    Assert.assertTrue(bBorrowed.await(2, TimeUnit.SECONDS));
-                    try {
-                        Sender repinned = pool.pinToCurrentThread();
-                        failure.compareAndSet(null, new AssertionError(
-                                "pinToCurrentThread() returned wrapper " + repinned
-                                        + " already borrowed by another thread " + bSender.get()));
-                    } catch (LineSenderException expected) {
-                        // After fix: re-pin tries to borrow, pool is empty, times out.
-                    }
-                } catch (Throwable t) {
-                    failure.compareAndSet(null, t);
-                }
-            });
-            Thread b = new Thread(() -> {
-                try {
-                    Assert.assertTrue(aClosed.await(2, TimeUnit.SECONDS));
-                    bSender.set(pool.borrow());
-                } catch (Throwable t) {
-                    failure.compareAndSet(null, t);
-                } finally {
-                    bBorrowed.countDown();
-                }
-            });
-
-            a.start();
-            b.start();
-            a.join(4_000);
-            b.join(4_000);
-
-            if (bSender.get() != null) {
-                bSender.get().close();
-            }
-            if (failure.get() != null) {
-                throw new AssertionError(failure.get());
-            }
-        }
-    }
-
-    @Test
-    public void testReleaseAfterCloseIsSafe() throws Exception {
-        // Same setup as the pin test, but exercise releaseCurrentThread()
-        // instead. With a closed delegate underneath, the release path must
-        // not invoke flush() on the dead Sender.
-        SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 1, 1, 1_000, Long.MAX_VALUE, Long.MAX_VALUE);
-        CountDownLatch pinned = new CountDownLatch(1);
-        CountDownLatch closed = new CountDownLatch(1);
-        AtomicReference<Throwable> releaseError = new AtomicReference<>();
-        Thread worker = new Thread(() -> {
-            try {
-                pool.pinToCurrentThread();
-                pinned.countDown();
-                Assert.assertTrue(closed.await(2, TimeUnit.SECONDS));
-                pool.releaseCurrentThread();
-            } catch (Throwable t) {
-                releaseError.set(t);
-            }
-        });
-        worker.start();
-        Assert.assertTrue(pinned.await(2, TimeUnit.SECONDS));
-        pool.close();
-        closed.countDown();
-        worker.join(2_000);
-        if (releaseError.get() != null) {
-            throw new AssertionError(releaseError.get());
-        }
-    }
-
-    @Test
-    public void testThreadAffinityIsPerThread() throws InterruptedException {
-        try (SenderPool pool = new SenderPool(DEAD_HTTP_CONFIG, 2, 2, 1_000, Long.MAX_VALUE, Long.MAX_VALUE)) {
-            Sender mainPinned = pool.pinToCurrentThread();
-            Assert.assertSame("re-pin on same thread returns same instance",
-                    mainPinned, pool.pinToCurrentThread());
-
-            AtomicReference<Sender> otherPinned = new AtomicReference<>();
-            CountDownLatch done = new CountDownLatch(1);
-            Thread t = new Thread(() -> {
-                try {
-                    otherPinned.set(pool.pinToCurrentThread());
-                } finally {
-                    done.countDown();
-                }
-            });
-            t.start();
-            Assert.assertTrue(done.await(2, TimeUnit.SECONDS));
-            Assert.assertNotSame("different threads must get different pinned Senders",
-                    mainPinned, otherPinned.get());
-
-            pool.releaseCurrentThread();
-        }
-    }
-
     // ----------------------------------------------------------------------
     // Teardown robustness: a delegate close() can throw an Error (e.g. an
     // -ea AssertionError), not just a RuntimeException. The pool's best-effort
@@ -578,9 +423,12 @@ public void testCloseSurvivesDelegateCloseError() throws Exception {
      * while the test does not leak native memory.
      */
     private static void installFailingCloseDelegate(PooledSender ps, AtomicInteger closeAttempts) throws Exception {
-        Field f = PooledSender.class.getDeclaredField("delegate");
+        Field slotF = PooledSender.class.getDeclaredField("slot");
+        slotF.setAccessible(true);
+        Object slot = slotF.get(ps);
+        Field f = slot.getClass().getDeclaredField("delegate");
         f.setAccessible(true);
-        Sender real = (Sender) f.get(ps);
+        Sender real = (Sender) f.get(slot);
         Sender failing = (Sender) Proxy.newProxyInstance(
                 Sender.class.getClassLoader(),
                 new Class[]{Sender.class},
@@ -601,6 +449,6 @@ private static void installFailingCloseDelegate(PooledSender ps, AtomicInteger c
                     }
                     return method.invoke(real, args);
                 });
-        f.set(ps, failing);
+        f.set(slot, failing);
     }
 }
diff --git a/core/src/test/java/io/questdb/client/test/impl/WsSenderConfigHonoredTest.java b/core/src/test/java/io/questdb/client/test/impl/WsSenderConfigHonoredTest.java
index 69453c77..51003bfc 100644
--- a/core/src/test/java/io/questdb/client/test/impl/WsSenderConfigHonoredTest.java
+++ b/core/src/test/java/io/questdb/client/test/impl/WsSenderConfigHonoredTest.java
@@ -77,6 +77,7 @@ public void testEveryIngressKeyIsHonored() {
         assertHonored("connection_listener_inbox_capacity=64", "connection_listener_inbox_capacity", 64);
         assertHonored("token=ey.abc", "token", "ey.abc");
         assertHonored("auth_timeout_ms=4321", "auth_timeout_ms", 4321L);
+        assertHonored("connect_timeout=7000", "connect_timeout", 7000);
 
         // username/password together (both-or-neither), and the user/pass aliases.
         Map<String, Object> creds = snapshot("ws::addr=h:9000;username=alice;password=secret;");
diff --git a/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketHandshakeOverflowTest.java b/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketHandshakeOverflowTest.java
index 25b138bd..8d4ca755 100644
--- a/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketHandshakeOverflowTest.java
+++ b/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketHandshakeOverflowTest.java
@@ -81,7 +81,8 @@ public void testHandshakeWrapOverflowWithNonEmptyBufferShouldNotLoopForever() th
                 CountDownLatch done = new CountDownLatch(1);
                 t = new Thread(() -> {
                     try {
-                        socket.startTlsSession("test.host");
+                        socket.startTlsSession("test.host", op -> {
+                        });
                     } catch (Throwable ignored) {
                         // Expected: a healthy handshake loop should fail loudly here,
                         // not spin forever. Any exception (AssertionError, SSLException,
diff --git a/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketTest.java b/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketTest.java
index 506ce783..05313368 100644
--- a/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketTest.java
+++ b/core/src/test/java/io/questdb/client/test/network/JavaTlsClientSocketTest.java
@@ -25,9 +25,11 @@
 package io.questdb.client.test.network;
 
 import io.questdb.client.ClientTlsConfiguration;
+import io.questdb.client.network.IOOperation;
 import io.questdb.client.network.JavaTlsClientSocket;
 import io.questdb.client.network.NetworkFacade;
 import io.questdb.client.network.NetworkFacadeImpl;
+import io.questdb.client.network.SocketReadinessWaiter;
 import io.questdb.client.std.MemoryTag;
 import io.questdb.client.std.Unsafe;
 import io.questdb.client.test.tools.TestUtils;
@@ -40,9 +42,11 @@
 import javax.net.ssl.SSLParameters;
 import javax.net.ssl.SSLSession;
 import java.lang.reflect.Field;
+import java.lang.reflect.InvocationTargetException;
 import java.lang.reflect.Method;
 import java.nio.ByteBuffer;
 import java.util.List;
+import java.util.concurrent.atomic.AtomicInteger;
 import java.util.function.BiFunction;
 
 import static org.junit.Assert.assertEquals;
@@ -190,6 +194,136 @@ public void testRecvProcessesBufferedRecordAfterEmptyOkUnwrap() throws Exception
         }
     }
 
+    /**
+     * Regression test for the TLS handshake busy-spin / unbounded handshake.
+     * On a non-blocking socket, a peer that completes TCP but stalls before
+     * sending its half of the handshake leaves the engine in NEED_UNWRAP with
+     * the socket returning "would block" (recv == 0). The handshake must hand
+     * control to the readiness waiter -- which in production parks on the event
+     * loop bounded by the connect deadline -- instead of re-reading in a tight
+     * loop. Here the waiter stands in for that deadline: it records the wait
+     * and then throws, exactly as the bounded ioWait() does once the budget is
+     * spent. The method-level timeout fails the test if the handshake ever
+     * busy-spins past the waiter (i.e. if the deadline-aware wait is removed).
+     */
+    @Test(timeout = 30_000)
+    public void testHandshakeWaitsForReadabilityInsteadOfBusySpinning() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            try (JavaTlsClientSocket socket = newSocket()) {
+                invoke(socket, "prepareInternalBuffers");
+                setField(socket, "sslEngine", new StallingUnwrapSslEngine());
+                // Mark the session as TLS so try-with-resources close() frees the internal buffers
+                // allocated above. Without this the socket stays STATE_EMPTY and close() returns early,
+                // leaking the 3x256KB NATIVE_TLS_RSS buffers.
+                setIntField(socket, "state", 2);
+
+                Method runHandshake = JavaTlsClientSocket.class.getDeclaredMethod(
+                        "runHandshake", SocketReadinessWaiter.class);
+                runHandshake.setAccessible(true);
+
+                AtomicInteger readWaits = new AtomicInteger();
+                AtomicInteger writeWaits = new AtomicInteger();
+                SocketReadinessWaiter waiter = op -> {
+                    if (op == IOOperation.READ) {
+                        readWaits.incrementAndGet();
+                    } else {
+                        writeWaits.incrementAndGet();
+                    }
+                    // Stand in for the connect deadline firing inside ioWait().
+                    throw new DeadlineReached();
+                };
+
+                try {
+                    runHandshake.invoke(socket, waiter);
+                    Assert.fail("runHandshake must not complete the handshake against a stalled peer");
+                } catch (InvocationTargetException e) {
+                    Assert.assertTrue(
+                            "handshake must surface the readiness waiter's deadline, was: " + e.getCause(),
+                            e.getCause() instanceof DeadlineReached);
+                }
+
+                Assert.assertEquals(
+                        "handshake must wait for the socket to become readable instead of busy-spinning",
+                        1, readWaits.get());
+                Assert.assertEquals(
+                        "a NEED_UNWRAP stall must not trigger a write wait", 0, writeWaits.get());
+            }
+        });
+    }
+
+    /**
+     * Happy-path guard for the refactor: when the engine makes progress (a
+     * complete record is available, unwrap returns OK and the handshake
+     * finishes), runHandshake must complete without ever parking on socket
+     * readiness. The would-block waits only fire on recv/send == 0, so a
+     * responsive peer never triggers them.
+     */
+    @Test(timeout = 30_000)
+    public void testHandshakeCompletesWithoutWaitingWhenEngineMakesProgress() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            try (JavaTlsClientSocket socket = newSocket()) {
+                invoke(socket, "prepareInternalBuffers");
+                setField(socket, "sslEngine", new ProgressingUnwrapSslEngine());
+                // Mark the session as TLS so try-with-resources close() frees the internal buffers
+                // allocated above. Without this the socket stays STATE_EMPTY and close() returns early,
+                // leaking the 3x256KB NATIVE_TLS_RSS buffers.
+                setIntField(socket, "state", 2);
+
+                Method runHandshake = JavaTlsClientSocket.class.getDeclaredMethod(
+                        "runHandshake", SocketReadinessWaiter.class);
+                runHandshake.setAccessible(true);
+
+                AtomicInteger waits = new AtomicInteger();
+                SocketReadinessWaiter waiter = op -> waits.incrementAndGet();
+
+                runHandshake.invoke(socket, waiter); // must return normally (handshake finished)
+
+                Assert.assertEquals(
+                        "a handshake that makes progress must not wait on socket readiness",
+                        0, waits.get());
+            }
+        });
+    }
+
+    /**
+     * Regression guard for the NOT_HANDSHAKING loop exit. Per the JSSE
+     * contract, {@code getHandshakeStatus()} never returns FINISHED: when a
+     * delegated task is the terminal handshake step, the re-polled status is
+     * NOT_HANDSHAKING. runHandshake must treat that as completion; without
+     * the explicit NOT_HANDSHAKING exit clause the status matches no switch
+     * case and the loop busy-spins forever with no deadline escape (this
+     * method's timeout is the tripwire).
+     */
+    @Test(timeout = 30_000)
+    public void testHandshakeExitsOnNotHandshakingAfterTerminalDelegatedTask() throws Exception {
+        TestUtils.assertMemoryLeak(() -> {
+            try (JavaTlsClientSocket socket = newSocket()) {
+                invoke(socket, "prepareInternalBuffers");
+                TerminalDelegatedTaskSslEngine engine = new TerminalDelegatedTaskSslEngine();
+                setField(socket, "sslEngine", engine);
+                // Mark the session as TLS so try-with-resources close() frees the internal buffers
+                // allocated above. Without this the socket stays STATE_EMPTY and close() returns early,
+                // leaking the 3x256KB NATIVE_TLS_RSS buffers.
+                setIntField(socket, "state", 2);
+
+                Method runHandshake = JavaTlsClientSocket.class.getDeclaredMethod(
+                        "runHandshake", SocketReadinessWaiter.class);
+                runHandshake.setAccessible(true);
+
+                AtomicInteger waits = new AtomicInteger();
+                SocketReadinessWaiter waiter = op -> waits.incrementAndGet();
+
+                runHandshake.invoke(socket, waiter); // must return: NOT_HANDSHAKING == done
+
+                Assert.assertEquals("the terminal delegated task must run exactly once",
+                        1, engine.tasksRun.get());
+                Assert.assertEquals(
+                        "completion via NOT_HANDSHAKING must not park on socket readiness",
+                        0, waits.get());
+            }
+        });
+    }
+
     private static void assertBytes(String expected, long ptr, int len) {
         Assert.assertEquals(expected.length(), len);
         for (int i = 0; i < len; i++) {
@@ -333,6 +467,80 @@ public SSLEngineResult unwrap(ByteBuffer src, ByteBuffer[] dsts, int offset, int
         }
     }
 
+    private static final class DeadlineReached extends RuntimeException {
+    }
+
+    private static final class ProgressingUnwrapSslEngine extends StubSslEngine {
+        @Override
+        public SSLEngineResult.HandshakeStatus getHandshakeStatus() {
+            return SSLEngineResult.HandshakeStatus.NEED_UNWRAP;
+        }
+
+        @Override
+        public SSLEngineResult unwrap(ByteBuffer src, ByteBuffer[] dsts, int offset, int length) {
+            // A complete record was available: consume it and finish the
+            // handshake, so the loop exits without waiting.
+            return new SSLEngineResult(
+                    SSLEngineResult.Status.OK,
+                    SSLEngineResult.HandshakeStatus.FINISHED,
+                    0,
+                    0
+            );
+        }
+    }
+
+    private static final class StallingUnwrapSslEngine extends StubSslEngine {
+        @Override
+        public SSLEngineResult.HandshakeStatus getHandshakeStatus() {
+            return SSLEngineResult.HandshakeStatus.NEED_UNWRAP;
+        }
+
+        @Override
+        public SSLEngineResult unwrap(ByteBuffer src, ByteBuffer[] dsts, int offset, int length) {
+            // No complete TLS record buffered yet: ask for more bytes from the
+            // socket. The stalled peer never sends them, so the handshake must
+            // wait on readability rather than spin.
+            return new SSLEngineResult(
+                    SSLEngineResult.Status.BUFFER_UNDERFLOW,
+                    SSLEngineResult.HandshakeStatus.NEED_UNWRAP,
+                    0,
+                    0
+            );
+        }
+    }
+
+    /**
+     * Models the JSSE terminal-delegated-task shape: NEED_TASK until the
+     * handed-out task has run, then NOT_HANDSHAKING (never FINISHED --
+     * getHandshakeStatus() cannot return it per the JSSE contract).
+     */
+    private static final class TerminalDelegatedTaskSslEngine extends StubSslEngine {
+        final AtomicInteger tasksRun = new AtomicInteger();
+        private boolean taskHandedOut;
+
+        @Override
+        public Runnable getDelegatedTask() {
+            if (taskHandedOut) {
+                return null;
+            }
+            taskHandedOut = true;
+            return tasksRun::incrementAndGet;
+        }
+
+        @Override
+        public SSLEngineResult.HandshakeStatus getHandshakeStatus() {
+            return tasksRun.get() == 0
+                    ? SSLEngineResult.HandshakeStatus.NEED_TASK
+                    : SSLEngineResult.HandshakeStatus.NOT_HANDSHAKING;
+        }
+
+        @Override
+        public SSLEngineResult unwrap(ByteBuffer src, ByteBuffer[] dsts, int offset, int length) {
+            throw new IllegalStateException(
+                    "NEED_TASK -> NOT_HANDSHAKING completion must not unwrap");
+        }
+    }
+
     private static abstract class StubSslEngine extends SSLEngine {
         @Override
         public void beginHandshake() {
diff --git a/core/src/test/java/io/questdb/client/test/network/NetConnectTimeoutTest.java b/core/src/test/java/io/questdb/client/test/network/NetConnectTimeoutTest.java
new file mode 100644
index 00000000..b5d2c5d0
--- /dev/null
+++ b/core/src/test/java/io/questdb/client/test/network/NetConnectTimeoutTest.java
@@ -0,0 +1,118 @@
+/*+*****************************************************************************
+ *     ___                  _   ____  ____
+ *    / _ \ _   _  ___  ___| |_|  _ \| __ )
+ *   | | | | | | |/ _ \/ __| __| | | |  _ \
+ *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
+ *    \__\_\\__,_|\___||___/\__|____/|____/
+ *
+ *  Copyright (c) 2014-2019 Appsicle
+ *  Copyright (c) 2019-2026 QuestDB
+ *
+ *  Licensed under the Apache License, Version 2.0 (the "License");
+ *  you may not use this file except in compliance with the License.
+ *  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ *
+ ******************************************************************************/
+
+package io.questdb.client.test.network;
+
+import io.questdb.client.network.NetworkFacade;
+import io.questdb.client.network.NetworkFacadeImpl;
+import org.junit.Assert;
+import org.junit.Assume;
+import org.junit.Test;
+
+import java.net.InetSocketAddress;
+import java.net.ServerSocket;
+
+/**
+ * Exercises the native non-blocking connect-with-timeout primitive
+ * ({@link NetworkFacade#connectAddrInfoTimeout}).
+ */
+public class NetConnectTimeoutTest {
+
+    private static final NetworkFacade NF = NetworkFacadeImpl.INSTANCE;
+
+    @Test
+    public void testConnectRefusedReturnsErrorNotTimeout() throws Exception {
+        // Bind then immediately close to obtain a port with no listener; a
+        // connect to it is refused (RST) rather than timed out.
+        int port;
+        try (ServerSocket ss = new ServerSocket()) {
+            ss.bind(new InetSocketAddress("127.0.0.1", 0));
+            port = ss.getLocalPort();
+        }
+
+        long addrInfo = NF.getAddrInfo("127.0.0.1", port);
+        Assert.assertNotEquals(-1, addrInfo);
+        int fd = NF.socketTcp(true);
+        try {
+            int rc = NF.connectAddrInfoTimeout(fd, addrInfo, 5_000);
+            Assert.assertNotEquals("refused connect must not report success", 0, rc);
+            Assert.assertNotEquals("refused connect must not be reported as a timeout",
+                    NetworkFacade.CONNECT_TIMEOUT, rc);
+        } finally {
+            NF.freeAddrInfo(addrInfo);
+            NF.close(fd);
+        }
+    }
+
+    @Test
+    public void testConnectSucceedsWithinTimeout() throws Exception {
+        try (ServerSocket ss = new ServerSocket()) {
+            ss.bind(new InetSocketAddress("127.0.0.1", 0));
+            int port = ss.getLocalPort();
+
+            long addrInfo = NF.getAddrInfo("127.0.0.1", port);
+            Assert.assertNotEquals(-1, addrInfo);
+            int fd = NF.socketTcp(true);
+            try {
+                int rc = NF.connectAddrInfoTimeout(fd, addrInfo, 5_000);
+                Assert.assertEquals("loopback connect should succeed", 0, rc);
+            } finally {
+                NF.freeAddrInfo(addrInfo);
+                NF.close(fd);
+            }
+        }
+    }
+
+    @Test
+    public void testConnectToBlackholeTimesOut() {
+        // 192.0.2.0/24 is TEST-NET-1 (RFC 5737); packets are silently dropped on
+        // a normal network, so the SYN goes unanswered and the timeout fires
+        // instead of the (much longer) OS connect timeout.
+        long addrInfo = NF.getAddrInfo("192.0.2.1", 9009);
+        Assert.assertNotEquals(-1, addrInfo);
+        int fd = NF.socketTcp(true);
+        try {
+            long start = System.nanoTime();
+            int rc = NF.connectAddrInfoTimeout(fd, addrInfo, 500);
+            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
+
+            // Whatever the outcome, the key guarantee is that we never blocked
+            // on the (multi-minute) OS connect timeout.
+            Assert.assertTrue("connect must return near the budget, was " + elapsedMs + "ms", elapsedMs < 5_000);
+
+            // The exact outcome depends on the runner's routing for
+            // TEST-NET-1: a dropped SYN yields a real timeout (the path under
+            // test), while a runner with no route to 192.0.2.0/24 fails fast
+            // with ENETUNREACH/EHOSTUNREACH (rc == -1) and a rare appliance may
+            // even accept it (rc == 0). Only the timeout case is assertable; the
+            // others can't exercise the timeout, so skip rather than flake.
+            Assume.assumeTrue("connect to blackhole did not time out on this runner (rc=" + rc + ")",
+                    rc == NetworkFacade.CONNECT_TIMEOUT);
+            Assert.assertEquals("blackhole connect should time out", NetworkFacade.CONNECT_TIMEOUT, rc);
+        } finally {
+            NF.freeAddrInfo(addrInfo);
+            NF.close(fd);
+        }
+    }
+}
diff --git a/core/src/test/java/io/questdb/client/test/tools/TestUtils.java b/core/src/test/java/io/questdb/client/test/tools/TestUtils.java
index 60cbc9ef..270311cf 100644
--- a/core/src/test/java/io/questdb/client/test/tools/TestUtils.java
+++ b/core/src/test/java/io/questdb/client/test/tools/TestUtils.java
@@ -266,20 +266,23 @@ public void close() {
                 return;
             }
 
-            // Checks that the same tag used for allocation and freeing native memory
+            // Every tag must return to its baseline. The previous shape
+            // (ported from upstream, which exempts NATIVE_SQL_COMPILER only)
+            // absorbed any growth confined to a single tag into a tolerated
+            // diff, so a lone-tag leak (e.g. NATIVE_DEFAULT) passed the check.
+            // This client has no SQL-compiler tag, so no exemption applies:
+            // assert strict per-tag equality, then total equality.
             long memAfter = Unsafe.getMemUsed();
-            long memNativeSqlCompilerDiff = 0;
             Assert.assertTrue(memAfter > -1);
-            if (mem != memAfter) {
-                for (int i = MemoryTag.MMAP_DEFAULT; i < MemoryTag.SIZE; i++) {
-                    long actualMemByTag = Unsafe.getMemUsedByTag(i);
-                    if (memoryUsageByTag[i] != actualMemByTag) {
-                        Assert.assertTrue(actualMemByTag >= memoryUsageByTag[i]);
-                        memNativeSqlCompilerDiff = actualMemByTag - memoryUsageByTag[i];
-                    }
+            for (int i = MemoryTag.MMAP_DEFAULT; i < MemoryTag.SIZE; i++) {
+                long actualMemByTag = Unsafe.getMemUsedByTag(i);
+                if (memoryUsageByTag[i] != actualMemByTag) {
+                    Assert.assertEquals(
+                            "native memory leaked or over-freed under tag " + MemoryTag.nameOf(i),
+                            memoryUsageByTag[i], actualMemByTag);
                 }
-                Assert.assertEquals(mem + memNativeSqlCompilerDiff, memAfter);
             }
+            Assert.assertEquals("total native memory", mem, memAfter);
         }
 
         public void skipChecks() {
diff --git a/core/src/test/java/io/questdb/client/test/tools/TlsProxy.java b/core/src/test/java/io/questdb/client/test/tools/TlsProxy.java
deleted file mode 100644
index 69511007..00000000
--- a/core/src/test/java/io/questdb/client/test/tools/TlsProxy.java
+++ /dev/null
@@ -1,248 +0,0 @@
-/*+*****************************************************************************
- *     ___                  _   ____  ____
- *    / _ \ _   _  ___  ___| |_|  _ \| __ )
- *   | | | | | | |/ _ \/ __| __| | | |  _ \
- *   | |_| | |_| |  __/\__ \ |_| |_| | |_) |
- *    \__\_\\__,_|\___||___/\__|____/|____/
- *
- *  Copyright (c) 2014-2019 Appsicle
- *  Copyright (c) 2019-2026 QuestDB
- *
- *  Licensed under the Apache License, Version 2.0 (the "License");
- *  you may not use this file except in compliance with the License.
- *  You may obtain a copy of the License at
- *
- *  http://www.apache.org/licenses/LICENSE-2.0
- *
- *  Unless required by applicable law or agreed to in writing, software
- *  distributed under the License is distributed on an "AS IS" BASIS,
- *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- *  See the License for the specific language governing permissions and
- *  limitations under the License.
- *
- ******************************************************************************/
-
-package io.questdb.client.test.tools;
-
-import javax.net.SocketFactory;
-import javax.net.ssl.KeyManagerFactory;
-import javax.net.ssl.SSLContext;
-import javax.net.ssl.SSLServerSocketFactory;
-import javax.net.ssl.TrustManagerFactory;
-import java.io.Closeable;
-import java.io.IOException;
-import java.io.InputStream;
-import java.io.OutputStream;
-import java.net.ServerSocket;
-import java.net.Socket;
-import java.security.KeyStore;
-import java.security.SecureRandom;
-import java.util.Collections;
-import java.util.Iterator;
-import java.util.Set;
-import java.util.concurrent.ConcurrentHashMap;
-import java.util.concurrent.atomic.AtomicInteger;
-
-public final class TlsProxy {
-    private final String dstHost;
-    private final int dstPort;
-    private final String keystore;
-    private final char[] keystorePassword;
-    private final Set<Link> links = Collections.newSetFromMap(new ConcurrentHashMap<>());
-    private Thread acceptorThread;
-    private volatile boolean killAfterAccepting;
-    private ServerSocket serverSocket;
-    private volatile boolean shutdownRequested;
-
-    public TlsProxy(String dstHost, int dstPort, String keystore, char[] keystorePassword) {
-        this.dstHost = dstHost;
-        this.dstPort = dstPort;
-        this.keystore = keystore;
-        this.keystorePassword = keystorePassword;
-    }
-
-    public synchronized void killAfterAccepting() {
-        killAfterAccepting = true;
-    }
-
-    public synchronized void killConnections() {
-        Iterator<Link> iterator = links.iterator();
-        while (iterator.hasNext()) {
-            Link link = iterator.next();
-            link.kill();
-            iterator.remove();
-        }
-    }
-
-    public int start() {
-        return TestUtils.unchecked(() -> {
-            SSLContext sslContext = SSLContext.getInstance("TLS");
-            TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
-            tmf.init(KeyStore.getInstance(KeyStore.getDefaultType()));
-
-            KeyStore myKeyStore = KeyStore.getInstance(KeyStore.getDefaultType());
-            myKeyStore.load(TlsProxy.class.getResourceAsStream(keystore), keystorePassword);
-
-            KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
-            kmf.init(myKeyStore, keystorePassword);
-            sslContext.init(kmf.getKeyManagers(), tmf.getTrustManagers(), new SecureRandom());
-            SSLServerSocketFactory factory = sslContext.getServerSocketFactory();
-            serverSocket = factory.createServerSocket();
-            serverSocket.bind(null);
-
-            acceptorThread = new Thread(() -> acceptorLoop(serverSocket));
-            acceptorThread.start();
-            return serverSocket.getLocalPort();
-        });
-    }
-
-    public synchronized void stop() {
-        shutdownRequested = true;
-        TestUtils.unchecked(() -> serverSocket.close());
-        acceptorThread.interrupt();
-        TestUtils.unchecked(() -> acceptorThread.join());
-        for (Link link : links) {
-            link.shutDown();
-        }
-    }
-
-    private static void closeQuietly(Closeable closeable) {
-        if (closeable != null) {
-            try {
-                closeable.close();
-            } catch (IOException e) {
-                // best-effort close: a failure here is deliberately ignored
-            }
-        }
-    }
-
-    private void acceptorLoop(ServerSocket socket) {
-        while (!shutdownRequested) {
-            Socket frontendSocket = null;
-            Socket backendSocket;
-            try {
-                frontendSocket = socket.accept();
-                backendSocket = SocketFactory.getDefault().createSocket(dstHost, dstPort);
-            } catch (IOException e) {
-                if (shutdownRequested) {
-                    return;
-                }
-                closeQuietly(frontendSocket);
-                continue;
-            }
-            synchronized (this) {
-                if (shutdownRequested) {
-                    closeQuietly(frontendSocket);
-                    closeQuietly(backendSocket);
-                    return;
-                }
-                if (killAfterAccepting) {
-                    closeQuietly(frontendSocket);
-                    closeQuietly(backendSocket);
-                    continue;
-                }
-                Link link = new Link(frontendSocket, backendSocket);
-                links.add(link);
-                link.start();
-            }
-        }
-    }
-
-    private static class Link {
-        private final Socket backend;
-        private final Pump backendToFrontend;
-        private final Socket frontend;
-        private final Pump frontendToBackend;
-
-        private Link(Socket frontend, Socket backend) {
-            AtomicInteger race = new AtomicInteger(2);
-            this.frontend = frontend;
-            this.backend = backend;
-            frontendToBackend = TestUtils.unchecked(() -> new Pump(frontend.getInputStream(), backend.getOutputStream(), race, "front->backend"));
-            backendToFrontend = TestUtils.unchecked(() -> new Pump(backend.getInputStream(), frontend.getOutputStream(), race, "backend->frontend"));
-        }
-
-        private void kill() {
-            closeQuietly(frontend);
-            closeQuietly(backend);
-        }
-
-        private void shutDown() {
-            frontendToBackend.shutdown();
-            backendToFrontend.shutdown();
-        }
-
-        private void start() {
-            Thread frontToBackThread = new Thread(frontendToBackend);
-            frontToBackThread.setName("front-to-back");
-            frontendToBackend.setOwningThread(frontToBackThread);
-            frontToBackThread.start();
-            Thread backToFrontThread = new Thread(backendToFrontend);
-            backToFrontThread.setName("back-to-front");
-            backendToFrontend.setOwningThread(backToFrontThread);
-            backToFrontThread.start();
-        }
-    }
-
-    private static final class Pump implements Runnable {
-        private final InputStream from;
-        private final String name;
-        private final AtomicInteger race;
-        private final OutputStream to;
-        private volatile Thread owningThread;
-        private volatile boolean shutdownRequested;
-
-        private Pump(InputStream from, OutputStream to, AtomicInteger race, String name) {
-            this.from = from;
-            this.to = to;
-            this.race = race;
-            this.name = name;
-        }
-
-        @Override
-        public void run() {
-            byte[] buffer = new byte[1024];
-            long totalRead = 0;
-            long totalWritten = 0;
-            while (!shutdownRequested) {
-                int i;
-                try {
-                    i = from.read(buffer);
-                    if (i < 0) {
-                        break;
-                    }
-                    totalRead += i;
-                } catch (IOException e) {
-                    break;
-                }
-                try {
-                    to.write(buffer, 0, i);
-                    to.flush();
-                    totalWritten += i;
-                } catch (IOException e) {
-                    break;
-                }
-            }
-            try {
-                to.flush();
-            } catch (IOException e) {
-                // already closed, no problem
-            }
-            System.out.println(name + ": Total read: " + totalRead + ", Total written: " + totalWritten);
-            if (race.decrementAndGet() == 0) {
-                closeQuietly(from);
-                closeQuietly(to);
-            }
-        }
-
-        public void setOwningThread(Thread owningThread) {
-            this.owningThread = owningThread;
-        }
-
-        private void shutdown() {
-            shutdownRequested = true;
-            owningThread.interrupt();
-            TestUtils.unchecked(() -> owningThread.join());
-        }
-    }
-}
diff --git a/design/qwp-cursor-durability-todo.md b/design/qwp-cursor-durability-todo.md
deleted file mode 100644
index 2598af51..00000000
--- a/design/qwp-cursor-durability-todo.md
+++ /dev/null
@@ -1,126 +0,0 @@
-# Cursor SF — remaining work
-
-Branch: `vi_sf` (off `main`).
-Spec: `design/qwp-cursor-durability.md` (decisions 1–14 locked).
-Memory: project memory `project_sf_self_sufficient_frames.md` documents the "every frame on disk carries full schema" decision — load-bearing for replay/drainer correctness, do not undo without revisiting.
-
-## What's already done on this branch
-
-Every locked spec decision (1–14), every knob in the spec table, every counter accessor, plus four bugs uncovered along the way. Recent commits, newest first:
-
-- `c25773f` background drainer pool — adopt orphan slots and replay them
-- `fa5c838` recovery replays sealed segments from baseSeq, not active (3-bug fix: start-position, ackedFsn-seed, fileGeneration-seed)
-- `520231c` cursor frames are self-sufficient — full schemas, full dict
-- `b9b6e2f` orphan-slot scanner + .failed sentinel + drain_orphans knob
-- `40f9742` initial-connect retry opt-in + replay/attempt counters
-- `f152583` slot directory model — sender_id + advisory exclusive .lock
-- `8828038` cursor reconnect policy — backoff cap + auth-terminal
-
-Test count: 788 in `io.questdb.client.test.cutlass.qwp.client.**`, 0 failures, 1 skipped (pre-existing).
-
-## TODO
-
-### 1. Multi-host failover (HIGH — needs server access)
-
-The connect-string parser accepts `addr=h1:p1,h2:p2,h3:p3` and stores all hosts in the `hosts`/`ports` lists, but `Sender.build()` only passes `hosts.getQuick(0)` and `ports.getQuick(0)` to `QwpWebSocketSender.connect`. Every reconnect, initial-connect retry, and drainer connect uses the same single host. If host A stays down for the whole per-outage cap, host B is never tried.
-
-**What to change:**
-- `QwpWebSocketSender.buildAndConnect()` — currently builds `WebSocketClient` against `host:port` (single string fields). Either:
-  - Take a list of (host, port) pairs and round-robin / try-in-order each attempt, OR
-  - Take a `Supplier<HostPort>` that yields the next endpoint to try and let the sender / loop round-robin externally.
-- The reconnect retry-with-backoff loop in `CursorWebSocketSendLoop.fail()` and the helper `connectWithRetry` should treat each host as one attempt — backoff applies *after* exhausting the host list once.
-- `Sender.build()` plumbs the full list down (don't drop hosts 1..n).
-- `BackgroundDrainer` inherits the same failover via the `ReconnectFactory` it gets from the sender.
-- Auth-terminal still terminal across all hosts (one host returning 401 means config is wrong; trying others is unlikely to help — but spec doesn't pin this; could be argued either way).
-
-**Why server access matters:** to verify failover actually crosses hosts, you want a real multi-server setup (or two `TestWebSocketServer` instances on different ports) with one going down mid-stream and traffic landing on the other. The existing `TestWebSocketServer` is fine for this — but server-side validation that frames arrive intact and dedup-by-messageSequence handles cross-host duplicates is the value-add of the server-side environment.
-
-**Tests to add:**
-- 3 hosts, kill the first connected one, expect reconnect to land on host 2 inside the cap.
-- All hosts down at startup → init-connect retry exhausts → terminal.
-- Auth failure on host 1 — does it fall through to host 2 or stay terminal? (Spec ambiguity; pick one and document.)
-
-### 2. `sf_durability=flush` and `sf_durability=append` (deferred per spec)
-
-Cursor today only supports `sf_durability=memory` (page cache) and rejects `flush`/`append` at build time. Spec line 1001:
-
-```java
-if (sfDurability != SfDurability.MEMORY) {
-    throw new LineSenderException(... + "is not yet supported (deferred follow-up; use sf_durability=memory)");
-}
-```
-
-**What to change:**
-- `flush` semantics: producer returns from `flush()` only after the engine has called `Files.fsync(fd)` on the active segment up to the just-published cursor position.
-- `append` semantics: every `appendBlocking` call fsyncs before returning the FSN.
-- Plumb a per-segment `fsync()` method on `MmapSegment` (low-level Files.fsync wrapper exists already).
-- Backpressure cost is significant — fsync per-batch (`flush`) is acceptable; fsync per-frame (`append`) is the slow setting.
-- Re-enable the rejected paths in `Sender.build()`.
-
-**Tests:**
-- After `flush()` returns and the JVM is killed with `kill -9`, recovery picks up every flushed frame. This is hard to write portably; a soft equivalent is to assert, via instrumentation, that `fsync` was called on the file after `flush()`.
-- Throughput regression test for `append` mode (10x slowdown is expected).
-
-### 3. Drainer + terminal upgrade error e2e test
-
-Today the drainer's "exhausts cap → drops `.failed`" path is exercised only by unit-level reasoning. There's a synthetic `OrphanScanner.markFailed()` test, but no integration test where:
-1. Ghost slot has data,
-2. Drainer's connect attempts hit a 401-emitting fixture (or unreachable host),
-3. Cap exhausts,
-4. `.failed` sentinel ends up in the slot,
-5. Future foreground scans skip it.
-
-The blocker today: the drainer inherits its `ReconnectFactory` from the foreground sender, so they share a target host. To exercise the drainer-fails-while-foreground-succeeds path, the drainer needs a configurable `ReconnectFactory` distinct from the foreground's. OR: stand up two servers on different ports and have the foreground point at the live one while the drainer is wired to point at the dead one.
-
-This is small once the multi-host failover work clarifies how connection params flow through the drainer.
-
-### 4. Run the full `core` test suite
-
-Only `io.questdb.client.test.cutlass.qwp.client.**` was run after each commit. A `mvn -pl core test` end-to-end would catch any unrelated regressions in non-QWP code paths. Last run before this branch: presumably clean (the changes are confined to QWP).
-
-### 5. JMH benchmark sanity check
-
-`core/src/test/java/io/questdb/client/test/cutlass/qwp/client/QwpIngressLatencyBenchmark.java` exists. Self-sufficient frames bloat per-batch bytes vs the prior delta-encoded format — the perf delta should be measured. Run, compare to a baseline from before commit `520231c`, document the result.
-
-### 6. Cleanups (LOW)
-
-- `connectionGeneration` retry loop in `QwpWebSocketSender.flushPendingRows` is now dead code — the race it guarded (encode using stale schema state mid-reconnect) can't fire because encode no longer reads `maxSentSchemaId` / `maxSentSymbolId`. Worth ripping out to shrink surface area, but it's harmless as-is (one volatile read per encode).
-- `OrphanScanner.hasAnySegmentFile` reports a slot as a candidate orphan if any `.sfa` file exists, including stale empty hot-spares. The drainer no-ops on empty slots (engine.publishedFsn = -1 → ackedFsn already past), but each such slot still generates log noise. Filter on actual frame content via a header read.
-- README / public-API docs untouched. New connect-string keys, new builder methods, new accessors all have Javadoc but no top-level doc reference.
-
-### 7. Spec coverage check
-
-`design/qwp-cursor-durability.md` decision table claims `max_backoff_millis` is "reuse existing". I added `reconnect_max_backoff_millis` as a new key. If `max_backoff_millis` already exists somewhere in the codebase (likely for HTTP retries elsewhere), align names — either rename mine to match, or document that they're distinct.
-
-## How to run things
-
-```bash
-# Compile everything
-mvn -pl core compile test-compile
-
-# QWP-only suite (fast, ~30s)
-mvn -pl core test -Dtest='io.questdb.client.test.cutlass.qwp.client.**'
-
-# Single test
-mvn -pl core test -Dtest=ReconnectTest
-
-# Full core suite
-mvn -pl core test
-```
-
-Native lib for macOS-aarch64 is already in the repo
-(`core/src/main/resources/io/questdb/client/bin/darwin-aarch64/libquestdb.dylib`);
-no rebuild needed unless touching `Files.java` natives.
-
-## Files to know
-
-- `core/src/main/java/io/questdb/client/Sender.java` — top-level builder + connect-string parser. Scroll to `LineSenderBuilder` (line ~571) for the builder, `build()` for the WS branch (line ~989), and the connect-string switch (line ~2330).
-- `core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java` — main sender. `buildAndConnect()` is the host:port-bound connect path (line ~1408 area).
-- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java` — I/O thread, reconnect retry loop, replay positioning.
-- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java` — engine + slot lock + recovery.
-- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/BackgroundDrainer.java` and `BackgroundDrainerPool.java` — orphan adoption.
-- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/OrphanScanner.java` and `SlotLock.java` — slot model.
-
-## Notes on the testing environment
-
-The QWP test suite uses `TestWebSocketServer` (in-process, hand-rolled WS server) for everything. It receives binary frames as opaque bytes — does NOT parse the QWP wire format. So tests assert wire behavior (frame counts, byte equivalence, connection lifecycle) but cannot assert server-side semantic correctness (does the server accept these schemas? are messageSequence dedups working?). Validating the wire-protocol bytes against a real QuestDB server is the part that needs the server-code repo.
diff --git a/design/qwp-cursor-durability.md b/design/qwp-cursor-durability.md
deleted file mode 100644
index 686a2687..00000000
--- a/design/qwp-cursor-durability.md
+++ /dev/null
@@ -1,174 +0,0 @@
-# QWP WebSocket sender — durability & reconnect spec
-
-Status: **draft v3**, working notes for the cursor SF refactor on `vi_sf`.
-
-## Goals
-- **Reduce data loss.** SF mode preserves every batch the producer has handed to the engine until the server has ACK'd it, surviving JVM crashes, process restarts, and transient network outages.
-- Memory mode (`ws::addr=...;` no `sf_dir`) is reliable enough for typical use under transient network blips.
-- SF mode (`ws::...;sf_dir=...`) survives process restarts and JVM crashes; disk does not grow under steady-state traffic (only ACK'd data is trimmed).
-- Failure surfaces are loud and distinguishable: "server slow" ≠ "server unreachable" ≠ "data refused".
-
-## Modes
-| | Memory | SF |
-|---|---|---|
-| Storage | malloc'd ring | mmap'd files under sender's slot dir |
-| Cap | `sf_max_total_bytes` (default 128 MiB) | `sf_max_total_bytes` (default 10 GiB) |
-| Cap-full behavior | Producer's `flush()`/`at()` blocks up to `sf_append_deadline_millis`, then throws | Same |
-| Survives JVM exit | No | Yes (recovered on next startup; orphans optionally drained by another sender) |
-| Reconnect retries | Yes | Yes |
-
-## flush() contract
-- Encodes accumulated rows into the cursor engine.
-- Returns when data is **published into the engine** (in-RAM for memory mode, on-disk for SF). **Never** waits for server ACK — ACKs are asynchronous and not every flush correlates to one.
-- The I/O loop drains in the background and retries on reconnect until either ACK or the cap forces backpressure → hard error to the producer.
-
-## close() contract
-- One knob: `close_flush_timeout_millis`.
-  - **Default `5000`**: close() blocks waiting for `engine.ackedFsn() >= engine.publishedFsn()` (server ACK'd everything published) for up to 5 s, then logs WARN and proceeds with stop.
-  - **`0` or `-1`**: close() does not flush at all — fast exit. Pending data is lost (memory mode) or recovered by next sender (SF mode).
-  - Any other positive value: that timeout in millis.
-
-## Reconnect policy (both modes)
-- I/O loop catches any wire error (send fail, recv fail, server close, ACK timeout). Logs WARN and enters reconnect.
-- Backoff: exponential with jitter. Reuse `LineSenderBuilder.maxBackoffMillis` (initial 100 ms, cap as configured).
-- **Budget: `reconnect_max_duration_millis`** — per-outage time cap (resets on each successful reconnect). Once total elapsed time since the first failure of *this* outage exceeds the cap, the I/O loop gives up.
-  - **Default 300_000 ms (5 min).** Long enough to ride out most server restarts and brief outages where the cause needs investigation; short enough that a permanently-gone server surfaces within minutes.
-- **Auth failure on reconnect (401, 403, non-101 upgrade reject) is terminal** — don't burn the retry budget on errors that won't fix themselves.
-- On successful reconnect: I/O loop restarts `nextWireSeq=0`, sets `fsnAtZero = engine.ackedFsn() + 1`, walks segments forward from there, and replays. Producer thread is signaled (volatile counter bump) so the next encoded batch carries full schema definitions instead of refs.
-- On budget exhaustion: connection error recorded → next user-thread API call throws.
-
-### Initial connect
-- **Default: terminal.** Initial-connect failures (DNS, refused, bad auth, version mismatch) usually mean misconfig; throw immediately so the user sees the error, not a 5-minute hang.
-- **Opt-in: `initial_connect_retry=true`** uses the same backoff + `reconnect_max_duration_millis` cap as reconnect. Useful for "publisher comes up before server" scenarios (k8s ordering, dev environments).
-
-### Logging cadence
-- WARN at first failure of an outage: `"disconnected from <addr>, reconnecting"`.
-- WARN throttled to once per `BACKPRESSURE_LOG_THROTTLE_NANOS` (5 s) during the retry storm — not one per backoff sleep, otherwise a 5-min outage at 100 ms backoff = 3000 lines.
-- INFO on each successful reconnect: `"reconnected to <addr> after <Xms>, <Y> attempts"`.
-- ERROR on budget exhaustion: `"giving up reconnecting to <addr> after <Xs>, <Y> attempts"`.
-
-## Backpressure semantics
-- Engine cap full → `appendBlocking` spins for `sf_append_deadline_millis` (default 30 s) → throws.
-- Error message must distinguish:
-  - `"backpressured for Xms — wire path is not draining (server slow?)"` (engine published, but server hasn't ACKed)
-  - `"backpressured for Xms — Y reconnect attempts in progress (server unreachable since Z)"` (the I/O loop is in retry-backoff)
-
-## Schema state on reconnect
-- Single volatile counter, single writer (I/O thread), shared across two roles:
-  ```java
-  private volatile long connectionGeneration;  // bumped by I/O loop on every successful reconnect AND on initial recovery from disk
-  ```
-- Producer's `flushPendingRows` does:
-  ```java
-  int retries = 0;
-  while (true) {
-      long genBefore = connectionGeneration;
-      if (genBefore != lastSeenGeneration) {
-          resetSchemaStateForNewConnection();
-          lastSeenGeneration = genBefore;
-      }
-      encoder.beginMessage(...); /* encode all tables */
-      int messageSize = encoder.finishMessage();
-      if (connectionGeneration == genBefore) break;   // common case
-      if (++retries >= MAX_SCHEMA_RACE_RETRIES /* =10 */) throw new LineSenderException("schema-reset race exceeded retry limit");
-      // gen advanced mid-encode → bytes are poisoned, discard + loop.
-      // Table buffers are NOT reset until after this loop, so source rows are intact.
-  }
-  ```
-- **On initial open with on-disk recovery** (SF mode, non-empty slot): `connectionGeneration` starts at 1, not 0. Recovered FSNs were never seen by *this* server connection, so the first batch must publish full schemas.
-
-## Slot directory model
-
-**`sf_dir` is a parent (group root)**, not a slot. The actual slot is `<sf_dir>/<sender_id>/`.
-
-### Identity
-- **`sender_id` defaults to `"default"`.** Single-sender users get zero-config: their slot is `<sf_dir>/default/`.
-- **Multi-sender users must set `sender_id` explicitly.** Two senders trying to use the default name will collide on the lock — surfaced loudly as `"sf slot already in use by PID X"`.
-- The slot dir holds segments + `.lock` (advisory exclusive `FileChannel.tryLock`).
-- Lock released on `engine.close()` or OS-level process exit (kernel releases `fcntl`/`LockFileEx` locks automatically on crash).
-
-### Foreground sender
-- Locks `<sf_dir>/<sender_id>/.lock`.
-- Recovers segments via `SegmentRing.openExisting`. Recovery is per-slot, in baseSeq order — preserves publishing order trivially.
-- Seeds `SegmentManager.fileGeneration` to `max(existing sf-<gen>.sfa hex) + 1` to avoid filename collisions with recovered files.
-
-### Background drainers (orphan adoption)
-- **Opt-in: `drain_orphans=true`** (default false).
-- At foreground sender startup, scan `<sf_dir>/*/` for sibling slots that are (a) unlocked and (b) contain unacked segments.
-- For each orphan, spawn a background drainer:
-  - Locks the orphan's `.lock`
-  - Opens its own `WebSocketClient` (separate connection from the foreground sender)
-  - Recovers segments, drains them in baseSeq order
-  - Releases lock and exits when the slot is fully ACK'd and empty
-- **Drain-only**: no user appends, no public API for writing.
-- **Cap concurrent drainers: `max_background_drainers=4`** (default). Excess orphans are queued and started as earlier drainers finish.
-- **Drain failure policy**: drainer's reconnect cap exhausts, or auth fails, or segments are corrupt → drainer drops a `.failed` sentinel in the slot, releases the lock, exits. Future foreground startups skip slots with `.failed` until the user clears the sentinel manually. Bounded automatic retry, then human-in-the-loop.
-- **No automatic cleanup of empty slot dirs.** Goal is data preservation; only ACK'd data is trimmed (within a slot, by the segment manager). Empty slot dirs are cheap and stay forever unless the user removes them.
-
-### Visibility
-- Three WS-only counter accessors on `QwpWebSocketSender`:
-  - `getActiveBackgroundDrainers()` — current count of running drainers
-  - `getTotalBackgroundDrainersSucceeded()` — cumulative since startup
-  - `getTotalBackgroundDrainersFailed()` — cumulative since startup
-- Per-drainer event observation goes through the existing
-  `BackgroundDrainerListener` callback. The pool's `.failed` sentinels
-  on disk remain the canonical record of giveup events; the three
-  counters are for dashboards and post-startup health checks.
-
-### Per-sender threading cost
-- Each engine (foreground + each background drainer) has its own `SegmentManager`. That's 1 manager thread + 1 I/O thread per engine. With `max_background_drainers=4`, worst case is 1 (foreground) + 4 (drainers) = 5 engines = 10 threads + 5 sockets per `Sender.fromConfig` call. Acceptable for typical deployments; users with hundreds of senders per JVM should set `max_background_drainers` low.
-
-## Configuration knobs (connect string)
-| Key | Default | Mode | Status |
-|---|---|---|---|
-| `sf_dir` | unset | both | existing (semantics: now a parent dir) |
-| `sender_id` | `"default"` | SF | **NEW** |
-| `sf_max_bytes` | 4 MiB | both | existing |
-| `sf_max_total_bytes` | 128 MiB (memory) / 10 GiB (SF) | both | existing |
-| `sf_durability` | `memory` | SF | existing (`flush`/`append` reserved) |
-| `sf_append_deadline_millis` | 30000 | both | **NEW** (currently a constant) |
-| `reconnect_max_duration_millis` | 300000 | both | **NEW** |
-| `reconnect_initial_backoff_millis` | 100 | both | **NEW** |
-| `max_backoff_millis` | already exists | both | reuse existing |
-| `initial_connect_retry` | `false` | both | **NEW** |
-| `close_flush_timeout_millis` | 5000 (0/-1 = fast close) | both | **NEW** |
-| `drain_orphans` | `false` | SF | **NEW** |
-| `max_background_drainers` | 4 | SF | **NEW** |
-
-Each new knob also gets a `LineSenderBuilder` setter.
-
-## Counter accessors (WS-only, on QwpWebSocketSender)
-- `getTotalBackpressureStalls()`
-- `getTotalReconnectAttempts()`
-- `getTotalReconnectsSucceeded()`
-- `getTotalFramesReplayed()`
-- `getActiveBackgroundDrainers()`
-- `getTotalBackgroundDrainersSucceeded()`
-- `getTotalBackgroundDrainersFailed()`
-
-## Stated assumptions (server contract)
-- Server **dedups** replayed batches by `messageSequence`. Replay-after-reconnect produces duplicates; without server-side dedup, every reconnect = double-write. Legacy code already relied on this; the new design continues to.
-- Server's dedup window must be ≥ a sender's `sf_max_total_bytes` worth of FSNs (else replay = double-write under sustained outage + full cap).
-- Coordination/testing of the recovery + dedup contract is **outside this repo's scope**.
-
-## Self-sufficient frames (locked 2026-04-27)
-Every frame written through the cursor SF path **must carry its full schema definition and the complete symbol-dictionary delta from id 0**. No schema-by-id refs, no incremental delta-dicts. The bytes survive process restart and replay against fresh server connections (post-reconnect, post-restart, drainer adopting an orphan slot) — frames with refs to IDs the new server has never seen are unrecoverable. Costs more bytes per batch; pays for replay correctness across every recovery path. Producer-side `maxSentSchemaId` / `maxSentSymbolId` retention is treated as a no-op for the cursor path; the encode call always passes `confirmedMaxId=-1` and `useSchemaRef=false`.
-
-## Decisions locked
-1. ✅ flush() never waits for ACK (ACKs are async).
-2. ✅ Reconnect cap is per-outage time-based, default 300s.
-3. ✅ close() drains by default with 5s timeout; `close_flush_timeout_millis=0|-1` opts out for fast close.
-4. ✅ Schema-reset is also fired on disk recovery (recovered state == post-reconnect state).
-5. ✅ Encode-mid-reconnect race closed via single volatile `connectionGeneration` counter + retry loop in `flushPendingRows`.
-6. ✅ Slot dir model: `sf_dir` is parent; per-sender slots `<sf_dir>/<sender_id>/`; default `sender_id="default"`.
-7. ✅ Orphan adoption is opt-in (`drain_orphans=true`); foreground sender spawns background drainers per orphan, capped at `max_background_drainers`.
-8. ✅ Drain failure → `.failed` sentinel; bounded retry + human-in-the-loop.
-9. ✅ Initial connect terminal by default; opt-in retry via `initial_connect_retry=true`.
-10. ✅ Auth failures (401/403/non-101) terminal even on reconnect.
-11. ✅ Logging: WARN on outage entry/exit-attempt, INFO on reconnect success, ERROR on budget exhaustion; throttled.
-12. ✅ Counters and orphan-drainer visibility on `QwpWebSocketSender` (WS-only).
-13. ✅ No automatic cleanup of empty slot dirs — preserve goal of data-loss reduction.
-14. ✅ Frames on disk are self-sufficient — every frame carries its full schema + full symbol-dict delta from id 0; refs forbidden.
-
-## Open
-None. Ready to implement.
diff --git a/design/qwp-cursor-error-api-todo.md b/design/qwp-cursor-error-api-todo.md
deleted file mode 100644
index 82e42f4c..00000000
--- a/design/qwp-cursor-error-api-todo.md
+++ /dev/null
@@ -1,234 +0,0 @@
-# Cursor SF — server error API: implementation plan
-
-Branch: `vi_sf` (continues off the cursor SF work).
-Spec: `design/qwp-cursor-error-api.md` (decisions 1–14 locked).
-Depends on: `qwp-cursor-durability.md` (the SF substrate this builds on).
-
-## Shipped on `vi_sf`
-
-| Step | Status | Notes |
-|---|---|---|
-| 1. Public types | ✅ | `SenderError`, `SenderErrorHandler`, `LineSenderServerException` (all in `io.questdb.client`); 11 unit tests in `SenderErrorTest`. |
-| 2. Typed terminal-error stash | ✅ | Sibling `volatile SenderError lastTerminalServerError` on `CursorWebSocketSendLoop`; `recordFatal(Throwable, SenderError)` overload; `getLastTerminalServerError()` on the loop, `getLastTerminalError()` on `QwpWebSocketSender`. |
-| 3. Wire-byte classification + DROP/HALT branches | ✅ | `classify()`, `defaultPolicyFor()`, `handleServerRejection()` in `CursorWebSocketSendLoop`; HALT routes through typed `LineSenderServerException`, DROP advances `engine.acknowledge` and continues. 12 tests in `CursorWebSocketSendLoopErrorClassificationTest`. |
-| 4. WS close-frame routing | ✅ | `isTerminalCloseCode()` splits PROTOCOL_ERROR/UNSUPPORTED_DATA/INVALID_PAYLOAD_DATA/POLICY_VIOLATION/MESSAGE_TOO_BIG/MANDATORY_EXTENSION as terminal `PROTOCOL_VIOLATION`; reconnect-eligible codes preserve existing `fail()` retry. Auth-terminal upgrade and reconnect-budget exhaustion now stash typed `SenderError` payloads. |
-| 5. Bounded inbox + dispatcher daemon | ✅ | `SenderErrorDispatcher` (lazy-start daemon, bounded `ArrayBlockingQueue`, idempotent close, drained handler exceptions). 11 tests in `SenderErrorDispatcherTest`. |
-| 6. Default error handler | ✅ | `DefaultSenderErrorHandler.INSTANCE` — ERROR for HALT, WARN for DROP, full structured payload in the log line. |
-| 7. Builder + connect-string knobs | ✅ (partial) | Builder: `errorHandler(SenderErrorHandler)`, `errorInboxCapacity(int)` — both gated to WebSocket. Connect string: `error_inbox_capacity=N`. **Per-category policy override (`errorPolicy(Category, Policy)`, `errorPolicyResolver(...)`, `on_*_error` keys) deferred — see § Deferred follow-ups.** 9 tests in `SenderBuilderErrorApiTest`. |
-| 8. New `Sender` API | ✅ (partial) | `flushAndGetSequence(): long`, `getLastTerminalError()`, `getTotalServerErrors()`, `getDroppedErrorNotifications()`, `getTotalErrorNotificationsDelivered()`. **`resumeAfterHalt()` deferred** — the I/O loop is one-shot today; restart primitive is non-trivial. Workaround: close + rebuild the sender. |
-| 9. End-to-end per-category integration tests | ⏭️ deferred | Lands in the `questdb` repo (`TestWebSocketServer` doesn't parse QWP wire format, so it cannot be scripted to emit category-specific frames in this repo without significant fixture work). |
-| 10. `tableName` wiring | ✅ | Best-effort: populates `tableName` from `response.tableNames` when single-table; null otherwise. Today the response parser does not populate `tableNames` on error frames (only on STATUS_OK), so `tableName` is null on error frames until both client parser and server are extended. The wiring is forward-compatible. |
-| 11. Docs | this doc | Spec + this implementation log. README/javadoc updates pending. |
-
-Test totals on `vi_sf`: 154 non-mmap tests pass on linux x86_64. (`Files.mmap0` UnsatisfiedLinkError on linux — pre-existing, repo only ships macOS-aarch64 native lib. The mmap-dependent tests will run green on macOS / when the linux native lib is added.)
-
-## Deferred follow-ups (not blocking)
-
-1. **Per-category policy override** (`errorPolicy(Category, Policy)` + `errorPolicyResolver(...)`). Spec § "User overrides — one knob, two grains" describes the resolver composition (programmatic resolver > per-category map > global default). Today every category uses `defaultPolicyFor` baked into the loop. The most-asked variant — strict-mode `on_server_error=halt` — needs the connect-string parser side too. Moderate-sized addition; fits in a focused commit.
-2. **`resumeAfterHalt()` escape hatch.** The cursor I/O loop today is one-shot (`running` is a volatile boolean, with no restart primitive). To resume, the loop needs to clear `lastError` / `lastTerminalServerError`, reopen the wire client via the reconnect factory, and restart the thread. Today's workaround: close + rebuild the sender; SF data on disk survives. Document that workaround.
-3. **End-to-end integration tests in the `questdb` repo.** Use a real `ServerMain` to drive each `STATUS_*` byte against this client, asserting category, policy, FSN span, callback delivery, and producer-thread typed throw.
-4. **Server-side gaps tracked in the spec § "Server-side follow-ups"**: split `0x06`/`0x09` for retry semantics, add retryable bit, per-table attribution. Each unblocks a corresponding client follow-up — e.g. retryable bit unblocks `RETRY_TRANSIENT` policy and full strict-ETL semantics.
-5. **README + public Javadoc.** Document the new connect-string keys, builder methods, and accessor surface. The spec is locked, but the user-facing docs are not written yet.
-
-## Context
-
-The cursor SF send loop today (`CursorWebSocketSendLoop.ResponseHandler.onBinaryMessage`, line 712 onward) classifies inbound frames as `STATUS_OK` (advance ackedFsn) vs everything-else (always terminal via `recordFatal`). The "everything-else" branch is what we're refining: classify by status byte → category, resolve policy, surface to user via callback (async) and / or typed exception (next API call).
-
-Wire codes already exist (`WebSocketResponse.java:74-83`, `WebSocketResponse.getStatusName()`). Nothing new on the wire.
-
-## Discrete deliverables
-
-### 1. Public API surfaces (do first, in isolation)
-New types in `core/src/main/java/io/questdb/client/`:
-- `SenderError.java` — immutable, public. Fields per spec § "SenderError". Include `Category` and `Policy` as nested public enums.
-- `SenderErrorHandler.java` — `@FunctionalInterface` with `void onError(SenderError)`.
-- `LineSenderServerException.java` — `extends LineSenderException`. Single field `SenderError serverError`; `getServerError()` accessor; `getMessage()` synthesizes from category + FSN span + serverMessage.
-
-These are leaf types — write them and their unit tests first; nothing else depends on internals.
-
-### 2. Typed terminal-error stash on the I/O loop
-**Note:** the `connectionGeneration` field described in `qwp-cursor-durability.md` is an idealization — it didn't ship. The actual code already has the producer-side latch infrastructure:
-- `CursorWebSocketSendLoop.lastError` (`volatile Throwable`, line 122) — terminal error, set by `recordFatal(...)`.
-- `QwpWebSocketSender.connectionError` (`AtomicReference<LineSenderException>`, line 119) — connection-level latch.
-- `QwpWebSocketSender.checkConnectionError()` (line 1417) polls both on every public API entry.
-
-So the cache-line / `@Contended` extraction is unnecessary — the volatile that the producer thread already reads on every API call is the latch we need. What's left:
-
-- Add `private volatile SenderError lastTerminalServerError` on `CursorWebSocketSendLoop`, sibling to `lastError`. Null in steady state.
-- Overload `recordFatal(Throwable t)` → `recordFatal(Throwable t, SenderError serverError)`. Existing callers (wire-level failures) call the original signature with implicit `null`. Server-rejection callers (deliverable #3) pass the `SenderError`. Idempotent — only the first failure wins.
-- Add `public SenderError getLastTerminalServerError()` accessor on the loop.
-- Add `public SenderError getLastTerminalError()` on `QwpWebSocketSender`, delegating to the loop (with the standard `cursorSendLoop == null ? null` guard used by other accessors).
-
-That's the whole change for #2. The producer-thread typed throw lands automatically once #3 starts stuffing `LineSenderServerException` (which extends `LineSenderException`) into `lastError` — `checkError()` already throws whatever `lastError` is; user code can `instanceof LineSenderServerException` to unpack the typed payload.
-
-### 3. Error frame classification (`CursorWebSocketSendLoop.ResponseHandler.onBinaryMessage`)
-Replace the current `else` branch (lines ~734-751) with classification:
-```java
-SenderError.Category category = classify(response.getStatus());   // wire byte → enum
-SenderError.Policy policy = policyResolver.resolve(category);     // user override > per-cat > default
-String tableName = response.getTableEntryCount() == 1
-        ? response.getTableName(0)
-        : null;
-long fromFsn = fsnAtZero + Math.max(0, response.getSequence());   // single-frame span today
-long toFsn = fromFsn;
-SenderError err = new SenderError(category, policy, response.getStatus(),
-        response.getErrorMessage(), response.getSequence(),
-        fromFsn, toFsn, tableName, System.nanoTime());
-totalServerErrors.incrementAndGet();
-lastTerminalServerError = (policy == HALT) ? err : lastTerminalServerError;   // field name per deliverable #2
-
-if (policy == HALT) {
-    signal.terminalError = err;     // memory-ordered write before inbox offer
-    errorInbox.offer(err);           // non-blocking; drop+count if full
-    recordFatal(new LineSenderServerException(err), err);   // two-arg overload from #2; breaks the loop
-} else { // DROP_AND_CONTINUE
-    errorInbox.offer(err);
-    engine.acknowledge(fromFsn);    // advance past the rejected span
-    totalAcks.incrementAndGet();    // for parity with success path counters
-}
-```
-- Keep the success path untouched.
-- Verify `WebSocketResponse` already exposes the error message after parsing a non-OK status (the `errorMessage` field is read by `getErrorMessage()` — confirm parser populates it on the error path).
-- `STATUS_DURABLE_ACK` (0x02) handling stays as-is; it is not an error.
-
-Helper:
-```java
-private static SenderError.Category classify(byte status) {
-    switch (status) {
-        case STATUS_SCHEMA_MISMATCH: return Category.SCHEMA_MISMATCH;
-        case STATUS_PARSE_ERROR:     return Category.PARSE_ERROR;
-        case STATUS_INTERNAL_ERROR:  return Category.INTERNAL_ERROR;
-        case STATUS_SECURITY_ERROR:  return Category.SECURITY_ERROR;
-        case STATUS_WRITE_ERROR:     return Category.WRITE_ERROR;
-        default: return Category.UNKNOWN;
-    }
-}
-```
-
-### 4. WS close-frame routing
-`ResponseHandler.onClose(int code, String reason)` (line 708) currently builds a `LineSenderException` directly and calls `fail(...)` → reconnect. Two cases:
-- **Reconnect-eligible close** (server idle close, network blip): keep existing behavior — `fail(...)` enters reconnect loop.
-- **Terminal close** (PROTOCOL_ERROR 1002, UNSUPPORTED_DATA 1003, MESSAGE_TOO_BIG 1009, policy violation 1008, custom server reason that asserts terminal): build a `SenderError(category=PROTOCOL_VIOLATION, status=-1, seq=-1, message="ws-close[<code>]: " + reason, fsn=ackedFsn+1..publishedFsn, tableName=null, policy=HALT)`, write `signal.terminalError`, inbox, then `recordFatal`.
-
-Decision boundary between the two: the existing reconnect logic already differentiates terminal codes (see auth-terminal handling in commit `8828038`). Mirror that taxonomy here — anything currently treated as terminal becomes a `PROTOCOL_VIOLATION` with the same FSN span.
-
-### 5. Bounded inbox + dispatcher daemon
-- Implement as `ArrayBlockingQueue<SenderError>` for v1 (single producer = I/O thread; single consumer = dispatcher; capacity from builder). Project idiom prefers `QwpSpscQueue` — use it if a generic version exists, else `ArrayBlockingQueue` is fine for the off-hot-path side channel.
-- Dispatcher thread: lazy-start on first `inbox.offer` success. Daemon, named `qwp-error-dispatcher-<senderId>`. Loop: `take()` → `try { handler.onError(err); } catch (Throwable t) { LOG.error(...); }`. Stops when `engine.close()` interrupts it; drains remaining queue entries on stop with a short deadline (~100ms) before giving up.
-- Overflow handling on `offer`: returns false; I/O thread bumps `droppedErrorNotifications` and continues. Never block.
-
-### 6. Default error handler
-```java
-class DefaultErrorHandler implements SenderErrorHandler {
-    public void onError(SenderError e) {
-        LogRecord r = (e.appliedPolicy == HALT) ? LOG.error() : LOG.advisory();
-        r.$("server error: ").$(e.category)
-         .$(" status=0x").$hex(e.serverStatusByte)
-         .$(" fsn=[").$(e.fromFsn).$(',').$(e.toFsn).$(']')
-         .$(" table=").$(e.tableName != null ? e.tableName : "(multi)")
-         .$(" msg=").$(e.serverMessage)
-         .$();
-    }
-}
-```
-Wire as the default if the user does not call `errorHandler(...)` on the builder. Match the project's logging idioms (use `LogFactory.getLog`, etc).
-
-### 7. Builder + connect-string knobs
-- `LineSenderBuilder.errorHandler(SenderErrorHandler)`, `errorPolicy(Category, Policy)`, `errorPolicyResolver(...)`, `errorInboxCapacity(int)`.
-- Connect-string parser additions in `Sender.fromConfig` / `LineSenderBuilder.fromConfig`:
-  - `on_server_error` (auto/halt/drop)
-  - `on_schema_error`, `on_parse_error`, `on_internal_error`, `on_security_error`, `on_write_error` (halt/drop)
-  - `error_inbox_capacity` (int)
-- Internal `PolicyResolver`: composes user resolver (highest) → per-category map → global → per-spec defaults. Single method `Policy resolve(Category)`.
-
-### 8. New public API methods on `Sender` / `QwpWebSocketSender`
-- `Sender.flushAndGetSequence(): long` — returns `engine.publishedFsn()` after the publish, before returning. The existing `flush()` keeps `void` return — call the new method internally or have `flush()` discard the return.
-- `Sender.resumeAfterHalt()` — only meaningful on QWP WS sender; default impl on `Sender` interface throws `UnsupportedOperationException("only WS senders support resumeAfterHalt")`. Implementation:
-  ```java
-  signal.terminalError = null;
-  loop.requestReconnect();   // existing primitive used by reconnect path
-  LOG.warn("resumeAfterHalt: clearing terminal error and restarting I/O loop");
-  ```
-- WS-only accessors on `QwpWebSocketSender`: `getTotalServerErrors()`, `getDroppedErrorNotifications()`, `getLastTerminalError()`. Match the existing accessor style (see § "Counter accessors" in `qwp-cursor-durability.md`).
-
-### 9. Tests (mirror existing `io.questdb.client.test.cutlass.qwp.client.**` layout)
-
-Per category:
-- `ServerErrorSchemaMismatchTest` — `TestWebSocketServer` is augmented to send a `STATUS_SCHEMA_MISMATCH` frame; assert callback fires, FSN span correct, ackedFsn advances (DROP), `flush()` does NOT throw, error counter increments.
-- `ServerErrorParseErrorTest` — same with `STATUS_PARSE_ERROR`; assert HALT, terminal latched, next `flush()` throws `LineSenderServerException` with correct `getServerError()`.
-- `ServerErrorInternalErrorTest`, `ServerErrorSecurityErrorTest`, `ServerErrorWriteErrorTest` — similar.
-- `ServerErrorUnknownStatusTest` — server sends 0xFF; assert `Category.UNKNOWN` + HALT.
-- `ServerErrorWsCloseTest` — server sends WS close 1002; assert `Category.PROTOCOL_VIOLATION`, FSN span = unacked window.
-
-Behavioral:
-- `ErrorPolicyOverrideTest` — connect string `on_schema_error=halt` flips SCHEMA_MISMATCH default; assert HALT.
-- `ErrorPolicyResolverTest` — programmatic resolver returns DROP for everything; assert no terminal latch even on PARSE_ERROR.
-- `ErrorInboxOverflowTest` — slow handler + flood of errors; assert `droppedErrorNotifications > 0`, no I/O thread stall.
-- `ResumeAfterHaltTest` — induce HALT, call `resumeAfterHalt()`, send fresh batch, assert it lands.
-- `FlushAndGetSequenceTest` — assert returned FSN matches the FSN span surfaced in a synthesized rejection.
-
-Hot-path:
-- `ErrorPathHotPathBenchmark` (JMH, sibling of `QwpIngressLatencyBenchmark`) — measure per-batch publish latency with no errors before/after the change. Target: zero measurable regression.
-
-Concurrency:
-- `ErrorRaceTest` — fire HALT and a producer `flush()` simultaneously, repeat 10k times, assert: producer always sees the latch, never observes "callback fired but flush passed" or vice versa.
-
-### 10. Wire `SenderError.tableName` from existing response state
-`WebSocketResponse` already carries `tableNames` (list, see line 224 area). When the response has exactly 1 entry, we have a single-table batch; pass it as `tableName`. Multi-entry → null per spec. Verify the parser populates `tableNames` even on error frames (it might only populate on `STATUS_OK` today — if so, that's a server-side gap and `tableName` will always be null on the error path until both sides extend it).
-
-### 11. README / public-API docs
-- Connect-string reference table needs the new keys.
-- New `LineSenderBuilder` setters documented.
-- Worked example in javadoc of `SenderErrorHandler`: dead-letter to file from an error callback.
-
-## Order of work
-
-Recommended sequence (each step compiles + tests pass independently):
-
-1. Public types (#1) — pure leaves, no risk.
-2. Typed terminal-error stash on the I/O loop (#2), internal and behavior-preserving.
-3. Default handler + dispatcher + inbox (#5, #6) — wire as plumbing; not yet hooked.
-4. Classification + DROP/HALT branches in `ResponseHandler.onBinaryMessage` (#3) — flips behavior.
-5. WS close routing (#4).
-6. Builder + connect-string knobs (#7).
-7. Public methods on `Sender` (#8).
-8. Tests (#9), per category as you implement.
-9. `tableName` wiring (#10) — last, depends on parser audit.
-10. Docs (#11).
-
-## How to run things
-
-```bash
-# QWP-only suite (fast, ~30s)
-mvn -pl core test -Dtest='io.questdb.client.test.cutlass.qwp.client.**'
-
-# Single test
-mvn -pl core test -Dtest=ServerErrorSchemaMismatchTest
-
-# Full core suite (run before merge)
-mvn -pl core test
-
-# Hot-path benchmark
-mvn -pl core test -Dtest=ErrorPathHotPathBenchmark
-```
-
-## Files to know
-
-Existing:
-- `core/src/main/java/io/questdb/client/cutlass/qwp/client/WebSocketResponse.java` — status-byte constants, error frame parser (`readFrom`, `getStatusName`, `getErrorMessage`, `getSequence`).
-- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorWebSocketSendLoop.java` — I/O thread, ResponseHandler at line 706, current terminal-on-error path at line 734.
-- `core/src/main/java/io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java` — the Sender impl. Holds `connectionGeneration`, `flushPendingRows` is the producer entry point.
-- `core/src/main/java/io/questdb/client/Sender.java` — top-level interface + `LineSenderBuilder` + connect-string parser.
-- `core/src/main/java/io/questdb/client/cutlass/qwp/client/sf/cursor/CursorSendEngine.java` — `engine.acknowledge(fsn)` is the trim hook used by DROP path.
-
-New (per #1):
-- `core/src/main/java/io/questdb/client/SenderError.java`
-- `core/src/main/java/io/questdb/client/SenderErrorHandler.java`
-- `core/src/main/java/io/questdb/client/LineSenderServerException.java`
-
-## Notes on the testing environment
-
-`TestWebSocketServer` (in-process, hand-rolled) does NOT parse QWP wire format — it sees opaque binary frames. To test server error frames we need to extend it with a small "responder" hook: `setNextResponse(byte status, long seq, String msg)` that builds a synthetic error frame and sends it on the next inbound batch. Match the binary layout from `WebSocketResponse.readFrom` (line 256 onward). One such helper covers all category tests.
-
-## Open
-None. Ready to implement step 1.
diff --git a/design/qwp-cursor-error-api.md b/design/qwp-cursor-error-api.md
deleted file mode 100644
index eae99bc0..00000000
--- a/design/qwp-cursor-error-api.md
+++ /dev/null
@@ -1,220 +0,0 @@
-# QWP cursor SF — server error API spec
-
-Status: **draft v1**, follow-on to `qwp-cursor-durability.md`. Targets branch `vi_sf`.
-
-## Goals
-- **Surface server-side rejections** (schema mismatch, parse, security, write, internal) to user code without compromising the async `flush()` contract.
-- **Match the wire**: client categories align 1:1 with the stable status bytes already shipped by the server (`WebSocketResponse` + `QwpProcessorState` mapping). No client-side category the wire can't actually distinguish.
-- **Zero hot-path cost** in the no-error case. One volatile load per batch boundary, no allocations, no locks.
-- **Two surfacing paths**: builder-registered `errorHandler` for async dead-lettering, typed exception on next API call for connect-string-only users. Both deliver the same `SenderError` payload.
-- **Loud defaults** — silence is forbidden. The default handler logs ERROR for HALT and WARN for DROP, with category + FSN span + table.
-
-## Non-goals (this spec)
-- Retryable / transient distinction. Server does not ship a retryable bit today; everything potentially transient is folded into `STATUS_INTERNAL_ERROR (0x06)` / `STATUS_WRITE_ERROR (0x09)`. The `RETRY_TRANSIENT` policy is reserved but not implemented; revisit when the server splits codes.
-- Per-table attribution in multi-table batches. Server NACKs the whole batch atomically; `tableName` is best-effort and may be null.
-- Per-row attribution (which row in the batch was bad). Out of scope until the wire format grows a row index field.
-
-## Wire anchor (server-side, already shipped)
-Server error frame layout (binary, **not** a WS close frame):
-```
-1 byte  status
-8 byte  messageSequence (LE) — server's per-frame counter, mirrored back
-2 byte  message length    (LE)
-≤1024 byte UTF-8 message
-```
-Source: `QwpWebSocketUpgradeProcessor.java:895-956` (server repo).
-
-Stable status bytes (`WebSocketResponse.java:74-83`, mirrored from server `QwpConstants.java:174-190`):
-
-| Code | Constant | Server triggers |
-|---|---|---|
-| 0x00 | `STATUS_OK` | accepted |
-| 0x02 | `STATUS_DURABLE_ACK` | post-fsync ack (per-table) |
-| 0x03 | `STATUS_SCHEMA_MISMATCH` | `QwpParseException.SCHEMA_MISMATCH` |
-| 0x05 | `STATUS_PARSE_ERROR` | other `QwpParseException` |
-| 0x06 | `STATUS_INTERNAL_ERROR` | `CairoException.isCritical()` + catch-all `Throwable` |
-| 0x08 | `STATUS_SECURITY_ERROR` | `CairoException.isAuthorizationError()` |
-| 0x09 | `STATUS_WRITE_ERROR` | non-critical Cairo errors / table not accepting writes |
-
-WS-level violations (fragmented binary, text frame, oversized payload, malformed header) come as **WebSocket close frames** with codes PROTOCOL_ERROR / UNSUPPORTED_DATA / MESSAGE_TOO_BIG, not QWP error frames. These need to be funnelled into the same surface.
-
-## Client `Category` enum
-
-```java
-public enum Category {
-    SCHEMA_MISMATCH,    // 0x03
-    PARSE_ERROR,        // 0x05  — QWP-level malformed payload (likely client bug)
-    INTERNAL_ERROR,     // 0x06  — catch-all server fault; bundles resource/transient
-    SECURITY_ERROR,     // 0x08  — auth / ACL
-    WRITE_ERROR,        // 0x09  — table not accepting writes; bundles rate-limit-style
-    PROTOCOL_VIOLATION, // n/a   — WS-level close frame
-    UNKNOWN             // forward-compat for any new server status byte
-}
-```
-
-Forward-compat: unknown bytes map to `UNKNOWN`; the raw byte is preserved on `SenderError.serverStatusByte` for debugging.
-
-## `Policy` enum
-
-```java
-public enum Policy {
-    DROP_AND_CONTINUE, // ackedFsn advances past the bad span; loop keeps draining
-    HALT               // terminalError latched; next producer API call throws
-}
-```
-
-`RETRY_TRANSIENT` is **not** implemented — the wire has no retryable bit to drive it. The enum is binary today; expand later.
-
-## Default category → policy
-
-| Category | Default | Reasoning |
-|---|---|---|
-| SCHEMA_MISMATCH | DROP_AND_CONTINUE | Replay reproduces the same rejection; halting blocks unrelated tables on the same connection. |
-| PARSE_ERROR | HALT | Almost certainly a client bug (we sent malformed bytes). Halt preserves the on-disk frames for postmortem. |
-| INTERNAL_ERROR | HALT | Catch-all server fault; conservatively halt — could be transient, could be poison. Without a retryable bit we cannot tell. |
-| SECURITY_ERROR | HALT | Misconfig; loud failure wanted. |
-| WRITE_ERROR | DROP_AND_CONTINUE | "Non-critical Cairo errors / table not accepting writes" — per-batch in character. Halting blocks other tables. **Debatable; revisit once server splits 0x09 into transient vs permanent.** |
-| PROTOCOL_VIOLATION | HALT (forced) | Connection is gone — no choice. |
-| UNKNOWN | HALT | Never silently drop something we don't understand. |
-
-User overrides via builder (`errorPolicy(Category, Policy)` or full `errorPolicyResolver`) and via connect-string knobs (see below).
-
-## `SenderError` (public, immutable)
-
-```java
-/**
- * @param appliedPolicy  what the loop actually did
- * @param serverStatusByte  raw byte (0x03/0x05/...); -1 for PROTOCOL_VIOLATION
- * @param serverMessage  ≤1024 UTF-8 from frame, or WS close reason
- * @param messageSequence  server's per-frame seq (mirrors what server logs); -1 for PROTOCOL_VIOLATION
- * @param fromFsn  client-side FSN span — load-bearing for correlation
- * @param toFsn  inclusive
- * @param tableName  best-effort; null if multi-table batch
- * @param detectedAtNanos  System.nanoTime() at I/O thread receipt */
-public record SenderError(Category category, Policy appliedPolicy, int serverStatusByte, String serverMessage,
-                          long messageSequence, long fromFsn, long toFsn, String tableName, long detectedAtNanos) {
-    // accessors only; no mutation
-}
-```
-
-**Load-bearing fields**: `[fromFsn, toFsn]` and `appliedPolicy`. The FSN span is what the user joins to their producer-side log to identify the rejected data. `appliedPolicy` tells the user whether the data was dropped (must dead-letter), halted (will be re-thrown on the next call), or, once retry lands, observed only.
-
-`messageSequence` is preserved for cross-team debugging (server-side ops think in `messageSequence`).
-
-## Mechanism — surfacing paths
-
-### Path 1: async callback
-- Builder-time `errorHandler(SenderErrorHandler)`. Default impl: ERROR log for HALT, WARN log for DROP, both with `category`, `[fromFsn, toFsn]`, `tableName`, `serverMessage`. Bumps a counter.
-- I/O thread, on rejection frame, builds `SenderError` and `errorInbox.offer(err)` on a bounded SPSC queue.
-- Bounded inbox: default cap 256. Overflow → drop the notification, bump `droppedErrorNotifications` counter, never block the I/O thread.
-- Dispatcher daemon thread (`QwpSender-error-dispatcher-<id>`, lazy-start on first error) does `take()` + invokes user handler; catches `Throwable` so a buggy handler can't poison the dispatcher.
-
-### Path 2: producer-side typed throw
-- Single volatile field on the existing producer-signal object (the one that already holds `connectionGeneration`):
-  ```java
-  @Contended
-  final class ProducerSignal {
-      volatile long connectionGeneration;   // existing
-      volatile SenderError terminalError;   // new
-  }
-  ```
-- I/O thread, on a HALT-policy error (or PROTOCOL_VIOLATION, or UNKNOWN), writes `signal.terminalError = err` **before** `errorInbox.offer(err)`. Ordering matters: producer must see the latch no later than the dispatcher delivers, otherwise a `flush()` post-callback could still pass.
-- Producer: `flushPendingRows` reads `signal.terminalError` once at batch entry (same cache line as `connectionGeneration` — single load-acquire). If non-null, throws `LineSenderServerException` carrying the `SenderError`.
-
-### Producer hot path
-- Per `at()` / `column*()`: zero change.
-- Per batch boundary (`flush()` or implicit batch publish): one volatile load that piggybacks on the existing `connectionGeneration` read. Same cache line. In steady state the line stays in producer L1; the I/O thread does not write to it on the ACK path.
-
-### I/O thread allocation
-- Per ACK (common case): zero change.
-- Per rejection: one `SenderError`, one queue node. NACK rate is bounded by batch rate, not row rate, and is rare in steady state. Pooling not justified.
-
-## WS close frames
-
-WS-level violations from `WebSocketCloseCode`-style paths (PROTOCOL_ERROR, UNSUPPORTED_DATA, MESSAGE_TOO_BIG, generic close-with-reason) surface as a `SenderError` with:
-- `category = PROTOCOL_VIOLATION`
-- `serverStatusByte = -1`
-- `messageSequence = -1`
-- `serverMessage = "ws-close[<code>]: <reason>"` or whatever `onClose(code, reason)` was given
-- `appliedPolicy = HALT` (always — the connection is gone)
-- FSN span = `[engine.ackedFsn() + 1, engine.publishedFsn()]` (the unacked window at close time)
-
-This routes the existing `ResponseHandler.onClose` through the new sink instead of just calling `fail(...)`.
-
-## Configuration knobs (connect string)
-
-| Key | Default | Values | Notes |
-|---|---|---|---|
-| `on_server_error` | `auto` | `auto` \| `halt` \| `drop` | global default; `auto` uses per-category table |
-| `on_schema_error` | `drop` | `halt` \| `drop` | overrides global for SCHEMA_MISMATCH |
-| `on_parse_error` | `halt` | `halt` \| `drop` | |
-| `on_internal_error` | `halt` | `halt` \| `drop` | |
-| `on_security_error` | `halt` | `halt` \| `drop` | |
-| `on_write_error` | `drop` | `halt` \| `drop` | |
-| `error_inbox_capacity` | `256` | int ≥ 16 | bounded SPSC capacity |
-
-PROTOCOL_VIOLATION and UNKNOWN are not user-configurable — both forced HALT.
-
-Per-category knob takes precedence over `on_server_error` if both are set.
-
-## Builder additions (`LineSenderBuilder`)
-
-```java
-.errorHandler(SenderErrorHandler)              // default: log ERROR/WARN + counter
-.errorPolicy(Category, Policy)                 // overrides for one category
-.errorPolicyResolver(SenderError -> Policy)    // full programmatic control; takes precedence
-.errorInboxCapacity(int)
-```
-
-## Public API surface
-
-- `SenderError` — public, final, immutable, in `io.questdb.client` package.
-- `SenderError.Category`, `SenderError.Policy` — public enums on `SenderError`.
-- `SenderErrorHandler` — `@FunctionalInterface` with `void onError(SenderError)`.
-- `LineSenderServerException extends LineSenderException` — `getServerError(): SenderError` accessor.
-- `Sender.flushAndGetSequence(): long` — returns FSN published; existing `flush()` kept verbatim. The returned FSN is the user's correlation handle for matching against `SenderError.fromFsn`.
-- `Sender.resumeAfterHalt()` — opt-in escape hatch: clears `terminalError`, restarts I/O loop reconnect, logs WARN. No auto-resume.
-- WS-only counter accessors on `QwpWebSocketSender`:
-  - `getTotalServerErrors(): long`
-  - `getDroppedErrorNotifications(): long`
-  - `getLastTerminalError(): SenderError` (snapshot; null if none).
-
-## Interaction with existing reconnect / ack paths
-
-- `CursorWebSocketSendLoop.ResponseHandler.onBinaryMessage` (line 712 onward, current branch): currently routes any non-`STATUS_OK` to `recordFatal(...)`, always terminal. New behavior: classify by status byte → category, resolve policy, build `SenderError`, then either:
-  - `DROP_AND_CONTINUE`: call `engine.acknowledge(fsnAtZero + wireSeq)` to advance past the bad span (the server already rejected it; we're not going to land it), inbox the error, continue.
-  - `HALT`: write `terminalError`, inbox the error, then call `recordFatal(...)` to break the loop. The `LineSenderException` raised by `recordFatal` carries the `SenderError` via `LineSenderServerException`.
-- `STATUS_DURABLE_ACK` (0x02) is unchanged — it's an upload-confirmation, not an error, and the existing handler already keeps it separate.
-- Reconnect budget exhaustion remains terminal (existing behavior). Surfaces as a synthesized `SenderError` with `category = PROTOCOL_VIOLATION` and FSN span = unacked window at giveup time.
-- Auth-terminal on reconnect (existing) is preserved as `category = SECURITY_ERROR` for consistency.
-
-## DROP_AND_CONTINUE: what about the disk?
-
-When the loop drops a rejected batch, the on-disk segment for that FSN range becomes garbage from the server's perspective — but the bytes are still there. Trim happens via the existing `engine.acknowledge(...)` → `SegmentManager.trim` path. Calling `acknowledge` with the rejected wireSeq advances `ackedFsn` past the bad batch, which trims it from disk on the next maintenance pass.
-
-This means the dropped bytes are **lost forever** from the sender's perspective. The user must dead-letter via `errorHandler` if they want a record. This is by design: SF preserves data until the server acks; once the server has explicitly rejected, the data is no longer the sender's responsibility.
-
-## Decisions locked
-1. ✅ 5 wire-aligned categories + `PROTOCOL_VIOLATION` + `UNKNOWN`. No abstracted category that the wire cannot distinguish.
-2. ✅ Two policies only: `DROP_AND_CONTINUE`, `HALT`. `RETRY_TRANSIENT` reserved for post-server-split.
-3. ✅ Defaults per the table above. WRITE_ERROR is DROP (debatable; revisit when server splits).
-4. ✅ `SenderError` is public API, immutable, carries both `messageSequence` and `[fromFsn, toFsn]`.
-5. ✅ Multi-table batches: `tableName` may be null; user correlates via FSN span.
-6. ✅ WS close frames surface as `PROTOCOL_VIOLATION` with `serverStatusByte = -1`, `messageSequence = -1`, always HALT.
-7. ✅ Connect string carries policy knobs + inbox capacity. Callbacks require builder. Typed exception covers connect-string-only users.
-8. ✅ Producer hot path: zero allocations, one volatile load per batch (piggybacks `connectionGeneration` cache line).
-9. ✅ I/O thread never invokes user code. Bounded inbox + lazy-start dispatcher daemon. Inbox overflow drops + counts.
-10. ✅ Default handler is loud (ERROR for HALT, WARN for DROP). Silence forbidden.
-11. ✅ Counters and `getLastTerminalError()` accessor for ops visibility.
-12. ✅ `resumeAfterHalt()` is opt-in escape hatch; never auto-resume.
-13. ✅ `DROP_AND_CONTINUE` advances `ackedFsn` past the rejected span; data is dropped from disk via existing trim path.
-14. ✅ `flush()` signature unchanged. New `flushAndGetSequence()` returns FSN for user-side correlation.
-
-## Server-side follow-ups (track separately, not blocking client work)
-1. Split `0x06` and `0x09` to add explicit `RESOURCE_EXHAUSTED`, `RATE_LIMITED`, `TRANSIENT` codes — unblocks `RETRY_TRANSIENT` client policy.
-2. Or: add an explicit retryable bit (1 reserved byte in the error frame) — alternative to (1).
-3. Per-table attribution in multi-table batch errors — extend the error frame with an optional table index (`-1` = batch-level).
-4. Document whether rejected `messageSequence` values count toward the server's dedup window or are excluded.
-
-## Open
-None. Ready to implement.