
IN LIST: add UInt16 bitmap filter #23012

Draft
geoffreyclaude wants to merge 5 commits into apache:main from geoffreyclaude:perf/in_list_bitmap_u16_filter
@geoffreyclaude geoffreyclaude commented Jun 18, 2026


Which issue does this PR close?

Rationale for this change

#23011 uses a bitmap checklist for UInt8, where there are 256 possible values. UInt16 is the same idea with a larger value range: 0 through 65,535.

That is still small enough to represent directly. A UInt16 bitmap needs one bit for each possible value:

  • 65,536 possible values
  • 65,536 bits total
  • 8 KiB (8,192 bytes) of memory

Then a lookup is still simple: use the input value as the bit position and check whether that bit is set. For example, if the list contains 42, bit 42 is set, and every input row with value 42 can be recognized with one bit test.
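The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `U16Bitmap` type and its methods are hypothetical names, and the layout (1,024 `u64` words behind a `Box`) is just one plausible way to hold 65,536 bits on the heap.

```rust
/// Hypothetical illustration type; not the PR's `UInt16BitmapConfig`.
struct U16Bitmap {
    // 1,024 words * 64 bits = 65,536 bits = 8 KiB, boxed so it lives on the heap.
    words: Box<[u64; 1024]>,
}

impl U16Bitmap {
    /// Build the bitmap from the constant IN list: set one bit per listed value.
    fn from_values(values: &[u16]) -> Self {
        let mut words = Box::new([0u64; 1024]);
        for &v in values {
            // High 10 bits select the word, low 6 bits select the bit in that word.
            words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { words }
    }

    /// Membership test: use the value as the bit position and test that bit.
    fn contains(&self, v: u16) -> bool {
        (self.words[(v >> 6) as usize] >> (v & 63)) & 1 == 1
    }
}
```

With a list containing 42, `contains(42)` reads word 0 and tests bit 42, so each input row costs one index, one shift, and one mask regardless of how long the list is.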

This PR keeps the scope narrow: it adds the unsigned 2-byte bitmap path. Signed 1-byte and 2-byte types reuse these bitmaps in the next PR.

What changes are included in this PR?

  • Generalizes the bitmap filter so different integer widths can provide their own storage and indexing behavior.
  • Adds UInt16BitmapConfig, backed by a heap-allocated 65,536-bit bitmap.
  • Routes UInt16 constant-list filtering to the bitmap path.
  • Keeps the same IN / NOT IN null behavior as the generic path.
  • Adds focused coverage for UInt16 boundary values, sliced arrays, nulls, and NOT IN.
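To make the first and fourth bullets concrete, here is a rough sketch of how a width-generic bitmap filter with SQL-style null handling could look. Everything in it is an assumption for illustration: the `BitmapConfig` trait, `U8Config`, `U16Config`, and `BitmapFilter` are hypothetical names and do not mirror the actual trait or types in this PR. The null rules shown are standard SQL three-valued logic for IN / NOT IN, which is what the description above says the bitmap path preserves.

```rust
/// Hypothetical trait: each integer width picks its own storage and index mapping.
trait BitmapConfig {
    type Native: Copy;
    type Storage: AsRef<[u64]> + AsMut<[u64]>;
    fn empty() -> Self::Storage;
    fn index(v: Self::Native) -> usize;
}

/// 256 possible values: 4 words (32 bytes), small enough to live inline.
struct U8Config;
impl BitmapConfig for U8Config {
    type Native = u8;
    type Storage = [u64; 4];
    fn empty() -> Self::Storage { [0; 4] }
    fn index(v: u8) -> usize { v as usize }
}

/// 65,536 possible values: 1,024 words (8 KiB), kept on the heap.
struct U16Config;
impl BitmapConfig for U16Config {
    type Native = u16;
    type Storage = Box<[u64]>;
    fn empty() -> Self::Storage { vec![0u64; 1024].into_boxed_slice() }
    fn index(v: u16) -> usize { v as usize }
}

struct BitmapFilter<C: BitmapConfig> {
    bits: C::Storage,
    list_has_null: bool,
}

impl<C: BitmapConfig> BitmapFilter<C> {
    /// Build from the constant list; a NULL literal in the list is only recorded as a flag.
    fn new(list: &[Option<C::Native>]) -> Self {
        let mut bits = C::empty();
        let mut list_has_null = false;
        for item in list {
            match item {
                Some(v) => {
                    let i = C::index(*v);
                    bits.as_mut()[i / 64] |= 1u64 << (i % 64);
                }
                None => list_has_null = true,
            }
        }
        Self { bits, list_has_null }
    }

    /// Evaluate `value IN (list)` or, when `negated`, `value NOT IN (list)`.
    /// `None` stands for SQL NULL.
    fn eval(&self, value: Option<C::Native>, negated: bool) -> Option<bool> {
        let i = C::index(value?); // NULL input yields NULL
        let hit = (self.bits.as_ref()[i / 64] >> (i % 64)) & 1 == 1;
        match (hit, self.list_has_null) {
            (true, _) => Some(!negated),
            (false, true) => None, // no match, but the list holds a NULL: unknown
            (false, false) => Some(negated),
        }
    }
}
```

The only per-width differences in this sketch are the storage type and the index mapping, which is the kind of seam a later signed-integer path could plug into by mapping its values onto the same bit positions.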

Are these changes tested?

Yes.

  • cargo fmt --all --check
  • cargo test -p datafusion-physical-expr bitmap_filter_u16 --lib
  • cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings

Are there any user-facing changes?

No. This is an internal performance optimization only.

Benchmark note

No local in_list_strategy numbers are included for this PR because the benchmark harness does not currently include a direct UInt16 case. The available i16 rows measure the signed reinterpretation path added in #23013, not this PR's unsigned UInt16 bitmap filter.

Replaces HashSet<u8> with a 32-byte stack-allocated bitmap. Provides O(1) membership testing via bit-shifting, significantly reducing memory overhead and improving cache locality. Triggers for UInt8 arrays.
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_bitmap_u16_filter branch from 48e938d to 55f3836 Compare June 18, 2026 07:57
Implements an 8 KB heap-allocated bitmap for UInt16. Maintains O(1) performance while handling the larger value space. Triggers for UInt16 arrays.
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_bitmap_u16_filter branch from 55f3836 to 81ec379 Compare June 18, 2026 08:40
@geoffreyclaude geoffreyclaude changed the title from "Extend Bitmap Filter to UInt16 (Heap-based)" to "IN LIST: add UInt16 bitmap filter" Jun 18, 2026

Labels

physical-expr Changes to the physical-expr crates
