IN LIST: unify bitmap filter implementations #23035

Draft
geoffreyclaude wants to merge 6 commits into apache:main from geoffreyclaude:perf/in_list_unify_bitmap_filters

@geoffreyclaude

Contributor

Which issue does this PR close?

Rationale for this change

#23011 and #23012 intentionally introduce the UInt8 and UInt16 bitmap filters as concrete implementations. With both widths visible, the shared shape is now clear: each filter builds a fixed-size bitmap from non-null IN list values and probes it with the input value's integer bit pattern.
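As a rough illustration of that shared shape, here is a minimal sketch of a UInt8-sized bitmap, assuming a plain `[u64; 4]` backing array. The name `U8Bitmap` and both method signatures are hypothetical and are not taken from the DataFusion code; only the build-then-probe pattern comes from the description above.

```rust
/// Hypothetical 256-bit bitmap: one bit for every possible `u8` value.
/// Illustrative only; not the type used in DataFusion.
pub struct U8Bitmap {
    words: [u64; 4],
}

impl U8Bitmap {
    /// Build the bitmap from IN list values, skipping nulls (`None`).
    pub fn from_values(values: impl IntoIterator<Item = Option<u8>>) -> Self {
        let mut words = [0u64; 4];
        for v in values.into_iter().flatten() {
            // The upper 2 bits select the word, the lower 6 bits select the bit.
            words[(v >> 6) as usize] |= 1u64 << (v & 63);
        }
        Self { words }
    }

    /// Probe the bitmap with a value's bit pattern.
    pub fn contains(&self, v: u8) -> bool {
        (self.words[(v >> 6) as usize] >> (v & 63)) & 1 == 1
    }
}
```

With this sketch, `U8Bitmap::from_values([Some(3), None, Some(255)])` would report `contains(3)` and `contains(255)` as true and every other value as false.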

This PR factors that duplicated bitmap machinery behind a small shared configuration trait. It does not add a new lookup strategy or change which data types use bitmap filters. The next PR uses this shared shape to let same-width signed integers reuse the unsigned bitmap storage.
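The signed-integer reuse belongs to the follow-up PR and is not shown here. As a sketch of why probing by bit pattern could make it possible, assuming a plain `as` cast is an acceptable index conversion, a same-width signed value maps one-to-one onto the unsigned index space. The function name `i8_index` is hypothetical.

```rust
/// Hypothetical index conversion: reinterpret an `i8` as its `u8` bit
/// pattern so it can address the same 256-bit bitmap. The cast is a
/// bijection, so distinct signed values never collide.
pub fn i8_index(v: i8) -> usize {
    v as u8 as usize
}
```

Under this assumption, `-1i8` lands on index 255 and `i8::MIN` on index 128, so the unsigned storage needs no changes.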

What changes are included in this PR?

  • Adds BitmapStorage for fixed-size bitmap backing stores.
  • Adds BitmapFilterConfig to describe the Arrow type, native value, storage, and index conversion for each bitmap domain.
  • Replaces the concrete UInt8BitmapFilter and UInt16BitmapFilter implementations with BitmapFilter<C>.
  • Adds UInt8BitmapConfig and UInt16BitmapConfig.
  • Keeps UInt8 and UInt16 routing behavior unchanged.
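The pieces listed above could fit together roughly as follows. This is a sketch only: the trait and type names match the PR description, but every method, associated type, and storage choice below is an assumption made for illustration, and the Arrow type association is left out to keep the example free of dependencies.

```rust
/// Assumed shape of a fixed-size bitmap backing store.
pub trait BitmapStorage {
    fn zeroed() -> Self;
    fn set(&mut self, index: usize);
    fn get(&self, index: usize) -> bool;
}

// 256 bits (32 bytes), small enough to live inline.
impl BitmapStorage for [u64; 4] {
    fn zeroed() -> Self { [0u64; 4] }
    fn set(&mut self, index: usize) { self[index / 64] |= 1u64 << (index % 64); }
    fn get(&self, index: usize) -> bool { (self[index / 64] >> (index % 64)) & 1 == 1 }
}

// 65,536 bits (8 KB), boxed so it lives on the heap.
impl BitmapStorage for Box<[u64; 1024]> {
    fn zeroed() -> Self { Box::new([0u64; 1024]) }
    fn set(&mut self, index: usize) { self[index / 64] |= 1u64 << (index % 64); }
    fn get(&self, index: usize) -> bool { (self[index / 64] >> (index % 64)) & 1 == 1 }
}

/// Assumed shape of the per-domain configuration (Arrow type omitted).
pub trait BitmapFilterConfig {
    type Native: Copy;
    type Storage: BitmapStorage;
    fn to_index(value: Self::Native) -> usize;
}

pub struct UInt8BitmapConfig;
impl BitmapFilterConfig for UInt8BitmapConfig {
    type Native = u8;
    type Storage = [u64; 4];
    fn to_index(value: u8) -> usize { value as usize }
}

pub struct UInt16BitmapConfig;
impl BitmapFilterConfig for UInt16BitmapConfig {
    type Native = u16;
    type Storage = Box<[u64; 1024]>;
    fn to_index(value: u16) -> usize { value as usize }
}

/// One generic filter in place of two concrete ones.
pub struct BitmapFilter<C: BitmapFilterConfig> {
    bits: C::Storage,
}

impl<C: BitmapFilterConfig> BitmapFilter<C> {
    /// Build from IN list values, skipping nulls (`None`).
    pub fn from_values(values: impl IntoIterator<Item = Option<C::Native>>) -> Self {
        let mut bits = <C::Storage as BitmapStorage>::zeroed();
        for v in values.into_iter().flatten() {
            bits.set(C::to_index(v));
        }
        Self { bits }
    }

    pub fn contains(&self, value: C::Native) -> bool {
        self.bits.get(C::to_index(value))
    }
}
```

In this sketch, supporting another domain only requires a new config type; the build and probe logic in `BitmapFilter<C>` stays untouched.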

Are these changes tested?

Yes.

  • cargo fmt --all
  • cargo test -p datafusion-physical-expr bitmap_filter_ --lib
  • cargo test -p datafusion-physical-expr in_list_int_types --lib
  • cargo test -p datafusion-physical-expr test_in_list_from_array_type_combinations --lib
  • cargo test -p datafusion-physical-expr test_in_list_dictionary_types --lib
  • cargo clippy -p datafusion-physical-expr --all-targets --all-features -- -D warnings

Are there any user-facing changes?

No. This is an internal refactor only.

Benchmark note

No local benchmark numbers are included for this PR because it is intended to be a behavior-preserving refactor of the bitmap filter implementation. Benchmarks were not rerun for this stack split.

Replaces HashSet<u8> with a 32-byte stack-allocated bitmap. Provides O(1) membership testing via bit-shifting, significantly reducing memory overhead and improving cache locality. Triggers for UInt8 arrays.
Implements an 8 KB heap-allocated bitmap for UInt16. Maintains O(1) performance while handling the larger value space. Triggers for UInt16 arrays.
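The two sizes quoted above follow directly from the value-space width: one bit per possible value, divided by eight bits per byte. A small sketch of that arithmetic, using a hypothetical helper `bitmap_bytes`:

```rust
/// Bytes needed for a bitmap with one bit per value of a
/// `value_bits`-wide integer (hypothetical helper, for illustration).
pub const fn bitmap_bytes(value_bits: u32) -> usize {
    (1usize << value_bits) / 8
}
```

`bitmap_bytes(8)` gives 32 bytes for UInt8 and `bitmap_bytes(16)` gives 8192 bytes (8 KB) for UInt16, matching the figures in the two descriptions.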
@geoffreyclaude geoffreyclaude force-pushed the perf/in_list_unify_bitmap_filters branch from c5f4cbd to 8d99ad0 Compare June 19, 2026 05:55
@geoffreyclaude geoffreyclaude marked this pull request as draft June 19, 2026 13:30

Labels

physical-expr Changes to the physical-expr crates
