
[python] Support native batch vector search in Lumina reader #8280

Draft
XiaoHongbo-Hope wants to merge 1 commit into apache:master from XiaoHongbo-Hope:batch_search_fix


@XiaoHongbo-Hope

Contributor

Purpose

Tests

Batch vector search in pypaimon previously looped over single searches, issuing one
search_list call per query vector. The Lumina native binding already exposes a batch
entry point, search_list(flat, n, k) / search_with_filter_list, which is the same native
call the Java path uses. This change routes batch search through it.
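The batch entry point described above can be sketched as a flatten, one call, slice pattern. This is a minimal illustration under stated assumptions, not the Lumina binding: `FakeNativeIndex`, `batch_search`, and `looped_search` are hypothetical names, the brute-force index stands in for the native index, and the assumed contract is that `search_list(flat, n, k)` returns two flat lists (distances and ids) of `n * k` entries, padded so every query fills exactly `k` slots.

```python
import math
from typing import List, Sequence, Tuple


class FakeNativeIndex:
    """Hypothetical brute-force stand-in for a Lumina native index.

    Assumed contract (not the real binding's): search_list(flat, n, k)
    returns (distances, ids), each a flat list of n * k entries, where
    query q occupies positions [q * k, q * k + k).
    """

    def __init__(self, vectors: List[List[float]]):
        self.vectors = vectors
        self.dim = len(vectors[0])

    def search_list(self, flat: Sequence[float], n: int, k: int
                    ) -> Tuple[List[float], List[int]]:
        assert len(flat) == n * self.dim, "flat buffer must hold n * dim floats"
        distances: List[float] = []
        ids: List[int] = []
        for q in range(n):
            query = flat[q * self.dim:(q + 1) * self.dim]
            # Squared L2 distance to every stored vector, nearest first.
            scored = sorted(
                (sum((a - b) ** 2 for a, b in zip(query, vec)), row)
                for row, vec in enumerate(self.vectors)
            )[:k]
            # Pad to exactly k slots so fixed-stride slicing stays valid.
            scored += [(math.inf, -1)] * (k - len(scored))
            distances.extend(d for d, _ in scored)
            ids.extend(row for _, row in scored)
        return distances, ids


def batch_search(index: FakeNativeIndex, queries: List[List[float]], k: int
                 ) -> List[List[Tuple[int, float]]]:
    """One native call for all n queries, then slice the results per query."""
    n = len(queries)
    # Flatten the n query vectors into one (n * dim) buffer.
    flat = [x for query in queries for x in query]
    distances, ids = index.search_list(flat, n, k)
    results = []
    for q in range(n):
        lo, hi = q * k, q * k + k
        results.append(list(zip(ids[lo:hi], distances[lo:hi])))
    return results


def looped_search(index: FakeNativeIndex, queries: List[List[float]], k: int
                  ) -> List[List[Tuple[int, float]]]:
    """The old pattern: one search_list call per query vector."""
    return [batch_search(index, [query], k)[0] for query in queries]


index = FakeNativeIndex([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
queries = [[0.9, 0.1], [0.0, 1.9]]
# Batch and looped paths must agree, which is what an equivalence test checks.
assert batch_search(index, queries, 2) == looped_search(index, queries, 2)
```

The padding matters: slicing at a fixed stride of `k` only works if the native call always emits `k` slots per query, so any real binding's behavior on short result lists should be confirmed before relying on this layout.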

- Add BatchVectorSearch predicate (mirrors Java BatchVectorSearch).
- GlobalIndexReader: default visit_batch_vector_search fans out to single
  search for readers without a native batch path.
- OffsetGlobalIndexReader: add the offset wrapper for batch search.
- Lumina reader: native visit_batch_vector_search flattens the n query
  vectors into one (n * dim) buffer, calls search_list/search_with_filter_list
  once, and slices the results for query q from [q * k, q * k + k).
- read_batch: one batch call per split, merged per query vector across
  splits (mirrors Java BatchVectorReadImpl).
- Tests: cover the default fan-out and the native batch path; add a real
  Lumina batch-vs-single equivalence regression test.
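The reader-side pieces listed above (default fan-out, the offset wrapper, and the per-query merge across splits) can be sketched together. This is an illustration under assumptions, not pypaimon's code: `ReaderSketch`, `BruteForceReaderSketch`, `OffsetReaderSketch`, and `read_batch_sketch` are hypothetical names with simplified signatures, each hit is assumed to be a `(row_id, distance)` pair, and lower distance is treated as better.

```python
import heapq
from typing import List, Tuple

# A hit is (row_id, distance); lower distance is assumed to be better.
Hit = Tuple[int, float]


class ReaderSketch:
    """Hypothetical simplified reader; not pypaimon's GlobalIndexReader."""

    def visit_vector_search(self, query: List[float], k: int) -> List[Hit]:
        raise NotImplementedError

    def visit_batch_vector_search(self, queries: List[List[float]], k: int
                                  ) -> List[List[Hit]]:
        # Default fan-out: readers without a native batch path run one
        # single search per query vector.
        return [self.visit_vector_search(query, k) for query in queries]


class BruteForceReaderSketch(ReaderSketch):
    """Exact search over an in-memory list; relies on the default fan-out."""

    def __init__(self, vectors: List[List[float]]):
        self.vectors = vectors

    def visit_vector_search(self, query: List[float], k: int) -> List[Hit]:
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(query, vec)), row)
            for row, vec in enumerate(self.vectors)
        )[:k]
        return [(row, dist) for dist, row in scored]


class OffsetReaderSketch(ReaderSketch):
    """Shifts local row ids by a fixed offset, for single and batch search."""

    def __init__(self, inner: ReaderSketch, offset: int):
        self.inner = inner
        self.offset = offset

    def _shift(self, hits: List[Hit]) -> List[Hit]:
        return [(row + self.offset, dist) for row, dist in hits]

    def visit_vector_search(self, query: List[float], k: int) -> List[Hit]:
        return self._shift(self.inner.visit_vector_search(query, k))

    def visit_batch_vector_search(self, queries: List[List[float]], k: int
                                  ) -> List[List[Hit]]:
        # Delegate the whole batch so a native batch path underneath is kept.
        return [self._shift(hits)
                for hits in self.inner.visit_batch_vector_search(queries, k)]


def read_batch_sketch(splits: List[ReaderSketch], queries: List[List[float]],
                      k: int) -> List[List[Hit]]:
    """One batch call per split, then a per-query top-k merge across splits."""
    per_split = [split.visit_batch_vector_search(queries, k) for split in splits]
    merged = []
    for q in range(len(queries)):
        candidates = [hit for split_hits in per_split for hit in split_hits[q]]
        merged.append(heapq.nsmallest(k, candidates, key=lambda hit: hit[1]))
    return merged


splits = [
    OffsetReaderSketch(BruteForceReaderSketch([[0.0, 0.0], [1.0, 0.0]]), 0),
    OffsetReaderSketch(BruteForceReaderSketch([[0.0, 2.0], [5.0, 5.0]]), 2),
]
merged = read_batch_sketch(splits, [[0.9, 0.1], [0.0, 1.9]], 2)
print([[row for row, _ in hits] for hits in merged])  # [[1, 0], [2, 0]]
```

Because each split returns its local top k, the union across splits always contains the global top k for each query, so `heapq.nsmallest` over that union is sufficient for the merge.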

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>