[parquet] add vector support for parquet#8282

Open
steFaiz wants to merge 1 commit into
apache:masterfrom
steFaiz:support_vector_parquet

@steFaiz steFaiz commented Jun 18, 2026

Contributor

Purpose

Currently the Parquet format does not recognize the vector type. This causes an UnsupportedOperationException when users do not store vectors in a separate format such as Lance or Vortex.

Tests

Unit tests

@JingsongLi

Contributor

Thanks for adding Parquet vector support. One correctness concern: VectorType is fixed-size, but this implementation encodes it as a Parquet LIST and currently accepts whatever list length is present. On the write path, ParquetRowDataWriter writes row.getVector(...).size() without checking it against VectorType.getLength(); on the read path, CastedVectorColumnVector returns a ColumnarVec using the Parquet list length, not the declared vector length. That can let malformed vectors be written/read and then fail later in Spark/DataConverter or vector indexing with less context. Could we validate the vector length at the Parquet boundary, and ideally add a test for mismatched length?
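
To make the request concrete, here is a minimal sketch of the kind of length check that could sit at the Parquet boundary. `VectorLengthCheck` and `requireLength` are hypothetical names for illustration only and are not part of this PR or of Paimon's API; the sketch also assumes the vector is available as a `float[]`, which may not match the actual writer and reader code.

```java
/** Hypothetical helper for illustration; not part of Paimon's API. */
final class VectorLengthCheck {

    private VectorLengthCheck() {}

    /**
     * Returns the values unchanged when their count matches the declared
     * vector length, and fails with field context otherwise.
     *
     * @param fieldName      name of the vector field, used only in the error message
     * @param declaredLength the fixed length declared by the vector type
     * @param values         the elements about to be written to, or just read from,
     *                       the Parquet list
     */
    static float[] requireLength(String fieldName, int declaredLength, float[] values) {
        if (values.length != declaredLength) {
            throw new IllegalArgumentException(
                    String.format(
                            "Vector field '%s' declares length %d but the Parquet list has %d elements",
                            fieldName, declaredLength, values.length));
        }
        return values;
    }
}
```

Calling the same check on both the write path and the read path would surface a mismatch with the field name and both lengths in the message, rather than failing later with less context.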
