
Writing v3 Table Metadata #3551

Open
rambleraptor wants to merge 1 commit into apache:main from rambleraptor:v3-metadata-write


@rambleraptor rambleraptor commented Jun 22, 2026

Collaborator

Rationale for this change

We're far enough into v3 (with a ratified spec!) that we should consider writing v3 table metadata.

Are these changes tested?

Added a unit test and updated the existing tests that previously expected errors.

Are there any user-facing changes?

  • Added v3 table metadata writing support

@rambleraptor rambleraptor requested a review from kevinjqliu June 22, 2026 22:04

@ebyhr ebyhr left a comment

Member


Does it make sense to add an integration test verifying that Spark can read v3 tables written by PyIceberg?

@abnobdoss abnobdoss left a comment

Contributor


This is awesome - I'm excited for v3 table write support! Left a few minor comments; apologies if I misunderstood anything here.

INITIAL_SPEC_ID = 0
DEFAULT_SCHEMA_ID = 0

SUPPORTED_TABLE_FORMAT_VERSION = 2

Contributor


Does this need to change?

Contributor


Does this PR only allow new v3 tables or does it also enable v1/v2 tables to be upgraded to v3?
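Here is a minimal sketch of the version check behind both questions, assuming the constant is bumped to 3. Iceberg lets a table's format version be upgraded (for example, v1 or v2 to v3) but never downgraded, and a writer should refuse versions above what it supports. `check_format_version_change` is a hypothetical helper, not PyIceberg's actual code path. Setting `SUPPORTED_TABLE_FORMAT_VERSION = 3` is an assumption about what this PR would need.

```python
# Assumption: the value this PR would set to enable v3 writes.
SUPPORTED_TABLE_FORMAT_VERSION = 3


def check_format_version_change(
    current: int, target: int, supported: int = SUPPORTED_TABLE_FORMAT_VERSION
) -> None:
    """Hypothetical check: allow staying on or upgrading to a supported version."""
    # Refuse to write a version this library cannot produce.
    if target > supported:
        raise ValueError(f"Unsupported table format version: {target}")
    # Format versions only move forward; downgrades are rejected.
    if target < current:
        raise ValueError(f"Cannot downgrade v{current} table to v{target}")
```

Under this sketch, `check_format_version_change(2, 3)` succeeds (an in-place upgrade), `(3, 2)` raises because it is a downgrade, and `(2, 4)` raises because v4 is beyond the supported version.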

next_row_id: int | None = Field(alias="next-row-id", default=None)
"""A long higher than all assigned row IDs; the next snapshot's `first-row-id`."""

def model_dump_json(self, exclude_none: bool = True, exclude: Any | None = None, by_alias: bool = True, **kwargs: Any) -> str:

Contributor


Is row lineage fully wired up? If I'm not mistaken, row lineage is a prerequisite for enabling v3.
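To make the row-lineage question concrete, here is a simplified sketch of how `next_row_id` advances according to its docstring ("A long higher than all assigned row IDs; the next snapshot's `first-row-id`"). Each new snapshot's `first-row-id` is the table's current `next-row-id`, which then moves past every row ID assigned in that commit. `SnapshotSketch` and `commit_snapshot` are hypothetical names. The sketch treats `added_rows` as the count of rows receiving new IDs and ignores per-manifest and per-file ID assignment.

```python
from dataclasses import dataclass


@dataclass
class SnapshotSketch:
    # Hypothetical, simplified snapshot record.
    snapshot_id: int
    first_row_id: int
    added_rows: int


def commit_snapshot(next_row_id: int, snapshot_id: int, added_rows: int) -> tuple[SnapshotSketch, int]:
    """Assign row IDs for one commit; return the snapshot and the table's new next-row-id."""
    if added_rows < 0:
        raise ValueError("added_rows must be non-negative")
    # The new snapshot's first-row-id is the table's current next-row-id...
    snapshot = SnapshotSketch(snapshot_id, first_row_id=next_row_id, added_rows=added_rows)
    # ...and next-row-id advances past every row ID assigned in this commit.
    return snapshot, next_row_id + added_rows
```

Two commits adding 100 and then 50 rows yield first-row-ids 0 and 100 and leave next-row-id at 150, so the rows of the first commit occupy IDs 0 through 99.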


3 participants