[spark] Record the write operation type in snapshot properties #8236
Zouxxyy wants to merge 2 commits into
I think adding `operation` as a dedicated nullable field in `Snapshot` is a better direction than storing it in properties. Compatibility should also be fine.
I would suggest modeling it as a first-class nullable enum or string field, for example `Snapshot.Operation`, rather than putting it into properties. `commitKind` describes the physical snapshot change, while `operation` describes the logical user operation, so both feel like core snapshot metadata. This would also avoid introducing a generic `withCommitProperties` API just for one standard field, and avoid potential conflicts around the "operation" property key.
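To make the suggestion concrete, here is a minimal sketch of a snapshot that carries both the physical commit kind and a nullable logical operation. It is illustrative only: `SnapshotSketch`, `CommitKindSketch`, and `OperationSketch` are hypothetical stand-ins, not the actual Paimon classes, and the property values are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the physical commit kind.
enum CommitKindSketch { APPEND, COMPACT, OVERWRITE, ANALYZE }

// Hypothetical stand-in for the logical user operation.
enum OperationSketch { WRITE, OVERWRITE, DELETE, TRUNCATE, UPDATE, MERGE }

final class SnapshotSketch {
    private final long id;
    private final CommitKindSketch commitKind;
    // Nullable: snapshots written before the field existed carry no operation.
    private final OperationSketch operation;
    // Free-form properties stay available for user-defined keys.
    private final Map<String, String> properties;

    SnapshotSketch(long id, CommitKindSketch commitKind,
                   OperationSketch operation, Map<String, String> properties) {
        this.id = id;
        this.commitKind = commitKind;
        this.operation = operation;
        this.properties = properties;
    }

    long id() { return id; }
    CommitKindSketch commitKind() { return commitKind; }
    OperationSketch operation() { return operation; }
    Map<String, String> properties() { return properties; }
}

class OperationFieldDemo {
    public static void main(String[] args) {
        // A user-defined "operation" property no longer collides with the
        // standard field, because the field lives outside the map.
        Map<String, String> userProps = new HashMap<>();
        userProps.put("operation", "nightly-backfill");

        SnapshotSketch merged = new SnapshotSketch(
                2L, CommitKindSketch.APPEND, OperationSketch.MERGE, userProps);
        SnapshotSketch legacy = new SnapshotSketch(
                1L, CommitKindSketch.APPEND, null, new HashMap<>());

        // Same physical kind, different logical operation.
        System.out.println(merged.commitKind() + " / " + merged.operation()); // APPEND / MERGE
        System.out.println(legacy.commitKind() + " / " + legacy.operation()); // APPEND / null
    }
}
```

Under these assumptions, two snapshots with the same commit kind remain distinguishable by operation, and a reader can treat `null` as "written before the field existed".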
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Purpose
Add a first-class `operation` field (`Snapshot.Operation` enum) to `Snapshot`, recording the logical operation type that produced it. This complements the physical `CommitKind` (APPEND/COMPACT/OVERWRITE/ANALYZE) and lets downstream tooling distinguish, e.g., an APPEND from INSERT vs. one from MERGE.

Design:
- `Snapshot.Operation` enum: `WRITE`, `OVERWRITE`, `DELETE`, `TRUNCATE`, `UPDATE`, `MERGE`, `CREATE_TABLE_AS_SELECT`, `REPLACE_TABLE_AS_SELECT`, `CREATE_OR_REPLACE_TABLE_AS_SELECT`
- `@JsonInclude(NON_NULL)`: old snapshots deserialize as `null`, old readers ignore the unknown field via `@JsonIgnoreProperties(ignoreUnknown = true)`
- `BatchTableCommit.withOperation(Operation)`: `default` method, no breaking change for existing implementations
- `FileStoreCommit.withOperation(Operation)`: internal API, propagated to `Snapshot` construction in `FileStoreCommitImpl`
- `TRUNCATE` is automatically set by `TableCommitImpl.truncateTable()` / `truncatePartitions()` in core, so callers don't need to handle it

Spark coverage (both v1 and v2 write paths):
- `WRITE`
- `OVERWRITE`
- `DELETE`
- `TRUNCATE`
- `TRUNCATE`
- `UPDATE`
- `MERGE`
- `CREATE_TABLE_AS_SELECT`
- `REPLACE_TABLE_AS_SELECT` / `CREATE_OR_REPLACE_TABLE_AS_SELECT`

Tests
- `SnapshotTest.testSnapshotWithOperation` (paimon-core): JSON serialization round-trip, backward compatibility with old snapshots
- `SnapshotOperationTest` (paimon-spark-ut): all operations under both `use-v2-write=true/false`, including CTAS/RTAS and truncate paths
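
The two API points in the design (a `default` `withOperation` method, and `TRUNCATE` being set inside the truncate call) can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `BatchCommitSketch`, `LegacyCommit`, `RecordingCommit`, and `Op` are hypothetical names, and the no-op default body is an assumption about how such a default might behave.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the logical operation enum.
enum Op { WRITE, OVERWRITE, DELETE, TRUNCATE, UPDATE, MERGE }

// Hypothetical commit interface. Because withOperation is a default method,
// implementations written before it existed keep compiling unchanged.
interface BatchCommitSketch {
    default BatchCommitSketch withOperation(Op operation) {
        return this; // assumed no-op default: the operation is simply not recorded
    }
    void commit();
    void truncateTable();
}

// A pre-existing implementation that knows nothing about operations.
final class LegacyCommit implements BatchCommitSketch {
    int commits = 0;
    @Override public void commit() { commits++; }
    @Override public void truncateTable() { commits++; }
}

// An implementation that records the operation with each snapshot.
final class RecordingCommit implements BatchCommitSketch {
    private Op operation; // stays null unless a caller sets it
    final List<Op> recorded = new ArrayList<>(); // one entry per snapshot

    @Override public BatchCommitSketch withOperation(Op operation) {
        this.operation = operation;
        return this;
    }
    @Override public void commit() { recorded.add(operation); }
    @Override public void truncateTable() {
        // Set inside the truncate path so callers do not have to handle it.
        this.operation = Op.TRUNCATE;
        commit();
    }
}

class WithOperationDemo {
    public static void main(String[] args) {
        RecordingCommit merge = new RecordingCommit();
        merge.withOperation(Op.MERGE).commit();

        RecordingCommit truncate = new RecordingCommit();
        truncate.truncateTable(); // no withOperation call needed

        LegacyCommit legacy = new LegacyCommit();
        legacy.withOperation(Op.UPDATE).commit(); // compiles and runs via the default

        System.out.println(merge.recorded);    // [MERGE]
        System.out.println(truncate.recorded); // [TRUNCATE]
        System.out.println(legacy.commits);    // 1
    }
}
```

The point of the `default` method in this sketch is that a caller can always chain `withOperation(...)`, while older implementations silently skip recording rather than failing to compile.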