Fix DuckDB unparse for optimized join projections#23002
Conversation
kosiew
left a comment
There was a problem hiding this comment.
@goutamadwant
Thanks for working on this fix. I think there's still one correctness gap around the symmetric join case, plus a small cleanup opportunity in the rewrite logic.
| None | ||
| }; | ||
|
|
||
| let right_plan = |
There was a problem hiding this comment.
Nice catch on the derived-join-input case for the left side. I think the same invariant applies symmetrically to the right side as well.
Right now unwrap_qualified_passthrough_join_projection is only applied to left_plan. If the optimized plan contains a qualified passthrough Projection(Join) on the right input while already_projected is true, we'll still recurse into that projection, emit it as a derived table, and then join_constraint_to_sql will use join.on columns that refer to aliases defined inside the derived table.
That seems to reintroduce the invalid-scope SQL this PR is fixing, but for right-deep or nested-right join shapes. Could we apply unwrap_qualified_passthrough_join_projection to right_plan as well before building right_relation and add a regression test covering a nested join on the right side?
| self.table_schema | ||
| .qualified_field_with_unqualified_name(&column.name) | ||
| }) { | ||
| Ok((Some(_), field)) => { |
There was a problem hiding this comment.
Small cleanup suggestion: both rewrite arms construct the same replacement column.
It might read a bit more clearly if the match made the rewrite condition explicit, something along the lines of Ok((qualifier, field)) if qualifier.is_some() || self.rewrite_unqualified, and then performed the rewrite once. The remaining Ok((None, _)) | Err(_) arm could stay as the no-op path. That would remove the duplicated Column::new(...) construction.
f714745 to
09a6038
Compare
|
Thanks for the review @kosiew I pushed an update addressing both comments. For the symmetric right-side case, the unparser now emits a qualified passthrough I also added a regression test for the right-side nested join case and simplified Ran locally all the tests and they look fine. Let me know if you any suggestions. thanks! |
Which issue does this PR close?
Rationale for this change
Optimized plans can introduce intermediate projection aliases from common subexpression elimination and can also leave join conditions referencing aliases from a flattened join input.
The SQL unparser was rebasing some unqualified optimizer aliases to a table alias, producing invalid references such as
"o"."__common_expr_1". It could also emit a derived projection around a join input while the outer join condition still referenced aliases inside that derived table, producing SQL that DuckDB rejects.What changes are included in this PR?
TableAliasRewriteruseDFSchemaqualifier information instead of only Arrow field names, with explicit control over whether unqualified fields should be rebound to a table alias.Are these changes tested?
cargo fmt --allcargo test -p datafusion --test core_integration optimized_duckdb_unparse_preserves_derived_table_scopecargo test -p datafusion-sql --test sql_integrationcargo test -p datafusion-sql unparser::rewritecargo clippy -p datafusion-sql --test sql_integration -- -D warningscargo clippy -p datafusion --test core_integration -- -D warningsAre there any user-facing changes?
No public API changes. This fixes SQL generated by the unparser for optimized logical plans.