Skip to content

Fix DuckDB unparse for optimized join projections#23002

Open
goutamadwant wants to merge 1 commit into
apache:mainfrom
goutamadwant:fix/optimized-duckdb-unparse-scope
Open

Fix DuckDB unparse for optimized join projections#23002
goutamadwant wants to merge 1 commit into
apache:mainfrom
goutamadwant:fix/optimized-duckdb-unparse-scope

Conversation

@goutamadwant

@goutamadwant goutamadwant commented Jun 17, 2026

Copy link
Copy Markdown

Which issue does this PR close?

Rationale for this change

Optimized plans can introduce intermediate projection aliases from common subexpression elimination and can also leave join conditions referencing aliases from a flattened join input.

The SQL unparser was rebasing some unqualified optimizer aliases to a table alias, producing invalid references such as "o"."__common_expr_1". It could also emit a derived projection around a join input while the outer join condition still referenced aliases inside that derived table, producing SQL that DuckDB rejects.

What changes are included in this PR?

  • Make TableAliasRewriter use DFSchema qualifier information instead of only Arrow field names, with explicit control over whether unqualified fields should be rebound to a table alias.
  • Keep existing alias rebasing for table-scan filters and window-derived inputs.
  • Avoid wrapping qualified pass-through join projections when they are used as the left input of another already projected join, so referenced aliases remain in scope.
  • Add a regression test for the optimized DuckDB unparse path from the issue.

Are these changes tested?

  • cargo fmt --all
  • cargo test -p datafusion --test core_integration optimized_duckdb_unparse_preserves_derived_table_scope
  • cargo test -p datafusion-sql --test sql_integration
  • cargo test -p datafusion-sql unparser::rewrite
  • cargo clippy -p datafusion-sql --test sql_integration -- -D warnings
  • cargo clippy -p datafusion --test core_integration -- -D warnings

Are there any user-facing changes?

No public API changes. This fixes SQL generated by the unparser for optimized logical plans.

@github-actions github-actions Bot added sql SQL Planner core Core DataFusion crate labels Jun 17, 2026

@kosiew kosiew left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goutamadwant
Thanks for working on this fix. I think there's still one correctness gap around the symmetric join case, plus a small cleanup opportunity in the rewrite logic.

None
};

let right_plan =

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch on the derived-join-input case for the left side. I think the same invariant applies symmetrically to the right side as well.

Right now unwrap_qualified_passthrough_join_projection is only applied to left_plan. If the optimized plan contains a qualified passthrough Projection(Join) on the right input while already_projected is true, we'll still recurse into that projection, emit it as a derived table, and then join_constraint_to_sql will use join.on columns that refer to aliases defined inside the derived table.

That seems to reintroduce the invalid-scope SQL this PR is fixing, but for right-deep or nested-right join shapes. Could we apply unwrap_qualified_passthrough_join_projection to right_plan as well before building right_relation and add a regression test covering a nested join on the right side?

Comment thread datafusion/sql/src/unparser/rewrite.rs Outdated
self.table_schema
.qualified_field_with_unqualified_name(&column.name)
}) {
Ok((Some(_), field)) => {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small cleanup suggestion: both rewrite arms construct the same replacement column.

It might read a bit more clearly if the match made the rewrite condition explicit, something along the lines of Ok((qualifier, field)) if qualifier.is_some() || self.rewrite_unqualified, and then performed the rewrite once. The remaining Ok((None, _)) | Err(_) arm could stay as the no-op path. That would remove the duplicated Column::new(...) construction.

@goutamadwant goutamadwant force-pushed the fix/optimized-duckdb-unparse-scope branch from f714745 to 09a6038 Compare June 18, 2026 14:32
@goutamadwant

Copy link
Copy Markdown
Author

Thanks for the review @kosiew I pushed an update addressing both comments.

For the symmetric right-side case, the unparser now emits a qualified passthrough Projection(Join) as a parenthesized nested join relation instead of a derived projection. That keeps the inner join aliases visible to the outer join condition without reintroducing the invalid alias scope.

I also added a regression test for the right-side nested join case and simplified TableAliasRewriter so the rewrite condition is expressed once and the replacement column is constructed once.

Ran locally all the tests and they look fine. Let me know if you any suggestions. thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimized logical plans produce invalid DuckDB SQL

2 participants