fix: coerce SIMILAR TO operands to a common string type #22992

Open
mvanhorn wants to merge 1 commit into apache:main from mvanhorn:fix/22886-similar-to-type-coercion

@mvanhorn

Contributor

Which issue does this PR close?

Rationale for this change

`SIMILAR TO` panics with "failed to downcast array" whenever its two operands resolve to arrays of different physical string types, for example a `Utf8View` column matched against a non-literal `Utf8` pattern, or a `NULL` pattern that produces a `NullArray`. A `NULL` pattern even panics at plan time, during constant folding.

The cause is that, unlike `LIKE`/`ILIKE` and the regex operators, the `TypeCoercion` analyzer never coerces `Expr::SimilarTo` operands to a common type. `Expr::SimilarTo(_)` was listed in the "nothing to coerce" no-op arm of `TypeCoercionRewriter::f_up`, so both operands reached the executing kernel unchanged. That kernel picks the downcast type from the left array and `.expect()`s the right array to match, which panics when they differ. Literal patterns take the scalar fast path, which is why the common `col SIMILAR TO 'pattern'` form works and the bug went unnoticed.
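The failure mode described above can be sketched with the standard library alone. This is a toy model, not the DataFusion kernel: `Utf8Array`, `Utf8ViewArray`, `NullArray`, and `toy_kernel` are hypothetical names, string equality stands in for pattern matching, and the sketch returns `Err` at the point where the real kernel would panic.

```rust
use std::any::Any;

// Hypothetical stand-ins for Arrow's physical array types (not the real API).
struct Utf8Array(Vec<&'static str>);
struct Utf8ViewArray(Vec<&'static str>);
struct NullArray(usize); // only carries a length, like an all-NULL column

/// Toy binary kernel. It chooses the concrete type from `left` and then
/// requires `right` to have that same concrete type. Where the kernel
/// described above would panic via `.expect()`, this sketch returns `Err`.
/// String equality stands in for real pattern matching.
fn toy_kernel(left: &dyn Any, right: &dyn Any) -> Result<Vec<bool>, String> {
    const MSG: &str = "failed to downcast array";
    if let Some(l) = left.downcast_ref::<Utf8ViewArray>() {
        // The right side is assumed to match the left side's type.
        let r = right.downcast_ref::<Utf8ViewArray>().ok_or(MSG)?;
        Ok(l.0.iter().zip(r.0.iter()).map(|(v, p)| v == p).collect())
    } else if let Some(l) = left.downcast_ref::<Utf8Array>() {
        let r = right.downcast_ref::<Utf8Array>().ok_or(MSG)?;
        Ok(l.0.iter().zip(r.0.iter()).map(|(v, p)| v == p).collect())
    } else {
        Err("unsupported left type".to_string())
    }
}
```

In this model, a `Utf8ViewArray` paired with another `Utf8ViewArray` succeeds, while pairing it with a `Utf8Array` or a `NullArray` hits the downcast failure, which is the situation the uncoerced operands created.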

What changes are included in this PR?

  • Removed `Expr::SimilarTo(_)` from the no-op coercion arm.
  • Added a dedicated `Expr::SimilarTo` coercion arm in `datafusion/optimizer/src/analyzer/type_coercion.rs` that mirrors the existing `Expr::Like` arm: it computes the operand types, finds the common string type via `like_coercion`, and casts both operands to it (preserving the same `Dictionary(_, Utf8)` short-circuit the `Like` arm uses). This guarantees both operands reach the regex kernel as the same physical type and coerces `NULL` patterns into the common type, fixing both the execution-time and plan-time panics. The no-common-type error message uses the `SIMILAR TO` operator name.
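The shape of the new coercion step can be sketched as a small standalone model. Everything here is an assumption made for illustration: `Ty`, `common_string_type`, and `coerce_similar_to` are hypothetical names, the precedence among string types is chosen for this sketch rather than taken from `like_coercion`, the error wording is invented, and the `Dictionary(_, Utf8)` short-circuit is left out.

```rust
// Hypothetical, simplified model of operand types. The real analyzer works on
// Arrow `DataType` values and handles many more cases (e.g. dictionaries).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Null,
    Utf8,
    LargeUtf8,
    Utf8View,
    Other, // placeholder for any non-string type
}

/// Assumed precedence for this sketch only: Utf8View > LargeUtf8 > Utf8,
/// and NULL adopts the other side's string type.
fn common_string_type(l: Ty, r: Ty) -> Option<Ty> {
    use Ty::*;
    let is_string = |t: Ty| matches!(t, Utf8 | LargeUtf8 | Utf8View);
    match (l, r) {
        (Null, t) | (t, Null) if is_string(t) => Some(t),
        (a, b) if is_string(a) && is_string(b) => {
            if a == Utf8View || b == Utf8View {
                Some(Utf8View)
            } else if a == LargeUtf8 || b == LargeUtf8 {
                Some(LargeUtf8)
            } else {
                Some(Utf8)
            }
        }
        _ => None,
    }
}

/// Toy coercion step: both operands are cast to the common type, so a
/// kernel downstream always sees one physical type on both sides.
fn coerce_similar_to(expr: Ty, pattern: Ty) -> Result<(Ty, Ty), String> {
    let common = common_string_type(expr, pattern).ok_or_else(|| {
        // Hypothetical wording; only the operator name comes from the PR.
        format!("no common type for {expr:?} SIMILAR TO {pattern:?}")
    })?;
    Ok((common, common))
}
```

Under these assumptions, `coerce_similar_to(Ty::Utf8View, Ty::Utf8)` yields `Utf8View` on both sides and a `Null` pattern takes the column's string type, so a same-type downcast in a later kernel can no longer fail.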

Are these changes tested?

Yes.

  • A new unit test `similar_to_for_type_coercion` in `type_coercion.rs` (next to `like_for_type_coercion`) covers the literal-pattern, `NULL`-pattern, and no-common-type-error cases.
  • New end-to-end coverage in `datafusion/sqllogictest/test_files/type_coercion.slt` exercises a `Utf8View` column matched against a non-literal `Utf8` pattern column and a `NULL` pattern, both of which previously panicked and now return correct results.

Are there any user-facing changes?

`SIMILAR TO` queries that previously panicked now plan and execute correctly. There are no API changes.

`SIMILAR TO` panicked with "failed to downcast array" whenever its two
operands resolved to arrays of different physical string types (for
example a Utf8View column matched against a non-literal Utf8 pattern, or a
NULL pattern producing a NullArray). Unlike LIKE and the regex operators,
the TypeCoercion analyzer left `Expr::SimilarTo` in the no-op arm, so the
executing kernel downcast the right array to the left array's type and
panicked. A NULL pattern even panicked at plan time during constant
folding.

Give `SimilarTo` a dedicated coercion arm mirroring the existing `Like`
arm: compute the operand types, find the common string type via
`like_coercion`, and cast both operands to it (preserving the
Dictionary(_, Utf8) short-circuit). This guarantees both operands reach
`regex_match_dyn` as the same physical type and coerces NULL patterns into
the common type, eliminating both panics.

Adds a unit test next to `like_for_type_coercion` and end-to-end
sqllogictest coverage for the Utf8View-vs-Utf8 and NULL-pattern cases.

Fixes apache#22886

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions (bot) added the optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) labels on Jun 17, 2026
@Jefffrey

Contributor

Development

Successfully merging this pull request may close these issues.

SIMILAR TO panics ('failed to downcast array') when operand types differ (e.g. NULL pattern, Utf8View vs Utf8)

2 participants