fix: coerce SIMILAR TO operands to a common string type #22992
Open
mvanhorn wants to merge 1 commit into
`SIMILAR TO` panicked with "failed to downcast array" whenever its two operands resolved to arrays of different physical string types (for example a Utf8View column matched against a non-literal Utf8 pattern, or a NULL pattern producing a NullArray).

Unlike LIKE and the regex operators, the TypeCoercion analyzer left `Expr::SimilarTo` in the no-op arm, so the executing kernel downcast the right array to the left array's type and panicked. A NULL pattern even panicked at plan time during constant folding.

Give `SimilarTo` a dedicated coercion arm mirroring the existing `Like` arm: compute the operand types, find the common string type via `like_coercion`, and cast both operands to it (preserving the Dictionary(_, Utf8) short-circuit). This guarantees both operands reach `regex_match_dyn` as the same physical type and coerces NULL patterns into the common type, eliminating both panics.

Adds a unit test next to `like_for_type_coercion` and end-to-end sqllogictest coverage for the Utf8View-vs-Utf8 and NULL-pattern cases.

Fixes apache#22886

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
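The coercion step described in the commit message can be sketched roughly as follows. This is a toy model, not the DataFusion code: `ToyType`, `rank`, `common_string_type`, and `coerce_similar_to` are hypothetical names, and the preference order (Utf8View over LargeUtf8 over Utf8, with two NULL operands defaulting to Utf8) is an assumption made for illustration rather than a statement of what `like_coercion` does.

```rust
// Toy model only: hypothetical stand-ins, not arrow's DataType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyType {
    Null,
    Utf8,
    LargeUtf8,
    Utf8View,
    Other, // any non-string type
}

// Assumed preference order among string-like types; None means "not a string".
fn rank(t: ToyType) -> Option<u8> {
    match t {
        ToyType::Null => Some(0),
        ToyType::Utf8 => Some(1),
        ToyType::LargeUtf8 => Some(2),
        ToyType::Utf8View => Some(3),
        ToyType::Other => None,
    }
}

// Hypothetical stand-in for a common-string-type lookup.
fn common_string_type(lhs: ToyType, rhs: ToyType) -> Option<ToyType> {
    let (l, r) = (rank(lhs)?, rank(rhs)?);
    let widest = if l >= r { lhs } else { rhs };
    // Assumption: two NULL operands fall back to Utf8.
    Some(if widest == ToyType::Null { ToyType::Utf8 } else { widest })
}

// Returns the types both operands would be cast to, or an error that
// names the operator when no common string type exists.
fn coerce_similar_to(lhs: ToyType, rhs: ToyType) -> Result<(ToyType, ToyType), String> {
    match common_string_type(lhs, rhs) {
        Some(t) => Ok((t, t)),
        None => Err(format!(
            "no common type for {:?} SIMILAR TO {:?}",
            lhs, rhs
        )),
    }
}
```

In this sketch both operands always come out with the same type, so a kernel that downcasts the right operand to the left operand's type cannot see a mismatch, and a NULL pattern is given the string type of the other operand.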
Contributor
fyi we have an existing PR for this issue
Which issue does this PR close?
Rationale for this change
`SIMILAR TO` panics with `failed to downcast array` whenever its two operands resolve to arrays of different physical string types, for example a `Utf8View` column matched against a non-literal `Utf8` pattern, or a `NULL` pattern that produces a `NullArray`. A `NULL` pattern even panics at plan time during constant folding.

The cause is that, unlike `LIKE`/`ILIKE` and the regex operators, the `TypeCoercion` analyzer never coerces `Expr::SimilarTo` operands to a common type. `Expr::SimilarTo(_)` was listed in the "nothing to coerce" no-op arm of `TypeCoercionRewriter::f_up`, so both operands reached the executing kernel unchanged. That kernel picks the downcast type from the left array and `.expect()`s the right array to match, which panics when they differ. Literal patterns take the scalar fast path, which is why the common `col SIMILAR TO 'pattern'` form works and this went unnoticed.

What changes are included in this PR?
- Remove `Expr::SimilarTo(_)` from the no-op coercion arm.
- Add an `Expr::SimilarTo` coercion arm in `datafusion/optimizer/src/analyzer/type_coercion.rs` that mirrors the existing `Expr::Like` arm: it computes the operand types, finds the common string type via `like_coercion`, and casts both operands to it (preserving the same `Dictionary(_, Utf8)` short-circuit the `Like` arm uses). This guarantees both operands reach the regex kernel as the same physical type and coerces `NULL` patterns into the common type, fixing both the execution-time and plan-time panics. The no-common-type error message uses the `SIMILAR TO` operator name.

Are these changes tested?
Yes.
- `similar_to_for_type_coercion` in `type_coercion.rs` (next to `like_for_type_coercion`) covers the literal-pattern, `NULL`-pattern, and no-common-type-error cases.
- `datafusion/sqllogictest/test_files/type_coercion.slt` exercises a `Utf8View` column matched against a non-literal `Utf8` pattern column and a `NULL` pattern, both of which previously panicked and now return correct results.

Are there any user-facing changes?
`SIMILAR TO` queries that previously panicked now plan and execute correctly. There are no API changes.
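As a rough illustration of the failure mode from the rationale, the following toy model shows a kernel that takes its concrete type from the left operand and downcasts the right operand to it. Everything here is hypothetical: `PlainStrings`, `ViewStrings`, `match_same_layout`, and `cast_to_view` are stand-ins built on `std::any::Any`, not arrow or DataFusion types, and plain equality stands in for pattern matching. The sketch returns `None` where the real kernel would panic.

```rust
use std::any::Any;

// Hypothetical stand-ins for two physical string array layouts.
struct PlainStrings(Vec<String>);
struct ViewStrings(Vec<String>);

// Mimics a kernel that assumes the right operand has the same concrete
// type as the left one. Returns None on a mismatch instead of panicking.
fn match_same_layout(left: &dyn Any, right: &dyn Any) -> Option<Vec<bool>> {
    let l = left.downcast_ref::<ViewStrings>()?;
    let r = right.downcast_ref::<ViewStrings>()?;
    // Equality is used here only as a placeholder for pattern matching.
    Some(l.0.iter().zip(&r.0).map(|(v, p)| v == p).collect())
}

// Stand-in for the cast that coercion inserts around the pattern operand.
fn cast_to_view(a: &PlainStrings) -> ViewStrings {
    ViewStrings(a.0.clone())
}
```

Passing a `PlainStrings` pattern directly yields `None`, the analogue of the downcast panic, while passing `cast_to_view(&patterns)` lets the same kernel produce a result.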