feat: add date functions to bigframes.bigquery module #17514
Used the following prompt:

> Update the descriptions and argument names in scripts/data/sql-functions/global-namespace/date.yaml according to the following SQL documentation:
>
> (Paste from https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/date_functions)
>
> Also, if there is a natural argument to use for `series_accessor_arg` in this yaml or others, add it.
    "timestamp": "dtypes.TIMESTAMP_DTYPE",
    "decimal<38,9>": "dtypes.NUMERIC_DTYPE",
    "decimal<76,38>": "dtypes.BIGNUMERIC_DTYPE",
    "interval_day": "dtypes.TIMEDELTA_DTYPE",
Not strictly true, but OK for now?
Code Review
This pull request introduces support for BigQuery date operations in BigFrames, including the generation of date functions, series accessors, and corresponding unit tests. The reviewer identified several issues in the date.yaml configuration file where timestamp signatures were incorrectly added to functions that only support DATE or DATETIME expressions in BigQuery (such as DATE_ADD, DATE_BUCKET, DATE_DIFF, DATE_SUB, DATE_TRUNC, FORMAT_DATE, and LAST_DAY). Additionally, the reviewer noted that the EXTRACT function signatures omitting the required part argument are incorrect and should be removed to prevent runtime SQL compilation errors.
# Signature: date_add:pts_i64_any
- args:
    - name: "date_expression"
      value: timestamp
      optional: false
      keyword_only: false
    - name: "int64_expression"
      value: i64
      optional: false
      keyword_only: false
    - name: "date_part"
      value: any1
      optional: false
      keyword_only: false
  return: timestamp
In BigQuery, the DATE_ADD function only supports DATE expressions. For TIMESTAMP expressions, BigQuery provides a separate TIMESTAMP_ADD function. Including the timestamp signature here will cause bigframes to generate invalid SQL (e.g., DATE_ADD(timestamp_col, ...)), which will fail with a compilation error at runtime in BigQuery. Please remove this signature.
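A check along these lines could catch such signatures before they reach code generation. This is a minimal sketch, not part of the PR: `DATE_ONLY_FUNCTIONS` and `find_timestamp_signatures` are hypothetical names, the function set is taken only from this comment, and the signatures are assumed to be already loaded from date.yaml as plain dictionaries with the `args`, `value`, and `return` keys shown in the diff.

```python
from typing import Any, Dict, List

# Hypothetical allow-list: functions this review says must not take TIMESTAMP.
DATE_ONLY_FUNCTIONS = {"date_add", "date_sub"}


def find_timestamp_signatures(
    function_name: str, signatures: List[Dict[str, Any]]
) -> List[int]:
    """Return the indexes of signatures that mention the timestamp type."""
    if function_name not in DATE_ONLY_FUNCTIONS:
        return []
    flagged = []
    for index, signature in enumerate(signatures):
        # Collect the declared type of every argument in this signature.
        arg_types = [arg["value"] for arg in signature.get("args", [])]
        if "timestamp" in arg_types or signature.get("return") == "timestamp":
            flagged.append(index)
    return flagged


# Example input mirroring the structure of the YAML entries above.
example_signatures = [
    {
        "args": [
            {"name": "date_expression", "value": "date"},
            {"name": "int64_expression", "value": "i64"},
            {"name": "date_part", "value": "any1"},
        ],
        "return": "date",
    },
    {
        "args": [
            {"name": "date_expression", "value": "timestamp"},
            {"name": "int64_expression", "value": "i64"},
            {"name": "date_part", "value": "any1"},
        ],
        "return": "timestamp",
    },
]

flagged_indexes = find_timestamp_signatures("date_add", example_signatures)
```

Here only the second example signature is flagged. Whether the other functions named in the review summary belong in the set would need to be confirmed against the BigQuery documentation first.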
# Signature: date_bucket:pts_iday_pts
- args:
    - name: "date_in_bucket"
      value: timestamp
      optional: false
      keyword_only: false
    - name: "bucket_width"
      value: interval_day
      optional: false
      keyword_only: false
    - name: "bucket_origin"
      value: timestamp
      optional: true
      keyword_only: false
  return: timestamp
# Signature: date_diff:pts_pts_any
- args:
    - name: "end_date"
      value: timestamp
      optional: false
      keyword_only: false
    - name: "start_date"
      value: timestamp
      optional: false
      keyword_only: false
    - name: "granularity"
      value: any1
      optional: false
      keyword_only: false
  return: i64
# Signature: date_sub:pts_i64_any
- args:
    - name: "date_expression"
      value: timestamp
      optional: false
      keyword_only: false
    - name: "int64_expression"
      value: i64
      optional: false
      keyword_only: false
    - name: "date_part"
      value: any1
      optional: false
      keyword_only: false
  return: timestamp
# Signature: date_trunc:pts_any
- args:
    - name: "date_value"
      value: timestamp
      optional: false
      keyword_only: false
    - name: "granularity"
      value: any1
      optional: false
      keyword_only: false
  return: timestamp
# Signature: date_trunc:pts_any_str
- args:
    - name: "date_value"
      value: timestamp
      optional: false
      keyword_only: false
    - name: "granularity"
      value: any1
      optional: false
      keyword_only: false
    - name: "time_zone"
      value: string
      optional: true
      keyword_only: false
  return: timestamp
# Signature: extract:pts_str
- args:
    - name: "date_expression"
      value: timestamp
      optional: false
      keyword_only: false
    - name: "time_zone"
      value: string
      optional: true
      keyword_only: false
  return: time
# Signature: extract:pts
- args:
    - name: "date_expression"
      value: timestamp
      optional: false
      keyword_only: false
  return: time
In BigQuery, the EXTRACT function always requires a part argument (e.g., EXTRACT(part FROM expression)). The signatures extract:pts_str and extract:pts omit the part argument, which is incorrect and will cause signature matching issues or invalid SQL generation. Please remove these incorrect signatures.
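To illustrate why a signature without a part cannot compile to valid SQL, here is a minimal sketch of a builder that makes the part mandatory. `build_extract_sql` and `ALLOWED_DATE_PARTS` are hypothetical names and not bigframes APIs; the part list follows the parts documented for `EXTRACT` on DATE values and leaves out the parameterized `WEEK(<WEEKDAY>)` form for simplicity.

```python
# Date parts accepted by EXTRACT(part FROM date_expression), minus WEEK(<WEEKDAY>).
ALLOWED_DATE_PARTS = {
    "DAYOFWEEK",
    "DAY",
    "DAYOFYEAR",
    "WEEK",
    "ISOWEEK",
    "MONTH",
    "QUARTER",
    "YEAR",
    "ISOYEAR",
}


def build_extract_sql(part: str, column: str) -> str:
    """Render EXTRACT(part FROM `column`), rejecting a missing or unknown part."""
    normalized_part = part.strip().upper()
    if normalized_part not in ALLOWED_DATE_PARTS:
        raise ValueError(f"Unsupported or missing date part: {part!r}")
    # Keep the sketch safe by accepting only simple identifiers as column names.
    if not column.isidentifier():
        raise ValueError(f"Unsupported column name: {column!r}")
    return f"EXTRACT({normalized_part} FROM `{column}`)"


year_sql = build_extract_sql("year", "order_date")
```

With no part to place before `FROM`, the builder has nothing valid to emit, which is the situation the `extract:pts_str` and `extract:pts` signatures would create.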
# Signature: format_date:str_pts
- args:
    - name: "format_string"
      value: string
      optional: false
      keyword_only: false
    - name: "date_expr"
      value: timestamp
      optional: false
      keyword_only: false
  return: string
# Signature: format_date:str_pts_str
- args:
    - name: "format_string"
      value: string
      optional: false
      keyword_only: false
    - name: "date_expr"
      value: timestamp
      optional: false
      keyword_only: false
    - name: "time_zone"
      value: string
      optional: true
      keyword_only: false
  return: string
# Signature: last_day:pts_any
- args:
    - name: "date_expression"
      value: timestamp
      optional: false
      keyword_only: false
    - name: "date_part"
      value: any1
      optional: true
      keyword_only: false
  return: date
Towards BigQuery SQL API coverage. 🦕