Skip to content

Fix triggerer CrashLoopBackOff when json_logs is enabled#68584

Open
safaehar wants to merge 2 commits into
apache:mainfrom
safaehar:fix/triggerer-json-logs-str-bytes-crash
Open

Fix triggerer CrashLoopBackOff when json_logs is enabled#68584
safaehar wants to merge 2 commits into
apache:mainfrom
safaehar:fix/triggerer-json-logs-str-bytes-crash

Conversation

@safaehar

Copy link
Copy Markdown

What

TriggerRunnerSupervisor._process_log_messages_from_subprocess primes itself by calling airflow.sdk.log.configure_logging() with no arguments. json_output defaults to False, so this call reconfigures structlog globally and installs the text WriteLogger factory — overwriting the bytes BytesLogger factory that startup set up when [logging] json_logs = True.

The stdout/stderr forwarders (_create_log_forwarderforward_to_log) are already wrapped with the JSON (bytes) processor chain but bind their underlying logger lazily. As soon as a trigger subprocess writes to stdout/stderr — e.g. an import-time DeprecationWarning from a provider trigger that imports a heavy client such as KubernetesPodTrigger (kubernetes) or S3KeyTrigger (boto3) — the lazy bind resolves against the now-text factory, and WriteLogger.msg does message + "\n" on bytes produced by the JSON renderer:

TypeError: can't concat str to bytes
  File ".../structlog/_output.py", line 217, in msg
    self._write(message + "\n")

This crashes the TriggerRunnerSupervisor.run loop and the triggerer enters CrashLoopBackOff. It only manifests when json_logs is enabled and the triggerer actually runs a trigger whose subprocess emits to stdout/stderr — idle triggerers and those running only quiet triggers (DateTimeTrigger, etc.) never reach forward_to_log, which is why the crash looks intermittent across deployments.

Fix

Pass json_output from the [logging] json_logs config (the same conf.getboolean("logging", "json_logs", fallback=False) used elsewhere, e.g. airflow/logging_config.py) so the global structlog factory stays consistent with the rest of the process. This matches the already-hardcoded logging_processors(json_output=True) a few lines below.

Tests

Adds a parametrized regression test asserting _process_log_messages_from_subprocess calls configure_logging(json_output=<json_logs>) for both True and False.


^ Add meaningful description above. Read the Pull Request Guidelines for more information.

TriggerRunnerSupervisor._process_log_messages_from_subprocess primes itself
by calling airflow.sdk.log.configure_logging() with no arguments. json_output
defaults to False, so this reconfigures structlog globally and installs the
text WriteLogger factory -- overwriting the bytes BytesLogger factory that
startup set up from json_logs=True.

The stdout/stderr forwarders (_create_log_forwarder -> forward_to_log) were
already wrapped with the JSON (bytes) processor chain but bind their underlying
logger lazily. As soon as a trigger subprocess writes to stdout/stderr -- for
example an import-time warning from a provider trigger that pulls in a heavy
client (kubernetes, boto3) -- the lazy bind resolves against the now-text
factory and WriteLogger.msg does `message + "\n"` on bytes from the JSON
renderer, raising `TypeError: can't concat str to bytes` and crash-looping the
triggerer.

Pass json_output from the logging.json_logs config so the global structlog
factory stays consistent with the rest of the process.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant