Fix triggerer CrashLoopBackOff when json_logs is enabled#68584
Open
safaehar wants to merge 2 commits into
Open
Fix triggerer CrashLoopBackOff when json_logs is enabled#68584safaehar wants to merge 2 commits into
safaehar wants to merge 2 commits into
Conversation
TriggerRunnerSupervisor._process_log_messages_from_subprocess primes itself by calling airflow.sdk.log.configure_logging() with no arguments. json_output defaults to False, so this reconfigures structlog globally and installs the text WriteLogger factory -- overwriting the bytes BytesLogger factory that startup set up from json_logs=True. The stdout/stderr forwarders (_create_log_forwarder -> forward_to_log) were already wrapped with the JSON (bytes) processor chain but bind their underlying logger lazily. As soon as a trigger subprocess writes to stdout/stderr -- for example an import-time warning from a provider trigger that pulls in a heavy client (kubernetes, boto3) -- the lazy bind resolves against the now-text factory and WriteLogger.msg does `message + "\n"` on bytes from the JSON renderer, raising `TypeError: can't concat str to bytes` and crash-looping the triggerer. Pass json_output from the logging.json_logs config so the global structlog factory stays consistent with the rest of the process.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
TriggerRunnerSupervisor._process_log_messages_from_subprocessprimes itself by callingairflow.sdk.log.configure_logging()with no arguments.json_outputdefaults toFalse, so this call reconfigures structlog globally and installs the textWriteLoggerfactory — overwriting the bytesBytesLoggerfactory that startup set up when[logging] json_logs = True.The stdout/stderr forwarders (
_create_log_forwarder→forward_to_log) are already wrapped with the JSON (bytes) processor chain but bind their underlying logger lazily. As soon as a trigger subprocess writes to stdout/stderr — e.g. an import-timeDeprecationWarningfrom a provider trigger that imports a heavy client such asKubernetesPodTrigger(kubernetes) orS3KeyTrigger(boto3) — the lazy bind resolves against the now-text factory, andWriteLogger.msgdoesmessage + "\n"on bytes produced by the JSON renderer:This crashes the
TriggerRunnerSupervisor.runloop and the triggerer enters CrashLoopBackOff. It only manifests whenjson_logsis enabled and the triggerer actually runs a trigger whose subprocess emits to stdout/stderr — idle triggerers and those running only quiet triggers (DateTimeTrigger, etc.) never reachforward_to_log, which is why the crash looks intermittent across deployments.Fix
Pass
json_outputfrom the[logging] json_logsconfig (the sameconf.getboolean("logging", "json_logs", fallback=False)used elsewhere, e.g.airflow/logging_config.py) so the global structlog factory stays consistent with the rest of the process. This matches the already-hardcodedlogging_processors(json_output=True)a few lines below.Tests
Adds a parametrized regression test asserting
_process_log_messages_from_subprocesscallsconfigure_logging(json_output=<json_logs>)for bothTrueandFalse.^ Add meaningful description above. Read the Pull Request Guidelines for more information.