Skip to content

feat: OTel context sharing#3970

Open
morrisonlevi wants to merge 26 commits into
masterfrom
levi/otel-thread-context
Open

feat: OTel context sharing#3970
morrisonlevi wants to merge 26 commits into
masterfrom
levi/otel-thread-context

Conversation

@morrisonlevi

@morrisonlevi morrisonlevi commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Description

This adds OTel context sharing using libdatadog. It adds it to both tracing (producer) and profiling (consumer).

This is mostly AI-written with a little human review as I went. It is not ready for merge, but it did pass some local end-to-end experimentation.

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

@morrisonlevi morrisonlevi added ☠️ do-not-merge/WIP profiling Relates to the Continuous Profiler tracing labels Jun 10, 2026
@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Jun 10, 2026

Copy link
Copy Markdown

Pipelines  Tests

Fix all issues with BitsAI

⚠️ Warnings

🚦 10 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-php | appsec integration tests (ssi): [test8.3-release-ssi]   View in Datadog   GitLab

🧪 15 Tests failed

otel process context shared memory has expected metadata and threadlocal attributes() from com.datadog.appsec.php.integration.OtelThreadContextTests   View in Datadog
Assertion failed: 

assert processContext['threadlocal.schema_version'] == 'tlsdesc_v1_dev'
       |             |                              |
       |             null                           false
       ['present':'true', 'signature':'OTEL_CTX', 'version':'2', 'payload_size':'507', 'published_at':'1160505130152', 'service.name':'', 'service.instance.id':'f0039caf-3b02-4272-bd12-25c88f4c698c', 'service.version':'', 'deployment.environment.name':'', 'telemetry.sdk.language':'php', 'telemetry.sdk.version':'1.22.0', 'telemetry.sdk.name':'libdatadog', 'host.name':'1ac22909291a', 'container.id':'']

Assertion failed: 

assert processContext['threadlocal.schema_version'] == 'tlsdesc_v1_dev'
...
otel thread context detaches on empty stack and republishes on explicit stack switch() from com.datadog.appsec.php.integration.OtelThreadContextTests   View in Datadog
Assertion failed: 

assert threadContext['datadog.local_root_span_id'] == responseBody["${stackPrefix}_local_root_span_id"]
       |            |                              |  |           |   |
       |            null                           |  |           |   'entrypoint'
       |                                           |  |           'c312d5d5c05b2f8f'
       |                                           |  ['empty_waited':true, 'entrypoint_restored_waited':true, 'entrypoint_switched_waited':true, 'nested_switched_waited':true, 'entrypoint_trace_id':'0000000000000000e6b31e81a926268c', 'entrypoint_span_id':'c312d5d5c05b2f8f', 'entrypoint_local_root_span_id':'c312d5d5c05b2f8f', 'nested_trace_id':'6a443881000000004243144ac6324573', 'nested_span_id':'4243144ac6324573', 'nested_local_root_span_id':'4243144ac6324573', 'service_name':'appsec_int_tests', 'service_version':'1.2.3', 'deployment_environment_name':'integration']
       |                                           false
       ['thread':'1', 'slot':'0x7fcbd8877c50', 'ctx':'0x7fcbd8261a28', 'trace_id':'0000000000000000e6b31e81a926268c', 'span_id':'c312d5d5c05b2f8f', 'valid':'1', 'attrs_data_size':'56']

...
View all 15 test failures

DataDog/apm-reliability/dd-trace-php | appsec integration tests: [test8.4-release]   View in Datadog   GitLab

🧪 1 Test failed

All test failures are known flaky.

❄️ Known flaky: extended heartbeat re-emits configuration, dependencies and integrations() from com.datadog.appsec.php.integration.TelemetryExtendedHeartbeatTests   View in Datadog
java.lang.AssertionError: phpredis not emitted via app-started/app-integrations-change; saw: []. Expression: (phpredis in flushed). Values: flushed = []

java.lang.AssertionError: phpredis not emitted via app-started/app-integrations-change; saw: []. Expression: (phpredis in flushed). Values: flushed = []
	at org.codehaus.groovy.runtime.InvokerHelper.createAssertError(InvokerHelper.java:416)
	at com.datadog.appsec.php.integration.TelemetryExtendedHeartbeatTests.extended heartbeat re-emits configuration, dependencies and integrations(TelemetryExtendedHeartbeatTests.groovy:70)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)

Not introduced in this PR.

DataDog/apm-reliability/dd-trace-php | appsec integration tests (helper-cpp): [test8.3-release-zts]   View in Datadog   GitLab

View all 10 failed jobs.

ℹ️ Info

No other issues found (see more)

❄️ No new flaky tests detected

🔄 Datadog auto-retried 1 job - 0 passed on retry View in Datadog

🎯 Code Coverage (details)
Patch Coverage: 0.00%
Overall Coverage: 54.04% (-0.04%)

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 110757e | Docs | Datadog PR Page | Give us feedback!

@pr-commenter

pr-commenter Bot commented Jun 10, 2026

Copy link
Copy Markdown

Benchmarks [ profiler ]

Benchmark execution time: 2026-06-30 21:14:11

Comparing candidate commit 110757e in PR branch levi/otel-thread-context with baseline commit e1d0dac in branch master.

Found 0 performance improvements and 3 performance regressions! Performance is the same for 23 metrics, 10 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:php-profiler-timeline-memory-control

  • 🟥 cpu_user_time [+36.781ms; +41.973ms] or [+6.175%; +7.047%]
  • 🟥 execution_time [+35.242ms; +40.481ms] or [+5.645%; +6.484%]

scenario:php-profiler-timeline-memory-with-profiler

  • 🟥 execution_time [+29.608ms; +51.154ms] or [+2.824%; +4.879%]

@pr-commenter

pr-commenter Bot commented Jun 10, 2026

Copy link
Copy Markdown

Benchmarks [ tracer ]

Benchmark execution time: 2026-06-30 22:15:16

Comparing candidate commit 110757e in PR branch levi/otel-thread-context with baseline commit e1d0dac in branch master.

Found 0 performance improvements and 10 performance regressions! Performance is the same for 184 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

scenario:MessagePackSerializationBench/benchMessagePackSerialization

  • 🟥 execution_time [+3.118µs; +3.942µs] or [+3.053%; +3.860%]

scenario:MessagePackSerializationBench/benchMessagePackSerialization-opcache

  • 🟥 execution_time [+3.242µs; +4.758µs] or [+3.063%; +4.494%]

scenario:PDOBench/benchPDOOverhead

  • 🟥 execution_time [+8.115µs; +10.304µs] or [+3.116%; +3.957%]

scenario:PDOBench/benchPDOOverheadWithDBM

  • 🟥 execution_time [+9.622µs; +12.453µs] or [+3.687%; +4.772%]

scenario:SpanBench/benchDatadogAPI

  • 🟥 execution_time [+2.027µs; +3.274µs] or [+2.942%; +4.751%]

scenario:SpanBench/benchOpenTelemetryAPI

  • 🟥 execution_time [+21.401µs; +27.465µs] or [+3.014%; +3.869%]
  • 🟥 mem_peak [+4.487MB; +4.487MB] or [+9.412%; +9.412%]

scenario:SpanBench/benchOpenTelemetryAPI-opcache

  • 🟥 mem_peak [+4.485MB; +4.485MB] or [+10.030%; +10.030%]

scenario:SpanBench/benchOpenTelemetryInteroperability

  • 🟥 mem_peak [+643.609KB; +643.614KB] or [+2.233%; +2.233%]

scenario:SpanBench/benchOpenTelemetryInteroperability-opcache

  • 🟥 mem_peak [+641.567KB; +641.576KB] or [+2.478%; +2.478%]

@pr-commenter

pr-commenter Bot commented Jun 23, 2026

Copy link
Copy Markdown

Benchmarks [ appsec ]

Benchmark execution time: 2026-06-30 21:38:45

Comparing candidate commit 110757e in PR branch levi/otel-thread-context with baseline commit e1d0dac in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.

Explanation

This is an A/B test comparing a candidate commit's performance against that of a baseline commit. Performance changes are noted in the tables below as:

  • 🟩 = significantly better candidate vs. baseline
  • 🟥 = significantly worse candidate vs. baseline

We compute a confidence interval (CI) over the relative difference of means between metrics from the candidate and baseline commits, considering the baseline as the reference.

If the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD), the change is considered significant.

Feel free to reach out to #apm-benchmarking-platform on Slack if you have any questions.

More details about the CI and significant changes

You can imagine this CI as a range of values that is likely to contain the true difference of means between the candidate and baseline commits.

CIs of the difference of means are often centered around 0%, because often changes are not that big:

---------------------------------(------|---^--------)-------------------------------->
                              -0.6%    0%  0.3%     +1.2%
                                 |          |        |
         lower bound of the CI --'          |        |
sample mean (center of the CI) -------------'        |
         upper bound of the CI ----------------------'

As described above, a change is considered significant if the CI is entirely outside the configured SIGNIFICANT_IMPACT_THRESHOLD (or the deprecated UNCONFIDENCE_THRESHOLD).

For instance, for an execution time metric, this confidence interval indicates a significantly worse performance:

----------------------------------------|---------|---(---------^---------)---------->
                                       0%        1%  1.3%      2.2%      3.1%
                                                  |   |         |         |
       significant impact threshold --------------'   |         |         |
                      lower bound of CI --------------'         |         |
       sample mean (center of the CI) --------------------------'         |
                      upper bound of CI ----------------------------------'

@cataphract cataphract marked this pull request as ready for review June 24, 2026 12:56
@cataphract cataphract requested review from a team as code owners June 24, 2026 12:56

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 259658d91b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tracer/span.c Outdated
span->root = DDTRACE_G(active_stack)->root_span;

ddtrace_set_global_span_properties(span);
ddtrace_update_otel_thread_context();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Refresh OTel context after trace ID changes

Updating the TLS record only on span open/close/switch leaves it stale when an existing root span's trace ID is replaced later, e.g. by DDTrace\consume_distributed_tracing_headers() or DDTrace\set_distributed_tracing_context(), which mutate root_span->trace_id and only call ddtrace_update_root_id_properties(). In requests that consume distributed headers after the autoroot span already exists and do not open another span, OTel readers keep seeing the originally generated trace ID until the next span transition.

Useful? React with 👍 / 👎.

Comment thread tracer/user_request.c Outdated
listener->finish_user_req(listener, &span->std);
}
#ifdef __linux__
ddtrace_clear_otel_thread_context_root_span();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Only clear the matching user-request override

This clears the global OTel root override regardless of which user-request span is finishing. If a second notify_start() begins before the first span is closed, the AppSec listener finishes/replaces the previous request but the first span still has notify_user_req_end; closing that older span later calls this path and detaches the override for the newer active user request, so subsequent OTel/profiling context falls back to the outer stack.

Useful? React with 👍 / 👎.

@bwoebi bwoebi left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the approach proposed by this PR is flawed / more expensive than needed.

The PR tries to synchronize state between otel context and root span.
The otel context is, at it's core, a pointer to some struct.

All we need is making the otel context part of the root_span_data struct.
Any updates to the trace_id / the roots span id -> update the local root span id / trace id part of the context, any updates to active -> update the span_id of the context.
And if there's no root span data, we just write NULL to the TLS variable.

I'm very not fond of "synchronizing" with otel context, any updates to the otel context should in my opinion simply happen in-place. There's also no reason for a "root span override".
It feels also way heavier than it should be. Updating a span id due to open/close/drop should be one 64 bit write. That's it.
Switching the span stack should cause one pointer write of the span stacks root span otel context to the TLS pointer. That's it, too.

Further, this PR at the very least forgot to handle ddtrace_update_root_id_properties().

@bwoebi

bwoebi commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

I also don't understand why this functionality depends on libdatadog. The code should be IMHO simpler and less LoC if we just manage this manually.

Comment thread tracer/otel_context.h Outdated
Comment thread tracer/span.h Outdated
Comment thread tracer/otel_context.c
morrisonlevi and others added 3 commits June 25, 2026 15:20
The OTel thread-context record is now owned and maintained by the PHP
tracer itself rather than going through libdatadog's OTel thread-context
record/update helpers. The tracer keeps its own local record and drives
all updates through granular helpers.

Storage and layout:
- Store the record in ddtrace_root_span_data, placed before the embedded
  ddtrace_span_data, because of requirements of Zend object layout.
-  Move the record to sit after sampling_rule to reduce struct padding in
  the root prefix.
- Make the record naturally 8-byte aligned: trace_id is stored as
  uint64_t[2], while span_id and valid are C11 atomics. Compile-time
  assertions guard record size, field offsets, and root placement.

Update paths:
- Drive context updates through granular helpers: stack attach, trace-id
  update, active span-id update, root-scoped attributes, and detach.
- Keep root-specific detach-if-current behavior: freeing a root span
  clears the TLS pointer only when it currently points at that root's
  embedded record. Stack, fiber, close, drop, trace-id, and attribute
  mutation paths call the granular API directly.

Tests and dependencies:
- Update the affected appsec integration tests.
- Restore the libdatadog gitlink and Cargo.lock to the merge-base /
  origin-master state, since this implementation no longer depends on the
  upstream libdatadog OTel thread-context record/update APIs.
@cataphract cataphract force-pushed the levi/otel-thread-context branch from ce8e5c8 to 158be36 Compare June 25, 2026 14:20
Comment thread tracer/otel_context.c Outdated
@cataphract

Copy link
Copy Markdown
Contributor

The failure in dd-gitlab/check-big-regressions is a direct consequence of this change (7 root spans x 1000 revs x 640 bytes = 4 480 000 bytes = 4.49 MB). This is by design

@cataphract cataphract requested a review from bwoebi June 29, 2026 09:15
Comment thread profiling/src/php_ffi.c Outdated
Comment thread profiling/src/php_ffi.c Outdated
Comment thread tracer/ddtrace.c
Comment on lines +153 to +156
ddtrace_span_stack *stack = DDTRACE_G(active_stack);
if (stack && stack->root_span) {
ddtrace_otel_update_attribute_values(stack->root_span);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract that snippet into dd_alter_prop, it's only used for service/env/version anyway.

Comment thread ext/datadog.c Outdated
}

// Match tracer/serializer.c hostname publishing: DD_HOSTNAME wins, then gethostname().
#ifdef __linux__

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to put this #ifdef to the top of the function? There's no point in half the function.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea was to call gethostname only on linux, but i've now matched what's tracer/serializer.c, let's hope it doesn't blow up on windows where the function is apparently slightly different and depends on a separate header

@bwoebi

bwoebi commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

A call to datadog_publish_otel_process_context() is missing in datadog_internal_handle_fork().

Comment thread compile_rust.sh Outdated
@cataphract cataphract force-pushed the levi/otel-thread-context branch from 139376b to 8b1c9f4 Compare June 29, 2026 15:42
@cataphract cataphract force-pushed the levi/otel-thread-context branch from 5b6fa67 to 7b3e832 Compare June 29, 2026 15:51
@cataphract

Copy link
Copy Markdown
Contributor

Dep on libdatadog: DataDog/libdatadog#2176

Comment thread ext/datadog.c
Comment on lines +742 to +746
bool runtime_id_changed = false;
datadog_sidecar_handle_fork(&runtime_id_changed);
if (runtime_id_changed) {
datadog_publish_configured_otel_process_context();
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to do it unconditionally, the libdatadog code uses libc::MFD_CLOEXEC.

@bwoebi bwoebi Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to make it dependent on datadog_sidecar_instance_id (not sure why you would), please check this in datadog_publish_configured_otel_process_context, not here.

Comment thread ext/sidecar.c
}

*runtime_id_changed = datadog_sidecar_instance_id != NULL;
datadog_force_new_instance_id();

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to move this to the start of the function so that it happens unconditionally.

@yannham

yannham commented Jun 30, 2026

Copy link
Copy Markdown

FYI: there's a bug in the libdatadog implementation of publishing the threadlocal attributes to the process context, where thread-context-related attributes are stored in the wrong place. DataDog/libdatadog#2167 fixes it and is being merged, but until you can depend on a version that includes it, an external context reader might not be able to properly resolve the thread context records.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☠️ do-not-merge/WIP profiling Relates to the Continuous Profiler tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants