Commit 2c414dc

Add troubleshooting guides for SDK and Collector sampling configuration (#3200)
This PR adds two new troubleshooting pages under the EDOT documentation:

* "Missing or incomplete traces due to SDK sampling": Helps users identify and resolve trace loss caused by head sampling settings in Elastic's OpenTelemetry SDKs.
* "Missing or incomplete traces due to Collector sampling": Focuses on tail sampling configuration issues in the EDOT Collector.

Both pages are linked to each other.
1 parent 213bc9f commit 2c414dc

3 files changed: 169 additions & 0 deletions

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
---
navigation_title: Collector sampling issues
description: Learn how to troubleshoot missing or incomplete traces in the EDOT Collector caused by sampling configuration.
applies_to:
  serverless: all
  product:
    edot_collector: ga
products:
  - id: observability
  - id: edot-collector
---

# Missing or incomplete traces due to Collector sampling

If traces or spans are missing in {{kib}}, the issue might be related to the Collector’s sampling configuration.

{applies_to}`stack: ga 9.2` Tail-based sampling (TBS) allows the Collector to evaluate entire traces before deciding whether to keep them. If TBS policies are too strict or not aligned with your workloads, traces you expect to see may be dropped.

Both Collector-based and SDK-level sampling can lead to gaps in telemetry if not configured correctly. See [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md) for more information.

## Symptoms

When Collector-based tail sampling is misconfigured or too restrictive, you might observe the following:

- Only a small subset of traces reaches {{es}} or {{kib}}, even though SDKs are exporting spans.
- Error traces are missing because no tail sampling policy explicitly keeps them.
- Collector logs show dropped spans.

## Causes

The following conditions can lead to missing or incomplete traces when using tail-based sampling in the Collector:

- Tail sampling policies in the Collector are too narrow or restrictive.
- The default rule set excludes key transaction types (for example, long-running requests or non-error transactions).
- SDK head sampling runs before Collector tail sampling, so traces dropped at the SDK never reach the Collector for evaluation.
- Conflicting or overlapping tail sampling policies might result in unexpected drops.
- Under high load, the Collector might drop traces if it can’t evaluate policies fast enough.

## Resolution

Follow these steps to resolve sampling configuration issues:

::::{stepper}

:::{step} Review the tail sampling policies

- Check the `tail_sampling` entry under `processors` in your Collector configuration.
- Ensure policies are broad enough to capture the traces you need.
:::

:::{step} Add explicit rules for critical traces

- Create specific rules for important trace types.
- Example: keep all error traces, 100% of login requests, and 10% of everything else.
- Use attributes like `status_code`, `operation`, or `service.name` to fine-tune rules.
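
The example in this step (keep all errors, all login requests, and 10% of everything else) could be sketched as the following `tail_sampling` fragment. The snippet writes it to a scratch file so you can compare it with your own configuration. The file name, the policy names, and the `http.route` value `/login` are illustrative assumptions. The policy types `status_code`, `string_attribute`, and `probabilistic` come from the upstream tail sampling processor.

```bash
# Write an illustrative tail_sampling fragment to a scratch file (hypothetical file name).
cat > tail-sampling-sketch.yaml <<'EOF'
processors:
  tail_sampling:
    decision_wait: 10s          # time to wait for a trace's spans before deciding
    policies:
      - name: keep-errors       # hypothetical policy name
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: keep-logins       # hypothetical policy name
        type: string_attribute
        string_attribute:
          key: http.route       # assumes login spans carry this attribute
          values: ["/login"]
      - name: sample-the-rest   # hypothetical policy name
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
EOF
```

A trace is kept when any of the listed policies matches it, so the probabilistic policy acts as the fallback for traces that are neither errors nor logins. The processor also has to be listed in the traces pipeline to take effect.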
:::

:::{step} Validate Collector logs

- Review Collector logs for messages about dropped spans, and determine whether drops are due to sampling policy outcomes or resource limits.
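
As a first pass over the logs, a small helper like the one below can count the lines that mention dropped data. This is a sketch only: the function name `count_drop_lines` is hypothetical, and the search pattern `drop` is an assumption, because the exact log wording depends on your Collector version and components.

```bash
# Hypothetical helper: count log lines that mention dropped data (case-insensitive).
# The pattern is an assumption; adjust it to the wording in your Collector's logs.
count_drop_lines() {
  local log_file="$1"
  # grep -c prints 0 and exits non-zero when nothing matches, so mask the exit code.
  grep -ci 'drop' "$log_file" || true
}
```

Call it with the path to a saved log file, for example `count_drop_lines ./collector.log` (hypothetical path), then read the matching lines to tell sampling decisions apart from resource-related drops.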
:::

:::{step} Differentiate head and tail sampling

- Check whether SDKs already apply head sampling, which reduces the traces available for tail sampling in the Collector.
- Consider setting SDKs to `always_on` and managing sampling centrally in the Collector for more flexibility.
:::

:::{step} Test in staging

- Adjust sampling policies incrementally in a staging environment.
- Monitor trace volume before and after changes.
- Validate that critical traces are captured as expected.
:::

::::

## Resources

- [Tail sampling processor (Collector)](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor)
- [OpenTelemetry sampling concepts](https://opentelemetry.io/docs/concepts/sampling/)
- [Missing or incomplete traces due to SDK sampling](../edot-sdks/misconfigured-sampling-sdk.md)
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
---
navigation_title: SDK sampling issues
description: Learn how to troubleshoot missing or incomplete traces in EDOT SDKs caused by head sampling configuration.
applies_to:
  serverless: all
  product:
    elastic-otel-sdk: ga
products:
  - id: observability
  - id: edot-sdk
---

# Missing or incomplete traces due to SDK sampling

If traces or spans are missing in {{kib}}, the issue might be related to SDK-level sampling configuration. By default, SDKs use head-based sampling, meaning the decision to record or drop a trace is made when the trace is first created.

Both SDK-level and Collector-based sampling can result in gaps in telemetry if misconfigured. Refer to [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md) for more details.

## Symptoms

You might notice one or more of the following behaviors when SDK-level sampling is impacting your traces:

- Only a small subset of traces reaches {{es}} or {{kib}}, even though SDKs are exporting spans.
- Transactions look incomplete because some spans are missing.
- Trace volume is unexpectedly low compared to logs or metrics.

## Causes

These factors can result in missing spans or traces when sampling is configured at the SDK level:

- Head sampling at the SDK level drops traces before they're exported.
- Default sampling rates (for example 1 in 100 or 1 in 1,000 traces, written as `0.01` or `0.001`) might be too low for your workload.
- Environment variables like `OTEL_TRACES_SAMPLER` or `OTEL_TRACES_SAMPLER_ARG` are not set, not recognized, or formatted in a way the SDK doesn't support.
- Inconsistent configuration across services can lead to fragmented or incomplete traces.
- Some SDKs enforce stricter formats for sampler arguments, so values that don't match the expected format exactly can be ignored.

## Resolution

Follow these steps to resolve SDK sampling configuration issues:

::::{stepper}

:::{step} Check SDK environment variables

- Confirm that `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG` are set correctly.
- For testing, you can temporarily set:

```bash
export OTEL_TRACES_SAMPLER=always_on
```
- In production, consider using `parentbased_traceidratio` with an explicit ratio.
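
As a quick check for this step, the following sketch prints the sampler variables visible in the current shell and flags an argument that is not a plain decimal between 0 and 1. The helper name `is_valid_ratio` is hypothetical, and the check is deliberately conservative: it only applies to ratio-based samplers such as `traceidratio` and `parentbased_traceidratio`, and individual SDKs might accept other notations.

```bash
# Hypothetical helper: succeeds only for plain decimals in [0, 1], such as 0, 0.2, or 1.0.
is_valid_ratio() {
  printf '%s' "$1" | grep -Eq '^(0(\.[0-9]+)?|1(\.0+)?)$'
}

# Show what the current shell would pass on to an instrumented process.
echo "OTEL_TRACES_SAMPLER=${OTEL_TRACES_SAMPLER:-<unset>}"
echo "OTEL_TRACES_SAMPLER_ARG=${OTEL_TRACES_SAMPLER_ARG:-<unset>}"

# Only ratio-based samplers expect a numeric argument.
case "${OTEL_TRACES_SAMPLER:-}" in
  *traceidratio)
    if [ -n "${OTEL_TRACES_SAMPLER_ARG:-}" ] && ! is_valid_ratio "$OTEL_TRACES_SAMPLER_ARG"; then
      echo "warning: OTEL_TRACES_SAMPLER_ARG is not a decimal between 0 and 1" >&2
    fi
    ;;
esac
```

Run it in the same environment that starts the service (for example, inside the container), because the service might not inherit the variables of your interactive shell.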
:::

:::{step} Align configuration across services

- Use consistent sampling configuration across all instrumented services to help avoid dropped child spans or fragmented traces.
:::

:::{step} Adjust sampling ratios for your traffic

- For low-traffic applications, avoid extremely low ratios (such as `0.001`, which keeps about 1 in 1,000 traces).

For example, the following configuration samples ~20% of traces:

```bash
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.2
```
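
To see why very low ratios hurt low-traffic applications, it can help to estimate how many traces survive head sampling. The sketch below multiplies a daily trace count by the ratio. The function name `expected_sampled` and the figure of 500 traces per day are hypothetical, and the result is an average, not a guarantee for any given day.

```bash
# Hypothetical helper: estimate how many traces per day survive head sampling.
# Usage: expected_sampled <traces_per_day> <ratio>
expected_sampled() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n * r }'
}

expected_sampled 500 0.001   # assumed low-traffic service with a very low ratio
expected_sampled 500 0.2     # same service with the 20% ratio shown in this step
```

With these assumed numbers, the first call works out to roughly one sampled trace every two days, while the second yields about 100 traces per day.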
:::

:::{step} Use Collector tail sampling for advanced scenarios

- Head sampling can't evaluate the full trace context before making a decision.
- For more control (for example, "keep all errors, sample 10% of successes"), use Collector tail sampling.

For more information, refer to [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md).
:::

::::

## Resources

- [OTEL_TRACES_SAMPLER environment variable specifications](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#otel_traces_sampler)
- [OpenTelemetry sampling concepts](https://opentelemetry.io/docs/concepts/sampling/)
- [Missing or incomplete traces due to Collector sampling](../edot-collector/misconfigured-sampling-collector.md)

troubleshoot/ingest/opentelemetry/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,7 @@ toc:
 - file: edot-collector/metadata.md
 - file: edot-collector/enable-debug-logging.md
 - file: edot-collector/collector-not-starting.md
+- file: edot-collector/misconfigured-sampling-collector.md
 - file: edot-sdks/index.md
 children:
 - file: edot-sdks/android/index.md
@@ -23,5 +24,6 @@ toc:
 - file: edot-sdks/enable-debug-logging.md
 - file: edot-sdks/missing-app-telemetry.md
 - file: edot-sdks/proxy.md
+- file: edot-sdks/misconfigured-sampling-sdk.md
 - file: no-data-in-kibana.md
 - file: contact-support.md
