-
Notifications
You must be signed in to change notification settings - Fork 118
GRPC Client - TraceId mixup #2316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
If I understand correctly, the symptom of this is having multiple separate requests end up with the same trace ID? If that is the case, I'm seeing the same behaviour on my tests with auto instrumentation |
@ananthvanchip can you verify this is the correct? |
Thanks for opening this issue. The |
Yeah it's definitely a problem and also same issue exists in how ebpf-autoinstrumentation does it, although it's slightly different implementation. I always thought streamid is unique. |
yes.. this is correct.
I can send you a PR on the code changes. While the executeAndPut is indeed called multiple times, it is not called from the same goroutine. So it can be skipped quite easily. Retrieving the nextId in the return might not be ideal as we cannot guarantee that the loopyWriter will be called always after the return of the call as its in another go-routine. |
That would be great! Thank you! |
@ananthvanchip hi, just checking if you were still working on this or had any questions? Thanks! |
Describe the bug
Corruption in the traceId that is being sent in the headers while making grpc calls. The issue is with the grpc-client-instrumentation.
Environment
To Reproduce
Expected behavior
TraceId should not be corrupted/exchanged b/w different requests.
Additional context
There are two issues in the way grpc-client is being impemented.
Issue 1
opentelemetry-go-instrumentation/internal/pkg/instrumentation/bpf/google.golang.org/grpc/client/bpf/probe.bpf.c
Line 220 in 3ca6544
The nextId is used as the bridge b/w the go-routine of the request and the go-routine of the loopyWriter. The nextId is a merely an integer and it increases sequentially for every request. (Its odd numbers starting from 1). However, when multiple http2client are created (to different endpoints) they will all generate nextId from 1. So in a concurrent scenario, we will have different clients(or connections) which can have the same nextId and the association could get jumbled when the requests are being made. This will cause the wrong traceId to be sent to the wrong endpoint randomly and will create confusions.
Issue 2
We are adding a uprobe at the entry of
func (t *http2Client) NewStream(ctx context.Context, callHdr *CallHdr) (*ClientStream, error)
In that uprobe we are reading the nextId. The nextId is a global variable within the http2Client object. When two go-routines using the same connection/client make a request, they will create a new stream. For both of them, in the uprobe we will assign the same nextId. However, internally they will be assigned different nextId as the nextId assignment within the function is covered by a mutex lock.
https://github.com/grpc/grpc-go/blob/b89909b7bd0d9bd333aab291e90fec1fa8d45ce9/internal/transport/http2_client.go#L832
This again will cause different traceIds to be sent to different requests which will lead to eventual corruption.
Solution.
To solve both the issues, we can stay away from using nextId as the bridge b/w the requesting go-routine and the go-routine within loopyWriter. Instead, we can add a uprobe on
https://github.com/grpc/grpc-go/blob/09166b665e8b6442bc70e88cdef580da9e7b7b06/internal/transport/controlbuf.go#L352
The second parameter is a headerFrame which is available in the loopyWriter.handleHeaders method. The pointer to headerFrame can be the bridge. This will help us get rid of the entire nextId/streamId and the issues arising from that.
I am not a go/ebpf expert. Let me know what you think.. I can provide a patch on this if necessary.
The text was updated successfully, but these errors were encountered: