Conversation

3mbe

@3mbe 3mbe commented Sep 17, 2025

What this PR does / why we need it:

Fixes a data race in proxy.DialContext by creating a new SPDY transport and upgrader on each dial. This avoids shared state across concurrent dials and eliminates -race failures. While this adds a bit of overhead, the correctness and stability gains are well worth it.
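A minimal sketch of the per-dial pattern, using only the standard library. The names `statefulTripper`, `dialer`, and `dialAll` are hypothetical stand-ins, not the actual types in `proxy/dial.go` or client-go. The sketch assumes only what the race report shows: `RoundTrip` writes to a field on the round tripper, so one instance shared by concurrent dials is unsafe.

```go
package main

import (
	"fmt"
	"sync"
)

// statefulTripper is a hypothetical stand-in for a round tripper that
// records per-connection state in a struct field during RoundTrip.
type statefulTripper struct {
	conn string // written on every RoundTrip, so an instance must not be shared
}

// RoundTrip mutates the receiver, which is what makes sharing it racy.
func (t *statefulTripper) RoundTrip(addr string) string {
	t.conn = "conn-to-" + addr
	return t.conn
}

// dialer holds a constructor instead of a cached tripper, so every
// Dial call works on its own instance.
type dialer struct {
	newTripper func() *statefulTripper
}

// Dial builds a fresh tripper for this call only.
func (d *dialer) Dial(addr string) string {
	return d.newTripper().RoundTrip(addr)
}

// dialAll dials every address concurrently and returns the results
// in input order. Each goroutine writes only to its own slice index.
func dialAll(d *dialer, addrs []string) []string {
	out := make([]string, len(addrs))
	var wg sync.WaitGroup
	for i, a := range addrs {
		wg.Add(1)
		go func(i int, a string) {
			defer wg.Done()
			out[i] = d.Dial(a)
		}(i, a)
	}
	wg.Wait()
	return out
}

func main() {
	d := &dialer{newTripper: func() *statefulTripper { return &statefulTripper{} }}
	fmt.Println(dialAll(d, []string{"a:1", "b:2"})) // prints [conn-to-a:1 conn-to-b:2]
}
```

If `dialer` instead cached a single `*statefulTripper` and reused it in `Dial`, running this under `go run -race` would be expected to report concurrent writes to `conn`, similar in shape to the report attached to this PR.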

How I validated this change:

  • Ran infrastructure tests with the race detector:

    make test-infrastructure TEST_ARGS='-race -count=1 -shuffle=on -v'  
  • All tests passed with no race warnings.

  • Attached the test log as proof: infra-race.log

Which issue(s) this PR fixes:
Fixes #12767

/area testing
/area provider/infrastructure-in-memory
/kind bug

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area PR is missing an area label labels Sep 17, 2025
@k8s-ci-robot
Contributor

Welcome @3mbe!

It looks like this is your first PR to kubernetes-sigs/cluster-api 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 17, 2025
@k8s-ci-robot
Contributor

Hi @3mbe. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@3mbe
Author

3mbe commented Sep 17, 2025

Output of tests with -race showing the data race in proxy.DialContext:

==================
WARNING: DATA RACE
Write at 0x00c000b9d988 by goroutine 266:
  k8s.io/apimachinery/pkg/util/httpstream/spdy.(*SpdyRoundTripper).RoundTrip()
      /home/marcos/go/pkg/mod/k8s.io/apimachinery@v0.34.1/pkg/util/httpstream/spdy/roundtripper.go:356 +0x7a4
  k8s.io/client-go/transport.(*basicAuthRoundTripper).RoundTrip()
      /home/marcos/go/pkg/mod/k8s.io/client-go@v0.34.1/transport/round_trippers.go:203 +0x401
  net/http.send()
      /home/marcos/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.linux-amd64/src/net/http/client.go:259 +0x8ca
  net/http.(*Client).send()
      /home/marcos/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.linux-amd64/src/net/http/client.go:180 +0x14c
  net/http.(*Client).do()
      /home/marcos/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.linux-amd64/src/net/http/client.go:728 +0x1338
  net/http.(*Client).Do()
      /home/marcos/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.linux-amd64/src/net/http/client.go:587 +0x26a
  k8s.io/client-go/transport/spdy.Negotiate()
      /home/marcos/go/pkg/mod/k8s.io/client-go@v0.34.1/transport/spdy/spdy.go:97 +0x255
  k8s.io/client-go/transport/spdy.(*dialer).Dial()
      /home/marcos/go/pkg/mod/k8s.io/client-go@v0.34.1/transport/spdy/spdy.go:87 +0x1ea
  sigs.k8s.io/cluster-api/test/infrastructure/inmemory/pkg/server/proxy.(*Dialer).DialContext()
      /home/marcos/code/cluster-api/test/infrastructure/inmemory/pkg/server/proxy/dial.go:100 +0x384
  sigs.k8s.io/cluster-api/test/infrastructure/inmemory/pkg/server/proxy.(*Dialer).DialContextWithAddr()
      /home/marcos/code/cluster-api/test/infrastructure/inmemory/pkg/server/proxy/dial.go:82 +0x7b
  sigs.k8s.io/cluster-api/test/infrastructure/inmemory/pkg/server/proxy.(*Dialer).DialContextWithAddr-fm()
      <autogenerated>:1 +0x1f
  google.golang.org/grpc/internal/transport.dial()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/internal/transport/http2_client.go:176 +0x302
  google.golang.org/grpc/internal/transport.NewHTTP2Client()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/internal/transport/http2_client.go:221 +0x1c4
  google.golang.org/grpc.(*addrConn).createTransport()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/clientconn.go:1398 +0x475
  google.golang.org/grpc.(*addrConn).tryAllAddrs()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/clientconn.go:1345 +0x669
  google.golang.org/grpc.(*addrConn).resetTransportAndUnlock()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/clientconn.go:1277 +0x23c
  google.golang.org/grpc.(*addrConn).connect()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/clientconn.go:933 +0x224
  google.golang.org/grpc.(*acBalancerWrapper).Connect.gowrap1()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer_wrapper.go:354 +0x33

Previous write at 0x00c000b9d988 by goroutine 268:
  k8s.io/apimachinery/pkg/util/httpstream/spdy.(*SpdyRoundTripper).RoundTrip()
      /home/marcos/go/pkg/mod/k8s.io/apimachinery@v0.34.1/pkg/util/httpstream/spdy/roundtripper.go:356 +0x7a4
  k8s.io/client-go/transport.(*basicAuthRoundTripper).RoundTrip()
      /home/marcos/go/pkg/mod/k8s.io/client-go@v0.34.1/transport/round_trippers.go:203 +0x401
  net/http.send()
      /home/marcos/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.linux-amd64/src/net/http/client.go:259 +0x8ca
  net/http.(*Client).send()
      /home/marcos/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.linux-amd64/src/net/http/client.go:180 +0x14c
  net/http.(*Client).do()
      /home/marcos/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.linux-amd64/src/net/http/client.go:728 +0x1338
  net/http.(*Client).Do()
      /home/marcos/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.24.7.linux-amd64/src/net/http/client.go:587 +0x26a
  k8s.io/client-go/transport/spdy.Negotiate()
      /home/marcos/go/pkg/mod/k8s.io/client-go@v0.34.1/transport/spdy/spdy.go:97 +0x255
  k8s.io/client-go/transport/spdy.(*dialer).Dial()
      /home/marcos/go/pkg/mod/k8s.io/client-go@v0.34.1/transport/spdy/spdy.go:87 +0x1ea
  sigs.k8s.io/cluster-api/test/infrastructure/inmemory/pkg/server/proxy.(*Dialer).DialContext()
      /home/marcos/code/cluster-api/test/infrastructure/inmemory/pkg/server/proxy/dial.go:100 +0x384
  sigs.k8s.io/cluster-api/test/infrastructure/inmemory/pkg/server/proxy.(*Dialer).DialContextWithAddr()
      /home/marcos/code/cluster-api/test/infrastructure/inmemory/pkg/server/proxy/dial.go:82 +0x7b
  sigs.k8s.io/cluster-api/test/infrastructure/inmemory/pkg/server/proxy.(*Dialer).DialContextWithAddr-fm()
      <autogenerated>:1 +0x1f
  google.golang.org/grpc/internal/transport.dial()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/internal/transport/http2_client.go:176 +0x302
  google.golang.org/grpc/internal/transport.NewHTTP2Client()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/internal/transport/http2_client.go:221 +0x1c4
  google.golang.org/grpc.(*addrConn).createTransport()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/clientconn.go:1398 +0x475
  google.golang.org/grpc.(*addrConn).tryAllAddrs()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/clientconn.go:1345 +0x669
  google.golang.org/grpc.(*addrConn).resetTransportAndUnlock()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/clientconn.go:1277 +0x23c
  google.golang.org/grpc.(*addrConn).connect()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/clientconn.go:933 +0x224
  google.golang.org/grpc.(*acBalancerWrapper).Connect.gowrap1()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer_wrapper.go:354 +0x33

Goroutine 266 (running) created at:
  google.golang.org/grpc.(*acBalancerWrapper).Connect()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer_wrapper.go:354 +0xa8
  google.golang.org/grpc/balancer/pickfirst/pickfirstleaf.(*pickfirstBalancer).requestConnectionLocked()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go:543 +0x604
  google.golang.org/grpc/balancer/pickfirst/pickfirstleaf.(*pickfirstBalancer).startFirstPassLocked()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go:386 +0x3a4
  google.golang.org/grpc/balancer/pickfirst/pickfirstleaf.(*pickfirstBalancer).UpdateClientConnState()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go:345 +0xfb3
  google.golang.org/grpc/internal/balancer/gracefulswitch.(*Balancer).UpdateClientConnState()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/internal/balancer/gracefulswitch/gracefulswitch.go:194 +0x2b4
  google.golang.org/grpc.(*ccBalancerWrapper).updateClientConnState.func1()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer_wrapper.go:124 +0x3c8
  google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/internal/grpcsync/callback_serializer.go:94 +0x265
  google.golang.org/grpc/internal/grpcsync.NewCallbackSerializer.gowrap1()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/internal/grpcsync/callback_serializer.go:52 +0x4f

Goroutine 268 (running) created at:
  google.golang.org/grpc.(*acBalancerWrapper).Connect()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer_wrapper.go:354 +0xa8
  google.golang.org/grpc/balancer/pickfirst/pickfirstleaf.(*pickfirstBalancer).requestConnectionLocked()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go:543 +0x604
  google.golang.org/grpc/balancer/pickfirst/pickfirstleaf.(*pickfirstBalancer).startFirstPassLocked()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go:386 +0x3a4
  google.golang.org/grpc/balancer/pickfirst/pickfirstleaf.(*pickfirstBalancer).UpdateClientConnState()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer/pickfirst/pickfirstleaf/pickfirstleaf.go:345 +0xfb3
  google.golang.org/grpc/balancer/endpointsharding.(*balancerWrapper).updateClientConnStateLocked()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer/endpointsharding/endpointsharding.go:344 +0xbfa
  google.golang.org/grpc/balancer/endpointsharding.(*endpointSharding).UpdateClientConnState()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer/endpointsharding/endpointsharding.go:150 +0xa95
  google.golang.org/grpc/balancer/roundrobin.(*rrBalancer).UpdateClientConnState()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer/roundrobin/roundrobin.go:67 +0x194
  google.golang.org/grpc/internal/balancer/gracefulswitch.(*Balancer).UpdateClientConnState()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/internal/balancer/gracefulswitch/gracefulswitch.go:194 +0x2b4
  google.golang.org/grpc.(*ccBalancerWrapper).updateClientConnState.func1()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/balancer_wrapper.go:124 +0x3c8
  google.golang.org/grpc/internal/grpcsync.(*CallbackSerializer).run()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/internal/grpcsync/callback_serializer.go:94 +0x265
  google.golang.org/grpc/internal/grpcsync.NewCallbackSerializer.gowrap1()
      /home/marcos/go/pkg/mod/google.golang.org/grpc@v1.72.3/internal/grpcsync/callback_serializer.go:52 +0x4f
==================

Member

@chrischdi chrischdi left a comment

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 18, 2025
Member

@neolit123 neolit123 left a comment

this PR does a lot of unnecessary refactors in the Go code, like renaming variables and changing comments. even if the old names or comments are not clear in some way, it is best to leave them the way they are and only add a minimal diff showcasing the bug fix.

i suggest you back up your current work and start from scratch, pushing only the minimal diff.

@3mbe 3mbe force-pushed the github-issue-12767 branch from ef89b88 to 86df4a7 Compare September 18, 2025 10:36
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 18, 2025
@3mbe
Author

3mbe commented Sep 18, 2025

Hey @chrischdi @neolit123

Thank you both for your patience and for taking the time to mentor me. I’ve reverted the code back to its original state, keeping only the minimal diff needed for the fix. Please let me know if there’s anything else I can do to help!

test.log

@3mbe 3mbe requested review from neolit123 and chrischdi September 18, 2025 10:51
Member

@neolit123 neolit123 left a comment

LGTM, thanks for the update. this is much cleaner.

please keep the commits squashed to 1.

@3mbe 3mbe force-pushed the github-issue-12767 branch from 86df4a7 to ebe3256 Compare September 18, 2025 14:23
@3mbe 3mbe requested a review from neolit123 September 18, 2025 14:30
Member

@neolit123 neolit123 left a comment

/area test
/lgtm
/assign @chrischdi

@k8s-ci-robot
Contributor

@neolit123: The label(s) area/test cannot be applied, because the repository doesn't have them.

In response to this:

/area test
/lgtm
/assign @chrischdi


@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 18, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 943734d0e5c41772c11480e6c7f046dadb95f57a

@chrischdi chrischdi added the area/provider/infrastructure-in-memory Issues or PRs related to the in-memory infrastructure provider label Sep 19, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-area PR is missing an area label label Sep 19, 2025
Member

@chrischdi chrischdi left a comment

Last nit from my side.

@sbueringer
Member

/assign
I also want to take a look.

Member

@sbueringer sbueringer left a comment

lgtm pending Christian's point

@3mbe 3mbe force-pushed the github-issue-12767 branch from ebe3256 to 9633304 Compare September 24, 2025 01:18
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 24, 2025
@k8s-ci-robot k8s-ci-robot removed the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Sep 24, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from chrischdi. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 24, 2025
Member

@sbueringer sbueringer left a comment

One minor finding

@3mbe 3mbe force-pushed the github-issue-12767 branch from 9633304 to 71b1313 Compare September 24, 2025 21:30
@sbueringer
Member

/lgtm

/assign @chrischdi
I already diffed it, but maybe you can double-check that the test-infrastructure targets are now roughly in sync with the test targets.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 25, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 44a194e3f92aece97ee738bf8b4d113577f4ad6e

Labels
area/provider/infrastructure-in-memory Issues or PRs related to the in-memory infrastructure provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Enable -race for test-infrastructure unit tests
5 participants