Skip to content

credentials: implement file-based JWT Call Credentials (part 1 for A97) #8431

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

dimpavloff
Copy link

@dimpavloff dimpavloff commented Jul 7, 2025

Part one for grpc/proposal#492 (A97).
This is done in a new credentials/jwt package to provide file-based PerRPCCallCredentials. It can be used beyond XDS. The package handles token reloading, caching, and validation as per A97 .

There will be a separate PR which uses it in xds/bootstrap.

Whilst implementing the above, I considered credentials/oauth and credentials/xds packages instead of creating a new one. The former package has NewJWTAccessFromKey and jwtAccess which seem very relevant at first. However, I think the jwtAccess behaviour seems more tailored towards Google services. Also, the refresh, caching, and error behaviour for A97 is quite different than what's already there and therefore a separate implementation would have still made sense.
WRT credentials/xds, it could have been extended to both handle transport and call credentials. However, this is a bit at odds with A97 which says that the implementation should be non-XDS specific and, from reading between the lines, usable beyond XDS.
I think the current approach makes review easier but because of the similarities with the other two packages, it is a bit confusing to navigate. Please let me know whether the structure should change.

Relates to istio/istio#53532

RELEASE NOTES:

  • credentials: add credentials/jwt package providing file-based JWT PerRPCCredentials (A97)

Copy link

codecov bot commented Jul 7, 2025

Codecov Report

❌ Patch coverage is 97.63780% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.04%. Comparing base (dd718e4) to head (12fedd5).
⚠️ Report is 51 commits behind head on master.

Files with missing lines Patch % Lines
credentials/jwt/jwt_token_file_call_creds.go 96.29% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8431      +/-   ##
==========================================
- Coverage   82.27%   82.04%   -0.24%     
==========================================
  Files         414      414              
  Lines       40424    40647     +223     
==========================================
+ Hits        33259    33347      +88     
- Misses       5795     5910     +115     
- Partials     1370     1390      +20     
Files with missing lines Coverage Δ
credentials/jwt/jwt_file_reader.go 100.00% <100.00%> (ø)
credentials/jwt/jwt_token_file_call_creds.go 96.29% <96.29%> (ø)

... and 254 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dimpavloff dimpavloff changed the title xds: implement file-based JWT authentication (A97) xds: implement file-based JWT Call Credentials (A97) Jul 7, 2025
@dimpavloff
Copy link
Author

@dfawley hey 👋 Given you approved A97, would you mind having a cursory look at the PR to confirm if at least at a high level the approach looks good?

@eshitachandwani
Copy link
Member

I will take a look at this , I need to go through the gRFC first.

@dfawley dfawley self-assigned this Jul 22, 2025
@dfawley dfawley requested review from easwars and eshitachandwani and removed request for dfawley July 25, 2025 20:39
@dfawley dfawley assigned easwars and unassigned dfawley Jul 25, 2025
@dfawley
Copy link
Member

dfawley commented Jul 25, 2025

Sorry for the delay here.

@easwars would you be able to review this change? I think you have more background into some of the things than I do, like the bootstrap integration. Thank you!

@easwars
Copy link
Contributor

easwars commented Jul 28, 2025

Thank you for your contribution @dimpavloff. Yes, it would be nice if you can split this into smaller PRs. I will continue to use this PR to review the JWT call credentials implementation. If you can move the xDS implementation out to one or more PRs, I would greatly appreciate that and would be happy to review them as well.

AuthInfo: &testAuthInfo{secLevel: credentials.PrivacyAndIntegrity},
})

// First call should read from file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere in this PR, please use appropriate punctuations in comment sentences. See: https://google.github.io/styleguide/go/decisions#comment-sentences

t.Fatalf("NewTokenFileCallCredentials() failed: %v", err)
}

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please define a const for default test timeout and use it in all tests that require a timeout. We use this pattern a lot in our tests. Please search for defaultTestTimeout and defaultTestShortTimeout for commonly used patterns.


// Verify cached expiration is 30 seconds before actual token expiration
impl := creds.(*jwtTokenFileCallCreds)
impl.mu.RLock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please only test the API surface. Relying on implementation internals in tests makes them brittle and would result in test changes when any changes to implementation is made.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you are referring to using a private field rather than obtaining mu specifically.
In general I agree -- white box tests may get fragile and break during a refactor. However, this test and the next couple of ones are about the caching behaviour -- it is meant to be transparent to the external API. If I don't make assertions about the private fields, the tests may pass trivially and become more flaky (e.g. when testing the backoff in the next test).
One alternative could be factoring out these behaviours out into a separate private struct with "public" functions which expose the same information. Given that it would require shifting the majority of the implementation into that struct, I'm not sure it is an improvement from the current approach.
Please do let me know your thoughts and if you have other suggestions.

@easwars easwars assigned dimpavloff and unassigned easwars Jul 28, 2025
@easwars easwars added Type: Feature New features or improvements in behavior Area: Auth Includes regular credentials API and implementation. Also includes advancedtls, authz, rbac etc. labels Aug 11, 2025
@easwars easwars added this to the 1.76 Release milestone Aug 11, 2025
@easwars
Copy link
Contributor

easwars commented Aug 11, 2025

Apologies for the delay on this. Have been busy with some other stuff. Will definitely try to make another pass today.

FYI, we recommend authors to not mark comments as resolved. Instead, we recommend the reviewer to do that once the comment has been satisfactorily addressed, because now I have to unresolve every comment to see what I commented and then verify that the comment has been satisfactorily addressed. Thanks for understanding.

@eshitachandwani eshitachandwani removed their request for review August 12, 2025 05:12
@eshitachandwani eshitachandwani removed their assignment Aug 12, 2025
Comment on lines 38 to 40
// - Errors in reading tokens or parsing JWTs will result in RPC UNAVAILALBE or
// UNAUTHENTICATED errors
// - These errors are cached and retried with exponential backoff.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It looks like these two bullet points can/should be merged?

}

// isTokenValid checks if the cached token is still valid.
// Caller must hold c.mu.RLock().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I personally haven't considered it so far, because I tend to avoid RWLocks after being bitten by them once (in a bad way). But, I'm not opposed to it and it does sound more explicit when using RWLocks.

// invalid or expired, the next RPC will handle synchronous refresh instead.
// Caller must hold c.mu.RLock().
func (c *jwtTokenFileCallCreds) needsPreemptiveRefreshLocked() bool {
return c.isTokenValidLocked() && time.Until(c.cachedExpiration) < time.Minute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make a const for the one minute pre-emptive refresh trigger?

nextRetryTime time.Time // When next retry is allowed

// Pre-emptive refresh mutex
refreshMu sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid point. Even if it is a single goroutine, we cannot leak it, because our tests check for leaked goroutines and therefore unit tests will fail. Thanks for trying though.

// Multiple concurrent calls are safe - only one refresh will run at a time.
// The refresh runs in a separate goroutine and does not block the caller.
func (c *jwtTokenFileCallCreds) triggerPreemptiveRefresh() {
go func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@easwars
Copy link
Contributor

easwars commented Aug 12, 2025

Again, apologies for the delay in the review.

I did a little bit of thinking and I feel we can avoid the second mutex and also avoid locking and unlocking multiple times in several codepaths with the following approach.

(this is mostly going to be pseudocode ... so please bear with me)

  • Separate out the file reading and expiration extracting functionality into a separate type, JWTFileReader or some other name

    • This would have a NewXxx function that would take the file name
    • This type would also have a method to read the token from file: ReadToken() (token string, expiry time.Time, err error)
    • This type would be very simple and would not know or care about any of the other stuff happening in the creds (like preemptive refreshes, ongoing refreshes etc)
    • It would be the responsibility of the caller to ensure that they don't call the ReadToken() method concurrently. Nothing bad will happen if they call it concurrently, just that there would unnecessary file reads.
  • We would maintain the following state in the creds (all protected by a single sync.Mutex)

    • token
    • expiry time
    • previously seen error
    • retry attempt number
    • next retry time
    • pending refresh (boolean)
    • a condition variable for starting a token refresh (sync.Cond)
  • The GetRequestMetadata would work as follows:

    • Lock the mutex
    • Defer a call to unlock the mutex
    • If the token is valid:
      • if we need a pre-emptive refresh:
        • if there is no pending refresh:
          • set the pending bit
          • spawn a goroutine to refresh the token
      • return token
    • If we are in backoff:
      • return error
    • If there is no pending request:
      • set the pending bit
      • spawn a goroutine to refresh the token
    • Wait for the condition that (pending == false), using the condition variable. This will be something like:
        for pending == true {
          cond.Wait()
        }
      • When there is an ongoing refresh, this will block the current goroutine without holding the mutex
    • Once we get past the condition variable, we can safely return the result from the our state
      • Once we get past cond.Wait(), we will again have possession of the mutex, and can therefore safely access our state
      • we either have a valid token or a valid error
  • The goroutine to refresh the token would do the following:

    • Call fileReader.ReadToken()
      • (the above call is now happening without the mutex), but since spawning of this goroutine is controlled by the pending bit, we are always guaranteed that only one instance of this goroutine is ever running)
    • Lock mutex
      • Update internal state (token, expiry, error, backoff state)
      • reset the pending bit
    • Unlock mutex
    • Call Broadcast on the condition variable to unblock all goroutines waiting on this condition

Let me know what you think.

Copy link

This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Aug 18, 2025
@dimpavloff
Copy link
Author

Hi @easwars , apologies for the delay, I was on holiday. Thanks for the review and for the suggested approach. I don't have the code very fresh in my mind anymore 😅 but I think that makes sense. I will see what I can do in the coming days and get back to you.

@github-actions github-actions bot removed the stale label Aug 21, 2025
Copy link
Author

@dimpavloff dimpavloff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@easwars , I managed to iterate on top of your suggestion to simplify it further, we no longer need the condition variable, just one lock! LMK what you think

Comment on lines 99 to 101
return map[string]string{
"authorization": "Bearer " + token,
}, nil
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I've cached it just in case

}

// newJWTFileReader creates a new JWTFileReader for the specified file path.
func newJWTFileReader(tokenFilePath string) *jWTFileReader {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to highlight, I made both the struct and function private. LMK if you intended for them to be public

@dimpavloff dimpavloff changed the title xds: implement file-based JWT Call Credentials (A97) credentials: implement file-based JWT Call Credentials (part 1 for A97) Aug 22, 2025
@dimpavloff dimpavloff requested a review from easwars August 22, 2025 11:27
@dimpavloff dimpavloff removed their assignment Aug 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Auth Includes regular credentials API and implementation. Also includes advancedtls, authz, rbac etc. Status: Requires Reporter Clarification Type: Feature New features or improvements in behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants