Commit 97ede83

KEP-4188: update goals and apply latest template
Signed-off-by: Matthias Bertschy <matthias.bertschy@gmail.com>
1 parent d68340f commit 97ede83

File tree

2 files changed

+119
-82
lines changed

keps/sig-node/4188-kubelet-pod-readiness-api/README.md

Lines changed: 114 additions & 77 deletions
Original file line number | Diff line number | Diff line change
@@ -1,21 +1,25 @@
1-
# KEP-4188: New Kubelet gRPC API with endpoint returning local Pods readiness information
1+
# KEP-4188: New Kubelet gRPC API with endpoint returning local Pods information
22

33
<!-- toc -->
44
- [Release Signoff Checklist](#release-signoff-checklist)
55
- [Summary](#summary)
66
- [Motivation](#motivation)
77
- [Goals](#goals)
88
- [Non-Goals](#non-goals)
9-
- [User Stories](#user-stories)
109
- [Proposal](#proposal)
1110
- [What kind of API to chose?](#what-kind-of-api-to-chose)
1211
- [Can we integrate with PodResource API?](#can-we-integrate-with-podresource-api)
12+
- [User Stories](#user-stories)
1313
- [Risks and Mitigations](#risks-and-mitigations)
1414
- [Control Plane availability issue](#control-plane-availability-issue)
1515
- [Kubelet restarts issue](#kubelet-restarts-issue)
1616
- [Design Details](#design-details)
17+
- [Pod State Selection](#pod-state-selection)
1718
- [Proposed API](#proposed-api)
19+
- [Test Plan](#test-plan)
20+
- [Prerequisite testing updates](#prerequisite-testing-updates)
1821
- [Unit tests](#unit-tests)
22+
- [Integration tests](#integration-tests)
1923
- [e2e tests](#e2e-tests)
2024
- [Graduation Criteria](#graduation-criteria)
2125
- [Alpha](#alpha)
@@ -66,76 +70,82 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
6670

6771
## Summary
6872

69-
Proposal to add a new Kubelet gRPC API with endpoint returning local Pods readiness information.
70-
Serving that information by Kubelet within a Node will increase reliability and reduce load to the Kubernetes API Server and traffic outside the node. A connectivity issue between Node and Control Plane should not impact workloads which depend on Pods readiness statuses.
73+
Proposal to add a new Kubelet gRPC API with endpoint returning local Pods information,
74+
including the full PodSpec and status, not just readiness.
75+
Serving that information by Kubelet within a Node will increase reliability and reduce load to
76+
the Kubernetes API Server and traffic outside the node. A connectivity issue between Node and
77+
Control Plane should not impact workloads which depend on Pods state information.
78+
Because the full PodSpec may contain sensitive information, access to this API will be secured by
79+
restricting the UNIX socket to local admin users only.
7180

7281
## Motivation
7382

7483
Kubelet is responsible for running Health Checks (probes) and communicating the
7584
results via Pod status. All that information is stored in cache and reported to
76-
Kube-API. Right now pod's readiness information is tightly coupled with the Kubernetes API
85+
Kube-API. Right now pod's state information is tightly coupled with the Kubernetes API
7786
Server. When a workload wants to know the actual state of Pods running on the
7887
Node, it needs to fetch it from Kube-API. This causes some issues:
7988

8089
* Reliability - for various reasons Kube-API might not be available
8190
(connectivity issue, control plane updates)
8291
* Scalability - adding new watchers to kube-API is a scalability concern. By
83-
exposing the endpoint that will serve the Pods readiness status directly from the Kubelet
84-
cache we can use it on node workloads and avoid additional dependencies on the
92+
exposing the endpoint that will serve the Pods state directly from the Kubelet
93+
cache we can use it on node workloads and avoid additional dependencies on the
8594
Kubernetes API Server.
95+
* Flexibility - consumers may need more than just readiness, such as phase, IPs,
96+
resource usage, labels/annotations, or the full pod spec. Supporting field selection
97+
enables lean, efficient queries, but the API can also provide the full pod object when needed.
98+
Because the full pod spec may contain secrets or other sensitive data, access must be tightly controlled.
8699

87100
| Impact | Description|
88101
| ------- | ------------ |
89-
| + | Reliability - for various reasons kube-API might not be available but this doesn’t mean that local workloads are not accessible and on node system workloads should have the most recent data about pod's readiness even when kube-API is unreachable. |
90-
| + | Scalability - Reduce the load on kube-API by reducing the number of watchers and using Kubelet to fetch local Pods readiness. Fetching only Pods limited to one node is costly operation for kube-API. |
102+
| + | Reliability - for various reasons kube-API might not be available but this doesn’t mean that local workloads are not accessible and on node system workloads should have the most recent data about pod's state even when kube-API is unreachable. |
103+
| + | Scalability - Reduce the load on kube-API by reducing the number of watchers and using Kubelet to fetch local Pods information. Fetching only Pods limited to one node is costly operation for kube-API. |
91104
| + | Safety - Read-only API will not add security risks. |
92-
| + | Reduce resource consumption by workload. Using Kube-API we can fetch objects like PodSpec or PodStatus, for some on-node workload this might be unnecessary, with this API workload can reduce the resource consumption. |
105+
| + | Reduce resource consumption by workload. Using Kube-API we can fetch objects like PodSpec or PodStatus, for some on-node workload this might be unnecessary, with this API workload can reduce the resource consumption by requesting only the fields they need. The API can also provide the full pod spec for advanced use cases, but this is restricted to authorized local users. |
93106
| - | This API will add load to Kubelet (mitigation: API will be rate limited) |
94-
| - | Kubelet does not support RBAC authorization for gRPC. (mitigation: This API is designed to be accessible for all workloads running on the node without the authorization. The unix socket will be used for the connection and all exposed data will be carefully reviewed.) |
95-
| - | Limiting the scope of the API to the readiness information because we are not introducing the RBAC for this API. |
107+
| - | Kubelet does not support RBAC authorization for gRPC. (mitigation: This API is designed to be accessible only to local admin users. The unix socket will be secured with file permissions to restrict access to privileged users, and all exposed data will be carefully reviewed.) |
96108

97109
### Goals
98110

99-
The goal of this API is to expose Pod readiness information directly from the
111+
The goal of this API is to expose comprehensive Pod information, including the full PodSpec and status, directly from the
100112
source - Kubelet, independent of Control Plane availability. This would remove the need for node-local
101113
components to request this node-local information from
102114
the Kubernetes API Server.
103115

116+
The API should allow consumers to request only the fields they need, using protobuf fieldmasks, to enable efficient
117+
and lean data transfer. For advanced use cases, the full pod spec can be returned, but only to authorized local admin
118+
users due to the potential for sensitive information.
119+
104120
Kubelet already has a podresources endpoint
105121
([2403-pod-resources-allocatable-resources](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2403-pod-resources-allocatable-resources))
106122
which returns information about Pod’s containers and Devices. This API does not
107-
contain information about pod readiness status.
123+
contain information about pod state.
108124

109125
Kubelet is responsible for computing Pod status and stores it in a local cache.
110-
We want to create a new gRPC API that will expose pod conditions that are computed by
126+
We want to create a new gRPC API that will expose pod information that is computed by
111127
Kubelet and return the most recent data even when kube-API is not reachable.
112128
This API is open for future modification if needed but exposed data via this API should be limited
113-
to pod's readiness information.
129+
to pod's information relevant for node-local consumers.
114130

115131
### Non-Goals
116132

117-
Exposing pods detailed data that are not related to pod's readiness.
118-
119-
### User Stories
120-
* Some on node system workloads want to reduce Control Plane dependency and
121-
introduce locality for Pod’s readiness to improve reliability and scalability.
122-
* Custom monitoring tools may want to have local visibility into the readiness
123-
of Pods running on the same Node.
124-
* Some on node system workloads interested in Pod readiness want to reduce
125-
resource consumption.
133+
Exposing cluster-wide pod data or supporting mutating pod state.
126134

127135
## Proposal
128136

129-
We are proposing to create a new Kubelet API that will return pod's readiness information.
137+
We are proposing to create a new Kubelet API that will return pod information, including the full PodSpec and status, not just readiness.
130138

131139
* The API will return data about both Static and Regular Pods.
132140
* The API will not return partial data. If Kubelet does not know actual information
133141
about workloads then gRPC FAILED_PRECONDITION (9) error code will be returned.
134142
* The API should return the most recent information about Pods computed by the
135143
Kubelet even when those data were not reported or accepted by kube-API.
136-
* The API will be read-only and accessible for on-node workloads (we will use a
137-
unix socket for the connection) with authorization limited to unix standard permissions.
144+
* The API will be read-only and accessible for on-node workloads via a
145+
unix socket, with access restricted to local admin users (e.g., root or a specific admin group) using file permissions.
138146
* The API will be versioned.
147+
* The API will support protobuf fieldmasks to allow clients to request only the fields they need from the pod information,
148+
but can also return the full pod spec for authorized users.
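
The socket access restriction in the list above can be illustrated with a minimal sketch. It assumes "local admin users" means the socket is owned by root and carries no permission bits for other users (and optionally none for the group). The helper names `mode_is_admin_only` and `socket_is_admin_only` are hypothetical and not part of the KEP.

```python
import os
import stat


def mode_is_admin_only(mode: int, allow_group: bool = False) -> bool:
    """Return True if the permission bits grant no access to 'other' users,
    and none to the group unless allow_group is set (assumed policy)."""
    perms = stat.S_IMODE(mode)  # strip file-type bits, keep permission bits
    if perms & stat.S_IRWXO:
        return False
    if not allow_group and perms & stat.S_IRWXG:
        return False
    return True


def socket_is_admin_only(path: str, allow_group: bool = False) -> bool:
    """Check a socket path on disk: owned by root (uid 0, an assumption)
    and restricted according to mode_is_admin_only."""
    st = os.stat(path)
    return st.st_uid == 0 and mode_is_admin_only(st.st_mode, allow_group)
```

Under this assumed policy, a mode of `0600` passes, `0660` passes only when a dedicated admin group is allowed, and `0666` never passes.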
139149

140150
### What kind of API to chose?
141151

@@ -171,18 +181,26 @@ The PodResource API includes an entirely unrelated set of information that is
171181
unlikely to be of use to the set of clients that would benefit from
172182
understanding Pod readiness. We propose creating a new API for this purpose.
173183

184+
### User Stories
185+
* Some on node system workloads want to reduce Control Plane dependency and
186+
introduce locality for Pod’s state to improve reliability and scalability.
187+
* Custom monitoring tools may want to have local visibility into the state
188+
of Pods running on the same Node.
189+
* Some on node system workloads interested in Pod state want to reduce
190+
resource consumption by requesting only the fields they need.
191+
174192
### Risks and Mitigations
175193

176194
This API is read-only, which removes a large class of risks.
177195

178196
| Risk | Impact | Mitigation |
179197
| --------------------------------------------------------- | ------------- | ---------- |
180198
| Too many requests to the API impacting the Kubelet performances | High | Rate limiting the API. |
181-
| Misuse of the API | High | This API is Read-only. We will expose only a small portion of the pod's information related to the pod readiness. Exposed data does not contain sensitive information that could be used in a malicious way. |
182-
| Kubelet restart [issue](https://github.com/kubernetes/kubernetes/issues/100277) | High | This API should serve only complete information about workloads readiness. If Kubelet is in the init phase and not all pod’s readiness information is known, then the API should report the error. |
199+
| Misuse of the API | High | This API is Read-only. We will expose only a subset of the pod's information relevant for node-local consumers. Exposed data does not contain sensitive information that could be used in a malicious way. |
200+
| Kubelet restart [issue](https://github.com/kubernetes/kubernetes/issues/100277) | High | This API should serve only complete information about workloads. If Kubelet is in the init phase and not all pod’s information is known, then the API should report the error. |
183201
| Unauthorized access to the API | Medium | This API is designed to be accessed by all on-node workloads. Authorization will be provided by unix standard permissions to the socket file. |
184-
| Exposing the API to all workloads on the node | Medium | Exposed data via the API is limited to readiness information only. |
185-
| Kube-API is down or unreachable | Low | Kube-API availability should not impact this API. When the control plane is down or unreachable but Kubelet is working properly this API should return most recent data about local Pods readiness that are computed by Kubelet even if those data were not reported or accepted by Kube-API. |
202+
| Exposing the API to all workloads on the node | Medium | The unix socket will be secured with file permissions to restrict access to local admin users only. Exposed data via the API may contain sensitive information (e.g., environment variables, secrets references) present in the PodSpec, so access is tightly controlled. |
203+
| Kube-API is down or unreachable | Low | Kube-API availability should not impact this API. When the control plane is down or unreachable but Kubelet is working properly this API should return most recent data about local Pods that are computed by Kubelet even if those data were not reported or accepted by Kube-API. |
186204

187205
#### Control Plane availability issue
188206

@@ -217,75 +235,93 @@ the Pods. We don't want this API to return partial data.
217235

218236
## Design Details
219237

238+
### Pod State Selection
239+
240+
Kubelet maintains multiple representations of pod state:
241+
- The state derived from the Container Runtime Interface (CRI), reflecting the actual status of containers on the node.
242+
- The state that is sent to the API server, which may be subject to additional processing or delays.
243+
- The state received from the API server, reflecting the control plane's view.
244+
245+
It is important to define which state this API will serve. For the purposes of this API, the intent is to serve the most
246+
up-to-date and accurate pod state as known locally by the Kubelet, typically the state based on the CRI and Kubelet's internal
247+
reconciliation, rather than the potentially stale state received from the API server. This ensures consumers receive the
248+
freshest possible information about pods running on the node.
249+
220250
### Proposed API
221251

222-
We propose to add new gPRC API `status` in Kubelet, listening on a unix socket
252+
We propose to add new gRPC API `status` in Kubelet, listening on a unix socket
223253
at `/var/lib/Kubelet/status/Kubelet.sock`. The endpoint will be versioned. The
224-
gRPC Service will expose 3 methods serving local Pods statuses data:
254+
gRPC Service will expose 3 methods serving local Pods data:
225255

226256
```protobuf
257+
import "google/protobuf/field_mask.proto";
258+
227259
service PodStatus {
228-
// ListPodStatus returns a of List of PodStatus
260+
// ListPodStatus returns a list of PodInfo, filtered by field mask.
229261
rpc ListPodStatus(PodStatusListRequest) returns (PodStatusListResponse) {}
230-
// WatchPodStatus returns a stream of List of PodStatus
231-
// Whenever a pod state change api returns the new list
262+
// WatchPodStatus returns a stream of list of PodInfo, filtered by field mask.
263+
// Whenever a pod state changes, api returns the new list.
232264
rpc WatchPodStatus(PodStatusWatchRequest) returns (stream PodStatusWatchResponse) {}
233-
// GetPodStatus returns a PodStatus for given pod's UID
265+
// GetPodStatus returns a PodInfo for given pod's UID, filtered by field mask.
234266
rpc GetPodStatus(PodStatusGetRequest) returns (PodStatusGetResponse) {}
235267
}
236268
237-
// PodCondition aligns with v1.PodCondition.
238-
message PodCondition {
239-
PodConditionType Type = 1;
240-
ConditionStatus Status = 2;
241-
Timestamp LastProbeTime = 3;
242-
Timestamp LastTransitionTime = 4;
243-
string Reason = 5;
244-
string Message = 6;
269+
// PodInfo returns a Pod's details, spec, and status.
270+
message PodInfo {
271+
string podUID = 1;
272+
string podNamespace = 2;
273+
string podName = 3;
274+
bool static = 4;
275+
// Full PodSpec as defined in v1.PodSpec.
276+
PodSpec spec = 5;
277+
// PodStatus as defined in v1.PodStatus.
278+
PodStatus status = 6;
279+
// ...other fields as needed...
245280
}
246281
247-
// PodConditionType aligns with v1.PodConditionType
248-
enum PodConditionType {
249-
ContainersReady = 0;
250-
Initialized = 1;
251-
Ready = 2;
252-
PodScheduled = 3;
253-
DisruptionTarget = 4;
282+
// PodSpec and PodStatus should align with the Kubernetes API definitions.
283+
// For brevity, only a placeholder is shown here.
284+
message PodSpec {
285+
// ...fields from v1.PodSpec...
254286
}
255-
256-
// ConditionStatus aligns with v1.ConditionStatus
257-
enum ConditionStatus {
258-
True = 0;
259-
False = 1;
260-
Unknown = 2;
287+
message PodStatus {
288+
// ...fields from v1.PodStatus...
261289
}
262290
263-
// PodStatus returns a Pod details and list of status Conditions with deletion info.
264-
message PodStatus {
265-
string podUID = 1;
266-
string podNamespace = 2;
267-
string podName = 3;
268-
bool static = 4;
269-
repeated PodCondition conditions = 5;
270-
Timestamp DeletionTimestamp = 3;
291+
// PodStatusListRequest allows specifying a field mask.
292+
message PodStatusListRequest {
293+
google.protobuf.FieldMask field_mask = 1;
271294
}
272295
273-
// PodStatusResponse returns a stream of List of PodStatus.
274-
// Whenever a Pod state changes it will return the new list.
296+
// PodStatusListResponse returns a list of PodInfo.
275297
message PodStatusListResponse {
276-
// PodStatus includes the Readiness information of Pods.
277-
// In the future it may be extended to include additional information.
278-
repeated PodStatus Pods = 1;
298+
repeated PodInfo pods = 1;
279299
}
280300
281-
// PodStatusGetRequest contains Pods UID
301+
// PodStatusGetRequest contains Pod UID and optional field mask.
282302
message PodStatusGetRequest {
283303
string podUID = 1;
304+
google.protobuf.FieldMask field_mask = 2;
305+
}
306+
307+
// PodStatusGetResponse returns a PodInfo.
308+
message PodStatusGetResponse {
309+
PodInfo pod = 1;
284310
}
311+
312+
// ...other request/response messages as needed...
285313
```
286314

315+
The use of `google.protobuf.FieldMask` allows clients to specify which fields of the PodInfo message they are interested in,
316+
enabling lean and efficient responses. The full PodSpec and status can be returned for authorized local admin users.
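
To make the field mask behaviour concrete, here is a minimal sketch that filters a pod represented as a plain dictionary by dotted paths. It is not the Kubelet implementation. The function `apply_field_mask`, the dictionary layout, and the choice to treat an empty mask as "return everything" are assumptions made for illustration.

```python
from copy import deepcopy


def _lookup(message: dict, parts: list) -> tuple:
    """Walk a dotted path; return (found, value)."""
    current = message
    for part in parts:
        if not isinstance(current, dict) or part not in current:
            return False, None
        current = current[part]
    return True, current


def apply_field_mask(message: dict, paths: list) -> dict:
    """Return a copy of message holding only the fields named in paths.

    Assumption: an empty mask selects the whole message. Paths that do
    not exist in the message are ignored.
    """
    if not paths:
        return deepcopy(message)
    result = {}
    for path in paths:
        parts = path.split(".")
        found, value = _lookup(message, parts)
        if not found:
            continue
        target = result
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = deepcopy(value)
    return result


pod = {
    "podUID": "uid-1",
    "podName": "web",
    "spec": {"nodeName": "node-a"},
    "status": {"phase": "Running", "podIP": "10.0.0.5"},
}
lean = apply_field_mask(pod, ["podName", "status.phase"])
```

With the mask `["podName", "status.phase"]`, the result omits `spec` entirely, which is the case where a consumer avoids receiving potentially sensitive PodSpec content it did not ask for.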
317+
318+
### Test Plan
319+
320+
##### Prerequisite testing updates
321+
287322
##### Unit tests
288323

324+
##### Integration tests
289325

290326
##### e2e tests
291327

@@ -330,11 +366,11 @@ N/A
330366

331367
###### Does enabling the feature change any default behavior?
332368

333-
No.
369+
No, but the API will only be accessible to local admin users due to the sensitive nature of the full PodSpec.
334370

335371
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
336372

337-
Yes, through feature gates.
373+
Yes, through feature gates and by restricting/removing access to the UNIX socket.
338374

339375
###### What happens if we reenable the feature if it was previously rolled back?
340376

@@ -434,7 +470,7 @@ N/A.
434470

435471
###### What are other known failure modes?
436472

437-
The Kubelet might be in init phase when client call the API. The API should return well-known error message.
473+
The Kubelet might be in its init phase when a client calls the API. The API should return a well-known error message. Unauthorized access attempts will be prevented by UNIX socket file permissions.
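
A client could handle the init-phase failure mode by retrying with backoff. The sketch below assumes the well-known error is the gRPC FAILED_PRECONDITION (9) code mentioned in the Proposal. `FailedPrecondition` is a hypothetical stand-in exception so the example stays free of any gRPC library, and `call_with_retry` is an illustrative name.

```python
import time


class FailedPrecondition(Exception):
    """Hypothetical stand-in for a gRPC FAILED_PRECONDITION (code 9) error."""


def call_with_retry(call, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Invoke call() until it succeeds, retrying only on FailedPrecondition.

    The delay doubles after each failed attempt; the last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return call()
        except FailedPrecondition:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Only the init-phase error is retried here; any other exception propagates immediately, so a permission failure on the socket is not masked by the retry loop.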
438474

439475
###### What steps should be taken if SLOs are not being met to determine the problem?
440476

@@ -443,11 +479,12 @@ The API should be disabled using the feature gate.
443479
## Implementation History
444480

445481
- 2023-09-05: KEP created
482+
- 2025-09-30: KEP updated with new API and goals
446483

447484
## Drawbacks
448485

449486
## Alternatives
450487

451488
## Future work
452489

453-
This API is open to future extension but added information should be limited to pod's readiness information.
490+
This API is open to future extension but added information should be limited to pod's information relevant for node-local consumers. The use of field masks allows for future extensibility while maintaining efficient message sizes. The security model may be revisited if finer-grained access control is needed in the future.

keps/sig-node/4188-kubelet-pod-readiness-api/kep.yaml

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -13,7 +13,7 @@ reviewers:
1313
approvers:
1414
- "@dchen1107"
1515
- "@mrunalp"
16-
see-also:
16+
see-also: []
1717
replaces: []
1818

1919
# The target maturity stage in the current dev cycle for this KEP.
@@ -22,13 +22,13 @@ stage: alpha
2222
# The most recent milestone for which work toward delivery of this KEP has been
2323
# done. This can be the current (upcoming) milestone, if it is being actively
2424
# worked on.
25-
latest-milestone: "v1.29"
25+
latest-milestone: "v1.35"
2626

2727
# The milestone at which this feature was, or is targeted to be, at each stage.
2828
milestone:
29-
alpha: "v1.29"
30-
beta: "v1.30"
31-
stable: "v1.31"
29+
alpha: "v1.35"
30+
beta: "v1.36"
31+
stable: "v1.37"
3232

3333
# The following PRR answers are required at alpha release
3434
# List the feature gate name and the components for which it must be enabled
