## Summary

Proposal to add a new Kubelet gRPC API with an endpoint returning information about local Pods,
including the full PodSpec and status, not just readiness.
Serving that information from the Kubelet within a Node will increase reliability and reduce both the load on
the Kubernetes API Server and the traffic leaving the node. A connectivity issue between the Node and the
Control Plane should not impact workloads that depend on Pod state information.
Because the full PodSpec may contain sensitive information, access to this API will be secured by
restricting the UNIX socket to local admin users only.

## Motivation

Kubelet is responsible for running Health Checks (probes) and communicating the
results via Pod status. All that information is stored in the Kubelet cache and reported to
Kube-API. Right now, Pod state information is tightly coupled with the Kubernetes API
Server: when a workload wants to know the actual state of Pods running on the
Node, it needs to fetch it from Kube-API. This causes some issues:

* Reliability - for various reasons Kube-API might not be available
(connectivity issues, control plane updates).
* Scalability - adding new watchers to Kube-API is a scalability concern. By
exposing an endpoint that serves Pod state directly from the Kubelet
cache, on-node workloads can avoid additional dependencies on the
Kubernetes API Server.
* Flexibility - consumers may need more than just readiness, such as phase, IPs,
resource usage, labels/annotations, or the full PodSpec. Supporting field selection
enables lean, efficient queries, while the API can still provide the full Pod object when needed.
Because the full PodSpec may contain secrets or other sensitive data, access must be tightly controlled.

| Impact | Description |
| ------- | ------------ |
| + | Reliability - for various reasons Kube-API might not be available, but this doesn't mean that local workloads are not accessible; on-node system workloads should have the most recent data about Pod state even when Kube-API is unreachable. |
| + | Scalability - Reduce the load on Kube-API by reducing the number of watchers and using the Kubelet to fetch local Pod information. Listing only the Pods bound to one node is a costly operation for Kube-API. |
| + | Safety - The API is read-only, so clients cannot use it to modify Pod state. |
| + | Reduced resource consumption by workloads. Through Kube-API a workload fetches whole objects such as PodSpec or PodStatus, which may be unnecessary for some on-node workloads; with this API, workloads can reduce resource consumption by requesting only the fields they need. The API can also provide the full PodSpec for advanced use cases, but this is restricted to authorized local users. |
| - | This API will add load to the Kubelet (mitigation: the API will be rate limited). |
| - | Kubelet does not support RBAC authorization for gRPC. (mitigation: This API is designed to be accessible only to local admin users. The UNIX socket will be secured with file permissions to restrict access to privileged users, and all exposed data will be carefully reviewed.) |
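
The KEP states that the API will be rate limited but does not name an algorithm. The Python sketch below illustrates one common choice, a token bucket. The class names, the `rate`/`burst` parameters and the injected clock are hypothetical and not part of the proposal.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter: up to `burst` requests at once,
    refilled continuously at `rate` tokens per second.

    Not thread-safe; a concurrent server would guard `allow` with a lock.
    """

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False if the request must be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding the burst size.
        self.tokens = min(float(self.burst), self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


clock = FakeClock()
limiter = TokenBucket(rate=2.0, burst=3, clock=clock)
print([limiter.allow() for _ in range(4)])  # [True, True, True, False]
clock.t += 1.0  # one second later, two tokens have been refilled
print([limiter.allow() for _ in range(3)])  # [True, True, False]
```

A Kubelet-side handler could consult such a limiter before serving each request. When `allow()` returns `False`, it could reply with the gRPC `RESOURCE_EXHAUSTED` status code. Whether the limit would be global or per client is left open here.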

### Goals

The goal of this API is to expose comprehensive Pod information, including the full PodSpec and status, directly from the
source (the Kubelet), independent of Control Plane availability. This would remove the need for node-local
components to request this node-local information from
the Kubernetes API Server.

The API should allow consumers to request only the fields they need, using protobuf field masks, to enable efficient
and lean data transfer. For advanced use cases, the full PodSpec can be returned, but only to authorized local admin
users, because it may contain sensitive information.

| Risk | Impact | Mitigation |
| ---- | ------ | ---------- |
| Too many requests to the API impacting Kubelet performance | High | Rate limiting the API. |
| Misuse of the API | High | This API is read-only. We will expose only the subset of the Pod's information relevant to node-local consumers. Because that subset may include sensitive PodSpec fields, access is restricted to local admin users through UNIX socket file permissions. |
| Kubelet restart [issue](https://github.com/kubernetes/kubernetes/issues/100277) | High | This API should serve only complete information about workloads. If the Kubelet is in its init phase and not all Pods' information is known yet, the API should report an error. |
| Unauthorized access to the API | Medium | This API is designed to be accessed only by local admin users. Authorization will be provided by standard UNIX permissions on the socket file. |
| Exposing the API to all workloads on the node | Medium | The UNIX socket will be secured with file permissions to restrict access to local admin users only. Data exposed via the API may contain sensitive information present in the PodSpec (e.g., environment variables, references to Secrets), so access is tightly controlled. |
| Kube-API is down or unreachable | Low | Kube-API availability should not impact this API. When the control plane is down or unreachable but the Kubelet is working properly, this API should return the most recent data about local Pods as computed by the Kubelet, even if that data was not reported to or accepted by Kube-API. |
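
The socket-permission mitigation relies on ordinary file modes. On Linux, connecting to a UNIX stream socket requires write permission on the socket file. A mode of 0600 on a root-owned socket therefore limits access to root. The Python sketch below illustrates the idea. The function names and the temporary path are hypothetical stand-ins, not part of the proposal.

```python
import os
import socket
import stat
import tempfile


def create_restricted_socket(path: str) -> socket.socket:
    """Bind a UNIX stream socket at `path` and restrict it to its owner (mode 0600).

    Hypothetical sketch: a Kubelet running as root would make "owner" mean root.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Narrow the umask while binding so the socket file is never created
    # with group/other access. Note: umask is process-wide, not per thread.
    old_umask = os.umask(0o177)
    try:
        sock.bind(path)
    finally:
        os.umask(old_umask)
    os.chmod(path, 0o600)  # also set the mode explicitly
    sock.listen()
    return sock


def is_owner_only(path: str) -> bool:
    """Return True if `path` is a socket that grants no group/other permissions."""
    st = os.stat(path)
    return stat.S_ISSOCK(st.st_mode) and (st.st_mode & 0o077) == 0


tmpdir = tempfile.mkdtemp()
sock_path = os.path.join(tmpdir, "kubelet.sock")  # hypothetical stand-in path
server = create_restricted_socket(sock_path)
print(is_owner_only(sock_path))  # True
server.close()
os.unlink(sock_path)
os.rmdir(tmpdir)
```

A check like `is_owner_only` could also run at startup to refuse serving if the socket was created with broader permissions.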

#### Control Plane availability issue

## Design Details

### Pod State Selection

Kubelet maintains multiple representations of Pod state:

- The state derived from the Container Runtime Interface (CRI), reflecting the actual status of containers on the node.
- The state that is sent to the API server, which may be subject to additional processing or delays.
- The state received from the API server, reflecting the control plane's view.

It is important to define which state this API will serve. For the purposes of this API, the intent is to serve the most
up-to-date and accurate Pod state known locally by the Kubelet: typically the state based on the CRI and the Kubelet's internal
reconciliation, rather than the potentially stale state received from the API server. This ensures consumers receive the
freshest possible information about the Pods running on the node.
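
The selection rule above can be sketched as follows. The sketch assumes a hypothetical per-Pod record holding all three representations. Only the locally computed view is served. When that view is not yet known, the sketch raises an error instead of falling back to the API server's view, matching the requirement to serve only complete information. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PodState:
    phase: str
    ready: bool


@dataclass
class PodStateViews:
    """The three per-Pod representations described above (hypothetical names)."""

    local: Optional[PodState] = None     # from CRI + Kubelet reconciliation
    sent: Optional[PodState] = None      # last status the Kubelet reported to the API server
    received: Optional[PodState] = None  # last Pod object received from the API server


class PodStateUnknown(Exception):
    """Raised when the Kubelet has not yet computed a local state for the Pod."""


def served_state(views: PodStateViews) -> PodState:
    # Serve only the locally computed state: it reflects the node's actual containers
    # and does not depend on the API server having accepted or echoed back an update.
    if views.local is None:
        # Do not fall back to possibly stale API server views; report an error instead.
        raise PodStateUnknown("local pod state not yet known")
    return views.local


views = PodStateViews(
    local=PodState(phase="Running", ready=False),    # readiness probe just failed
    sent=PodState(phase="Running", ready=False),     # reported, maybe not yet accepted
    received=PodState(phase="Running", ready=True),  # control plane still shows Ready
)
print(served_state(views))  # PodState(phase='Running', ready=False)
```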

### Proposed API

We propose to add a new gRPC API `status` in the Kubelet, listening on a UNIX socket
at `/var/lib/kubelet/status/kubelet.sock`. The endpoint will be versioned. The
gRPC Service will expose 3 methods serving data about local Pods:

```protobuf
syntax = "proto3";

import "google/protobuf/field_mask.proto";

service PodStatus {
    // ListPodStatus returns a list of PodInfo, filtered by field mask.
    // ...
}

// PodInfo returns a Pod's details, spec, and status.
message PodInfo {
    string podUID = 1;
    string podNamespace = 2;
    string podName = 3;
    bool static = 4;
    // Full PodSpec as defined in v1.PodSpec.
    PodSpec spec = 5;
    // PodStatus as defined in v1.PodStatus.
    V1PodStatus status = 6;
    // ...other fields as needed...
}

// PodSpec and V1PodStatus should align with the Kubernetes API definitions.
// For brevity, only a placeholder is shown here.
message PodSpec {
    // ...fields from v1.PodSpec...
}

// Named V1PodStatus rather than PodStatus: messages and services share a single
// namespace, so a message called PodStatus would clash with the PodStatus service.
message V1PodStatus {
    // ...fields from v1.PodStatus...
}

// PodStatusListRequest allows specifying a field mask.
message PodStatusListRequest {
    google.protobuf.FieldMask field_mask = 1;
}

// PodStatusListResponse returns a list of PodInfo.
message PodStatusListResponse {
    repeated PodInfo pods = 1;
}

// PodStatusGetRequest contains a Pod UID and an optional field mask.
message PodStatusGetRequest {
    string podUID = 1;
    google.protobuf.FieldMask field_mask = 2;
}

// PodStatusGetResponse returns a PodInfo.
message PodStatusGetResponse {
    PodInfo pod = 1;
}

// ...other request/response messages as needed...
```

The use of `google.protobuf.FieldMask` allows clients to specify which fields of the PodInfo message they are interested in,
enabling lean and efficient responses. The full PodSpec and status can be returned for authorized local admin users.
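
The following Python sketch makes the field-mask behavior concrete. It applies `google.protobuf.FieldMask`-style paths (dot-separated field names) to a plain dictionary standing in for a `PodInfo` message. It is a stdlib-only approximation, not the protobuf library's implementation. The `apply_field_mask` helper and the sample values are hypothetical. So are the `status.phase`/`status.podIP` fields, since the placeholder status message defines no fields. Treating an empty mask as "return everything" is also an assumption.

```python
import copy
from typing import Any, Dict, List

_MISSING = object()  # sentinel for "path not present in the message"


def _lookup(message: Dict[str, Any], parts: List[str]) -> Any:
    """Follow `parts` through nested dicts; return _MISSING if any step is absent."""
    node: Any = message
    for name in parts:
        if not isinstance(node, dict) or name not in node:
            return _MISSING
        node = node[name]
    return node


def apply_field_mask(message: Dict[str, Any], paths: List[str]) -> Dict[str, Any]:
    """Return a deep copy of `message` restricted to the dot-separated `paths`.

    Naming a field keeps its whole subtree; unknown paths are ignored.
    An empty mask returns the full message (assumed convention).
    """
    if not paths:
        return copy.deepcopy(message)
    result: Dict[str, Any] = {}
    for path in paths:
        parts = path.split(".")
        value = _lookup(message, parts)
        if value is _MISSING:
            continue
        # Recreate the parent structure for nested paths, then copy the selected value.
        dst = result
        for name in parts[:-1]:
            dst = dst.setdefault(name, {})
        dst[parts[-1]] = copy.deepcopy(value)
    return result


pod = {  # hypothetical values
    "podUID": "1234",
    "podNamespace": "default",
    "podName": "web-0",
    "static": False,
    "spec": {"containers": [{"name": "app", "env": [{"name": "TOKEN", "value": "s3cr3t"}]}]},
    "status": {"phase": "Running", "podIP": "10.0.0.7"},
}
print(apply_field_mask(pod, ["podName", "status.phase"]))
# {'podName': 'web-0', 'status': {'phase': 'Running'}}
```

The lean mask in the example never touches `spec`. Sensitive values such as container environment variables are therefore not transferred unless a client explicitly asks for them.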

### Test Plan

##### Prerequisite testing updates

##### Unit tests

##### Integration tests

##### e2e tests

###### Does enabling the feature change any default behavior?

No. Note that the API will only be accessible to local admin users because of the sensitive nature of the full PodSpec.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, through the feature gate, and by restricting or removing access to the UNIX socket.

###### What happens if we reenable the feature if it was previously rolled back?

###### What are other known failure modes?

The Kubelet might be in its init phase when a client calls the API; in that case, the API should return a well-known error message. Unauthorized access attempts will be prevented by the UNIX socket file permissions.
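
One way to meet the "well-known error during init" requirement is to gate reads on an explicit sync flag. The sketch below is hypothetical; the class, method and exception names are illustrative. The cache raises a dedicated error until the Kubelet marks the initial sync as complete, so clients never see a partial list. A gRPC server could map that error to the `UNAVAILABLE` status code, which clients conventionally treat as retryable. The KEP does not prescribe a specific code.

```python
import threading
from typing import Dict, List


class KubeletInitializing(Exception):
    """Well-known error returned while the Kubelet has not finished its initial sync."""


class PodInfoCache:
    """Hypothetical cache that refuses to serve partial data during Kubelet startup."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # keeps updates and reads consistent across threads
        self._pods: Dict[str, dict] = {}
        self._synced = False

    def update(self, uid: str, info: dict) -> None:
        with self._lock:
            self._pods[uid] = info

    def mark_synced(self) -> None:
        # Called once the state of every Pod on the node has been recovered.
        with self._lock:
            self._synced = True

    def list_pods(self) -> List[dict]:
        with self._lock:
            if not self._synced:
                raise KubeletInitializing("kubelet is initializing; pod information is incomplete")
            return list(self._pods.values())


cache = PodInfoCache()
cache.update("1234", {"podName": "web-0"})
try:
    cache.list_pods()
except KubeletInitializing as err:
    print("error:", err)  # error: kubelet is initializing; pod information is incomplete
cache.mark_synced()
print(cache.list_pods())  # [{'podName': 'web-0'}]
```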

###### What steps should be taken if SLOs are not being met to determine the problem?

The API should be disabled using the feature gate.

## Implementation History

- 2023-09-05: KEP created
- 2025-09-30: KEP updated with new API and goals

## Drawbacks

## Alternatives

## Future work

This API is open to future extension, but added information should be limited to Pod information relevant to node-local consumers. Field masks let the API grow without inflating responses, because clients receive only the fields they request. The security model may be revisited if finer-grained access control is needed in the future.