  # EP-001: Redfish Power Monitoring Support

- - **Status**: Draft
+ - **Status**: Implemented
+ - **Maturity**: Experimental
  - **Author**: Sunil Thaha
  - **Created**: 2025-08-14
+ - **Updated**: 2025-08-28

  ## Summary

@@ -109,9 +111,25 @@ graph TD

  Implements standard Kepler patterns:

- - `service.Initializer`: Configuration and connection setup
- - `service.Runner`: Periodic power collection with context
- - `service.Shutdowner`: Clean resource release
+ - `service.Initializer`: Configuration and BMC connection setup
+ - `service.Runner`: Hybrid collection mode with context cancellation
+ - `service.Shutdowner`: Clean resource release and client disconnection
+
+ ### Implementation Details
+
+ **Demand-Based Architecture:**
+
+ - `LatestReading()`: Returns cached data or triggers collection if stale
+ - `ensureFreshData()`: Checks staleness and coordinates collection
+ - `synchronizedPowerRefresh()`: Thread-safe BMC data collection with retry logic
+ - Atomic pointers for thread-safe data access (`lastReading`, `lastUpdateTime`)
+ - Singleflight pattern prevents concurrent BMC API calls
+
+ **Service Lifecycle:**
+
+ - `Init()`: Establishes BMC connection, validates credentials
+ - `Run()`: Implements hybrid collection based on `interval` configuration
+ - `Shutdown()`: Gracefully disconnects from BMC
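
The demand-based read path listed above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not Kepler's code: `Reading`, `Monitor`, `fetch`, `now`, and `demoFetchCount` are hypothetical names, and a mutex with a second staleness check stands in for the singleflight pattern so that the example needs only the standard library.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Reading is a hypothetical power sample; the real type is not shown in the proposal.
type Reading struct {
	Watts float64
	At    time.Time
}

// Monitor caches the latest reading and refreshes it only when it is stale.
type Monitor struct {
	staleness time.Duration
	fetch     func() (float64, error) // stand-in for the BMC query (assumption)
	now       func() time.Time        // injectable clock so the demo is deterministic
	last      atomic.Pointer[Reading] // lock-free reads of the cached value
	mu        sync.Mutex              // serializes refreshes (stand-in for singleflight)
}

// fresh returns the cached reading if it is younger than the staleness threshold.
func (m *Monitor) fresh() *Reading {
	if r := m.last.Load(); r != nil && m.now().Sub(r.At) < m.staleness {
		return r
	}
	return nil
}

// LatestReading returns cached data, or triggers one collection if it is stale.
func (m *Monitor) LatestReading() (*Reading, error) {
	if r := m.fresh(); r != nil {
		return r, nil
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	// Re-check: another caller may have refreshed while we waited for the lock.
	if r := m.fresh(); r != nil {
		return r, nil
	}
	w, err := m.fetch()
	if err != nil {
		return nil, err
	}
	r := &Reading{Watts: w, At: m.now()}
	m.last.Store(r)
	return r, nil
}

// demoFetchCount issues n sequential requests, advancing a fake clock by step
// after each one, and reports how many times the BMC stand-in was queried.
func demoFetchCount(n int, staleness, step time.Duration) int {
	clock := time.Unix(0, 0)
	fetches := 0
	m := &Monitor{
		staleness: staleness,
		now:       func() time.Time { return clock },
		fetch: func() (float64, error) {
			fetches++
			return 250.0, nil
		},
	}
	for i := 0; i < n; i++ {
		if _, err := m.LatestReading(); err != nil {
			return -1
		}
		clock = clock.Add(step)
	}
	return fetches
}

func main() {
	n := demoFetchCount(4, 30*time.Second, 10*time.Second)
	fmt.Printf("4 requests, %d BMC fetches\n", n)
}
```

With a 30 s staleness threshold and one request every 10 s, only the first and fourth requests reach the BMC stand-in; the two in between are served from the cache.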

  ### Configuration

@@ -123,9 +141,15 @@ type Platform struct {
  }

  type Redfish struct {
-     Enabled    *bool  `yaml:"enabled"`
-     NodeID     string `yaml:"nodeID"`
-     ConfigFile string `yaml:"configFile"`
+     Enabled    *bool             `yaml:"enabled"`
+     NodeID     string            `yaml:"nodeID"`
+     ConfigFile string            `yaml:"configFile"`
+     Collection RedfishCollection `yaml:"collection"`
+ }
+
+ type RedfishCollection struct {
+     Staleness time.Duration `yaml:"staleness"` // Max age before forcing new collection
+     Interval  time.Duration `yaml:"interval"`  // Periodic collection interval (0 = on-demand only)
  }
  ```

@@ -135,6 +159,8 @@ type Redfish struct {
  --platform.redfish.enabled=true
  --platform.redfish.node-id=worker-1
  --platform.redfish.config=/etc/kepler/redfish.yaml
+ --platform.redfish.collection.staleness=30s
+ --platform.redfish.collection.interval=0
  ```
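
How an `interval` value maps to a collection mode can be sketched as below. This is an illustration only: `collectionMode` is a hypothetical helper, it assumes the values follow Go's `time.ParseDuration` syntax (`0`, `30s`, `1m`), and rejecting negative intervals is an assumption rather than documented Kepler behavior.

```go
package main

import (
	"fmt"
	"time"
)

// collectionMode reports which mode an interval value selects.
// Assumption: values use time.ParseDuration syntax, where a bare "0" is valid.
func collectionMode(interval string) (string, error) {
	d, err := time.ParseDuration(interval)
	if err != nil {
		return "", fmt.Errorf("invalid interval %q: %w", interval, err)
	}
	if d < 0 {
		// Assumption: a negative interval is treated as a configuration error.
		return "", fmt.Errorf("interval must not be negative: %s", d)
	}
	if d == 0 {
		return "on-demand", nil
	}
	return "hybrid", nil
}

func main() {
	for _, v := range []string{"0", "60s", "abc"} {
		mode, err := collectionMode(v)
		fmt.Println(v, mode, err)
	}
}
```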

  **Main Configuration (`hack/config.yaml`):**
@@ -147,6 +173,9 @@ platform:
      enabled: true
      nodeID: "worker-1" # Node identifier for BMC mapping
      configFile: "/etc/kepler/redfish.yaml"
+     collection:
+       staleness: 30s # Max age before forcing new collection
+       interval: 0 # Periodic collection interval (0 = on-demand only)
  ```

  **BMC Configuration (`/etc/kepler/redfish.yaml`):**
@@ -185,6 +214,47 @@ bmcs:

  ```

+ ## Collection Strategy
+
+ The Redfish service implements a **demand-based collection pattern** for optimal resource efficiency:
+
+ ### Collection Modes
+
+ 1. **On-demand Only** (`interval: 0`, default):
+    - No periodic BMC polling
+    - Data collected only when requested via `LatestReading()`
+    - Collection triggered when data is stale (older than `staleness` threshold)
+
+ 2. **Hybrid Mode** (`interval: >0`):
+    - Periodic collection at specified interval
+    - Plus on-demand collection when data is stale between intervals
+
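
The two modes above differ only in whether a ticker drives collection in the background. A minimal Go sketch of such a loop follows; `run`, `collect`, and `demoTicks` are hypothetical names, and the actual `Run()` implementation may be structured differently.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// run blocks until ctx is cancelled. With interval == 0 it performs no periodic
// work (on-demand only); otherwise it calls collect on every tick (hybrid mode).
func run(ctx context.Context, interval time.Duration, collect func()) error {
	if interval <= 0 {
		<-ctx.Done()
		return ctx.Err()
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			collect()
		}
	}
}

// demoTicks runs the loop for runMs milliseconds and counts periodic collections.
func demoTicks(intervalMs, runMs int) int {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(runMs)*time.Millisecond)
	defer cancel()
	ticks := 0
	// collect runs on the same goroutine as run, so ticks needs no locking.
	_ = run(ctx, time.Duration(intervalMs)*time.Millisecond, func() { ticks++ })
	return ticks
}

func main() {
	fmt.Println("on-demand ticks:", demoTicks(0, 30))
	fmt.Println("hybrid ticks:", demoTicks(5, 100))
}
```

In either mode, a request that arrives while the cached reading is older than `staleness` would still trigger its own collection; the loop only adds background refreshes on top of that.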
+ ### Resource Efficiency Benefits
+
+ - **Reduces BMC API calls**: No unnecessary polling when data isn't needed
+ - **Configurable freshness**: `staleness` parameter controls data age tolerance
+ - **Thread-safe**: Uses singleflight pattern to prevent concurrent BMC calls
+ - **Graceful degradation**: Service continues running despite BMC failures
+
+ ### Example Collection Behaviors
+
+ ```yaml
+ # Minimal BMC load - collect only when needed
+ collection:
+   staleness: 30s
+   interval: 0
+
+ # Balanced approach - periodic backup with on-demand freshness
+ collection:
+   staleness: 15s
+   interval: 60s
+
+ # Aggressive collection - frequent updates
+ collection:
+   staleness: 5s
+   interval: 10s
+ ```
+
  ## Metrics

  Platform-level metrics are introduced as a separate metric namespace to distinguish from
@@ -201,7 +271,8 @@ metal or within a VM. This separation enables:
  Energy counters (`kepler_platform_joules_total`) are not supported because:

  - Redfish does not provide native energy counters
- - BMC polling is intermittent (every 10 seconds) vs continuous monitoring
+ - BMC polling is intermittent and on-demand vs continuous monitoring
+ - Collection frequency varies based on demand and configuration

  ```prometheus
  # Platform power metrics (bare metal power consumption)