
sthaha (Collaborator) commented Aug 12, 2025

Introduces an enhancement proposal for adding MSR (Model Specific Register) support as a fallback mechanism when the Intel RAPL powercap sysfs interface is unavailable. This improves Kepler's deployment flexibility in environments with restricted powercap access.

The proposal includes:

  • Architecture design using powerReader abstraction
  • Security considerations for MSR access (PLATYPUS mitigation)
  • Phased implementation plan with backward compatibility
  • Configuration for opt-in MSR fallback behavior

github-actions (bot) added the docs label (Documentation changes) on Aug 12, 2025
sthaha force-pushed the feat-rapl-msr branch 5 times, most recently from 7fd5fd5 to f64731f on August 12, 2025 02:20
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
vimalk78 (Collaborator) commented:

Request: please do not use AI to generate enhancement proposals. A short, concise proposal can convey the intention better, IMHO.

Comment on lines +45 to +49
- **Primary Goal**: Implement MSR-based RAPL reading as automatic fallback when
powercap is unavailable
- **Secondary Goal**: Maintain existing CPUPowerMeter interface compatibility
- **Tertiary Goal**: Provide configurable control over fallback behavior for
security-conscious deployments
Collaborator:

Suggested change:

- Implement MSR-based RAPL reading as automatic fallback when
powercap is unavailable
- Maintain existing CPUPowerMeter interface compatibility
- Provide configurable control over fallback behavior for
security-conscious deployments

Comment on lines +105 to +109
style CPUPowerMeter fill:#e1f5fe
style raplPowerMeter fill:#b3e5fc
style powercapReader fill:#81d4fa
style msrReader fill:#ffccbc
style zoneAdapter fill:#c5e1a5
Collaborator:

Suggested change: delete the five style lines above (the suggestion has no replacement text).

Collaborator:

Make the text visible.

Comment on lines +340 to +342
kepler_node_package_energy_millijoule{node="node1"} 12345
kepler_node_core_energy_millijoule{node="node1"} 6789
kepler_node_dram_energy_millijoule{node="node1"} 3456
Collaborator:

What metrics are these?

Comment on lines +324 to +325
2. **Phase 2**: Enable MSR fallback in staging environments
3. **Phase 3**: Gradual rollout to production with monitoring
Collaborator:

What is meant by "staging environment" and "rollout to production"?

style zoneAdapter fill:#c5e1a5
```

### Key Design Choices
Collaborator:

Reading MSRs will require higher privileges.

sthaha (Collaborator, Author) commented Aug 13, 2025

> Request: please do not use AI to generate enhancement proposals. A short, concise proposal can convey the intention better, IMHO.

I agree; let me strip it down to the bare minimum and resubmit.

sthaha closed this Aug 13, 2025

2 participants