Provide real-time service metrics #1874
Conversation
Thanks for submitting this @rkojedzinszky. I'm always excited to see people from the community who want to make kube-router better! As a small note for next time: with larger change sets like this, it is typically best to open an issue with the project ahead of time, just to ensure that we don't waste your development time on a feature that might not be accepted. The issue also helps because the template usually guides users through describing the business value of the change a little better. In this case, what is the real-world problem that you're trying to solve? Is it fair to say that you're seeing an adverse impact on minimal systems from IPVS metric collection when it is attached to the service update timer? Are you able to quantify this? What type of intervals do you use after this update is made? How has it impacted your cluster?
Closes #1875
@aauren I really appreciate your attention; I've created an issue for this. Sorry for being hasty.
Thanks again for this PR @rkojedzinszky!
I really appreciate the thought and time that you put into this PR, and I agree with the use-case. Just because people have long (or short) service sync times doesn't mean that it should change the granularity of the metrics produced. I really like this approach of producing the metrics just in time.
Most of the review (other than requesting that the metric logic be pulled out of the network_services_controller.go file) is questions and a genuine query for your thoughts. Let me know what you think!
@@ -725,7 +720,21 @@ func (nsc *NetworkServicesController) syncIpvsFirewall() error {
	return nil
}

func (nsc *NetworkServicesController) publishMetrics(serviceInfoMap serviceInfoMap) error {
func (*NetworkServicesController) Describe(ch chan<- *prometheus.Desc) {
Over time I've been trying to decrease the size of the controller files. Let's go ahead and move this out of this file and into a dedicated metrics.go file inside the controllers/proxy package, to separate the concerns a bit and reduce the complexity of this file.
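For illustration, a minimal sketch of what such a dedicated metrics.go implementing the prometheus.Collector interface could look like; the metric name, labels, and helper shown here are assumptions for the example, not taken from this PR or from kube-router's actual metric set:

// metrics.go (sketch only)
package proxy

import (
	"github.com/prometheus/client_golang/prometheus"
)

// serviceConnectionsDesc is a hypothetical descriptor used only for this sketch.
var serviceConnectionsDesc = prometheus.NewDesc(
	"kube_router_service_connections",
	"Active connections for an IPVS service (illustrative)",
	[]string{"namespace", "service_name"}, nil,
)

// Describe sends the static metric descriptors to Prometheus.
func (nsc *NetworkServicesController) Describe(ch chan<- *prometheus.Desc) {
	ch <- serviceConnectionsDesc
}

// Collect reads IPVS statistics at scrape time and emits one metric per service.
func (nsc *NetworkServicesController) Collect(ch chan<- prometheus.Metric) {
	// for each service: read its IPVS stats, then emit a const metric, e.g.
	// ch <- prometheus.MustNewConstMetric(serviceConnectionsDesc,
	//     prometheus.GaugeValue, float64(conns), namespace, name)
}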
Sure, I will.
protocol = convertSvcProtoToSysCallProto(svc.protocol)
for _, ipvsSvc := range ipvsSvcs {
serviceMap := map[svcMapKey]*serviceInfo{}
I like what you did here with the map instead of having the embedded for loops!
However, this Collect() function is still a bit large. Let's go ahead and separate this serviceMap building into its own function. We could potentially separate out the second for loop into its own function as well and then just call both from Collect(); a rough sketch of the split follows below.
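As a sketch of that refactor (the helper names buildServiceMetricsMap and emitServiceMetrics are illustrative, not existing functions in the PR):

func (nsc *NetworkServicesController) Collect(ch chan<- prometheus.Metric) {
	// first loop: index the IPVS services into a lookup map
	serviceMap := nsc.buildServiceMetricsMap()
	// second loop: walk the map and publish one metric set per service
	nsc.emitServiceMetrics(serviceMap, ch)
}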
Sure, I will.
@@ -115,7 +118,7 @@ type NetworkServicesController struct {
	krNode utils.NodeAware
	syncPeriod time.Duration
	mu sync.Mutex
	serviceMap serviceInfoMap
	serviceMap unsafe.Pointer
This comment is semi-related to the request to move metrics out of this file. If possible, I would really like to remove the usage of unsafe pointers in this design.
After thinking about this a bit, what would you think of a pattern that looks more like the following?
// MetricsCollector owns the latest serviceInfoMap and serves per-service
// metric requests over channels, so no shared-memory synchronization is needed.
type MetricsCollector struct {
	serviceMapChan chan serviceInfoMap
	metricsChan    chan MetricsRequest
}

// MetricsRequest asks for the metrics of a single service; the answer is
// delivered on the Response channel.
type MetricsRequest struct {
	ServiceName string
	Response    chan ServiceMetrics
}

type ServiceMetrics struct {
	ServiceName string
	Connections int
	Packets     int
	// ... other metrics
}

func NewMetricsCollector() *MetricsCollector {
	mc := &MetricsCollector{
		serviceMapChan: make(chan serviceInfoMap, 1),
		metricsChan:    make(chan MetricsRequest),
	}
	go mc.metricsManager()
	return mc
}

// metricsManager is the single goroutine that owns the current service map:
// it either swaps in a newly synced map or answers a metrics request.
func (mc *MetricsCollector) metricsManager() {
	var currentMap serviceInfoMap
	for {
		select {
		case newMap := <-mc.serviceMapChan:
			currentMap = newMap
		case req := <-mc.metricsChan:
			// Calculate metrics for the specific service
			if service, exists := currentMap[req.ServiceName]; exists {
				req.Response <- mc.calculateMetrics(service)
			} else {
				req.Response <- ServiceMetrics{ServiceName: req.ServiceName}
			}
		}
	}
}

func (mc *MetricsCollector) calculateMetrics(service *serviceInfo) ServiceMetrics {
	// Implement IPVS metrics calculation here
	return ServiceMetrics{
		ServiceName: service.name,
		Connections: mc.getIPVSConnections(service),
		Packets:     mc.getIPVSPackets(service),
	}
}

// GetServiceMetrics is called from the scrape path and blocks until the
// manager goroutine answers.
func (mc *MetricsCollector) GetServiceMetrics(serviceName string) ServiceMetrics {
	responseChan := make(chan ServiceMetrics, 1)
	mc.metricsChan <- MetricsRequest{
		ServiceName: serviceName,
		Response:    responseChan,
	}
	return <-responseChan
}

// UpdateServiceMap is called from the service sync loop with the latest map.
func (mc *MetricsCollector) UpdateServiceMap(serviceMap serviceInfoMap) {
	mc.serviceMapChan <- serviceMap
}
This is just some pseudo-code, so don't take it as the be-all and end-all approach to implementing a metric collector. I'm mostly just trying to show an approach that uses channels over atomic references.
That keeps the type safety of the serviceInfoMap, eliminates the possibility of a dangling pointer reference (even if that possibility is slim in your implementation), and is more idiomatic and testable.
Generally curious about your thoughts on this approach though.
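For what it's worth, a hypothetical wiring of that collector might look like the snippet below; the variable latestServiceInfoMap and the service key format are illustrative assumptions, not names from kube-router:

mc := NewMetricsCollector()

// After each service sync, hand the freshly built map to the collector.
mc.UpdateServiceMap(latestServiceInfoMap)

// At scrape time, ask for a single service's metrics on demand.
m := mc.GetServiceMetrics("default/my-service") // hypothetical key format
fmt.Printf("%s: %d connections, %d packets\n", m.ServiceName, m.Connections, m.Packets)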
Here the purpose of the unsafe pointer was simply to make the pointer read/write operations atomic and avoid using a mutex. For readability, of course, a simple mutex and a typed pointer could be used instead. I find your proposal too complex here; I didn't quite catch the point.
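For context, a minimal sketch of the atomic-swap pattern being described, assuming the serviceMap unsafe.Pointer field from the diff above; the helper names are illustrative and not part of the PR:

import (
	"sync/atomic"
	"unsafe"
)

// writer side, e.g. at the end of a service sync: swap in the new map atomically
func (nsc *NetworkServicesController) storeServiceMap(m serviceInfoMap) {
	atomic.StorePointer(&nsc.serviceMap, unsafe.Pointer(&m))
}

// reader side, e.g. inside Collect(): load the current map atomically, no mutex
func (nsc *NetworkServicesController) loadServiceMap() serviceInfoMap {
	p := (*serviceInfoMap)(atomic.LoadPointer(&nsc.serviceMap))
	if p == nil {
		return nil
	}
	return *p
}

On Go 1.19+, atomic.Pointer[serviceInfoMap] would give the same behavior with type safety and without the unsafe package.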
I personally feel that using channels is just more idiomatic and understandable for future use-cases.
But it's fine for now. If you don't mind, just add a comment above the unsafe pointer declarations with this info in it for now. We can see how it goes and refactor later if needed.
@rkojedzinszky I think that we're almost there with this one. We just need to separate the logic out into a metrics.go file, break up the Collect function a bit, and then add a comment above the unsafe pointer declaration. Let me know if you run into any time constraints and I can make the changes as well. I think this will be a good one to get in for the rest of the community.
I am a bit busy now; I'll be able to work on that in the next two weeks.
Sounds good!
Service metrics are now collected during scraping rather than being prepopulated. This allows more frequent collection and is no longer bound to the service synchronization period. For example, on a Raspberry Pi cluster, CPU cycles can be saved by using longer synchronization intervals while metrics can still be collected in real time.
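To illustrate what scrape-time collection means in Prometheus terms (this registration code is a sketch under assumptions, not kube-router's actual metrics setup): once the controller implements Describe and Collect, registering it as a collector means every /metrics scrape triggers a fresh IPVS read.

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func serveMetrics(nsc *NetworkServicesController) {
	reg := prometheus.NewRegistry()
	// nsc satisfies prometheus.Collector via its Describe/Collect methods,
	// so each scrape calls Collect and reads IPVS stats on demand.
	reg.MustRegister(nsc)
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	_ = http.ListenAndServe(":8080", nil) // port is illustrative
}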