-
Notifications
You must be signed in to change notification settings - Fork 1.7k
fix(observability): Aggregate metrics #5174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Apologies @ktff, I'm playing with this locally and got trigger-happy with the 'push' on a master merge that was required to fix the jemalloc issue on macOS. It shouldn't materially change the PR but it wasn't intentional. |
@MOZGIII will likely have more constructive feedback on the use of |
@leebenson no problem. |
It looks like it is too early to thing about a generic algorithm for all of the metrics that we gather - until we have an actual use case that would force us to think and understand how to use the metrics, we simply don't have the basis to root our assumptions on how the metrics should be processed. I hope this would be somewhat resolved after we take care of the grafana dashboard (#4838). Speaking of This problem needs further analysis. Quite possible we should prohibit the use of transforms as parts of other components as-is, and instead extract the core implementations of the reusable transforms into a set of building blocks that both respective transforms and other components can use. Then we'll only implement the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this approach is right. Maybe I don't understand what we're trying to solve here. I've explained my understanding of the problem above.
I'm looking forward to discussing this in more detail if needed.
@MOZGIII I'l try to explain how I've imagined
This can be achieved by aggregating counters for metrics that are the equal except in origin. For those metrics that shouldn't be double counted we choose one of the values, and the rest we add up.
I agree, but we can at least solve the double counting for those two metrics for current case.
That would be clearer, and we can relax that by only requiring For
Why would that be a problem? |
Got it, what you outlined matches what I have in mind. Great!
Yep, I saw this is what the code does currently, however it's not clear to me how do we pick the right one of all those counters. Looks like currently, it's an arbitrary pick, which doesn't really solve the issue - we might present the wrong values to the user. With the
Sounds good!
Precisely because of the duplicating emits. We have emits that we sort of tied to the component level entities ( As a side note, I realized there's no |
Then we are not just talking of prohibiting use of transforms, but also of sinks and sources. For extreme examples there are That's why for scenarios of where there are no metrics with Talking about this gave me an idea. We can handle metrics
Second is a simple change, and for first I don't know of any one source/sink that uses more than one, and if such case do arise we can deal with it later. The third point is the problematic one since it has fuzzy borders. This would achieve that only one No @MOZGIII what do you think? |
Ok, so I think the current approach in the PR is solid with its' main problem being the choice of counter when we don't have Plus, to ease the problem we can raise emitting of @MOZGIII what do you think? Do you see some additional issues? |
I think this was explored already and there were issues with this approach. CC @lukesteensen
Yeeah, for the lack of a better alternative - I guess this is fine for now. We should probably just wrap the "nested components" invocation with an extra span, adding a new label (for instance |
I guess the inline transforms will be an issue.
Yep, that's also my stance on this.
I'll explore if we can somehow automate this. |
It can at least be made pretty ergonomic, so if entering/exiting span for every event for every reused component isn't too costly this seams like the best option. Although I guess inline transforms will have this same problem/cost of spans so it won't be nothing new. @MOZGIII I agree, if you are for it we can try this approach. |
This I'm a bit worried about, and it seems like it might be relatively costly based on our other observations. We can try and compare the benches before/after. Other than that, I think this is the only correct solution on the table so far - so I am for it. |
Closing in favor of #5424 |
First part of solution for #4454
This is the more sensitive, disruptive part, so I would like to address it separately from the larger change of adding the label to all of the internal metrics.
PR aggregates same named and same tagged
Counter
metrics with different origin in the same component. Whereevents_processed_total
andprocessed_bytes_total
are aggregated:origin
the same ascomponent_type
, have priority, and the rest are aggregated by sum.
Open questions
origin
.