Prepare a Grafana Dashboard for internal_metrics gathered via Prometheus #4838
Comments
I think in a Grafana-only world (ignoring the Vector UI) you could have a "Vector System" dashboard with metrics from everything, with built-in drill-ins/links to agent/aggregator dashboards that show details specific to each. I don't know how different the metrics would be between those, though; it might make more sense for it to be a selector on a single dashboard: show everything by default, a drop-down to narrow that to agent or aggregator, and an additional drop-down to select a single instance. I'd be happy to work on/collaborate on this since I expect we'll primarily use Grafana for viz. |
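The selector idea above could be sketched as Grafana template variables. This is a rough illustration only, not an existing Vector dashboard: the `role` label, the `Prometheus` datasource name, and the use of `processed_event_total` as the metric to read label values from are all assumptions, and the helper names are hypothetical.

```python
import json

# Assumptions (not confirmed anywhere in this thread): the metric name, the
# "role" label, and the datasource name are placeholders for illustration.
METRIC = "processed_event_total"


def query_variable(name, label, query, datasource="Prometheus"):
    """Build one Grafana query variable that offers an "All" option."""
    return {
        "name": name,
        "label": label,
        "type": "query",
        "datasource": datasource,
        "query": query,
        "includeAll": True,  # default view shows everything
        "multi": False,
        "refresh": 1,  # re-run the query when the dashboard loads
        "current": {"text": "All", "value": "$__all"},
    }


def build_variables():
    """Role drop-down first, then an instance drop-down filtered by role."""
    role = query_variable("role", "Role", f"label_values({METRIC}, role)")
    instance = query_variable(
        "instance",
        "Instance",
        f'label_values({METRIC}{{role=~"$role"}}, instance)',
    )
    return {"list": [role, instance]}


templating = build_variables()
print(json.dumps(templating, indent=2))
```

The resulting dictionary would go under the `templating` key of a dashboard JSON, and panels could then filter with `{role=~"$role", instance=~"$instance"}`.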
Hi, I noticed that 0.11 added a host metrics source, which I think is capable of replacing node_exporter. I would love to do so but having this popular dashboard https://grafana.com/grafana/dashboards/1860 available out of the box is really nice. I think it would help adoption of that source in particular if there were a similarly provided dashboard for vector. |
@nivekuil Thanks! That's on our agenda to look at soon! |
We've set up the k8s dev environment that unblocks this. The only caveat for working on the dashboard that remains is a potential data loss until we implement vectordotdev/vector-k8s-dev-env#7 - so please do manual backups of your Grafana dashboards (that would survive the whole EKS cluster removal - i.e. a local copy of the dashboard json) until we implement cluster-wide backups. |
I really like this idea. It would close a feature gap we found in our recent comparison of Vector and Fluent Bit, since Fluent Bit has a Grafana dashboard. |
Taking inspiration from the node_exporter dashboard is a good idea! |
I think that a good part of the work here would be identifying what metrics
need to be displayed to understand the health and performance of Vector in
its different roles. For our use case, we are principally looking to have
Vector act as a log shipper on Kubernetes nodes to S3. We are going to
start going down the route of figuring out what metrics make sense in this
case, but if anyone has any suggestions I'd love to get their insights.
Once we've figured something out, I'll update this issue either way.
…On Sat, Jan 23, 2021 at 1:25 PM Matteo Baiguini ***@***.***> wrote:
Taking inspiration from the node_exporter dashboard
<https://grafana.com/grafana/dashboards/1860> is a good idea!
But as the Vector dashboard has to be created from scratch, I think it
would be a good idea to base it on the latest Grafana version (7.x) in order
to benefit from the new features!
|
I think having a single dashboard to cover all use cases might be tricky. Including basic things like memory usage, CPU usage, and disk usage for buffers(?) would be easy enough. Most/all stages should have similar bytes/events processed metrics (or at least there are outstanding issues to standardize some of the generic metrics). |
Right, I was sort of envisioning something like three dashboards to cover
the three general use cases of shipping, conversion, and indexing. IDK
exactly what any of that would look like yet :D
|
I think there still might be some awkwardness if there's a need for metrics that may be unique to particular sinks/sources |
In the next couple of weeks I will try to create a prototype :) Relating also to feature request #5363, can someone make me an example of |
Some searches that I am using:
There are some more use-cases that I have but there are no related metrics or I don't understand existing ones:
|
Related: #7971
I suspect today you'd need a third party exporter (kafka-exporter kminion lag-exporter). Though given how frequently kafka seems to be used in Vector setups, it may be nice to have a
|
@spencergilbert thank you for the references. Regarding monitoring offset lag, I think it makes more sense to have this metric on the client side (e.g. Kafka Connect also exposes this metric as a Kafka client). A similar case might also apply to other sources (e.g. AWS SQS). To make it simple to determine the status of a queue independently of the source, it could be a single metric for all sources. |
Do we have any update regarding a full dashboard? |
@jszwedko Is anyone from the Vector team working on it? Do you need help here? |
Hey @zamazan4ik - this issue isn't currently on our roadmap, but we'd be happy to review a community built dashboard. |
I'd love to see this for datadog |
Hey @mjperrone we're actually working on that right now, hopefully it won't be long until there's Vector Integration (dashboard, monitors, metrics)! |
That's exciting @spencergilbert. I can't wait! |
Are you working on an example dashboard for Vector? Just curious: could the dashboard you are working on right now be useful for |
@zamazan4ik I'm not 100% sure what you're asking, but the dashboard is currently populated by metrics available from the |
I have written an initial (early work in progress) Grafana dashboard here: #14369 Please check it and leave your thoughts about it there. |
Has there been any traction with this issue? |
We would prefer to have this kind of work live in a separate community repository. As such, this issue will only be resolved there. Ref: #14369 (comment) |
@bruceg Creating |
No there is not. |
Made a dashboard here. Note that I am only using the Vector buffer in-memory. Besides that, it displays all component metrics. Hope this helps: |
Motivation
In Kubernetes, Prometheus is usually used to gather metrics, and Grafana is used in conjunction with it to view the gathered metrics.
We want to provide an option to expose our internal metrics (internal_metrics source) via the prometheus metrics sink out of the box when deploying Vector into a Kubernetes environment (#3799), and a natural extension to that would be shipping a Grafana Dashboard out of the box as well.
The end goal is to make it so that when the user deploys Vector with internal metrics enabled, a dashboard with all Vector metrics immediately appears in Grafana. Zero configuration (on a preconfigured cluster, where Grafana Dashboard gathering is enabled) besides opting in to exposing internal_metrics and picking the way to hook into the Prometheus scraping (prometheus-native annotations or prometheus-operator-powered PodMonitor/ServiceMonitor).
Design
There are a few unknowns so far:
What metrics to include in the dashboard?
  - processed_event_total. But what else?
How to organize it (in the context of Helm charts)?
  - vector-agent, vector-aggregator, etc. charts?
  - vector-grafana-dashboard chart with a common dashboard, and make other charts depend on it?
Basically, we can go either way, and we can pick which one to use after we figure out how we want to architect the dashboard itself. Some design decisions may be constrained by the Helm charts layout too, so additional investigation is necessary here.
We should reuse the dashboard from https://github.com/timberio/vector-grafana and make it work great for this use case. This should reduce the estimate, but we need to discuss it.
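As a starting point for the "what metrics to include" question, here is a minimal sketch of a single throughput panel wrapped in a dashboard JSON. It is illustrative only and is not taken from the vector-grafana repository: the metric name is copied from the issue text as written, while the panel type, the `Prometheus` datasource name, the dashboard title, and the helper names are assumptions. Older Grafana versions may need `graph` instead of `timeseries` as the panel type.

```python
import json


def throughput_panel(metric="processed_event_total", window="5m"):
    """Per-instance event rate; the metric name follows the issue text."""
    expr = f"sum by (instance) (rate({metric}[{window}]))"
    return {
        "title": "Events processed per second",
        "type": "timeseries",  # assumption: may need "graph" on older Grafana
        "datasource": "Prometheus",  # assumption: placeholder datasource name
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 0},
        "targets": [
            {"expr": expr, "legendFormat": "{{instance}}", "refId": "A"}
        ],
    }


def build_dashboard(panels):
    """Wrap panels in a minimal dashboard JSON model."""
    return {
        "title": "Vector internal metrics",  # hypothetical title
        "time": {"from": "now-1h", "to": "now"},
        "panels": panels,
    }


dashboard = build_dashboard([throughput_panel()])
print(json.dumps(dashboard, indent=2))
```

Whichever Helm layout is chosen, the printed JSON is the artifact the chart would need to carry, so the dashboard content can be designed independently of the packaging decision.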