Open
Labels: docs (Documentation related), needs triage (Waiting to be triaged by maintainers)
Description
📚 Documentation
I only know of two places where DeviceStatsMonitor() is mentioned in the docs: here and here. Neither place documents what any of the metrics it logs mean!
- e.g. "active.all.current": what does "active" mean? Is this memory or compute? What units is it in?
- Or "active.large_pool.current": what is the difference between the large pool and the small pool?
- Furthermore, here it says "ensure that you’re using the full capacity of your accelerator (GPU/TPU/HPU). This can be measured with the DeviceStatsMonitor()". This implies it can measure GPU utilization, but it does not say which metric records that.
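
For context, the key names above look like the ones returned by `torch.cuda.memory_stats()`, so a possible starting point for the missing documentation is sketched below. It assumes (not confirmed in the Lightning docs) that on CUDA devices `DeviceStatsMonitor()` forwards those statistics unchanged. `describe_metric` is a hypothetical helper, not part of Lightning or PyTorch, and the descriptions are paraphrased from the PyTorch documentation for `torch.cuda.memory_stats()`. If that assumption holds, all of these keys describe the caching memory allocator, and none of them reports compute utilization.

```python
# Hypothetical helper: decode a key such as "active.large_pool.current",
# assuming it follows the "<stat>.<pool>.<metric>" layout used by
# torch.cuda.memory_stats(). Descriptions are paraphrased from the PyTorch docs.

STAT_MEANING = {
    "allocation": ("allocation requests received by the memory allocator", "count"),
    "allocated_bytes": ("allocated memory", "bytes"),
    "segment": ("segments reserved from cudaMalloc()", "count"),
    "reserved_bytes": ("reserved memory", "bytes"),
    "active": ("active memory blocks", "count"),
    "active_bytes": ("active memory", "bytes"),
    "inactive_split": ("inactive, non-releasable memory blocks", "count"),
    "inactive_split_bytes": ("inactive, non-releasable memory", "bytes"),
}

POOL_MEANING = {
    "all": "all memory pools combined",
    "large_pool": "pool for large allocations (documented as >= 1 MB)",
    "small_pool": "pool for small allocations (documented as < 1 MB)",
}

METRIC_MEANING = {
    "current": "current value",
    "peak": "maximum value so far",
    "allocated": "historical total increase",
    "freed": "historical total decrease",
}


def describe_metric(key: str) -> dict:
    """Split a '<stat>.<pool>.<metric>' key and look up each part."""
    parts = key.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected '<stat>.<pool>.<metric>', got {key!r}")
    stat, pool, metric = parts
    if stat not in STAT_MEANING or pool not in POOL_MEANING or metric not in METRIC_MEANING:
        raise ValueError(f"unrecognized metric key: {key!r}")
    meaning, unit = STAT_MEANING[stat]
    return {
        "stat": meaning,
        "unit": unit,
        "pool": POOL_MEANING[pool],
        "metric": METRIC_MEANING[metric],
    }


if __name__ == "__main__":
    for name in ("active.all.current", "active.large_pool.current"):
        print(name, "->", describe_metric(name))
```

Under this reading, "active.all.current" would be a count of memory blocks rather than a byte value or a compute figure. The byte-valued counterpart would be "active_bytes.all.current".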
jpata