
Conversation

@cthi (Contributor) commented Nov 11, 2025

If the input tensors are on a device other than the current device, the wrong device is used for operations such as workspace allocation (when using cutlass::device_memory::allocation), and the kernel runs on the wrong stream. Either breaks the kernel. As a fix, we add a CUDAGuard to ensure the correct device is used.

  • cutlass::device_memory::allocation is a wrapper around cudaMalloc, which bypasses PyTorch's CUDA caching allocator (CCA). We replace all usages with torch tensor allocation instead, which is less error-prone and allows proper memory reuse.

Differential Revision: D86768064


netlify bot commented Nov 11, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

🔨 Latest commit: 722f8b6
🔍 Latest deploy log: https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/69138f65a8a80c00083158a9
😎 Deploy Preview: https://deploy-preview-5113--pytorch-fbgemm-docs.netlify.app

@meta-cla meta-cla bot added the cla signed label Nov 11, 2025

meta-codesync bot commented Nov 11, 2025

@cthi has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86768064.

@cthi cthi changed the title Use torch allocation instead of cutlass::device_memory::allocation Add CUDAGuard to ensure correct device Nov 11, 2025
Summary:
X-link: facebookresearch/FBGEMM#2119


If the input tensors are on a device other than the current device, the wrong device is used for operations such as workspace allocation (when using `cutlass::device_memory::allocation`), and the kernel runs on the wrong stream. Either breaks the kernel. As a fix, we add a `CUDAGuard` to ensure the correct device is used.
- `cutlass::device_memory::allocation` is a wrapper around [`cudaMalloc`](https://github.com/NVIDIA/cutlass/blob/2252254ce2c3f11ef5cfff9721ebbe7bd62cf8cb/tools/util/include/cutlass/util/device_memory.h#L56), which bypasses PyTorch's CUDA caching allocator (CCA). We replace all usages with torch tensor allocation instead, which is less error-prone and allows proper memory reuse.

Differential Revision: D86768064

meta-codesync bot commented Nov 12, 2025

This pull request has been merged in 62bdc5f.
