Conversation

@jsisometa
Contributor

Summary:
This diff fixes cutlass_blackwell_fmha_custom_op.py so that it is fully functional and adds comprehensive tests for Blackwell FMHA (Fused Multi-Head Attention).

Changes Made:

1. Fixed cutlass_blackwell_fmha_custom_op.py

  • Added missing parameters to fmha_fwd: page_table, seqlen_k, window_size_left, window_size_right, bottom_right
  • Added missing parameters to fmha_bwd: softmax_scale, window_size_left, window_size_right, bottom_right, deterministic
  • Fixed parameter type issues: torch.ops.fbgemm.fmha_fwd/bwd expect int and bool types, not Optional[int] or Optional[bool]
  • Added proper default value handling:
    • window_size_left = -1 (default for no left window)
    • window_size_right = -1 (default for no right window)
    • bottom_right = True (default)
    • deterministic = False (default)
  • Updated _backward, _setup_context, and wrapper functions to properly pass all parameters
  • The custom op now correctly wraps torch.ops.fbgemm.fmha_fwd and torch.ops.fbgemm.fmha_bwd (a sketch of the wrapper pattern follows this list)
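
The default handling above amounts to resolving Optional arguments to concrete values before dispatching to the underlying op. Below is a minimal sketch of that pattern for the forward direction, assuming keyword names matching the parameters listed here; the real signature, argument order, and return values live in cutlass_blackwell_fmha_custom_op.py and may differ:

```python
from typing import Optional

import torch


def _fmha_fwd_wrapper(
    q: torch.Tensor,
    k: torch.Tensor,
    v: torch.Tensor,
    page_table: Optional[torch.Tensor] = None,
    seqlen_k: Optional[torch.Tensor] = None,
    window_size_left: Optional[int] = None,
    window_size_right: Optional[int] = None,
    bottom_right: Optional[bool] = None,
):
    # The registered op expects concrete int/bool values, so Optional
    # arguments are resolved to the documented defaults before the call.
    # The keyword names below mirror the parameters described in this diff
    # and are assumptions about the actual op schema.
    return torch.ops.fbgemm.fmha_fwd(
        q,
        k,
        v,
        page_table=page_table,
        seqlen_k=seqlen_k,
        window_size_left=-1 if window_size_left is None else window_size_left,
        window_size_right=-1 if window_size_right is None else window_size_right,
        bottom_right=True if bottom_right is None else bottom_right,
    )
```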

2. Created blackwell_fmha.py Test File

  • Structured following blackwell_gdpa.py as a reference
  • Uses cutlass_blackwell_fmha_custom_op (Cutlass implementation) for forward and backward passes
  • Compares against jagged_flash_attention_v2 (Triton JFA v2 implementation)
  • Tests BF16 dtype only (as specified)
  • Tests both forward outputs and backward gradients (dq, dk, dv)
  • Runs 10 random test configurations with varying batch sizes, sequence lengths, and numbers of heads
  • Uses the generate_jagged_data utility for test data generation (a comparison sketch follows this list)
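
The forward/backward comparison boils down to running both implementations on cloned, grad-enabled copies of the inputs, backpropagating a common loss, and comparing outputs and dq/dk/dv at BF16-appropriate tolerances. A minimal sketch, where cutlass_attn and reference_attn are hypothetical stand-ins for the Cutlass custom op and jagged_flash_attention_v2 (their real call signatures, jagged inputs, and tolerances come from the test file itself):

```python
import torch


def compare_fwd_bwd(cutlass_attn, reference_attn, q, k, v, atol=2e-2, rtol=2e-2):
    # Run each implementation on its own cloned, grad-enabled copies of the
    # inputs so the two backward passes do not interfere with each other.
    results = []
    for attn in (cutlass_attn, reference_attn):
        q_, k_, v_ = (t.detach().clone().requires_grad_(True) for t in (q, k, v))
        out = attn(q_, k_, v_)
        out.sum().backward()  # arbitrary scalar loss, just to populate gradients
        results.append((out, q_.grad, k_.grad, v_.grad))

    # Compare the forward output and dq/dk/dv; the tolerances here are only a
    # BF16-friendly guess, not the values used by the actual test.
    for test, ref in zip(results[0], results[1]):
        torch.testing.assert_close(test, ref, atol=atol, rtol=rtol)
```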

3. Updated BUCK Dependencies

  • Changed from //ads_mkl/ops:jfa to //ads_mkl/ops/triton:triton_jfa_v2
  • Added //ads_mkl/ops/utils:jfa_utils for data generation utilities
  • Changed from blackwell_attention_ops_gpu to blackwell_attention to include the Python bindings (a sketch of the resulting BUCK deps follows this list)
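
Taken together, the dependency changes would look roughly like the sketch below. Only the dep labels are taken from this diff; the rule kind, target name, and remaining attributes are placeholders:

```python
# Hypothetical BUCK target for the new test file; only the dep labels are
# from this diff, everything else is illustrative.
python_unittest(
    name = "blackwell_fmha",
    srcs = ["blackwell_fmha.py"],
    deps = [
        "//ads_mkl/ops/triton:triton_jfa_v2",  # replaces //ads_mkl/ops:jfa
        "//ads_mkl/ops/utils:jfa_utils",       # provides generate_jagged_data
        ":blackwell_attention",                # replaces blackwell_attention_ops_gpu
    ],
)
```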

Generated by [Confucius Code Assist (CCA)](https://www.internalfb.com/wiki/Confucius/Analect/Shared_Analects/Confucius_Code_Assist_(CCA)/)
[Session](https://www.internalfb.com/confucius?session_id=96622022-bc27-11f0-bdba-7c8c09f29af2&tab=Chat), [Trace](https://www.internalfb.com/confucius?session_id=96622022-bc27-11f0-bdba-7c8c09f29af2&tab=Trace)

Differential Revision: D86583157

@netlify

netlify bot commented Nov 10, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
|------|------|
| 🔨 Latest commit | bed170c |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/691364b0918a1a0008f92021 |
| 😎 Deploy Preview | https://deploy-preview-5108--pytorch-fbgemm-docs.netlify.app |

meta-cla bot added the cla signed label Nov 10, 2025
@meta-codesync
Contributor

meta-codesync bot commented Nov 10, 2025

@jsisometa has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86583157.

…pytorch#5108)

Summary:
X-link: facebookresearch/FBGEMM#2113

This diff fixes cutlass_blackwell_fmha_custom_op.py to be fully functional and adds comprehensive testing for Blackwell FMHA; the full list of changes is in the pull request description above.

Differential Revision: D86583157
@meta-codesync
Contributor

meta-codesync bot commented Nov 12, 2025

This pull request has been merged in 6350109.
