feat(mma): add fp16@fp16->fp32 mma and unit tests #101
feat(mma): add half-precision MMAs for automotive devices and training
Add FP16 variants of the matrix multiply operations, benefiting non-Hopper
devices such as NVIDIA Orin (sm_87) and Ada Lovelace (sm_89) automotive/edge
parts. Backward passes during training can also benefit, since these variants
provide higher precision than BF16 when it is needed.
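For a sense of the precision gap: FP16 keeps 10 mantissa bits where BF16 keeps only 7. A minimal host-side illustration (not part of this PR) comparing how the two formats round-trip the same value:

```cuda
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_bf16.h>

int main() {
    float x = 1.1f;
    // FP16 keeps 10 mantissa bits, BF16 only 7, so FP16 round-trips
    // values noticeably closer to the original.
    printf("fp16: %.7f\n", __half2float(__float2half(x)));          // 1.0996094
    printf("bf16: %.7f\n", __bfloat162float(__float2bfloat16(x)));  // 1.1015625
    return 0;
}
```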
Key changes:
- Wrap the `mma.sync.aligned.m16n8k16.row.col.f32.f16.f16.f32` PTX instruction (sketched below).
- Add `mma_AB`, `mma_ABt`, `mma_AtB`, `mma_AtBt` interfaces for `rt_base<half,...>` register tiles.
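For reference, a minimal inline-PTX sketch of the wrapped instruction; the function name and signature here are illustrative, not the PR's actual code (requires sm_80 or newer):

```cuda
#include <cstdint>

// Per-thread fragment sizes for m16n8k16 with f16 inputs / f32 accumulators:
// A = 4 x .b32 registers (8 halves), B = 2 x .b32 (4 halves), C/D = 4 x .f32.
__device__ inline void hmma_m16n8k16_f32(float (&d)[4],
                                         const uint32_t (&a)[4],
                                         const uint32_t (&b)[2],
                                         const float (&c)[4]) {
    asm volatile(
        "mma.sync.aligned.m16n8k16.row.col.f32.f16.f16.f32 "
        "{%0, %1, %2, %3}, "     // D: f32 accumulator out
        "{%4, %5, %6, %7}, "     // A: packed f16x2
        "{%8, %9}, "             // B: packed f16x2
        "{%10, %11, %12, %13};"  // C: f32 accumulator in
        : "=f"(d[0]), "=f"(d[1]), "=f"(d[2]), "=f"(d[3])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]),
          "f"(c[0]), "f"(c[1]), "f"(c[2]), "f"(c[3]));
}
```

Products are formed from FP16 inputs but accumulated in FP32, which is what distinguishes these variants from the pure-FP16 `f16.f16.f16.f16` form of the instruction.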
Tested on NVIDIA A100, Ada, and H100 platforms.
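A hedged sketch of how a kernel might call the new tile interfaces; the `rt_hf`/`rt_fl` aliases, the header name, and the exact `mma_AB(d, a, b, c)` call shape are assumptions inferred from the names in this PR, not verified against the library:

```cuda
#include "kittens.cuh"  // header name assumed
using namespace kittens;

// Hypothetical kernel fragment; tile types and call shape are assumptions.
__global__ void example_kernel(/* tile sources elided */) {
    rt_hf<1, 1> a, b;    // fp16 register tiles (B laid out per .row.col)
    rt_fl<1, 1> c, d;    // fp32 accumulator and destination tiles
    // ... load a, b, and c from global/shared memory ...
    mma_AB(d, a, b, c);  // d = a @ b + c: fp16 products, fp32 accumulation
}
```

The ABt/AtB/AtBt variants would follow the same shape with the corresponding operand transposes.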