neildhar (Contributor) commented Oct 9, 2025

Enabling `VECT_MUL` previously caused a regression, but the slowdown seems to come from the FMA vectorisation specifically. Enable it for multiplication only for now, which seems to be a performance win.

- if VECT_MUL:
+ # TODO: Figure out why vector FMA slows things down.
+ if VECT_MUL and False:
      qk = _fma_f32x2(qk, qk_scale, -m_ij[:, None])
Contributor

Wondering if we should make VECT_MUL an integer with each bit representing one vectorization so we can autotune.

Contributor

I like that idea.
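
For reference, the bitmask idea from the comment above could look roughly like the sketch below. This is not code from the PR: the bit assignments and the names `VECT_MUL_BIT`, `VECT_FMA_BIT`, `uses`, `candidate_masks` and `describe` are all hypothetical, chosen only to show how one integer could gate each vectorisation separately and how an autotuner could enumerate the combinations.

```python
# Hypothetical bit layout; the PR does not define one.
VECT_MUL_BIT = 1 << 0  # vectorised multiply
VECT_FMA_BIT = 1 << 1  # vectorised fused multiply-add
ALL_BITS = VECT_MUL_BIT | VECT_FMA_BIT


def uses(mask: int, bit: int) -> bool:
    """Return True if the given vectorisation bit is set in the mask."""
    return (mask & bit) != 0


def candidate_masks() -> list[int]:
    """Enumerate every on/off combination for an autotuner to sweep."""
    return list(range(ALL_BITS + 1))


def describe(mask: int) -> str:
    """Human-readable summary of which vectorisations a mask enables."""
    parts = []
    if uses(mask, VECT_MUL_BIT):
        parts.append("mul")
    if uses(mask, VECT_FMA_BIT):
        parts.append("fma")
    return "+".join(parts) if parts else "none"


if __name__ == "__main__":
    for mask in candidate_masks():
        print(mask, describe(mask))
```

Under this layout, the behaviour in this PR (multiplication vectorised, FMA not) would correspond to a mask with only `VECT_MUL_BIT` set. In a real kernel the mask would presumably be a compile-time constant, with each candidate value becoming one autotuning configuration; that wiring is left out here.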

3 participants