fps commented Jul 18, 2025

This saves between 0.5% and 1% of CPU time compared to the code without this change. In my own implementation I have it as a template parameter [1][2], but I wasn't sure how to add that to yours. I guess the 20 comparisons (one for each layer) pale in comparison to the matrix multiplication and addition that are saved. Making it a template parameter might save another small fraction of a percent, though :)

[1] https://github.com/fps/anna/blob/master/examples/nam_wavenet.cpp
[2] https://github.com/fps/anna/blob/master/include/anna/nam.hpp
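
A minimal sketch of the template-parameter variant, assuming C++17. `ToyLayer`, `Vec`, `Mat`, `kChannels` and the placeholder arithmetic are hypothetical stand-ins for illustration, not the actual WaveNetLayerT or the anna code. The idea is that the last layer's output is never read, so its matrix multiplication and addition can be compiled out instead of being skipped by a runtime comparison:

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins; the real code uses its own matrix types.
constexpr std::size_t kChannels = 4;
using Vec = std::array<float, kChannels>;
using Mat = std::array<std::array<float, kChannels>, kChannels>;

template <bool IsLast>
struct ToyLayer {
    Mat mix{};  // stands in for the weights applied to the layer output

    // Adds this layer's activation to `head`. Unless this is the last layer,
    // it also writes the input for the next layer to `out`.
    void process(const Vec& in, Vec& head, [[maybe_unused]] Vec& out) const {
        Vec z{};
        for (std::size_t i = 0; i < kChannels; ++i) {
            z[i] = 0.5f * in[i];  // placeholder for the convolution and activation
            head[i] += z[i];
        }
        if constexpr (!IsLast) {
            // Matrix multiplication plus residual addition. For IsLast == true
            // this branch is discarded at compile time, so no work is done and
            // `out` is left untouched.
            for (std::size_t i = 0; i < kChannels; ++i) {
                float acc = in[i];
                for (std::size_t j = 0; j < kChannels; ++j) {
                    acc += mix[i][j] * z[j];
                }
                out[i] = acc;
            }
        }
    }
};
```

With this shape, a stack would instantiate `ToyLayer<false>` for every layer except the final one and `ToyLayer<true>` for the final one, so there is no per-layer comparison left at runtime.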

@mikeoliphant
Owner

Yeah, the outputs from the last layer aren't used. Not a big performance impact, but every bit helps.

I'm inclined to try to do this with a template parameter (although it is unlikely to make much of a performance difference).

Should be able to get rid of the "arrayOutputs" matrix as well, since it is just there to catch the unused output from the last layer.

@fps
Author

fps commented Jul 24, 2025

Here's another small optimization opportunity:

headArray.setZero();
sets headArray to zero. The first WaveNetLayerT could use = instead of +=, which would make the zeroing unnecessary. Again, this could be done with a template parameter and an if constexpr (first). I have done so in my implementation:

https://github.com/fps/anna/blob/cdef14fe0b9e3bf96aef1b0b492e771071710348/include/anna/nam.hpp#L68
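
Roughly, the idea looks like the sketch below, assuming C++17. `accumulate_head`, `sum_three`, `HeadVec` and `kHeadSize` are hypothetical names made up for illustration and do not appear in either code base. The first contribution overwrites the head buffer and later ones accumulate, so whatever the buffer held before does not matter:

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the head buffer.
constexpr std::size_t kHeadSize = 4;
using HeadVec = std::array<float, kHeadSize>;

// Writes one layer's contribution into the head buffer.
// IsFirst == true assigns, IsFirst == false accumulates.
template <bool IsFirst>
void accumulate_head(const HeadVec& contribution, HeadVec& head) {
    for (std::size_t i = 0; i < kHeadSize; ++i) {
        if constexpr (IsFirst) {
            head[i] = contribution[i];   // overwrite, so no prior zeroing is needed
        } else {
            head[i] += contribution[i];  // later layers add on top
        }
    }
}

// Sums three contributions into a head buffer that is never cleared.
// `head` is taken by value only to keep the example easy to check.
inline HeadVec sum_three(const HeadVec& a, const HeadVec& b, const HeadVec& c,
                         HeadVec head) {
    accumulate_head<true>(a, head);
    accumulate_head<false>(b, head);
    accumulate_head<false>(c, head);
    return head;
}
```

Passing a buffer full of stale values to `sum_three` gives the same result as passing a zeroed one, which is the property that makes the separate zeroing step redundant.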

I'm still not beating yours though. Probably because of my (more costly) buffer handling ;)

@mikeoliphant
Owner

👍
