@ChALkeR commented Oct 18, 2025

This increases code size, but only slightly (unlike a full unroll, see #88 (comment)), and it keeps the code readable and auditable.

The main optimization is that a full copy into the B scratch buffer is not needed: only the few indices where writes and later reads intersect have to be saved.
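Assuming the "B copy" is the per-row scratch buffer used in Keccak's chi step (as in a `keccakP` that stores the 25 64-bit lanes as 32-bit halves in a `Uint32Array(50)`, 10 words per row), the idea can be sketched as below. Chi computes `lane[x] ^= ~lane[x+1] & lane[x+2]` (indices mod 5). If words are updated in order, only lanes 0 and 1 are overwritten before lanes 3 and 4 read them, so only those four words need saving. The function names `chiRowFullCopy`, `chiRowPartialCopy` and `fillState` are hypothetical; this is an illustration, not the PR's actual diff.

```typescript
// Hypothetical sketch: chi on one row of a Keccak state held as Uint32Array(50).
// Each 64-bit lane is two 32-bit words; row y starts at word y (y = 0, 10, ..., 40).
// Bitwise ops act on each half independently, so word i reads words i+2 and i+4 (mod 10).

// Baseline: copy the whole row into a scratch buffer B, then update in place.
function chiRowFullCopy(s: Uint32Array, y: number, B: Uint32Array): void {
  for (let x = 0; x < 10; x++) B[x] = s[y + x];
  for (let x = 0; x < 10; x++) s[y + x] ^= ~B[(x + 2) % 10] & B[(x + 4) % 10];
}

// Partial copy: words 0..3 (lanes 0 and 1) are the only ones overwritten
// before a later word reads them, so only they are saved.
function chiRowPartialCopy(s: Uint32Array, y: number): void {
  const b0 = s[y], b1 = s[y + 1], b2 = s[y + 2], b3 = s[y + 3];
  s[y] ^= ~s[y + 2] & s[y + 4];
  s[y + 1] ^= ~s[y + 3] & s[y + 5];
  s[y + 2] ^= ~s[y + 4] & s[y + 6];
  s[y + 3] ^= ~s[y + 5] & s[y + 7];
  s[y + 4] ^= ~s[y + 6] & s[y + 8];
  s[y + 5] ^= ~s[y + 7] & s[y + 9];
  s[y + 6] ^= ~s[y + 8] & b0; // word 0 already overwritten: use saved copy
  s[y + 7] ^= ~s[y + 9] & b1;
  s[y + 8] ^= ~b0 & b2;
  s[y + 9] ^= ~b1 & b3;
}

// Deterministic test data from a simple LCG (not cryptographic, just filler).
function fillState(seed: number): Uint32Array {
  const s = new Uint32Array(50);
  let x = seed >>> 0;
  for (let i = 0; i < 50; i++) {
    x = (Math.imul(x, 1664525) + 1013904223) >>> 0;
    s[i] = x;
  }
  return s;
}

// Apply both variants to all five rows of identical states.
const orig = fillState(1);
const a = orig.slice();
const b = orig.slice();
const B = new Uint32Array(10);
for (let y = 0; y < 50; y += 10) {
  chiRowFullCopy(a, y, B);
  chiRowPartialCopy(b, y);
}
```

Both variants should produce identical states; the partial version trades a 10-word copy per row for four locals, which is the kind of saving described above.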

This gives a ~1.4x speedup in Node.js.

Before (Node.js 22):

sha3_256(16) x 174,520 ops/sec @ 5μs/op (5μs..930μs)
sha3_512(16) x 177,022 ops/sec @ 5μs/op (5μs..97μs)
keccak_256(16) x 177,588 ops/sec @ 5μs/op (5μs..335μs)
keccak_512(16) x 177,683 ops/sec @ 5μs/op (5μs..100μs)
sha3_256(128) x 172,473 ops/sec @ 5μs/op (5μs..71μs)
sha3_512(128) x 89,095 ops/sec @ 11μs/op (11μs..135μs)
keccak_256(128) x 172,147 ops/sec @ 5μs/op (5μs..91μs)
keccak_512(128) x 89,421 ops/sec @ 11μs/op (11μs..364μs)
sha3_256(512) x 44,265 ops/sec @ 22μs/op (22μs..113μs)
sha3_512(512) x 22,739 ops/sec @ 43μs/op (43μs..199μs)
keccak_256(512) x 44,232 ops/sec @ 22μs/op (22μs..106μs)
keccak_512(512) x 22,713 ops/sec @ 44μs/op (43μs..167μs)
sha3_256(1024) x 22,274 ops/sec @ 44μs/op (44μs..140μs)
sha3_512(1024) x 12,161 ops/sec @ 82μs/op (81μs..169μs)
keccak_256(1024) x 22,280 ops/sec @ 44μs/op (44μs..132μs)
keccak_512(1024) x 12,157 ops/sec @ 82μs/op (81μs..208μs)
sha3_256(5120) x 4,720 ops/sec @ 211μs/op (210μs..366μs)
sha3_512(5120) x 2,537 ops/sec @ 394μs/op (391μs..567μs)
keccak_256(5120) x 4,733 ops/sec @ 211μs/op (210μs..394μs)
keccak_512(5120) x 2,535 ops/sec @ 394μs/op (391μs..551μs)
sha3_256(20480) x 1,191 ops/sec @ 839μs/op (834μs..1391μs)
sha3_512(20480) x 638 ops/sec @ 1568μs/op (1549μs..2ms)
keccak_256(20480) x 1,184 ops/sec @ 844μs/op (834μs..1377μs)
keccak_512(20480) x 638 ops/sec @ 1567μs/op (1549μs..2ms)
sha3_256(65536) x 371 ops/sec @ 2ms/op (2ms..3ms)
sha3_512(65536) x 200 ops/sec @ 4ms/op (4ms..5ms)
keccak_256(65536) x 372 ops/sec @ 2ms/op (2ms..3ms)
keccak_512(65536) x 200 ops/sec @ 4ms/op (4ms..5ms)

After (Node.js 22):

sha3_256(16) x 245,761 ops/sec @ 4μs/op (3μs..885μs)
sha3_512(16) x 249,377 ops/sec @ 4μs/op (3μs..88μs)
keccak_256(16) x 250,627 ops/sec @ 3μs/op (3μs..77μs)
keccak_512(16) x 250,627 ops/sec @ 3μs/op (3μs..91μs)
sha3_256(128) x 239,923 ops/sec @ 4μs/op (4μs..75μs)
sha3_512(128) x 126,358 ops/sec @ 7μs/op (7μs..131μs)
keccak_256(128) x 240,385 ops/sec @ 4μs/op (4μs..71μs)
keccak_512(128) x 126,326 ops/sec @ 7μs/op (7μs..86μs)
sha3_256(512) x 62,120 ops/sec @ 16μs/op (15μs..84μs)
sha3_512(512) x 32,258 ops/sec @ 31μs/op (30μs..152μs)
keccak_256(512) x 61,816 ops/sec @ 16μs/op (15μs..181μs)
keccak_512(512) x 32,174 ops/sec @ 31μs/op (30μs..144μs)
sha3_256(1024) x 31,422 ops/sec @ 31μs/op (31μs..113μs)
sha3_512(1024) x 17,243 ops/sec @ 57μs/op (57μs..197μs)
keccak_256(1024) x 31,282 ops/sec @ 31μs/op (31μs..101μs)
keccak_512(1024) x 17,279 ops/sec @ 57μs/op (57μs..196μs)
sha3_256(5120) x 6,673 ops/sec @ 149μs/op (148μs..303μs)
sha3_512(5120) x 3,621 ops/sec @ 276μs/op (274μs..453μs)
keccak_256(5120) x 6,686 ops/sec @ 149μs/op (148μs..388μs)
keccak_512(5120) x 3,613 ops/sec @ 276μs/op (274μs..452μs)
sha3_256(20480) x 1,683 ops/sec @ 594μs/op (588μs..1216μs)
sha3_512(20480) x 912 ops/sec @ 1096μs/op (1084μs..1667μs)
keccak_256(20480) x 1,685 ops/sec @ 593μs/op (588μs..1204μs)
keccak_512(20480) x 912 ops/sec @ 1096μs/op (1084μs..1595μs)
sha3_256(65536) x 527 ops/sec @ 1897μs/op (1878μs..2ms)
sha3_512(65536) x 285 ops/sec @ 3ms/op (3ms..4ms)
keccak_256(65536) x 524 ops/sec @ 1909μs/op (1877μs..2ms)
keccak_512(65536) x 286 ops/sec @ 3ms/op (3ms..4ms)

And a ~1.2-1.3x speedup in Hermes.

Before (Hermes on M3):

sha3_256(16) x 3878 ops/sec @ 257μs/op (0ns..1000μs)
sha3_512(16) x 3856 ops/sec @ 259μs/op (0ns..1000μs)
keccak_256(16) x 3854 ops/sec @ 259μs/op (0ns..1000μs)
keccak_512(16) x 3887 ops/sec @ 257μs/op (0ns..1000μs)
sha3_256(128) x 3489 ops/sec @ 286μs/op (0ns..1000μs)
sha3_512(128) x 1865 ops/sec @ 536μs/op (0ns..1000μs)
keccak_256(128) x 3772 ops/sec @ 265μs/op (0ns..1000μs)
keccak_512(128) x 1890 ops/sec @ 529μs/op (0ns..1000μs)
sha3_256(512) x 913 ops/sec @ 1095μs/op (999μs..2ms)
sha3_512(512) x 458 ops/sec @ 2ms/op (1999μs..3ms)
keccak_256(512) x 915 ops/sec @ 1092μs/op (999μs..2ms)
keccak_512(512) x 453 ops/sec @ 2ms/op (1999μs..3ms)
sha3_256(1024) x 452 ops/sec @ 2ms/op (1999μs..3ms)
sha3_512(1024) x 242 ops/sec @ 4ms/op (3ms..5ms)
keccak_256(1024) x 456 ops/sec @ 2ms/op (1999μs..3ms)
keccak_512(1024) x 242 ops/sec @ 4ms/op (3ms..7ms)
sha3_256(5120) x 94 ops/sec @ 10ms/op (9ms..11ms)
sha3_512(5120) x 50 ops/sec @ 19ms/op (19ms..20ms)
keccak_256(5120) x 95 ops/sec @ 10ms/op (9ms..11ms)
keccak_512(5120) x 50 ops/sec @ 20ms/op (18ms..21ms)
sha3_256(20480) x 24 ops/sec @ 42ms/op (41ms..43ms)
sha3_512(20480) x 13 ops/sec @ 78ms/op (78ms..79ms)
keccak_256(20480) x 24 ops/sec @ 42ms/op (41ms..43ms)
keccak_512(20480) x 13 ops/sec @ 78ms/op (77ms..79ms)
sha3_256(65536) x 7.4 ops/sec @ 134ms/op (133ms..135ms)
sha3_512(65536) x 4.0 ops/sec @ 251ms/op (250ms..252ms)
keccak_256(65536) x 7.4 ops/sec @ 135ms/op (133ms..136ms)
keccak_512(65536) x 4.0 ops/sec @ 251ms/op (251ms..252ms)

After (Hermes on M3):

sha3_256(16) x 5037 ops/sec @ 198μs/op (0ns..1000μs)
sha3_512(16) x 4974 ops/sec @ 201μs/op (0ns..1000μs)
keccak_256(16) x 4958 ops/sec @ 201μs/op (0ns..1000μs)
keccak_512(16) x 4801 ops/sec @ 208μs/op (0ns..4ms)
sha3_256(128) x 4852 ops/sec @ 206μs/op (0ns..1000μs)
sha3_512(128) x 2387 ops/sec @ 419μs/op (0ns..1000μs)
keccak_256(128) x 4785 ops/sec @ 208μs/op (0ns..1000μs)
keccak_512(128) x 2392 ops/sec @ 418μs/op (0ns..1000μs)
sha3_256(512) x 1153 ops/sec @ 867μs/op (0ns..1000μs)
sha3_512(512) x 569 ops/sec @ 1757μs/op (999μs..2ms)
keccak_256(512) x 1146 ops/sec @ 872μs/op (0ns..1000μs)
keccak_512(512) x 567 ops/sec @ 1762μs/op (999μs..2ms)
sha3_256(1024) x 562 ops/sec @ 1777μs/op (999μs..2ms)
sha3_512(1024) x 300 ops/sec @ 3ms/op (2ms..4ms)
keccak_256(1024) x 565 ops/sec @ 1768μs/op (999μs..2ms)
keccak_512(1024) x 299 ops/sec @ 3ms/op (2ms..4ms)
sha3_256(5120) x 116 ops/sec @ 8ms/op (8ms..9ms)
sha3_512(5120) x 62 ops/sec @ 16ms/op (16ms..17ms)
keccak_256(5120) x 116 ops/sec @ 8ms/op (8ms..9ms)
keccak_512(5120) x 61 ops/sec @ 16ms/op (16ms..17ms)
sha3_256(20480) x 29 ops/sec @ 34ms/op (33ms..35ms)
sha3_512(20480) x 16 ops/sec @ 64ms/op (64ms..66ms)
keccak_256(20480) x 29 ops/sec @ 34ms/op (33ms..36ms)
keccak_512(20480) x 16 ops/sec @ 64ms/op
sha3_256(65536) x 9.1 ops/sec @ 109ms/op (108ms..111ms)
sha3_512(65536) x 4.9 ops/sec @ 205ms/op (204ms..206ms)
keccak_256(65536) x 9.1 ops/sec @ 109ms/op (108ms..110ms)
keccak_512(65536) x 4.9 ops/sec @ 206ms/op (204ms..208ms)
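The headline speedups are ratios of ops/sec after vs. before. A small sketch that recomputes them for a few rows copied from the tables above (the row selection is arbitrary; `Row` and `speedup` are illustrative names):

```typescript
// Speedup = ops/sec after / ops/sec before, using values from the benchmark tables.
type Row = { name: string; before: number; after: number };

const node22: Row[] = [
  { name: "sha3_256(16)", before: 174520, after: 245761 },
  { name: "sha3_512(1024)", before: 12161, after: 17243 },
  { name: "sha3_256(65536)", before: 371, after: 527 },
];

const hermesM3: Row[] = [
  { name: "sha3_256(16)", before: 3878, after: 5037 },
  { name: "sha3_512(1024)", before: 242, after: 300 },
  { name: "sha3_256(65536)", before: 7.4, after: 9.1 },
];

function speedup(r: Row): number {
  return r.after / r.before;
}

for (const [label, rows] of [["Node.js 22", node22], ["Hermes on M3", hermesM3]] as const) {
  for (const r of rows) console.log(`${label} ${r.name}: ${speedup(r).toFixed(2)}x`);
}
```

For these rows the ratios come out at roughly 1.41 to 1.42x on Node.js and 1.23 to 1.30x on Hermes, consistent with the figures quoted above.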
