Improve [SD]SYEVD performance by parallelizing [SD]LAED3 #5355

tetsuzo-usui · 2025-07-01T14:02:45Z

This pull request introduces a parallelized version of the [SD]LAED3 routine, a key component of the [SD]SYEVD eigensolver for symmetric matrices. OpenBLAS replaces certain LAPACK routines with custom-parallelized versions, and this PR aligns with that strategy.

The [SD]SYEVD routine consists of three main steps:

1. Symmetric matrix tridiagonalization ([SD]SYTRD)
2. Tridiagonal eigensolver ([SD]STEDC)
3. Eigenvector transformation ([SD]ORMTR)

While PR #5221 improved [SD]SYTRD performance on arm64 by adding tuned [SD]SYMV kernels, this PR focuses on the second step, [SD]STEDC, by parallelizing the internal [SD]LAED3 routine.

Note that [SD]STEDC scales less well with increasing thread counts than [SD]SYTRD and [SD]ORMTR. As a result, the share of [SD]SYEVD execution time spent in [SD]STEDC grows as the thread count rises, as shown in the following graph.

The parallel [SD]LAED3 reduces the execution time of [SD]STEDC by approximately half in multi-threaded environments. This leads to an overall [SD]SYEVD performance improvement of 1.3x to 1.8x.
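As a rough cross-check of how the two figures above relate, the following sketch applies Amdahl's law. It assumes that only the [SD]STEDC step gets faster and that the other two steps are unchanged; the function names are hypothetical and not part of OpenBLAS.

```c
#include <assert.h>

/* Overall speedup when a fraction `f` of the original runtime is made
 * `s` times faster and the remainder is unchanged (Amdahl's law). */
double overall_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}

/* Inverse relation: the fraction of the original runtime that the
 * accelerated part must have occupied to yield an overall speedup of
 * `total` when that part alone becomes `s` times faster. */
double implied_fraction(double total, double s)
{
    assert(total >= 1.0 && s > 1.0);
    return (1.0 - 1.0 / total) / (1.0 - 1.0 / s);
}
```

With s = 2 (execution time exactly halved), `implied_fraction(1.3, 2.0)` and `implied_fraction(1.8, 2.0)` give [SD]STEDC shares of roughly 46% and 89% of the original [SD]SYEVD time. Because the halving is only approximate, these values are a back-of-the-envelope estimate rather than measured proportions.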

I understand that improvements at the LAPACK level are relatively rare in OpenBLAS. Therefore, the parallel [SD]LAED3 implementation has been carefully designed to minimize impact on the library’s structure and to adhere to OpenBLAS’s existing thread management. The parallelization is achieved by setting the necessary parameters in the 'blas_queue_t' structure and calling 'exec_blas(num_cpu, queue)'.
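The queue-based pattern described above can be illustrated with a minimal, self-contained sketch. It assumes the work is split into contiguous column ranges, one per thread, which the description does not spell out. The types and functions here (`demo_queue_t`, `demo_fill_queue`, `demo_exec`, `demo_mark_columns`) are hypothetical stand-ins, not OpenBLAS's `blas_queue_t` or `exec_blas`, and the executor runs the entries sequentially where a real one would dispatch them to worker threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one queue entry; NOT OpenBLAS's blas_queue_t. */
typedef struct {
    void (*routine)(int first, int last, void *data); /* work function */
    int first;   /* first column handled by this entry (inclusive) */
    int last;    /* one past the last column handled (exclusive) */
    void *data;  /* arguments shared by all entries */
} demo_queue_t;

/* Split columns [0, k) into at most `nthreads` contiguous ranges whose
 * sizes differ by at most one. Returns the number of entries filled. */
int demo_fill_queue(demo_queue_t *queue, int nthreads, int k,
                    void (*routine)(int, int, void *), void *data)
{
    assert(queue != NULL && nthreads > 0 && k >= 0);
    int used = 0, start = 0;
    for (int t = 0; t < nthreads && start < k; t++) {
        int remaining = nthreads - t;
        /* Ceiling division spreads the leftover columns evenly. */
        int width = (k - start + remaining - 1) / remaining;
        queue[used].routine = routine;
        queue[used].first = start;
        queue[used].last = start + width;
        queue[used].data = data;
        start += width;
        used++;
    }
    return used;
}

/* Sequential stand-in for an executor. A real thread pool would hand
 * each entry to a separate worker and wait for all of them. */
void demo_exec(int num, demo_queue_t *queue)
{
    for (int i = 0; i < num; i++)
        queue[i].routine(queue[i].first, queue[i].last, queue[i].data);
}

/* Example work function: counts how often each column is visited. */
void demo_mark_columns(int first, int last, void *data)
{
    int *visits = data;
    for (int j = first; j < last; j++)
        visits[j]++;
}
```

Filling the queue for 10 columns and 4 threads yields the ranges [0,3), [3,6), [6,8), [8,10). Since the ranges do not overlap, each entry can write its own columns without locking, provided the per-column work is independent.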

Add parallel laed3

14107e3

martin-frbg added this to the 0.3.31 milestone Jul 1, 2025

martin-frbg merged commit 36c2589 into OpenMathLib:develop Jul 2, 2025
97 of 101 checks passed
