tetsuzo-usui
Contributor

This pull request introduces a parallelized version of the [SD]LAED3 routine, a key component of the [SD]SYEVD eigensolver for symmetric matrices. OpenBLAS replaces certain LAPACK routines with custom-parallelized versions, and this PR aligns with that strategy.

The [SD]SYEVD routine consists of three main steps:

  1. Symmetric matrix tridiagonalization ([SD]SYTRD)
  2. Tridiagonal eigensolver ([SD]STEDC)
  3. Eigenvector transformation ([SD]ORMTR)

While PR #5221 improved [SD]SYTRD performance on arm64 by adding tuned [SD]SYMV kernels, this PR focuses on the second step, [SD]STEDC, by parallelizing the internal [SD]LAED3 routine.

Note that [SD]STEDC scales worse with thread count than [SD]SYTRD and [SD]ORMTR. As a result, its share of the total [SD]SYEVD execution time grows as the thread count increases, as shown in the following graph.

Figure: DSYEVD_thread_scalability

The parallel [SD]LAED3 reduces the execution time of [SD]STEDC by approximately half in multi-threaded environments. This leads to an overall [SD]SYEVD performance improvement of 1.3x to 1.8x.

Figure: DSYEVDperf_withParallelDLAED3

I understand that improvements at the LAPACK level are relatively rare in OpenBLAS. Therefore, the parallel [SD]LAED3 implementation has been carefully designed to minimize impact on the library’s structure and to adhere to OpenBLAS’s existing thread management. The parallelization is achieved by setting the necessary parameters in the `blas_queue_t` structure and calling `exec_blas(num_cpu, queue)`.

martin-frbg added this to the 0.3.31 milestone on Jul 1, 2025
martin-frbg merged commit 36c2589 into OpenMathLib:develop on Jul 2, 2025
97 of 101 checks passed