
Commit 334d10b

ChrisRackauckas-Claude authored
Fix Float16 segfault with Metal algorithms (#743) (#764)
* Fix Float16 segfault with Metal algorithms

  Add a compatibility check to prevent MetalLUFactorization and MetalOffload32MixedLUFactorization from being used with Float16 element types. Metal Performance Shaders only supports Float32, and attempting to use Float16 causes a segfault in MPSMatrixDecompositionLU.

  The fix adds an early check in test_algorithm_compatibility() that filters out Metal algorithms for Float16 before they are attempted, so LinearSolveAutotune gracefully skips them rather than crashing.

  Fixes: #743

* Expand Float16 compatibility rules for GPU algorithms

  Add compatibility checks to prevent Float16 usage with GPU algorithms that don't support it:

  - CUDA algorithms: CudaOffloadLUFactorization, CudaOffloadQRFactorization, and CudaOffloadFactorization don't support Float16, since the cuSOLVER factorization routines require Float32/Float64.
  - AMD GPU algorithms: AMDGPUOffloadLUFactorization and AMDGPUOffloadQRFactorization have limited/unclear Float16 support in rocSOLVER.
  - Metal algorithms: keep the existing MetalLUFactorization rule, but allow the mixed-precision MetalOffload32MixedLUFactorization since it converts inputs to Float32.

  Mixed-precision algorithms (*32Mixed*) are allowed because they internally convert inputs to Float32, which makes them compatible with Float16 inputs. This prevents segfaults, errors, or undefined behavior when attempting to use Float16 with GPU libraries that don't support it.

* Add comprehensive Float16 compatibility rules for sparse and specialized solvers

  Add compatibility checks for additional solver categories that don't support Float16:

  - Sparse factorization: UMFPACKFactorization and KLUFactorization from SuiteSparse don't support Float16 (currently limited to double precision, with single precision in development).
  - PARDISO solvers: all PARDISO variants (MKL/Panua) only support single/double precision.
  - CUSOLVERRF: requires Float64/Int32 types for sparse LU refactorization.

  Together these rules prevent attempting Float16 with:

  - all major GPU algorithms (CUDA, Metal, AMD),
  - sparse direct solvers (UMFPACK, KLU, PARDISO, CUSOLVERRF), and
  - BLAS-dependent dense algorithms (already covered by the existing BlasFloat check).

  Iterative/Krylov methods remain allowed: they are type-generic and only need matrix-vector products, which work with Float16.

* Fix manual BLAS wrapper compatibility for non-standard types

  Corrects an issue with the manual BLAS wrapper algorithms that Chris pointed out: BLISLUFactorization, MKLLUFactorization, and AppleAccelerateLUFactorization have explicit method signatures for only [Float32, Float64, ComplexF32, ComplexF64].

  Key fixes:

  1. Fixed the algorithm names in the compatibility rules (was "BLISFactorization", now "BLISLUFactorization").
  2. Added a separate check for manual BLAS wrappers that bypass Julia's BLAS interface.
  3. These algorithms use direct ccall() with hardcoded type signatures, so unsupported types like Float16 fail with a MethodError rather than going through a BlasFloat conversion.
  4. Updated the check to catch all non-BLAS types, not just Float16.

  This prevents MethodError crashes when autotune attempts these algorithms with unsupported numeric types, addressing the "manual BLAS wrappers" issue Chris identified in the original issue comment.

* Add OpenBLASLUFactorization to autotune and fix its compatibility

  Two fixes for the OpenBLAS direct wrapper:

  1. Added to autotune algorithm detection: OpenBLASLUFactorization was missing from get_available_algorithms() despite being a manual BLAS wrapper like MKL, BLIS, and AppleAccelerate. It is now included when OpenBLAS_jll.is_available().
  2. Added to the manual BLAS wrapper compatibility rules: OpenBLASLUFactorization has the same explicit method signatures as the other manual BLAS wrappers (Float32/Float64 and ComplexF32/ComplexF64 only) and would fail with a MethodError for Float16.

  This ensures the OpenBLAS direct wrapper is benchmarked alongside the other manual BLAS wrappers, protected from crashes with unsupported types like Float16, and treated consistently with the other manual BLAS wrapper algorithms.

* Add OpenBLAS_jll dependency to LinearSolveAutotune

  Fixes the missing OpenBLAS_jll dependency that was preventing OpenBLASLUFactorization from being detected and included in autotune benchmarks:

  - Added OpenBLAS_jll to the LinearSolveAutotune/Project.toml dependencies.
  - Added the OpenBLAS_jll import to LinearSolveAutotune.jl.
  - Set the compat entry OpenBLAS_jll = "0.3".

  This resolves the undefined-variable error when checking OpenBLAS_jll.is_available() in get_available_algorithms(), ensuring the OpenBLAS direct wrapper is included in autotune benchmarks alongside the other manual BLAS wrappers.

  Fixes: #764 (comment)

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Co-authored-by: ChrisRackauckas <accounts@chrisrackauckas.com>
Co-authored-by: Claude <noreply@anthropic.com>
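The compatibility-rule changes described above live in a file that is not reproduced in the hunks below. As a rough sketch of the kind of element-type filter the commit message describes (the function name `is_eltype_compatible` and the exact rule set are illustrative assumptions, not the actual LinearSolveAutotune code):

```julia
# Illustrative sketch only: names and rule set are assumptions based on the
# commit message, not the actual LinearSolveAutotune implementation.
function is_eltype_compatible(alg_name::String, ::Type{T}) where {T}
    # Mixed-precision (*32Mixed*) algorithms convert inputs to Float32
    # internally, so they stay usable even for Float16 problems.
    occursin("32Mixed", alg_name) && return true

    if T === Float16
        # MPS, cuSOLVER, and rocSOLVER factorizations require Float32/Float64;
        # Float16 can segfault (e.g. in MPSMatrixDecompositionLU).
        gpu_algs = ("MetalLUFactorization",
                    "CudaOffloadLUFactorization", "CudaOffloadQRFactorization",
                    "CudaOffloadFactorization",
                    "AMDGPUOffloadLUFactorization", "AMDGPUOffloadQRFactorization")
        # Sparse direct solvers are limited to single/double precision.
        sparse_algs = ("UMFPACKFactorization", "KLUFactorization",
                       "CUSOLVERRFFactorization")
        (alg_name in gpu_algs || alg_name in sparse_algs) && return false
        occursin("Pardiso", alg_name) && return false
    end

    # Manual BLAS wrappers use ccall() with hardcoded signatures and only have
    # methods for Float32/Float64/ComplexF32/ComplexF64.
    manual_blas = ("BLISLUFactorization", "MKLLUFactorization",
                   "AppleAccelerateLUFactorization", "OpenBLASLUFactorization")
    if alg_name in manual_blas && !(T <: Union{Float32, Float64, ComplexF32, ComplexF64})
        return false
    end

    return true
end
```

A filter along these lines can run inside test_algorithm_compatibility() before an algorithm is ever benchmarked, e.g. `is_eltype_compatible("MetalLUFactorization", Float16)` returning false, so incompatible combinations are skipped with a log message instead of reaching MPS, cuSOLVER, or SuiteSparse and crashing.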
1 parent eebab6f commit 334d10b

File tree

4 files changed: +126 −57 lines


lib/LinearSolveAutotune/Project.toml

Lines changed: 2 additions & 0 deletions
```diff
@@ -20,6 +20,7 @@ LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 LinearSolve = "7ed4a6bd-45f5-4d41-b270-4a48e9bafcae"
 MKL_jll = "856f044c-d86e-5d09-b602-aeab76dc8ba7"
 Metal = "dde4c033-4e86-420c-a63e-0dd931031962"
+OpenBLAS_jll = "4536629a-c528-5b80-bd46-f80d51c5b363"
 Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
 Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
 Preferences = "21216c6a-2e73-6563-6e65-726566657250"
@@ -47,6 +48,7 @@ LinearAlgebra = "1"
 LinearSolve = "3"
 MKL_jll = "2025.2.0"
 Metal = "1"
+OpenBLAS_jll = "0.3"
 Pkg = "1"
 Plots = "1"
 Preferences = "1.5"
```

lib/LinearSolveAutotune/src/LinearSolveAutotune.jl

Lines changed: 1 addition & 0 deletions
```diff
@@ -3,6 +3,7 @@ module LinearSolveAutotune
 # Ensure MKL is available for benchmarking by setting the preference before loading LinearSolve
 using Preferences
 using MKL_jll
+using OpenBLAS_jll
 
 # Set MKL preference to true for benchmarking if MKL is available
 # We need to use UUID instead of the module since LinearSolve isn't loaded yet
```
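The comments in this hunk refer to setting the MKL preference for LinearSolve by UUID before LinearSolve itself is loaded. A minimal sketch of that pattern with Preferences.jl is below; the preference key "LoadMKL_JLL" is an assumption about LinearSolve's internals, while the UUID is the one shown in the Project.toml hunk above.

```julia
using Preferences, MKL_jll

# LinearSolve's UUID (from the Project.toml hunk above). Preferences.jl accepts
# a UUID directly, which is what allows setting a preference for a package
# that has not been loaded yet.
const LINEARSOLVE_UUID = Base.UUID("7ed4a6bd-45f5-4d41-b270-4a48e9bafcae")

if MKL_jll.is_available()
    # "LoadMKL_JLL" is an assumed key name; the real key used by LinearSolve
    # may differ.
    set_preferences!(LINEARSOLVE_UUID, "LoadMKL_JLL" => true; force = true)
end
```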

lib/LinearSolveAutotune/src/algorithms.jl

Lines changed: 14 additions & 3 deletions
```diff
@@ -43,6 +43,14 @@ function get_available_algorithms(; skip_missing_algs::Bool = false, include_fas
         end
     end
 
+    # OpenBLAS if available (should be available on most platforms)
+    if OpenBLAS_jll.is_available()
+        push!(algs, OpenBLASLUFactorization())
+        push!(alg_names, "OpenBLASLUFactorization")
+    else
+        @warn "OpenBLAS_jll not available for this platform. OpenBLASLUFactorization will not be included."
+    end
+
     # RecursiveFactorization - should always be available as it's a hard dependency
     try
         if LinearSolve.userecursivefactorization(nothing)
@@ -53,7 +61,8 @@ function get_available_algorithms(; skip_missing_algs::Bool = false, include_fas
             if skip_missing_algs
                 @warn msg
             else
-                error(msg * ". Pass `skip_missing_algs=true` to continue with warning instead.")
+                error(msg *
+                      ". Pass `skip_missing_algs=true` to continue with warning instead.")
             end
         end
     catch e
@@ -98,7 +107,8 @@ function get_gpu_algorithms(; skip_missing_algs::Bool = false)
         if skip_missing_algs
             @warn msg
         else
-            error(msg * " Pass `skip_missing_algs=true` to continue with warning instead.")
+            error(msg *
+                  " Pass `skip_missing_algs=true` to continue with warning instead.")
         end
     end
 end
@@ -113,7 +123,8 @@ function get_gpu_algorithms(; skip_missing_algs::Bool = false)
         if skip_missing_algs
             @warn msg
         else
-            error(msg * " Pass `skip_missing_algs=true` to continue with warning instead.")
+            error(msg *
+                  " Pass `skip_missing_algs=true` to continue with warning instead.")
         end
     end
 end
```
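For context on what the new branch adds to the benchmark set, here is a minimal usage sketch of the OpenBLAS direct wrapper itself, guarded the same way as the hunk above; LinearProblem/solve are the standard LinearSolve API, and the matrix size is arbitrary.

```julia
using LinearSolve, OpenBLAS_jll

# Mirror the OpenBLAS_jll.is_available() guard from the hunk above: the direct
# wrapper only makes sense on platforms that ship the OpenBLAS artifact.
if OpenBLAS_jll.is_available()
    A = rand(Float64, 200, 200)
    b = rand(Float64, 200)
    prob = LinearProblem(A, b)

    # Like the other manual BLAS wrappers, OpenBLASLUFactorization only has
    # methods for Float32/Float64/ComplexF32/ComplexF64, which is why the
    # compatibility rules exclude it for Float16.
    sol = solve(prob, OpenBLASLUFactorization())
end
```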
