Conversation

@efaulhaber (Member) commented Oct 30, 2025

This PR contains two small performance optimizations. The first is an algebraic simplification of the derivatives of the Wendland kernels and the normalization factors:

main:

```julia
julia> r = rand(SVector{3, Float64}); d = norm(r); h = 1.0; kernel = WendlandC2Kernel{3}();

julia> @b TrixiParticles.kernel_grad($kernel, $r, $d, $h) seconds=1
3.409 ns

julia> r = rand(SVector{2, Float64}); d = norm(r); h = 1.0; kernel = WendlandC2Kernel{2}();

julia> @b TrixiParticles.kernel_grad($kernel, $r, $d, $h) seconds=1
2.650 ns
```

With `result` simplified algebraically:

```julia
julia> r = rand(SVector{3, Float64}); d = norm(r); h = 1.0; kernel = WendlandC2Kernel{3}();

julia> @b TrixiParticles.kernel_grad($kernel, $r, $d, $h) seconds=1
3.006 ns

julia> r = rand(SVector{2, Float64}); d = norm(r); h = 1.0; kernel = WendlandC2Kernel{2}();

julia> @b TrixiParticles.kernel_grad($kernel, $r, $d, $h) seconds=1
2.573 ns
```

With `normalization_factor` simplified to `7 / (pi * h^2 * 4)`:

```julia
julia> r = rand(SVector{3, Float64}); d = norm(r); h = 1.0; kernel = WendlandC2Kernel{3}();

julia> @b TrixiParticles.kernel_grad($kernel, $r, $d, $h) seconds=1
2.843 ns

julia> r = rand(SVector{2, Float64}); d = norm(r); h = 1.0; kernel = WendlandC2Kernel{2}();

julia> @b TrixiParticles.kernel_grad($kernel, $r, $d, $h) seconds=1
2.557 ns
```
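As a minimal illustration of this kind of rewrite (the chained-division form below is an assumption; the exact original expression in `smoothing_kernels.jl` may differ), consolidating divisions trades several floating-point divisions for cheaper multiplications while leaving the value unchanged up to rounding:

```julia
# Hypothetical before/after for the 2D Wendland C2 normalization factor.
# The "chained" form is assumed for illustration, not quoted from the PR diff.
norm_chained(h) = 7 / pi / h^2 / 4     # three divisions
norm_single(h)  = 7 / (pi * h^2 * 4)   # one division, two multiplications
```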

Interestingly, this difference is not measurable when benchmarking only `kernel_deriv`.
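The algebraic simplification can be sketched as follows, assuming the standard Wendland C2 shape function `W(q) ∝ (1 - q)^4 * (4q + 1)` on `0 ≤ q ≤ 1` (the exact scaling and support radius used in TrixiParticles may differ):

```julia
# Product-rule form of dW/dq (up to the normalization factor):
deriv_product_rule(q) = -4 * (1 - q)^3 * (4q + 1) + (1 - q)^4 * 4

# Algebraically reduced: factor out 4(1 - q)^3, and the bracket collapses
# to -(4q + 1) + (1 - q) = -5q, giving fewer floating-point operations:
deriv_simplified(q) = -20q * (1 - q)^3
```

Both forms agree numerically; the payoff only appears once the derivative is inlined into `kernel_grad`, consistent with the benchmarks above.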

The second is a small optimization of the computation of `v_max` (apparently only relevant on the CPU).

CPU:

```julia
julia> A = rand(3, 10_000_000);

julia> @b maximum(x -> sqrt(dot(x, x)), reinterpret(reshape, SVector{3, eltype($A)}, view($A, 1:3, :))) seconds=1
6.966 ms

julia> @b sqrt(maximum(x -> dot(x, x), reinterpret(reshape, SVector{3, eltype($A)}, view($A, 1:3, :)))) seconds=1
6.574 ms
```

GPU (Metal):

```julia
julia> A = Metal.rand(3, 10_000_000);

julia> @b maximum(x -> sqrt(dot(x, x)), reinterpret(reshape, SVector{3, eltype($A)}, view($A, 1:3, :))) seconds=1
982.000 μs (664 allocs: 15.070 KiB)

julia> @b sqrt(maximum(x -> dot(x, x), reinterpret(reshape, SVector{3, eltype($A)}, view($A, 1:3, :)))) seconds=1
981.459 μs (664 allocs: 15.070 KiB)
```
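The rewrite is valid because `sqrt` is monotone: the maximum of the norms equals the square root of the maximum squared magnitude, so `sqrt` runs once instead of once per particle. A self-contained sketch of the equivalence (variable and function names here are illustrative, not the ones in `shifting_techniques.jl`):

```julia
using LinearAlgebra

# Illustrative velocity vectors (plain Vectors instead of SVectors for brevity)
velocities = [rand(3) for _ in 1:1000]

# One sqrt per element:
v_max_naive(vs) = maximum(x -> sqrt(dot(x, x)), vs)

# sqrt is monotone, so it can be pulled out of the maximum and applied once:
v_max_fast(vs) = sqrt(maximum(x -> dot(x, x), vs))
```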

@efaulhaber efaulhaber self-assigned this Oct 30, 2025
@efaulhaber efaulhaber requested a review from Copilot October 31, 2025 17:25
Copilot AI left a comment

Pull Request Overview

This PR refactors smoothing kernel normalization factors and related computations to improve GPU performance and code clarity. The changes focus on simplifying arithmetic expressions to reduce instructions and improve readability.

Key Changes:

  • Simplified normalization factor expressions by consolidating divisions (e.g., `a / b / c` → `a / (b * c)`)
  • Optimized the `v_max` computation in particle shifting to compute the squared magnitude first, then take the square root
  • Simplified kernel derivative formulas for the Wendland kernels by algebraically reducing expressions

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/schemes/fluid/shifting_techniques.jl | Optimized the `v_max` calculation to compute the maximum of squared velocities before taking the square root |
| src/general/smoothing_kernels.jl | Simplified normalization factors and kernel derivatives across multiple kernel types (Schoenberg, Wendland, Poly6) |


@efaulhaber efaulhaber requested review from LasNikas, Copilot and svchb and removed request for Copilot October 31, 2025 17:29
@efaulhaber efaulhaber marked this pull request as ready for review October 31, 2025 17:29
@efaulhaber (Member, Author) commented:

/run-gpu-tests

LasNikas previously approved these changes Nov 1, 2025