-
Notifications
You must be signed in to change notification settings - Fork 817
SIMD vectorization of Array.sum<int>, etc #18509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ge, Array.sum and Array.average to take advantage of vectorization in System.Linq.Enumerable module.
❗ Release notes required
|
src/FSharp.Core/array.fs
Outdated
[<CompiledName("Sum")>] | ||
let inline sumFloat (array: float array) : float = | ||
System.Linq.Enumerable.Sum array | ||
|
||
[<CompiledName("Sum")>] | ||
let inline sumFloat32 (array: float32 array) : float32 = | ||
System.Linq.Enumerable.Sum array | ||
|
||
[<CompiledName("Sum")>] | ||
let inline sumInt (array: int array) : int = | ||
System.Linq.Enumerable.Sum array | ||
|
||
[<CompiledName("Sum")>] | ||
let inline sumInt64 (array: int64 array) : int64 = | ||
System.Linq.Enumerable.Sum array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that this would be a reasonable place to use static optimization syntax to specify which types should delegate to the LINQ method and which to the existing code, e.g.,
let inline sum (array: ^T array) : ^T =
existingSumCode array
when ^T : int = System.Linq.Enumerable.Sum array
when ^T : int64 = System.Linq.Enumerable.Sum array
…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I expect "static optimization conditionals" are a compile-time thing and not runtime? Because I can't check easily with sharplab.io, it says "error FS0817: Static optimization conditionals are only for use within the F# library"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the appropriate implementation would be chosen at compile-time once the type parameter was resolved.
You could add some IL tests under https://github.com/dotnet/fsharp/tree/main/tests/FSharp.Compiler.ComponentTests/EmittedIL if you wanted.
It looks like the error case is now different for average over an empty collection. (we should not evaluate a seq before calling into Average, since this would be breaking in a different way) |
I did some initial tests to see if this makes sense at all or not. I used the current [<Benchmark>]
member x.ArraySum() = // Array sum
array
|> Array.sum
|> ignore
[<Benchmark>]
member x.ArrayAverage() = // There has an extra map, because average needs float
array
|> Array.map float
|> Array.average
|> ignore
[<Benchmark>]
member x.ArraySeqSum() = // Seq sum by using array as base data
array
|> Seq.sum
|> ignore
[<Benchmark>]
member x.ListSeqSum() = // Seq sum by using list as base data
list
|> Seq.sum
|> ignore And here are the results with current main:
Here are the results with this PR:
Edit: I used Net 9 but the FSharp.Core.dll netstandard2.0 version in both, which was probably a mistake because Spans are truely efficient on netstandard2.1 only (?) |
Hmm, I'm thinking if default FSI is compiled with debug mode and people use F# as a scripting language, then performance optimizations like this could be marked as I did run the same performance tests with FSharp.Core.dll netstandard2.1 version Main branch:
This PR
By quick look of this, the Sum seems to be improved but the Average not really. Should I revert the Average change? Edit: Average removed |
1/As per your benchmark, the cost for average might be dominated by doing the casts and additional array allocations. 2/The idea of using |
What are the numbers when running on the full clr? 4.6-4.8? Enumerable probably has different implementation there. |
Average will now throw different exception, so it's a breaking change at this point. |
@Thorium: |
I did check this, and the results were worse for Enumerable.Sum
This PR - with Enumerable.Sum
|
…nchmarks\CompiledCodeBenchmarks\MicroPerf\MicroPerf.fsproj
Because this is just a forwarding, behaviour difference is easy to explore already in FSI: [| 1.; Double.NaN |] |> Array.sum;;
//val it: float = nan
[| 1.; Double.NaN |] |> System.Linq.Enumerable.Sum;;
//val it: float = nan Seems to be the same with edge-cases like Double.MaxValue and infinity. |
I don't know how to easily run MicroPerf.fsproj in .NET, but I can run
// * Summary * BenchmarkDotNet v0.13.12, Windows 11 (10.0.26100.3915) Job=.NET Framework 4.8 Runtime=.NET Framework 4.8
Not exactly the results I was hoping for. |
Framework-dependent runtime switch would solve this (and since it is static, I would hope JIT would eliminate such branching), but is not something we are doing in FSharp.Core. On Desktop:
On Core:
It would be great if we could make use of https://learn.microsoft.com/en-us/dotnet/api/system.numerics.tensors.tensorprimitives.sum?view=net-9.0-pp#system-numerics-tensors-tensorprimitives-sum-1(system-readonlyspan((-0))) for some of those implementations. Right now, any such support depending on Tensors will have to be done as a separate lib outside of FSharp.Core . |
Yeah, that's what I suspected will happen :( Bcl in full framework doesn't have those functions vectorised. And the issue with fslib is that it ships as netstandard, so there's no way to tailor separate implementations for netfx and netcore. |
The
What do you mean by "On Desktop" and "On Core" ? I can see that in FSharp.Core.fsproj there is |
Reflection can likely neglect many optimizations (needs proving tho).
Desktop is usually referred when full framework is used, and core is when coreclr runtime and bcl are used.
These should already be defined. However they will be defined on both net48 and net9.0 |
I meant a statically stored flag based on something like: open System.Runtime.InteropServices
let runtimeDescription = RuntimeInformation.FrameworkDescription
let hasLinqAcceleration = runtimeDescription > "..." // string-based logic here And switching at runtime (so cannot be part of statically optimized e.g. top level function: |
Runtime check added. Compared to this This PR:
|
After |
src/FSharp.Core/array.fs
Outdated
let inline sum (array: ^T array) : ^T = | ||
classicSum array | ||
when ^T : float = | ||
if System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription.StartsWith ".NET Framework" then classicSum array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work.
Can you store it in a static boolean flag somewhere, is it possible?
src/FSharp.Core/array.fs
Outdated
@@ -1578,8 +1578,7 @@ module Array = | |||
checkNonNull "array" array | |||
Microsoft.FSharp.Primitives.Basics.Array.permute indexMap array | |||
|
|||
[<CompiledName("Sum")>] | |||
let inline sum (array: ^T array) : ^T = | |||
let inline private classicSum (array: ^T array) : ^T = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it sumImpl
to indicate it is the sum implemented here. Or maybe fsharpSumImpl
.
Would rather avoid the term classic
Is it vectorized as well? Is it doing copies first? I am curious. |
Oh, you are right, it isn't. That's sad, makes this very marginal feature especially when tools like FSharpLint prefer sumBy over (map >> sum) |
.NET Framework is not .NET Standard 2.1 compatible, thus .NET Standard 2.1 version shouldn't need runtime check.
|
Will all netstandard2.1 implementations be vectorized? If yes, then it's safe enough (and also it should be defined already). |
@@ -1588,6 +1587,32 @@ module Array = | |||
|
|||
acc | |||
|
|||
let isNetFramework = System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription.StartsWith ".NET Framework" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am stunned that this is seriously being considered, yet multitargeting with #if
keeps being dismissed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one of the issues with multitargeting in its current state is that where do we draw the support line? De we add LTS only (net10)? Or sts too (net9h). Or both LTS and STS (9 and 10)? What happens to them when new STS and LTS are released?
I personally dislike the runtime solution too and would much rather prefer proper fslib separation/layering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one of the issues with multitargeting in its current state is that where do we draw the support line?
That's easily done - whatever we (you) want the policy to be:) Thinking out loud:
- Keep
netstandard2.0
as a fallback until Framework goes out of support - Drop
netstandard2.1
- Floating "previous LTS" target - Add
net6.0
now, change it tonet8.0
when .NET 10 releases, then tonet10.0
with .NET 12 and so on- This guarantees that if you're targetting a supported TFM, you'll at worst "drop" down to here
- If you're for some reason releasing a .NET 6 application, you can use FSharp.Core 10, but you will go down the slower
netstandard2.0
compilation paths, which isn't the end of the world, or you can keep using FSharp.Core 9, which could be faster
- Optionally also add the latest version if there's a specific API that we could leverage to make something faster in FSharp.Core
fslib separation/layering.
I obviously don't know the specifics of what you're imagining implementationwise, but my immediate thought is that it's way too late for that. We might have been able to pull it of 15 years ago, but wouldn't splitting parts of FSharp.Core away into separate packages break half the ecosystem?
Since FSharp.Core is not part of the BCL (and yet releases in lockstep with it), I am actually very happy with the fact that I only have to reference an all-in-one package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it all makes sense. It's not up to me now though, maybe team will have time/plans in future to produce more TFMs and deprecate some (like ns21), so fslib can utilise modern BCL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess @vzarytovskii point of the slippery slope is that the boundaries would be enforced by documented coding conventions, not by any technical checks/force (like Fantomas). The outside "API" should stay the same.
I value very much that I can take whatever old project and an updated FSharp.Core just works (with binding redirects). Thus, I took very seriously Vlad's initial note that the .NET Framework is not as fast as the current implementation. F# has been used in old finance applications where the existing performance does matter.
That being said, from a library maintenance point of view, 2 different code-paths are 2 different edges between nodes, and a maintenance burden, no matter is it done on runtime or preprocessor directive (or other conditional compilation).
multitargeting with #if keeps being dismissed.
I would prefer this over runtime checks. But in the current F#, those are very noisy. If you even think of making them more common, then this or some other way (#match
?) should be implemented.
until Framework goes out of support
It'll probably outlive Cobol in lifespan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One good first step will be to throw away old vsix (which is 4.6) and have an oop one which can be on the latest supported TFM for vs (now it's net9).
This will allow to bump FCS to latest TFM as well as TP sdk. Might show some gotchas and caveats with release process and all.
Description
Specific overloads (float, float32, int, int64) of Seq.sum,
Seq.average,Array.sumand Array.averageto take advantage of vectorization in System.Linq.Enumerable module.This is potentially a naive first try to solve #16230 by the spirit of @T-Gro comment #16230 (comment)
Checklist