More string optimizations #18546

Open
wants to merge 9 commits into
base: main

Contributor

@brianrourkeboll brianrourkeboll commented May 14, 2025

Description

(This might need an RFC? It also might be too much of a hack to accept... But it does work.)

This PR follows and improves upon #9549. It improves the string function's implementation for signed integer types (sbyte, int16, int32, int64) and enum types based on signed integer types by directly calling the appropriate ToString overload on the underlying type. The boxing and casting of the previous implementation (see #1714, #9153) are now avoided altogether when the type is known at compile-time. All existing culture-invariant and culture-dependent behavior is preserved.

This is done in a backwards- and forwards-compatible way by working within the confines of the existing when 'T : … library-only static optimization construct, avoiding the need to extend that feature with new syntax or constraints (as suggested in #9594) or to introduce a new construct to the pickling format. That is, this and newer compilers will still be able to compile code using older FSharp.Core versions, while older F# compilers will be able to consume this and newer FSharp.Core versions without any compile-time or runtime breaking changes.

Example of IL before

.method public static string  'string int32'(int32 'value') cil managed
{

  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  box        [runtime]System.Int32
  IL_0006:  unbox.any  [runtime]System.IFormattable
  IL_000b:  ldnull
  IL_000c:  call       class [netstandard]System.Globalization.CultureInfo [netstandard]System.Globalization.CultureInfo::get_InvariantCulture()
  IL_0011:  tail.
  IL_0013:  callvirt   instance string [netstandard]System.IFormattable::ToString(string,
                                                                                  class [netstandard]System.IFormatProvider)
  IL_0018:  ret
} 
.method public static string  'string<Int32Enum>'(valuetype assembly/String/Int32Enum 'enum') cil managed
{
  
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  box        assembly/String/Int32Enum
  IL_0006:  unbox.any  [runtime]System.IFormattable
  IL_000b:  ldnull
  IL_000c:  call       class [netstandard]System.Globalization.CultureInfo [netstandard]System.Globalization.CultureInfo::get_InvariantCulture()
  IL_0011:  tail.
  IL_0013:  callvirt   instance string [netstandard]System.IFormattable::ToString(string,
                                                                                  class [netstandard]System.IFormatProvider)
  IL_0018:  ret
} 

Example of IL after

.method public static string  'string int32'(int32 'value') cil managed
{
  
  .maxstack  8
  IL_0000:  ldarga.s   'value'
  IL_0002:  ldnull
  IL_0003:  call       class [netstandard]System.Globalization.CultureInfo [netstandard]System.Globalization.CultureInfo::get_InvariantCulture()
  IL_0008:  call       instance string [netstandard]System.Int32::ToString(string,
                                                                           class [netstandard]System.IFormatProvider)
  IL_000d:  ret
}
.method public static string  'string<Int32Enum>'(valuetype assembly/String/Int32Enum 'enum') cil managed
{
  
  .maxstack  3
  .locals init (valuetype assembly/String/Int32Enum V_0)
  IL_0000:  ldarg.0
  IL_0001:  stloc.0
  IL_0002:  ldloca.s   V_0
  IL_0004:  constrained. assembly/String/Int32Enum
  IL_000a:  callvirt   instance string [netstandard]System.Object::ToString()
  IL_000f:  ret
} 
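
The difference between the two int32 listings can be approximated in ordinary F#. This is only a sketch: the names viaIFormattable and direct are hypothetical, and the real change lives in FSharp.Core's static optimization clauses rather than in user code.

```fsharp
open System
open System.Globalization

// Roughly the "before" shape: box the value, cast it to IFormattable,
// then make an interface call.
let viaIFormattable (value: int) : string =
    (box value :?> IFormattable).ToString(null, CultureInfo.InvariantCulture)

// Roughly the "after" shape: call Int32.ToString(string, IFormatProvider)
// directly on the value.
let direct (value: int) : string =
    value.ToString(null, CultureInfo.InvariantCulture)

// Both paths are expected to produce the same text for every input.
for n in [ 3; -3; Int32.MaxValue; Int32.MinValue ] do
    printfn "%d: same result = %b" n (viaIFormattable n = direct n)
```

Both functions pass InvariantCulture explicitly, which mirrors the culture-invariant behavior that the description says is preserved for integers.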

Changes

  1. A new marker type—SupportsWhenTEnum—is added to the Microsoft.FSharp.Core.CompilerServices namespace in FSharp.Core. This type is marked [<Sealed; AbstractClass>], has no members or constructors, and is hidden by default via [<CompilerMessage(…)>]. (Could/should we use IsError = true, and/or use EditorBrowsable or Experimental to discourage use from other languages as well?)
  2. Special compiler support is added for two new library-only (i.e., FSharp.Core-only) static optimization constraints:
    • The compiler now recognizes the when 'T : Enum library-only static optimization constraint and treats it much like the already-possible when 'T : 'T & #Enum, except that the subtype constraint is not propagated back to the outer 'T.
    • The compiler now recognizes the special when 'T : SupportsWhenTEnum library-only static optimization constraint. This enables compilers that understand it to process any following static optimization constraints in a different order from compilers that do not understand it while remaining fully compatible with them.
  3. The sequence of library-only static optimizations in the string operator is updated to use a new ordering via the when 'T : SupportsWhenTEnum and when 'T : Enum constraints when compiled with newer compilers. Older compilers will not recognize the new constraints; they will simply skip over them and use the old sequence of constraints exactly as before.

The compiler change could be put behind a language feature if necessary.
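
The gating described above can be pictured with a toy model. This is not compiler code: Branch, isSignedInteger, and selectBranch are hypothetical names, and the final fallback for types outside the scope of this PR is an assumption made for the sake of the sketch.

```fsharp
open System

// Hypothetical labels for the code path that ends up being chosen.
type Branch =
    | EnumConstrainedCall      // x.ToString() constrained to the enum type
    | DirectSignedIntegerCall  // x.ToString(null, CultureInfo.InvariantCulture)
    | OldSequence              // the pre-existing sequence of optimizations

let isSignedInteger (t: Type) =
    List.contains t [ typeof<sbyte>; typeof<int16>; typeof<int32>; typeof<int64> ]

let selectBranch (compilerUnderstandsMarker: bool) (t: Type) =
    if not compilerUnderstandsMarker then
        // An older compiler sees the marker clause as an exact type test
        // that nothing matches, so it uses the old sequence unchanged.
        OldSequence
    elif t.IsEnum && isSignedInteger (Enum.GetUnderlyingType t) then
        EnumConstrainedCall
    elif isSignedInteger t then
        DirectSignedIntegerCall
    else
        // Assumption: types not targeted by this PR behave as before.
        OldSequence

printfn "%A" (selectBranch true typeof<int>)
printfn "%A" (selectBranch true typeof<DayOfWeek>)
printfn "%A" (selectBranch false typeof<int>)
```

The point of the model is only that the marker clause acts as a switch: the same library source yields the new ordering on compilers that recognize the marker and the old one everywhere else.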

Tradeoffs

Pros

  • Enables significant speedups for string on very common types like int.
  • Fully backwards and forwards compatible with older and newer F# compilers and FSharp.Core versions.
  • Introduces essentially zero public-facing changes to the language.
  • Avoids the complex engineering and compatibility gymnastics that would likely be required to support altogether new kinds of static optimization constraints (like Support compiler-specific statically-resolved 'when' syntax with 'enum<_>' and possibly others #9594).
  • The changes to the compiler and core library that are made are very small, localized, and could conceivably be safely removed or changed in the future if necessary or desired.

Cons

  • Feels like a bit of a hack.
  • Introduces two subtle new special cases to the already-subtle typechecking of library-only static optimization constraints.
  • Not terribly scalable: due to the way that each new static optimization marker type requires one-upping anything that came before, significant application of this technique could quickly become ugly. It does seem rather unlikely for that to happen, though.

Alternatives

  1. Do nothing.
  2. Try something like Support compiler-specific statically-resolved 'when' syntax with 'enum<_>' and possibly others #9594.
  3. Change the typechecking of generic constraints in static optimization constraints such that they no longer propagate up to the outer 'T (and explicitly add the necessary constraints to the core library functions that currently depend on this propagation). This would allow existing syntactically-valid generic constraints to be used (like when 'T : 'T & #Enum) instead of adding a special case for when 'T : Enum.
  4. Do special lowering tailored to the compiled output of the existing code, along the lines of LowerComputedCollections.fs, etc.
  5. Add some more general mechanism to the optimizer to strip away boxing/casts that could be known to be unnecessary at compile-time.

If 2. or 3. had been done in F# 1.0 (or whenever static optimization constraints were added), that would have been ideal. However, doing either 2. or 3. now would involve a significant amount of engineering work, and both would have compatibility problems.

Checklist

  • Test cases added.
  • Performance benchmarks added in case of performance changes.
  • Release notes entry updated.

Benchmarks

  • The string representation of small positive integer values is cached. Calling string on such values for signed integer types (sbyte, int16, int32, int64) now results in zero allocations and a ~3× speedup.
  • There is a noticeable speedup and reduction in allocations for string on negative and non-cached positive signed integers.
  • There is also a noticeable speedup for string on enum values, including negative values that do not correspond to a member of the given enum type.
Source
open System
open System.Runtime.CompilerServices

[<MethodImpl(MethodImplOptions.NoInlining)>]
let ``string 3`` () = string 3

[<MethodImpl(MethodImplOptions.NoInlining)>]
let ``string -3`` () = string -3

[<MethodImpl(MethodImplOptions.NoInlining)>]
let ``string 1152921504606846975L`` () = string 1152921504606846975L

[<MethodImpl(MethodImplOptions.NoInlining)>]
let ``string -1152921504606846975L`` () = string -1152921504606846975L

[<MethodImpl(MethodImplOptions.NoInlining)>]
let ``string DayOfWeek.Wednesday`` () = string DayOfWeek.Wednesday

[<MethodImpl(MethodImplOptions.NoInlining)>]
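let ``string (enum<DayOfWeek> -3)`` () = string (enum<DayOfWeek> -3)

Results (in each pair of rows, the first row, with Ratio 1.00, is the baseline and the second row is this PR):

| Categories                   | Mean      | Ratio | Gen0   | Allocated | Alloc Ratio |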
|----------------------------- |----------:|------:|-------:|----------:|------------:|
| string 3                     |  6.172 ns |  1.00 | 0.0019 |      24 B |        1.00 |
| string 3                     |  1.925 ns |  0.31 |      - |         - |        0.00 |
|                              |           |       |        |           |             |
| string -3                    | 12.879 ns |  1.00 | 0.0044 |      56 B |        1.00 |
| string -3                    |  8.082 ns |  0.63 | 0.0025 |      32 B |        0.57 |
|                              |           |       |        |           |             |
| string 1152921504606846975L  | 18.343 ns |  1.00 | 0.0070 |      88 B |        1.00 |
| string 1152921504606846975L  | 13.976 ns |  0.76 | 0.0051 |      64 B |        0.73 |
|                              |           |       |        |           |             |
| string -1152921504606846975L | 21.641 ns |  1.00 | 0.0070 |      88 B |        1.00 |
| string -1152921504606846975L | 17.313 ns |  0.80 | 0.0051 |      64 B |        0.73 |
|                              |           |       |        |           |             |
| string DayOfWeek.Wednesday   | 10.036 ns |  1.00 | 0.0019 |      24 B |        1.00 |
| string DayOfWeek.Wednesday   |  7.940 ns |  0.79 | 0.0019 |      24 B |        1.00 |
|                              |           |       |        |           |             |
| string (enum<DayOfWeek> -3)  | 28.096 ns |  1.00 | 0.0044 |      56 B |        1.00 |
| string (enum<DayOfWeek> -3)  | 15.098 ns |  0.54 | 0.0044 |      56 B |        1.00 |
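
The enum rows cover both a named member and a value with no corresponding member. A small sketch of what those two inputs are expected to stringify to is given below; the culture is pinned to InvariantCulture as an assumption, so that the result does not depend on the machine's settings.

```fsharp
open System
open System.Globalization

// Pin the culture so the output does not depend on the machine's settings.
CultureInfo.CurrentCulture <- CultureInfo.InvariantCulture

let named = DayOfWeek.Wednesday
let unnamed = enum<DayOfWeek> (-3)   // no DayOfWeek member has this value

printfn "%s" (named.ToString())      // the member name
printfn "%s" (unnamed.ToString())    // falls back to the numeric value
printfn "%s" (string unnamed)        // the string function agrees with ToString()
```

Under these assumptions the named member yields its name and the unnamed value yields its underlying number, which is the behavior the PR aims to keep while making the call cheaper.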

Contributor

github-actions bot commented May 14, 2025

❗ Release notes required


✅ Found changes and release notes in following paths:

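| Change path     | Release notes path                                      | Description |
|-----------------|---------------------------------------------------------|-------------|
| src/FSharp.Core | docs/release-notes/.FSharp.Core/10.0.100.md             |             |
| src/Compiler    | docs/release-notes/.FSharp.Compiler.Service/10.0.100.md |             |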

@brianrourkeboll brianrourkeboll force-pushed the more-string-optimizations branch 4 times, most recently from 18062d5 to 5eded0b Compare May 15, 2025 22:00
@brianrourkeboll brianrourkeboll changed the base branch from main to T-Gro-patch-3 May 16, 2025 17:10
@brianrourkeboll brianrourkeboll force-pushed the more-string-optimizations branch from 5eded0b to 0c48a37 Compare May 16, 2025 17:11
@brianrourkeboll brianrourkeboll changed the base branch from T-Gro-patch-3 to main May 16, 2025 22:25
@brianrourkeboll brianrourkeboll changed the base branch from main to T-Gro-patch-3 May 16, 2025 22:32
@brianrourkeboll brianrourkeboll changed the base branch from T-Gro-patch-3 to main May 17, 2025 16:34
@brianrourkeboll brianrourkeboll force-pushed the more-string-optimizations branch 2 times, most recently from 277f5c4 to 2e83e22 Compare May 17, 2025 17:39
@brianrourkeboll brianrourkeboll marked this pull request as ready for review May 17, 2025 23:42
@brianrourkeboll brianrourkeboll requested a review from a team as a code owner May 17, 2025 23:42
@@ -1278,6 +1278,8 @@ type TcGlobals(

member val ArrayCollector_tcr = mk_MFCompilerServices_tcref fslibCcu "ArrayCollector`1"

member val SupportsWhenTEnum_tcr = mk_MFCompilerServices_tcref fslibCcu "SupportsWhenTEnum"
Member

What happens when an older FSharp.Core is explicitly added instead of the implicit one?
Does this value support optionality?

Contributor Author

Yes, this does still work if the FSharp.Core version referenced does not contain this type (just as with ListCollector and ArrayCollector above).

/// </summary>
[<Sealed; AbstractClass>]
[<CompilerMessage("This type is for compiler use and should not be used directly", 1204, IsHidden = true)>]
type SupportsWhenTEnum = class end
Member

I would add an EditorBrowsable Never here.
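
A sketch of what that suggestion might look like, using a hypothetical stand-in type named ExampleMarker rather than the real SupportsWhenTEnum. The CompilerMessage attribute from the diff is left out here on purpose, on the assumption that referencing the type in the same snippet could otherwise raise the very warning it declares.

```fsharp
open System.ComponentModel

// ExampleMarker is a hypothetical stand-in for the marker type in the PR.
// It has no members or constructors and cannot be instantiated.
[<Sealed; AbstractClass>]
[<EditorBrowsable(EditorBrowsableState.Never)>]
type ExampleMarker = class end

// The attribute can be observed through reflection.
let attrs =
    typeof<ExampleMarker>.GetCustomAttributes(typeof<EditorBrowsableAttribute>, false)

printfn "EditorBrowsable attributes found: %d" attrs.Length
```

EditorBrowsable is only a hint to tooling, so it would discourage use from other .NET languages without preventing it.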

when 'T : Enum = let x = (# "" value : 'T #) in x.ToString() // Use 'T to constrain the call to the specific enum type.

// For compilers that understand `when 'T : Enum`, we can safely make a constrained call on the integral type itself here.
when 'T : sbyte = let x = (# "" value : sbyte #) in x.ToString(null, CultureInfo.InvariantCulture)
Member

Is this block fully superseding the existing block for integral types?
In which situations is the old block still hit?

https://github.com/dotnet/fsharp/pull/18546/files#diff-06bb7bec1ea92205ec6f4e69f53cdf3aa53e3431db58ca15a30c7d426ab6f24bR5189-R5192

Contributor Author

This supersedes the existing block for enums and signed integral types only when compiled by a compiler that understands when 'T : SupportsWhenTEnum and when 'T : Enum. Older compilers will not give special processing to when 'T : SupportsWhenTEnum, so nothing will match that rule (they will treat it as an exact type equivalence test, which they understand but which nothing can ever match), and they will immediately fall back to defaultString, which contains the old sequence of static optimizations.

Member

Got it.
In that case, please adjust the comment over at L 5176.
The code should still remain, but the reasons for it have changed.

You can also add e.g. a 5 year timer in that comment (or current date + that this block of code is needed for the supported lifetime of .NET8 SDKs)

Contributor Author

Something like this? d5dc4a5


// Special handling for enums whose underlying type is a signed integral type.
// The runtime value may be outside the defined members of the enum, and the negative sign may be overridden.
when 'T : Enum = let x = (# "" value : 'T #) in x.ToString() // Use 'T to constrain the call to the specific enum type.
Member

This clause applies when working with generic Enum code. Please add a comment linking it to the specific source code constructs that it handles.

Member

Maybe add WhenTyparIsEnum in the comment - I believe that's what it is, right?

defaultString value

// Only compilers that understand `when 'T : SupportsWhenTEnum` will understand `when 'T : Enum`.
when 'T : SupportsWhenTEnum =
Member

This is a new mechanism for making language feature toggles that are strongly coupled between FSharp.Core and the compiler.

When I was first reading the code (before reading your PR's description), I thought that it related to 'T itself.
I would want to come up with a naming scheme that makes it apparent that this is indeed a feature toggle.

What about a dedicated module/namespace to host these marker types?
For example, 'T : CompilerLibraryFeatures.SupportsWhenTEnum, or some similar name that indicates this?

In other words, I want it to be apparent that the decision is based on compiler capabilities.

Member

Or maybe CompilerServices.SupportsWhenTEnumFeature

Contributor Author

Is when 'T : CompilerServices.SupportsWhenTEnum OK, or would you prefer the Feature suffix? Or there could even be a Features module or something, although I don't expect we'll be adding many (if any) more of these.

Member

@T-Gro T-Gro left a comment

I like this addition, and we can uplift what you called a type marker hack into a more explicit (new) mechanism: a feature toggle mechanism for communication between the compiler and statically optimized library code.

Testing for new compiler + new FSharp.Core is well covered by the existing FSharp.Core tests.
I think it would be wise to add a smoke test showing that new compiler + old FSharp.Core works fine.

(We could also brainstorm a good way to test the last combination, old compiler + new FSharp.Core. One way could be to run the FSharp.Core test suite with a freshly built FSharp.Core, using the last-known-good SDK, without compiler overrides.)

This special-casing allows us to update FSharp.Core to avoid boxing
when calling the `string` function on enums and signed integral types
going forward while still allowing the updated version of FSharp.Core
to be fully compatible with older compilers.

Adding support for some form of constraint in library-only static
optimizations instead would have been problematic for multiple reasons.
Supporting something like `when 'T : enum<'U>` would have required
additional modifications to the compiler and would not have been
consumable by older compilers. It would also introduce a new type
variable. While something like `when 'T : 'T & #Enum` is already
syntactically valid, it would add that constraint to the entire `string`
function without further modification to the typechecker. It would also
not be consumable by older compilers.

I think adding a special case for enums is justifiable since (1) enums are a
special kind of type to begin with, and (2) static optimization
constraints are only allowed in FSharp.Core, so the change to the
language itself is quite small.
@brianrourkeboll brianrourkeboll force-pushed the more-string-optimizations branch from 7f2499f to d5dc4a5 Compare May 19, 2025 23:49
@brianrourkeboll brianrourkeboll force-pushed the more-string-optimizations branch from d5dc4a5 to 4d198f5 Compare May 20, 2025 01:35