You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In CakeML strings are immutable vectors of characters. At present they are always represented at runtime as a pointer to an immutable byte array. For example, the string "hello" is represented by
This issue is about making the compiler generate code with a more space efficient (and faster?) representation for short strings. The idea is to represent short strings as numbers directly in the register (or memory slot) where the pointer to the byte array would otherwise be. The string "hello" would be represented as:
0x568656C6C6F00000
which is 0x5 and then 0x680x650x6C0x6C0x6F followed by zeros. This encoding can represent strings up to length 7 directly as an integer with least significant bit 0 indicating to the CakeML GC that it's a number.
With this representation the empty string is number 0. A single character string such as "a" is 0x16100000000000 where 0x61 is the hex for character a.
This string representation pads zeros to the right in order to make the following string comparison fast:
fast_str_le s t =
if strlen s = strlen t then s <= t else strlen s <= strlen t
This can be implemented as follows:
When computing this for two short strings, we can implement this with a single machine instruction for 64-bit integer comparison.
When comparing a short and a long string, we can immediately tell which the result based on the least significant bit of the pointer / integer, since pointers in CakeML always have least significant bit 1.
When comparing two long strings, we first compare the headers. If they differ, then an integer comparison of the headers gives the result. If the headers are the same, then we need to walk the payload.
Using the space optimiation and fast_str_le in balanced binary maps where (mostly short) strings are keys should improve both space usage and speed.
For 32-bit architectures, the length of short strings would be at most 3, which is such a low limit that the optimised representation is probably less useful, but still good to have for 32-bit architectures as long as it can be turned off if users find it's harmful in their application.
To implement this in the CakeML compiler, one needs to:
make BVL, BVI, DataLang aware (both in the compiler state and semantics) of whether we are running in 64-bit mode.
update clos_to_bvl to map short ByteVectors to Number when possible
update data_to_word for this new representation of strings; the implementation of some string operations might need to move to data_to_word from bvl_to_bvi or be supported by new primitive ops that allows them to be implemented in bvl_to_bvi.
add fast_str_le to the source semantics and implement it in data_to_word
add a new compiler flag that can be used to disable this optimisation
The text was updated successfully, but these errors were encountered:
Uh oh!
There was an error while loading. Please reload this page.
In CakeML strings are immutable vectors of characters. At present they are always represented at runtime as a pointer to an immutable byte array. For example, the string "hello" is represented by
This issue is about making the compiler generate code with a more space efficient (and faster?) representation for short strings. The idea is to represent short strings as numbers directly in the register (or memory slot) where the pointer to the byte array would otherwise be. The string "hello" would be represented as:
which is
0x5
and then0x68
0x65
0x6C
0x6C
0x6F
followed by zeros. This encoding can represent strings up to length 7 directly as an integer with least significant bit 0 indicating to the CakeML GC that it's a number.With this representation the empty string is number 0. A single character string such as "a" is
0x16100000000000
where0x61
is the hex for charactera
.This string representation pads zeros to the right in order to make the following string comparison fast:
This can be implemented as follows:
Using the space optimiation and
fast_str_le
in balanced binary maps where (mostly short) strings are keys should improve both space usage and speed.For 32-bit architectures, the length of short strings would be at most 3, which is such a low limit that the optimised representation is probably less useful, but still good to have for 32-bit architectures as long as it can be turned off if users find it's harmful in their application.
To implement this in the CakeML compiler, one needs to:
clos_to_bvl
to map shortByteVector
s toNumber
when possibledata_to_word
for this new representation of strings; the implementation of some string operations might need to move todata_to_word
frombvl_to_bvi
or be supported by new primitive ops that allows them to be implemented inbvl_to_bvi
.fast_str_le
to the source semantics and implement it indata_to_word
The text was updated successfully, but these errors were encountered: