Validate fast_index in CodePointTrie constructor and omit bound checks in the getters in the fast_index case #6854

@hsivonen

Normalizer benchmarking shows that CodePointTrie cannot reach ICU4C-like performance while its getters check slice bounds when reading from the index and data ZeroVecs. Omitting those bound checks does result in ICU4C-like trie performance, and on the fast_index path the length requirements are easy to state in a form that would be practical to check once in the constructor.

AFAICT, the constructor checks would be assert!(fast_max >> FAST_TYPE_SHIFT < index.len()) (where fast_max is one of two constants chosen by the trie type flag) and assert!(index.iter().max().unwrap() + FAST_TYPE_DATA_MASK < data.len()).
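
A minimal sketch of how those two checks could justify unchecked reads on the fast path. Everything here is hypothetical and self-contained: SketchTrie, TrieType, try_new and get_fast are illustration-only names, plain Vecs stand in for the ZeroVecs, and the constant values (shift of 6, fast maxima of 0xFFFF and 0x0FFF) are assumptions rather than values quoted in this issue.

```rust
// Assumed constant values; not taken from this issue.
const FAST_TYPE_SHIFT: u32 = 6;
const FAST_TYPE_DATA_MASK: u32 = (1 << FAST_TYPE_SHIFT) - 1;
const FAST_TYPE_FAST_INDEXING_MAX: u32 = 0xFFFF;
const SMALL_TYPE_FAST_INDEXING_MAX: u32 = 0x0FFF;

#[derive(Clone, Copy, Debug, PartialEq)]
enum TrieType {
    Fast,
    Small,
}

// Hypothetical stand-in for CodePointTrie; Vec replaces ZeroVec.
#[derive(Debug)]
struct SketchTrie {
    index: Vec<u16>,
    data: Vec<u32>,
    fast_max: u32,
}

impl SketchTrie {
    /// Validates the two length requirements once, at construction time.
    fn try_new(index: Vec<u16>, data: Vec<u32>, trie_type: TrieType) -> Result<Self, &'static str> {
        let fast_max = match trie_type {
            TrieType::Fast => FAST_TYPE_FAST_INDEXING_MAX,
            TrieType::Small => SMALL_TYPE_FAST_INDEXING_MAX,
        };
        // Check 1: every fast-path index slot exists.
        if (fast_max >> FAST_TYPE_SHIFT) as usize >= index.len() {
            return Err("index too short for the fast path");
        }
        // Check 2: the largest block start plus the largest in-block
        // offset still lands inside `data`.
        let max_block = match index.iter().copied().max() {
            Some(m) => m as usize,
            None => return Err("empty index"),
        };
        if max_block + FAST_TYPE_DATA_MASK as usize >= data.len() {
            return Err("data too short for the largest index entry");
        }
        Ok(Self { index, data, fast_max })
    }

    /// Fast-path lookup without slice bound checks; None above fast_max.
    fn get_fast(&self, code_point: u32) -> Option<u32> {
        if code_point > self.fast_max {
            return None;
        }
        let slot = (code_point >> FAST_TYPE_SHIFT) as usize;
        // SAFETY: slot <= fast_max >> FAST_TYPE_SHIFT < index.len() (check 1).
        let block = unsafe { *self.index.get_unchecked(slot) } as usize;
        let offset = block + (code_point & FAST_TYPE_DATA_MASK) as usize;
        // SAFETY: offset <= max(index) + FAST_TYPE_DATA_MASK < data.len() (check 2).
        Some(unsafe { *self.data.get_unchecked(offset) })
    }
}
```

The unchecked reads are only sound if try_new is the sole way to build the struct and the fields are never mutated afterwards, which is exactly the invariant a second, unvalidated constructor would break.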

Unfortunately, the const constructor marked doc(hidden) and // databake internal is not marked unsafe, so safe code could violate this kind of internal safety invariant by calling the hidden databake constructor, unless enough things are const that the validation can also be computed in the const context.
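
As a rough illustration of what computing the validation in a const context could look like, here is a sketch over plain slices. It assumes the index and data are reachable as &[u16] and &[u32] in a const fn, which is precisely the open question for ZeroVec. fast_path_is_valid is a hypothetical helper and the constant values are assumptions.

```rust
// Assumed constant values; not taken from this issue.
const FAST_TYPE_SHIFT: u32 = 6;
const FAST_TYPE_DATA_MASK: u32 = (1 << FAST_TYPE_SHIFT) - 1;
const SMALL_TYPE_FAST_INDEXING_MAX: u32 = 0x0FFF;

/// Hypothetical const validator for the two fast-path length requirements.
/// Iterators are not usable in const fn, so the maximum is found with a
/// plain while loop.
const fn fast_path_is_valid(index: &[u16], data: &[u32], fast_max: u32) -> bool {
    // Check 1: every fast-path index slot exists.
    if (fast_max >> FAST_TYPE_SHIFT) as usize >= index.len() {
        return false;
    }
    // Find the largest index entry.
    let mut max_block = 0usize;
    let mut i = 0usize;
    while i < index.len() {
        let value = index[i] as usize;
        if value > max_block {
            max_block = value;
        }
        i += 1;
    }
    // Check 2: the largest block plus the largest in-block offset is in range.
    max_block + (FAST_TYPE_DATA_MASK as usize) < data.len()
}

// Example baked data (made up for this sketch).
const INDEX: [u16; 64] = [0; 64];
const DATA: [u32; 64] = [0; 64];

// Evaluated at compile time: invalid baked data would fail the build
// instead of reaching an unchecked getter.
const _: () = assert!(fast_path_is_valid(&INDEX, &DATA, SMALL_TYPE_FAST_INDEXING_MAX));
```

With a check like this inside the const constructor itself, the constructor could stay safe without trusting its caller.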

Metadata

    Labels

    A-performance (Area: Performance (CPU, Memory)), C-unicode (Component: Props, sets, tries)