
@V0ldek (Member) commented Apr 20, 2025

DRAFT: This is a work in progress. The code is rough around the edges, and not all SIMD configurations are implemented.

Short description

This is a big one; it has been over a year in the making.

Strings used in queries are now compiled into StringPattern structures that contain precomputed tables for Unicode- and escape-aware matching of keys.

This finally makes us spec-compliant when it comes to string comparison. However, it is most likely significantly slower, and the matching algorithm is quite complicated.
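For illustration only: the simplest correct form of escape-aware comparison is to decode the member name's JSON escapes first and then compare the result with the query label. The Rust sketch below does exactly that. `unescape_json`, `read_hex4` and `label_matches` are hypothetical names, and this naive decode-then-compare approach is not the precomputed-table StringPattern matcher described above. It assumes the spec compares member names by Unicode scalar values after escape decoding (as RFC 9535 does) and that the input has already been validated as JSON.

```rust
/// Hypothetical reference matcher (NOT the PR's StringPattern): decode the
/// escapes in a raw member name (the bytes between its quotes), then compare.
fn unescape_json(raw: &str) -> Option<String> {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next()? {
            '"' => out.push('"'),
            '\\' => out.push('\\'),
            '/' => out.push('/'),
            'b' => out.push('\u{0008}'),
            'f' => out.push('\u{000C}'),
            'n' => out.push('\n'),
            'r' => out.push('\r'),
            't' => out.push('\t'),
            'u' => {
                let hi = read_hex4(&mut chars)?;
                let code = if (0xD800..0xDC00).contains(&hi) {
                    // A high surrogate must be followed by a \uXXXX low surrogate.
                    if chars.next()? != '\\' || chars.next()? != 'u' {
                        return None;
                    }
                    let lo = read_hex4(&mut chars)?;
                    if !(0xDC00..0xE000).contains(&lo) {
                        return None;
                    }
                    0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
                } else {
                    hi
                };
                // Lone surrogates are not Unicode scalar values: treat as no match.
                out.push(char::from_u32(code)?);
            }
            _ => return None, // unknown escape sequence
        }
    }
    Some(out)
}

/// Reads exactly four hex digits of a \uXXXX escape.
fn read_hex4(chars: &mut std::str::Chars<'_>) -> Option<u32> {
    let mut value = 0;
    for _ in 0..4 {
        value = value * 16 + chars.next()?.to_digit(16)?;
    }
    Some(value)
}

/// True if the raw (still escaped) member name denotes the same string as `label`.
fn label_matches(raw_member_name: &str, label: &str) -> bool {
    unescape_json(raw_member_name).as_deref() == Some(label)
}

fn main() {
    // The document spells the key with a \u escape; the query uses the plain letter.
    let raw = r"\u0061";
    println!("{}", label_matches(raw, "a")); // true
    println!("{}", raw.as_bytes() == "a".as_bytes()); // false: raw bytes differ
}
```

A decode-then-compare matcher like this allocates and is slow, but because it is easy to check by hand it could plausibly double as a reference oracle for the fuzzing tests listed below.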

Before this gets merged the following things need to be addressed:

  • Matching algorithm for SSE2 and 32-bit architectures.
  • Backwards-matching algorithm.
  • Fuzzing tests specifically for the string matcher.
  • Update CTS and work through which tests should now pass.
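The PR does not spell out what the backwards-matching algorithm will do. Assuming it means recognising a key by reading backwards from the quote that ends it (for example, after locating the `:` that follows a member name), one sub-problem is finding where the key starts when it may contain escaped quotes. A minimal Rust sketch under that assumption, with hypothetical helper names `find_opening_quote` and `is_escaped`: inside a well-formed JSON string, a quote is escaped exactly when it is preceded by an odd number of consecutive backslashes, and JSON has no backslashes outside strings.

```rust
/// Hypothetical helper: given well-formed JSON and the index of a string's
/// closing quote, scan backwards and return the index of its opening quote.
fn find_opening_quote(json: &[u8], closing: usize) -> Option<usize> {
    let mut i = closing;
    while i > 0 {
        i -= 1;
        // The first unescaped quote to the left is the one that opened the string.
        if json[i] == b'"' && !is_escaped(json, i) {
            return Some(i);
        }
    }
    None
}

/// A quote is escaped iff an odd number of backslashes immediately precede it.
fn is_escaped(json: &[u8], quote: usize) -> bool {
    let backslashes = json[..quote]
        .iter()
        .rev()
        .take_while(|&&b| b == b'\\')
        .count();
    backslashes % 2 == 1
}

fn main() {
    let json = br#"{"a\"b": 1}"#;
    let closing = 6; // index of the quote that ends the key
    let opening = find_opening_quote(json, closing).expect("well-formed JSON");
    // Prints the raw, still-escaped key: a\"b
    println!("{}", std::str::from_utf8(&json[opening + 1..closing]).unwrap());
}
```

Counting backslashes back to the previous non-backslash byte keeps the check local, which is presumably what makes backwards reading feasible at all; a SIMD version would need to handle backslash runs that cross block boundaries.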

Issue

Resolves: #117

Checklist

All of these should be ticked off before you submit the PR.

  • I ran just verify locally and it succeeded.
  • Issue was given go ahead and is linked above OR I have included justification for a minor change.
  • Unit tests for my changes are included OR no functionality was changed.



Development

Successfully merging this pull request may close these issues.

Properly handle UTF-8 labels
