Skip to content

Removed upb_Decoder's stateful, bespoke logic for field lookup. #21707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

copybara-service[bot]
Copy link

Removed upb_Decoder's stateful, bespoke logic for field lookup.

upb_Decoder used to have its own logic for finding a MiniTable field by number. Instead of using the standard upb_MiniTable_FindFieldByNumber() function, it used a custom linear search that cached the most recently seen field index. The theory behind this design was that field numbers are usually serialized in increasing order, so by linear searching from the last seen index, we would have a small expected number of elements to inspect before finding the matching one.

In practice, it appears that we can use the generic searching logic with no loss in performance. Removing the stateful search has several benefits:

  1. This simplifies the decoder since we no longer have to preserve this extra state (last_field_index). Note that it was not even possible to put this variable into upb_Decoder because it is preserved per parsed message, so we previously needed a stack of last_field_index values. This CL lets us get rid of that completely.
  2. This makes parsing performance less sensitive to whether field numbers are serialized in order.
  3. This simplification will help as we refine the dispatch logic that switches between the standard and fasttable decoders.

upb_Decoder used to have its own logic for finding a MiniTable field by number.  Instead of using the standard `upb_MiniTable_FindFieldByNumber()` function, it used a custom linear search that cached the most recently seen field index.  The theory behind this design was that field numbers are usually serialized in increasing order, so by linear searching from the last seen index, we would have a small expected number of elements to inspect before finding the matching one.

In practice, it appears that we can use the generic searching logic with no loss in performance.  Removing the stateful search has several benefits:

1. This simplifies the decoder since we no longer have to preserve this extra state (`last_field_index`).  Note that it was not even possible to put this variable into `upb_Decoder` because it is preserved per parsed message, so we previously needed a *stack* of `last_field_index` values.  This CL lets us get rid of that completely.
2. This makes parsing performance less sensitive to whether field numbers are serialized in order.
3. This simplification will help as we refine the dispatch logic that switches between the standard and fasttable decoders.

PiperOrigin-RevId: 757333347
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant