Hashing fired detectors with boost::dynamic_bitset (#57)
### Hashing Syndrome Patterns with `boost::dynamic_bitset`
In this PR, I address a key performance bottleneck: hashing fired
detector patterns (syndrome patterns). I introduce
`boost::dynamic_bitset` from the Boost library, a data structure that
combines the bit-packed memory savings of `std::vector<bool>` with
fast access and modification comparable to `std::vector<char>`, thanks
to highly optimized bit-wise operations. Crucially,
`boost::dynamic_bitset` also provides an efficient, built-in way to
hash whole sequences of boolean elements.
---
### Initial Optimization: `std::vector<bool>` to `std::vector<char>`
The initial _Tesseract_ implementation, as documented in #25, utilized
`std::vector<bool>` to store patterns of fired detectors and predicates
that block specific errors from being added to the current error
hypothesis. While `std::vector<bool>` optimizes memory usage by packing
elements into individual bits, accessing and modifying its elements is
highly inefficient due to its reliance on proxy objects that perform
costly bit-wise operations (shifting, masking). Given _Tesseract_'s
frequent access and modification of these elements, this caused
significant performance overheads.
In #25, I transitioned from `std::vector<bool>` to `std::vector<char>`.
This change made boolean elements addressable bytes, enabling efficient
and direct byte-level access. Although this increased memory footprint
(as each boolean was stored as a full byte), it delivered substantial
performance gains by eliminating `std::vector<bool>`'s proxy objects and
their associated overheads for element access and modification. Speedups
achieved with this initial optimization were significant:
* For Color Codes, speedups reached 17.2%-32.3%
* For Bivariate-Bicycle Codes, speedups reached 13.0%-22.3%
* For Surface Codes, speedups reached 33.4%-42.5%
* For Transversal CNOT Protocols, speedups reached 12.2%-32.4%
These gains highlight the importance of choosing appropriate data
structures for boolean sequences, especially in performance-sensitive
applications like _Tesseract_. The 42.5% speedup achieved in Surface
Codes by this one switch underscores how much overhead an unsuitable
data structure can impose: eliminating `std::vector<bool>`'s proxy
objects and their inefficient operations far outweighed the cost of
the increased memory consumption.
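The proxy-object distinction driving these numbers is visible directly in the containers' types. A minimal sketch (not _Tesseract_ code) contrasting the two:

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// std::vector<bool> packs 8 elements per byte, so operator[] must return a
// proxy object that shifts and masks on every access; std::vector<char>
// spends a full byte per element but hands back a plain reference.
static_assert(!std::is_same_v<decltype(std::declval<std::vector<bool>&>()[0]), bool&>,
              "vector<bool> yields a proxy, not bool&");
static_assert(std::is_same_v<decltype(std::declval<std::vector<char>&>()[0]), char&>,
              "vector<char> yields a direct byte reference");

// Reading one element: a shift-and-mask through the proxy vs. a direct byte load.
inline bool read_bit(const std::vector<bool>& v, std::size_t i) { return v[i]; }
inline char read_byte(const std::vector<char>& v, std::size_t i) { return v[i]; }
```

The `static_assert`s are the whole story: every element access on `std::vector<bool>` goes through that proxy type, which is what made frequent reads and writes expensive.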
---
### Current Bottleneck: `std::vector<char>` and Hashing
Following the optimizations in #25, _Tesseract_ continued to use
`std::vector<char>` for storing and managing patterns of fired detectors
and predicates that block errors. Subsequently, PR #34 replaced and
merged vectors of blocked errors into the `DetectorCostTuple` structure,
which efficiently stores `error_blocked` and `detectors_count` as
`uint32_t` fields (reasons explained in #34). These changes left vectors
of fired detectors as the sole remaining `std::vector<char>` data
structure in this context.
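For context, a sketch of the `DetectorCostTuple` layout described above. The field names come from the text; the exact definition lives in PR #34, so treat this as illustrative:

```cpp
#include <cstdint>

// Illustrative sketch of DetectorCostTuple (see PR #34 for the real thing).
// Packing both counters as uint32_t keeps each entry in a single 8-byte
// slot, so one 64-byte cache line holds eight entries.
struct DetectorCostTuple {
    std::uint32_t error_blocked;    // nonzero if this error is currently blocked
    std::uint32_t detectors_count;  // number of fired detectors touching this error
};

static_assert(sizeof(DetectorCostTuple) == 8, "two uint32_t fields, no padding");
```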
After implementing and evaluating the optimizations in #25, #27, #34,
and #45, I profiled _Tesseract_ again to analyze the remaining
bottlenecks. Aside from the `get_detcost` function, one stood out:
`VectorCharHash` (originally `VectorBoolHash`), the function that
hashes patterns of fired detectors so that previously visited syndrome
states are not re-explored. Its implementation iterated through the
pattern element by element, accumulating the hash one byte at a time.
Although this function had already sped up considerably with the
switch from `std::vector<bool>` to `std::vector<char>`, it still
consumed roughly 25% of decoding time in Surface Codes, 30% in
Transversal CNOT Protocols, 10% in Color Codes, and 2% in
Bivariate-Bicycle Codes (where `get_detcost` remained the primary
bottleneck). I therefore explored opportunities to optimize this
function further and enhance the decoding speed.
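For illustration, a byte-by-byte hash in the spirit of `VectorCharHash`. This FNV-1a sketch is my own illustration, not _Tesseract_'s actual mixing function, but it shows the cost profile profiling flagged: one load-xor-multiply round per detector:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative byte-by-byte FNV-1a hash over a fired-detector pattern.
// With one byte per detector, an n-detector syndrome costs n rounds of
// load, xor, and multiply -- the loop that dominated profiles.
std::size_t hash_pattern_bytewise(const std::vector<char>& fired) {
    std::uint64_t h = 1469598103934665603ull;   // FNV-1a offset basis
    for (char byte : fired) {
        h ^= static_cast<std::uint8_t>(byte);
        h *= 1099511628211ull;                  // FNV-1a prime
    }
    return static_cast<std::size_t>(h);
}
```

A word-oriented hash over bit-packed storage does the equivalent mixing once per 64 detectors rather than once per detector, which is the gap this PR closes.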
---
### Solution: Introducing `boost::dynamic_bitset`
This PR addresses the performance bottleneck of hashing fired detector
patterns and mitigates the increased memory footprint from the initial
switch to `std::vector<char>` by introducing the `boost::dynamic_bitset`
data structure. The C++ standard library's `std::bitset` offers an ideal
conceptual solution: memory-efficient bit-packed storage (like
`std::vector<bool>`) combined with highly efficient access and
modification operations (like `std::vector<char>`). This data structure
achieves efficient access and modification by employing highly optimized
bit-wise operations, thereby reducing performance overhead stemming from
proxy objects in `std::vector<bool>`. However, `std::bitset` requires a
static size (determined at compile-time), rendering it unsuitable for
_Tesseract_'s dynamically sized syndrome patterns.
The Boost library's `boost::dynamic_bitset` fits these requirements:
it provides dynamically sized bit arrays whose length is determined at
runtime. It combines the memory efficiency of `std::vector<bool>`
(elements packed into individual bits) with fast, direct element
access and modification in the spirit of `std::vector<char>`. It
achieves this by internally storing the bits in a contiguous array of
fundamental integer types (e.g., `unsigned long` or `uint64_t`) and
accessing and modifying them with highly optimized bit-wise
operations, avoiding the proxy-object overhead of
`std::vector<bool>`. Furthermore, `boost::dynamic_bitset` offers
optimized, built-in hashing support that processes whole words at a
time, replacing our custom, less efficient byte-by-byte hashing and
resulting in a cleaner, faster implementation.
---
### Performance Evaluation: Individual Impact of Optimization
I performed two types of experiments to evaluate the achieved
performance gains. First, I conducted extensive benchmarks across
various code families and configurations to evaluate the individual
performance gains achieved by this specific optimization. Speedups
achieved include:
* For Surface Codes: 8.0%-24.7%
* For Transversal CNOT Protocols: 12.1%-26.8%
* For Color Codes: 3.6%-7.0%
* For Bivariate-Bicycle Codes: 0.5%-4.8%
These results show the highest impact in Surface Codes and Transversal
CNOT Protocols, consistent with the profiling data above, which
indicated that these code families spent the most time in the original
`VectorCharHash` function.
---
#### Speedups in Surface Codes
<img width="1990" height="989" alt="img1"
src="https://github.com/user-attachments/assets/04044da5-a980-4282-a6fe-4debfa815f41"
/>
---
#### Speedups in Transversal CNOT Protocols
<img width="1990" height="989" alt="img2"
src="https://github.com/user-attachments/assets/f79e4d7d-5cfc-4077-be1a-13ef92a2d65a"
/>
<img width="1990" height="989" alt="img3"
src="https://github.com/user-attachments/assets/35a9b672-07d3-45ea-9334-23dd85760925"
/>
---
#### Speedups in Color Codes
<img width="1990" height="989" alt="img4"
src="https://github.com/user-attachments/assets/2b52c4fd-5137-47f0-9bae-7c667c740ff0"
/>
<img width="1990" height="989" alt="img5"
src="https://github.com/user-attachments/assets/e7883dec-5a88-4b2b-914b-3d12a1843d6f"
/>
---
#### Speedups in Bivariate-Bicycle Codes
<img width="1990" height="989" alt="img6"
src="https://github.com/user-attachments/assets/bd530a3b-da17-4ac1-bf68-702aaafe6047"
/>
<img width="1990" height="989" alt="img7"
src="https://github.com/user-attachments/assets/2d2f2576-0b16-4f0a-b8a2-221723250945"
/>
---
### Performance Evaluation: Cumulative Speedup
Following the evaluation of individual performance gains, I analyzed the
cumulative effect of the optimizations implemented across PRs #25, #27,
#34, and #45. The cumulative speedups achieved are:
* For Color Codes: 40.7%-54.8%
* For Bivariate-Bicycle Codes: 41.5%-80.3%
* For Surface Codes: 50.0%-62.4%
* For Transversal CNOT Protocols: 57.8%-63.6%
These results demonstrate that my optimizations achieved over 2x speedup
in Color Codes, over 2.5x speedup in Surface Codes and Transversal CNOT
Protocols, and over 5x speedup in Bivariate-Bicycle Codes.
---
#### Speedups in Color Codes
<img width="1990" height="989" alt="img1"
src="https://github.com/user-attachments/assets/cd81dc98-8599-4740-b00c-4ff396488f69"
/>
<img width="1990" height="989" alt="img2"
src="https://github.com/user-attachments/assets/c337ddcf-44f0-4641-91df-2a6d3c586680"
/>
---
#### Speedups in Bivariate-Bicycle Codes
<img width="1990" height="989" alt="img3"
src="https://github.com/user-attachments/assets/a57cf9e2-4c2c-44e8-8a6e-1860b1544cbd"
/>
<img width="1990" height="989" alt="img4"
src="https://github.com/user-attachments/assets/fde60159-fd7f-4893-b30d-34da844ac452"
/>
---
#### Speedups in Surface Codes
<img width="1990" height="989" alt="img5"
src="https://github.com/user-attachments/assets/57234d33-201b-41a9-b867-15e9ff87e666"
/>
---
#### Speedups in Transversal CNOT Protocols
<img width="1990" height="989" alt="img6"
src="https://github.com/user-attachments/assets/5780843d-2055-4870-9454-50184a268ad1"
/>
---
### Conclusion
These results demonstrate that the `boost::dynamic_bitset` optimization
significantly impacts code families where the original hashing function
(`VectorCharHash`) was a primary bottleneck (Surface Codes and
Transversal CNOT Protocols). The substantial speedups achieved in these
code families validate that `boost::dynamic_bitset` provides
demonstrably more efficient hashing and bit-wise operations. For code
families where hashing was less of a bottleneck (Color Codes and
Bivariate-Bicycle Codes), the speedups were modest, reinforcing that
`std::vector<char>` can remain highly efficient even with increased
memory usage when bit packing is not the primary performance concern.
Crucially, this optimization delivers comparable or superior performance
to `std::vector<char>` while simultaneously reducing memory footprint,
providing additional speedups where hashing performance is critical.
---
### Key Contributions
* Identified the hashing of syndrome patterns as the primary remaining
bottleneck in Surface Codes and Transversal CNOT Protocols, post prior
optimizations (#25, #27, #34, #45).
* Adopted `boost::dynamic_bitset` as a superior data structure,
combining `std::vector<bool>`'s memory efficiency with
high-performance bit-wise operations and built-in hashing, while
retaining access and modification speeds comparable to
`std::vector<char>`.
* Replaced `std::vector<char>` with `boost::dynamic_bitset` for storing
syndrome patterns.
* Performed extensive benchmarking to evaluate both the individual
impact of this optimization and its cumulative effect with prior PRs.
* Achieved significant individual speedups (e.g., 8.0%-24.7% in Surface
Codes, 12.1%-26.8% in Transversal CNOT Protocols) and substantial
cumulative speedups (over 2x in Color Codes, over 2.5x in Surface Codes
and Transversal CNOT Protocols, and over 5x in Bivariate-Bicycle Codes).
PR #47 contains the scripts I used for benchmarking and plotting the
results.
---------
Signed-off-by: Dragana Grbic <draganaurosgrbic@gmail.com>
Co-authored-by: noajshu <shutty@google.com>
Co-authored-by: LaLeh <lalehbeni@google.com>