
Commit 8699f2d

Authored by draganaurosgrbic, noajshu, and LalehB
Hashing fired detectors with boost::dynamic_bitset (#57)
### Hashing Syndrome Patterns with `boost::dynamic_bitset`

In this PR, I address a key performance bottleneck: the hashing of fired-detector patterns (syndrome patterns). I introduce `boost::dynamic_bitset` from the Boost library, a data structure that combines the memory-saving bit packing of `std::vector<bool>` with fast, direct access and modification operations like those of `std::vector<char>`. Crucially, `boost::dynamic_bitset` also provides highly optimized, built-in functions for hashing sequences of boolean elements.

---

### Initial Optimization: `std::vector<bool>` to `std::vector<char>`

The initial _Tesseract_ implementation, as documented in #25, used `std::vector<bool>` to store the patterns of fired detectors and the predicates that block specific errors from being added to the current error hypothesis. While `std::vector<bool>` optimizes memory usage by packing elements into individual bits, accessing and modifying its elements is inefficient because every access goes through a proxy object that performs costly bit-wise operations (shifting, masking). Given how frequently _Tesseract_ accesses and modifies these elements, this caused significant performance overhead.

In #25, I transitioned from `std::vector<bool>` to `std::vector<char>`. This change made boolean elements addressable bytes, enabling efficient, direct byte-level access. Although it increased the memory footprint (each boolean was stored as a full byte), it delivered substantial performance gains by eliminating `std::vector<bool>`'s proxy objects and their associated access and modification overheads.
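The difference between the two access paths can be sketched with a small standalone example (illustrative only, not Tesseract code): `std::vector<bool>` routes every read and write through a proxy object that shifts and masks bits inside a machine word, while `std::vector<char>` reads and writes whole bytes directly.

```cpp
#include <cstddef>
#include <vector>

// Toggle flag i in a bit-packed vector<bool>: operator[] returns a proxy
// object, so each access performs a word load plus shift/mask under the hood.
bool toggle_packed(std::vector<bool>& v, std::size_t i) {
  v[i] = !v[i];
  return v[i];
}

// Toggle flag i in a byte-per-flag vector<char>: operator[] is a plain
// byte load/store with no proxy object involved.
char toggle_bytes(std::vector<char>& v, std::size_t i) {
  v[i] = !v[i];
  return v[i];
}
```

Both functions compute the same logical result; the cost difference lies entirely in the machinery behind `operator[]`.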
The speedups achieved with this initial optimization were significant:

* Color Codes: 17.2%-32.3%
* Bivariate-Bicycle Codes: 13.0%-22.3%
* Surface Codes: 33.4%-42.5%
* Transversal CNOT Protocols: 12.2%-32.4%

These gains highlight the importance of choosing appropriate data structures for boolean sequences, especially in performance-sensitive applications like _Tesseract_. The 42.5% speedup achieved in Surface Codes underscores the substantial overhead an unsuitable data structure can cause: the gain from removing `std::vector<bool>`'s proxy objects and their inefficient operations far outweighed any overhead from increased memory consumption.

---

### Current Bottleneck: `std::vector<char>` and Hashing

Following the optimizations in #25, _Tesseract_ continued to use `std::vector<char>` to store the patterns of fired detectors and the predicates that block errors. PR #34 then merged the vectors of blocked errors into the `DetectorCostTuple` structure, which stores `error_blocked` and `detectors_count` as `uint32_t` fields (for the reasons explained in #34). These changes left the vectors of fired detectors as the sole remaining `std::vector<char>` in this context.

After implementing and evaluating the optimizations in #25, #27, #34, and #45, profiling _Tesseract_ revealed that, aside from the `get_detcost` function, a notable bottleneck remained: `VectorCharHash` (originally `VectorBoolHash`). This function hashes the patterns of fired detectors to prevent re-exploring previously visited syndrome states. Its implementation iterated through the pattern, byte by byte, accumulating the hash.
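For reference, the byte-by-byte hash that this PR removes (reproduced from the diff to `src/tesseract.cc` below) accumulates a polynomial hash one element at a time:

```cpp
#include <cstddef>
#include <vector>

// Original VectorCharHash: one multiply-and-add per byte of the pattern,
// with no word-level batching.
struct VectorCharHash {
  std::size_t operator()(const std::vector<char>& v) const {
    std::size_t seed = v.size();

    for (char el : v) {
      seed = seed * 31 + static_cast<std::size_t>(el);
    }
    return seed;
  }
};
```

Because each iteration touches a single byte, the cost scales with the number of detectors rather than the number of machine words, which is why this function shows up prominently in profiles.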
Even though this function saw significant speedups from the initial switch from `std::vector<bool>` to `std::vector<char>`, hashing the patterns of fired detectors still consumed considerable time. Post-optimization profiling (after #25, #27, #34, and #45) showed this hashing function consuming approximately 25% of decoding time in Surface Codes, 30% in Transversal CNOT Protocols, 10% in Color Codes, and 2% in Bivariate-Bicycle Codes (`get_detcost` remained the primary bottleneck for Bivariate-Bicycle Codes). I therefore explored opportunities to optimize this function further and enhance the decoding speed.

---

### Solution: Introducing `boost::dynamic_bitset`

This PR addresses the hashing bottleneck, and mitigates the increased memory footprint from the earlier switch to `std::vector<char>`, by introducing `boost::dynamic_bitset`. The C++ standard library's `std::bitset` offers the ideal conceptual solution: memory-efficient bit-packed storage (like `std::vector<bool>`) combined with efficient access and modification (like `std::vector<char>`), achieved through optimized bit-wise operations rather than `std::vector<bool>`'s proxy objects. However, `std::bitset` requires a size fixed at compile time, making it unsuitable for _Tesseract_'s dynamically sized syndrome patterns.

The Boost library's `boost::dynamic_bitset` fills this gap: it offers bit arrays whose size is determined at runtime, combining the memory efficiency of `std::vector<bool>` (packing elements into individual bits) with the performance benefits of direct element access and modification, similar to `std::vector<char>`.
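A minimal sketch of the idea, assuming nothing about Boost's actual internals (the type, its layout, and the hash constant below are illustrative, not `boost::dynamic_bitset`'s real implementation): bits live in a contiguous array of 64-bit words, single-bit access is a shift and a mask, and hashing consumes a whole word per iteration instead of a single byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative word-packed bitset (NOT Boost's implementation).
struct TinyDynamicBitset {
  std::vector<std::uint64_t> words;
  std::size_t nbits;

  explicit TinyDynamicBitset(std::size_t n) : words((n + 63) / 64, 0), nbits(n) {}

  // Single-bit read/flip: index into the word array, then shift and mask.
  bool test(std::size_t i) const { return (words[i / 64] >> (i % 64)) & 1u; }
  void flip(std::size_t i) { words[i / 64] ^= std::uint64_t{1} << (i % 64); }

  // Hash 64 bits per loop iteration; the golden-ratio mixing constant is
  // chosen here for illustration only.
  std::size_t hash() const {
    std::size_t seed = nbits;
    for (std::uint64_t w : words) {
      seed ^= static_cast<std::size_t>(w) + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
    }
    return seed;
  }
};
```

Compared with a byte-by-byte hash over the same pattern, this visits one eighth as many elements, which is where the hashing speedup comes from.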
Internally, `boost::dynamic_bitset` stores its bits in a contiguous array of fundamental integer types (e.g., `unsigned long` or `uint64_t`) and accesses and modifies elements with optimized bit-wise operations, avoiding the overheads of `std::vector<bool>`'s proxy objects. Furthermore, `boost::dynamic_bitset` offers a highly optimized, built-in hash, replacing our custom, less efficient byte-by-byte hashing and resulting in a cleaner, faster implementation.

---

### Performance Evaluation: Individual Impact of Optimization

I performed two types of experiments to evaluate the performance gains. First, I conducted extensive benchmarks across various code families and configurations to measure the gains from this specific optimization alone. Speedups achieved:

* Surface Codes: 8.0%-24.7%
* Transversal CNOT Protocols: 12.1%-26.8%
* Color Codes: 3.6%-7.0%
* Bivariate-Bicycle Codes: 0.5%-4.8%

The impact is highest in Surface Codes and Transversal CNOT Protocols, consistent with the initial profiling data showing that these code families spent the most time in the original `VectorCharHash` function.
---

#### Speedups in Surface Codes

<img width="1990" height="989" alt="img1" src="https://github.com/user-attachments/assets/04044da5-a980-4282-a6fe-4debfa815f41" />

---

#### Speedups in Transversal CNOT Protocols

<img width="1990" height="989" alt="img2" src="https://github.com/user-attachments/assets/f79e4d7d-5cfc-4077-be1a-13ef92a2d65a" />
<img width="1990" height="989" alt="img3" src="https://github.com/user-attachments/assets/35a9b672-07d3-45ea-9334-23dd85760925" />

---

#### Speedups in Color Codes

<img width="1990" height="989" alt="img4" src="https://github.com/user-attachments/assets/2b52c4fd-5137-47f0-9bae-7c667c740ff0" />
<img width="1990" height="989" alt="img5" src="https://github.com/user-attachments/assets/e7883dec-5a88-4b2b-914b-3d12a1843d6f" />

---

#### Speedups in Bivariate-Bicycle Codes

<img width="1990" height="989" alt="img6" src="https://github.com/user-attachments/assets/bd530a3b-da17-4ac1-bf68-702aaafe6047" />
<img width="1990" height="989" alt="img7" src="https://github.com/user-attachments/assets/2d2f2576-0b16-4f0a-b8a2-221723250945" />

---

### Performance Evaluation: Cumulative Speedup

Following the evaluation of individual performance gains, I analyzed the cumulative effect of the optimizations implemented across PRs #25, #27, #34, and #45. The cumulative speedups achieved are:

* Color Codes: 40.7%-54.8%
* Bivariate-Bicycle Codes: 41.5%-80.3%
* Surface Codes: 50.0%-62.4%
* Transversal CNOT Protocols: 57.8%-63.6%

These results demonstrate that my optimizations achieved over 2x speedup in Color Codes, over 2.5x in Surface Codes and Transversal CNOT Protocols, and over 5x in Bivariate-Bicycle Codes.
---

#### Speedups in Color Codes

<img width="1990" height="989" alt="img1" src="https://github.com/user-attachments/assets/cd81dc98-8599-4740-b00c-4ff396488f69" />
<img width="1990" height="989" alt="img2" src="https://github.com/user-attachments/assets/c337ddcf-44f0-4641-91df-2a6d3c586680" />

---

#### Speedups in Bivariate-Bicycle Codes

<img width="1990" height="989" alt="img3" src="https://github.com/user-attachments/assets/a57cf9e2-4c2c-44e8-8a6e-1860b1544cbd" />
<img width="1990" height="989" alt="img4" src="https://github.com/user-attachments/assets/fde60159-fd7f-4893-b30d-34da844ac452" />

---

#### Speedups in Surface Codes

<img width="1990" height="989" alt="img5" src="https://github.com/user-attachments/assets/57234d33-201b-41a9-b867-15e9ff87e666" />

---

#### Speedups in Transversal CNOT Protocols

<img width="1990" height="989" alt="img6" src="https://github.com/user-attachments/assets/5780843d-2055-4870-9454-50184a268ad1" />

---

### Conclusion

These results demonstrate that the `boost::dynamic_bitset` optimization has the greatest impact in code families where the original hashing function (`VectorCharHash`) was a primary bottleneck: Surface Codes and Transversal CNOT Protocols. The substantial speedups achieved in these families confirm that `boost::dynamic_bitset` provides more efficient hashing and bit-wise operations. In code families where hashing was less of a bottleneck (Color Codes and Bivariate-Bicycle Codes), the speedups were more modest, reinforcing that `std::vector<char>` remains highly efficient when bit packing is not the primary performance concern. Crucially, this optimization delivers comparable or superior performance to `std::vector<char>` while simultaneously reducing the memory footprint, providing additional speedups where hashing performance is critical.
---

### Key Contributions

* Identified the hashing of syndrome patterns as the primary remaining bottleneck in Surface Codes and Transversal CNOT Protocols after prior optimizations (#25, #27, #34, #45).
* Adopted `boost::dynamic_bitset`, combining `std::vector<bool>`'s memory efficiency with high-performance bit-wise operations and built-in hashing, while retaining access and modification speeds comparable to `std::vector<char>`.
* Replaced `std::vector<char>` with `boost::dynamic_bitset` for storing syndrome patterns.
* Performed extensive benchmarking to evaluate both the individual impact of this optimization and its cumulative effect with prior PRs.
* Achieved significant individual speedups (e.g., 8.0%-24.7% in Surface Codes, 12.1%-26.8% in Transversal CNOT Protocols) and substantial cumulative speedups (over 2x in Color Codes, over 2.5x in Surface Codes and Transversal CNOT Protocols, and over 5x in Bivariate-Bicycle Codes).

PR #47 contains the scripts I used for benchmarking and plotting the results.

---------

Signed-off-by: Dragana Grbic <draganaurosgrbic@gmail.com>
Co-authored-by: noajshu <shutty@google.com>
Co-authored-by: LaLeh <lalehbeni@google.com>
1 parent cf990df commit 8699f2d

File tree

5 files changed: +63 additions, −25 deletions


WORKSPACE

Lines changed: 19 additions & 0 deletions

```diff
@@ -66,3 +66,22 @@ http_archive(
     urls = ["https://github.com/bazelbuild/platforms/archive/refs/tags/0.0.6.zip"],
     strip_prefix = "platforms-0.0.6",
 )
+
+
+
+
+BOOST_VERSION = "1.83.0"
+BOOST_ARCHIVE_NAME = "boost_{}".format(BOOST_VERSION.replace(".", "_"))
+
+http_archive(
+    name = "boost",
+    urls = [
+        "https://archives.boost.io/release/{}/source/{}.tar.gz".format(
+            BOOST_VERSION,
+            BOOST_ARCHIVE_NAME,
+        )
+    ],
+    strip_prefix = BOOST_ARCHIVE_NAME,
+    sha256 = "c0685b68dd44cc46574cce86c4e17c0f611b15e195be9848dfd0769a0a207628",
+    build_file = "//external:boost.BUILD",
+)
```

external/boost.BUILD

Lines changed: 16 additions & 0 deletions (new file)

```python
# external/boost.BUILD
package(default_visibility = ["//visibility:public"])

# A cc_library for the Boost headers themselves.
cc_library(
    name = "boost_headers",
    hdrs = glob(["boost/**/*.hpp"]),
    includes = ["."],
)

# A specific target for dynamic_bitset, which is header-only
# and depends on the main headers.
cc_library(
    name = "dynamic_bitset",
    deps = [":boost_headers"],
)
```

src/BUILD

Lines changed: 1 addition & 0 deletions

```diff
@@ -121,6 +121,7 @@ cc_library(
     linkopts = OPT_LINKOPTS,
     deps = [
         ":libutils",
+        "@boost//:dynamic_bitset",
     ],
 )
```

src/tesseract.cc

Lines changed: 25 additions & 24 deletions

```diff
@@ -15,7 +15,9 @@
 #include "tesseract.h"

 #include <algorithm>
+#include <boost/functional/hash.hpp>  // For boost::hash_range
 #include <cassert>
+#include <functional>  // For std::hash (not strictly necessary here, but good practice)
 #include <iostream>

 namespace {
@@ -37,6 +39,17 @@ std::ostream& operator<<(std::ostream& os, const std::vector<T>& vec) {

 };  // namespace

+namespace std {
+template <>
+struct hash<boost::dynamic_bitset<>> {
+  size_t operator()(const boost::dynamic_bitset<>& bs) const {
+    // Delegate to Boost's internal hash_value for dynamic_bitset.
+    // This is the correct and most efficient way.
+    return boost::hash_value(bs);
+  }
+};
+}  // namespace std
+
 std::string TesseractConfig::str() {
   auto& config = *this;
   std::stringstream ss;
@@ -73,7 +86,7 @@ double TesseractDecoder::get_detcost(
   ErrorCost ec;
   DetectorCostTuple dct;

-  for (size_t ei : d2e[d]) {
+  for (int ei : d2e[d]) {
     ec = error_costs[ei];
     if (ec.min_cost >= min_cost) break;

@@ -89,17 +102,6 @@ double TesseractDecoder::get_detcost(
   return min_cost + config.det_penalty;
 }

-struct VectorCharHash {
-  size_t operator()(const std::vector<char>& v) const {
-    size_t seed = v.size();
-
-    for (char el : v) {
-      seed = seed * 31 + static_cast<size_t>(el);
-    }
-    return seed;
-  }
-};
-
 TesseractDecoder::TesseractDecoder(TesseractConfig config_) : config(config_) {
   config.dem = common::remove_zero_probability_errors(config.dem);
   if (config.det_orders.empty()) {
@@ -206,7 +208,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections)
 }

 void TesseractDecoder::flip_detectors_and_block_errors(
-    size_t detector_order, const std::vector<size_t>& errors, std::vector<char>& detectors,
+    size_t detector_order, const std::vector<size_t>& errors, boost::dynamic_bitset<>& detectors,
     std::vector<DetectorCostTuple>& detector_cost_tuples) const {
   for (size_t ei : errors) {
     size_t min_detector = std::numeric_limits<size_t>::max();
@@ -217,15 +219,15 @@ void TesseractDecoder::flip_detectors_and_block_errors(
       }
     }

-    for (size_t oei : d2e[min_detector]) {
+    for (int oei : d2e[min_detector]) {
       detector_cost_tuples[oei].error_blocked = 1;
       if (!config.at_most_two_errors_per_detector && oei == ei) break;
     }

-    for (size_t d : edets[ei]) {
+    for (int d : edets[ei]) {
       detectors[d] = !detectors[d];
       if (!detectors[d] && config.at_most_two_errors_per_detector) {
-        for (size_t oei : d2e[d]) {
+        for (int oei : d2e[d]) {
           detector_cost_tuples[oei].error_blocked = 1;
         }
       }
@@ -239,10 +241,9 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,
   low_confidence_flag = false;

   std::priority_queue<Node, std::vector<Node>, std::greater<Node>> pq;
-  std::unordered_map<size_t, std::unordered_set<std::vector<char>, VectorCharHash>>
-      visited_detectors;
+  std::unordered_map<size_t, std::unordered_set<boost::dynamic_bitset<>>> visited_detectors;

-  std::vector<char> initial_detectors(num_detectors, false);
+  boost::dynamic_bitset<> initial_detectors(num_detectors, false);
   std::vector<DetectorCostTuple> initial_detector_cost_tuples(num_errors);

   for (size_t d : detections) {
@@ -266,7 +267,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,
   size_t max_num_detectors = min_num_detectors + detector_beam;

   std::vector<size_t> next_errors;
-  std::vector<char> next_detectors;
+  boost::dynamic_bitset<> next_detectors;
   std::vector<DetectorCostTuple> next_detector_cost_tuples;

   pq.push({initial_cost, min_num_detectors, std::vector<size_t>()});
@@ -278,7 +279,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,

     if (node.num_detectors > max_num_detectors) continue;

-    std::vector<char> detectors = initial_detectors;
+    boost::dynamic_bitset<> detectors = initial_detectors;
     std::vector<DetectorCostTuple> detector_cost_tuples(num_errors);
     flip_detectors_and_block_errors(detector_order, node.errors, detectors, detector_cost_tuples);

@@ -363,7 +364,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,
     size_t prev_ei = std::numeric_limits<size_t>::max();
     std::vector<double> detector_cost_cache(num_detectors, -1);

-    for (size_t ei : d2e[min_detector]) {
+    for (int ei : d2e[min_detector]) {
       if (detector_cost_tuples[ei].error_blocked) continue;

       if (prev_ei != std::numeric_limits<size_t>::max()) {
@@ -398,7 +399,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,
     }

     if (!next_detectors[d] && config.at_most_two_errors_per_detector) {
-      for (size_t oei : d2e[d]) {
+      for (int oei : d2e[d]) {
        next_detector_cost_tuples[oei].error_blocked =
            next_detector_cost_tuples[oei].error_blocked == 1
                ? 1
@@ -426,7 +427,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,
       }
     }

-    for (size_t od : eneighbors[ei]) {
+    for (int od : eneighbors[ei]) {
       if (!detectors[od] || !next_detectors[od]) continue;
       if (detector_cost_cache[od] == -1) {
         detector_cost_cache[od] = get_detcost(od, detector_cost_tuples);
```
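The `std::hash<boost::dynamic_bitset<>>` specialization in the diff above follows a general C++ pattern: specializing `std::hash` for a type lets that type serve as an `std::unordered_set` or `std::unordered_map` key without passing an explicit hasher. A Boost-free sketch of the same pattern with a toy type (`Pattern` and its hash body are hypothetical, for illustration only; the PR's real specialization simply delegates to `boost::hash_value`):

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Toy stand-in for a syndrome pattern (hypothetical, not Tesseract code).
struct Pattern {
  std::vector<unsigned long> words;
  bool operator==(const Pattern& other) const { return words == other.words; }
};

// Specializing std::hash for a program-defined type is permitted by the
// standard; after this, std::unordered_set<Pattern> needs no hasher argument.
namespace std {
template <>
struct hash<Pattern> {
  size_t operator()(const Pattern& p) const {
    size_t seed = p.words.size();
    for (unsigned long w : p.words) {
      // hash_combine-style mixing; constant chosen for illustration.
      seed ^= static_cast<size_t>(w) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
  }
};
}  // namespace std
```

With the specialization in place, `std::unordered_set<Pattern> visited;` works directly, which is exactly how `visited_detectors` drops its explicit `VectorCharHash` parameter in the diff.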

src/tesseract.h

Lines changed: 2 additions & 1 deletion

```diff
@@ -15,6 +15,7 @@
 #ifndef TESSERACT_DECODER_H
 #define TESSERACT_DECODER_H

+#include <boost/dynamic_bitset.hpp>
 #include <queue>
 #include <string>
 #include <unordered_map>
@@ -101,7 +102,7 @@ struct TesseractDecoder {
   void initialize_structures(size_t num_detectors);
   double get_detcost(size_t d, const std::vector<DetectorCostTuple>& detector_cost_tuples) const;
   void flip_detectors_and_block_errors(size_t detector_order, const std::vector<size_t>& errors,
-                                       std::vector<char>& detectors,
+                                       boost::dynamic_bitset<>& detectors,
                                        std::vector<DetectorCostTuple>& detector_cost_tuples) const;
 };
```
