
Commit 8699f2d

Authored by draganaurosgrbic, noajshu, and LalehB
Hashing fired detectors with boost::dynamic_bitset (#57)
### Hashing Syndrome Patterns with `boost::dynamic_bitset`

In this PR, I address a key performance bottleneck: the hashing of fired-detector patterns (syndrome patterns). I introduce `boost::dynamic_bitset` from the Boost library, a data structure that combines the memory-saving bit packing of `std::vector<bool>` with fast, direct access and modification operations like those of `std::vector<char>`. Crucially, `boost::dynamic_bitset` also provides highly optimized, built-in functions for hashing sequences of boolean elements.

---

### Initial Optimization: `std::vector<bool>` to `std::vector<char>`

The initial _Tesseract_ implementation, as documented in #25, used `std::vector<bool>` to store the patterns of fired detectors and the predicates that block specific errors from being added to the current error hypothesis. While `std::vector<bool>` optimizes memory usage by packing elements into individual bits, accessing and modifying its elements is inefficient because every access goes through a proxy object that performs costly bit-wise operations (shifting, masking). Given how frequently _Tesseract_ accesses and modifies these elements, this caused significant performance overhead.

In #25, I transitioned from `std::vector<bool>` to `std::vector<char>`. This change made boolean elements addressable bytes, enabling efficient, direct byte-level access. Although it increased the memory footprint (each boolean was stored as a full byte), it delivered substantial performance gains by eliminating `std::vector<bool>`'s proxy objects and their associated access and modification overheads.
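The difference between the two access paths can be sketched with a small standalone example (illustrative only, not Tesseract code): `std::vector<bool>` routes every read and write through a proxy object that shifts and masks bits inside a machine word, while `std::vector<char>` reads and writes whole bytes directly.

```cpp
#include <cstddef>
#include <vector>

// Toggle flag i in a bit-packed vector<bool>: operator[] returns a proxy
// object, so each access performs a word load plus shift/mask under the hood.
bool toggle_packed(std::vector<bool>& v, std::size_t i) {
  v[i] = !v[i];
  return v[i];
}

// Toggle flag i in a byte-per-flag vector<char>: operator[] is a plain
// byte load/store with no proxy object involved.
char toggle_bytes(std::vector<char>& v, std::size_t i) {
  v[i] = !v[i];
  return v[i];
}
```

Both functions compute the same logical result; the cost difference lies entirely in the machinery behind `operator[]`.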
The speedups achieved with this initial optimization were significant:

* Color Codes: 17.2%-32.3%
* Bivariate-Bicycle Codes: 13.0%-22.3%
* Surface Codes: 33.4%-42.5%
* Transversal CNOT Protocols: 12.2%-32.4%

These gains highlight the importance of choosing appropriate data structures for boolean sequences, especially in performance-sensitive applications like _Tesseract_. The 42.5% speedup achieved in Surface Codes underscores the substantial overhead an unsuitable data structure can cause: the gain from removing `std::vector<bool>`'s proxy objects and their inefficient operations far outweighed any overhead from increased memory consumption.

---

### Current Bottleneck: `std::vector<char>` and Hashing

Following the optimizations in #25, _Tesseract_ continued to use `std::vector<char>` to store the patterns of fired detectors and the predicates that block errors. PR #34 then merged the vectors of blocked errors into the `DetectorCostTuple` structure, which stores `error_blocked` and `detectors_count` as `uint32_t` fields (for the reasons explained in #34). These changes left the vectors of fired detectors as the sole remaining `std::vector<char>` in this context.

After implementing and evaluating the optimizations in #25, #27, #34, and #45, profiling _Tesseract_ revealed that, aside from the `get_detcost` function, a notable bottleneck remained: `VectorCharHash` (originally `VectorBoolHash`). This function hashes the patterns of fired detectors to prevent re-exploring previously visited syndrome states. Its implementation iterated through the pattern, byte by byte, accumulating the hash.
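For reference, the byte-by-byte hash that this PR removes (reproduced from the diff to `src/tesseract.cc` below) accumulates a polynomial hash one element at a time:

```cpp
#include <cstddef>
#include <vector>

// Original VectorCharHash: one multiply-and-add per byte of the pattern,
// with no word-level batching.
struct VectorCharHash {
  std::size_t operator()(const std::vector<char>& v) const {
    std::size_t seed = v.size();

    for (char el : v) {
      seed = seed * 31 + static_cast<std::size_t>(el);
    }
    return seed;
  }
};
```

Because each iteration touches a single byte, the cost scales with the number of detectors rather than the number of machine words, which is why this function shows up prominently in profiles.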
Even though this function saw significant speedups from the initial switch from `std::vector<bool>` to `std::vector<char>`, hashing the patterns of fired detectors still consumed considerable time. Post-optimization profiling (after #25, #27, #34, and #45) showed this hashing function consuming approximately 25% of decoding time in Surface Codes, 30% in Transversal CNOT Protocols, 10% in Color Codes, and 2% in Bivariate-Bicycle Codes (`get_detcost` remained the primary bottleneck for Bivariate-Bicycle Codes). I therefore explored opportunities to optimize this function further and enhance the decoding speed.

---

### Solution: Introducing `boost::dynamic_bitset`

This PR addresses the hashing bottleneck, and mitigates the increased memory footprint from the earlier switch to `std::vector<char>`, by introducing `boost::dynamic_bitset`. The C++ standard library's `std::bitset` offers the ideal conceptual solution: memory-efficient bit-packed storage (like `std::vector<bool>`) combined with efficient access and modification (like `std::vector<char>`), achieved through optimized bit-wise operations rather than `std::vector<bool>`'s proxy objects. However, `std::bitset` requires a size fixed at compile time, making it unsuitable for _Tesseract_'s dynamically sized syndrome patterns.

The Boost library's `boost::dynamic_bitset` fills this gap: it offers bit arrays whose size is determined at runtime, combining the memory efficiency of `std::vector<bool>` (packing elements into individual bits) with the performance benefits of direct element access and modification, similar to `std::vector<char>`.
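A minimal sketch of the idea, assuming nothing about Boost's actual internals (the type, its layout, and the hash constant below are illustrative, not `boost::dynamic_bitset`'s real implementation): bits live in a contiguous array of 64-bit words, single-bit access is a shift and a mask, and hashing consumes a whole word per iteration instead of a single byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative word-packed bitset (NOT Boost's implementation).
struct TinyDynamicBitset {
  std::vector<std::uint64_t> words;
  std::size_t nbits;

  explicit TinyDynamicBitset(std::size_t n) : words((n + 63) / 64, 0), nbits(n) {}

  // Single-bit read/flip: index into the word array, then shift and mask.
  bool test(std::size_t i) const { return (words[i / 64] >> (i % 64)) & 1u; }
  void flip(std::size_t i) { words[i / 64] ^= std::uint64_t{1} << (i % 64); }

  // Hash 64 bits per loop iteration; the golden-ratio mixing constant is
  // chosen here for illustration only.
  std::size_t hash() const {
    std::size_t seed = nbits;
    for (std::uint64_t w : words) {
      seed ^= static_cast<std::size_t>(w) + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
    }
    return seed;
  }
};
```

Compared with a byte-by-byte hash over the same pattern, this visits one eighth as many elements, which is where the hashing speedup comes from.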
Internally, `boost::dynamic_bitset` stores its bits in a contiguous array of fundamental integer types (e.g., `unsigned long` or `uint64_t`) and accesses and modifies elements with optimized bit-wise operations, avoiding the overheads of `std::vector<bool>`'s proxy objects. Furthermore, `boost::dynamic_bitset` offers a highly optimized, built-in hash, replacing our custom, less efficient byte-by-byte hashing and resulting in a cleaner, faster implementation.

---

### Performance Evaluation: Individual Impact of Optimization

I performed two types of experiments to evaluate the performance gains. First, I conducted extensive benchmarks across various code families and configurations to measure the gains from this specific optimization alone. Speedups achieved:

* Surface Codes: 8.0%-24.7%
* Transversal CNOT Protocols: 12.1%-26.8%
* Color Codes: 3.6%-7.0%
* Bivariate-Bicycle Codes: 0.5%-4.8%

The impact is highest in Surface Codes and Transversal CNOT Protocols, consistent with the initial profiling data showing that these code families spent the most time in the original `VectorCharHash` function.
---

#### Speedups in Surface Codes

<img width="1990" height="989" alt="img1" src="https://github.com/user-attachments/assets/04044da5-a980-4282-a6fe-4debfa815f41" />

---

#### Speedups in Transversal CNOT Protocols

<img width="1990" height="989" alt="img2" src="https://github.com/user-attachments/assets/f79e4d7d-5cfc-4077-be1a-13ef92a2d65a" />
<img width="1990" height="989" alt="img3" src="https://github.com/user-attachments/assets/35a9b672-07d3-45ea-9334-23dd85760925" />

---

#### Speedups in Color Codes

<img width="1990" height="989" alt="img4" src="https://github.com/user-attachments/assets/2b52c4fd-5137-47f0-9bae-7c667c740ff0" />
<img width="1990" height="989" alt="img5" src="https://github.com/user-attachments/assets/e7883dec-5a88-4b2b-914b-3d12a1843d6f" />

---

#### Speedups in Bivariate-Bicycle Codes

<img width="1990" height="989" alt="img6" src="https://github.com/user-attachments/assets/bd530a3b-da17-4ac1-bf68-702aaafe6047" />
<img width="1990" height="989" alt="img7" src="https://github.com/user-attachments/assets/2d2f2576-0b16-4f0a-b8a2-221723250945" />

---

### Performance Evaluation: Cumulative Speedup

Following the evaluation of individual performance gains, I analyzed the cumulative effect of the optimizations implemented across PRs #25, #27, #34, and #45. The cumulative speedups achieved are:

* Color Codes: 40.7%-54.8%
* Bivariate-Bicycle Codes: 41.5%-80.3%
* Surface Codes: 50.0%-62.4%
* Transversal CNOT Protocols: 57.8%-63.6%

These results demonstrate that my optimizations achieved over 2x speedup in Color Codes, over 2.5x in Surface Codes and Transversal CNOT Protocols, and over 5x in Bivariate-Bicycle Codes.
---

#### Speedups in Color Codes

<img width="1990" height="989" alt="img1" src="https://github.com/user-attachments/assets/cd81dc98-8599-4740-b00c-4ff396488f69" />
<img width="1990" height="989" alt="img2" src="https://github.com/user-attachments/assets/c337ddcf-44f0-4641-91df-2a6d3c586680" />

---

#### Speedups in Bivariate-Bicycle Codes

<img width="1990" height="989" alt="img3" src="https://github.com/user-attachments/assets/a57cf9e2-4c2c-44e8-8a6e-1860b1544cbd" />
<img width="1990" height="989" alt="img4" src="https://github.com/user-attachments/assets/fde60159-fd7f-4893-b30d-34da844ac452" />

---

#### Speedups in Surface Codes

<img width="1990" height="989" alt="img5" src="https://github.com/user-attachments/assets/57234d33-201b-41a9-b867-15e9ff87e666" />

---

#### Speedups in Transversal CNOT Protocols

<img width="1990" height="989" alt="img6" src="https://github.com/user-attachments/assets/5780843d-2055-4870-9454-50184a268ad1" />

---

### Conclusion

These results demonstrate that the `boost::dynamic_bitset` optimization has the greatest impact in code families where the original hashing function (`VectorCharHash`) was a primary bottleneck: Surface Codes and Transversal CNOT Protocols. The substantial speedups achieved in these families confirm that `boost::dynamic_bitset` provides more efficient hashing and bit-wise operations. In code families where hashing was less of a bottleneck (Color Codes and Bivariate-Bicycle Codes), the speedups were more modest, reinforcing that `std::vector<char>` remains highly efficient when bit packing is not the primary performance concern. Crucially, this optimization delivers comparable or superior performance to `std::vector<char>` while simultaneously reducing the memory footprint, providing additional speedups where hashing performance is critical.
---

### Key Contributions

* Identified the hashing of syndrome patterns as the primary remaining bottleneck in Surface Codes and Transversal CNOT Protocols after prior optimizations (#25, #27, #34, #45).
* Adopted `boost::dynamic_bitset`, combining `std::vector<bool>`'s memory efficiency with high-performance bit-wise operations and built-in hashing, while retaining access and modification speeds comparable to `std::vector<char>`.
* Replaced `std::vector<char>` with `boost::dynamic_bitset` for storing syndrome patterns.
* Performed extensive benchmarking to evaluate both the individual impact of this optimization and its cumulative effect with prior PRs.
* Achieved significant individual speedups (e.g., 8.0%-24.7% in Surface Codes, 12.1%-26.8% in Transversal CNOT Protocols) and substantial cumulative speedups (over 2x in Color Codes, over 2.5x in Surface Codes and Transversal CNOT Protocols, and over 5x in Bivariate-Bicycle Codes).

PR #47 contains the scripts I used for benchmarking and plotting the results.

---------

Signed-off-by: Dragana Grbic <draganaurosgrbic@gmail.com>
Co-authored-by: noajshu <shutty@google.com>
Co-authored-by: LaLeh <lalehbeni@google.com>
1 parent cf990df commit 8699f2d

File tree

5 files changed: +63 additions, −25 deletions


WORKSPACE

Lines changed: 19 additions & 0 deletions

```diff
@@ -66,3 +66,22 @@ http_archive(
     urls = ["https://github.com/bazelbuild/platforms/archive/refs/tags/0.0.6.zip"],
     strip_prefix = "platforms-0.0.6",
 )
+
+
+
+
+BOOST_VERSION = "1.83.0"
+BOOST_ARCHIVE_NAME = "boost_{}".format(BOOST_VERSION.replace(".", "_"))
+
+http_archive(
+    name = "boost",
+    urls = [
+        "https://archives.boost.io/release/{}/source/{}.tar.gz".format(
+            BOOST_VERSION,
+            BOOST_ARCHIVE_NAME,
+        )
+    ],
+    strip_prefix = BOOST_ARCHIVE_NAME,
+    sha256 = "c0685b68dd44cc46574cce86c4e17c0f611b15e195be9848dfd0769a0a207628",
+    build_file = "//external:boost.BUILD",
+)
```

external/boost.BUILD

Lines changed: 16 additions & 0 deletions (new file)

```python
# external/boost.BUILD
package(default_visibility = ["//visibility:public"])

# A cc_library for the Boost headers themselves.
cc_library(
    name = "boost_headers",
    hdrs = glob(["boost/**/*.hpp"]),
    includes = ["."],
)

# A specific target for dynamic_bitset, which is header-only
# and depends on the main headers.
cc_library(
    name = "dynamic_bitset",
    deps = [":boost_headers"],
)
```

src/BUILD

Lines changed: 1 addition & 0 deletions

```diff
@@ -121,6 +121,7 @@ cc_library(
     linkopts = OPT_LINKOPTS,
     deps = [
         ":libutils",
+        "@boost//:dynamic_bitset",
     ],
 )
```

src/tesseract.cc

Lines changed: 25 additions & 24 deletions

```diff
@@ -15,7 +15,9 @@
 #include "tesseract.h"

 #include <algorithm>
+#include <boost/functional/hash.hpp>  // For boost::hash_range
 #include <cassert>
+#include <functional>  // For std::hash (not strictly necessary here, but good practice)
 #include <iostream>

 namespace {
@@ -37,6 +39,17 @@ std::ostream& operator<<(std::ostream& os, const std::vector<T>& vec) {

 };  // namespace

+namespace std {
+template <>
+struct hash<boost::dynamic_bitset<>> {
+  size_t operator()(const boost::dynamic_bitset<>& bs) const {
+    // Delegate to Boost's internal hash_value for dynamic_bitset.
+    // This is the correct and most efficient way.
+    return boost::hash_value(bs);
+  }
+};
+}  // namespace std
+
 std::string TesseractConfig::str() {
   auto& config = *this;
   std::stringstream ss;
@@ -73,7 +86,7 @@ double TesseractDecoder::get_detcost(
   ErrorCost ec;
   DetectorCostTuple dct;

-  for (size_t ei : d2e[d]) {
+  for (int ei : d2e[d]) {
     ec = error_costs[ei];
     if (ec.min_cost >= min_cost) break;

@@ -89,17 +102,6 @@ double TesseractDecoder::get_detcost(
   return min_cost + config.det_penalty;
 }

-struct VectorCharHash {
-  size_t operator()(const std::vector<char>& v) const {
-    size_t seed = v.size();
-
-    for (char el : v) {
-      seed = seed * 31 + static_cast<size_t>(el);
-    }
-    return seed;
-  }
-};
-
 TesseractDecoder::TesseractDecoder(TesseractConfig config_) : config(config_) {
   config.dem = common::remove_zero_probability_errors(config.dem);
   if (config.det_orders.empty()) {
@@ -206,7 +208,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections)
 }

 void TesseractDecoder::flip_detectors_and_block_errors(
-    size_t detector_order, const std::vector<size_t>& errors, std::vector<char>& detectors,
+    size_t detector_order, const std::vector<size_t>& errors, boost::dynamic_bitset<>& detectors,
     std::vector<DetectorCostTuple>& detector_cost_tuples) const {
   for (size_t ei : errors) {
     size_t min_detector = std::numeric_limits<size_t>::max();
@@ -217,15 +219,15 @@ void TesseractDecoder::flip_detectors_and_block_errors(
       }
     }

-    for (size_t oei : d2e[min_detector]) {
+    for (int oei : d2e[min_detector]) {
       detector_cost_tuples[oei].error_blocked = 1;
       if (!config.at_most_two_errors_per_detector && oei == ei) break;
     }

-    for (size_t d : edets[ei]) {
+    for (int d : edets[ei]) {
       detectors[d] = !detectors[d];
       if (!detectors[d] && config.at_most_two_errors_per_detector) {
-        for (size_t oei : d2e[d]) {
+        for (int oei : d2e[d]) {
           detector_cost_tuples[oei].error_blocked = 1;
         }
       }
@@ -239,10 +241,9 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,
   low_confidence_flag = false;

   std::priority_queue<Node, std::vector<Node>, std::greater<Node>> pq;
-  std::unordered_map<size_t, std::unordered_set<std::vector<char>, VectorCharHash>>
-      visited_detectors;
+  std::unordered_map<size_t, std::unordered_set<boost::dynamic_bitset<>>> visited_detectors;

-  std::vector<char> initial_detectors(num_detectors, false);
+  boost::dynamic_bitset<> initial_detectors(num_detectors, false);
   std::vector<DetectorCostTuple> initial_detector_cost_tuples(num_errors);

   for (size_t d : detections) {
@@ -266,7 +267,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,
   size_t max_num_detectors = min_num_detectors + detector_beam;

   std::vector<size_t> next_errors;
-  std::vector<char> next_detectors;
+  boost::dynamic_bitset<> next_detectors;
   std::vector<DetectorCostTuple> next_detector_cost_tuples;

   pq.push({initial_cost, min_num_detectors, std::vector<size_t>()});
@@ -278,7 +279,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,

     if (node.num_detectors > max_num_detectors) continue;

-    std::vector<char> detectors = initial_detectors;
+    boost::dynamic_bitset<> detectors = initial_detectors;
     std::vector<DetectorCostTuple> detector_cost_tuples(num_errors);
     flip_detectors_and_block_errors(detector_order, node.errors, detectors, detector_cost_tuples);

@@ -363,7 +364,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,
     size_t prev_ei = std::numeric_limits<size_t>::max();
     std::vector<double> detector_cost_cache(num_detectors, -1);

-    for (size_t ei : d2e[min_detector]) {
+    for (int ei : d2e[min_detector]) {
       if (detector_cost_tuples[ei].error_blocked) continue;

       if (prev_ei != std::numeric_limits<size_t>::max()) {
@@ -398,7 +399,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,
     }

     if (!next_detectors[d] && config.at_most_two_errors_per_detector) {
-      for (size_t oei : d2e[d]) {
+      for (int oei : d2e[d]) {
        next_detector_cost_tuples[oei].error_blocked =
            next_detector_cost_tuples[oei].error_blocked == 1
                ? 1
@@ -426,7 +427,7 @@ void TesseractDecoder::decode_to_errors(const std::vector<uint64_t>& detections,
       }
     }

-    for (size_t od : eneighbors[ei]) {
+    for (int od : eneighbors[ei]) {
       if (!detectors[od] || !next_detectors[od]) continue;
       if (detector_cost_cache[od] == -1) {
         detector_cost_cache[od] = get_detcost(od, detector_cost_tuples);
```
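The `std::hash<boost::dynamic_bitset<>>` specialization in the diff above follows a general C++ pattern: specializing `std::hash` for a type lets that type serve as an `std::unordered_set` or `std::unordered_map` key without passing an explicit hasher. A Boost-free sketch of the same pattern with a toy type (`Pattern` and its hash body are hypothetical, for illustration only; the PR's real specialization simply delegates to `boost::hash_value`):

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Toy stand-in for a syndrome pattern (hypothetical, not Tesseract code).
struct Pattern {
  std::vector<unsigned long> words;
  bool operator==(const Pattern& other) const { return words == other.words; }
};

// Specializing std::hash for a program-defined type is permitted by the
// standard; after this, std::unordered_set<Pattern> needs no hasher argument.
namespace std {
template <>
struct hash<Pattern> {
  size_t operator()(const Pattern& p) const {
    size_t seed = p.words.size();
    for (unsigned long w : p.words) {
      // hash_combine-style mixing; constant chosen for illustration.
      seed ^= static_cast<size_t>(w) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
  }
};
}  // namespace std
```

With the specialization in place, `std::unordered_set<Pattern> visited;` works directly, which is exactly how `visited_detectors` drops its explicit `VectorCharHash` parameter in the diff.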

src/tesseract.h

Lines changed: 2 additions & 1 deletion

```diff
@@ -15,6 +15,7 @@
 #ifndef TESSERACT_DECODER_H
 #define TESSERACT_DECODER_H

+#include <boost/dynamic_bitset.hpp>
 #include <queue>
 #include <string>
 #include <unordered_map>
@@ -101,7 +102,7 @@ struct TesseractDecoder {
   void initialize_structures(size_t num_detectors);
   double get_detcost(size_t d, const std::vector<DetectorCostTuple>& detector_cost_tuples) const;
   void flip_detectors_and_block_errors(size_t detector_order, const std::vector<size_t>& errors,
-                                       std::vector<char>& detectors,
+                                       boost::dynamic_bitset<>& detectors,
                                        std::vector<DetectorCostTuple>& detector_cost_tuples) const;
 };
```
