Commit 6ebf338
feat: add precision-summary field with clean QPS, P50, P95 metrics (#34)
* feat: add precision-summary field with clean QPS, P50, P95 metrics

  - Add a `precision_summary` field alongside the existing `precision` field
  - Contains a simplified dict with just `qps`, `p50`, and `p95` for each precision level
  - P50/P95 values are converted to milliseconds for consistency
  - QPS is rounded to 1 decimal place for readability
  - Maintains backward compatibility with the existing `precision` field
  - Enables easier parsing and analysis of performance metrics

* Update README.md to include easy Docker steps and how to check results quickly.

Co-authored-by: fcostaoliveira <filipe@redis.com>
1 parent cbe4c83 commit 6ebf338
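The rounding and unit conversion the commit message describes can be sketched in a few lines of Python. This is an illustrative sketch only: `summarize` and its argument names are invented here, not functions in the repository.

```python
# Sketch of the precision-summary formatting described in the commit
# message: QPS rounded to 1 decimal place, P50/P95 converted from
# seconds to milliseconds and rounded to 3 decimals.
# `summarize` is an illustrative helper, not part of this repo.

def summarize(rps: float, p50_seconds: float, p95_seconds: float) -> dict:
    return {
        "qps": round(rps, 1),
        "p50": round(p50_seconds * 1000, 3),  # ms
        "p95": round(p95_seconds * 1000, 3),  # ms
    }

print(summarize(1924.4821, 0.0498279, 0.0584269))
```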

File tree

2 files changed: +103 −57 lines changed


README.md

Lines changed: 78 additions & 53 deletions
@@ -16,6 +16,84 @@ scenario against which it should be tested. A specific scenario may assume
 running the server in a single or distributed mode, a different client
 implementation and the number of client instances.
 
+
+## Quick Start
+
+### Quick Start with Docker
+
+The easiest way to run vector-db-benchmark is using Docker. We provide pre-built images on Docker Hub.
+
+```bash
+# Pull the latest image
+docker pull filipe958/vector-db-benchmark:latest
+
+# Run with help
+docker run --rm filipe958/vector-db-benchmark:latest run.py --help
+
+# Check which datasets are available
+docker run --rm filipe958/vector-db-benchmark:latest run.py --describe datasets
+
+# Basic Redis benchmark with a local Redis
+docker run --rm -v $(pwd)/results:/app/results --network=host \
+  filipe958/vector-db-benchmark:latest \
+  run.py --host localhost --engines redis-default-simple --datasets glove-25-angular
+
+# At the end of the run, you will find the results in the `results` directory.
+# Let's open the summary file and inspect the precision summary:
+$ jq ".precision_summary" results/*-summary.json
+{
+  "0.91": {
+    "qps": 1924.5,
+    "p50": 49.828,
+    "p95": 58.427
+  },
+  "0.94": {
+    "qps": 1819.9,
+    "p50": 51.68,
+    "p95": 66.83
+  },
+  "0.9775": {
+    "qps": 1477.8,
+    "p50": 65.368,
+    "p95": 73.849
+  },
+  "0.9950": {
+    "qps": 1019.8,
+    "p50": 95.115,
+    "p95": 106.73
+  }
+}
+```
+
+### Using with Redis
+
+For testing with Redis, start a Redis container first:
+
+```bash
+# Start Redis container
+docker run -d --name redis-test -p 6379:6379 redis:8.2-rc1-bookworm
+
+# Run benchmark against Redis
+docker run --rm -v $(pwd)/results:/app/results --network=host \
+  filipe958/vector-db-benchmark:latest \
+  run.py --host localhost --engines redis-default-simple --datasets random-100
+
+# Or use the convenience script
+./docker-run.sh -H localhost -e redis-default-simple -d random-100
+
+# Clean up the Redis container when done
+docker stop redis-test && docker rm redis-test
+```
+
+### Available Docker Images
+
+- **Latest**: `filipe958/vector-db-benchmark:latest`
+
+For detailed Docker setup and publishing information, see [DOCKER_SETUP.md](DOCKER_SETUP.md).
+
 ## Data sets
 
 We have a number of precomputed data sets. All data sets have been pre-split into train/test and include ground truth data for the top-100 nearest neighbors.
@@ -71,59 +149,6 @@ We have a number of precomputed data sets. All data sets have been pre-split int
 | Random Match Keyword Small Vocab-256: Small vocabulary keyword matching (no filters) | 256 | 1,000,000 | 10,000 | 100 | Cosine |
 
 
-## 🐳 Docker Usage
-
-The easiest way to run vector-db-benchmark is using Docker. We provide pre-built images on Docker Hub.
-
-### Quick Start with Docker
-
-```bash
-# Pull the latest image
-docker pull filipe958/vector-db-benchmark:latest
-
-# Run with help
-docker run --rm filipe958/vector-db-benchmark:latest run.py --help
-
-
-# Basic Redis benchmark with local Redis (recommended)
-docker run --rm -v $(pwd)/results:/app/results --network=host \
-  filipe958/vector-db-benchmark:latest \
-  run.py --host localhost --engines redis-default-simple --dataset random-100
-
-# Without results output
-docker run --rm --network=host filipe958/vector-db-benchmark:latest \
-  run.py --host localhost --engines redis-default-simple --dataset random-100
-
-```
-
-### Using with Redis
-
-For testing with Redis, start a Redis container first:
-
-```bash
-# Start Redis container
-docker run -d --name redis-test -p 6379:6379 redis:8.2-rc1-bookworm
-
-# Run benchmark against Redis
-
-docker run --rm -v $(pwd)/results:/app/results --network=host \
-  filipe958/vector-db-benchmark:latest \
-  run.py --host localhost --engines redis-default-simple --dataset random-100
-
-# Or use the convenience script
-./docker-run.sh -H localhost -e redis-default-simple -d random-100
-
-
-# Clean up Redis container when done
-docker stop redis-test && docker rm redis-test
-```
-
-### Available Docker Images
-
-- **Latest**: `filipe958/vector-db-benchmark:latest`
-
-For detailed Docker setup and publishing information, see [DOCKER_SETUP.md](DOCKER_SETUP.md).
-
 ## How to run a benchmark?
 
 Benchmarks are implemented in server-client mode, meaning that the server is
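The `jq ".precision_summary"` step shown in the README diff above can also be done in plain Python. The sketch below uses a trimmed in-memory sample of the summary payload; real runs write files under `results/*-summary.json`.

```python
import json

# Extract and print the precision_summary block of a benchmark summary,
# as the README's `jq ".precision_summary"` example does. The payload
# here is a trimmed sample, not real benchmark output.
summary_json = """
{
  "precision_summary": {
    "0.91":   {"qps": 1924.5, "p50": 49.828, "p95": 58.427},
    "0.9950": {"qps": 1019.8, "p50": 95.115, "p95": 106.73}
  }
}
"""

precision_summary = json.loads(summary_json)["precision_summary"]
for level in sorted(precision_summary, key=float):
    m = precision_summary[level]
    print(f"precision {level}: {m['qps']} qps, p50 {m['p50']} ms, p95 {m['p95']} ms")
```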

engine/base_client/client.py

Lines changed: 25 additions & 4 deletions
@@ -34,9 +34,16 @@ def format_precision_key(precision_value: float) -> str:
     return f"{rounded:.4f}"
 
 
-def analyze_precision_performance(search_results: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
-    """Analyze search results to find best RPS at each actual precision level achieved."""
+def analyze_precision_performance(search_results: Dict[str, Any]) -> tuple[Dict[str, Dict[str, Any]], Dict[str, Dict[str, float]]]:
+    """Analyze search results to find best RPS at each actual precision level achieved.
+
+    Returns:
+        tuple: (precision_dict, precision_summary_dict)
+            - precision_dict: Full precision analysis with config details
+            - precision_summary_dict: Simplified summary with just QPS, P50, P95
+    """
     precision_dict = {}
+    precision_summary_dict = {}
 
     # First, collect all actual precision levels achieved by experiments and format them
     precision_mapping = {}  # Maps formatted precision to actual precision
@@ -53,6 +60,8 @@ def analyze_precision_performance(search_results: Dict[str, Any]) -> Dict[str, D
         best_rps = 0
         best_config = None
         best_experiment_id = None
+        best_p50_time = 0
+        best_p95_time = 0
 
         for experiment_id, experiment_data in search_results.items():
             mean_precision = experiment_data["results"]["mean_precisions"]
@@ -66,16 +75,26 @@ def analyze_precision_performance(search_results: Dict[str, Any]) -> Dict[str, D
                     "search_params": experiment_data["params"]["search_params"]
                 }
                 best_experiment_id = experiment_id
+                best_p50_time = experiment_data["results"]["p50_time"]
+                best_p95_time = experiment_data["results"]["p95_time"]
 
         # Add to precision dict with the formatted precision as key
         if best_config is not None:
+            # Full precision analysis (existing format)
             precision_dict[formatted_precision] = {
                 "rps": best_rps,
                 "config": best_config,
                 "experiment_id": best_experiment_id
             }
 
-    return precision_dict
+            # Simplified precision summary
+            precision_summary_dict[formatted_precision] = {
+                "qps": round(best_rps, 1),
+                "p50": round(best_p50_time * 1000, 3),  # Convert to ms
+                "p95": round(best_p95_time * 1000, 3)   # Convert to ms
+            }
+
+    return precision_dict, precision_summary_dict
 
 warnings.filterwarnings("ignore", category=DeprecationWarning)
 
@@ -285,10 +304,12 @@ def run_experiment(
 
         # Add precision analysis if search results exist
         if results["search"]:
-            precision_analysis = analyze_precision_performance(results["search"])
+            precision_analysis, precision_summary = analyze_precision_performance(results["search"])
             if precision_analysis:  # Only add if we have precision data
                 results["precision"] = precision_analysis
+                results["precision_summary"] = precision_summary
                 print(f"Added precision analysis with {len(precision_analysis)} precision thresholds")
+                print(f"Added precision summary with {len(precision_summary)} precision levels")
 
         summary_file = f"{self.name}-{dataset.config.name}-summary.json"
         summary_path = RESULTS_DIR / summary_file
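To make the client.py diff concrete, here is a self-contained, simplified sketch of the selection logic: group experiments by the precision level they achieved, keep the best-RPS run per level, and emit both the full dict and the new summary. It deliberately simplifies the real `analyze_precision_performance` (no threshold sweep, no config details), and the sample experiment data is invented.

```python
from typing import Any, Dict, Tuple


def format_precision_key(precision_value: float) -> str:
    # Mirrors format_precision_key from the diff: fixed 4-decimal key.
    return f"{precision_value:.4f}"


def analyze_sketch(search_results: Dict[str, Any]) -> Tuple[dict, dict]:
    # Simplified stand-in for analyze_precision_performance: keep the
    # best-RPS experiment per achieved precision level, then build both
    # the full dict and the precision_summary dict as the diff does.
    by_level: Dict[str, list] = {}
    for exp_id, exp in search_results.items():
        key = format_precision_key(exp["results"]["mean_precisions"])
        by_level.setdefault(key, []).append((exp_id, exp))

    precision_dict, precision_summary = {}, {}
    for key, runs in by_level.items():
        exp_id, exp = max(runs, key=lambda run: run[1]["results"]["rps"])
        r = exp["results"]
        precision_dict[key] = {"rps": r["rps"], "experiment_id": exp_id}
        precision_summary[key] = {
            "qps": round(r["rps"], 1),
            "p50": round(r["p50_time"] * 1000, 3),  # ms
            "p95": round(r["p95_time"] * 1000, 3),  # ms
        }
    return precision_dict, precision_summary


# Invented sample: two runs landing on the same precision level.
sample = {
    "hnsw-m16": {"results": {"mean_precisions": 0.91, "rps": 1800.0,
                             "p50_time": 0.052, "p95_time": 0.061}},
    "hnsw-m32": {"results": {"mean_precisions": 0.91, "rps": 1924.47,
                             "p50_time": 0.049828, "p95_time": 0.058427}},
}
full, summary = analyze_sketch(sample)
print(summary)
```

The higher-RPS run wins the "0.9100" level, and its latencies are reported in milliseconds, matching the summary shape shown in the README example.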
