perf: fix multiprocessing timing measurement #37

paulorsousa · 2025-07-23T11:33:03Z

Fixes timing measurement accuracy and moves vector preprocessing outside the timed sections, so the reported QPS reflects search performance rather than conversion and process-startup overhead.

Summary

Framework Overhead Reduction: The optimizations reduce the performance penalty from -9.6% to -4.3% vs vanilla Python
Measurement Accuracy: More precise timing gives a truer picture of actual search performance
Headroom Available: This version is still 32.6% below the redis-benchmark ceiling (9,876 vs 14,656 QPS), indicating further optimization potential

Key Changes

Move vector-to-bytes conversion outside timing measurements: Vector preprocessing now occurs before timing starts, ensuring measurements only capture actual search performance
Fix multiprocessing timing accuracy: Track actual worker start times instead of process creation time for accurate parallel execution timing
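As a rough illustration of the first change, the sketch below converts every query vector to FP32 bytes before any timer starts, so the timed region covers only the search call. The names `to_fp32_blob`, `run_queries`, and the `search` callable are hypothetical stand-ins, not the framework's actual API, and little-endian packing is an assumption.

```python
import struct
import time


def to_fp32_blob(vector):
    """Pack a sequence of floats into little-endian FP32 bytes (assumed layout)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def run_queries(vectors, search):
    """Time only the search calls; conversion happens before the clock starts."""
    # Preprocessing: done once, outside the timed region.
    blobs = [to_fp32_blob(v) for v in vectors]

    latencies = []
    total_start = time.perf_counter()
    for blob in blobs:
        start = time.perf_counter()
        search(blob)  # hypothetical callable standing in for the real search
        latencies.append(time.perf_counter() - start)
    total_time = time.perf_counter() - total_start
    return latencies, total_time


if __name__ == "__main__":
    received = []
    latencies, total_time = run_queries([[1.0, 0.5], [0.75, 2.0]], received.append)
    print(len(latencies), "queries timed,", len(received[0]), "bytes per query")
```

If the conversion stayed inside the loop, its cost would be counted in every latency sample and in the total time used to derive QPS.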

Performance Analysis

All comparisons are relative to the vanilla Python baseline (10,322 QPS) using 25 clients/processes.

Benchmark Commands

1. Base version: 9.3K QPS

docker run --network=host -v datasets:/app/datasets redis/vector-db-benchmark:latest run.py --host localhost --engines vectorsets-fp32-default --datasets glove-100-angular --parallels 100 --skip-upload

2. redis-benchmark: 14.7K QPS

docker run --rm --network=host redis/redis-stack-server redis-benchmark -c 25 -h localhost -p 6379 VSIM idx "FP32" $'\x00\x00\x80\x3f\x00\x00\x00\x3f\x00\x00\x40\x3f\x00\x00\x80\x3f\x00\x00\xa0\x3f\x00\x00\xc0\x3f\x00\x00\xe0\x3f\x00\x00\x00\x40\x00\x00\x10\x40\x00\x00\x20\x40\x00\x00\x30\x40\x00\x00\x40\x40\x00\x00\x50\x40\x00\x00\x60\x40\x00\x00\x70\x40\x00\x00\x80\x40\x00\x00\x88\x40\x00\x00\x90\x40\x00\x00\x98\x40\x00\x00\xa0\x40\x00\x00\xa8\x40\x00\x00\xb0\x40\x00\x00\xb8\x40\x00\x00\xc0\x40\x00\x00\xc8\x40' "WITHSCORES" "COUNT" "100" "EF" "64"

3. Vanilla Python: 10.3K QPS

python benchmark_vsim.py

4. This version: 9.9K QPS

docker run --network=host -v datasets:/app/datasets my-vector-db-bench run.py --host localhost --engines vectorsets-fp32-default --datasets glove-100-angular --parallels 100 --skip-upload
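For readers wondering what the escaped byte string in the redis-benchmark command (item 2) contains, here is a small sketch that decodes it. It assumes the blob is a sequence of little-endian 32-bit floats; under that assumption the 100 bytes decode to 25 values. The helper names are illustrative only.

```python
import struct

# The binary argument from the redis-benchmark command, written as hex
# (one 4-byte group per float).
BLOB = bytes.fromhex(
    "0000803f 0000003f 0000403f 0000803f 0000a03f "
    "0000c03f 0000e03f 00000040 00001040 00002040 "
    "00003040 00004040 00005040 00006040 00007040 "
    "00008040 00008840 00009040 00009840 0000a040 "
    "0000a840 0000b040 0000b840 0000c040 0000c840"
)


def decode_fp32(blob):
    """Unpack little-endian FP32 bytes into a list of Python floats."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


def encode_fp32(values):
    """Inverse of decode_fp32: pack floats into little-endian FP32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


if __name__ == "__main__":
    values = decode_fp32(BLOB)
    print(len(values), values[:4], values[-1])
```

The same `encode_fp32` step is the kind of vector-to-bytes conversion that this PR moves out of the timed section.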

Performance Comparison Summary

Method	QPS	vs Vanilla Python	Performance Gap
Vanilla Python	10,322	baseline	-
Redis-benchmark	14,656	+42.0%	theoretical maximum
Original Version (prev. PR)	9,331	-9.6%	991 QPS below baseline
New Version (this PR)	9,876	-4.3%	446 QPS below baseline
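The percentages in the table can be rechecked from the raw QPS figures. The snippet below is only a convenience sketch; `pct_vs_baseline` is a hypothetical helper, and the QPS values are copied from the table.

```python
VANILLA_QPS = 10_322  # vanilla Python baseline from the table

RESULTS = {
    "Redis-benchmark": 14_656,
    "Original Version (prev. PR)": 9_331,
    "New Version (this PR)": 9_876,
}


def pct_vs_baseline(qps, baseline=VANILLA_QPS):
    """Relative difference to the baseline, in percent, rounded to one decimal."""
    return round((qps - baseline) / baseline * 100, 1)


if __name__ == "__main__":
    for name, qps in RESULTS.items():
        print(f"{name}: {pct_vs_baseline(qps):+.1f}% vs vanilla Python")
    # Gap between this PR and the redis-benchmark ceiling.
    ceiling = RESULTS["Redis-benchmark"]
    print(f"vs redis-benchmark: {pct_vs_baseline(RESULTS['New Version (this PR)'], ceiling):+.1f}%")
```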

Key Performance Insights

Timing Accuracy Gains

The optimization brings benchmark results significantly closer to vanilla Python baseline:

Gap reduction: Previous PR was -9.6% vs vanilla Python, now only -4.3%
Performance recovery: +545 QPS improvement (9,331 → 9,876 QPS)
Improved measurement precision: More accurate timing leads to better performance characterization

redis-benchmark

Redis-benchmark remains the performance ceiling at 14,656 QPS (+42.0% vs vanilla Python), representing the theoretical maximum for direct Redis usage with neither framework nor Python overhead.

This PR narrows the gap between our benchmark framework and the vanilla Python baseline from 991 QPS to 446 QPS:

Before: 9,331 QPS (-9.6% vs vanilla Python, 63.7% of redis-benchmark performance)
After: 9,876 QPS (-4.3% vs vanilla Python, 67.4% of redis-benchmark performance)

- Move vector-to-bytes conversion outside timing measurements
- Track actual worker start times for accurate parallel timing
- Refactor worker function for compatibility with newer Python versions
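The worker start-time tracking mentioned above could look roughly like the sketch below. It is not the code from this PR: `worker` and `aggregate_qps` are hypothetical names, the loop body is a placeholder for the real search call, and the timing window (earliest worker start to latest worker end) is one plausible reading of "actual worker start times". Each worker stamps its own start once it is running, so process creation and startup cost stay out of the measured window.

```python
import multiprocessing as mp
import time


def worker(worker_id, n_queries, result_queue):
    """Hypothetical worker: records its start time once it is actually running."""
    start = time.time()  # wall clock, so timestamps are comparable across processes
    for _ in range(n_queries):
        pass  # placeholder for the real search call
    end = time.time()
    result_queue.put((start, end, n_queries))


def aggregate_qps(results):
    """QPS over the window from the earliest worker start to the latest worker end."""
    first_start = min(start for start, _, _ in results)
    last_end = max(end for _, end, _ in results)
    total_queries = sum(count for _, _, count in results)
    elapsed = last_end - first_start
    return total_queries / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(i, 10_000, queue)) for i in range(4)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]  # drain the queue before joining
    for p in procs:
        p.join()
    print(f"{aggregate_qps(results):.0f} QPS")
```

Measuring from the moment the parent creates the processes would instead add interpreter startup and import time to the denominator, which lowers the reported QPS without saying anything about search speed.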

perf: fix multiprocessing timing measurement

c57bad2

paulorsousa requested a review from fcostaoliveira July 23, 2025 11:33

filipecosta90 approved these changes Jul 23, 2025

fcostaoliveira merged commit a9a7488 into update.redisearch Jul 25, 2025
8 of 16 checks passed
