Fixed docker permissions on results folder. (#33)

filipecosta90 · fcostaoliveira · web-flow · commit cbe4c8363e31 · 2025-07-13T14:47:03.000+01:00
* Add comprehensive Docker CI/CD pipeline

- Enhanced Dockerfile with multi-stage build and security best practices
- Added Docker build, run, and test scripts with Redis-specific configurations
- Created GitHub Actions workflows for PR validation, master publishing, and release publishing
- Added docker-compose.yml for local development with Redis
- Updated documentation with Docker usage examples
- Configured for redis-performance/vector-db-benchmark Docker Hub repository
- Default configuration: engines=redis, dataset=random-100, experiment=redis-m-16-ef-64
- Multi-platform support (linux/amd64, linux/arm64)
- Security scanning with Trivy for releases

* Update Docker workflows for update-redisearch default branch

- Updated PR validation to trigger on update-redisearch branch
- Updated publishing workflow to use update-redisearch branch instead of master
- Updated Docker tags to use update-redisearch-{sha} format
- Updated documentation to reflect correct default branch

* Corrected docker repo, base branch, and test-image of redis.

* fixed missing redis container

* feat: enhance benchmark functionality with dataset discovery, validation, and performance monitoring

- Add --describe command for datasets and engines with columnar display
- Implement real-time performance summaries (QPS, P50/P95 latency)
- Add comprehensive dataset validation system with GitHub Actions
- Complete dataset metadata with vector_count and description fields
- Improve download reliability with proper HTTP headers
- Standardize precision formatting (0.01 increments up to 0.97, then 0.0025 increments)
- Enhanced Docker configurations for better Redis testing defaults
- Add validation documentation and automated CI/CD checks

This maintains backward compatibility while significantly improving usability,
data quality, and performance insights for vector database benchmarking.
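
As an illustration of the performance summaries mentioned above, here is a minimal sketch of how QPS and P50/P95 latency could be derived from per-query timings. The function names, the nearest-rank percentile method, and the millisecond output are assumptions for this example, not the code added in this commit.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    # Integer product first, so ranks such as 95 * 100 / 100 stay exact.
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_s, wall_time_s):
    """Return QPS and P50/P95 latency (in ms) for one batch of queries.

    latencies_s: per-query latencies in seconds (hypothetical input shape).
    wall_time_s: elapsed wall-clock time for the whole batch, in seconds.
    """
    ordered = sorted(latencies_s)
    return {
        "qps": len(ordered) / wall_time_s,
        "p50_ms": percentile(ordered, 50) * 1000,
        "p95_ms": percentile(ordered, 95) * 1000,
    }
```

Calling `summarize` periodically on the latencies collected so far would give a running view of throughput and tail latency while a benchmark is still in progress.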

* Moved validate and update datasets to scripts folder

* Moved validate and update datasets to scripts folder

* fix: use Poetry with --no-root flag for GitHub Action dependencies

- Add Poetry installation to validate-datasets workflow
- Use --no-root to install dependencies without packaging the project
- Run validation script with 'poetry run' to access all dependencies
- Fixes ModuleNotFoundError for stopit and other dependencies when testing --describe functionality

* Added boto3 dependency

* Added basic test for RediSearch

* Updated deps to work with Python 3.12. Fixed deprecation warnings

* Updated poetry lock

* Adding redis-tools to the verify step (redis-cli)

* Adding Python 3.13 to the test matrix

* Using random-100 for faster testing

* Updated poetry lock

* Using random-100 for faster testing

* Added Redis Vector Sets checks on CI

* Fixed docker permissions on results folder.

---------

Co-authored-by: fcostaoliveira <filipe@redis.com>
diff --git a/Dockerfile b/Dockerfile
@@ -63,9 +63,6 @@ RUN apt-get update && apt-get install -y \
     wget \
     && rm -rf /var/lib/apt/lists/*
 
-# Create non-root user
-RUN groupadd -g 1001 -r appgroup && \
-    useradd -u 1001 -r -g appgroup appuser
 
 # Set working directory
 WORKDIR /app
@@ -79,11 +76,21 @@ COPY --from=builder /code /app
 
 # Create directories with proper permissions
 RUN mkdir -p /app/results /app/datasets && \
-    chown -R appuser:appgroup /app && \
+
+    chmod -R 777 /app/results /app/datasets && \
     chmod -R 755 /app
 
-# Switch to non-root user
-USER appuser
+# Create entrypoint script to handle user permissions
+RUN echo '#!/bin/bash\n\
+# Handle user permissions for volume mounts\n\
+if [ "$1" = "run.py" ]; then\n\
+    # Ensure results directory is writable\n\
+    mkdir -p /app/results\n\
+    chmod 777 /app/results\n\
+fi\n\
+exec python "$@"' > /app/entrypoint.sh && \
+    chmod +x /app/entrypoint.sh
+
 
 # Health check
 HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
@@ -93,7 +100,9 @@ HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
 EXPOSE 6379 6380
 
 # Set entrypoint
-ENTRYPOINT ["python"]
+
+ENTRYPOINT ["/app/entrypoint.sh"]
+
 
 # Default command (show help)
 CMD ["run.py", "--help"]
diff --git a/README.md b/README.md
@@ -84,14 +84,16 @@ docker pull filipe958/vector-db-benchmark:latest
 # Run with help
 docker run --rm filipe958/vector-db-benchmark:latest run.py --help
 
-# Basic Redis benchmark with local Redis
-docker run --rm --network=host filipe958/vector-db-benchmark:latest \
-  run.py --host localhost --engines redis --dataset random-100 --experiment redis-default-simple
 
-# With results output (mount current directory)
+# Basic Redis benchmark with local Redis (recommended)
 docker run --rm -v $(pwd)/results:/app/results --network=host \
   filipe958/vector-db-benchmark:latest \
-  run.py --host localhost --engines redis --dataset random-100 --experiment redis-default-simple
+  run.py --host localhost --engines redis-default-simple --dataset random-100
+
+# Without results output
+docker run --rm --network=host filipe958/vector-db-benchmark:latest \
+  run.py --host localhost --engines redis-default-simple --dataset random-100
+
 ```
 
 ### Using with Redis
@@ -103,11 +105,14 @@ For testing with Redis, start a Redis container first:
 docker run -d --name redis-test -p 6379:6379 redis:8.2-rc1-bookworm
 
 # Run benchmark against Redis
-docker run --rm --network=host filipe958/vector-db-benchmark:latest \
-  run.py --host localhost --engines redis --dataset random-100 --experiment redis-default-simple
+
+docker run --rm -v $(pwd)/results:/app/results --network=host \
+  filipe958/vector-db-benchmark:latest \
+  run.py --host localhost --engines redis-default-simple --dataset random-100
 
 # Or use the convenience script
-./docker-run.sh -H localhost -e redis -d random-100 -x redis-default-simple
+./docker-run.sh -H localhost -e redis-default-simple -d random-100
+
 
 # Clean up Redis container when done
 docker stop redis-test && docker rm redis-test
@@ -149,20 +154,18 @@ poetry install
 Run the benchmark:
 
 ```bash
-Usage: run.py [OPTIONS]
-
-  Example: python3 -m run --engines *-m-16-* --datasets glove-*
-
-Options:
-  --engines TEXT                  [default: *]
-  --datasets TEXT                 [default: *]
-  --host TEXT                     [default: localhost]
-  --skip-upload / --no-skip-upload
-                                  [default: no-skip-upload]
-  --install-completion            Install completion for the current shell.
-  --show-completion               Show completion for the current shell, to
-                                  copy it or customize the installation.
-  --help                          Show this message and exit.
+# Basic usage examples
+python run.py --engines redis-default-simple --dataset random-100
+python run.py --engines redis-default-simple --dataset glove-25-angular
+python run.py --engines "*-m-16-*" --dataset "glove-*"
+
+# Docker usage (recommended)
+docker run --rm -v $(pwd)/results:/app/results --network=host \
+  filipe958/vector-db-benchmark:latest \
+  run.py --host localhost --engines redis-default-simple --dataset random-100
+
+# Get help
+python run.py --help
 ```
 
 The command accepts wildcards for engines and datasets.
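
To show how a wildcard such as `*-m-16-*` narrows down a list of names, here is a minimal sketch that assumes shell-style (glob) matching via Python's standard `fnmatch` module. The list of configuration names and the `select` helper are hypothetical and exist only for this example; how `run.py` resolves patterns internally may differ.

```python
from fnmatch import fnmatch

# Example configuration names; "redis-m-32-ef-128" is made up for illustration.
ENGINE_CONFIGS = ["redis-default-simple", "redis-m-16-ef-64", "redis-m-32-ef-128"]


def select(names, pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatch(name, pattern)]


print(select(ENGINE_CONFIGS, "*-m-16-*"))  # ['redis-m-16-ef-64']
print(select(ENGINE_CONFIGS, "redis-*"))   # all three names
```

When passing a pattern on the command line, quote it (as in `--engines "*-m-16-*"`) so the shell does not expand the `*` against local file names first.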