Davide Marchi - August 2025
Distributed out-of-core MergeSort in C++20 using mmap
The project implements a common I/O pipeline (index build → index sort → rewrite) across four variants (a minimal sketch of the pipeline follows this list):
- Sequential (baseline)
- OpenMP (task-based mergesort)
- FastFlow (farm: emitter + workers)
- MPI + OpenMP (one rank per node; pairwise distributed merges)
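The sketch below shows the shared pipeline in its simplest, single-threaded form over an mmap'ed input. The record layout assumed here (8-byte key, 4-byte payload length, then the payload) and the file handling are illustrative only; the real layout and helpers live in utils.hpp and the sources listed below.

```cpp
// Minimal sketch: index build -> index sort -> rewrite over an mmap'ed file.
// Record layout (8-byte key, 4-byte payload length, payload) is an assumption.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct IndexEntry { std::uint64_t key; std::size_t off; std::uint32_t len; };

int main(int argc, char** argv) {
    if (argc != 3) { std::fprintf(stderr, "usage: %s <in> <out>\n", argv[0]); return 1; }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }
    struct stat st{};
    fstat(fd, &st);
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    void* map = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { std::perror("mmap"); return 1; }
    auto* base = static_cast<unsigned char*>(map);

    // 1) Index build: one pass over the mapping, recording (key, offset, length).
    constexpr std::size_t header = 12;                 // 8-byte key + 4-byte length
    std::vector<IndexEntry> index;
    for (std::size_t off = 0; off + header <= size; ) {
        std::uint64_t key; std::uint32_t len;
        std::memcpy(&key, base + off, 8);
        std::memcpy(&len, base + off + 8, 4);
        index.push_back({key, off, len});
        off += header + len;
    }

    // 2) Index sort: only the small index entries move, never the payloads.
    std::sort(index.begin(), index.end(),
              [](const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; });

    // 3) Rewrite: stream whole records to the output file in sorted order.
    std::FILE* out = std::fopen(argv[2], "wb");
    for (const auto& e : index)
        std::fwrite(base + e.off, 1, header + e.len, out);
    std::fclose(out);

    munmap(map, size);
    close(fd);
    return 0;
}
```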
The repo includes runnable scripts for experiments, a Jupyter notebook for plotting, and a PDF report. Results and logs are saved to folders so runs are reproducible.
.
├── analysis/
│   ├── seq_omp_ff_composites.ipynb   # Notebook for plots/analysis
│   └── plots_seq_omp_ff/             # Plots saved by the notebook
├── report/
│   ├── report.pdf                    # Compiled report
│   └── imgs/                         # Images used in the report
├── scripts/                          # Helper scripts for sweeps
│   ├── run_array_any.sh              # Single-node: seq / omp / ff
│   └── run_array_mpi.sh              # Multi-node: MPI+OMP
├── results/                          # CSV output (created at run time)
├── logs/                             # Logs from runs (created at run time)
├── fastflow/
│   └── ff/                           # FastFlow
│       └── mapping_string.sh
├── bin/                              # Executables (created by make)
├── Makefile
├── utils.hpp
├── sequential_seq_mmap.cpp
├── omp_mmap.cpp
├── ff_mmap.cpp
├── mpi_omp_mmap.cpp
└── README.md
- GCC or Clang with C++20 and OpenMP
- MPI toolchain (Open MPI or MPICH) for the MPI target
- Linux with mmap
- FastFlow headers (cloned locally as shown below)
Clone the official FastFlow repository in this project directory so it sits alongside the sources:
git clone https://github.com/fastflow/fastflow.git fastflow
Run the CPU mapping helper from inside the ff/ subfolder. This script directly updates FastFlow’s internal mapping configuration.
cd fastflow/ff
bash mapping_string.sh
make # builds all targets into ./bin
Targets produced:
bin/sequential_seq_mmap
bin/omp_mmap
bin/ff_mmap
bin/mpi_omp_mmap
make clean # remove objects
make distclean # also remove ./bin and build artifacts
All binaries print help with -h. Example:
bin/omp_mmap -h
Usage: bin/omp_mmap [options]
-n, --records N number of records (default 1e6)
-p, --payload B maximum payload size in bytes (default 256)
-t, --threads T threads to use (0 = hw concurrency)
-c, --cutoff N task cutoff size (default 10000)
-h, --help show this help
(The same flags apply to the other executables. For the MPI+OMP binary, -t selects threads per rank.)
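To make -c/--cutoff concrete: in a task-based mergesort, the cutoff is the range size below which the recursion stops spawning tasks and falls back to a serial sort. The sketch below shows that pattern; the names and data types are illustrative, not the project's actual code.

```cpp
// Task-based mergesort with a cutoff, in the spirit of what -c controls.
#include <algorithm>
#include <cstddef>
#include <vector>

void merge_sort(std::vector<long>& v, std::size_t lo, std::size_t hi,
                std::size_t cutoff) {
    if (hi - lo <= cutoff) {                       // small range: sort serially
        std::sort(v.begin() + lo, v.begin() + hi);
        return;
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    #pragma omp task shared(v)                     // sort the two halves as tasks
    merge_sort(v, lo, mid, cutoff);
    #pragma omp task shared(v)
    merge_sort(v, mid, hi, cutoff);
    #pragma omp taskwait                           // wait before merging
    std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);
}

int main() {
    std::vector<long> v(1'000'000);
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = long(v.size() - i);
    #pragma omp parallel                           // spawn the thread team...
    #pragma omp single                             // ...one thread seeds the task tree
    merge_sort(v, 0, v.size(), 10000);
    return std::is_sorted(v.begin(), v.end()) ? 0 : 1;
}
```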
Create output folders once:
mkdir -p results logs
# Sequential baseline
srun -n 1 bin/sequential_seq_mmap -n 100000000 -p 32
# OpenMP on 16 threads
srun -n 1 --cpus-per-task=16 bin/omp_mmap -n 100000000 -p 32 -t 16 -c 10000
# FastFlow on 16 threads (1 emitter + 15 workers)
# Run ff/mapping_string.sh beforehand to get a good CPU mapping.
srun -n 1 --cpus-per-task=16 bin/ff_mmap -n 100000000 -p 32 -t 16 -c 10000
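For reference, the FastFlow binary is organised as a farm: one emitter distributes work to t-1 workers. The following is a rough sketch of that structure against FastFlow's classic ff_farm/ff_node interface, using an illustrative Task type and a plain in-memory vector instead of the project's mmap-backed index.

```cpp
// Farm sketch: emitter sends chunk descriptors, workers sort their chunks.
#include <ff/ff.hpp>
#include <ff/farm.hpp>
#include <algorithm>
#include <vector>

struct Task { std::size_t lo, hi; };

struct Emitter : ff::ff_node {
    Emitter(std::size_t n, std::size_t chunk) : n(n), chunk(chunk) {}
    void* svc(void*) override {
        for (std::size_t lo = 0; lo < n; lo += chunk)        // one task per chunk
            ff_send_out(new Task{lo, std::min(lo + chunk, n)});
        return EOS;                                          // no more tasks
    }
    std::size_t n, chunk;
};

struct Worker : ff::ff_node {
    explicit Worker(std::vector<long>& data) : data(data) {}
    void* svc(void* t) override {
        auto* task = static_cast<Task*>(t);
        std::sort(data.begin() + task->lo, data.begin() + task->hi);  // sort the chunk
        delete task;
        return GO_ON;                                        // ready for the next task
    }
    std::vector<long>& data;
};

int main() {
    std::vector<long> data(1 << 22);
    for (std::size_t i = 0; i < data.size(); ++i) data[i] = long(data.size() - i);

    const int nworkers = 15;                                 // with -t 16: 1 emitter + 15 workers
    ff::ff_farm farm;
    farm.add_emitter(new Emitter(data.size(), 1 << 18));
    std::vector<ff::ff_node*> workers;
    for (int i = 0; i < nworkers; ++i) workers.push_back(new Worker(data));
    farm.add_workers(workers);
    if (farm.run_and_wait_end() < 0) return 1;
    // A final merge of the sorted chunks would follow here.
    return 0;
}
```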
One rank per node (example: 4 nodes, 16 threads per rank):
srun --mpi=pmix -N 4 -n 4 --cpus-per-task=16 bin/mpi_omp_mmap -n 100000000 -p 32 -t 16 -c 10000
(Alternatively: -N 4 --ntasks-per-node=1.)
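The distributed phase uses pairwise merges between ranks: each rank first sorts its local chunk (with OpenMP inside the rank), then ranks are combined over log2(P) rounds, with the receiver of each pair performing the merge. The sketch below shows one common realisation of that communication pattern on plain long values; the actual message protocol and record exchange in bin/mpi_omp_mmap are assumptions here.

```cpp
// Pairwise merge tree over MPI ranks (receiver merges, sender drops out).
#include <mpi.h>
#include <algorithm>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank starts with a locally sorted chunk (synthetic data here).
    std::vector<long> local(1 << 20);
    for (std::size_t i = 0; i < local.size(); ++i)
        local[i] = long((i * (rank + 1)) % 9973);
    std::sort(local.begin(), local.end());

    // Round r: ranks with rank % (2*step) == 0 receive and merge,
    // their partners send their chunk and stop participating.
    for (int step = 1; step < size; step *= 2) {
        if (rank % (2 * step) == 0) {
            int partner = rank + step;
            if (partner < size) {
                int count = 0;
                MPI_Recv(&count, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                std::vector<long> incoming(count);
                MPI_Recv(incoming.data(), count, MPI_LONG, partner, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                std::vector<long> merged(local.size() + incoming.size());
                std::merge(local.begin(), local.end(),
                           incoming.begin(), incoming.end(), merged.begin());
                local.swap(merged);
            }
        } else {
            int partner = rank - step;
            int count = int(local.size());
            MPI_Send(&count, 1, MPI_INT, partner, 0, MPI_COMM_WORLD);
            MPI_Send(local.data(), count, MPI_LONG, partner, 1, MPI_COMM_WORLD);
            break;                                  // this rank is done after sending
        }
    }
    // Rank 0 now holds the globally sorted sequence.
    MPI_Finalize();
    return 0;
}
```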
Run the sweep scripts with just the binary path; outputs go to results/ and logs to logs/ by default.
# Single-node sweeps (seq / omp / ff)
bash scripts/run_array_any.sh --bin bin/omp_mmap
# Multi-node sweeps (MPI + OMP)
bash scripts/run_array_mpi.sh --bin bin/mpi_omp_mmap
Each script defines default grids for records, payloads, threads, and trials. Override them by passing the appropriate flags/environment variables (see the script header).
The plotting notebook is at analysis/seq_omp_ff_composites.ipynb. At the top, a boolean variable toggles whether to include write time in the figures. Running the notebook saves composite figures under analysis/plots_seq_omp_ff/.