Commit b7f205b

Add generation script for large CSV files
1 parent 434ba32 commit b7f205b

2 files changed: 23 additions & 0 deletions


source-code/polars/README.md

Lines changed: 4 additions & 0 deletions
@@ -10,4 +10,8 @@ Polars is an alternative to pandas that is designed to have better performance.
 directory with the same name.
 1. `polars_versus_pandas_benchmarks.ipynb`: Jupyter notebook that compares the
 performance of polars and pandas on a variety of operations.
+1. `create_csv_data.py`: Python script to generate one or more large CSV files
+for benchmarking.
+1. `create_csv_data.slurm`: Slurm script to run `create_csv_data.py` on a
+cluster.
 1. `data`: Directory containing the data used in the notebook.
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+#!/usr/bin/env -S bash -l
+#SBATCH --account=lpt2_sysadmin
+#SBATCH --nodes=1
+#SBATCH --ntasks=1
+#SBATCH --cpus-per-task=1
+#SBATCH --mem=2G
+#SBATCH --time=01:00:00
+#SBATCH --mail-user=geertjan.bex@uhasselt.be
+#SBATCH --mail-type=FAIL,END
+
+module purge
+module load Python/3.11.3-GCCcore-12.3.0
+
+# This should generate a file of approximately 6 GB
+python ./create_csv_data.py \
+    --files 1 \
+    --cols 100 \
+    --rows 2500000 \
+    large_data
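
The "approximately 6 GB" comment can be sanity-checked with a little arithmetic: 2,500,000 rows times 100 columns is 250 million fields, so 6 GB (taken as 6 × 10^9 bytes) implies about 24 bytes per field, separator included. The sketch below does that calculation and, as a rough cross-check, extrapolates from a small in-memory sample. The value format written by `create_csv_data.py` is not shown in this commit, so the sample assumes uniformly random floats in Python's default representation; the function names are hypothetical and not part of the repository.

```python
import csv
import io
import random


def bytes_per_row(cols: int, sample_rows: int = 1000, seed: int = 1234) -> float:
    """Average encoded size of one CSV row.

    Assumption: each field is a random float written with Python's default
    float formatting. The real script may use a different value format.
    """
    rng = random.Random(seed)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for _ in range(sample_rows):
        writer.writerow([rng.random() for _ in range(cols)])
    return len(buffer.getvalue().encode("utf-8")) / sample_rows


def estimate_size_gb(rows: int, cols: int) -> float:
    """Extrapolate the sampled row size to a full file, in GB (10**9 bytes)."""
    return rows * bytes_per_row(cols) / 1e9


def implied_bytes_per_field(size_gb: float, rows: int, cols: int) -> float:
    """Bytes per field (separator included) implied by a target file size."""
    return size_gb * 1e9 / (rows * cols)


if __name__ == "__main__":
    # Parameters taken from the Slurm script: --cols 100 --rows 2500000
    implied = implied_bytes_per_field(6, 2_500_000, 100)
    print(f"bytes per field implied by 6 GB: {implied:.1f}")
    print(f"estimate for random floats: {estimate_size_gb(2_500_000, 100):.2f} GB")
```

If the extrapolated figure differs noticeably from 6 GB, that most likely reflects a difference between the assumed float format and what the script actually writes, rather than an error in the comment.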