Skip to content

milahu/sum-parallel

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sum-parallel

summarize numbers from a text file using multiple CPU cores

status

the current sum-parallel.c is slower than sum.c

seq 10000000 >input.txt
gcc -o sum sum.c && time ./sum <input.txt
gcc -o sum-parallel sum-parallel.c && time ./sum-parallel input.txt
50000005000000

real    0m4.611s
user    0m4.553s
sys     0m0.058s
50000005000000

real    0m6.754s
user    0m12.317s
sys     0m2.780s

cpu-bound or io-bound

What do the terms "CPU bound" and "I/O bound" mean?

adding two numbers takes a single CPU cycle, memory reads take about 100 CPU cycles in 2016 hardware.

$ pv -a input.txt >/dev/null
[1.52GiB/s]

$ pv -a input.txt | ./sum
[15.4MiB/s]
50000005000000

read multiple numbers with fscanf

currently, i read only one number

fscanf(fptr, "%d", &n)

but fscanf can read multiple values

fscanf(fptr, "%d\n%d\n%d\n%d", &n1, &n2, &n3, &n4)

but no, fscanf is already buffered

https://stackoverflow.com/a/9587245/10440128

When you use fread or the other file I/O functions in the C standard library, memory is buffered in several places.

The C library will usually create a buffer for every FILE* you have open. Data is read into this buffers in large chunks. This allows fread to satisfy many small requests without having to make a large number of system calls, which are expensive. This is what people mean when they say fread is buffered.

The kernel will also buffer files that are being read in the disk cache. This reduces the time needed for the read system call, since if data is already in memory, your program won't have to wait while the kernel fetches it from the disk. The kernel will hold on to recently read files, and it may read ahead for files which are being accessed sequentially.

run multiple sum in parallel

this runs serial

$ time { n=4; for i in $(seq $n); do seq $((1 + 10000000 / n * (i - 1))) $((10000000 / n * i)) | ./sum; done | ./sum; }
50000005000000

real    0m5.251s
user    0m5.850s
sys     0m0.442s

this runs parallel

$ time { n=2; t=$(mktemp); p=; for i in $(seq $n); do seq $((1 + 10000000 / n * (i - 1))) $((10000000 / n * i)) | ./sum >>$t & p+=" $!"; done; wait $p; ./sum <$t; }
50000005000000

real    0m3.190s
user    0m6.549s
sys     0m0.577s

$ time { n=4; t=$(mktemp); p=; for i in $(seq $n); do seq $((1 + 10000000 / n * (i - 1))) $((10000000 / n * i)) | ./sum >>$t & p+=" $!"; done; wait $p; ./sum <$t; }
50000005000000

real    0m3.338s
user    0m9.707s
sys     0m0.874s

success, this is 2x faster than serial

n=4 is not faster, because i have only 2 cpu cores

$ grep -m1 "^cpu cores" /proc/cpuinfo
cpu cores       : 2

see also

About

summarize numbers from a text file using multiple CPU cores

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages