I timed both scripts at 10,000,000 iterations: 1m32s for the Python version and only 29s for the J version. If an algorithm is parallelizable, does J use all the cores of a CPU?