-
-
Notifications
You must be signed in to change notification settings - Fork 32.5k
Description
Bug report
Bug description:
Consider the following two Python programs that make 14 copies of a big list
([0] * 262145) * 14
([0] * 265665) * 14
On both of my windows computers, running the latest release of CPython, version 3.13.5, it takes 5 times longer to run the first program (even though its size is smaller). But on older versions of CPython, such as 3.10.11, both programs run equally fast. From my testing, this seems to be an issue only affecting windows. More investigation points to this being an issue for certain AMD processors on both windows and linux.
I also noticed that PyPy has the same issue, starting with version PyPy v7.3.17. I've not been able to reproduce it in versions <= PyPy v7.3.16.
As for what is happening, I am not sure. I think it has to do with 262145=2^18 + 1 being one off from a power of two, which somehow makes copying the list super slow.
Edit: The culprit seems to be calls to memcpy(dst, src, count)
when dst-src
is slightly larger than a power of two. The algorithm used by the current versions of both CPython and PyPy to copy lists causes this to happen over and over whenever a list of size 2^k + 1 is being copied, while the previous algorithm did not. This explains why I've been unable to reproduce the bug in older versions of CPython and PyPy. See microsoft/STL#5658 for more details about the issue of memcpy
being slow.
CPython versions tested on:
3.14, 3.13
Operating systems tested on:
Windows, Linux