From 869b2d5b0c9fbb9db559ab53cf1fa61a170835e9 Mon Sep 17 00:00:00 2001 From: Aleksa Sarai Date: Thu, 27 Feb 2025 12:35:47 +1100 Subject: [PATCH] linux: clarify pids cgroup settings The history of this is a little complicated, but in short there is an argument to be made that several misunderstandings resulted in the spec sometimes implying (and runtimes interpreting) a pids.limit value of 0 to be equivalent to "max" or otherwise having unfortunate handling of the value. The slightly longer background is the following: 1. When commit 834fb5db52bc ("spec: linux: add support for the PIDs cgroup") added support, we did not yet have textual documentation of cgroup configuration values. In addition, we had not yet started using pointers to indicate optional fields and detect unset fields. However, the initial commit did imply that pids.limit=0 should be treated as a real value. 2. Commit 2ce2c866ffe1 ("runtime: config: linux: add cgroups information") labeled "pids.limit" as being a REQUIRED field. This may seem trivial, but consider this foreshadowing for point 5. 3. Later, commit 9b19cd2fab9e ("config: linux: update description of PidsLimit") was added to explicitly make pids.limit=0 equivalent to max (at the time there was a kernel patch proposed to make setting pids.max to 0 illegal, though it was never merged). This is often pointed to as being the reason for runtimes interpreting this behaviour this way, however... 4. Soon after, 488f174af95f ("Make optional Cgroup related config params pointers along with `omitempty` json tag.") converted it to a pointer and changed the code comment to state that the "default value" means "no limit" -- and the default value was now a pointer so the default value is nil not 0. At this stage, using 0 to mean "no limit" would arguably no longer be correct. 5. However, because the field was marked as REQUIRED in point 2, a while later commit ef9ce84cf92a ("specs-go/config: fix required items type") changed the value back to a non-pointer but didn't modify the code comment -- and so ended up codifying the "0 means no limit" behaviour. I would argue this commit is the reason why runtimes have interpreted the behaviour this way (though runc likely did it because of point 3 since I authored both patches, and other runtimes probably looked at runc to see how they should interpret this confusing history -- my bad!). So, let's finally have some clarity and add wording to conclusively state that the correct representation of max is -1 (like every other cgroup configuration value) and that users should not treat 0 as a special value of any kind. A nil value means "do not touch it" (just like every other cgroup configuration value too). Note that a pids.max value of 0 is actually different to 1 now that CLONE_INTO_CGROUP exists (at the time pids was added to the kernel and the spec, this feature didn't exist and so it may have seemed redundant to have two equivalent values -- hence my attempt to make 0 an illegal value for the kernel implementation). For the Go API, this is effectively a partial revert of commit ef9ce84cf92a ("specs-go/config: fix required items type") which turned the limit value into a bare int64. Fixes: 2ce2c866ffe1 ("runtime: config: linux: add cgroups information") Fixes: 9b19cd2fab9e ("config: linux: update description of PidsLimit") Fixes: ef9ce84cf92a ("specs-go/config: fix required items type") Signed-off-by: Aleksa Sarai --- config-linux.md | 4 +++- specs-go/config.go | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/config-linux.md b/config-linux.md index 65a18d4d5..4ce6b7088 100644 --- a/config-linux.md +++ b/config-linux.md @@ -666,7 +666,9 @@ For more information, see the kernel cgroups documentation about [pids][cgroup-v The following parameters can be specified to set up the controller: -* **`limit`** *(int64, REQUIRED)* - specifies the maximum number of tasks in the cgroup +* **`limit`** *(int64, OPTIONAL)* - specifies the maximum number of tasks in the cgroup, with `-1` indicating no limit (`max`). + +> Note: Even though it may superficially seem redundant, `0` is a valid limit value for the `pids` cgroup controller from the kernel's perspective and SHOULD be treated as such by runtimes. #### Example diff --git a/specs-go/config.go b/specs-go/config.go index 36d28032e..cab0fd8db 100644 --- a/specs-go/config.go +++ b/specs-go/config.go @@ -434,7 +434,7 @@ type LinuxCPU struct { // LinuxPids for Linux cgroup 'pids' resource management (Linux 4.3) type LinuxPids struct { // Maximum number of PIDs. Default is "no limit". - Limit int64 `json:"limit"` + Limit *int64 `json:"limit,omitempty"` } // LinuxNetwork identification and priority configuration