-
Notifications
You must be signed in to change notification settings - Fork 582
linux: clarify pids cgroup settings #1279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
config-linux.md
Outdated
* **`limit`** *(int64, REQUIRED)* - specifies the maximum number of tasks in the cgroup | ||
* **`limit`** *(int64, REQUIRED)* - specifies the maximum number of tasks in the cgroup, with `-1` indicating no limit (`max`). | ||
|
||
> Note: Even though it superficially seems redundant, `0` is a valid limit value for the `pids` cgroup controller and SHOULD NOT be treated as a special value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> Note: Even though it superficially seems redundant, `0` is a valid limit value for the `pids` cgroup controller and SHOULD NOT be treated as a special value. | |
> Note: Even though it superficially seems redundant, `0` is a valid limit value for the `pids` cgroup controller from kernel's perspective, and SHOULD NOT be treated as "no limit". An implementation may reject `0`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @cyphar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still too specific to the bug we have in runc -- the core issue is that 0
SHOULD be treated as a valid value. It's not a MUST so I think the "may" here (which probably should be MAY, I guess?) is redundant and gives the wrong impression.
Maybe "special value" is too generic though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with
Note: Even though it may superficially seem redundant,
0
is a valid limit value for thepids
cgroup controller from the kernel's perspective and SHOULD be treated as such by runtimes.
8d2c7a9
to
9cbc234
Compare
9cbc234
to
aa1234e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
The history of this is a little complicated, but in short there is an argument to be made that several misunderstandings resulted in the spec sometimes implying (and runtimes interpreting) a pids.limit value of 0 to be equivalent to "max" or otherwise having unfortunate handling of the value. The slightly longer background is the following: 1. When commit 834fb5d ("spec: linux: add support for the PIDs cgroup") added support, we did not yet have textual documentation of cgroup configuration values. In addition, we had not yet started using pointers to indicate optional fields and detect unset fields. However, the initial commit did imply that pids.limit=0 should be treated as a real value. 2. Commit 2ce2c86 ("runtime: config: linux: add cgroups information") labeled "pids.limit" as being a REQUIRED field. This may seem trivial, but consider this foreshadowing for point 5. 3. Later, commit 9b19cd2 ("config: linux: update description of PidsLimit") was added to explicitly make pids.limit=0 equivalent to max (at the time there was a kernel patch proposed to make setting pids.max to 0 illegal, though it was never merged). This is often pointed to as being the reason for runtimes interpreting this behaviour this way, however... 4. Soon after, 488f174 ("Make optional Cgroup related config params pointers along with `omitempty` json tag.") converted it to a pointer and changed the code comment to state that the "default value" means "no limit" -- and the default value was now a pointer so the default value is nil not 0. At this stage, using 0 to mean "no limit" would arguably no longer be correct. 5. However, because the field was marked as REQUIRED in point 2, a while later commit ef9ce84 ("specs-go/config: fix required items type") changed the value back to a non-pointer but didn't modify the code comment -- and so ended up codifying the "0 means no limit" behaviour. I would argue this commit is the reason why runtimes have interpreted the behaviour this way (though runc likely did it because of point 3 since I authored both patches, and other runtimes probably looked at runc to see how they should interpret this confusing history -- my bad!). So, let's finally have some clarity and add wording to conclusively state that the correct representation of max is -1 (like every other cgroup configuration value) and that users should not treat 0 as a special value of any kind. A nil value means "do not touch it" (just like every other cgroup configuration value too). Note that a pids.max value of 0 is actually different to 1 now that CLONE_INTO_CGROUP exists (at the time pids was added to the kernel and the spec, this feature didn't exist and so it may have seemed redundant to have two equivalent values -- hence my attempt to make 0 an illegal value for the kernel implementation). For the Go API, this is effectively a partial revert of commit ef9ce84 ("specs-go/config: fix required items type") which turned the limit value into a bare int64. Fixes: 2ce2c86 ("runtime: config: linux: add cgroups information") Fixes: 9b19cd2 ("config: linux: update description of PidsLimit") Fixes: ef9ce84 ("specs-go/config: fix required items type") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
aa1234e
to
869b2d5
Compare
Marked as ready again (I was updating the commit to provide more historical information). |
@AkihiroSuda PTAL |
The history of this is a little complicated, but in short there is an
argument to be made that several misunderstandings resulted in the spec
sometimes implying (and runtimes interpreting) a pids.limit value of 0
to be equivalent to "max" or otherwise having unfortunate handling of
the value.
The slightly longer background is the following:
When commit 834fb5d ("spec: linux: add support for the PIDs
cgroup") added support, we did not yet have textual documentation of
cgroup configuration values. In addition, we had not yet started
using pointers to indicate optional fields and detect unset fields.
However, the initial commit did imply that pids.limit=0 should be
treated as a real value.
Commit 2ce2c86 ("runtime: config: linux: add cgroups
information") labeled "pids.limit" as being a REQUIRED field. This
may seem trivial, but consider this foreshadowing for point 5.
Later, commit 9b19cd2 ("config: linux: update description of
PidsLimit") was added to explicitly make pids.limit=0 equivalent to
max (at the time there was a kernel patch proposed to make setting
pids.max to 0 illegal, though it was never merged).
This is often pointed to as being the reason for runtimes
interpreting this behaviour this way, however...
Soon after, 488f174 ("Make optional Cgroup related config params
pointers along with
omitempty
json tag.") converted it to a pointerand changed the code comment to state that the "default value" means
"no limit" -- and the default value was now a pointer so the default
value is nil not 0. At this stage, using 0 to mean "no limit" would
arguably no longer be correct.
However, because the field was marked as REQUIRED in point 2, a while
later commit ef9ce84 ("specs-go/config: fix required items
type") changed the value back to a non-pointer but didn't modify the
code comment -- and so ended up codifying the "0 means no limit"
behaviour.
I would argue this commit is the reason why runtimes have interpreted
the behaviour this way (though runc likely did it because of point 3
since I authored both patches, and other runtimes probably looked at
runc to see how they should interpret this confusing history -- my
bad!).
So, let's finally have some clarity and add wording to conclusively
state that the correct representation of max is -1 (like every other
cgroup configuration value) and that users should not treat 0 as a
special value of any kind. A nil value means "do not touch it" (just
like every other cgroup configuration value too).
Note that a pids.max value of 0 is actually different to 1 now that
CLONE_INTO_CGROUP exists (at the time pids was added to the kernel and
the spec, this feature didn't exist and so it may have seemed redundant
to have two equivalent values -- hence my attempt to make 0 an illegal
value for the kernel implementation).
For the Go API, this is effectively a partial revert of commit
ef9ce84 ("specs-go/config: fix required items type") which turned
the limit value into a bare int64.
Fixes: 2ce2c86 ("runtime: config: linux: add cgroups information")
Fixes: 9b19cd2 ("config: linux: update description of PidsLimit")
Fixes: ef9ce84 ("specs-go/config: fix required items type")
Signed-off-by: Aleksa Sarai cyphar@cyphar.com