Skip to content

Conversation

cyphar
Copy link
Member

@cyphar cyphar commented Feb 27, 2025

The history of this is a little complicated, but in short there is an
argument to be made that several misunderstandings resulted in the spec
sometimes implying (and runtimes interpreting) a pids.limit value of 0
to be equivalent to "max" or otherwise having unfortunate handling of
the value.

The slightly longer background is the following:

  1. When commit 834fb5d ("spec: linux: add support for the PIDs
    cgroup") added support, we did not yet have textual documentation of
    cgroup configuration values. In addition, we had not yet started
    using pointers to indicate optional fields and detect unset fields.
    However, the initial commit did imply that pids.limit=0 should be
    treated as a real value.

  2. Commit 2ce2c86 ("runtime: config: linux: add cgroups
    information") labeled "pids.limit" as being a REQUIRED field. This
    may seem trivial, but consider this foreshadowing for point 5.

  3. Later, commit 9b19cd2 ("config: linux: update description of
    PidsLimit") was added to explicitly make pids.limit=0 equivalent to
    max (at the time there was a kernel patch proposed to make setting
    pids.max to 0 illegal, though it was never merged).

    This is often pointed to as being the reason for runtimes
    interpreting this behaviour this way, however...

  4. Soon after, 488f174 ("Make optional Cgroup related config params
    pointers along with omitempty json tag.") converted it to a pointer
    and changed the code comment to state that the "default value" means
    "no limit" -- and the default value was now a pointer so the default
    value is nil not 0. At this stage, using 0 to mean "no limit" would
    arguably no longer be correct.

  5. However, because the field was marked as REQUIRED in point 2, a while
    later commit ef9ce84 ("specs-go/config: fix required items
    type") changed the value back to a non-pointer but didn't modify the
    code comment -- and so ended up codifying the "0 means no limit"
    behaviour.

    I would argue this commit is the reason why runtimes have interpreted
    the behaviour this way (though runc likely did it because of point 3
    since I authored both patches, and other runtimes probably looked at
    runc to see how they should interpret this confusing history -- my
    bad!).

So, let's finally have some clarity and add wording to conclusively
state that the correct representation of max is -1 (like every other
cgroup configuration value) and that users should not treat 0 as a
special value of any kind. A nil value means "do not touch it" (just
like every other cgroup configuration value too).

Note that a pids.max value of 0 is actually different to 1 now that
CLONE_INTO_CGROUP exists (at the time pids was added to the kernel and
the spec, this feature didn't exist and so it may have seemed redundant
to have two equivalent values -- hence my attempt to make 0 an illegal
value for the kernel implementation).

For the Go API, this is effectively a partial revert of commit
ef9ce84 ("specs-go/config: fix required items type") which turned
the limit value into a bare int64.

Fixes: 2ce2c86 ("runtime: config: linux: add cgroups information")
Fixes: 9b19cd2 ("config: linux: update description of PidsLimit")
Fixes: ef9ce84 ("specs-go/config: fix required items type")
Signed-off-by: Aleksa Sarai cyphar@cyphar.com

config-linux.md Outdated
* **`limit`** *(int64, REQUIRED)* - specifies the maximum number of tasks in the cgroup
* **`limit`** *(int64, REQUIRED)* - specifies the maximum number of tasks in the cgroup, with `-1` indicating no limit (`max`).

> Note: Even though it superficially seems redundant, `0` is a valid limit value for the `pids` cgroup controller and SHOULD NOT be treated as a special value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> Note: Even though it superficially seems redundant, `0` is a valid limit value for the `pids` cgroup controller and SHOULD NOT be treated as a special value.
> Note: Even though it superficially seems redundant, `0` is a valid limit value for the `pids` cgroup controller from kernel's perspective, and SHOULD NOT be treated as "no limit". An implementation may reject `0`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @cyphar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still too specific to the bug we have in runc -- the core issue is that 0 SHOULD be treated as a valid value. It's not a MUST so I think the "may" here (which probably should be MAY, I guess?) is redundant and gives the wrong impression.

Maybe "special value" is too generic though...

Copy link
Member Author

@cyphar cyphar Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with

Note: Even though it may superficially seem redundant, 0 is a valid limit value for the pids cgroup controller from the kernel's perspective and SHOULD be treated as such by runtimes.

@AkihiroSuda AkihiroSuda added this to the v1.3.0 milestone Mar 9, 2025
@AkihiroSuda AkihiroSuda mentioned this pull request Sep 2, 2025
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@cyphar cyphar marked this pull request as draft September 10, 2025 23:40
The history of this is a little complicated, but in short there is an
argument to be made that several misunderstandings resulted in the spec
sometimes implying (and runtimes interpreting) a pids.limit value of 0
to be equivalent to "max" or otherwise having unfortunate handling of
the value.

The slightly longer background is the following:

1. When commit 834fb5d ("spec: linux: add support for the PIDs
   cgroup") added support, we did not yet have textual documentation of
   cgroup configuration values. In addition, we had not yet started
   using pointers to indicate optional fields and detect unset fields.
   However, the initial commit did imply that pids.limit=0 should be
   treated as a real value.

2. Commit 2ce2c86 ("runtime: config: linux: add cgroups
   information") labeled "pids.limit" as being a REQUIRED field. This
   may seem trivial, but consider this foreshadowing for point 5.

3. Later, commit 9b19cd2 ("config: linux: update description of
   PidsLimit") was added to explicitly make pids.limit=0 equivalent to
   max (at the time there was a kernel patch proposed to make setting
   pids.max to 0 illegal, though it was never merged).

   This is often pointed to as being the reason for runtimes
   interpreting this behaviour this way, however...

4. Soon after, 488f174 ("Make optional Cgroup related config params
   pointers along with `omitempty` json tag.") converted it to a pointer
   and changed the code comment to state that the "default value" means
   "no limit" -- and the default value was now a pointer so the default
   value is nil not 0. At this stage, using 0 to mean "no limit" would
   arguably no longer be correct.

5. However, because the field was marked as REQUIRED in point 2, a while
   later commit ef9ce84 ("specs-go/config: fix required items
   type") changed the value back to a non-pointer but didn't modify the
   code comment -- and so ended up codifying the "0 means no limit"
   behaviour.

   I would argue this commit is the reason why runtimes have interpreted
   the behaviour this way (though runc likely did it because of point 3
   since I authored both patches, and other runtimes probably looked at
   runc to see how they should interpret this confusing history -- my
   bad!).

So, let's finally have some clarity and add wording to conclusively
state that the correct representation of max is -1 (like every other
cgroup configuration value) and that users should not treat 0 as a
special value of any kind. A nil value means "do not touch it" (just
like every other cgroup configuration value too).

Note that a pids.max value of 0 is actually different to 1 now that
CLONE_INTO_CGROUP exists (at the time pids was added to the kernel and
the spec, this feature didn't exist and so it may have seemed redundant
to have two equivalent values -- hence my attempt to make 0 an illegal
value for the kernel implementation).

For the Go API, this is effectively a partial revert of commit
ef9ce84 ("specs-go/config: fix required items type") which turned
the limit value into a bare int64.

Fixes: 2ce2c86 ("runtime: config: linux: add cgroups information")
Fixes: 9b19cd2 ("config: linux: update description of PidsLimit")
Fixes: ef9ce84 ("specs-go/config: fix required items type")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
@cyphar cyphar marked this pull request as ready for review September 18, 2025 01:56
@cyphar
Copy link
Member Author

cyphar commented Sep 18, 2025

Marked as ready again (I was updating the commit to provide more historical information).

@cyphar
Copy link
Member Author

cyphar commented Sep 18, 2025

@AkihiroSuda PTAL

@AkihiroSuda AkihiroSuda merged commit 62106c1 into opencontainers:main Sep 18, 2025
4 checks passed
@cyphar cyphar deleted the pids-limit-0-value branch September 18, 2025 02:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants