-
Notifications
You must be signed in to change notification settings - Fork 582
Introduce CPUAffinity process property instead of execCPUAffinity #1296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This change introduces more generic `CPUAffinity` property of `Process` to specify desired CPU affinities while performing operations on create, start and exec operations. Signed-off-by: Alexander Kanevskiy <alexander.kanevskiy@intel.com>
ranges. For example, `0-3,7` represents CPUs 0,1,2,3, and 7. If omitted or empty for particular [lifecycle](runtime.md#lifecycle) stage, | ||
the runtime SHOULD NOT change process' CPU affinity, and the affinity is determined by the Linux kernel. | ||
The following properties are available: | ||
* **`runtimeCreate`** (string, OPTIONAL) is a list of CPUs the runtime parent process should run on during the runtime creation stage, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this just be configured by the process that runs the OCI runtime so that more of the startup runs on the correct CPUs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better or worse, this might be non-trivial to do if someone is using a tool that wraps an OCI runtime and doesn't provide a knob for it (basically the same argument as runtime hooks).
I thin matching the lifecycle stages (which also match the hook names) as done here is probably the simplest approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
theoretically it is possible, e.g. via systemd to restrict container runtime to run on subset of CPUs, but this gives granularity on splitting CPU affinity for containerd/cri-o itself vs. per-container hook execution. In other words, do not affect higher level runtime' CPU quota with operations that might be local to subset of containers.
regarding names: currently it is in form "whereWhat", but OCI hooks are named "whatWhere". if we feel that OCI hook naming is better, I can rename proposed fields in this PR to have more consistency with the hooks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the same names as the hooks would be preferable if only for consistency.
* **`containerCreate`** (string, OPTIONAL) is a list of CPUs the process should run on during the container creation stage, | ||
after entering the container's namespaces but before the user process or any of [`createContainer` hooks](#createContainer-hooks) are executed. | ||
* **`containerStart`** (string, OPTIONAL) is a list of CPUs the process should be shoud run on during the container start stage, | ||
inside the container's namespace during `start` operation. The affinity should be applied before executing [`startContainer` hooks](#startContainer-hooks). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to differentiate between these two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could argue that the runtime should have a particular set of affinities which then should be adjusted to be more or less restrictive as late as possible to avoid latency on those cores.
I suspect most people wouldn't set both at the same time but the distinction does make some sense. Now, whether this should be done before or after applying seccomp rules is a different question...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to differentiate between these two?
the difference is that "runc create" and "runc start" might be executed in different PIDs, thus those PIDs by default would inherit CPU affinity from parent process(es). Having above in the spec allows to make sure that OCI hooks are executed (and quota accounted) on the correct "infrastructure" CPUs.
@kolyshkin PTAL |
Given issues like golang/sys#259, I think it might be nice to have a way to indicate "all CPUs" as a way to reset affinity (but without doing |
theoretically, the higher-level runtime that generates OCI spec might be feeling it with right number based on detected system information. e.g. use result of |
fyi @bitoku |
@@ -94,8 +94,12 @@ type Process struct { | |||
SelinuxLabel string `json:"selinuxLabel,omitempty" platform:"linux"` | |||
// IOPriority contains the I/O priority settings for the cgroup. | |||
IOPriority *LinuxIOPriority `json:"ioPriority,omitempty" platform:"linux"` | |||
// ExecCPUAffinity is Deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want various linters to warn about using deprecated functionality (and other linters, that insist that description starts with the Id
, to keep working fine), the deprecation notice should be added like this:
// ExecCPUAffinity specifies CPU affinity for exec processes.
//
// Deprecated: use [CPUAffinity] instead.
(I.e. after the description, on a separate paragraph, etc.)
My experience is that this rarely happens -- usually higher-level runtimes either hide new knobs like this (requiring you to specify patch Given that we have had seen practical issues with container runtimes being spawned with suboptimal CPU affinity values, I would suggest that having an "all" or "max" special value would be a good idea. Also you do not need to detect anything with |
will be affinitized to specified list of CPUs. | ||
* **`containerCreate`** (string, OPTIONAL) is a list of CPUs the process should run on during the container creation stage, | ||
after entering the container's namespaces but before the user process or any of [`createContainer` hooks](#createContainer-hooks) are executed. | ||
* **`containerStart`** (string, OPTIONAL) is a list of CPUs the process should be shoud run on during the container start stage, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* **`containerStart`** (string, OPTIONAL) is a list of CPUs the process should be shoud run on during the container start stage, | |
* **`containerStart`** (string, OPTIONAL) is a list of CPUs the process should be run on during the container start stage, |
"final": "0-3,7" | ||
"CPUAffinity": { | ||
"runtimeCreate": "7", | ||
"runtimeExec": "7", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness, should we add alsoi containerStart
in this example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, when it comes to formatting -- in the OCI specs we do not hard-wrap lines at N columns. Each complete sentence should be one line (this makes diffs easier to read).
* **`CPUAffinity`** (object, OPTIONAL) specifies CPU affinity used to execute processes in the runtime and the container. | ||
All the properties in this setting expects as argument a string that contain a comma-separated list, with dashes to represent | ||
ranges. For example, `0-3,7` represents CPUs 0,1,2,3, and 7. If omitted or empty for particular [lifecycle](runtime.md#lifecycle) stage, | ||
the runtime SHOULD NOT change process' CPU affinity, and the affinity is determined by the Linux kernel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be very useful to have an all
special value which indicates that the CPU affinity should be reset to include as many CPUs as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "max" makes that clearer symantically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with either, but I guess you could say that "max" implies that it might not be all.
This change introduces more generic
CPUAffinity
property ofProcess
to specify desired CPU affinities while performing operations on create, start and exec operations.As it was originally discussed in PR#1253, the existing implementation covers only
exec
usecase, where setting affinity for OCI hooks and initial container process will benefit wider set of workloads.