-
Notifications
You must be signed in to change notification settings - Fork 664
fix: link_remap files get left over on failed dumps, unique file names to avoid collisions #2681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: criu-dev
Are you sure you want to change the base?
fix: link_remap files get left over on failed dumps, unique file names to avoid collisions #2681
Conversation
…que, name is now unique Signed-off-by: Brandon Smith <brandon.smith@cedana.ai>
Do you think you could also add a zdtm test? You mentioned that you have a test that is able to trigger the problem you described. |
|
||
mntns_root = mntns_get_root_fd(nsid); | ||
|
||
fd = open("/dev/urandom", O_RDONLY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we generate this uid once for the entire dump?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change introduces generating a random suffix to prevent link_remap collisions when a dump fails and another dump is attempted on the same process
should we generate this uid once for the entire dump?
We already do this for dump_criu_run_id
. Can we use this value instead of generating a new one?
} | ||
|
||
// create file link_remap.fd_id.random_suffix id | ||
snprintf(tmp + 1, sizeof(link_name) - (size_t)(tmp - link_name) - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we customize a link name if linkat_hard returns EEXIST?
unsigned char rand_bytes[8]; | ||
if (read(fd, rand_bytes, sizeof(rand_bytes)) == sizeof(rand_bytes)) { | ||
snprintf(random_suffix, sizeof(random_suffix), "%02x%02x%02x%02x", | ||
rand_bytes[0], rand_bytes[1], rand_bytes[2], rand_bytes[3]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you read 8 bytes, if you use just four of them?
(unsigned long)(ts.tv_sec ^ ts.tv_nsec)); | ||
} | ||
|
||
// create file link_remap.fd_id.random_suffix id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use tabs for indentation
} | ||
|
||
/* Use grand parent, if parent directory does not exist. */ | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
white spaces...
Would you be able to include this test, for example, as part of ZDTM or |
#2683
I have a PR with a zdtm test open. The specific use case was a multi
process training job where multiple processes share the same semaphore, but
you’re migrating the whole training job.
…On Tue, Jun 17, 2025 at 2:18 AM Radostin Stoyanov ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In criu/files-reg.c
<#2681 (comment)>
:
> mntns_root = mntns_get_root_fd(nsid);
+ fd = open("/dev/urandom", O_RDONLY);
I have another branch directly related to this PR and fix ...
@bsmithai <https://github.com/bsmithai> I believe that you are referring
to the following branch: cedana#4 <cedana#4>
It would be good to open an issue or draft PR describing the use-case with
an example (of a real) application that you are trying to migrate. This
will help us to understand why this functionality is important and discuss
what might be the best solution.
—
Reply to this email directly, view it on GitHub
<#2681 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BBXBXMGAQ4ZVUPNXZFHDR2D3D66MDAVCNFSM6AAAAAB7D6ESQ2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDSMZUGQZTCMZXGI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
A friendly reminder that this PR had no activity for 30 days. |
Problem:
create_link_remap
logic specifies before creating the hard link that these file names can be truly unique. Currently these files are generated w/ ids matching sequentially with file descriptor ids. This change introduces generating a random suffix to prevent link_remap collisions when a dump fails and another dump is attempted on the same process that has open fds with links that were remapped.Tested:
Tested w/ a process that opens and uses semaphores, the test creates remap files in
/dev/shm
that would have been created by criu on dump. A dump is done on this process and collisions are avoided. A restore is then done to verify nothing was damaged in C/R.Very simple solution but I wanted to verify e2e.
Results:
Original bug:
This branch: