Skip to content

Measure jailer startup performance #5282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ShadowCurse
Copy link
Contributor

@ShadowCurse ShadowCurse commented Jun 26, 2025

Changes

Update jailer requirements on the executable name and permissions. Now it does not need to contain firecracker in it's name, but must be marked as executable.
Add jailer startup performance test. The test is parametrized on the number of jailers started and the number of bind mounts present in the system.

Reason

We want to know what is the overhead of a jailer in isolation. Since usually there are multiple jailed VMs on the system, the test is parametrized by the number of jailers started.
The reason for adding measurements with bind mounts is because it slows down the jailer startup time. The more bind mounts are present, the more contention there is in the kernel. At the same time, bind mounts are frequently created per VM in the production environment.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@ShadowCurse ShadowCurse force-pushed the jailer_tweaks branch 5 times, most recently from 3b16c6e to 2e99a59 Compare June 26, 2025 16:31
@ShadowCurse ShadowCurse marked this pull request as ready for review June 26, 2025 16:35
@ShadowCurse ShadowCurse added Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Enhancement Indicates new feature requests labels Jun 26, 2025
@ShadowCurse ShadowCurse self-assigned this Jun 26, 2025
Now jailer will not complain if the executable does
not contain `firecracker` in it's name. This restriction
was unnecessary and it's removal is not a breaking change.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse ShadowCurse force-pushed the jailer_tweaks branch 2 times, most recently from 0b5d3aa to 65d8ee3 Compare June 27, 2025 11:43
Copy link

codecov bot commented Jun 27, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 82.92%. Comparing base (504b94d) to head (9e5b112).

Files with missing lines Patch % Lines
src/jailer/src/main.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5282      +/-   ##
==========================================
+ Coverage   82.86%   82.92%   +0.05%     
==========================================
  Files         250      250              
  Lines       26902    26903       +1     
==========================================
+ Hits        22292    22308      +16     
+ Misses       4610     4595      -15     
Flag Coverage Δ
5.10-c5n.metal 83.36% <87.50%> (+<0.01%) ⬆️
5.10-m5n.metal ?
5.10-m6a.metal 82.57% <87.50%> (+<0.01%) ⬆️
5.10-m6g.metal 79.18% <87.50%> (+<0.01%) ⬆️
5.10-m6i.metal 83.34% <87.50%> (+<0.01%) ⬆️
5.10-m7a.metal-48xl 82.55% <87.50%> (?)
5.10-m7g.metal ?
5.10-m7i.metal-24xl 83.31% <87.50%> (?)
5.10-m7i.metal-48xl 83.31% <87.50%> (?)
5.10-m8g.metal-24xl 79.18% <87.50%> (?)
5.10-m8g.metal-48xl 79.17% <87.50%> (?)
6.1-c5n.metal 83.40% <87.50%> (+0.01%) ⬆️
6.1-m5n.metal 83.39% <87.50%> (+<0.01%) ⬆️
6.1-m6a.metal 82.62% <87.50%> (+<0.01%) ⬆️
6.1-m6g.metal 79.18% <87.50%> (+<0.01%) ⬆️
6.1-m6i.metal 83.39% <87.50%> (+<0.01%) ⬆️
6.1-m7a.metal-48xl 82.60% <87.50%> (?)
6.1-m7g.metal 79.18% <87.50%> (+<0.01%) ⬆️
6.1-m7i.metal-24xl 83.41% <87.50%> (?)
6.1-m8g.metal-24xl 79.17% <87.50%> (?)
6.1-m8g.metal-48xl 79.17% <87.50%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

The executable passed to the jailer must have exec
permissions set.
Update the jailer executable check unit tests with `match!`.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add note about new jailer executable requirements.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Remove explicit panic on error. Let the error be returned from
the main.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
This binary just outputs the start and end time of
the jailer startup.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
let metadata = std::fs::metadata(&exec_file_path).unwrap();

// Check that it is executable
if metadata.permissions().mode() & 0o111 == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this requiring that's executable by all users, while in reality it just needs to be executable by the jailer? Is there any issues with just having the execve fail with permission error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This checks that there is at least some executable bit set.
Personally, I added this as I though it was already a requirement, just not explicitly checked and failing with a bit nicer message is better. I can drop it if people find it not necessary.

@pytest.mark.parametrize("jailers", [1, 100, 200, 300, 400])
@pytest.mark.parametrize("mounts", [0, 100, 300, 500])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how long does it take to run? there's a bunch of combinations here, I doubt we'll be plotting all these in our graphs. Maybe it's best to identify a handful of combinations we're interested in (1/many jailers x no/many mounts,).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running the test gives me:

4.40s call     integration_tests/performance/test_jailer.py::test_jailer_startup[500-300]
4.20s call     integration_tests/performance/test_jailer.py::test_jailer_startup[500-200]
4.00s call     integration_tests/performance/test_jailer.py::test_jailer_startup[500-100]
3.84s call     integration_tests/performance/test_jailer.py::test_jailer_startup[500-1]
2.49s call     integration_tests/performance/test_jailer.py::test_jailer_startup[300-400]
2.33s call     integration_tests/performance/test_jailer.py::test_jailer_startup[300-300]
2.17s call     integration_tests/performance/test_jailer.py::test_jailer_startup[300-200]
2.01s call     integration_tests/performance/test_jailer.py::test_jailer_startup[300-100]
1.86s call     integration_tests/performance/test_jailer.py::test_jailer_startup[300-1]

So in total the test takes ~30 seconds.


# Create bind mount points. The exact location of them
# does not matter, they just need to exist.
mounts_paths = "/tmp/mounts"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't really matter as perf tests are run sequentially, but you could use a tempdir from mktemp (iirc we have a util for that already that also takes care of cleaning up).

for i, p in enumerate(processes):
r = p.communicate()[0]
e, s = r.split()
deltas.append(int(e) - int(s))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just have the binary spit out the delta?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I can do this, but I don't see why do this. I mean, does it matter who subtracts 2 numbers?

)

processes = []
for i in range(jailers):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably do a few rounds for when we have a single jailer. Maybe do 100 rounds? or maybe do 500 sequential in one test and 500 at the same time in another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added 100 iterations for 1 jailer case

}
)
metrics.put_metric(
"p50",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should emit the datapoints directly as we do in all other tests and let the consumer compute the statistics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, fixed

@Manciukic
Copy link
Contributor

do you have a link where I could see the metrics?

The test measures p50 and p90 of jailer startup
time. It is parametrized by the number of jailers
starting up and the number of bind mount points present
in the system.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Enhancement Indicates new feature requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants