healthcheck for non-systemd podman #27033
Signed-off-by: Samuel Archambault <samuel.archambault@getmaintainx.com>
That cannot work. Podman is not a daemon: the starting process goes away, e.g. with podman run -d, so spawning goroutines is simply not a viable solution.
Even the podman service is not a daemon; the expectation is that it can be stopped at any time without impacting the currently running containers.
func (c *Container) stopHealthCheckTimer() error {
	timer, exists := activeTimers[c.ID()]
Cleanup is generally called from a different process (podman container cleanup) spawned by conmon; as such, the in-memory map will be empty and the goroutine in the service process is never stopped here.
Hey Luap99! Thanks a lot for the feedback, it is very much appreciated!
So to fix this, I've added a way to stop the timer with a "health check stop file" (rough sketch below). Let me know what you think!
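To illustrate the stop-file idea, here is a minimal sketch only; stopFilePath, requestStop, and the runDir parameter are hypothetical names, not the PR's actual code. The timer goroutine polls for a per-container file and exits once it appears, so a separate process such as podman container cleanup can stop the timer just by creating that file.

package hcsketch

import (
	"os"
	"path/filepath"
)

// healthcheckTimer is trimmed down for the sketch; the real struct carries more fields.
type healthcheckTimer struct {
	containerID string
}

// stopFilePath is a hypothetical helper: one stop file per container ID.
func stopFilePath(runDir, ctrID string) string {
	return filepath.Join(runDir, ctrID+".hc-stop")
}

// shouldStop is polled by the timer goroutine on every tick; the goroutine
// exits as soon as the stop file exists.
func (t *healthcheckTimer) shouldStop(runDir string) bool {
	_, err := os.Stat(stopFilePath(runDir, t.containerID))
	return err == nil
}

// requestStop is what a different process (e.g. podman container cleanup)
// would call: it only touches the filesystem, so no shared in-memory state
// between processes is needed.
func requestStop(runDir, ctrID string) error {
	f, err := os.Create(stopFilePath(runDir, ctrID))
	if err != nil {
		return err
	}
	return f.Close()
}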
Also, I've added a "reattach" function that recreates the timers for the running containers.
Thank you
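To make the reattach idea concrete, here is a minimal sketch under stated assumptions: the interface, HasHealthCheck, IsRunning, and startHealthCheckTimer are stand-ins, not podman's actual API. On service start, walk the containers and recreate a timer for each running one that defines a healthcheck, since the previous process and its goroutines are gone.

package hcsketch

import "log"

// runningContainer is a minimal stand-in for the real libpod Container type;
// it exposes only what this sketch needs.
type runningContainer interface {
	ID() string
	HasHealthCheck() bool
	IsRunning() bool
	startHealthCheckTimer() error
}

// reattachHealthCheckTimers recreates in-memory timers after a service
// restart: any running container with a healthcheck gets a fresh timer.
func reattachHealthCheckTimers(ctrs []runningContainer) {
	for _, ctr := range ctrs {
		if !ctr.HasHealthCheck() || !ctr.IsRunning() {
			continue
		}
		if err := ctr.startHealthCheckTimer(); err != nil {
			log.Printf("reattaching healthcheck timer for %s: %v", ctr.ID(), err)
		}
	}
}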
Signed-off-by: Samuel Archambault <samuel.archambault@getmaintainx.com>
Hi, I took a quick look at your code and added my first thoughts.
One big catch is that this will only work if you're running Podman as a service, because the goroutine gets killed as soon as the Go program exits. I'm not sure that using a goroutine is a good solution.
Maybe the Healthcheck could be triggered by Conmon or a different service, such as Cron. However, using Cron would add a dependency to the package, and I'm not sure if this is a good idea.
}

// Global map to track active timers (in a real implementation, this would be part of the runtime)
var activeTimers = make(map[string]*healthcheckTimer)
Accessing this map is a critical section, so you must use a mutex.
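For instance, a minimal sketch (the registry type and method names are illustrative): keep the map and a sync.Mutex together in one small type so every access goes through a locked method.

package hcsketch

import "sync"

type healthcheckTimer struct{} // fields elided for the sketch

// timerRegistry pairs the map with its lock so callers cannot touch the map
// without holding the mutex.
type timerRegistry struct {
	mu     sync.Mutex
	timers map[string]*healthcheckTimer
}

func newTimerRegistry() *timerRegistry {
	return &timerRegistry{timers: make(map[string]*healthcheckTimer)}
}

func (r *timerRegistry) add(id string, t *healthcheckTimer) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.timers[id] = t
}

// remove returns the timer (if any) so the caller can stop it after the lock
// is released.
func (r *timerRegistry) remove(id string) (*healthcheckTimer, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	t, ok := r.timers[id]
	delete(r.timers, id)
	return t, ok
}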
ctx, cancel := context.WithTimeout(context.Background(), healthConfig.Timeout)
defer cancel()

_, _, err = t.container.runHealthCheck(ctx, false)
This will bypass the startup health check. Instead, you should call Runtime.HealthCheck or replicate its structure.
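As a rough illustration of "replicate its structure" (all names and fields here are stand-ins, not podman's actual API): decide between the startup and the regular healthcheck before running anything, so the startup phase is not silently skipped.

package hcsketch

import "context"

// container is a minimal stand-in; the real type lives in libpod.
type container struct {
	hasStartupCheck bool
	startupPassed   bool
}

func (c *container) runStartupHealthCheck(ctx context.Context) error { return nil } // stub
func (c *container) runRegularHealthCheck(ctx context.Context) error { return nil } // stub

// runOnce mirrors the structure the reviewer points at: while a startup
// healthcheck is configured and has not yet passed, run that check instead of
// the regular one.
func runOnce(ctx context.Context, c *container) error {
	if c.hasStartupCheck && !c.startupPassed {
		return c.runStartupHealthCheck(ctx)
	}
	return c.runRegularHealthCheck(ctx)
}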
// createTimer systemd timers for healthchecks of a container
// healthcheckTimer manages the background goroutine for healthchecks
type healthcheckTimer struct {
	container *Container
This data will become outdated. I'd prefer to store only the ID and get the container on demand. This also prevents a memory leak if the activeTimers map grows without bounds.
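For example, a sketch of the ID-only variant (the lookup function is injected here as a stand-in for whatever runtime lookup is available): the timer keeps just the container ID and resolves the container right before each check, so it never works from a stale *Container.

package hcsketch

import (
	"context"
	"fmt"
)

// Container is a stand-in for the real libpod type in this sketch.
type Container struct{}

func (c *Container) runCheck(ctx context.Context) error { return nil } // stub

type healthcheckTimer struct {
	// Only the ID is stored; the *Container is looked up fresh on every tick.
	containerID string
	lookup      func(id string) (*Container, error) // injected runtime lookup
}

func (t *healthcheckTimer) tick(ctx context.Context) error {
	ctr, err := t.lookup(t.containerID)
	if err != nil {
		return fmt.Errorf("looking up container %s: %w", t.containerID, err)
	}
	return ctr.runCheck(ctx)
}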
I agree with that, this could work if we add the logic to conmon (shouldn't be too hard).
Started working on containers/conmon#598. @samifruit514 could you check if that works with your use case?
Holy crap, I just submitted a PR in conmon (containers/conmon#599) without realizing that you had done something on your side. I just wanted to make sure that no more comments had been posted, and then I saw your PR :( Let me look at yours...
On a non-systemd host, healthchecks are not processed. This PR adds an implementation for the non-systemd case: instead of systemd timers, it starts a plain goroutine that runs the healthchecks (roughly as sketched below).
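A minimal sketch of the goroutine-based approach described above (names, interval handling, and the stop channel are illustrative, not the PR's exact code): a ticker fires at the configured interval, each tick runs one check with its own timeout, and closing the stop channel ends the loop.

package hcsketch

import (
	"context"
	"log"
	"time"
)

// runCheck stands in for the actual healthcheck execution in this sketch.
func runCheck(ctx context.Context, ctrID string) error { return nil }

// healthCheckLoop runs one container's healthcheck every interval until stop
// is closed. Each run gets its own timeout so a hung check cannot block the
// next tick forever.
func healthCheckLoop(ctrID string, interval, timeout time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			ctx, cancel := context.WithTimeout(context.Background(), timeout)
			if err := runCheck(ctx, ctrID); err != nil {
				log.Printf("healthcheck for %s failed: %v", ctrID, err)
			}
			cancel()
		}
	}
}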
Why this is needed: when running podman with the unix socket exposed, e.g.
podman system service unix:///tmp/podman.sock
and then using docker compose, the healthchecks are not run, so if there are dependencies in the compose file (like depends_on), it hangs forever.
Does this PR introduce a user-facing change?