Skip to content

Conversation

samifruit514
Copy link

@samifruit514 samifruit514 commented Sep 9, 2025

On a non-systemd host, healthchecks are not processed. This is about the implementation for the non-systemd. Instead of using timers with systemd, it starts a pure goroutine that do the healthchecks.

Why this is needed: when running podman with exposing the unix socket like podman system service unix:///tmp/podman.sock, and then using docker compose, the healthchecks are not run, so if there are dependencies in the docker compose (like depends_on), it will hang forever

Does this PR introduce a user-facing change?

Added healthcheck for non-systemd hosts

@openshift-ci openshift-ci bot added the do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None label Sep 9, 2025
Copy link
Contributor

openshift-ci bot commented Sep 9, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: samifruit514
Once this PR has been reviewed and has the lgtm label, please assign ygalblum for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@samifruit514
Copy link
Author

/release-note

Copy link
Contributor

openshift-ci bot commented Sep 9, 2025

@samifruit514: the /release-note and /release-note-action-required commands have been deprecated.
Please edit the release-note block in the PR body text to include the release note. If the release note requires additional action include the string action required in the release note. For example:

```release-note
Some release note with action required.
```

In response to this:

/release-note

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot added release-note and removed do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Sep 9, 2025
Signed-off-by: Samuel Archambault <samuel.archambault@getmaintainx.com>
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That cannot work, podman is not a deamon the starting process will go away, i.e. podman run -d so spawning goroutines is simply not a viable solution.
Even the podman service is not a daemon, the expectation is that it can be stopped at any time without impacting the currently running containers.

Comment on lines 98 to 99
func (c *Container) stopHealthCheckTimer() error {
timer, exists := activeTimers[c.ID()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup is generally called from a different process (podman container cleanup) spawned by common, as such the in memory map will be empty and the go routine in the service process is never stopped here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Luap99! Thanks a lot of the feedback, it is very appreciated!
So to fix this, I've added a way to stop the timer with an "health check stop file". Let me know what you think!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I've added a "reattach" function that would recreate the timers for the running containers

Thank you

Signed-off-by: Samuel Archambault <samuel.archambault@getmaintainx.com>
Copy link
Member

@Honny1 Honny1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I took a quick look at your code and added my first thoughts.

One big catch is that this will only work if you're running Podman as a service, because the goroutine gets killed as soon as the Go program exits. I'm not sure that using a goroutine is a good solution.

Maybe the Healthcheck could be triggered by Conmon or a different service, such as Cron. However, using Cron would add a dependency to the package, and I'm not sure if this is a good idea.

}

// Global map to track active timers (in a real implementation, this would be part of the runtime)
var activeTimers = make(map[string]*healthcheckTimer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing this map is a critical section, so you must use a mutex.

ctx, cancel := context.WithTimeout(context.Background(), healthConfig.Timeout)
defer cancel()

_, _, err = t.container.runHealthCheck(ctx, false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will bypass the startup health check. Instead, you should call Runtime.HealthCheck or replicate its structure.

// createTimer systemd timers for healthchecks of a container
// healthcheckTimer manages the background goroutine for healthchecks
type healthcheckTimer struct {
container *Container
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This data will become outdated. I'd prefer to store only the ID and get the container on demand. This also prevents a memory leak if the activeTimers map grows without bounds.

@giuseppe
Copy link
Member

Maybe the Healthcheck could be triggered by Conmon or a different service, such as Cron. However, using Cron would add a dependency to the package, and I'm not sure if this is a good idea.

I agree with that, this could work if we add the logic to conmon (shouldn't be too hard)

@giuseppe
Copy link
Member

started working on: containers/conmon#598

@samifruit514 could you check if that works with your use case?

@samifruit514
Copy link
Author

samifruit514 commented Sep 13, 2025

holy crap, just submitted a PR in conmon containers/conmon#599 without realizing that you did something on your side, and just wanted to make sure that no more comments were posted, and then I just saw your PR :(

let me look at yours...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants