healthcheck for non-systemd podman #27033

samifruit514 · 2025-09-09T20:40:14Z

On a non-systemd host, healthchecks are not processed. This PR implements them for non-systemd hosts: instead of using systemd timers, it starts a plain goroutine that runs the healthchecks.

Why this is needed: when podman exposes its API on a unix socket (for example, podman system service unix:///tmp/podman.sock) and is then used with docker compose, the healthchecks are never run. If the compose file declares dependencies (like depends_on), it will hang forever.

Does this PR introduce a user-facing change?

Added healthcheck for non-systemd hosts

openshift-ci · 2025-09-09T20:40:20Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: samifruit514
Once this PR has been reviewed and has the lgtm label, please assign ygalblum for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

samifruit514 · 2025-09-09T22:34:54Z

/release-note

openshift-ci · 2025-09-09T22:34:56Z

@samifruit514: the /release-note and /release-note-action-required commands have been deprecated.
Please edit the release-note block in the PR body text to include the release note. If the release note requires additional action include the string action required in the release note. For example:

```release-note
Some release note with action required.
```

In response to this:

/release-note

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: Samuel Archambault <samuel.archambault@getmaintainx.com>

Luap99

That cannot work. Podman is not a daemon: the starting process will go away (e.g. with podman run -d), so spawning goroutines is simply not a viable solution.
Even the podman service is not a daemon; the expectation is that it can be stopped at any time without impacting the currently running containers.

Luap99 · 2025-09-10T09:14:36Z

libpod/healthcheck_nosystemd_linux.go

+func (c *Container) stopHealthCheckTimer() error {
+	timer, exists := activeTimers[c.ID()]


Cleanup is generally called from a different process (podman container cleanup) spawned by conmon. As such, the in-memory map will be empty in that process, and the goroutine in the service process is never stopped here.

samifruit514 · Sep 10, 2025

Hey Luap99! Thanks a lot for the feedback, it is very appreciated!
So to fix this, I've added a way to stop the timer with a "health check stop file". Let me know what you think!

samifruit514 · Sep 10, 2025

Also, I've added a "reattach" function that recreates the timers for the running containers.

Thank you

Signed-off-by: Samuel Archambault <samuel.archambault@getmaintainx.com>

Honny1

Hi, I took a quick look at your code and added my first thoughts.

One big catch is that this will only work if you're running Podman as a service, because the goroutine gets killed as soon as the Go program exits. I'm not sure that using a goroutine is a good solution.

Maybe the Healthcheck could be triggered by Conmon or a different service, such as Cron. However, using Cron would add a dependency to the package, and I'm not sure if this is a good idea.

Honny1 · 2025-09-11T09:07:58Z

libpod/healthcheck_nosystemd_linux.go

+}
+
+// Global map to track active timers (in a real implementation, this would be part of the runtime)
+var activeTimers = make(map[string]*healthcheckTimer)


Accessing this map is a critical section, so you must use a mutex.
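
One way to guard such a map, sketched with hypothetical names (`timerRegistry`, `add`, `remove`) rather than the types in this PR, is to wrap it in a struct that owns a `sync.Mutex` and to keep every read and write behind its methods:

```go
package main

import (
	"fmt"
	"sync"
)

// timerRegistry maps a container ID to the function that stops its timer.
// All access to the map goes through the mutex.
type timerRegistry struct {
	mu     sync.Mutex
	timers map[string]func()
}

func newTimerRegistry() *timerRegistry {
	return &timerRegistry{timers: make(map[string]func())}
}

// add registers stop for id. It returns false if id already has a timer.
func (r *timerRegistry) add(id string, stop func()) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.timers[id]; exists {
		return false
	}
	r.timers[id] = stop
	return true
}

// remove deletes the entry for id and calls its stop function outside
// the lock. It returns false if there was no entry.
func (r *timerRegistry) remove(id string) bool {
	r.mu.Lock()
	stop, exists := r.timers[id]
	delete(r.timers, id)
	r.mu.Unlock()

	if exists {
		stop()
	}
	return exists
}

// len returns the number of registered timers.
func (r *timerRegistry) len() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.timers)
}

func main() {
	reg := newTimerRegistry()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			id := fmt.Sprintf("ctr-%d", i)
			reg.add(id, func() {})
			reg.remove(id)
		}(i)
	}
	wg.Wait()
	fmt.Println("timers left:", reg.len())
}
```

Calling the stop function after releasing the lock avoids holding the mutex while arbitrary code runs.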

Honny1 · 2025-09-11T09:10:49Z

libpod/healthcheck_nosystemd_linux.go

+	ctx, cancel := context.WithTimeout(context.Background(), healthConfig.Timeout)
+	defer cancel()
+
+	_, _, err = t.container.runHealthCheck(ctx, false)


This will bypass the startup health check. Instead, you should call Runtime.HealthCheck or replicate its structure.
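
To make the concern concrete: the call in the diff passes a fixed false as the second argument. Assuming that argument selects the startup healthcheck, the caller would need to decide per run which check applies. The sketch below shows only that decision, with hypothetical types and field names; it does not reproduce what Runtime.HealthCheck actually does.

```go
package main

import "fmt"

// hcState is a hypothetical, simplified view of a container's healthcheck
// configuration and progress; the field names are not Podman's.
type hcState struct {
	HasStartupCheck bool // a startup healthcheck is configured
	StartupPassed   bool // the startup healthcheck has already succeeded
}

// nextCheckIsStartup reports whether the next run should execute the
// startup healthcheck instead of the regular one.
func nextCheckIsStartup(s hcState) bool {
	return s.HasStartupCheck && !s.StartupPassed
}

func main() {
	states := []hcState{
		{HasStartupCheck: false, StartupPassed: false},
		{HasStartupCheck: true, StartupPassed: false},
		{HasStartupCheck: true, StartupPassed: true},
	}
	for _, s := range states {
		fmt.Printf("%+v -> startup check: %v\n", s, nextCheckIsStartup(s))
	}
}
```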

Honny1 · 2025-09-11T09:14:03Z

libpod/healthcheck_nosystemd_linux.go

-// createTimer systemd timers for healthchecks of a container
+// healthcheckTimer manages the background goroutine for healthchecks
+type healthcheckTimer struct {
+	container *Container


This data will become outdated. I'd prefer to store only the ID and get the container on demand. This also prevents a memory leak if the activeTimers map grows without bounds.

giuseppe · 2025-09-11T09:52:05Z

Maybe the Healthcheck could be triggered by Conmon or a different service, such as Cron. However, using Cron would add a dependency to the package, and I'm not sure if this is a good idea.

I agree with that, this could work if we add the logic to conmon (shouldn't be too hard)

giuseppe · 2025-09-12T16:16:43Z

started working on: containers/conmon#598

@samifruit514 could you check if that works with your use case?

samifruit514 · 2025-09-13T21:29:58Z

holy crap, just submitted a PR in conmon containers/conmon#599 without realizing that you did something on your side, and just wanted to make sure that no more comments were posted, and then I just saw your PR :(

let me look at yours...

openshift-ci bot added the do-not-merge/release-note-label-needed label Sep 9, 2025

openshift-ci bot added the release-note label and removed the do-not-merge/release-note-label-needed label Sep 9, 2025

healthcheck for non-systemd

779dc3a

Signed-off-by: Samuel Archambault <samuel.archambault@getmaintainx.com>

samifruit514 force-pushed the main branch from 9f93188 to 779dc3a on September 9, 2025 22:39

Luap99 requested changes Sep 10, 2025


Reattach timers and stop healthchecks with stop file

5fe6281

Signed-off-by: Samuel Archambault <samuel.archambault@getmaintainx.com>

samifruit514 force-pushed the main branch from 9942ea4 to 5fe6281 on September 10, 2025 23:44

Honny1 reviewed Sep 11, 2025


Honny1 mentioned this pull request Sep 12, 2025

Add a design document for Conmon v3 #27053

Merged

giuseppe mentioned this pull request Sep 12, 2025

[RFC] conmon: add --timer-command to run command every N seconds containers/conmon#598

Closed

samifruit514 mentioned this pull request Sep 13, 2025

healthcheck feature containers/conmon#599

Open
