Commit 28b880a

Fix docker compose up --wait failing when Trillian server isn't healthy
As noted in docker/compose#12424, docker compose up --wait does not seem to honor healthchecks for services with restart: always when a server crashes, restarts a few times, and eventually becomes healthy. This was happening with Rekor:

* MySQL was not yet healthy because its healthcheck was not working as expected. A comment on docker-library/mysql#930 suggested using 127.0.0.1 instead of localhost.
* trillian-log-server was not yet healthy even when MySQL reported healthy, so trillian-log-server crashed and restarted a few times. Neither Trillian service had a healthcheck, because the images we were using are based on Distroless, which has no curl or wget.
* rekor-server tried to start against an unhealthy trillian-log-server and crashed. Although the server eventually became healthy thanks to the restart: always policy, the healthcheck had already reported the startup as unhealthy.

This change adds healthchecks to trillian-log-server and trillian-log-signer by copying the binaries out of the upstream images into Debian 12-based golang:1.24.2 containers that include curl, so the healthcheck can curl the /healthz endpoint. It also fixes the MySQL healthcheck as described above. docker compose up --wait now waits for a healthy MySQL before starting trillian-log-server, and for a healthy Trillian before starting Rekor.

Also fixes minor Dockerfile linting errors.

Signed-off-by: Hayden B <8418760+haydentherapper@users.noreply.github.com>
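Each failure above reduces to one question: does every container report a health status of healthy? The following minimal shell sketch answers that question. It is not part of this commit. The function name all_healthy and the "name health" input format are assumptions made for illustration.

```shell
# Hypothetical helper: succeed only if every container listed on stdin is healthy.
# Assumed input format, one container per line: "<name> <health>",
# e.g. "rekor-mysql-1 healthy" or "rekor-trillian-log-server-1 starting".
all_healthy() {
  local name health seen=0
  # "|| [ -n "$name" ]" keeps a final line that lacks a trailing newline
  while read -r name health || [ -n "$name" ]; do
    [ -z "$name" ] && continue            # skip blank lines
    seen=$((seen + 1))
    if [ "$health" != "healthy" ]; then
      echo "not healthy yet: $name (${health:-no healthcheck})" >&2
      return 1
    fi
  done
  # An empty listing is not treated as "all healthy"
  [ "$seen" -gt 0 ]
}
```

One possible way to feed it, assuming a Compose v2 release whose ps --format accepts Go templates with a .Health field: docker compose ps --format '{{.Name}} {{.Health}}' | all_healthy. A container without a healthcheck would most likely show an empty health value. The sketch deliberately treats that case as not healthy, because a service that has no healthcheck gives --wait nothing to wait on.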
1 parent 0e0e62b commit 28b880a

9 files changed, +75 -17 lines changed

Dockerfile

Lines changed: 3 additions & 3 deletions
@@ -31,7 +31,7 @@ RUN CGO_ENABLED=0 go build -gcflags "all=-N -l" -ldflags "${SERVER_LDFLAGS}" -o
 RUN go test -c -ldflags "${SERVER_LDFLAGS}" -cover -covermode=count -coverpkg=./... -o rekor-server_test ./cmd/rekor-server

 # Multi-Stage production build
-FROM golang:1.24.2@sha256:30baaea08c5d1e858329c50f29fe381e9b7d7bced11a0f5f1f69a1504cdfbf5e as deploy
+FROM golang:1.24.2@sha256:30baaea08c5d1e858329c50f29fe381e9b7d7bced11a0f5f1f69a1504cdfbf5e AS deploy

 # Retrieve the binary from the previous stage
 COPY --from=builder /opt/app-root/src/rekor-server /usr/local/bin/rekor-server
@@ -40,12 +40,12 @@ COPY --from=builder /opt/app-root/src/rekor-server /usr/local/bin/rekor-server
 CMD ["rekor-server", "serve"]

 # debug compile options & debugger
-FROM deploy as debug
+FROM deploy AS debug
 RUN go install github.com/go-delve/delve/cmd/dlv@v1.22.1

 # overwrite server and include debugger
 COPY --from=builder /opt/app-root/src/rekor-server_debug /usr/local/bin/rekor-server

-FROM deploy as test
+FROM deploy AS test
 # overwrite server with test build with code coverage
 COPY --from=builder /opt/app-root/src/rekor-server_test /usr/local/bin/rekor-server
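The as-to-AS changes above make the keyword casing in each FROM line consistent. The commit describes them only as "minor Dockerfile linting errors". It is an assumption that they correspond to Docker's build check for mismatched FROM/AS casing (FromAsCasing). The sketch below is a rough grep-based approximation of that check, not BuildKit's implementation, and the function name check_from_as_casing is hypothetical.

```shell
# Hypothetical lint helper: flag "FROM ... as <stage>" lines, i.e. an
# upper-case FROM paired with a lower-case "as". Simplified approximation only.
check_from_as_casing() {
  local file=$1 status=0
  grep -nE '^[[:space:]]*FROM[[:space:]].*[[:space:]]as[[:space:]]+[^[:space:]]+[[:space:]]*$' "$file" || status=$?
  case $status in
    0) return 1 ;;   # matches found: offending lines were printed with line numbers
    1) return 0 ;;   # no matches: casing is consistent
    *) return 2 ;;   # grep error, e.g. missing or unreadable file
  esac
}
```

Running check_from_as_casing Dockerfile against the pre-commit file would print the three lower-case as lines and return 1. Unlike the real check, this sketch ignores a lower-case from paired with an upper-case AS.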

Dockerfile.trillian-log-server

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# Copyright 2025 The Sigstore Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+FROM ghcr.io/sigstore/scaffolding/trillian_log_server:v1.7.2@sha256:ff64f73b4a8acae7546ecfb5b73c90933b614130a3b43c764a35535e4f60451b AS server
+
+FROM golang:1.24.2@sha256:30baaea08c5d1e858329c50f29fe381e9b7d7bced11a0f5f1f69a1504cdfbf5e AS deploy
+
+COPY --from=server /ko-app/trillian_log_server /usr/local/bin/trillian-log-server
+
+ENTRYPOINT ["trillian-log-server"]

Dockerfile.trillian-log-signer

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# Copyright 2025 The Sigstore Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+FROM ghcr.io/sigstore/scaffolding/trillian_log_signer:v1.7.2@sha256:bfcc659dc08f87a0f4a4797edf88c93426a95f0d004032779a028bdce7b7e821 AS server
+
+FROM golang:1.24.2@sha256:30baaea08c5d1e858329c50f29fe381e9b7d7bced11a0f5f1f69a1504cdfbf5e AS deploy
+
+COPY --from=server /ko-app/trillian_log_signer /usr/local/bin/trillian-log-signer
+
+ENTRYPOINT ["trillian-log-signer"]

docker-compose.yml

Lines changed: 24 additions & 8 deletions
@@ -24,9 +24,9 @@ services:
       - MYSQL_PASSWORD=zaphod
     restart: always # keep the MySQL server running
     healthcheck:
-      # better healthcheck for mysql. See https://github.com/docker-library/mysql/issues/930.
-      test: ["CMD", "mysqladmin", "-h", "localhost", "-u$MYSQL_USER", "-p$MYSQL_ROOT_PASSWORD", "-s", "ping"]
-      interval: 10s
+      # better healthcheck for MySQL. See https://github.com/docker-library/mysql/issues/930.
+      test: ["CMD", "mysqladmin", "-h", "127.0.0.1", "-u$MYSQL_USER", "-p$MYSQL_ROOT_PASSWORD", "-s", "ping"]
+      interval: 5s
       timeout: 3s
       retries: 15
       start_period: 90s
@@ -50,7 +50,9 @@ services:
       retries: 3
       start_period: 5s
   trillian-log-server:
-    image: ghcr.io/sigstore/scaffolding/trillian_log_server@sha256:beffee16bb07b5cb051dc4e476d3a1063521ed5ae0b670efc7fe6f3507d94d2b # v1.6.0
+    build:
+      context: .
+      dockerfile: Dockerfile.trillian-log-server
     command: [
       "--quota_system=noop",
       "--storage_system=mysql",
@@ -59,15 +61,23 @@ services:
       "--http_endpoint=0.0.0.0:8091",
       "--alsologtostderr",
     ]
-    restart: always # retry while mysql is starting up
+    restart: always # keep the Trillian log server up
     ports:
       - "8090:8090"
       - "8091:8091"
     depends_on:
       mysql:
         condition: service_healthy
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8091/healthz"]
+      interval: 5s
+      timeout: 3s
+      retries: 15
+      start_period: 15s
   trillian-log-signer:
-    image: ghcr.io/sigstore/scaffolding/trillian_log_signer@sha256:79d57af375cfa997ed5452cc0c02c0396d909fcc91d11065586f119490aa9214 # v1.6.0
+    build:
+      context: .
+      dockerfile: Dockerfile.trillian-log-signer
     command: [
       "--quota_system=noop",
       "--storage_system=mysql",
@@ -77,12 +87,18 @@ services:
       "--force_master",
       "--alsologtostderr",
     ]
-    restart: always # retry while mysql is starting up
+    restart: always # keep the log signer up
     ports:
       - "8092:8091"
     depends_on:
       mysql:
         condition: service_healthy
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8091/healthz"]
+      interval: 5s
+      timeout: 3s
+      retries: 15
+      start_period: 15s
   rekor-server:
     build:
       context: .
@@ -119,7 +135,7 @@ services:
       redis-server:
         condition: service_healthy
       trillian-log-server:
-        condition: service_started
+        condition: service_healthy
     healthcheck:
       test: ["CMD", "curl", "-f", "http://localhost:3000/ping"]
       interval: 10s
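With these changes, depends_on with condition: service_healthy makes mysql the first service that must become healthy. Both Trillian services wait on it, and rekor-server additionally waits for a healthy trillian-log-server. Compose enforces that ordering itself. The sketch below only models it explicitly, which can be handy when debugging a stack by hand. The function wait_healthy_in_order, the get_health hook it calls, and the POLL_INTERVAL variable are all hypothetical.

```shell
# Hypothetical sketch: wait for each named service to become healthy, in order.
# get_health is an assumed hook that prints the health status of service $1
# (e.g. "starting", "healthy", "unhealthy"); the caller must define it.
wait_healthy_in_order() {
  local max_attempts=$1; shift
  local svc attempt status
  for svc in "$@"; do
    attempt=0
    while :; do
      status=$(get_health "$svc")
      [ "$status" = "healthy" ] && break
      attempt=$((attempt + 1))
      if [ "$attempt" -ge "$max_attempts" ]; then
        echo "timed out waiting for $svc (last status: $status)" >&2
        return 1
      fi
      sleep "${POLL_INTERVAL:-1}"   # seconds between polls
    done
    echo "$svc is healthy"
  done
}
```

For example, wait_healthy_in_order 24 mysql trillian-log-server rekor-server polls each service in dependency order. A real get_health could, as an untested assumption, run docker inspect --format '{{.State.Health.Status}}' against the container ID printed by docker compose ps -q for the service.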

tests/client-algos-e2e-test.sh

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ function waitForRekorServer () {
   echo -n "* waiting up to 60 sec for system to start"
   count=0

-  until [ $(docker ps -a | grep -c "(healthy)") == 3 ];
+  until [ $(docker ps -a | grep -c "(healthy)") == 5 ];
   do
     if [ $count -eq 6 ]; then
       echo "! timeout reached"

tests/index-test-utils.sh

Lines changed: 2 additions & 2 deletions
@@ -73,8 +73,8 @@ docker_up () {
   local count=0
   echo "waiting up to 2 min for system to start"
   until [ $(${docker_compose} ps | \
-    grep -E "(rekor[-_]mysql|rekor[-_]redis|rekor[-_]rekor-server)" | \
-    grep -c "(healthy)" ) == 3 ];
+    grep -E "(rekor[-_]mysql|rekor[-_]redis|rekor[-_]rekor-server|rekor[-_]trillian)" | \
+    grep -c "(healthy)" ) == 5 ];
   do
     if [ $count -eq 24 ]; then
       echo "! timeout reached"

tests/issue-872-e2e-test.sh

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ function waitForRekorServer () {
   echo -n "* waiting up to 60 sec for system to start"
   count=0

-  until [ $(docker ps -a | grep -c "(healthy)") == 3 ];
+  until [ $(docker ps -a | grep -c "(healthy)") == 5 ];
   do
     if [ $count -eq 6 ]; then
       echo "! timeout reached"

tests/rekor-harness.sh

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ function start_server () {

   count=0
   echo -n "waiting up to 60 sec for system to start"
-  until [ $(${docker_compose} ps | grep -c "(healthy)") == 3 ];
+  until [ $(${docker_compose} ps | grep -c "(healthy)") == 5 ];
   do
     if [ $count -eq 6 ]; then
       echo "! timeout reached"

tests/sharding-e2e-test.sh

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ function waitForRekorServer () {
   count=0

   echo -n "waiting up to 60 sec for system to start"
-  until [ $(${docker_compose} ps | grep -c "(healthy)") == 3 ];
+  until [ $(${docker_compose} ps | grep -c "(healthy)") == 5 ];
   do
     if [ $count -eq 6 ]; then
       echo "! timeout reached"
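All five test scripts share one pattern. Each polls a container listing until the number of "(healthy)" entries equals a hard-coded count, now 5 instead of 3. The minimal sketch below generalizes that loop. The names count_healthy, wait_for_healthy, the status_cmd hook, and POLL_INTERVAL are hypothetical, and status_cmd stands in for docker compose ps.

```shell
# Hypothetical helper: count "(healthy)" lines among those matching an extended
# regex. grep -c prints 0 but exits 1 when nothing matches, so the exit status
# is ignored and only the printed count matters.
count_healthy() {
  grep -E "$1" | grep -c "(healthy)" || true
}

# Hypothetical polling loop: run status_cmd until exactly $2 containers matching
# regex $1 are healthy, giving up after $3 attempts.
wait_for_healthy() {
  local filter=$1 expected=$2 max_attempts=$3
  local attempt=0 n
  until n=$(status_cmd | count_healthy "$filter"); [ "$n" -eq "$expected" ]; do
    attempt=$((attempt + 1))
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "! timeout reached ($n/$expected healthy)" >&2
      return 1
    fi
    sleep "${POLL_INTERVAL:-5}"   # seconds between polls
  done
}
```

A filter like the one in tests/index-test-utils.sh matters here. docker ps -a lists every container on the host, so unrelated healthy containers would also be counted, and an exact-equality test could then never succeed.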
