Member

@AndersonQ AndersonQ commented Nov 11, 2025

What does this PR do?

Updates the elastic-agent Helm chart to allow the collection of rotated logs, including GZIP-compressed logs.

Why is it important?

It allows rotated log files to be ingested, now that Filebeat can ingest GZIP-compressed files.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Disruptive User Impact

Enabling the ingestion of rotated logs on an existing deployment causes a one-time re-ingestion of the logs.

The input ID when using autodiscover is now scoped by container name instead of container ID. This avoids data duplication, as explained below.

Context: K8s logging architecture

Logs are written to:

/var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/<restart_count>.log # Active log file
/var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/<restart_count>.log.TIMESTAMP # 1st rotation, plain file
/var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/<restart_count>.log.TIMESTAMP.gz # subsequent rotations, gzipped

Legacy symlinked path:

/var/log/containers/<pod_name>_<namespace>_<container_name>-<container_id>.log -> /var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/<restart_count>.log

Thus, during a container restart (for example, in a CrashLoopBackOff), two
containers with the same name but different IDs can exist at the same time:
the container that just crashed and is being removed, and the new container being created.

For example, consider:

container name: foo
container id 1: id-1
container id 2: id-2

pod name: pod
namespace: ns
pod uid: uid

This leads to the following log files and symlinks during the transition:

/var/log/pods/ns_pod_uid/foo/0.log
/var/log/pods/ns_pod_uid/foo/1.log

/var/log/containers/pod_ns_foo-id-1.log -> /var/log/pods/ns_pod_uid/foo/0.log
/var/log/containers/pod_ns_foo-id-2.log -> /var/log/pods/ns_pod_uid/foo/1.log
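
The layout above can be reproduced in a scratch directory to see that both symlinks land in the same per-container directory. This is only a sketch: it uses a temporary directory instead of the real /var/log, and the file names are the example values from this description.

```shell
#!/bin/sh
# Recreate the example layout in a scratch directory (not the real /var/log).
set -eu

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/pods/ns_pod_uid/foo" "$root/containers"

# One log file per restart count, all inside the same container directory.
: > "$root/pods/ns_pod_uid/foo/0.log"
: > "$root/pods/ns_pod_uid/foo/1.log"

# Legacy symlinks: one per container ID.
ln -s "$root/pods/ns_pod_uid/foo/0.log" "$root/containers/pod_ns_foo-id-1.log"
ln -s "$root/pods/ns_pod_uid/foo/1.log" "$root/containers/pod_ns_foo-id-2.log"

# Resolve each symlink and keep only the directory part of its target.
dir1=$(dirname "$(readlink "$root/containers/pod_ns_foo-id-1.log")")
dir2=$(dirname "$(readlink "$root/containers/pod_ns_foo-id-2.log")")
echo "id-1 -> $dir1"
echo "id-2 -> $dir2"
```

Both container IDs point into the same `foo` directory, which is why anything scoped to that directory is shared between them.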

k8s log collection

One filestream input is created per container. Input ID pattern: kubernetes-container-logs-${data.kubernetes.pod.name}-${data.kubernetes.container.id}

Thus, one input is created for each container ID.

Rotated logs

Rotated logs are scoped by container name and restart count, not by container ID (/var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/<restart_count>.log).
To collect rotated logs, the following wildcard would be used:

/var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/*.log.*
id: kubernetes-container-logs-${kubernetes.pod.name}-${kubernetes.container.id}
paths:
  - /var/log/containers/*<container_id>.log
  - /var/log/pods/<namespace>_<pod_name>_<pod_uid>/<container_name>/*.log.*

For the example above, two inputs would be created during the container restart:

id: kubernetes-container-logs-pod-id-1
paths:
  - /var/log/containers/*id-1.log
  - /var/log/pods/ns_pod_uid/foo/*.log.*
id: kubernetes-container-logs-pod-id-2
paths:
  - /var/log/containers/*id-2.log
  - /var/log/pods/ns_pod_uid/foo/*.log.*

Both inputs harvest /var/log/pods/ns_pod_uid/foo/*.log.*, duplicating the data.
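
A minimal sketch of that overlap, assuming two rotated files with made-up timestamps in a scratch directory. The `count_matches` helper is hypothetical and only stands in for what each input's wildcard would match; it does not run Filebeat.

```shell
#!/bin/sh
set -eu

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
dir="$root/pods/ns_pod_uid/foo"
mkdir -p "$dir"

# Hypothetical rotated files; the timestamps are invented for the example.
: > "$dir/0.log.20251112-000000"
: > "$dir/0.log.20251111-000000.gz"

# Count the files matched by the rotated-log wildcard in a directory.
count_matches() {
    n=0
    for f in "$1"/*.log.*; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Each container-ID-scoped input carries the same wildcard,
# so each of them matches every rotated file.
matches_input_1=$(count_matches "$dir")   # input ...-pod-id-1
matches_input_2=$(count_matches "$dir")   # input ...-pod-id-2
total=$((matches_input_1 + matches_input_2))
echo "rotated files on disk: 2, matched across both inputs: $total"
```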

Therefore, the ID pattern must be scoped by container name.
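
The effect of the scoping change can be sketched as follows. The old pattern is the one quoted above. The new pattern is an assumption inferred from the rendered config shown in the test steps (pod UID followed by container name); the chart itself is the source of truth. Both helper functions are hypothetical.

```shell
#!/bin/sh
set -eu

# Old pattern from this description: pod name + container ID.
old_input_id() { printf 'kubernetes-container-logs-%s-%s\n' "$1" "$2"; }

# Assumed new pattern: pod UID + container name (inferred, not confirmed).
new_input_id() { printf 'kubernetes-container-logs-%s-%s\n' "$1" "$2"; }

old_a=$(old_input_id pod id-1)
old_b=$(old_input_id pod id-2)
new_a=$(new_input_id uid foo)   # container with ID id-1
new_b=$(new_input_id uid foo)   # container with ID id-2

echo "old: $old_a / $old_b"
echo "new: $new_a / $new_b"
```

With the container-name scope, the crashed container and its replacement map to the same input ID, so only one input covers the shared directory.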

How to test this PR locally

  • start an Elastic Stack
  • create a kind cluster
  • deploy the agent using the current helm chart
    • on your deployment, go to app/observabilityOnboarding/kubernetes/?category=kubernetes and follow the steps. It'll be something like:
helm repo add elastic https://helm.elastic.co/ && \
helm install elastic-agent elastic/elastic-agent \
  --version 9.2.1 \
  -n kube-system \
  --set outputs.default.url=https:\/\/YOUR_DEPLOYMENT.elastic-cloud.com:443 \
  --set kubernetes.onboardingID=YOUR_ONBOARDING_ID \
  --set kubernetes.enabled=true \
  --set outputs.default.type=ESPlainAuthAPI \
  --set outputs.default.api_key=$(echo "YOUR_API_KEY" | base64 -d)
  • verify everything is working and container logs are being ingested
  • deploy some flog containers
kubectl apply -f ./flog.yaml
flog.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: flog-log-generator
spec:
  template:
    spec:
      containers:
        - name: flog-unstructured-cont-rot
          image: mingrammer/flog
          # a "-d" that is too small won't give the kubelet time to rotate the files
          args: ["-t", "stdout", "-d", "1us", "-l"]
        - name: crashloop-1
          image: busybox
          imagePullPolicy: IfNotPresent
          command: ["sh", "-c", "echo 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vel illum dolore eu feugiat nulla facilisis at' >&2; sleep 1; exit 1"]
      restartPolicy: OnFailure
  backoffLimit: 10
  • ensure the logs are being ingested. Use filter log.file.path : *flog*. Logs are ingested from /var/log/containers/
  • package the helm chart
mage helm:package
  • monitor the k8s pods
watch kubectl get pods --all-namespaces
  • upgrade the helm chart
helm upgrade elastic-agent elastic-agent-9.3.0-beta.tgz \
  -n kube-system \
  --set outputs.default.url=https:\/\/YOUR_DEPLOYMENT.elastic-cloud.com:443 \
  --set kubernetes.onboardingID=YOUR_ONBOARDING_ID \
  --set kubernetes.enabled=true \
  --set outputs.default.type=ESPlainAuthAPI \
  --set outputs.default.api_key=$(echo "YOUR_API_KEY" | base64 -d)
  • check that all agent-pernode-elastic-agent-* pods are recreated, everything is
    normal, no configuration has changed, and the logs are still coming from
    /var/log/containers/
    Screenshot from 2025-11-12 10-23-02
  • check the agent config:
    • exec into the container
kubectl exec -it agent-pernode-elastic-agent-* -n kube-system -- /bin/bash
  • on the container, install some tools, grab a diagnostics and check it:
microdnf install -y less unzip
elastic-agent diagnostics
unzip -o -d /tmp/diag elastic-agent-diagnostics-*
less /tmp/diag/components/filestream-default/beat-rendered-config.yml
  • check that the filestream input uses the old pattern:
      id: kubernetes-container-logs-coredns-66bc5c9577-d7f2z-4e613cb4a69a7a3269711d92b0164fc37138f350a740148204eb91e7018dbe03
      index: logs-kubernetes.container_logs-default
      parsers:
        - container:
            format: auto
            stream: all
      paths:
        - /var/log/containers/*4e613cb4a69a7a3269711d92b0164fc37138f350a740148204eb91e7018dbe03.log
  • update it to collect rotated logs:
helm upgrade elastic-agent elastic-agent-9.3.0-beta.tgz \
  -n kube-system \
  --set kubernetes.containers.logs.rotated_logs=true \
  --set outputs.default.url=https:\/\/YOUR_DEPLOYMENT.elastic-cloud.com:443 \
  --set kubernetes.onboardingID=YOUR_ONBOARDING_ID \
  --set kubernetes.enabled=true \
  --set outputs.default.type=ESPlainAuthAPI \
  --set outputs.default.api_key=$(echo "YOUR_API_KEY" | base64 -d)
  • check that there is a spike in ingested logs and that the logs now come from /var/log/pods/
    Screenshot from 2025-11-12 10-28-08

  • check the rotated logs, plain and gzip, were ingested. Use the following filters log.file.path : *flog*log.* and log.file.path : *flog*log.*.gz:
    Screenshot from 2025-11-12 10-30-37
    Screenshot from 2025-11-12 10-28-24

  • check the agent config again:

  • exec into the container

kubectl exec -it agent-pernode-elastic-agent-* -n kube-system -- /bin/bash
  • on the container, install some tools, grab a diagnostics and check it:
microdnf install -y less unzip
elastic-agent diagnostics
unzip -o -d /tmp/diag elastic-agent-diagnostics-*
less /tmp/diag/components/filestream-default/beat-rendered-config.yml
  • check that the filestream input uses the new pattern:
      gzip_experimental: true
      id: kubernetes-container-logs-5906df5a-7185-4fc3-a2ec-9f088c729cf0-flog-unstructured-cont-rot
      index: logs-kubernetes.container_logs-default
      parsers:
        - container:
            format: auto
            stream: all
      paths:
        - /var/log/pods/default_flog-log-generator-vkz64_5906df5a-7185-4fc3-a2ec-9f088c729cf0/flog-unstructured-cont-rot/*.log*
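
The `agent-pernode-elastic-agent-*` in the `kubectl exec` commands above is a placeholder for the real pod name. A possible way to resolve it and to pull just the input IDs and paths out of the rendered config is sketched below. Both helper names are hypothetical, and the commented usage assumes the `kube-system` namespace and the diagnostics path used in the steps above.

```shell
#!/bin/sh
set -eu

# Pick the first per-node agent pod from `kubectl get pods -o name` output
# read on stdin (lines look like "pod/<name>").
first_agent_pod() {
    grep 'agent-pernode-elastic-agent-' | head -n 1 | sed 's|^pod/||'
}

# Print only the input IDs and log path entries from a rendered config file.
show_inputs() {
    grep -E '^[[:space:]]*(id:|- /var/log/)' "$1"
}

# Usage against a live cluster (not executed by this sketch):
#   pod=$(kubectl get pods -n kube-system -o name | first_agent_pod)
#   kubectl exec -it "$pod" -n kube-system -- /bin/bash
# and, inside the container after unpacking the diagnostics:
#   show_inputs /tmp/diag/components/filestream-default/beat-rendered-config.yml
```

Running `show_inputs` before and after enabling rotated logs makes the change in the ID and path patterns easy to compare side by side.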

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@AndersonQ AndersonQ self-assigned this Nov 11, 2025
Contributor

mergify bot commented Nov 11, 2025

This pull request does not have a backport label. Could you fix it @AndersonQ? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@AndersonQ AndersonQ added the Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team) and Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team) labels Nov 12, 2025
@AndersonQ AndersonQ force-pushed the gzip-update-helm-chart branch 2 times, most recently from 7f72486 to 63be5a1 Compare November 12, 2025 09:50
@AndersonQ AndersonQ changed the title [WIP][helm/elastic-agent] upgrade helm chart to collect rotated logs [helm/elastic-agent] upgrade helm chart to collect rotated logs Nov 12, 2025
Updates the elastic-agent helm chart to allow the collection of
rotated logs, including the GZIP-compressed logs.

AI tools were used to generate the CONTRIBUTING.md
@AndersonQ AndersonQ force-pushed the gzip-update-helm-chart branch from 10b0c5c to 71681f5 Compare November 13, 2025 14:19
@AndersonQ AndersonQ marked this pull request as ready for review November 13, 2025 14:20
@AndersonQ AndersonQ requested a review from a team as a code owner November 13, 2025 14:20
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

Collaborator

💛 Build succeeded, but was flaky

cc @AndersonQ

Contributor

@swiatekm swiatekm left a comment

The change looks correct to me, but you need to add the new option to values.schema.json for validation. It would be nice to include that in the contributing doc you've added too.

You should also set the backport label. I take it that this will only go into 9.3?

Labels

Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team)
Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team)

Development

Successfully merging this pull request may close these issues.

[Helm chart] allow to ingest container rotated logs

3 participants