zekker6
Contributor

@zekker6 zekker6 commented Sep 26, 2025

What this PR does

Fixes: #3653
This PR implements a proper retry mechanism for VictoriaLogs acquisition when using mode: tail. Previously, the data source could not recover automatically and required a service restart to continue processing data.

Additional context

This PR is based on work of @thebondo from #3654.

The difference from the base PR is some final polishing to simplify the error-handling logic and to fix support for the since parameter, which is now possible with newer API versions - 1bb425c

Draft of docs update to reflect changes of this PR: crowdsecurity/crowdsec-docs#895

thebondo and others added 8 commits May 16, 2025 15:22
If the original HTTP GET for the tail endpoint succeeds but the
connection is then lost, no retry is done. I updated the Tail
method to also retry in this case.

Upon review, the added code took a rather different approach to
retrying than the one used for the QueryRange method. I updated
the method to create a new doTail that has the same style as
doQueryRange and updated Tail to use it. This has the following
effects:

- doTail will keep trying after losing a connection
- the retry interval will grow (with an upper limit) and shrink
  (with a lower limit) as connections are made and broken
- the time in the request is updated to avoid overlapping with
  previous data that was returned (missing in the first fix)

Bringing in changes from master.

Keeping up with changes from origin.

The use of the ticker was unnecessary. I updated doTail to use a backoff
interval with time.After.

Keeping the fork up to date with the origin repository.

The tail query endpoint does not support start, only start_offset. I
updated the doTail method to use this parameter and to calculate the
required value each time the query is attempted, based on the desired
start time for the results returned from the query.

…"since" parameter

Simplify the backoff handling code to reduce duplication. Also, do not sleep before making the first request, to avoid an artificial delay at startup.

While at it, implement proper handling of the "since" parameter. Previously, it was reset to 0 and ignored in "tail" mode because the VL API did not support tailing results from the past.
This change adds full support for using the "since" value when performing the initial tailing, and keeps track of the last seen log item so that no log lines are missed when retrying the request.

@zekker6: There is no 'kind' label on this PR. You need a 'kind' label to generate the release automatically.

  • /kind feature
  • /kind enhancement
  • /kind refactoring
  • /kind fix
  • /kind chore
  • /kind dependencies

I am a bot created to help the crowdsecurity developers manage community feedback and contributions. You can check out my manifest file to understand my behavior and what I can do. If you want to use this for your project, you can check out the BirthdayResearch/oss-governance-bot repository.


@zekker6: There are no area labels on this PR. You can add as many areas as you see fit.

  • /area agent
  • /area local-api
  • /area cscli
  • /area appsec
  • /area security
  • /area configuration

@zekker6
Contributor Author

zekker6 commented Sep 26, 2025

/kind fix
/area agent


codecov bot commented Sep 26, 2025

Codecov Report

❌ Patch coverage is 61.90476% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.63%. Comparing base (02365c9) to head (ccda2be).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...odules/victorialogs/internal/vlclient/vl_client.go | 61.29% | 17 Missing and 7 partials ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3924   +/-   ##
=======================================
  Coverage   61.63%   61.63%           
=======================================
  Files         407      407           
  Lines       41864    41887   +23     
=======================================
+ Hits        25801    25816   +15     
- Misses      13936    13940    +4     
- Partials     2127     2131    +4     

| Flag | Coverage Δ |
|---|---|
| bats | 45.76% <0.00%> (-0.09%) ⬇️ |
| unit-linux | 34.55% <61.90%> (+0.05%) ⬆️ |
| unit-windows | 24.39% <0.00%> (-0.02%) ⬇️ |


Development

Successfully merging this pull request may close these issues.

CrowdSec exits if the VictoriaLogs data source goes down temporarily

3 participants