
Race condition between processing and scraping #144

Open
@leklund

Description


There is a race condition between scraping and incrementing the per-datacenter metrics. When a response from the real-time stats API is processed, the exporter iterates over the datacenters and increments each metric in turn. If a scrape arrives in the middle of that loop, it reports metrics that include the latest second of realtime data for some datacenters but not for others, because the response hasn't finished processing yet. I was able to reproduce this easily by adding an artificial delay to the processing loop, forcing the scrape to land mid-loop. This can cause interesting graphs when running queries like:

sum(rate(fastly_rt_requests_total[1m])) by (service_id)
  - sum(rate(fastly_rt_tls_total[1m])) by (service_id)

This line should be flat:

[Screenshot, 2023-07-06: graph of the query above, which is not flat]

A potential solution is to add locking so that every scrape is guaranteed to observe a complete set of data from any given API response. This has performance implications, especially when running against many services.

Thanks to @mrnetops for reporting.
