Support per-index cache strategy and time-based caching condition #5650

**Is your feature request related to a problem? Please describe.**
Thanks for providing such an amazing piece of work; Quickwit provides (almost) everything we need for our platform.
Our workload pattern is very similar to the one described in #5445, at a much smaller scale. We currently have 11 indexes. By split size, 2 indexes range from 100 TB to 150 TB, 1 index is about 20 TB, and the others are well below 1 TB.

The LRU cache strategy is very brittle against "big scans" that run every now and then (fewer than 10 times a day): a single scan over old data can evict the splits that are queried repeatedly. Some work (#5469) has been done to support an LFU strategy, which might help, but it still lacks flexibility.
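To illustrate the churn, here is a minimal sketch of a toy LRU over split ids. It is not Quickwit's cache implementation; the `Lru` type, the `hits` helper, the capacity, and the split ids are all made up for the example. It shows a warm hot set losing every cached entry after one scan over splits that are never read again.

```rust
use std::collections::VecDeque;

/// Toy LRU over split ids (illustration only, O(n) per access).
/// The front of `order` is the least recently used entry.
struct Lru {
    capacity: usize,
    order: VecDeque<u32>,
}

impl Lru {
    fn new(capacity: usize) -> Self {
        Lru { capacity, order: VecDeque::new() }
    }

    /// Records an access to `split` and returns true on a cache hit.
    fn access(&mut self, split: u32) -> bool {
        if let Some(pos) = self.order.iter().position(|&s| s == split) {
            // Hit: move the entry to the most recently used position.
            self.order.remove(pos);
            self.order.push_back(split);
            return true;
        }
        // Miss: evict the least recently used entry if the cache is full.
        if self.order.len() == self.capacity {
            self.order.pop_front();
        }
        self.order.push_back(split);
        false
    }
}

/// Counts how many of the accessed splits were already cached.
fn hits(cache: &mut Lru, splits: impl Iterator<Item = u32>) -> usize {
    splits.filter(|&s| cache.access(s)).count()
}
```

With a capacity of 4 and a hot set of splits `0..4`, a second pass over the hot set hits on all 4 splits. After a single scan over splits `100..110`, the next pass over the hot set hits on none of them, so each of those reads goes back to object storage.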

In our case, caching is not about performance. Quickwit with no disk cache is blazing fast, which is where Quickwit's engineering truly shines. Long-range queries with term conditions (trace_id = xxx) cannot be cached effectively anyway; downloading all splits to local disk won't help.

The actual value of the cache for us is that S3 requests are greatly reduced for queries that hit the same data repeatedly, which saves money and makes some patterns economically viable (for example, 100+ TPS of reads on the last x days of data).


**Describe the solution you'd like**
To mitigate the cache churn issue, I would like Quickwit to support the following features:
1. Support a customized cache strategy for each index instead of for the whole cluster (also mentioned in #5445). For most write-heavy, read-never workloads (yep, logs), users are not very sensitive to latency, so we can simply disable the cache, which saves tons of disk space.
2. Support a time range condition for cache fetching. A new configuration option, "cache_within", can be specified for each index, and only splits within that time range will be downloaded. For example, if someone queries an index with cache_within set to 7d (7 days), only splits less than 7 days old relative to now will be downloaded and cached; other splits simply stay in object storage.
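The two features above could be sketched roughly as follows. This is only an illustration of the intended behavior, not existing Quickwit code: `CachePolicy`, `should_cache_split`, and `policy_for` are hypothetical names, and measuring a split's age from its end timestamp is an assumption on our side.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical per-index cache policy; not an existing Quickwit type.
#[derive(Clone, Debug, PartialEq)]
enum CachePolicy {
    /// Never download splits of this index into the local disk cache.
    Disabled,
    /// Cache splits, optionally only those newer than `cache_within`.
    Enabled { cache_within: Option<Duration> },
}

/// Returns true if a split should be downloaded into the disk cache.
/// `split_end_ts` and `now_ts` are Unix timestamps in seconds
/// (assumption: the split's age is measured from its end timestamp).
fn should_cache_split(policy: &CachePolicy, split_end_ts: u64, now_ts: u64) -> bool {
    match policy {
        CachePolicy::Disabled => false,
        CachePolicy::Enabled { cache_within: None } => true,
        CachePolicy::Enabled { cache_within: Some(window) } => {
            // saturating_sub treats a split with a future timestamp as age 0.
            now_ts.saturating_sub(split_end_ts) < window.as_secs()
        }
    }
}

/// Looks up the policy of an index, falling back to a cluster-wide default.
fn policy_for<'a>(
    per_index: &'a HashMap<String, CachePolicy>,
    default: &'a CachePolicy,
    index_id: &str,
) -> &'a CachePolicy {
    per_index.get(index_id).unwrap_or(default)
}
```

With `cache_within` set to 7 days, a split that ended one hour ago would be cached, while a split that ended 8 days ago would be read directly from object storage on every query.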

**Describe alternatives you've considered**
For feature request 1, we evaluated a potential solution that uses 2 searcher clusters to handle 2 groups of indexes (with cache and without cache). It is not hard to add some routing logic to the HTTP proxy in front of Quickwit, since we have to do authentication there anyway.
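A rough sketch of that proxy-side routing, for illustration only. The pool names and base URLs are placeholders, the request path is assumed to have the form `/api/v1/<index_id>/...`, and multi-index requests are not handled.

```rust
use std::collections::HashSet;

/// Hypothetical searcher pools; the URLs below are placeholders.
#[derive(Clone, Copy, Debug, PartialEq)]
enum SearcherPool {
    Cached,
    Uncached,
}

impl SearcherPool {
    fn base_url(self) -> &'static str {
        match self {
            SearcherPool::Cached => "http://searchers-cached.internal:7280",
            SearcherPool::Uncached => "http://searchers-nocache.internal:7280",
        }
    }
}

/// Extracts the index id from a path assumed to look like
/// `/api/v1/<index_id>/...`; returns None for any other path.
fn index_id_from_path(path: &str) -> Option<&str> {
    let rest = path.strip_prefix("/api/v1/")?;
    let index_id = rest.split('/').next()?;
    if index_id.is_empty() {
        None
    } else {
        Some(index_id)
    }
}

/// Sends requests for indexes listed in `uncached` to the pool without
/// a disk cache; everything else goes to the cached pool.
fn route(path: &str, uncached: &HashSet<String>) -> SearcherPool {
    match index_id_from_path(path) {
        Some(id) if uncached.contains(id) => SearcherPool::Uncached,
        _ => SearcherPool::Cached,
    }
}
```

This works, but it means running and sizing two searcher clusters and keeping the index list in the proxy in sync, which is why a per-index setting in Quickwit itself would be preferable.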

For feature request 2, we believe there is no alternative.

**Additional context**
Out of curiosity, what's the plan for this project? Are you willing to take contributions? If that is OK, we would like to try working on this issue.

