-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Add time range bucketing attribute to APM took time latency metrics #135549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add time range bucketing attribute to APM took time latency metrics #135549
Conversation
This is similar to elastic#135524, but adding the attribute to the took time latency metric. That requires a bit of ceremony as the took time metric is recorded on the coordinating node, while the time range filter is parsed on each shard. We don't have mappings available on the coord node, which are needed to parse dates on the coord node. Thus we need to rely on date parsing done on the data nodes, which requires sending back the parsed value to the coord node, performing some simple reduction on it, and adding it back to the search response.
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
Hi @javanna, I've created a changelog YAML for you. |
) | ||
) { | ||
QueryPhase.addCollectorsAndSearch(rankSearchContext); | ||
QueryPhase.addCollectorsAndSearch(rankSearchContext, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look into this as a follow-up, need to write a test that leverages this and see if it makes sense to track timestamp from for it too, it probably does.
assertEquals("hits_only", attributes.get("query_type")); | ||
assertEquals("_score", attributes.get("sort")); | ||
assertEquals("pit", attributes.get("pit_scroll")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a follow-up, I have more work to do around retrievers, as they optionally hold a query too. Need to walk the retrievers tree like I do for queries, and add more tests.
} else { | ||
this.rangeTimestampFrom = Math.min(rangeTimestampFrom, this.rangeTimestampFrom); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that we need this in QueryRewriteContext, that this extends, to do the same tracking when we parse the time range filter as part of query rewrite. There's cases when the range gets rewritten to a range with no bounds (match all), for which we still want to bucket the request based on its original time range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Luca
server/src/main/java/org/elasticsearch/action/search/SearchResponse.java
Show resolved
Hide resolved
addBooleanClauses(context, booleanQueryBuilder, mustNotClauses, BooleanClause.Occur.MUST_NOT); | ||
addBooleanClauses(context, booleanQueryBuilder, shouldClauses, BooleanClause.Occur.SHOULD); | ||
try { | ||
context.setTrackRangeTimestampFrom(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we document why we disable this tracking here?
maybe secondary, but if we have only one should clause, we want to track the range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a comment, yes we could have a special case for bool with a single should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this reminds me of the current discrepancy between the boolean attribute that signals whether there is a filter on timestamp, and the range bucket. I meant to ask you what you think about this.
If a shard has all documents within the range, it is in fact not a range query when it comes to the lucene level, but rather a match_all. I thought that we'd want to track this distinction.
At the coord level, we will always extract the original range before query rewrite and set to flag to true.
At the shard level, only those shards that effectively execute a range query will have the boolean flag set to true, but all will have the range extracted. Would we want instead the two to be consistent? I even challenged at some point that we still want the query introspection if all we use it for is to figure out whether there was a filter on @timestamp
: we now have another way to do that which does not require visiting the query tree, and perhaps we don't even need the boolean flag anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking further, I don't think I will add a special case for bool with a single should clause. I already don't like how intrusive the metrics tracking is in the actual code that does stuff. I am way of adding more logic to control when to report on metrics, intermingled with the actual code that rewrites and executes queries. I think that having a time range as a single should clause it also quite an edge case. If needed we can always optimize it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks for looking into this
// range queries may get rewritten to match_all or a range with open bounds. Rewriting in that case is the only place | ||
// where we parse the date and set it to the context. We need to propagate it back from the clone into the original context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ this is very useful and a true comment, documenting the why. Thank you!
server/src/main/java/org/elasticsearch/search/query/QuerySearchResult.java
Show resolved
Hide resolved
….java Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
…hResult.java Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
This is similar to #135524, but adding the attribute to the took time latency metric.
That requires a bit of ceremony as the took time metric is recorded on the coordinating node, while the time range filter is parsed on each shard. We don't have mappings available on the coord node, which are needed to parse dates on the coord node. Thus we need to rely on date parsing done on the data nodes, which requires sending back the parsed value to the coord node, performing some simple reduction on it, and adding it back to the search response.