-
Notifications
You must be signed in to change notification settings - Fork 25.4k
[POC] Support extract_snippets function in ES|QL #132549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…doesn't completely work yet)
…the expression evaluator
@carlosdelest @jimczi I plan to reach out to you next week RE: this PR. Two main questions:
|
…it multivalue aware
cb98553
to
2872969
Compare
2872969
to
5b9347c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good and seems doable 💯
I have some concerns about:
- Removing the highlighters from the
ToEvaluator
interface - Need to have a way of moving the execution to the data nodes
- Because of that, I keep wondering if using a command would be a better approach from the implementation PoV.
I'm sure we will get more clarity on those as we move forward 👍
...ain/java/org/elasticsearch/xpack/esql/expression/function/fulltext/QueryBuilderResolver.java
Outdated
Show resolved
Hide resolved
...ain/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/ExtractSnippets.java
Outdated
Show resolved
Hide resolved
var query = """ | ||
FROM test | ||
| EVAL x = extract_snippets(content, "fox", 1, 10) | ||
| SORT x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important that we push the evaluator to the data nodes - we'll need a physical optimization rule for that. Otherwise, the evaluator won't have shard context data and won't be able to run.
That may be tougher than it looks, as we will need to maintain the dependencies from other EVALs that use the produced snippet.
We will know we have succeeded when we can do the above query without SORT:
FROM test
| EVAL x = extract_snippets(content, "fox", 1, 10)
... and it works because it has been moved below a ExchangeExec
to be pushed to the data nodes.
Highlighter highlighter = highlighters.getOrDefault(fieldType.getDefaultHighlighter(), new DefaultHighlighter()); | ||
|
||
// TODO: Consolidate these options with the ones built in the text similarity reranker | ||
SearchHighlightContext.FieldOptions.Builder optionsBuilder = new SearchHighlightContext.FieldOptions.Builder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of construction for every appendMatch
- Hopefully we can move this construction to the evaluator constructor and reuse it for every matching doc.
I don't like that appendMatch is so crowded with params - but I didn't see a better alternative at the time.
@@ -921,6 +923,10 @@ private static Map<String, Highlighter> setupHighlighters(Settings settings, Lis | |||
return unmodifiableMap(highlighters.getRegistry()); | |||
} | |||
|
|||
public static Map<String, Highlighter> getStaticHighlighters() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tough that the highlighter code is so deeply ingrained into the fetch phase 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - this was the best/cleanest solution that I could come up with, if you have better suggestions I'd be happy to talk about them!
@@ -79,6 +82,11 @@ public FoldContext foldCtx() { | |||
public List<ShardContext> shardContexts() { | |||
return shardContexts; | |||
} | |||
|
|||
@Override | |||
public Map<String, Highlighter> highlighters() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I keep struggling with this - highlighters should not be something the toEvaluator
method or other evaluators care about.
I wonder if making it part of the SearchContext
or ShardContext
would be a better option, so it's passed to the factory and is not a part of the ToEvaluator
interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea - I made that move in 838b054 - I think that's better
…expression/function/fulltext/QueryBuilderResolver.java Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>
Provides a prototype implementation of the
EXTRACT_SNIPPETS
function in ES|QL, for discussion on the best approaches to pursue.Here is an example of how to call this field:
You can break apart multiple values using
MV_EXPAND
:EXTRACT_SNIPPETS
will work ontext
andsemantic_text
fields, thoughsemantic_text
fields do not yet support char length so the whole chunk will be returned.Some notes on this POC: