[WIP] Support extract_snippets function in ES|QL #132549
Draft
kderusso wants to merge 47 commits into elastic:main from kderusso:kderusso/esql-extract-snippets
Changes from 24 commits
Commits
ee56018 Initial plumbing for an ES|QL extract_snippets function (kderusso)
eb0a876 Add HighlighterExpressionEvaluator (kderusso)
8c0f312 Pair programming session (carlosdelest)
86dc82a Create highlight query (kderusso)
4f4f157 Make extract snippets rewriteable (kderusso)
d68c2e8 Add comments from session with Carlos (kderusso)
0571100 Make translation aware and get further down the rewrite cycle (still … (kderusso)
9fe7654 Move building highlight query to extract snippets (kderusso)
8adea56 Cherry-pick: Initial incomplete work for creating the Highlighter in … (carlosdelest)
6be55b4 Hack in highlighter so it actually produces a response (kderusso)
60e3ce6 [CI] Auto commit changes from spotless
b6fb4f3 Change LuceneQueryEvaluator to use Blocks instead of Vectors to make … (carlosdelest)
f6a8079 Add rewritability (carlosdelest)
1ca0b58 Solve params via fold (carlosdelest)
34c10f5 Use SORT to push down the EVAL clause, so it's executed on local nodes (carlosdelest)
02cebe7 [CI] Auto commit changes from spotless
b923a2e Workaround for rewrite (kderusso)
5b9347c Make highlighters accessible (kderusso)
44b1bc4 [CI] Auto commit changes from spotless
82412d8 Return semantic highlight results (kderusso)
1bc3d16 Merge main into kderusso/esql-extract-snippets (kderusso)
d4ba21d [CI] Auto commit changes from spotless
932864a Cleanup (kderusso)
632df21 [CI] Auto commit changes from spotless
838b054 Move highlighters from EvalMapper to SearchContext (kderusso)
0b0487e Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/… (kderusso)
eee88be [CI] Auto commit changes from spotless
77b44d5 Cleanup how we pull field attributes in extract snippets (kderusso)
5ab3c56 Fix compilation error due to auto-commit suggestion (kderusso)
a6a0f11 Add queryBuilder to ExtractSnippets#info (kderusso)
9c7609c Move construction of objects to ctor when possible (kderusso)
4a37634 Refactor highlighting logic into util class (kderusso)
675e78b Fix EsqlNodeSubclassTests#testReplaceChildren (kderusso)
d5c9d91 Start adding CSV tests (kderusso)
bd369f7 Fix initialization error (kderusso)
ccda43d Clean up duplication when creating highlighter (kderusso)
35120e6 Support default parameters when not specified (kderusso)
de46fef Fix char encoding bug for text fields (not semantic_text) (kderusso)
ff3f3c1 Merge main into kderusso/esql-extract-snippets (kderusso)
5f20480 Truncate snippets that are longer than requested size (kderusso)
ae92c83 Fix most extractSnippets CSV tests, add some more test cases (kderusso)
48c2825 Remove changes to AnalyzerTests (kderusso)
80d1056 Spotless (kderusso)
ec3ac7a Add preview = true (kderusso)
694bf6a Add ExtractSnippetTests and associated generated documentation (kderusso)
0ef8fce Add integration test for extract_snippets (kderusso)
e15f824 Merge main into kderusso/esql-extract-snippets (kderusso)
184 changes: 184 additions & 0 deletions
...ompute/src/main/java/org/elasticsearch/compute/lucene/HighlighterExpressionEvaluator.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,184 @@ | ||
/* | ||
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
* or more contributor license agreements. Licensed under the Elastic License | ||
* 2.0; you may not use this file except in compliance with the Elastic License | ||
* 2.0. | ||
*/ | ||
|
||
package org.elasticsearch.compute.lucene; | ||
|
||
import org.apache.lucene.index.LeafReaderContext; | ||
import org.apache.lucene.search.Query; | ||
import org.apache.lucene.search.Scorable; | ||
import org.apache.lucene.search.ScoreMode; | ||
import org.apache.lucene.util.BytesRef; | ||
import org.elasticsearch.compute.data.Block; | ||
import org.elasticsearch.compute.data.BlockFactory; | ||
import org.elasticsearch.compute.data.BytesRefBlock; | ||
import org.elasticsearch.compute.data.Page; | ||
import org.elasticsearch.compute.operator.DriverContext; | ||
import org.elasticsearch.compute.operator.EvalOperator; | ||
import org.elasticsearch.index.fieldvisitor.LeafStoredFieldLoader; | ||
import org.elasticsearch.index.fieldvisitor.StoredFieldLoader; | ||
import org.elasticsearch.index.mapper.MappedFieldType; | ||
import org.elasticsearch.index.mapper.SourceLoader; | ||
import org.elasticsearch.search.SearchHit; | ||
import org.elasticsearch.search.fetch.FetchContext; | ||
import org.elasticsearch.search.fetch.FetchSubPhase; | ||
import org.elasticsearch.search.fetch.subphase.highlight.DefaultHighlighter; | ||
import org.elasticsearch.search.fetch.subphase.highlight.FieldHighlightContext; | ||
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder; | ||
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField; | ||
import org.elasticsearch.search.fetch.subphase.highlight.Highlighter; | ||
import org.elasticsearch.search.fetch.subphase.highlight.SearchHighlightContext; | ||
import org.elasticsearch.search.internal.SearchContext; | ||
import org.elasticsearch.search.lookup.Source; | ||
import org.elasticsearch.xcontent.Text; | ||
|
||
import java.io.IOException; | ||
import java.io.UncheckedIOException; | ||
import java.util.Collections; | ||
import java.util.HashMap; | ||
import java.util.Map; | ||
import java.util.function.Supplier; | ||
|
||
import static org.elasticsearch.core.RefCounted.ALWAYS_REFERENCED; | ||
|
||
public class HighlighterExpressionEvaluator extends LuceneQueryEvaluator<BytesRefBlock.Builder> | ||
implements | ||
EvalOperator.ExpressionEvaluator { | ||
|
||
private final String fieldName; | ||
private final Integer numFragments; | ||
private final Integer fragmentLength; | ||
private final SearchContext searchContext; | ||
private final Map<String, Highlighter> highlighters; | ||
|
||
HighlighterExpressionEvaluator( | ||
BlockFactory blockFactory, | ||
ShardConfig[] shardConfigs, | ||
String fieldName, | ||
Integer numFragments, | ||
Integer fragmentLength, | ||
SearchContext searchContext, | ||
Map<String, Highlighter> highlighters | ||
) { | ||
super(blockFactory, shardConfigs); | ||
this.fieldName = fieldName; | ||
this.numFragments = numFragments; | ||
this.fragmentLength = fragmentLength; | ||
this.searchContext = searchContext; | ||
this.highlighters = highlighters; | ||
} | ||
|
||
@Override | ||
protected ScoreMode scoreMode() { | ||
return ScoreMode.COMPLETE; | ||
} | ||
|
||
@Override | ||
protected Block createNoMatchBlock(BlockFactory blockFactory, int size) { | ||
return blockFactory.newConstantNullBlock(size); | ||
} | ||
|
||
@Override | ||
protected BytesRefBlock.Builder createBlockBuilder(BlockFactory blockFactory, int size) { | ||
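// numFragments is a nullable Integer (appendMatch falls back to the default), so avoid unboxing null here | ||
int fragmentsPerPosition = numFragments != null ? numFragments : HighlightBuilder.DEFAULT_NUMBER_OF_FRAGMENTS; | ||
return blockFactory.newBytesRefBlockBuilder(size * fragmentsPerPosition); | ||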
} | ||
|
||
@Override | ||
protected void appendMatch(BytesRefBlock.Builder builder, Scorable scorer, int docId, LeafReaderContext leafReaderContext, Query query) | ||
throws IOException { | ||
|
||
// TODO: Can we build a custom highlighter directly here, so we don't have to rely on fetch phase classes? | ||
|
||
// Create a source loader for highlighter use | ||
SourceLoader sourceLoader = searchContext.newSourceLoader(null); | ||
FetchContext fetchContext = new FetchContext(searchContext, sourceLoader); | ||
MappedFieldType fieldType = searchContext.getSearchExecutionContext().getFieldType(fieldName); | ||
SearchHit searchHit = new SearchHit(docId, null, null, ALWAYS_REFERENCED); | ||
Source source = Source.lazy(lazyStoredSourceLoader(leafReaderContext, docId)); | ||
Highlighter highlighter = highlighters.getOrDefault(fieldType.getDefaultHighlighter(), new DefaultHighlighter()); | ||
|
||
// TODO: Consolidate these options with the ones built in the text similarity reranker | ||
SearchHighlightContext.FieldOptions.Builder optionsBuilder = new SearchHighlightContext.FieldOptions.Builder(); | ||
|
||
optionsBuilder.numberOfFragments(numFragments != null ? numFragments : HighlightBuilder.DEFAULT_NUMBER_OF_FRAGMENTS); | ||
optionsBuilder.fragmentCharSize(fragmentLength != null ? fragmentLength : HighlightBuilder.DEFAULT_FRAGMENT_CHAR_SIZE); | ||
optionsBuilder.preTags(new String[] { "" }); | ||
optionsBuilder.postTags(new String[] { "" }); | ||
optionsBuilder.requireFieldMatch(false); | ||
optionsBuilder.scoreOrdered(true); | ||
optionsBuilder.highlightQuery(query); | ||
SearchHighlightContext.Field field = new SearchHighlightContext.Field(fieldName, optionsBuilder.build()); | ||
|
||
FetchSubPhase.HitContext hitContext = new FetchSubPhase.HitContext(searchHit, leafReaderContext, docId, Map.of(), source, null); | ||
FieldHighlightContext highlightContext = new FieldHighlightContext( | ||
fieldName, | ||
field, | ||
fieldType, | ||
fetchContext, | ||
hitContext, | ||
query, | ||
new HashMap<>() | ||
); | ||
HighlightField highlight = highlighter.highlight(highlightContext); | ||
|
||
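if (highlight == null || highlight.fragments().length == 0) { | ||
// Assumption: every matched document must contribute exactly one position, so emit null when nothing was highlighted | ||
builder.appendNull(); | ||
return; | ||
} | ||
boolean multivalued = highlight.fragments().length > 1; | ||
if (multivalued) { | ||
builder.beginPositionEntry(); | ||
} | ||
for (Text highlightText : highlight.fragments()) { | ||
// Encode from the string form so the BytesRef holds exactly the fragment's UTF-8 bytes | ||
builder.appendBytesRef(new BytesRef(highlightText.string())); | ||
} | ||
if (multivalued) { | ||
builder.endPositionEntry(); | ||
} | ||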
} | ||
|
||
private static Supplier<Source> lazyStoredSourceLoader(LeafReaderContext ctx, int doc) { | ||
return () -> { | ||
StoredFieldLoader rootLoader = StoredFieldLoader.create(true, Collections.emptySet()); | ||
try { | ||
LeafStoredFieldLoader leafRootLoader = rootLoader.getLoader(ctx, null); | ||
leafRootLoader.advanceTo(doc); | ||
return Source.fromBytes(leafRootLoader.source()); | ||
} catch (IOException e) { | ||
throw new UncheckedIOException(e); | ||
} | ||
}; | ||
} | ||
|
||
@Override | ||
protected void appendNoMatch(BytesRefBlock.Builder builder) { | ||
builder.appendNull(); | ||
} | ||
|
||
@Override | ||
public Block eval(Page page) { | ||
return executeQuery(page); | ||
} | ||
|
||
public record Factory( | ||
ShardConfig[] shardConfigs, | ||
String fieldName, | ||
Integer numFragments, | ||
Integer fragmentSize, | ||
SearchContext searchContext, | ||
Map<String, Highlighter> highlighters | ||
) implements EvalOperator.ExpressionEvaluator.Factory { | ||
@Override | ||
public EvalOperator.ExpressionEvaluator get(DriverContext context) { | ||
return new HighlighterExpressionEvaluator( | ||
context.blockFactory(), | ||
shardConfigs, | ||
fieldName, | ||
numFragments, | ||
fragmentSize, | ||
searchContext, | ||
highlighters | ||
); | ||
} | ||
} | ||
} |
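The multi-value handling at the end of `appendMatch` can be looked at in isolation. The following is a minimal sketch, not the real `BytesRefBlock.Builder`: `ToyMultiValueBuilder` and `appendFragments` are hypothetical names invented for illustration, and emitting a null position when a document has no fragments is an assumption about the builder's contract (one position per document), not something this diff confirms.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a multi-value block builder; not the real BytesRefBlock.Builder. */
final class ToyMultiValueBuilder {
    private final List<List<String>> positions = new ArrayList<>();
    // Non-null only between beginPositionEntry() and endPositionEntry()
    private List<String> openEntry;

    void beginPositionEntry() {
        openEntry = new ArrayList<>();
    }

    void append(String value) {
        if (openEntry != null) {
            openEntry.add(value);          // part of a multi-valued position
        } else {
            positions.add(List.of(value)); // a single-valued position of its own
        }
    }

    void endPositionEntry() {
        positions.add(openEntry);
        openEntry = null;
    }

    void appendNull() {
        positions.add(null);
    }

    List<List<String>> build() {
        return positions;
    }

    /** One document's fragments become exactly one position: null, single value, or multi-value. */
    static void appendFragments(ToyMultiValueBuilder builder, String[] fragments) {
        if (fragments == null || fragments.length == 0) {
            builder.appendNull(); // assumption: "no highlight" still occupies one position
            return;
        }
        boolean multivalued = fragments.length > 1;
        if (multivalued) {
            builder.beginPositionEntry();
        }
        for (String fragment : fragments) {
            builder.append(fragment);
        }
        if (multivalued) {
            builder.endPositionEntry();
        }
    }
}
```

Wrapping only the multi-fragment case in `beginPositionEntry()`/`endPositionEntry()` keeps all fragments of one document in a single position, which is the same branching the evaluator uses.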
It's tough that the highlighter code is so deeply ingrained into the fetch phase 😢
Yes, this was the best/cleanest solution that I could come up with. If you have better suggestions, I'd be happy to talk about them!
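For discussion of the TODO about a highlighter that does not depend on fetch-phase classes, here is a rough plain-Java sketch of what such a standalone contract could look like. It is not Lucene's or Elasticsearch's highlighting algorithm: `NaiveSnippetExtractor`, `extract`, `chunk`, and `countHits` are hypothetical names, term matching is a naive case-insensitive substring count, and chunks are cut at whitespace. It only mirrors the options set in the diff: a fragment count, a fragment character size, score ordering, and no pre/post tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, dependency-free snippet extractor; a sketch, not the PR's implementation. */
final class NaiveSnippetExtractor {

    /** Returns up to numFragments chunks containing at least one term, best score first. */
    static List<String> extract(String text, List<String> terms, int numFragments, int fragmentLength) {
        List<String> chunks = chunk(text, fragmentLength);
        int[] scores = new int[chunks.size()];
        List<Integer> matching = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            scores[i] = countHits(chunks.get(i), terms);
            if (scores[i] > 0) {
                matching.add(i);
            }
        }
        // List.sort is stable, so equally scored chunks keep their document order
        matching.sort((a, b) -> Integer.compare(scores[b], scores[a]));
        List<String> snippets = new ArrayList<>();
        for (int index : matching) {
            if (snippets.size() == numFragments) {
                break;
            }
            snippets.add(chunks.get(index));
        }
        return snippets;
    }

    /** Greedily packs whitespace-separated words into chunks of at most maxLength characters. */
    static List<String> chunk(String text, int maxLength) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (current.length() > 0 && current.length() + 1 + word.length() > maxLength) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word); // a single word longer than maxLength is kept whole
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    /** Counts non-overlapping, case-insensitive occurrences of every term in the chunk. */
    static int countHits(String chunk, List<String> terms) {
        String haystack = chunk.toLowerCase(Locale.ROOT);
        int hits = 0;
        for (String term : terms) {
            String needle = term.toLowerCase(Locale.ROOT);
            if (needle.isEmpty()) {
                continue;
            }
            for (int i = haystack.indexOf(needle); i >= 0; i = haystack.indexOf(needle, i + needle.length())) {
                hits++;
            }
        }
        return hits;
    }
}
```

A call such as `NaiveSnippetExtractor.extract(text, List.of("apple"), 2, 15)` returns at most two chunks of up to 15 characters each. A real replacement would still need analyzer-aware matching and the semantic_text handling that the existing highlighters provide.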