[POC] Support extract_snippets function in ES|QL #132549

kderusso · 2025-08-07T20:17:08Z

Provides a prototype implementation of the EXTRACT_SNIPPETS function in ES|QL, for discussion on the best approaches to pursue.

Here is an example of how to call this function:

POST _query?format=txt
{
  "query": """
  FROM books METADATA _score
  | EVAL snippets = extract_snippets(synopsis, "hobbit takes a ring on a long journey", 3, 10)
  | KEEP title, snippets, _score
  | SORT _score DESC 
  | LIMIT 10
  """
}

You can split multiple snippets into separate rows using MV_EXPAND:

POST _query?format=txt
{
  "query": """
  FROM books METADATA _score
  | EVAL snippets = extract_snippets(synopsis, "hobbit takes a ring on a long journey", 1, 10)
  | MV_EXPAND snippets
  | KEEP title, snippets, _score
  | SORT _score DESC 
  | LIMIT 10
  """
}
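MV_EXPAND turns a row whose column holds several values into one row per value, copying the other columns. The Java sketch below models that on plain lists to show what happens to the snippets column. SnippetRow, ExpandedRow and MvExpandSketch are hypothetical names for illustration, not ES|QL internals, and the command's null handling is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical input row: a title plus a multivalued "snippets" column.
record SnippetRow(String title, List<String> snippets) {}

// Hypothetical output row: the same title with a single snippet.
record ExpandedRow(String title, String snippet) {}

final class MvExpandSketch {
    private MvExpandSketch() {}

    // Emits one output row per snippet value and copies the other column.
    // Rows with no snippets produce no output here; the real command's null handling is not modeled.
    static List<ExpandedRow> mvExpand(List<SnippetRow> rows) {
        List<ExpandedRow> out = new ArrayList<>();
        for (SnippetRow row : rows) {
            for (String snippet : row.snippets()) {
                out.add(new ExpandedRow(row.title(), snippet));
            }
        }
        return out;
    }
}
```

A row with three snippets therefore becomes three rows sharing the same title and _score, which is why SORT and LIMIT after MV_EXPAND operate on snippets rather than on books.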

EXTRACT_SNIPPETS works on both text and semantic_text fields, though semantic_text fields do not yet support the snippet character length, so the whole chunk is returned.

Some notes on this POC:

We've had some discussions on whether this should be a function or a command. This POC is currently implemented as a function, and we think a function is OK for now: the only reason to use a command is that we want to perform inference via an async call. As long as we only perform inference once (as we do for the semantic match query), a function should be OK.
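To illustrate the "perform inference once" condition: a function is evaluated per row, so the inference for the query text has to be resolved once and reused rather than issued for every row. The sketch below is a hypothetical memoizing resolver (OnceInferenceResolver is not an Elasticsearch class), with the async inference request replaced by a synchronous Function stand-in. It is not how ES|QL actually resolves the semantic match query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical resolver: runs the inference for a given query text at most once and
// hands the cached result to every row that evaluates the function.
final class OnceInferenceResolver {
    // Stand-in for the real (asynchronous) inference request; returns an embedding.
    private final Function<String, float[]> inferenceCall;
    private final Map<String, float[]> resolved = new ConcurrentHashMap<>();

    OnceInferenceResolver(Function<String, float[]> inferenceCall) {
        this.inferenceCall = inferenceCall;
    }

    // ConcurrentHashMap.computeIfAbsent applies the function at most once per key,
    // so repeated calls with the same query text reuse the first result.
    float[] resolve(String queryText) {
        return resolved.computeIfAbsent(queryText, inferenceCall);
    }
}
```

If inference can be resolved up front like this, the per-row cost is a map lookup, which is what makes a function acceptable; a command would mainly be needed if the call had to stay asynchronous inside evaluation.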
This POC uses the data node to retrieve the field value for snippet extraction. If needed, this restriction could potentially be removed by running the highlighting query against an in-memory Lucene index.

kderusso · 2025-08-08T20:36:45Z

@carlosdelest @jimczi I plan to reach out to you next week RE: this PR.

Two main questions:

Why is the QueryBuilderResolver not doing a full rewrite?
Why does appending BytesRef values only ever return the first item? Does this need to be a List instead?
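One plausible explanation, consistent with the later commit "Change LuceneQueryEvaluator to use Blocks instead of Vectors to make it multivalue aware", is that a vector stores exactly one value per position, while a block can hold several values per position. The sketch below models that difference with hypothetical classes (SingleValuedColumn, MultiValuedColumn); it only loosely mirrors the begin/end position-entry pattern of block builders and does not use the Elasticsearch compute API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-valued column: exactly one value per position, like a vector.
final class SingleValuedColumn {
    private final List<String> values = new ArrayList<>();

    // A second value for an already-filled position is dropped, mimicking the
    // "only the first snippet comes back" symptom.
    void appendForPosition(int position, String value) {
        if (position == values.size()) {
            values.add(value);
        }
    }

    String get(int position) {
        return values.get(position);
    }
}

// Hypothetical multivalued column: each position is an entry that may hold many values.
final class MultiValuedColumn {
    private final List<List<String>> positions = new ArrayList<>();
    private List<String> current;

    void beginPositionEntry() {
        current = new ArrayList<>();
    }

    void append(String value) {
        current.add(value);
    }

    void endPositionEntry() {
        positions.add(List.copyOf(current));
        current = null;
    }

    List<String> get(int position) {
        return positions.get(position);
    }
}
```

With the multivalued shape, all snippets for one document land in a single position, which is also what MV_EXPAND expects to split apart.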

carlosdelest

This looks good and seems doable 💯

I have some concerns about:

Removing the highlighters from the ToEvaluator interface
Needing a way to move the execution to the data nodes
Because of that, I keep wondering whether a command would be a better approach from an implementation point of view.

I'm sure we will get more clarity on those as we move forward 👍

...ain/java/org/elasticsearch/xpack/esql/expression/function/fulltext/QueryBuilderResolver.java

...ain/java/org/elasticsearch/xpack/esql/expression/function/scalar/string/ExtractSnippets.java

carlosdelest · 2025-08-14T12:23:51Z

...esql/src/internalClusterTest/java/org/elasticsearch/xpack/esql/plugin/ExtractSnippetsIT.java

+        var query = """
+            FROM test
+            | EVAL x = extract_snippets(content, "fox", 1, 10)
+            | SORT x


It's important that we push the evaluator down to the data nodes, which will require a physical optimization rule. Otherwise, the evaluator won't have shard context data and won't be able to run.

That may be tougher than it looks, as we will need to maintain the dependencies from other EVALs that use the produced snippet.

We'll know we have succeeded when we can run the above query without SORT:

FROM test | EVAL x = extract_snippets(content, "fox", 1, 10)

... and it works because the EVAL has been moved below an ExchangeExec and pushed down to the data nodes.
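A physical optimization rule of the kind described above could look roughly like the following sketch, which rewrites Eval(Exchange(child)) into Exchange(Eval(child)) when the EVAL needs shard context. The plan node records (SourceNode, ExchangeNode, EvalNode) and the needsShardContext flag are hypothetical simplifications. A real rule would operate on ES|QL's physical plan classes such as ExchangeExec and, as noted, would also have to handle dependencies between EVALs and any operators sitting between the EVAL and the exchange.

```java
// Hypothetical, heavily simplified physical plan nodes.
interface PlanNode {}

record SourceNode(String index) implements PlanNode {}

// Marks the boundary between data nodes (below) and the coordinator (above).
record ExchangeNode(PlanNode child) implements PlanNode {}

// An EVAL; needsShardContext marks expressions, such as snippet extraction,
// that can only run where shard data is available.
record EvalNode(String expression, boolean needsShardContext, PlanNode child) implements PlanNode {}

final class PushEvalBelowExchange {
    private PushEvalBelowExchange() {}

    // Bottom-up rewrite: Eval(Exchange(c)) becomes Exchange(Eval(c)) when the EVAL needs shard context.
    static PlanNode apply(PlanNode node) {
        if (node instanceof ExchangeNode exchange) {
            return new ExchangeNode(apply(exchange.child()));
        }
        if (node instanceof EvalNode eval) {
            PlanNode child = apply(eval.child());
            if (eval.needsShardContext() && child instanceof ExchangeNode exchange) {
                // Move the EVAL under the exchange so it executes on the data nodes.
                return new ExchangeNode(new EvalNode(eval.expression(), true, exchange.child()));
            }
            return new EvalNode(eval.expression(), eval.needsShardContext(), child);
        }
        return node; // leaves such as SourceNode are unchanged
    }
}
```

Because the pass runs bottom-up, a chain of shard-dependent EVALs stacked directly on an exchange is pushed below it in its original order, while an EVAL that only consumes the produced snippet stays above the exchange and still sees the column.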

carlosdelest · 2025-08-14T12:29:37Z

...l/compute/src/main/java/org/elasticsearch/compute/lucene/HighlighterExpressionEvaluator.java

+        Highlighter highlighter = highlighters.getOrDefault(fieldType.getDefaultHighlighter(), new DefaultHighlighter());
+
+        // TODO: Consolidate these options with the ones built in the text similarity reranker
+        SearchHighlightContext.FieldOptions.Builder optionsBuilder = new SearchHighlightContext.FieldOptions.Builder();


There's a lot of object construction on every appendMatch call. Hopefully we can move it into the evaluator constructor and reuse it for every matching doc.

I don't like that appendMatch takes so many parameters, but I didn't see a better alternative at the time.
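One way to act on this comment is to resolve the highlighter and build the field options once, in the evaluator's constructor, and reuse them for every matching document. The sketch uses hypothetical stand-ins (SnippetHighlighter, FieldOptions, HoistedSnippetEvaluator) rather than the Elasticsearch Highlighter and SearchHighlightContext.FieldOptions types. It also sidesteps a small cost in the reviewed line: Map.getOrDefault evaluates its default argument eagerly, so getOrDefault(..., new DefaultHighlighter()) allocates a DefaultHighlighter on every call even when the lookup succeeds.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a highlighter implementation.
interface SnippetHighlighter {
    List<String> highlight(String text, FieldOptions options);
}

// Hypothetical per-field options: how many snippets to return and how long each may be.
record FieldOptions(int numberOfFragments, int fragmentSize) {}

final class HoistedSnippetEvaluator {
    private final SnippetHighlighter highlighter;
    private final FieldOptions options;

    // Everything that does not depend on the current document is resolved once, here.
    HoistedSnippetEvaluator(
        Map<String, SnippetHighlighter> registry,
        String highlighterName,
        SnippetHighlighter fallback,
        int numSnippets,
        int snippetLength
    ) {
        // The fallback is built once by the caller, not per document.
        this.highlighter = registry.getOrDefault(highlighterName, fallback);
        this.options = new FieldOptions(numSnippets, snippetLength);
    }

    // Per-document work only: highlight this field value using the prebuilt state.
    List<String> appendMatch(String fieldValue) {
        return highlighter.highlight(fieldValue, options);
    }
}
```

Keeping the per-query state in fields also shortens the appendMatch signature, since only document-specific inputs remain as parameters.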

carlosdelest · 2025-08-14T12:30:27Z

server/src/main/java/org/elasticsearch/search/SearchModule.java

@@ -921,6 +923,10 @@ private static Map<String, Highlighter> setupHighlighters(Settings settings, Lis
        return unmodifiableMap(highlighters.getRegistry());
    }

+    public static Map<String, Highlighter> getStaticHighlighters() {


It's tough that the highlighter code is so deeply ingrained into the fetch phase 😢

Yes, this was the cleanest solution I could come up with. If you have better suggestions, I'd be happy to talk about them!

carlosdelest · 2025-08-14T12:35:34Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/evaluator/EvalMapper.java

@@ -79,6 +82,11 @@ public FoldContext foldCtx() {
                public List<ShardContext> shardContexts() {
                    return shardContexts;
                }
+
+                @Override
+                public Map<String, Highlighter> highlighters() {


I keep struggling with this: highlighters should not be something that the toEvaluator method or other evaluators care about.

I wonder if making them part of the SearchContext or ShardContext would be a better option, so that they're passed to the factory rather than being part of the ToEvaluator interface.

That's a good idea. I made that move in 838b054, and I think it's better.
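The change discussed here (commit 838b054, "Move highlighters from EvalMapper to SearchContext") can be pictured with the sketch below: the highlighter registry travels with a shard-level context object, and only the snippet evaluator's factory reads it, so the generic ToEvaluator-style interface no longer needs a highlighters() method. ShardContextSketch, NamedHighlighter and SnippetEvaluatorFactorySketch are hypothetical names, not the classes in the PR.

```java
import java.util.Map;

// Hypothetical stand-in for a highlighter implementation.
interface NamedHighlighter {
    String name();
}

// Shard-level context that carries the highlighter registry alongside other shard data.
record ShardContextSketch(int shardId, Map<String, NamedHighlighter> highlighters) {
    ShardContextSketch {
        highlighters = Map.copyOf(highlighters); // defensive, immutable copy
    }

    NamedHighlighter highlighter(String name) {
        NamedHighlighter h = highlighters.get(name);
        if (h == null) {
            throw new IllegalArgumentException("unknown highlighter [" + name + "]");
        }
        return h;
    }
}

// Only the snippet evaluator's factory looks highlighters up; other evaluators never see them.
final class SnippetEvaluatorFactorySketch {
    private final String highlighterName;

    SnippetEvaluatorFactorySketch(String highlighterName) {
        this.highlighterName = highlighterName;
    }

    String describe(ShardContextSketch ctx) {
        return "shard " + ctx.shardId() + " uses " + ctx.highlighter(highlighterName).name();
    }
}
```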

kderusso and others added 10 commits July 29, 2025 16:27

Initial plumbing for an ES|QL extract_snippets function

ee56018

Add HighlighterExpressionEvaluator

eb0a876

Pair programming session

8c0f312

Create highlight query

86dc82a

Make extract snippets rewriteable

4f4f157

Add comments from session with Carlos

d68c2e8

Make translation aware and get further down the rewrite cycle (still doesn't completely work yet)

0571100

Move building highlight query to extract snippets

9fe7654

Cherry-pick: Initial incomplete work for creating the Highlighter in the expression evaluator

8adea56

Hack in highlighter so it actually produces a response

6be55b4

elasticsearchmachine added the v9.2.0 label Aug 7, 2025

[CI] Auto commit changes from spotless

60e3ce6

carlosdelest and others added 6 commits August 11, 2025 14:13

Change LuceneQueryEvaluator to use Blocks instead of Vectors to make it multivalue aware

b6fb4f3

Add rewritability

f6a8079

Solve params via fold

1ca0b58

Use SORT to push down the EVAL clause, so it's executed on local nodes

34c10f5

[CI] Auto commit changes from spotless

02cebe7

Workaround for rewrite

b923a2e

kderusso force-pushed the kderusso/esql-extract-snippets branch 3 times, most recently from cb98553 to 2872969 Compare August 12, 2025 19:39

Make highlighters accessible

5b9347c

kderusso force-pushed the kderusso/esql-extract-snippets branch from 2872969 to 5b9347c Compare August 12, 2025 19:40

elasticsearchmachine and others added 6 commits August 12, 2025 19:49

[CI] Auto commit changes from spotless

44b1bc4

Return semantic highlight results

82412d8

Merge main into kderusso/esql-extract-snippets

1bc3d16

[CI] Auto commit changes from spotless

d4ba21d

Cleanup

932864a

[CI] Auto commit changes from spotless

632df21

carlosdelest reviewed Aug 14, 2025

kderusso and others added 5 commits August 14, 2025 15:04

Move highlighters from EvalMapper to SearchContext

838b054

Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/fulltext/QueryBuilderResolver.java

Co-authored-by: Carlos Delgado <6339205+carlosdelest@users.noreply.github.com>

0b0487e

[CI] Auto commit changes from spotless

eee88be

Cleanup how we pull field attributes in extract snippets

77b44d5

Fix compilation error due to auto-commit suggestion

5ab3c56
