From 20359959ca400e16634f894bcca68d22bc97c89b Mon Sep 17 00:00:00 2001
From: Simran Spiller
Date: Tue, 1 Jul 2025 17:14:08 +0200
Subject: [PATCH 1/3] AQL optimization: COLLECT ... AGGREGATE can utilize persistent index

---
 .../version-3.12/whats-new-in-3-12.md | 34 +++++++++++++++++++
 .../version-3.12/whats-new-in-3-12.md | 34 +++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
index 77b8c26bb2..32c8e4fe42 100644
--- a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
+++ b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
@@ -1257,6 +1257,40 @@ to some extent.
 See the [`COLLECT` operation](../../aql/high-level-operations/collect.md#disableindex)
 for details.
 
+---
+
+Introduced in: v3.12.5
+
+The `use-index-for-collect` optimizer rule has been further extended.
+Queries where a `COLLECT` operation has an `AGGREGATE` clause that exclusively
+refers to attributes covered by a persistent index and no other variables can
+now utilize this index.
+
+Reading the data from the index instead of the stored documents for aggregations
+can significantly increase the perform if the there are few different values.
+
+```aql
+FOR doc IN coll
+  COLLECT a = doc.a AGGREGATE b = MAX(doc.b)
+  RETURN { a, b }
+```
+
+If there is a persistent index over the attributes `a` and `b`, then the query
+explain output shows an `IndexCollectNode` if the optimization is applied:
+
+```aql
+Execution plan:
+ Id   NodeType           Par   Est.   Comment
+  1   SingletonNode               1   * ROOT
+ 10   IndexCollectNode         4999     - FOR doc IN coll COLLECT a = doc.`a` AGGREGATE b = MAX(doc.`b`)   /* full index scan */
+  6   CalculationNode     ✓    4999     - LET #5 = { "a" : a, "b" : b }   /* simple expression */
+  7   ReturnNode               4999     - RETURN #5
+
+Indexes used:
+ By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields   Stored values   Ranges
+ 10   idx_1836452431376941056   persistent   coll
+```
+
 ## Indexing
 
 ### Multi-dimensional indexes
diff --git a/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
index 77b8c26bb2..32c8e4fe42 100644
--- a/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
+++ b/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
@@ -1257,6 +1257,40 @@ to some extent.
 See the [`COLLECT` operation](../../aql/high-level-operations/collect.md#disableindex)
 for details.
 
+---
+
+Introduced in: v3.12.5
+
+The `use-index-for-collect` optimizer rule has been further extended.
+Queries where a `COLLECT` operation has an `AGGREGATE` clause that exclusively
+refers to attributes covered by a persistent index and no other variables can
+now utilize this index.
+
+Reading the data from the index instead of the stored documents for aggregations
+can significantly increase the perform if the there are few different values.
+
+```aql
+FOR doc IN coll
+  COLLECT a = doc.a AGGREGATE b = MAX(doc.b)
+  RETURN { a, b }
+```
+
+If there is a persistent index over the attributes `a` and `b`, then the query
+explain output shows an `IndexCollectNode` if the optimization is applied:
+
+```aql
+Execution plan:
+ Id   NodeType           Par   Est.   Comment
+  1   SingletonNode               1   * ROOT
+ 10   IndexCollectNode         4999     - FOR doc IN coll COLLECT a = doc.`a` AGGREGATE b = MAX(doc.`b`)   /* full index scan */
+  6   CalculationNode     ✓    4999     - LET #5 = { "a" : a, "b" : b }   /* simple expression */
+  7   ReturnNode               4999     - RETURN #5
+
+Indexes used:
+ By   Name                      Type         Collection   Unique   Sparse   Cache   Selectivity   Fields   Stored values   Ranges
+ 10   idx_1836452431376941056   persistent   coll
+```
+
 ## Indexing
 
 ### Multi-dimensional indexes

From abad606baaa124b05ef61b2f1574c219f05058b2 Mon Sep 17 00:00:00 2001
From: Simran Spiller
Date: Fri, 11 Jul 2025 12:47:25 +0200
Subject: [PATCH 2/3] Julia's feedback

---
 .../3.12/release-notes/version-3.12/whats-new-in-3-12.md | 9 +++++----
 .../3.13/release-notes/version-3.12/whats-new-in-3-12.md | 9 +++++----
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
index 32c8e4fe42..bfd74e5506 100644
--- a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
+++ b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
@@ -1264,10 +1264,10 @@ for details.
 The `use-index-for-collect` optimizer rule has been further extended.
 Queries where a `COLLECT` operation has an `AGGREGATE` clause that exclusively
 refers to attributes covered by a persistent index and no other variables can
-now utilize this index.
+now utilize this index. The index must not be sparse.
 
 Reading the data from the index instead of the stored documents for aggregations
-can significantly increase the perform if the there are few different values.
+can increase the performance by a factor of two.
 
 ```aql
 FOR doc IN coll
@@ -1275,8 +1275,9 @@ FOR doc IN coll
   RETURN { a, b }
 ```
 
-If there is a persistent index over the attributes `a` and `b`, then the query
-explain output shows an `IndexCollectNode` if the optimization is applied:
+If there is a persistent index over the attributes `a` and `b`, then the above
+example query has an `IndexCollectNode` in the explain output and the index
+usage is indicated if the optimization is applied:
 
 ```aql
 Execution plan:
diff --git a/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
index 32c8e4fe42..bfd74e5506 100644
--- a/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
+++ b/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
@@ -1264,10 +1264,10 @@ for details.
 The `use-index-for-collect` optimizer rule has been further extended.
 Queries where a `COLLECT` operation has an `AGGREGATE` clause that exclusively
 refers to attributes covered by a persistent index and no other variables can
-now utilize this index.
+now utilize this index. The index must not be sparse.
 
 Reading the data from the index instead of the stored documents for aggregations
-can significantly increase the perform if the there are few different values.
+can increase the performance by a factor of two.
 
 ```aql
 FOR doc IN coll
@@ -1275,8 +1275,9 @@ FOR doc IN coll
   RETURN { a, b }
 ```
 
-If there is a persistent index over the attributes `a` and `b`, then the query
-explain output shows an `IndexCollectNode` if the optimization is applied:
+If there is a persistent index over the attributes `a` and `b`, then the above
+example query has an `IndexCollectNode` in the explain output and the index
+usage is indicated if the optimization is applied:
 
 ```aql
 Execution plan:

From ee881757a0c3324ae3036f0c17e4a1fe49c8b0e3 Mon Sep 17 00:00:00 2001
From: Simran Spiller
Date: Tue, 15 Jul 2025 18:14:42 +0200
Subject: [PATCH 3/3] Can't call COUNT() in AGGREGATE either

---
 .../3.12/release-notes/version-3.12/whats-new-in-3-12.md | 5 +++--
 .../3.13/release-notes/version-3.12/whats-new-in-3-12.md | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
index 6aa6522179..475762f926 100644
--- a/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
+++ b/site/content/3.12/release-notes/version-3.12/whats-new-in-3-12.md
@@ -1263,8 +1263,9 @@ for details.
 
 The `use-index-for-collect` optimizer rule has been further extended.
 Queries where a `COLLECT` operation has an `AGGREGATE` clause that exclusively
-refers to attributes covered by a persistent index and no other variables can
-now utilize this index. The index must not be sparse.
+refers to attributes covered by a persistent index (and no other variables nor
+contains calls of aggregation functions with constant values) can now utilize
+this index. The index must not be sparse.
 
 Reading the data from the index instead of the stored documents for aggregations
 can increase the performance by a factor of two.
diff --git a/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md b/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
index 6aa6522179..475762f926 100644
--- a/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
+++ b/site/content/3.13/release-notes/version-3.12/whats-new-in-3-12.md
@@ -1263,8 +1263,9 @@ for details.
 
 The `use-index-for-collect` optimizer rule has been further extended.
 Queries where a `COLLECT` operation has an `AGGREGATE` clause that exclusively
-refers to attributes covered by a persistent index and no other variables can
-now utilize this index. The index must not be sparse.
+refers to attributes covered by a persistent index (and no other variables nor
+contains calls of aggregation functions with constant values) can now utilize
+this index. The index must not be sparse.
 
 Reading the data from the index instead of the stored documents for aggregations
 can increase the performance by a factor of two.