Commit 2cddbef

Add p-value heuristics to significant terms aggregation (#5353)
1 parent 4c1d1a7 commit 2cddbef

5 files changed: +137 -17 lines changed

output/openapi/elasticsearch-openapi.json

Lines changed: 21 additions & 0 deletions

output/openapi/elasticsearch-serverless-openapi.json

Lines changed: 21 additions & 0 deletions

output/schema/schema.json

Lines changed: 63 additions & 17 deletions

output/typescript/types.ts

Lines changed: 6 additions & 0 deletions

specification/_types/aggregations/bucket.ts

Lines changed: 26 additions & 0 deletions
@@ -817,6 +817,22 @@ export class ScriptedHeuristic {
   script: Script
 }

+export class PValueHeuristic {
+  /*
+   * Set to false to indicate that the background set does
+   * not contain the counts of the foreground set as they are filtered out.
+   * @server_default true
+   */
+  background_is_superset?: boolean
+  /**
+   * Should the results be normalized when above the given value.
+   * Allows for consistent significance results at various scales.
+   * Note: `0` is a special value which means no normalization
+   * @server_default 0
+   */
+  normalize_above?: long
+}
+
 /**
  * @ext_doc_id search-aggregations-bucket-significanttext-aggregation
  */
@@ -870,6 +886,16 @@ export class SignificantTermsAggregation extends BucketAggregationBase {
    * Customized score, implemented via a script.
    */
   script_heuristic?: ScriptedHeuristic
+  /**
+   * Significant terms heuristic that calculates the p-value between the term existing in foreground and background sets.
+   *
+   * The p-value is the probability of obtaining test results at least as extreme as
+   * the results actually observed, under the assumption that the null hypothesis is
+   * correct. The p-value is calculated assuming that the foreground set and the
+   * background set are independent https://en.wikipedia.org/wiki/Bernoulli_trial, with the null
+   * hypothesis that the probabilities are the same.
+   */
+  p_value?: PValueHeuristic
   /**
    * Regulates the certainty a shard has if the term should actually be added to the candidate list or not with respect to the `min_doc_count`.
    * Terms will only be considered if their local shard frequency within the set is higher than the `shard_min_doc_count`.
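
As an illustration of how the new `p_value` option might appear in a request body, here is a minimal sketch. The interfaces are local mirrors of the classes in the diff above, not imports from any client library. `long` is assumed to map to `number`, and the helper `buildSignificantTermsRequest`, the field name `crime_type`, and the aggregation name `significant` are hypothetical names chosen for this example.

```typescript
// Local mirror of the PValueHeuristic class added in this commit.
// Assumption: the specification's `long` is represented as `number`.
interface PValueHeuristic {
  // false means the background set excludes the foreground counts
  background_is_superset?: boolean
  // 0 means no normalization
  normalize_above?: number
}

// Reduced mirror of SignificantTermsAggregation: only the parts used here.
interface SignificantTermsAggregation {
  field: string
  p_value?: PValueHeuristic
}

// Hypothetical helper that wraps the aggregation in a search request body.
function buildSignificantTermsRequest(
  field: string,
  heuristic: PValueHeuristic = {}
): { size: number; aggs: Record<string, { significant_terms: SignificantTermsAggregation }> } {
  return {
    size: 0, // only the aggregation result is of interest
    aggs: {
      significant: {
        significant_terms: { field, p_value: heuristic }
      }
    }
  }
}

const body = buildSignificantTermsRequest('crime_type', {
  background_is_superset: false,
  normalize_above: 1000
})
console.log(JSON.stringify(body, null, 2))
```

Passing an empty object for the heuristic leaves both options at their documented server defaults (`true` and `0`).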

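The `p_value` doc comment describes a test of the null hypothesis that a term is equally likely in the foreground and background sets. The sketch below illustrates that idea with a textbook two-proportion z-test. It is a swapped-in technique for illustration only and is not taken from the Elasticsearch implementation, which this commit does not show. It assumes the two sets are disjoint samples (the `background_is_superset: false` situation), and the function names are hypothetical.

```typescript
// Abramowitz-Stegun 7.1.26 polynomial approximation of the error function.
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1
  const ax = Math.abs(x)
  const t = 1 / (1 + 0.3275911 * ax)
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t
  return sign * (1 - poly * Math.exp(-ax * ax))
}

// Cumulative distribution function of the standard normal distribution.
function normalCdf(z: number): number {
  return 0.5 * (1 + erf(z / Math.SQRT2))
}

// Two-sided p-value for H0: the term occurs with the same probability
// in the foreground and background sets (treated as independent samples).
function twoProportionPValue(
  fgHits: number,
  fgTotal: number,
  bgHits: number,
  bgTotal: number
): number {
  const pFg = fgHits / fgTotal
  const pBg = bgHits / bgTotal
  // Pooled estimate of the shared probability under the null hypothesis.
  const pooled = (fgHits + bgHits) / (fgTotal + bgTotal)
  const stdErr = Math.sqrt(pooled * (1 - pooled) * (1 / fgTotal + 1 / bgTotal))
  if (stdErr === 0) return 1 // no variation, so no evidence against H0
  const z = (pFg - pBg) / stdErr
  return 2 * (1 - normalCdf(Math.abs(z)))
}

// Same rate in both sets: no evidence of significance.
console.log(twoProportionPValue(50, 1000, 500, 10000))
// Term is four times as frequent in the foreground: very small p-value.
console.log(twoProportionPValue(200, 1000, 500, 10000))
```

A small p-value means the observed difference in term frequency would be unlikely if the null hypothesis held, which is what marks a term as significant.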