-
Notifications
You must be signed in to change notification settings - Fork 77
Add __and__ and __or__ to Query #287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
PSU3D0
wants to merge
1
commit into
quickwit-oss:master
Choose a base branch
from
PSU3D0:add_andor
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we check if
self
orother
already is aBooleanQuery
of all-Must
and apend ourselves to that instead of building trees ofBooleanQuery
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason why would you want this feature? Complicated syntaxes like
(query1 | query2) & query3
can be very troublesome to implement. Its also common to nest queries in Lucene.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear for the reduced efficiency compared to having a flat list of subqueries. AFAIU, Tantivy will not optimize that query tree and its execution will directly mirror the structure of the input query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was rather thinking about downcasting here. I also think we can keep this internal as an optimization instead of changing the API, i.e. downcast to see if we already have a boolean query with matching modifiers and append a clause instead of nesting.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I think the idea is fine, the optimization cannot be too deep due to the
Occur
implementation.Currently we have 3 variants of
Occur
:Should
,Must
,MustNot
. The optimization can be done when we have a tuple of(Occur, BooleanQuery)
where the boolean query might contain one or more groups ofShould
,Must
,MustNot
.Their mathematics goes like the symmetric matrix (Take the header as the index 0
Occur
of the(Occur, BooleanQuery)
tuple, while the rowsOccur
as the groups inside theBooleanQuery
):Should
Must
MustNot
Should
Should
(Must, [Should, ...])
(MustNot, [Should, ...])
Must
(Should, [Must, ...])
Must
MustNot
MustNot
(Should, [MustNot, ...])
MustNot
Must
This means that we can only flatten 1 layer those 5 types of
(Occur, InnerOccurGroup)
, and for the flattening we need to extract the boolean query's inner occur group, am I right?P.S. Why I remain
(Must, [Should, ...])
and(MustNot, [Should, ...])
though they can be deduced by DeMorgan's rule is that when transforming them, the resultant query still have the same degree of nesting and there is no significant benefits so I just remain them as it is.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment was only aimed at keeping the status quo, i.e. chaining clauses using
&
and|
should result in a single boolean query as if theboolean_query
method is used instead of a unbalanced tree.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So do you mean you expect whenever
__and__
and__or__
are used, the above optimization evaluation rule applies while the originalQuery.boolean_query
remains the status quo?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I just want
a & b & c
andboolean_query([(Must, a), (Must, b), (Must, c)])
to produce the same internal representation using a single boolean query instead of the first form using two boolean queries.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think the
&
and|
are binary operations that they must evaluate to something first, no matter the order? Say fora & b & c
then we must either evaluate tod & c
whered = a & b
or vice versa, and there must be nesting.In case we optimistically flatten the queries with the
&
or|
syntax, saya & b & c
, while any of the queriesa
,b
,c
are boolean, the resultant query will be different from aboolean_query([(Must, a), (Must, b), (Must, c)])
.If we really want to adapt this syntax while producing the same internal representation, the
Query.boolean_query
method should also be flattened.