-
Notifications
You must be signed in to change notification settings - Fork 0
Patched DF 47.0.0 (take 1) #67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c64bfb6
to
e6e3a8b
Compare
…rceDistribution) which later causes an error during EnforceSort (without our patch). The next DataFusion version 46 upgrade does the proper fix, which is to not insert the coalesce in the first place. test: recreating the iox plan: * demonstrate the insertion of coalesce after the use of column estimates, and the removal of the test scenario's forcing of rr repartitioning test: reproducer of SanityCheck failure after EnforceSorting removes the coalesce added in the EnforceDistribution fix: special case to not remove the needed coalesce
e6e3a8b
to
0b003c3
Compare
/// Same as [`repartitions_for_aggregate_after_sorted_union`], but adds a projection | ||
/// as well between the union and aggregate. This change the outcome: | ||
/// | ||
/// * we no longer get repartitioning, and instead get coalescing. | ||
#[test] | ||
fn coalesces_for_aggregate_after_sorted_union_projection() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crepererum -- in follow up to our chat earlier today.
Some context behind this test case, vs the test in iox. And how the DF patch relates to the EAR.
- we had an EAR:
- I made an iox reproducer:
- https://github.com/influxdata/influxdb_iox/pull/13574
- this is the minimal influxQL that produces the error.
- using the iox reproducer, I then made the closest DF reproducers.
- using the iox influxQL query, I determined the plan changes for the EnforceDistribution and EnforceSorting passes.
- I made unit test cases in the DF optimizers, to reproduce these changes.
- EnforceDistribution:
- a CoalesceExec was added, but only for a very specific input plan.
- the test case
coalesces_for_aggregate_after_sorted_union_projection
shows the inserted CoalesceExec.
- EnforceSort:
- Coalesce is removed, leading to the same error as the EAR: f792cfa
- EnforceDistribution:
.
** NOTE THAT THE IOX REPRODUCER IS NOT THE SAME AS THE DATAFUSION REPRODUCER.**
- the unit tests for EnforceDistribution are outdated.
- The DataFusion reproducers are based on what plans were created from the InfluxQL, at that time (a.k.a. with that DF version).
- the iox reproducer should now be creating a different plan, which then gets changed in the DF optimizer passes.
- you can see that in how the EnforceDistribution unit test
coalesces_for_aggregate_after_sorted_union_projection
has changed: - it was inserting a coalesce: fix: Distribution error failing in the SanityCheck, for a specific influxql plan. #58 (comment)
- now it doesn't. (Your PR here shows that, plus this experiment: test: Demonstrate EnforceDistribution and EnforceSorting behavior on the latest DF branch. #60 (comment))
- you can see that in how the EnforceDistribution unit test
.
If the iox InfluxQL reproducer still requires the DF patch (adding if is_coalesce_partitions(&children[0].plan) && !is_aggregation(plan)
to the EnforceSorting), then we still have the bug. Either in the iox plans we produce, or in the DataFusion optimizers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also recommend removing the old unit tests for EnforceDistribution.
The goal of those unit tests were:
- (a) to demonstrate why the EAR error was inconsistent. It was inconsistent because it only occurred when our data layout created a very specific plan. (I mention this inconsistency in the EAR: https://github.com/influxdata/EAR/issues/5822#issuecomment-2616886813).
- (b) to make a DF reproducer with the minimal plan needed. (vs the iox reproducer which is the minimal InfluxQL needed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this patched DF branch (with all green tests ❤️) .
I added a bit of history as an FYI only. No changes needed!
In your next DF upgrade branch (not this one), feel free to drop any outdated unit tests which no longer reflect the plans generated by the iox reproducer.
I like that the number of patches required in DF 47 is significantly smaller than it was in DF 46 |
no longer needed |
Tracking issue: https://github.com/influxdata/influxdb_iox/issues/14640
Patches
Patches map to commits 1:1 (i.e. every patch is exactly 1 commit) and are ordered for easier correlation of the description and the respective commits. They are also grouped in 3 stages.
A: Dummy
No actual patches, can be dropped at any point:
B: CI Fixes
Need to get CI up and running before picking any actual patches:
None!
All commits afterwards should build cleanly!
C: Patches
These are the actual relevant patches:
chore: default=true for skip_physical_aggregate_schema_check, and add warn logging
:until we chase down all warnings in our iox logs (see https://github.com/influxdata/influxdb_iox/issues/12404 )
(New) Test + workaround for SanityCheck plan
:according to this slack thread, we can drop this with DataFusion version 49.
chore: skip order calculation / exponential planning
:workaround for Exponential planning time (100s of seconds) with
UNION
andORDER BY
queries apache/datafusion#13748 -- which should be fixed in DataFusion version 49fix: temporary fix to handle incorrect coalesce (inserted during EnforceDistribution) which later causes an error during EnforceSort (without our patch). The next DataFusion version 46 upgrade does the proper fix, which is to not insert the coalesce in the first place.
:There is EAR-5822 (also see https://github.com/influxdata/influxdb_iox/issues/13310 ) despite what the note in Patched DataFusion version
45.0.0
#54 and ParallelizeSorts, a subrule of EnforceSorting optimizer, should not remove necessary coalesce. apache/datafusion#14691 (comment) say, this is still required for DF version 46. Otherwise the regression test fails. Also see this slack thread.fix: reserved keywords in qualified column names
:That's fix: reserved keywords in qualified column names apache/datafusion#16584 . Can be dropped with DF 49.