-
Notifications
You must be signed in to change notification settings - Fork 664
Open
Labels
P1Important tasks that we should complete soonImportant tasks that we should complete soonhybrid-executionnew feature/request 💬Requests and pull requests for new featuresRequests and pull requests for new features
Description
With AutoSwitchBackend
enabled, operations between multiple frames (merge, concat, binary operators) go through the QueryCompilerCalculator class to determine the backend in which to place the result. This process only considers backends among those of the provided arguments.
modin/modin/core/storage_formats/base/query_compiler_calculator.py
Lines 113 to 126 in 86107d4
for qc_from in self._qc_list: | |
# Add self cost for the current query compiler | |
if type(qc_from) not in qc_from_cls_costed: | |
self_cost = qc_from.stay_cost( | |
self._api_cls_name, self._op, self._operation_arguments | |
) | |
backend_from = qc_from.get_backend() | |
if self_cost is not None: | |
self._add_cost_data(backend_from, self_cost) | |
qc_from_cls_costed.add(type(qc_from)) | |
qc_to_cls_costed = set() | |
for qc_to in self._qc_list: |
This choice was deliberate, as initially we did not believe it would make sense to move data if all arguments were already resident in the same engine. However, testing has revealed two use cases where switching to a different backend may be profitable:
- The inputs are small enough to move between backends efficiently, but the result is not. This is most apparent in the pathological case of a cross join operation, where inputs with 1000 rows may result in an output of 1,000,000 that takes a long time to transfer between backends if a later operation would do so.
- A backend does not support a multi-argument method. Databases are (generally) bad at matrix multiplication, so code that attempts to multiply two dataframes as matrices should be strongly encouraged to move the data to a local execution environment before performing the computation.
Metadata
Metadata
Assignees
Labels
P1Important tasks that we should complete soonImportant tasks that we should complete soonhybrid-executionnew feature/request 💬Requests and pull requests for new featuresRequests and pull requests for new features