FEAT: Consider automatic switching for backends other than those of the provided arguments #7675

@sfc-gh-joshi

Description

With AutoSwitchBackend enabled, operations between multiple frames (merge, concat, binary operators) go through the QueryCompilerCalculator class to determine the backend in which to place the result. This process only considers backends among those of the provided arguments.

    for qc_from in self._qc_list:
        # Add self cost for the current query compiler
        if type(qc_from) not in qc_from_cls_costed:
            self_cost = qc_from.stay_cost(
                self._api_cls_name, self._op, self._operation_arguments
            )
            backend_from = qc_from.get_backend()
            if self_cost is not None:
                self._add_cost_data(backend_from, self_cost)
                qc_from_cls_costed.add(type(qc_from))
        qc_to_cls_costed = set()
        for qc_to in self._qc_list:

This choice was deliberate, as initially we did not believe it would make sense to move data if all arguments were already resident in the same engine. However, testing has revealed two use cases where switching to a different backend may be profitable:

  1. The inputs are small enough to move between backends efficiently, but the result is not. This is most apparent in the pathological case of a cross join, where two inputs of 1,000 rows each produce an output of 1,000,000 rows, which takes a long time to transfer if a later operation forces a move between backends.
  2. A backend does not support a multi-argument method. Databases are (generally) bad at matrix multiplication, so code that attempts to multiply two dataframes as matrices should be strongly encouraged to move the data to a local execution environment before performing the computation.
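
To make the two cases concrete, here is a minimal sketch of a selector that can optionally consider backends beyond those of the arguments. It is a toy model, not Modin's `QueryCompilerCalculator`: the function names, the `extra_candidates` parameter, the backend names, and all cost numbers are hypothetical and chosen only for illustration.

```python
import math


def rows_moved_join_then_move(left_rows, right_rows):
    """Rows transferred if the cross join runs first and the result is moved."""
    return left_rows * right_rows


def rows_moved_move_then_join(left_rows, right_rows):
    """Rows transferred if both inputs are moved before the cross join runs."""
    return left_rows + right_rows


def pick_backend(arg_backends, op_costs, move_cost, extra_candidates=()):
    """Pick the cheapest backend for a multi-argument operation (toy model).

    arg_backends: backend name for each argument.
    op_costs: backend name -> cost of running the operation there,
        or None if the backend does not support the operation.
    move_cost: hypothetical flat cost of moving one argument.
    extra_candidates: backends to consider besides those of the arguments.
    """
    candidates = set(arg_backends) | set(extra_candidates)
    best, best_cost = None, math.inf
    for target in sorted(candidates):
        op_cost = op_costs.get(target)
        if op_cost is None:
            continue  # unsupported on this backend
        moves = sum(move_cost for backend in arg_backends if backend != target)
        total = op_cost + moves
        if total < best_cost:
            best, best_cost = target, total
    return best


# Case 1: moving two 1,000-row inputs is far cheaper than moving the result.
print(rows_moved_join_then_move(1000, 1000))  # 1000000
print(rows_moved_move_then_join(1000, 1000))  # 2000

# Case 2: both arguments live in a backend that cannot run the operation.
matmul_costs = {"database": None, "local": 10}
args = ["database", "database"]
print(pick_backend(args, matmul_costs, move_cost=3))
print(pick_backend(args, matmul_costs, move_cost=3, extra_candidates=["local"]))
```

With candidates restricted to the arguments' backends, the second case finds no viable backend at all; adding `"local"` as an extra candidate lets the selector choose it despite the cost of moving both arguments.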

Metadata

Labels

P1 (Important tasks that we should complete soon), hybrid-execution, new feature/request 💬 (Requests and pull requests for new features)
