Commit 285987e: Update scheduling README (parent 16ab390)

1 file changed: quickwit/quickwit-control-plane/src/indexing_scheduler/scheduling/README.md (21 additions, 16 deletions)
@@ -3,13 +3,10 @@
 Quickwit needs to assign indexing tasks to a set of indexer nodes.
 We call the result of this decision the indexing physical plan.
 
-This needs to be done under the constraints of:
-- not exceeding the maximum load of each node. (O)
-
 We also want to observe some interesting properties such as:
 - (A) we want to avoid moving indexing tasks from one indexer to another needlessly
 - (B) we want a source to be spread amongst as few nodes as possible
-- (C) we prefer to respect some margin on the capacity of all nodes.
+- (C) we want to balance the load between nodes as soon as the load is significant (>30%)
 - (D) when we are working with the Ingest API source, we prefer to colocate indexers on
   the ingesters holding the data.
 
@@ -50,24 +47,29 @@ And indexer has:
 - a maximum total load (that we will need to measure or configure).
 
 The problem is now greatly simplified.
-A solution is a sparse matrix of `(num_indexers, num_sources)` that holds a number of shards to be run.
+A solution is a sparse matrix of `(num_indexers, num_sources)` that holds a number of shards to be indexed.
 The different constraints and wanted properties can all be re-expressed. For instance:
-- We want the dot product of the load per shard vector with each row, to be lower than the maximum load
-of each node. (O)
+- We want the dot product of the load-per-shard vector with each row to be close to the average load of each node (C)
 - We do not want a large distance between the two solution matrices (A)
-- We want that matrix as sparse as possible (B).
+- We want that matrix to be as sparse as possible (B)
+
+Note that the constraint (C) is enforced differently depending on the load:
+- shards can be placed freely on nodes up to 30% of their capacity
+- above this threshold, we try to assign shards to indexers so that the total load on each indexer stays close to the average load
+
+To express the affinity constraint (D), we could similarly define a matrix of `(num_indexers, num_sources)` with affinity scores and compute a distance with the solution matrix.
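The dot-product and distance expressions above can be sketched in a few lines of Rust. This is a minimal illustration of the two quantities, not the actual Quickwit code; `node_load` and `solution_distance` are hypothetical names, and loads are plain integers.

```rust
// Hypothetical sketch, not the actual Quickwit implementation.

/// Load of one node: dot product of the load-per-shard vector with one
/// row of the `(num_indexers, num_sources)` solution matrix.
fn node_load(load_per_shard: &[u64], row: &[u64]) -> u64 {
    load_per_shard
        .iter()
        .zip(row.iter())
        .map(|(load, num_shards)| load * num_shards)
        .sum()
}

/// Distance between two solutions: the number of shard assignments that
/// changed, i.e. the L1 distance between the two matrices (property A).
fn solution_distance(prev: &[Vec<u64>], next: &[Vec<u64>]) -> u64 {
    prev.iter()
        .flatten()
        .zip(next.iter().flatten())
        .map(|(a, b)| a.abs_diff(*b))
        .sum()
}
```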
 
-The actual cost function we would craft is however not linear. For instance, the benefit of keeping
-some free capacity for a given node is clearly not a linear function. In fact, keeping some imbalance
-could be a good thing.
+The actual cost function we would craft is, however, not linear: it is the combination of multiple distances like those described above.
 
 # The heuristic
 
 We use the following heuristic.
 
+While assigning shards to nodes, we try to ensure that workloads are balanced (except for very small cluster loads). This is achieved by computing a virtual capacity for each indexer: we take 120% of the total load on the entire cluster and divide it up proportionally between indexers according to their capacity. By respecting this virtual capacity when assigning shards to indexers, we make sure that all indexers end up with a load close to the average load.
+
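The virtual-capacity computation described in the paragraph above can be sketched as follows. This is a simplified illustration under the assumption that loads and capacities are plain integer units; `virtual_capacities` is a hypothetical name, not the actual Quickwit function.

```rust
// Hypothetical sketch, not the actual Quickwit implementation:
// take 120% of the total cluster load and split it between indexers
// proportionally to their configured capacity.

/// Returns one virtual capacity (in arbitrary load units) per indexer.
fn virtual_capacities(total_load: u64, indexer_capacities: &[u64]) -> Vec<u64> {
    let total_capacity: u64 = indexer_capacities.iter().sum();
    let inflated_load = total_load * 120 / 100;
    indexer_capacities
        .iter()
        .map(|&capacity| inflated_load * capacity / total_capacity)
        .collect()
}
```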
 ## Phase 1: Remove extraneous shards
 
-Starting from the existing solution, we first reduce it to make sure we do not have too many shards assigned.
+Starting from the existing solution, we first reduce it to make sure we do not have too many shards assigned. This happens when a source was scaled down or deleted.
 This is done by reducing the number of shards wherever needed, picking in priority nodes with few shards.
 
 We call the resulting solution the "reduced solution". The reduced solution is usually not a valid solution as some shard
@@ -78,18 +80,21 @@ previous solution.
 
 ## Phase 2: Enforce nodes maximum load
 
-We then remove entire sources, in order to match the constraint (O).
+We then remove entire sources from nodes where the load is higher than the capacity (when the load is below 30%) or the virtual capacity (when it is above 30%).
 For every given node, we remove in priority sources that have an overall small load on the node.
 
 Matrix-wise, note that phases 1 and 2 create a matrix lower than or equal to the previous solution.
 
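The source-removal step of phase 2 can be sketched as below. This is an illustrative helper under simplified assumptions (integer loads, one node at a time); `sources_to_remove` is a hypothetical name, not the actual Quickwit code.

```rust
// Hypothetical sketch of phase 2: on a node whose total load exceeds
// its (virtual) capacity, remove entire sources, smallest load first,
// until the node fits again.

/// `source_loads` maps source id -> load of that source on this node.
/// Returns the ids of the sources to remove from the node.
fn sources_to_remove(source_loads: &[(u32, u64)], capacity: u64) -> Vec<u32> {
    let mut loads = source_loads.to_vec();
    // Remove in priority the sources with an overall small load.
    loads.sort_by_key(|&(_, load)| load);
    let mut total: u64 = loads.iter().map(|&(_, load)| load).sum();
    let mut removed = Vec::new();
    for (source_id, load) in loads {
        if total <= capacity {
            break;
        }
        total -= load;
        removed.push(source_id);
    }
    removed
}
```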
 ## Phase 3: Greedy assignment
 
-At this point we have reach a solution that fits on the cluster, but we possibly have several missing shards.
+At this point we have reached a solution that fits on the cluster, but some shards may still be missing.
 We therefore use a greedy algorithm to allocate these shards. We assign the shards source by source, in the order of decreasing total load.
-We assign the source to the node with largest remaining load capacity.
 
-If this phase fails, it is ok to log an error, and stop assigning sources.
+We try assigning shards to indexers while respecting their virtual capacity. Because of the uneven size of shards and the greedy approach, this problem might not have a solution. In that case, we iteratively grow the virtual capacity by 20% until the solution fits.
+
+Shards for each source are placed in two steps:
+- in a first iteration, we assign the shards that have affinity scores (D)
+- in a second iteration, we assign the rest of the shards, starting with the node having the highest capacity
 
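The greedy assignment with the grow-by-20%-and-retry fallback can be sketched as follows. This is a simplified illustration (identical shard loads, no affinity handling) with hypothetical names, not the actual Quickwit implementation.

```rust
// Hypothetical sketch of phase 3's fallback: try a greedy placement
// under the virtual capacities; if some shards do not fit, grow every
// virtual capacity by 20% and retry.

/// Greedily assigns `num_shards` shards of identical `shard_load` to the
/// indexer with the most remaining virtual capacity. Returns per-indexer
/// shard counts, or `None` if some shards did not fit.
fn greedy_assign(num_shards: u32, shard_load: u64, capacities: &[u64]) -> Option<Vec<u32>> {
    let mut remaining: Vec<u64> = capacities.to_vec();
    let mut counts = vec![0u32; capacities.len()];
    for _ in 0..num_shards {
        // Pick the node with the highest remaining capacity.
        let (best, _) = remaining
            .iter()
            .enumerate()
            .max_by_key(|&(_, capacity)| *capacity)?;
        if remaining[best] < shard_load {
            return None;
        }
        remaining[best] -= shard_load;
        counts[best] += 1;
    }
    Some(counts)
}

/// Grows the virtual capacities by 20% until the greedy assignment fits.
fn assign_with_growth(num_shards: u32, shard_load: u64, capacities: &[u64]) -> Vec<u32> {
    let mut capacities = capacities.to_vec();
    loop {
        if let Some(counts) = greedy_assign(num_shards, shard_load, &capacities) {
            return counts;
        }
        for capacity in capacities.iter_mut() {
            *capacity = *capacity * 120 / 100;
        }
    }
}
```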
 ## Phase 4: Optimization
 