doc: Add docs for RF binning (#1319)

ahuber21 · web-flow · commit 6b70cac4e8fe · 2023-06-06T11:42:22.000+01:00
diff --git a/doc/sources/guide/acceleration.rst b/doc/sources/guide/acceleration.rst
@@ -29,15 +29,58 @@ TSNE
 ----
 
 TSNE algorithm consists of two components: KNN and Gradient Descent.
-The overall accelration of TSNE depends on the acceleration of each of these algorithms.
+The overall acceleration of TSNE depends on the acceleration of each of these algorithms.
 
 - The KNN part of the algorithm supports all parameters except:
- 
+
   - ``metric`` != `'euclidean'` or `'minkowski'` with ``p`` != `2`
 - The Gradient Descent part of the algorithm supports all parameters except:
- 
+
   - ``n_components`` = `3`
   - ``method`` = `'exact'`
   - ``verbose`` != `0`
 
-To get better performance, use parameters supported by both components.
+To get better performance, use parameters supported by both components.
+
+.. _acceleration_rf:
+
+Random Forest
+-------------
+
+Random Forest models accelerated with |intelex| and using the `hist` splitting
+method discretize training data by creating a histogram with a configurable
+number of bins. The following keyword arguments control how the histogram is
+built.
+
+.. list-table::
+   :widths: 10 10 10 30
+   :header-rows: 1
+   :align: left
+
+   * - Keyword argument
+     - Possible values
+     - Default value
+     - Description
+   * - ``maxBins``
+     - `[0, inf)`
+     - ``256``
+     - Number of bins in the histogram with the discretized training data. The
+       value ``0`` disables data discretization.
+   * - ``minBinSize``
+     - `[1, inf)`
+     - ``5``
+     - Minimum number of training data points in each bin after discretization.
+   * - ``binningStrategy``
+     - ``quantiles``, ``averages``
+     - ``quantiles``
+     - Selects the algorithm used to calculate bin edges. ``quantiles``
+       results in bins with a similar number of training data points. ``averages``
+       divides the range of values observed in the training data set into
+       equal-width bins of size `(max - min) / maxBins`.
+
+Note that using discretized training data can greatly reduce model training
+time, especially for larger data sets. However, due to the reduced fidelity of
+the data, the resulting model can show worse performance metrics than a model
+trained on the original data. In such cases, the number of bins can be
+increased with the ``maxBins`` parameter, or binning can be disabled entirely by
+setting ``maxBins=0``.
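
The difference between the two ``binningStrategy`` values documented in this diff can be illustrated with a small standalone sketch. This is not the library's implementation: the helper names ``equal_width_edges``, ``quantile_edges`` and ``bin_counts`` are hypothetical, the sample data is made up, and ``minBinSize`` is not modeled. It only assumes the behaviour stated in the table: ``averages`` uses equal-width bins of size ``(max - min) / maxBins``, while ``quantiles`` aims for a similar number of points per bin.

```python
from statistics import quantiles


def equal_width_edges(values, max_bins):
    """Interior edges for an 'averages'-style split (assumed behaviour):
    equal-width bins of size (max - min) / max_bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / max_bins
    # Only interior edges are returned; lo and hi are the outer boundaries.
    return [lo + i * width for i in range(1, max_bins)]


def quantile_edges(values, max_bins):
    """Interior edges for a 'quantiles'-style split (assumed behaviour):
    each bin receives a similar number of points."""
    # statistics.quantiles returns the max_bins - 1 interior cut points.
    return quantiles(values, n=max_bins, method="inclusive")


def bin_counts(values, edges):
    """Count the values per bin; a value lands in bin k when exactly
    k edges lie strictly below it."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(1 for e in edges if v > e)] += 1
    return counts


# Made-up, skewed sample: one outlier stretches the value range.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 100]

print(bin_counts(sample, equal_width_edges(sample, 4)))  # [11, 0, 0, 1]
print(bin_counts(sample, quantile_edges(sample, 4)))     # [3, 3, 3, 3]
```

With skewed data, the equal-width split leaves most bins empty and packs nearly all points into one bin, whereas the quantile split keeps the bins evenly populated. This is one plausible reason for ``quantiles`` being the documented default.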