Commit 6b70cac

doc: Add docs for RF binning (#1319)
1 parent c87da9b commit 6b70cac

1 file changed

+47
-4
lines changed

doc/sources/guide/acceleration.rst

Lines changed: 47 additions & 4 deletions
@@ -29,15 +29,58 @@ TSNE
 ----
 
 TSNE algorithm consists of two components: KNN and Gradient Descent.
-The overall accelration of TSNE depends on the acceleration of each of these algorithms.
+The overall acceleration of TSNE depends on the acceleration of each of these algorithms.
 
 - The KNN part of the algorithm supports all parameters except:
-
+
   - ``metric`` != `'euclidean'` or `'minkowski'` with ``p`` != `2`
 - The Gradient Descent part of the algorithm supports all parameters except:
-
+
   - ``n_components`` = `3`
   - ``method`` = `'exact'`
   - ``verbose`` != `0`
 
-To get better performance, use parameters supported by both components.
+To get better performance, use parameters supported by both components.
+
+.. _acceleration_rf:
+
+Random Forest
+-------------
+
+Random Forest models accelerated with |intelex| and using the `hist` splitting
+method discretize training data by creating a histogram with a configurable
+number of bins. The following keyword arguments can be used to influence the
+created histogram.
+
+.. list-table::
+   :widths: 10 10 10 30
+   :header-rows: 1
+   :align: left
+
+   * - Keyword argument
+     - Possible values
+     - Default value
+     - Description
+   * - ``maxBins``
+     - `[0, inf)`
+     - ``256``
+     - Number of bins in the histogram with the discretized training data. The
+       value ``0`` disables data discretization.
+   * - ``minBinSize``
+     - `[1, inf)`
+     - ``5``
+     - Minimum number of training data points in each bin after discretization.
+   * - ``binningStrategy``
+     - ``quantiles, averages``
+     - ``quantiles``
+     - Selects the algorithm used to calculate bin edges. ``quantiles``
+       results in bins with a similar amount of training data points. ``averages``
+       divides the range of values observed in the training data set into
+       equal-width bins of size `(max - min) / maxBins`.
+
+Note that using discretized training data can greatly accelerate model training
+times, especially for larger data sets. However, due to the reduced fidelity of
+the data, the resulting model can present worse performance metrics compared to
+a model trained on the original data. In such cases, the number of bins can be
+increased with the ``maxBins`` parameter, or binning can be disabled entirely by
+setting ``maxBins=0``.
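
The difference between the two ``binningStrategy`` values described in the added table can be illustrated with a small standalone sketch. This is not the library's implementation: the helper names ``equal_width_edges``, ``quantile_edges`` and ``bin_counts`` are hypothetical, and only the Python standard library is used. It assumes ``averages`` means equal-width bins of size `(max - min) / maxBins`, as the table states, and approximates ``quantiles`` with ``statistics.quantiles``.

```python
from bisect import bisect_right
from statistics import quantiles


def equal_width_edges(values, max_bins):
    """Edges of max_bins equal-width bins, each (max - min) / max_bins wide."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / max_bins
    return [lo + i * width for i in range(max_bins + 1)]


def quantile_edges(values, max_bins):
    """Edges chosen so that each bin holds a similar number of points."""
    # statistics.quantiles returns the max_bins - 1 interior cut points.
    cuts = quantiles(values, n=max_bins, method="inclusive")
    return [min(values)] + cuts + [max(values)]


def bin_counts(values, edges):
    """Count how many values fall into each bin defined by the edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical skewed feature column with one large outlier.
    data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
    for name, make_edges in (("averages", equal_width_edges),
                             ("quantiles", quantile_edges)):
        edges = make_edges(data, 4)
        print(name, edges, bin_counts(data, edges))
```

On skewed data such as the sample column, equal-width edges leave most bins nearly empty while quantile edges spread the points more evenly, which is one plausible reason for ``quantiles`` being the documented default.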
