Closed
Labels: bug (Something isn't working)
Description
Detected by @PicoCentauri
Problem
import numpy as np
from skmatter.feature_selection import FPS

np.random.seed(0)
n_samples = 10
n_features = 15
X = np.random.rand(n_samples, n_features)
# make features 3 and 4 numerically zero
X[:, 3] = np.random.rand(n_samples) * 1e-13
X[:, 4] = np.random.rand(n_samples) * 1e-13

# selecting every feature forces the selector to choose among numerically zero scores
selector_problem = FPS(n_to_select=n_features).fit(X)
print(selector_problem.selected_idx_)
print(selector_problem.get_select_distance())
print()

# this selector does not have the problem because the search stops once the score threshold is reached
selector = FPS(n_to_select=n_features, score_threshold=1e-9).fit(X)
print(selector.selected_idx_)
print(selector.get_select_distance())
Out:
[ 0 8 3 6 14 2 13 9 7 11 1 10 12 5 8]
[ inf 1.77635684e-15 2.16390745e+00 1.62400552e+00
1.43445978e+00 1.23482177e+00 1.03370164e+00 9.21863706e-01
7.95155761e-01 7.87817521e-01 7.37837489e-01 6.52674372e-01
6.11845170e-01 5.65607255e-01 1.77635684e-15]
/home/alexgo/code/scikit-matter/src/skmatter/_selection.py:210: UserWarning: Score threshold of 1e-09 reached.Terminating search at 14 / 15.
warnings.warn(
[ 0 8 3 6 14 2 13 9 7 11 1 10 12]
[ inf 2.75832232 2.16390745 1.62400552 1.43445978 1.23482177
1.03370164 0.92186371 0.79515576 0.78781752 0.73783749 0.65267437
0.61184517]
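In the first selector, index 8 is selected twice (second and last position), and the distance reported for it is 1.78e-15 in both places instead of the 2.76 reported by the second selector. This happens because the GreedySelector base class does not exclude already-selected points when choosing the next point: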
scikit-matter/src/skmatter/_selection.py, line 371 in d56ccbd:
max_score_idx = np.argmax(scores)
So when all remaining scores are (numerically) zero, a point that has already been selected can be selected again.
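The failure mode can be illustrated with plain NumPy, independent of skmatter. The score values and the `already_selected` list below are hypothetical and chosen only to mimic the situation described: a selected point retains a tiny round-off score that exceeds the scores of the remaining candidates.

```python
import numpy as np

# Hypothetical scores: index 0 was already selected but keeps a round-off
# residual, while the only remaining candidate (index 1) is even smaller.
scores = np.array([1.8e-15, 1e-26, 0.0])
already_selected = [0, 2]

# A plain argmax has no notion of "already selected"
next_idx = int(np.argmax(scores))
print(next_idx, next_idx in already_selected)
```

Here the plain argmax returns an index that is already in `already_selected`, which is the kind of reselection seen in the output above.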
Solution
One could add selected_idx_ to the GreedySelector base class and change the argmax in the function above so that it only considers indices that have not yet been selected.
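A minimal sketch of such a restricted argmax is shown here, assuming the scores are a 1D array and the selected indices are available as a list or array. The helper name `argmax_unselected` is hypothetical and not part of skmatter; how this would be wired into GreedySelector is left open.

```python
import numpy as np


def argmax_unselected(scores, selected_idx):
    """Return the index of the highest score among not-yet-selected entries.

    Hypothetical helper for illustration; not part of skmatter.
    """
    scores = np.asarray(scores, dtype=float)
    # Boolean mask that is True for every index still available
    mask = np.ones(scores.shape[0], dtype=bool)
    mask[np.asarray(selected_idx, dtype=int)] = False
    if not mask.any():
        raise ValueError("all indices have already been selected")
    # Take the argmax over the candidates only, then map back to the full index
    candidates = np.flatnonzero(mask)
    return int(candidates[np.argmax(scores[candidates])])


# Hypothetical scores where a selected index (0) has the largest value
scores = np.array([1.8e-15, 1e-26, 0.0])
result = argmax_unselected(scores, selected_idx=[0, 2])
print(result)
```

Restricting the search to the unselected candidates, rather than overwriting selected scores with a sentinel such as `-inf`, leaves the score array itself untouched; either approach would prevent reselection.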