@@ -141,7 +141,13 @@ We have tried to resolve any conflicts in the *best* possible manner.
 Each dataset consists of 200-1050 observations in 2 dimensions.
 
 
-3. [`other`](catalogue/other.md) includes:
+3. [`mnist`](catalogue/mnist.md) -
+LeCun's MNIST database of handwritten digits
+and Zalando's Fashion-MNIST dataset.
+
+
+
+4. [`other`](catalogue/other.md) includes:
 
 
 * `hdbscan` - a dataset used for demonstrating the outputs of the
 [Python implementation](https://github.com/scikit-learn-contrib/hdbscan)
@@ -172,7 +178,7 @@ We have tried to resolve any conflicts in the *best* possible manner.
 (TODO: help needed).
 
 
-4. [`sipu`](catalogue/sipu.md) -
+5. [`sipu`](catalogue/sipu.md) -
 datasets available at the SIPU (Speech and Image Processing Unit,
 School of Computing, University of Eastern Finland) website
 
@@ -190,7 +196,7 @@ We have tried to resolve any conflicts in the *best* possible manner.
 We excluded the `DIM`-sets as they turn out to be too easy
 for most algorithms.
 
-5. [`uci`](catalogue/uci.md) -
+6. [`uci`](catalogue/uci.md) -
 a selection of datasets available at the University of California, Irvine,
 [Machine Learning Repository](http://archive.ics.uci.edu/ml/)
 (Dua and Graff, 2019)
@@ -201,23 +207,23 @@ We have tried to resolve any conflicts in the *best* possible manner.
 also listed in the SIPU repository.
 Note that "the" Iris dataset is available elsewhere (see `other`).
 
-6. [`wut`](catalogue/wut.md) -
+7. [`wut`](catalogue/wut.md) -
 authored by the fantastic students
 of Marek Gagolewski's Python for Data Analysis course at
 Warsaw University of Technology:
 Przemysław Kosewski, Jędrzej Krauze, Eliza Kaczorek, Anna Gierlak,
 Adam Wawrzyniak, Aleksander Truszczyński, Mateusz Kobyłka and Michał Maciąg.
 
 
-7. [`g2mg`](catalogue/g2mg.md) -
+8. [`g2mg`](catalogue/g2mg.md) -
 a modified version of `G2`-sets from SIPU with variances
 dependent on datasets' dimensionalities, i.e., s* np.sqrt(d/2),
 which makes these problems more difficult.
 
 Each dataset consists of 2048 observations belonging
 to either of two Gaussian clusters in 1, 2, ..., 128 dimensions.
 
-8. [`h2mg`](catalogue/h2mg.md) -
+9. [`h2mg`](catalogue/h2mg.md) -
 two Gaussian-like hubs with spread dependent on datasets' dimensionalities
 
 Each dataset consists of 2048 observations in 1, 2, ..., 128 dimensions.
@@ -231,85 +237,88 @@ We have tried to resolve any conflicts in the *best* possible manner.
 ## List of Datasets
 
 
-| | dataset | n| d|
-| :--| :----------------------| ------:| --:|
-| 1 | fcps/atom | 800| 3|
-| 2 | fcps/chainlink | 1000| 3|
-| 3 | fcps/engytime | 4096| 2|
-| 4 | fcps/hepta | 212| 3|
-| 5 | fcps/lsun | 400| 2|
-| 6 | fcps/target | 770| 2|
-| 7 | fcps/tetra | 400| 3|
-| 8 | fcps/twodiamonds | 800| 2|
-| 9 | fcps/wingnut | 1016| 2|
-| 10 | graves/dense | 200| 2|
-| 11 | graves/fuzzyx | 1000| 2|
-| 12 | graves/line | 250| 2|
-| 13 | graves/parabolic | 1000| 2|
-| 14 | graves/ring | 1000| 2|
-| 15 | graves/ring_noisy | 1050| 2|
-| 16 | graves/ring_outliers | 1030| 2|
-| 17 | graves/zigzag | 250| 2|
-| 18 | graves/zigzag_noisy | 300| 2|
-| 19 | graves/zigzag_outliers | 280| 2|
-| 20 | other/chameleon_t4_8k | 8000| 2|
-| 21 | other/chameleon_t5_8k | 8000| 2|
-| 22 | other/chameleon_t7_10k | 10000| 2|
-| 23 | other/chameleon_t8_8k | 8000| 2|
-| 24 | other/hdbscan | 2309| 2|
-| 25 | other/iris | 150| 4|
-| 26 | other/iris5 | 105| 4|
-| 27 | other/square | 1000| 2|
-| 28 | sipu/a1 | 3000| 2|
-| 29 | sipu/a2 | 5250| 2|
-| 30 | sipu/a3 | 7500| 2|
-| 31 | sipu/aggregation | 788| 2|
-| 32 | sipu/birch1 | 100000| 2|
-| 33 | sipu/birch2 | 100000| 2|
-| 34 | sipu/compound | 399| 2|
-| 35 | sipu/d31 | 3100| 2|
-| 36 | sipu/flame | 240| 2|
-| 37 | sipu/jain | 373| 2|
-| 38 | sipu/pathbased | 300| 2|
-| 39 | sipu/r15 | 600| 2|
-| 40 | sipu/s1 | 5000| 2|
-| 41 | sipu/s2 | 5000| 2|
-| 42 | sipu/s3 | 5000| 2|
-| 43 | sipu/s4 | 5000| 2|
-| 44 | sipu/spiral | 312| 2|
-| 45 | sipu/unbalance | 6500| 2|
-| 46 | sipu/worms_2 | 105600| 2|
-| 47 | sipu/worms_64 | 105000| 64|
-| 48 | uci/ecoli | 336| 7|
-| 49 | uci/glass | 214| 9|
-| 50 | uci/ionosphere | 351| 34|
-| 51 | uci/sonar | 208| 60|
-| 52 | uci/statlog | 2310| 19|
-| 53 | uci/wdbc | 569| 30|
-| 54 | uci/wine | 178| 13|
-| 55 | uci/yeast | 1484| 8|
-| 56 | wut/circles | 4000| 2|
-| 57 | wut/cross | 2000| 2|
-| 58 | wut/graph | 2500| 2|
-| 59 | wut/isolation | 9000| 2|
-| 60 | wut/labirynth | 3546| 2|
-| 61 | wut/mk1 | 300| 2|
-| 62 | wut/mk2 | 1000| 2|
-| 63 | wut/mk3 | 600| 3|
-| 64 | wut/mk4 | 1500| 3|
-| 65 | wut/olympic | 5000| 2|
-| 66 | wut/smile | 1000| 2|
-| 67 | wut/stripes | 5000| 2|
-| 68 | wut/trajectories | 10000| 2|
-| 69 | wut/trapped_lovers | 5000| 3|
-| 70 | wut/twosplashes | 400| 2|
-| 71 | wut/windows | 2977| 2|
-| 72 | wut/x1 | 120| 2|
-| 73 | wut/x2 | 120| 2|
-| 74 | wut/x3 | 185| 2|
-| 75 | wut/z1 | 192| 2|
-| 76 | wut/z2 | 900| 2|
-| 77 | wut/z3 | 1000| 2|
+| | dataset | n| d|
+| :--| :----------------------| ------:| ---:|
+| 1 | fcps/atom | 800| 3|
+| 2 | fcps/chainlink | 1000| 3|
+| 3 | fcps/engytime | 4096| 2|
+| 4 | fcps/hepta | 212| 3|
+| 5 | fcps/lsun | 400| 2|
+| 6 | fcps/target | 770| 2|
+| 7 | fcps/tetra | 400| 3|
+| 8 | fcps/twodiamonds | 800| 2|
+| 9 | fcps/wingnut | 1016| 2|
+| 10 | graves/dense | 200| 2|
+| 11 | graves/fuzzyx | 1000| 2|
+| 12 | graves/line | 250| 2|
+| 13 | graves/parabolic | 1000| 2|
+| 14 | graves/ring | 1000| 2|
+| 15 | graves/ring_noisy | 1050| 2|
+| 16 | graves/ring_outliers | 1030| 2|
+| 17 | graves/zigzag | 250| 2|
+| 18 | graves/zigzag_noisy | 300| 2|
+| 19 | graves/zigzag_outliers | 280| 2|
+| 20 | mnist/digits | 70000| 784|
+| 21 | mnist/fashion | 70000| 784|
+| 22 | other/chameleon_t4_8k | 8000| 2|
+| 23 | other/chameleon_t5_8k | 8000| 2|
+| 24 | other/chameleon_t7_10k | 10000| 2|
+| 25 | other/chameleon_t8_8k | 8000| 2|
+| 26 | other/hdbscan | 2309| 2|
+| 27 | other/iris | 150| 4|
+| 28 | other/iris5 | 105| 4|
+| 29 | other/square | 1000| 2|
+| 30 | sipu/a1 | 3000| 2|
+| 31 | sipu/a2 | 5250| 2|
+| 32 | sipu/a3 | 7500| 2|
+| 33 | sipu/aggregation | 788| 2|
+| 34 | sipu/birch1 | 100000| 2|
+| 35 | sipu/birch2 | 100000| 2|
+| 36 | sipu/compound | 399| 2|
+| 37 | sipu/d31 | 3100| 2|
+| 38 | sipu/flame | 240| 2|
+| 39 | sipu/jain | 373| 2|
+| 40 | sipu/pathbased | 300| 2|
+| 41 | sipu/r15 | 600| 2|
+| 42 | sipu/s1 | 5000| 2|
+| 43 | sipu/s2 | 5000| 2|
+| 44 | sipu/s3 | 5000| 2|
+| 45 | sipu/s4 | 5000| 2|
+| 46 | sipu/spiral | 312| 2|
+| 47 | sipu/unbalance | 6500| 2|
+| 48 | sipu/worms_2 | 105600| 2|
+| 49 | sipu/worms_64 | 105000| 64|
+| 50 | uci/ecoli | 336| 7|
+| 51 | uci/glass | 214| 9|
+| 52 | uci/ionosphere | 351| 34|
+| 53 | uci/sonar | 208| 60|
+| 54 | uci/statlog | 2310| 19|
+| 55 | uci/wdbc | 569| 30|
+| 56 | uci/wine | 178| 13|
+| 57 | uci/yeast | 1484| 8|
+| 58 | wut/circles | 4000| 2|
+| 59 | wut/cross | 2000| 2|
+| 60 | wut/graph | 2500| 2|
+| 61 | wut/isolation | 9000| 2|
+| 62 | wut/labirynth | 3546| 2|
+| 63 | wut/mk1 | 300| 2|
+| 64 | wut/mk2 | 1000| 2|
+| 65 | wut/mk3 | 600| 3|
+| 66 | wut/mk4 | 1500| 3|
+| 67 | wut/olympic | 5000| 2|
+| 68 | wut/smile | 1000| 2|
+| 69 | wut/stripes | 5000| 2|
+| 70 | wut/trajectories | 10000| 2|
+| 71 | wut/trapped_lovers | 5000| 3|
+| 72 | wut/twosplashes | 400| 2|
+| 73 | wut/windows | 2977| 2|
+| 74 | wut/x1 | 120| 2|
+| 75 | wut/x2 | 120| 2|
+| 76 | wut/x3 | 185| 2|
+| 77 | wut/z1 | 192| 2|
+| 78 | wut/z2 | 900| 2|
+| 79 | wut/z3 | 1000| 2|
+
 
 
 
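The `g2mg` entry in this diff scales the cluster spread by `s*np.sqrt(d/2)` and says this makes the problems more difficult. One way to see why is to compare the distance between the two cluster centres with the per-coordinate standard deviation. For isotropic Gaussian clusters, the overlap along the line joining the centres depends only on this ratio. The following is a minimal sketch under stated assumptions, not the catalogue's actual generator. It reads `s*np.sqrt(d/2)` as a per-coordinate standard deviation. It also places two hypothetical centres that differ by `delta` in every coordinate and splits the 2048 points evenly between them. The function names `spread`, `sample_two_gaussians` and `separation_ratio` are hypothetical.

```python
import numpy as np


def spread(d, s, scale_by_dim=True):
    """Per-coordinate standard deviation (assumption: s*sqrt(d/2) is an SD)."""
    return s * np.sqrt(d / 2) if scale_by_dim else s


def sample_two_gaussians(d, s, delta=1.0, n=2048, scale_by_dim=True, seed=0):
    """Draw n points from two isotropic Gaussian clusters in d dimensions.

    Hypothetical sketch: the centres (0 and delta in every coordinate) and
    the equal cluster sizes are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    sigma = spread(d, s, scale_by_dim)
    labels = np.repeat([0, 1], n // 2)  # assumes an even n
    centres = np.stack([np.zeros(d), np.full(d, delta)])
    X = centres[labels] + rng.normal(scale=sigma, size=(n, d))
    return X, labels


def separation_ratio(d, s, delta=1.0, scale_by_dim=True):
    """Distance between the centres divided by the per-coordinate SD."""
    return delta * np.sqrt(d) / spread(d, s, scale_by_dim)


for d in (1, 2, 8, 32, 128):
    print(d, separation_ratio(d, 0.5), separation_ratio(d, 0.5, scale_by_dim=False))
```

Under these assumptions, the centre distance is `delta*sqrt(d)`. With the scaled spread, the ratio reduces to `sqrt(2)*delta/s`, which does not depend on `d`. For example, with `s = 0.5` and `delta = 1`, it stays at about 2.83 in every dimension. With a fixed spread `s`, the ratio grows as `delta*sqrt(d)/s`, so high-dimensional instances would separate ever more easily.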