
Commit e3cc62e

Added Ground truth (#25)

* Updated DCASE2025 Task2
* Update README.md: fixed some typos
* Previous DCASE scripts have been unified into legacy
* Added support for the additional training dataset
* Fixed the function to download datasets
* Supported the evaluation dataset for DCASE2025T2
* Added Ground truth

Co-authored-by: Noboru Harada <64912994+noboru2000@users.noreply.github.com>

1 parent c8b122c · commit e3cc62e

18 files changed (+10064 −21 lines)

README.md

Lines changed: 38 additions & 20 deletions
@@ -23,7 +23,7 @@ This system consists of three main scripts (01_train.sh, 02a_test.sh, and 02b_te
 - data\_download\_2025add.sh **Updated on (2025/05/15)**
 - "Additional train dataset for Evaluation":
 - This script downloads the additional data files and puts them into `data/dcase2025t2/eval\_data/raw/train/`.
-- data\_download\_2025eval.sh **Newly added!! (2025/06/01)**
+- data\_download\_2025eval.sh **Updated on (2025/06/01)**
 - "Additional test dataset for Evaluation"
 - This script downloads the evaluation data files and puts them into `data/dcase2025t2/eval\_data/raw/test`.

@@ -38,7 +38,7 @@ This system consists of three main scripts (01_train.sh, 02a_test.sh, and 02b_te
 - This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2025t2/dev_data/raw/<machine_type>/test/`.
 - The CSV files will be stored in the directory `results/`.
 - It also makes a CSV file including AUC, pAUC, precision, recall, and F1-score for each section.
-- "Evaluation" mode: **Newly added!! (2025/06/01)**
+- "Evaluation" mode: **Updated on (2025/06/01)**
 - This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2025t2/eval_data/raw/<machine_type>/test/`. (These directories will be made available with the "evaluation dataset".)
 - The CSV files are stored in the directory `results/`.

@@ -47,7 +47,7 @@ This system consists of three main scripts (01_train.sh, 02a_test.sh, and 02b_te
 - This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2025t2/dev_data/raw/<machine_type>/test/`.
 - The CSV files will be stored in the directory `results/`.
 - It also makes a CSV file including AUC, pAUC, precision, recall, and F1-score for each section.
-- "Evaluation" mode: **Newly added!! (2025/06/01)**
+- "Evaluation" mode: **Updated on (2025/06/01)**
 - This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2025t2/eval_data/raw/<machine_type>/test/`. (These directories will be made available with the "evaluation dataset".)
 - The CSV files are stored in the directory `results/`.

@@ -68,11 +68,11 @@ Clone this repository from GitHub.
 We will launch the datasets in three stages. Therefore, please download the datasets in each stage:

 + DCASE 2025 Challenge Task 2
-+ "Development Dataset" **New! (2025/04/01)**
++ "Development Dataset" **Updated on (2025/04/01)**
 + Download "dev\_data_<machine_type>.zip" from [https://zenodo.org/records/15097779](https://zenodo.org/records/15097779).
-+ "Additional Training Dataset", i.e., the evaluation dataset for training **New! (2025/05/15)**
++ "Additional Training Dataset", i.e., the evaluation dataset for training **Updated on (2025/05/15)**
 + Download "eval\_data_<machine_type>_train.zip" from [https://zenodo.org/records/15392814](https://zenodo.org/records/15392814).
-+ "Evaluation Dataset", i.e., the evaluation dataset for testing **New! (2025/06/01)**
++ "Evaluation Dataset", i.e., the evaluation dataset for testing **Updated on (2025/06/01)**
 + Download "eval\_data_<machine_type>_test.zip" from [https://zenodo.org/records/15519362](https://zenodo.org/records/15519362).

 + DCASE 2024 Challenge Task 2 (for DCASE2024T2, see [README_legacy](README_legacy.md))
@@ -139,15 +139,15 @@ We will launch the datasets in three stages. Therefore, please download the data
 + section\_00\_test\_0000.wav
 + ...
 + section\_00\_test\_0199.wav
-<!-- + test_rename/ (convert from test directory using `tools/rename.py`)
++ test_rename/ (converted from the test directory using `tools/rename.py`)
 + /section\_00\_source\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
 + ...
 + /section\_00\_source\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
 + ...
 + /section\_00\_target\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
 + ...
 + /section\_00\_target\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
-+ ... -->
++ ...
 + attributes\_00.csv (attributes CSV for section 00)
 + \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.)

@@ -246,7 +246,7 @@ $ 01_train_2025t2.sh -e

 Models are trained using the additional training dataset `data/dcase2025t2/eval_data/raw/<machine_type>/train/`.

-### 9. Run the test script for the evaluation dataset **Newly added!! (2025/06/01)**
+### 9. Run the test script for the evaluation dataset **Updated on (2025/06/01)**

 ### 9.1. Testing with the Simple Autoencoder mode
@@ -258,7 +258,7 @@ $ 02a_test_2025t2.sh -e

 Anomaly scores are calculated using the evaluation dataset, i.e., `data/dcase2025t2/eval_data/raw/<machine_type>/test/`. The anomaly scores are stored as CSV files in the directory `results/`. You can submit the CSV files for the challenge. From the submitted CSV files, we will calculate AUC, pAUC, and your ranking.

-<!-- If you use [rename script](./tools/rename_eval_wav.py) to generate `test_rename` directory, AUC and pAUC are also calculated. -->
+If you use the [rename script](./tools/rename_eval_wav.py) to generate the `test_rename` directory, AUC and pAUC are also calculated.

 ### 9.2. Testing with the Selective Mahalanobis mode
@@ -270,7 +270,7 @@ $ 02b_test_2025t2.sh -e

 Anomaly scores are calculated using the evaluation dataset, i.e., `data/dcase2025t2/eval_data/raw/<machine_type>/test/`. The anomaly scores are stored as CSV files in the directory `results/`. You can submit the CSV files for the challenge. From the submitted CSV files, we will calculate AUC, pAUC, and your ranking.

-<!-- If you use [rename script](./tools/rename_eval_wav.py) to generate `test_rename` directory, AUC and pAUC are also calculated. -->
+If you use the [rename script](./tools/rename_eval_wav.py) to generate the `test_rename` directory, AUC and pAUC are also calculated.

 ### 10. Summarize results
@@ -285,7 +285,7 @@ After the summary, the results are exported in CSV format to `results/dev_data/b

 If you want to change the summarized-results directory or the export directory, edit `03_summarize_results.sh`.

-<!-- After the executed `02a_test_2025t2.sh`, `02b_test_2025t2.sh`, or both. Run the summarize script `03_summarize_results.sh` with the option `DCASE2025T2 -d` or `DCASE2025T2 -e`.
+After executing `02a_test_2025t2.sh`, `02b_test_2025t2.sh`, or both, run the summarize script `03_summarize_results.sh` with the option `DCASE2025T2 -d` or `DCASE2025T2 -e`.

 ```dotnetcli
 # Summarize development dataset 2025
@@ -294,7 +294,7 @@ $ 03_summarize_results.sh DCASE2025T2 -d

 After the summary, the results are exported in CSV format to `results/dev_data/baseline/summarize/DCASE2025T2` or `results/eval_data/baseline/summarize/DCASE2025T2`.

-If you want to change, summarize results directory or export directory, edit `03_summarize_results.sh`. -->
+If you want to change the summarized-results directory or the export directory, edit `03_summarize_results.sh`.

 ## Legacy support
@@ -328,6 +328,18 @@ We developed and tested the source code on Ubuntu 22.04.5 LTS.

 ## Change Log

+### [4.3.0](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v4.3.0)
+
+#### Added Ground truth
+
+- Added a link to the DCASE2025 Task2 evaluator that calculates the official score.
+  - [dcase2025_task2_evaluator](https://github.com/nttcslab/dcase2025_task2_evaluator)
+- Added the DCASE2025 Task2 ground truth data.
+  - [DCASE2025 task2 Ground truth data](datasets/eval_data_list_2025.csv)
+- Added the DCASE2025 Task2 ground truth attributes.
+  - [DCASE2025 task2 Ground truth Attributes](datasets/ground_truth_attributes/dcase2025t2)
+- Updated the legacy script to be compatible with DCASE2025 Task2.
+
 ### [4.2.0](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v4.2.0)

 #### Added
@@ -450,36 +462,42 @@ We developed and tested the source code on Ubuntu 22.04.5 LTS.

 ## Ground truth attribute

-<!-- ### Public ground truth of evaluation dataset
+### Public ground truth of evaluation dataset

 The following code was used to calculate the official score; it also contains the evaluation dataset's ground truth.

-- [dcase2024_task2_evaluator](https://github.com/nttcslab/dcase2024_task2_evaluator)
+- [dcase2025_task2_evaluator](https://github.com/nttcslab/dcase2025_task2_evaluator)

 ### Ground truth for evaluation datasets in this repository

 This repository includes the evaluation data's ground truth CSV, which is used to rename the evaluation dataset files.
 You can calculate AUC and other scores once the ground truth is added to the evaluation dataset file names. *Usually, the rename function is executed along with the [download script](#description) and the [auto download function](#41-enable-auto-download-dataset).

-- [DCASE2024 task2 ground truth](datasets/eval_data_list_2024.csv) -->
+- [DCASE2025 task2 ground truth](datasets/eval_data_list_2025.csv)

 ### Ground truth attributes

-<!-- Attribute information is hidden by default for the following machine types:
+Attribute information is hidden by default for the following machine types:

 - dev data
-  - gearbox
+  - bearing
   - slider
+  - ToyTrain
+- eval_data
+  - AutoTrash
+  - Polisher
+  - ScrewFeeder
+  - ToyPet

 You can view the hidden attributes in the following directory:

-- [DCASE2025 task2 Ground truth Attributes](datasets/ground_truth_attributes) -->
+- [DCASE2025 task2 Ground truth Attributes](datasets/ground_truth_attributes)

 ## Citation

 If you use this system, please cite all of the following four papers:

-+ Tomoya Nishida, Noboru Harada, Daisuke Niizumi, Davide Albertini, Roberto Sannino, Simone Pradolini, Filippo Augusti, Keisuke Imoto, Kota Dohi, Harsh Purohit, Takashi Endo, and Yohei Kawaguchi. Description and discussion on DCASE 2024 challenge task 2: first-shot unsupervised anomalous sound detection for machine condition monitoring. In arXiv e-prints: 2406.07250, 2024. [URL](https://arxiv.org/pdf/2406.07250.pdf)
++ Tomoya Nishida, Noboru Harada, Daisuke Niizumi, Davide Albertini, Roberto Sannino, Simone Pradolini, Filippo Augusti, Keisuke Imoto, Kota Dohi, Harsh Purohit, Takashi Endo, and Yohei Kawaguchi. Description and discussion on DCASE 2025 challenge task 2: first-shot unsupervised anomalous sound detection for machine condition monitoring. In arXiv e-prints: 2506.10097, 2025. [URL](https://arxiv.org/pdf/2506.10097.pdf)
 + Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, and Shoichiro Saito. ToyADMOS2: another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions. In Proceedings of the Detection and Classification of Acoustic Scenes and Events Workshop (DCASE), 1–5. Barcelona, Spain, November 2021. [URL](https://dcase.community/documents/workshop2021/proceedings/DCASE2021Workshop_Harada_6.pdf)
 + Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, and Yohei Kawaguchi. MIMII DG: sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task. In Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022). Nancy, France, November 2022. [URL](https://dcase.community/documents/workshop2022/proceedings/DCASE2022Workshop_Dohi_62.pdf)
 + Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, and Masahiro Yasuda. First-shot anomaly detection for machine condition monitoring: a domain generalization baseline. Proceedings of 31st European Signal Processing Conference (EUSIPCO), pages 191–195, 2023. [URL](https://eurasip.org/Proceedings/Eusipco/Eusipco2023/pdfs/0000191.pdf)
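The ground-truth CSV drives the renaming of anonymized evaluation files. The sketch below illustrates the idea with a hypothetical two-column layout (anonymized name, labeled name) — this is an assumption for illustration, not the repository's actual CSV format:

```python
import csv
import io

def build_rename_map(csv_text: str) -> dict:
    """Map anonymized eval wav names to their ground-truth (labeled) names.
    Assumes a two-column CSV: anonymized_name,labeled_name (hypothetical)."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}

# Illustrative rows in the assumed layout.
example = (
    "section_00_test_0000.wav,section_00_source_test_normal_0000_attr.wav\n"
    "section_00_test_0001.wav,section_00_target_test_anomaly_0001_attr.wav"
)

mapping = build_rename_map(example)
```

In the repository, this step is handled by `tools/rename_eval_wav.py` against `datasets/eval_data_list_2025.csv`.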

data_download_2025eval.sh

Lines changed: 3 additions & 0 deletions
@@ -15,3 +15,6 @@ for machine_type in \
 wget "https://zenodo.org/records/15519362/files/eval_data_${machine_type}_test.zip"
 unzip "eval_data_${machine_type}_test.zip"
 done
+
+# Adds reference labels to test data.
+python tools/rename_eval_wav.py --dataset_parent_dir=data --dataset_type=DCASE2025T2
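The download loop above fetches one zip per machine type from Zenodo. A Python sketch of the URL construction (the machine-type list below is an illustrative subset taken from this document, not the script's full list):

```python
# Mirror of the URL pattern used by data_download_2025eval.sh.
BASE = "https://zenodo.org/records/15519362/files"

def eval_test_url(machine_type: str) -> str:
    """Build the Zenodo download URL for a machine type's eval test zip."""
    return f"{BASE}/eval_data_{machine_type}_test.zip"

# Illustrative subset of machine types mentioned in this commit.
urls = [eval_test_url(m) for m in ["ToyPet", "Polisher"]]
```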
