Commit da65073

Supported Additional Training Dataset for DCASE2025T2 (#22)
* Updated DCASE2025 Task2
* Update README.md
  Fixed some typos
* Previous DCASE scripts have been unified into legacy
* Added support for Additional training dataset

Co-authored-by: Noboru Harada <64912994+noboru2000@users.noreply.github.com>
1 parent 274ec7e commit da65073

7 files changed

+144
-37
lines changed

01_train_2025t2.sh

Lines changed: 10 additions & 2 deletions
@@ -39,8 +39,16 @@ then
 "
 elif [ "${dev_eval}" = "-e" ] || [ "${dev_eval}" = "--eval" ]
 then
-    echo dcase2025 task2 eval data are not publish
-    exit
+    dataset_list="\
+        DCASE2025T2ToyRCCar \
+        DCASE2025T2ToyPet \
+        DCASE2025T2HomeCamera \
+        DCASE2025T2AutoTrash \
+        DCASE2025T2Polisher \
+        DCASE2025T2ScrewFeeder \
+        DCASE2025T2BandSealer \
+        DCASE2025T2CoffeeGrinder \
+    "
 fi
 
 for dataset in $dataset_list; do

README.md

Lines changed: 55 additions & 35 deletions
@@ -17,23 +17,23 @@ Differences between the previous dcase2022\_baseline\_ae and this version are as
1717
This system consists of three main scripts (01_train.sh, 02a_test.sh, and 02b_test.sh) with some helper scripts for DCASE2025T2 (For DCASE2024T2 and DCASE2023T2, see [README_legacy](README_legacy.md)):
1818

1919
- Helper scripts for DCASE2025T2
20-
- data\_download\_2025dev.sh **Newly added!! (2025/04/01)**
20+
- data\_download\_2025dev.sh **Updated on (2025/04/01)**
2121
- "Development dataset":
2222
- This script downloads development data files and puts them into `data/dcase2025t2/dev\_data/raw/train/` and `data/dcase2025t2/dev\_data/raw/test/`.
23-
<!-- - data\_download\_2025add.sh
23+
- data\_download\_2025add.sh **Newly added!! (2025/05/15)**
2424
- "Additional train dataset for Evaluation":
25-
- This script downloads Addition data files and puts them into "data/dcase2025t2/eval\_data/raw/train/".
26-
- data\_download\_2025eval.sh
25+
- This script downloads the additional training data files and puts them into `data/dcase2025t2/eval\_data/raw/train/`.
26+
<!-- - data\_download\_2025eval.sh
2727
- "Additional test dataset for Evaluation"
2828
- This script downloads evaluation data files and puts them into "data/dcase2025t2/eval\_data/raw/test". -->
2929

30-
- 01_train_2025t2.sh **Newly added!! (2025/04/01)**
31-
- "Development" mode:
30+
- 01_train_2025t2.sh
31+
- "Development" mode: **Updated on (2025/04/01)**
3232
- This script trains a model for each machine type for each section ID by using the directory `data/dcase2025t2/dev_data/raw/<machine_type>/train/<section_id>`.
33-
<!-- - "Evaluation" mode:
34-
- This script trains a model for each machine type for each section ID by using the directory `data/dcase2025t2/eval_data/raw/<machine_type>/train/<section_id>`. -->
33+
- "Evaluation" mode: **Newly added!! (2025/05/15)**
34+
- This script trains a model for each machine type for each section ID by using the directory `data/dcase2025t2/eval_data/raw/<machine_type>/train/<section_id>`.
3535

36-
- 02a_test_2025t2.sh (Use MSE as a score function for the Simple Autoencoder mode) **Newly added!! (2025/04/01)**
36+
- 02a_test_2025t2.sh (Use MSE as a score function for the Simple Autoencoder mode) **Updated on (2025/04/01)**
3737
- "Development" mode:
3838
- This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2025t2/dev_data/raw/<machine_type>/test/`.
3939
- The CSV files will be stored in the directory `results/`.
@@ -42,7 +42,7 @@ This system consists of three main scripts (01_train.sh, 02a_test.sh, and 02b_te
4242
- This script makes a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2025t2/eval_data/raw/<machine_type>/test/`. (These directories will be made available with the "evaluation dataset".)
4343
- The CSV files are stored in the directory `results/`. -->
4444

45-
- 02b_test_2025t2.sh (Use Mahalanobis distance as a score function for the Selective Mahalanobis mode) **Newly added!! (2025/04/01)**
45+
- 02b_test_2025t2.sh (Use Mahalanobis distance as a score function for the Selective Mahalanobis mode) **Updated on (2025/04/01)**
4646
- "Development" mode:
4747
- This script makes a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2025t2/dev_data/raw/<machine_type>/test/`.
4848
- The CSV files will be stored in the directory `results/`.
@@ -70,9 +70,9 @@ We will launch the datasets in three stages. Therefore, please download the data
7070
+ DCASE 2025 Challenge Task 2
7171
+ "Development Dataset" **New! (2025/04/01)**
7272
+ Download "dev\_data_<machine_type>.zip" from [https://zenodo.org/records/15097779](https://zenodo.org/records/15097779).
73-
<!-- + "Additional Training Dataset", i.e., the evaluation dataset for training
74-
+ Download "eval\_data_<machine_type>_train.zip" from []().
75-
+ "Evaluation Dataset", i.e., the evaluation dataset for test
73+
+ "Additional Training Dataset", i.e., the evaluation dataset for training **New! (2025/05/15)**
74+
+ Download "eval\_data_<machine_type>_train.zip" from [https://zenodo.org/records/15392814](https://zenodo.org/records/15392814).
75+
<!-- + "Evaluation Dataset", i.e., the evaluation dataset for test
7676
+ Download "eval\_data_<machine_type>_test.zip" from [](). -->
7777

7878
+ DCASE 2024 Challenge Task 2 (C.f., for DCASE2024T2, see [README_legacy](README_legacy.md))
@@ -121,32 +121,35 @@ We will launch the datasets in three stages. Therefore, please download the data
121121
+ ...
122122
+ section\_00\_target\_test\_anomaly\_0049\_.wav
123123
+ attributes\_00.csv (attributes CSV for section 00)
124-
+ + \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.)
125-
<!-- + data/dcase2025t2/eval\_data/raw/
124+
+ \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.)
125+
+ data/dcase2025t2/eval\_data/raw/
126126
+ \<machine\_type0\_of\_additional\_dataset\>/
127+
+ supplemental/ (after launch of the additional training dataset)
128+
+ section\_00\_machine\_0001\_.wav
129+
+ ...
130+
+ section\_00\_machine\_0100\_.wav
127131
+ train/ (after launch of the additional training dataset)
128132
+ section\_00\_source\_train\_normal\_0000\_.wav
129133
+ ...
130134
+ section\_00\_source\_train\_normal\_0989\_.wav
131135
+ section\_00\_target\_train\_normal\_0000\_.wav
132136
+ ...
133137
+ section\_00\_target\_train\_normal\_0009\_.wav
138+
<!-- + test/ (after launch of the evaluation dataset)
139+
+ section\_00\_test\_0000.wav
140+
+ ...
141+
+ section\_00\_test\_0199.wav
142+
+ test_rename/ (convert from test directory using `tools/rename.py`)
143+
+ /section\_00\_source\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
144+
+ ...
145+
+ /section\_00\_source\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
146+
+ ...
147+
+ /section\_00\_target\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
148+
+ ...
149+
+ /section\_00\_target\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
150+
+ ... -->
134151
+ attributes\_00.csv (attributes CSV for section 00)
135-
+ test/ (after launch of the evaluation dataset)
136-
+ section\_00\_test\_0000.wav
137-
+ ...
138-
+ section\_00\_test\_0199.wav
139-
+ /test_rename (convert from test directory using `tools/rename.py`)
140-
+ /section\_00\_source\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
141-
+ ...
142-
+ /section\_00\_source\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
143-
+ ...
144-
+ /section\_00\_target\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
145-
+ ...
146-
+ /section\_00\_target\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
147-
+ ...
148-
+ attributes\_00.csv (attributes CSV for section 00)
149-
+ \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.) -->
152+
+ \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.)
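The file names in the directory listing above follow a fixed pattern (section ID, domain, split, label, index, then an optional attribute string). As a minimal sketch, assuming that pattern holds for every train and renamed test file, a name can be split into its fields as follows. The helper `parse_wav_name` and the field names are hypothetical and are not part of the repository.

```python
import re

# Pattern inferred from names such as section_00_source_train_normal_0000_.wav
# (an assumption based on the listing, not taken from the repository code).
FILE_NAME_PATTERN = re.compile(
    r"^section_(?P<section>\d{2})_(?P<domain>source|target)_"
    r"(?P<split>train|test)_(?P<label>normal|anomaly)_"
    r"(?P<index>\d{4})_(?P<attributes>.*)\.wav$"
)


def parse_wav_name(file_name):
    """Return the fields of a file name as a dict, or None if it does not match."""
    match = FILE_NAME_PATTERN.match(file_name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_wav_name("section_00_source_train_normal_0000_.wav"))
    # Supplemental and not-yet-renamed test files use a different layout.
    print(parse_wav_name("section_00_machine_0001_.wav"))
```

Names that do not carry a domain and label, such as the files under `supplemental/` or the raw `test/` files, return `None` in this sketch.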
150153

151154
### 4. Change parameters
152155

@@ -233,15 +236,15 @@ arithmetic mean,00,0.88,0.5078,0.5063157894736842,0.5536842105263158,0.492631578
233236
harmonic mean,00,0.88,0.5078,0.5063157894736842,0.5536842105263158,0.4926315789473684,0.0,0.0,0.0,0.0,0.0,0.
234237
```
235238

236-
### 8. Run training script for the additional training dataset (after May 15, 2025)
239+
### 8. Run training script for the additional training dataset **Newly added!! (2025/05/15)**
237240

238-
<!-- After the additional training dataset is launched, download and unzip it. Move it to `data/dcase2025t2/eval_data/raw/<machine_type>/train/`. Run the training script `01_train_2025t2.sh` with the option `-e`.
241+
After the additional training dataset is launched, download and unzip it. Move it to `data/dcase2025t2/eval_data/raw/<machine_type>/train/`. Run the training script `01_train_2025t2.sh` with the option `-e`.
239242

240243
```dotnetcli
241244
$ 01_train_2025t2.sh -e
242245
```
243246

244-
Models are trained by using the additional training dataset `data/dcase2025t2/raw/eval_data/<machine_type>/train/`. -->
247+
Models are trained by using the additional training dataset `data/dcase2025t2/eval_data/raw/<machine_type>/train/`.
245248

246249
### 9. Run the test script for the evaluation dataset (after June 1, 2025)
247250

@@ -271,17 +274,28 @@ If you use [rename script](./tools/rename_eval_wav.py) to generate `test_rename`
271274

272275
### 10. Summarize results
273276

274-
After the executed `02a_test_2025t2.sh`, `02b_test_2025t2.sh`, or both. Run the summarize script `03_summarize_results.sh` with the option `DCASE2025T2 -d` or `DCASE2025T2 -e`.
277+
After executing `02a_test_2025t2.sh`, `02b_test_2025t2.sh`, or both, run the summarize script `03_summarize_results.sh` with the option `DCASE2025T2 -d`.
275278

276279
```dotnetcli
277280
# Summarize development dataset 2025
278281
$ 03_summarize_results.sh DCASE2025T2 -d
279282
```
280283

281-
After the summary, the results are exported in CSV format to `results/dev_data/baseline/summarize/DCASE2025T2` or `results/eval_data/baseline/summarize/DCASE2025T2`.
284+
After the summary, the results are exported in CSV format to `results/dev_data/baseline/summarize/DCASE2025T2`.
282285

283286
If you want to change the summarized results directory or the export directory, edit `03_summarize_results.sh`.
284287

288+
<!-- After the executed `02a_test_2025t2.sh`, `02b_test_2025t2.sh`, or both. Run the summarize script `03_summarize_results.sh` with the option `DCASE2025T2 -d` or `DCASE2025T2 -e`.
289+
290+
```dotnetcli
291+
# Summarize development dataset 2025
292+
$ 03_summarize_results.sh DCASE2025T2 -d
293+
```
294+
295+
After the summary, the results are exported in CSV format to `results/dev_data/baseline/summarize/DCASE2025T2` or `results/eval_data/baseline/summarize/DCASE2025T2`.
296+
297+
If you want to change, summarize results directory or export directory, edit `03_summarize_results.sh`. -->
298+
285299
## Legacy support
286300

287301
This version accepts the legacy datasets provided for DCASE2020 task2, DCASE2021 task2, DCASE2022 task2, DCASE2023 task2, and DCASE2024 task2 as inputs.
@@ -314,6 +328,12 @@ We developed and tested the source code on Ubuntu 22.04.5 LTS.
314328

315329
## Change Log
316330

331+
### [4.1.0](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v4.1.0)
332+
333+
#### Added
334+
335+
- Provides support for the additional training datasets to be used in DCASE2025T2.
336+
317337
### [4.0.1](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v4.0.1)
318338

319339
#### Previous DCASE Task2 scripts have been moved into legacy

data_download_2025add.sh

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
mkdir -p "data/dcase2025t2/eval_data/raw"
2+
3+
# download dev data
4+
cd "data/dcase2025t2/eval_data/raw"
5+
for machine_type in \
6+
"ToyRCCar" \
7+
"ToyPet" \
8+
"HomeCamera" \
9+
"AutoTrash" \
10+
"Polisher" \
11+
"ScrewFeeder" \
12+
"BandSealer" \
13+
"CoffeeGrinder" \
14+
; do
15+
wget "https://zenodo.org/records/15392814/files/eval_data_${machine_type}_train.zip"
16+
unzip "eval_${machine_type}_train.zip"
17+
done
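For the download loop to work, the archive name passed to `unzip` has to match the file that `wget` saves, which is the last path segment of the URL. The following is a minimal sketch of that pairing, using the Zenodo record and machine types from the script. The helper `additional_train_archives` is hypothetical and is not part of the repository.

```python
# Machine types listed in data_download_2025add.sh
MACHINE_TYPES = [
    "ToyRCCar", "ToyPet", "HomeCamera", "AutoTrash",
    "Polisher", "ScrewFeeder", "BandSealer", "CoffeeGrinder",
]

# Zenodo record used by the script for the additional training dataset
BASE_URL = "https://zenodo.org/records/15392814/files"


def additional_train_archives(machine_types=MACHINE_TYPES):
    """Return (url, local_file_name) pairs; hypothetical helper for illustration."""
    pairs = []
    for machine_type in machine_types:
        # wget stores the download under the URL's final path segment
        file_name = f"eval_data_{machine_type}_train.zip"
        pairs.append((f"{BASE_URL}/{file_name}", file_name))
    return pairs


if __name__ == "__main__":
    for url, file_name in additional_train_archives():
        print(url, "->", file_name)
```

Deriving both the URL and the local file name from one variable avoids the two drifting apart.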

datasets/datasets.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -104,6 +104,14 @@ def __init__(self, args):
104104

105105
class Datasets:
106106
DatasetsDic = {
107+
'DCASE2025T2ToyRCCar':DCASE202XT2,
108+
'DCASE2025T2ToyPet':DCASE202XT2,
109+
'DCASE2025T2HomeCamera':DCASE202XT2,
110+
'DCASE2025T2AutoTrash':DCASE202XT2,
111+
'DCASE2025T2Polisher':DCASE202XT2,
112+
'DCASE2025T2ScrewFeeder':DCASE202XT2,
113+
'DCASE2025T2BandSealer':DCASE202XT2,
114+
'DCASE2025T2CoffeeGrinder':DCASE202XT2,
107115
'DCASE2025T2ToyCar':DCASE202XT2,
108116
'DCASE2025T2ToyTrain':DCASE202XT2,
109117
'DCASE2025T2bearing':DCASE202XT2,
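Each name in the `dataset_list` of `01_train_2025t2.sh` needs a matching key in `DatasetsDic`, otherwise the lookup for that machine type fails. The sketch below shows one way to check this, under the assumption that the registry is a plain dict keyed by dataset name. The placeholder class and the helper `missing_datasets` are hypothetical and only stand in for the real loader.

```python
class DCASE202XT2:
    """Placeholder standing in for the repository's loader class."""


# Machine types added for the additional training dataset
EVAL_MACHINE_TYPES = [
    "ToyRCCar", "ToyPet", "HomeCamera", "AutoTrash",
    "Polisher", "ScrewFeeder", "BandSealer", "CoffeeGrinder",
]

# Registry entries follow the "DCASE2025T2<machine_type>" naming seen in the diff
DATASETS = {f"DCASE2025T2{name}": DCASE202XT2 for name in EVAL_MACHINE_TYPES}


def missing_datasets(names, registry):
    """Return the dataset names that have no entry in the registry."""
    return [name for name in names if name not in registry]


if __name__ == "__main__":
    requested = ["DCASE2025T2ToyPet", "DCASE2025T2Polisher"]
    print(missing_datasets(requested, DATASETS))
```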

datasets/download_path_2025.yaml

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,3 +20,27 @@ DCASE2025T2:
2020
valve:
2121
dev:
2222
- https://zenodo.org/records/15097779/files/dev_valve.zip
23+
ToyRCCar:
24+
eval:
25+
- https://zenodo.org/records/15392814/files/eval_data_ToyRCCar_train.zip
26+
ToyPet:
27+
eval:
28+
- https://zenodo.org/records/15392814/files/eval_data_ToyPet_train.zip
29+
HomeCamera:
30+
eval:
31+
- https://zenodo.org/records/15392814/files/eval_data_HomeCamera_train.zip
32+
AutoTrash:
33+
eval:
34+
- https://zenodo.org/records/15392814/files/eval_data_AutoTrash_train.zip
35+
Polisher:
36+
eval:
37+
- https://zenodo.org/records/15392814/files/eval_data_Polisher_train.zip
38+
ScrewFeeder:
39+
eval:
40+
- https://zenodo.org/records/15392814/files/eval_data_ScrewFeeder_train.zip
41+
BandSealer:
42+
eval:
43+
- https://zenodo.org/records/15392814/files/eval_data_BandSealer_train.zip
44+
CoffeeGrinder:
45+
eval:
46+
- https://zenodo.org/records/15392814/files/eval_data_CoffeeGrinder_train.zip

datasets/loader_common.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -473,6 +473,7 @@ def is_enabled_pickle(pickle_path):
473473
"DCASE2024T2_dev":"datasets/machine_type_2024_dev.yaml",
474474
"DCASE2024T2_eval":"datasets/machine_type_2024_eval.yaml",
475475
"DCASE2025T2_dev":"datasets/machine_type_2025_dev.yaml",
476+
"DCASE2025T2_eval":"datasets/machine_type_2025_eval.yaml",
476477
}
477478

478479
def get_machine_type_dict(dataset_name, mode=True):
@@ -488,6 +489,8 @@ def get_machine_type_dict(dataset_name, mode=True):
488489
yaml_path = YAML_PATH["DCASE2024T2_eval"]
489490
elif dataset_name == "DCASE2025T2" and mode:
490491
yaml_path = YAML_PATH["DCASE2025T2_dev"]
492+
elif dataset_name == "DCASE2025T2" and not mode:
493+
yaml_path = YAML_PATH["DCASE2025T2_eval"]
491494
else:
492495
raise KeyError()
493496
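The change above makes `get_machine_type_dict` pick the eval YAML when `mode` is false. The selection can be sketched as a table lookup, shown here with only the 2025 entries. This is an illustrative variant, not the repository's `if`/`elif` chain, and the helper name `select_yaml_path` is hypothetical.

```python
# Subset of the YAML_PATH table shown in the diff (2025 entries only)
YAML_PATH = {
    "DCASE2025T2_dev": "datasets/machine_type_2025_dev.yaml",
    "DCASE2025T2_eval": "datasets/machine_type_2025_eval.yaml",
}


def select_yaml_path(dataset_name, mode=True):
    """Hypothetical helper: mode=True selects dev, mode=False selects eval."""
    key = f"{dataset_name}_{'dev' if mode else 'eval'}"
    if key not in YAML_PATH:
        # Same failure type as the original function for unknown datasets
        raise KeyError(key)
    return YAML_PATH[key]


if __name__ == "__main__":
    print(select_yaml_path("DCASE2025T2", mode=True))
    print(select_yaml_path("DCASE2025T2", mode=False))
```

Building the key from the dataset name and mode means a new dataset year only needs new table entries, although the explicit branches in the repository are easier to search for.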

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+DCASE2025T2:
+  machine_type:
+    ToyRCCar:
+      eval:
+        - "00"
+    ToyPet:
+      eval:
+        - "00"
+    HomeCamera:
+      eval:
+        - "00"
+    AutoTrash:
+      eval:
+        - "00"
+    Polisher:
+      eval:
+        - "00"
+    ScrewFeeder:
+      eval:
+        - "00"
+    BandSealer:
+      eval:
+        - "00"
+    CoffeeGrinder:
+      eval:
+        - "00"
+  section_keyword: section
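The YAML above lists one section, `"00"`, for each of the eight evaluation machine types. As a rough sketch, the mapping can be combined with the training directory layout described in the README (`data/dcase2025t2/eval_data/raw/<machine_type>/train/`) to enumerate what evaluation-mode training would read. The dict below mirrors only the machine-type entries, and the helper `eval_train_targets` is hypothetical rather than repository code.

```python
from pathlib import PurePosixPath

# Mirror of the machine-type entries in the YAML (one section per machine type)
MACHINE_TYPE_SECTIONS = {
    name: {"eval": ["00"]}
    for name in [
        "ToyRCCar", "ToyPet", "HomeCamera", "AutoTrash",
        "Polisher", "ScrewFeeder", "BandSealer", "CoffeeGrinder",
    ]
}

# Root of the additional training data, as given in the README
EVAL_RAW_ROOT = PurePosixPath("data/dcase2025t2/eval_data/raw")


def eval_train_targets(sections=MACHINE_TYPE_SECTIONS, root=EVAL_RAW_ROOT):
    """Return (machine_type, section_id, train_dir) tuples for evaluation mode."""
    targets = []
    for machine_type, modes in sections.items():
        for section_id in modes["eval"]:
            targets.append((machine_type, section_id, str(root / machine_type / "train")))
    return targets


if __name__ == "__main__":
    for target in eval_train_targets():
        print(target)
```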
