Supported Additional Training Dataset for DCASE2025T2 (#22)

YuriMusashijima · noboru2000 · web-flow · commit da65073d3173 · 2025-05-15T19:49:09.000+09:00
* Updated DCASE2025 Task2

* Update README.md

Fixed some typos

* Previous DCASE scripts have been unified into legacy

* Added support for Additional training dataset

---------

Co-authored-by: Noboru Harada <64912994+noboru2000@users.noreply.github.com>
diff --git a/01_train_2025t2.sh b/01_train_2025t2.sh
@@ -39,8 +39,16 @@ then
     "
 elif [ "${dev_eval}" = "-e" ] || [ "${dev_eval}" = "--eval" ]
 then
-    echo dcase2025 task2 eval data are not publish
-    exit
+    dataset_list="\
+        DCASE2025T2ToyRCCar \
+        DCASE2025T2ToyPet \
+        DCASE2025T2HomeCamera \
+        DCASE2025T2AutoTrash \
+        DCASE2025T2Polisher \
+        DCASE2025T2ScrewFeeder \
+        DCASE2025T2BandSealer \
+        DCASE2025T2CoffeeGrinder \
+    "
 fi
 
 for dataset in $dataset_list; do
diff --git a/README.md b/README.md
@@ -17,23 +17,23 @@ Differences between the previous dcase2022\_baseline\_ae and this version are as
 This system consists of three main scripts (01_train.sh, 02a_test.sh, and 02b_test.sh) with some helper scripts for DCASE2025T2 (For DCASE2024T2 and DCASE2023T2, see [README_legacy](README_legacy.md)):
 
 - Helper scripts for DCASE2025T2
-  - data\_download\_2025dev.sh **Newly added!! (2025/04/01)**
+  - data\_download\_2025dev.sh **Updated on (2025/04/01)**
     - "Development dataset":
       - This script downloads development data files and puts them into `data/dcase2025t2/dev\_data/raw/train/` and `data/dcase2025t2/dev\_data/raw/test/`.
-  <!-- - data\_download\_2025add.sh
+  - data\_download\_2025add.sh **Newly added!! (2025/05/15)**
     - "Additional train dataset for Evaluation":
-      - This script downloads Addition data files and puts them into "data/dcase2025t2/eval\_data/raw/train/".
-  - data\_download\_2025eval.sh
+      - This script downloads the additional training data files and puts them into `data/dcase2025t2/eval\_data/raw/train/`.
+  <!-- - data\_download\_2025eval.sh
     - "Additional test dataset for Evaluation"
       - This script downloads evaluation data files and puts them into "data/dcase2025t2/eval\_data/raw/test".  -->
 
-- 01_train_2025t2.sh **Newly added!! (2025/04/01)**
-  - "Development" mode:
+- 01_train_2025t2.sh
+  - "Development" mode: **Updated on (2025/04/01)**
     - This script trains a model for each machine type for each section ID by using the directory `data/dcase2025t2/dev_data/raw/<machine_type>/train/<section_id>`.
-  <!-- - "Evaluation" mode:
-    - This script trains a model for each machine type for each section ID by using the directory `data/dcase2025t2/eval_data/raw/<machine_type>/train/<section_id>`. -->
+  - "Evaluation" mode: **Newly added!! (2025/05/15)**
+    - This script trains a model for each machine type for each section ID by using the directory `data/dcase2025t2/eval_data/raw/<machine_type>/train/<section_id>`.
 
-- 02a_test_2025t2.sh (Use MSE as a score function for the Simple Autoencoder mode) **Newly added!! (2025/04/01)**
+- 02a_test_2025t2.sh (Use MSE as a score function for the Simple Autoencoder mode) **Updated on (2025/04/01)**
   - "Development" mode:
     - This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2025t2/dev_data/raw/<machine_type>/test/`.
     - The CSV files will be stored in the directory `results/`.
@@ -42,7 +42,7 @@ This system consists of three main scripts (01_train.sh, 02a_test.sh, and 02b_te
     - This script makes a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2025t2/eval_data/raw/<machine_type>/test/`. (These directories will be made available with the "evaluation dataset".)
     - The CSV files are stored in the directory `results/`. -->
 
-- 02b_test_2025t2.sh (Use Mahalanobis distance as a score function for the Selective Mahalanobis mode) **Newly added!! (2025/04/01)**
+- 02b_test_2025t2.sh (Use Mahalanobis distance as a score function for the Selective Mahalanobis mode) **Updated on (2025/04/01)**
   - "Development" mode:
     - This script makes a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2025t2/dev_data/raw/<machine_type>/test/`.
     - The CSV files will be stored in the directory `results/`.
@@ -70,9 +70,9 @@ We will launch the datasets in three stages. Therefore, please download the data
   + DCASE 2025 Challenge Task 2
     + "Development Dataset" **New! (2025/04/01)**
       + Download "dev\_data_<machine_type>.zip" from [https://zenodo.org/records/15097779](https://zenodo.org/records/15097779).
-    <!-- + "Additional Training Dataset", i.e., the evaluation dataset for training 
-      + Download "eval\_data_<machine_type>_train.zip" from []().
-    + "Evaluation Dataset", i.e., the evaluation dataset for test
+    + "Additional Training Dataset", i.e., the evaluation dataset for training  **New! (2025/05/15)**
+      + Download "eval\_data_<machine_type>_train.zip" from [https://zenodo.org/records/15392814](https://zenodo.org/records/15392814).
+    <!-- + "Evaluation Dataset", i.e., the evaluation dataset for test
       + Download "eval\_data_<machine_type>_test.zip" from [](). -->
 
   + DCASE 2024 Challenge Task 2 (C.f., for DCASE2024T2, see [README_legacy](README_legacy.md))
@@ -121,32 +121,35 @@ We will launch the datasets in three stages. Therefore, please download the data
           + ...
           + section\_00\_target\_test\_anomaly\_0049\_.wav
         + attributes\_00.csv (attributes CSV for section 00)
-      + + \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.)
-   <!-- + data/dcase2025t2/eval\_data/raw/
+      + \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.)
+   + data/dcase2025t2/eval\_data/raw/
      + \<machine\_type0\_of\_additional\_dataset\>/
+        + supplemental/ (after launch of the additional training dataset)
+          + section\_00\_machine\_0001\_.wav
+          + ...
+          + section\_00\_machine\_0100\_.wav
         + train/ (after launch of the additional training dataset)
           + section\_00\_source\_train\_normal\_0000\_.wav
           + ...
           + section\_00\_source\_train\_normal\_0989\_.wav
           + section\_00\_target\_train\_normal\_0000\_.wav
           + ...
           + section\_00\_target\_train\_normal\_0009\_.wav
+        <!-- + test/ (after launch of the evaluation dataset)
+          + section\_00\_test\_0000.wav
+          + ...
+          + section\_00\_test\_0199.wav
+        + test_rename/ (convert from test directory using `tools/rename.py`)
+          + /section\_00\_source\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
+          + ...
+          + /section\_00\_source\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
+          + ...
+          + /section\_00\_target\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav 
+          + ...
+          + /section\_00\_target\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav 
+          + ... -->
         + attributes\_00.csv (attributes CSV for section 00)
-      + test/ (after launch of the evaluation dataset)
-        + section\_00\_test\_0000.wav
-        + ...
-        + section\_00\_test\_0199.wav
-      + /test_rename (convert from test directory using `tools/rename.py`)
-        + /section\_00\_source\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav
-        + ...
-        + /section\_00\_source\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav
-        + ...
-        + /section\_00\_target\_test\_normal\_\<0000\~0200\>\_\<attribute\>.wav 
-        + ...
-        + /section\_00\_target\_test\_anomaly\_\<0000\~0200\>\_\<attribute\>.wav 
-        + ...
-      + attributes\_00.csv (attributes CSV for section 00)
-     + \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.) -->
+     + \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.)
 
 ### 4. Change parameters
 
@@ -233,15 +236,15 @@ arithmetic mean,00,0.88,0.5078,0.5063157894736842,0.5536842105263158,0.492631578
 harmonic mean,00,0.88,0.5078,0.5063157894736842,0.5536842105263158,0.4926315789473684,0.0,0.0,0.0,0.0,0.0,0.
 ```
 
-### 8. Run training script for the additional training dataset (after May 15, 2025)
+### 8. Run training script for the additional training dataset **Newly added!! (2025/05/15)**
 
-<!-- After the additional training dataset is launched, download and unzip it. Move it to `data/dcase2025t2/eval_data/raw/<machine_type>/train/`. Run the training script `01_train_2025t2.sh` with the option `-e`.
+After the additional training dataset is launched, download and unzip it. Move it to `data/dcase2025t2/eval_data/raw/<machine_type>/train/`. Run the training script `01_train_2025t2.sh` with the option `-e`.
 
 ```dotnetcli
 $ 01_train_2025t2.sh -e
 ```
 
-Models are trained by using the additional training dataset `data/dcase2025t2/raw/eval_data/<machine_type>/train/`. -->
+Models are trained by using the additional training dataset `data/dcase2025t2/eval_data/raw/<machine_type>/train/`.
 
 ### 9. Run the test script for the evaluation dataset (after June 1, 2025)
 
@@ -271,17 +274,28 @@ If you use [rename script](./tools/rename_eval_wav.py) to generate `test_rename`
 
 ### 10. Summarize results
 
-After the executed `02a_test_2025t2.sh`, `02b_test_2025t2.sh`, or both. Run the summarize script `03_summarize_results.sh` with the option `DCASE2025T2 -d` or `DCASE2025T2 -e`.
+After executing `02a_test_2025t2.sh`, `02b_test_2025t2.sh`, or both, run the summarize script `03_summarize_results.sh` with the option `DCASE2025T2 -d`.
 
 ```dotnetcli
 # Summarize development dataset 2025
 $ 03_summarize_results.sh DCASE2025T2 -d
 ```
 
-After the summary, the results are exported in CSV format to `results/dev_data/baseline/summarize/DCASE2025T2` or `results/eval_data/baseline/summarize/DCASE2025T2`.
+After the summary, the results are exported in CSV format to `results/dev_data/baseline/summarize/DCASE2025T2`.
 
 If you want to change the summarized results directory or the export directory, edit `03_summarize_results.sh`.
 
+<!-- After the executed `02a_test_2025t2.sh`, `02b_test_2025t2.sh`, or both. Run the summarize script `03_summarize_results.sh` with the option `DCASE2025T2 -d` or `DCASE2025T2 -e`.
+
+```dotnetcli
+# Summarize development dataset 2025
+$ 03_summarize_results.sh DCASE2025T2 -d
+```
+
+After the summary, the results are exported in CSV format to `results/dev_data/baseline/summarize/DCASE2025T2` or `results/eval_data/baseline/summarize/DCASE2025T2`.
+
+If you want to change, summarize results directory or export directory, edit `03_summarize_results.sh`. -->
+
 ## Legacy support
 
 This version takes the legacy datasets provided in DCASE2020 task2, DCASE2021 task2, DCASE2022 task2, DCASE2023 task2, and DCASE2024 task2 dataset for inputs.
@@ -314,6 +328,12 @@ We developed and tested the source code on Ubuntu 22.04.5 LTS.
 
 ## Change Log
 
+### [4.1.0](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v4.1.0)
+
+#### Added
+
+- Provides support for the additional training datasets to be used in DCASE2025T2.
+
 ### [4.0.1](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v4.0.1)
 
 #### Previous DCASE Task2 scripts have been moved into legacy
diff --git a/data_download_2025add.sh b/data_download_2025add.sh
@@ -0,0 +1,17 @@
+mkdir -p "data/dcase2025t2/eval_data/raw"
+
+# download additional training data (train split of eval_data)
+cd "data/dcase2025t2/eval_data/raw"
+for machine_type in \
+    "ToyRCCar" \
+    "ToyPet" \
+    "HomeCamera" \
+    "AutoTrash" \
+    "Polisher" \
+    "ScrewFeeder" \
+    "BandSealer" \
+    "CoffeeGrinder" \
+; do
+wget "https://zenodo.org/records/15392814/files/eval_data_${machine_type}_train.zip"
+unzip "eval_data_${machine_type}_train.zip"
+done
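
The download loop above fetches one archive per machine type, and the file that is unzipped has to be the same file that `wget` saved. A minimal Python sketch of that pairing follows. The record URL and the `eval_data_<machine_type>_train.zip` pattern come from this commit; the helper names `archive_name`, `archive_url`, and `download_plan` are hypothetical and not part of the repository.

```python
# Sketch only: derive both the URL and the local archive name from one
# function so the download step and the unzip step cannot drift apart.
# Helper names are hypothetical; the URL pattern is taken from the commit.
ZENODO_FILES = "https://zenodo.org/records/15392814/files"

MACHINE_TYPES = [
    "ToyRCCar", "ToyPet", "HomeCamera", "AutoTrash",
    "Polisher", "ScrewFeeder", "BandSealer", "CoffeeGrinder",
]


def archive_name(machine_type: str) -> str:
    """Local file name of the additional training archive."""
    return f"eval_data_{machine_type}_train.zip"


def archive_url(machine_type: str) -> str:
    """Remote URL of the same archive on the Zenodo record."""
    return f"{ZENODO_FILES}/{archive_name(machine_type)}"


def download_plan(machine_types=MACHINE_TYPES):
    """Return (url, local_file) pairs; no network access happens here."""
    return [(archive_url(m), archive_name(m)) for m in machine_types]


if __name__ == "__main__":
    for url, local_file in download_plan():
        print(f"{url} -> {local_file}")
```

Because the local name is computed once, each URL in the plan always ends with the file name that would later be passed to the extraction step.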
diff --git a/datasets/datasets.py b/datasets/datasets.py
@@ -104,6 +104,14 @@ def __init__(self, args):
 
 class Datasets:
     DatasetsDic = {
+        'DCASE2025T2ToyRCCar':DCASE202XT2,
+        'DCASE2025T2ToyPet':DCASE202XT2,
+        'DCASE2025T2HomeCamera':DCASE202XT2,
+        'DCASE2025T2AutoTrash':DCASE202XT2,
+        'DCASE2025T2Polisher':DCASE202XT2,
+        'DCASE2025T2ScrewFeeder':DCASE202XT2,
+        'DCASE2025T2BandSealer':DCASE202XT2,
+        'DCASE2025T2CoffeeGrinder':DCASE202XT2,
         'DCASE2025T2ToyCar':DCASE202XT2,
         'DCASE2025T2ToyTrain':DCASE202XT2,
         'DCASE2025T2bearing':DCASE202XT2,
diff --git a/datasets/download_path_2025.yaml b/datasets/download_path_2025.yaml
@@ -20,3 +20,27 @@ DCASE2025T2:
   valve:
     dev:
       - https://zenodo.org/records/15097779/files/dev_valve.zip
+  ToyRCCar:
+    eval:
+      - https://zenodo.org/records/15392814/files/eval_data_ToyRCCar_train.zip
+  ToyPet:
+    eval:
+      - https://zenodo.org/records/15392814/files/eval_data_ToyPet_train.zip
+  HomeCamera:
+    eval:
+      - https://zenodo.org/records/15392814/files/eval_data_HomeCamera_train.zip
+  AutoTrash:
+    eval:
+      - https://zenodo.org/records/15392814/files/eval_data_AutoTrash_train.zip
+  Polisher:
+    eval:
+      - https://zenodo.org/records/15392814/files/eval_data_Polisher_train.zip
+  ScrewFeeder:
+    eval:
+      - https://zenodo.org/records/15392814/files/eval_data_ScrewFeeder_train.zip
+  BandSealer:
+    eval:
+      - https://zenodo.org/records/15392814/files/eval_data_BandSealer_train.zip
+  CoffeeGrinder:
+    eval:
+      - https://zenodo.org/records/15392814/files/eval_data_CoffeeGrinder_train.zip
diff --git a/datasets/loader_common.py b/datasets/loader_common.py
@@ -473,6 +473,7 @@ def is_enabled_pickle(pickle_path):
     "DCASE2024T2_dev":"datasets/machine_type_2024_dev.yaml",
     "DCASE2024T2_eval":"datasets/machine_type_2024_eval.yaml",
     "DCASE2025T2_dev":"datasets/machine_type_2025_dev.yaml",
+    "DCASE2025T2_eval":"datasets/machine_type_2025_eval.yaml",
 }
 
 def get_machine_type_dict(dataset_name, mode=True):
@@ -488,6 +489,8 @@ def get_machine_type_dict(dataset_name, mode=True):
         yaml_path = YAML_PATH["DCASE2024T2_eval"]
     elif dataset_name == "DCASE2025T2" and mode:
         yaml_path = YAML_PATH["DCASE2025T2_dev"]
+    elif dataset_name == "DCASE2025T2" and not mode:
+        yaml_path = YAML_PATH["DCASE2025T2_eval"]
     else: 
         raise KeyError()
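
The branch added above selects the eval YAML when `mode` is false and the dev YAML when it is true. As an illustration only, the same selection can be written as a table lookup. This is a sketch under assumptions, not the repository's implementation: the function name `resolve_yaml_path` and the tuple-keyed table are hypothetical, and only the two DCASE2025T2 paths shown in this diff are included.

```python
# Sketch only: a (dataset_name, is_dev) lookup equivalent to the
# if/elif chain for DCASE2025T2. Names here are hypothetical.
YAML_PATH_BY_MODE = {
    ("DCASE2025T2", True): "datasets/machine_type_2025_dev.yaml",
    ("DCASE2025T2", False): "datasets/machine_type_2025_eval.yaml",
}


def resolve_yaml_path(dataset_name: str, mode: bool = True) -> str:
    """Return the machine-type YAML path; mode=True means development."""
    key = (dataset_name, bool(mode))
    if key not in YAML_PATH_BY_MODE:
        # Unlike a bare KeyError(), say which combination was rejected.
        raise KeyError(f"unsupported dataset/mode: {dataset_name!r}, mode={mode!r}")
    return YAML_PATH_BY_MODE[key]
```

A message on the raised `KeyError` makes an unsupported dataset name easier to diagnose than the empty exception in the original chain.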
     
diff --git a/datasets/machine_type_2025_eval.yaml b/datasets/machine_type_2025_eval.yaml
@@ -0,0 +1,27 @@
+DCASE2025T2:
+  machine_type:
+    ToyRCCar:
+      eval:
+        - "00"
+    ToyPet:
+      eval:
+        - "00"
+    HomeCamera:
+      eval:
+        - "00"
+    AutoTrash:
+      eval:
+        - "00"
+    Polisher:
+      eval:
+        - "00"
+    ScrewFeeder:
+      eval:
+        - "00"
+    BandSealer:
+      eval:
+        - "00"
+    CoffeeGrinder:
+      eval:
+        - "00"
+  section_keyword: section
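
To show how the structure of `machine_type_2025_eval.yaml` can be consumed, here is a minimal sketch that works on an already-parsed mapping (a plain dict mirroring two of the entries above, so no YAML library is needed). The function name `section_prefixes` is hypothetical, and combining `section_keyword` with the section ID into a `section_00` prefix is an assumption based on the file names listed in the README, not a description of the loader's actual code.

```python
# Sketch only: a dict that mirrors part of the YAML file added above.
# Section IDs stay strings ("00"), matching the quoted values in the YAML.
config = {
    "DCASE2025T2": {
        "machine_type": {
            "ToyRCCar": {"eval": ["00"]},
            "ToyPet": {"eval": ["00"]},
        },
        "section_keyword": "section",
    }
}


def section_prefixes(config: dict, dataset_name: str, split: str) -> dict:
    """Map each machine type to file-name prefixes such as 'section_00'."""
    entry = config[dataset_name]
    keyword = entry["section_keyword"]
    return {
        machine_type: [f"{keyword}_{sid}" for sid in splits.get(split, [])]
        for machine_type, splits in entry["machine_type"].items()
    }


if __name__ == "__main__":
    print(section_prefixes(config, "DCASE2025T2", "eval"))
```

Machine types that have no entry for the requested split map to an empty list rather than raising an error.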