Commit 887e67a

Supported Additional Training Dataset (#11)
* Update README.md
* Update README_legacy.md
* supported additional training dataset
* Update README.md

---------

Co-authored-by: Noboru Harada <64912994+noboru2000@users.noreply.github.com>
1 parent b4185b2 commit 887e67a

7 files changed: +150 -17 lines changed

01_train_2024t2.sh

Lines changed: 11 additions & 2 deletions
@@ -31,8 +31,17 @@ then
     dataset_list="DCASE2024T2bearing DCASE2024T2fan DCASE2024T2gearbox DCASE2024T2slider DCASE2024T2ToyCar DCASE2024T2ToyTrain DCASE2024T2valve"
 elif [ "${dev_eval}" = "-e" ] || [ "${dev_eval}" = "--eval" ]
 then
-    echo eval data has not been published yet.
-    exit 1
+    dataset_list="\
+        DCASE2024T23DPrinter \
+        DCASE2024T2AirCompressor \
+        DCASE2024T2Scanner \
+        DCASE2024T2ToyCircuit \
+        DCASE2024T2HoveringDrone \
+        DCASE2024T2HairDryer \
+        DCASE2024T2ToothBrush \
+        DCASE2024T2RoboticArm \
+        DCASE2024T2BrushlessMotor \
+    "
 fi
 
 for dataset in $dataset_list; do
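
The mode switch in this hunk can be sketched in Python as follows. This is only an illustration of the selection logic, not code from the repository: the function name `build_dataset_list` is hypothetical, and the `-d`/`--dev` flags are an assumption, since the diff shows only the `-e`/`--eval` branch.

```python
# Machine types copied from the two dataset_list values in the hunk above.
DEV_MACHINE_TYPES = [
    "bearing", "fan", "gearbox", "slider", "ToyCar", "ToyTrain", "valve",
]
EVAL_MACHINE_TYPES = [
    "3DPrinter", "AirCompressor", "Scanner", "ToyCircuit", "HoveringDrone",
    "HairDryer", "ToothBrush", "RoboticArm", "BrushlessMotor",
]


def build_dataset_list(dev_eval):
    """Return the dataset names to train on for the given mode flag.

    Hypothetical helper; "-d"/"--dev" are assumed flag spellings.
    """
    if dev_eval in ("-d", "--dev"):
        machine_types = DEV_MACHINE_TYPES
    elif dev_eval in ("-e", "--eval"):
        machine_types = EVAL_MACHINE_TYPES
    else:
        raise ValueError(f"unknown mode flag: {dev_eval!r}")
    # Each dataset name is the fixed prefix followed by the machine type.
    return [f"DCASE2024T2{machine_type}" for machine_type in machine_types]


if __name__ == "__main__":
    for dataset in build_dataset_list("--eval"):
        print(dataset)
```

Keeping the machine types in a list, rather than one long string, makes it harder to drop a separator when a new type is added.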

README.md

Lines changed: 61 additions & 12 deletions
@@ -19,21 +19,26 @@ This system consists of three main scripts (01_train.sh, 02a_test.sh, and 02b_te
 - Helper scripts for DCASE2024T2
   - data\_download\_2024dev.sh
     - "Development dataset":
-      - This script downloads development data files and puts them into "data/dcase2024t2/dev\_data/raw/train/" and "data/dcase2024t2/dev\_data/raw/test/". **Newly added!!**
+      - This script downloads development data files and puts them into "data/dcase2024t2/dev\_data/raw/train/" and "data/dcase2024t2/dev\_data/raw/test/".
+  - data\_download\_2024add.sh **Newly added!!**
+    - "Additional training dataset for Evaluation":
+      - This script downloads the additional training data files and puts them into "data/dcase2024t2/eval\_data/raw/train/". **Newly added!!**
 
   - 01_train_2024t2.sh
     - "Development" mode:
-      - This script trains a model for each machine type for each section ID by using the directory `data/dcase2024t2/dev_data/raw/<machine_type>/train/<section_id>`. **Newly added!!**
+      - This script trains a model for each machine type for each section ID by using the directory `data/dcase2024t2/dev_data/raw/<machine_type>/train/<section_id>`.
+    - "Evaluation" mode:
+      - This script trains a model for each machine type for each section ID by using the directory `data/dcase2024t2/eval_data/raw/<machine_type>/train/<section_id>`. **Newly added!!**
 
   - 02a_test_2024t2.sh (Use MSE as a score function for the Simple Autoencoder mode)
     - "Development" mode:
-      - This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`. **Newly added!!**
+      - This script makes a CSV file for each section, including the anomaly scores for each WAV file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`.
       - The CSV files will be stored in the directory `results/`.
       - It also makes a csv file including AUC, pAUC, precision, recall, and F1-score for each section.
 
   - 02b_test_2024t2.sh (Use Mahalanobis distance as a score function for the Selective Mahalanobis mode)
     - "Development" mode:
-      - This script makes a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`. **Newly added!!**
+      - This script makes a CSV file for each section, including the anomaly scores for each wav file in the directories `data/dcase2024t2/dev_data/raw/<machine_type>/test/`.
       - The CSV files will be stored in the directory `results/`.
       - It also makes a csv file including AUC, pAUC, precision, recall, and F1-score for each section.
 

@@ -55,8 +60,9 @@ We will launch the datasets in three stages. Therefore, please download the data
 
 + DCASE 2024 Challenge Task 2
   + "Development Dataset" **New! (2024/04/01)**
-    + Download "dev\_data_<machine_type>.zip" from
-    [https://zenodo.org/records/10902294](https://zenodo.org/records/10902294).
+    + Download "dev\_data_<machine_type>.zip" from [https://zenodo.org/records/10902294](https://zenodo.org/records/10902294).
+  + "Additional Training Dataset", i.e., the evaluation dataset for training **New! (2024/05/15)**
+    + Download "eval\_data_<machine_type>_train.zip" from [https://zenodo.org/records/11183284](https://zenodo.org/records/11183284).
 
 + For DCASE 2023 Challenge Task 2
   (C.f., for DCASE2023T2, see [README_legacy](README_legacy.md))
@@ -96,6 +102,16 @@ We will launch the datasets in three stages. Therefore, please download the data
             + attributes\_00.csv (attributes CSV for section 00)
         + gearbox/ (The other machine types have the same directory structure as fan.)
     + data/dcase2024t2/eval\_data/raw/
+        + \<machine\_type0\_of\_additional\_dataset\>/
+            + train/ (after launch of the additional training dataset)
+                + section\_00\_source\_train\_normal\_0000\_.wav
+                + ...
+                + section\_00\_source\_train\_normal\_0989\_.wav
+                + section\_00\_target\_train\_normal\_0000\_.wav
+                + ...
+                + section\_00\_target\_train\_normal\_0009\_.wav
+            + attributes\_00.csv (attributes CSV for section 00)
+        + \<machine\_type1\_of\_additional\_dataset\> (The other machine types have the same directory structure as \<machine\_type0\_of\_additional\_dataset\>/.)
 
 ### 4. Change parameters
 

@@ -242,7 +258,7 @@ The Legacy support scripts are similar to the main scripts. These are in `tools`
 
 ## Dependency
 
-We developed and tested the source code on Ubuntu 18.04.6 LTS.
+We developed and tested the source code on Ubuntu 20.04.4 LTS.
 
 ### Software package
 

@@ -264,12 +280,33 @@ We developed and tested the source code on Ubuntu 18.04.6 LTS.
 - fasteners == 0.18
 
 ## Change Log
+### [3.1.0](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.1.0)
+
+#### Added
+
+- Provides support for the additional training datasets to be used in DCASE2024T2.
+
+### [3.0.2](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.0.2)
+
+#### Added
+
+- Added information about ground truth and citations for each year's task in README.md and README_legacy.md.
+
+### [3.0.1](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.0.1)
+
+#### Added
+
+- Provides support for the legacy datasets used in DCASE2020, 2021, 2022, and 2023.
+
+#### Fixed
+
+- Fixed a typo in README.md in the previous release, v3.0.0.
 
 ### [3.0.0](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v3.0.0)
 
 #### Added
 
-- provides support for the datasets used in DCASE2024.
+- Provides support for the development datasets used in DCASE2024.
 
 ### [2.0.1](https://github.com/nttcslab/dcase2023_task2_baseline_ae/releases/tag/v2.0.1)
 

@@ -282,7 +319,22 @@ We developed and tested the source code on Ubuntu 18.04.6 LTS.
 
 #### Added
 
-- provides support for the legacy datasets used in DCASE2020, 2021, 2022, and 2023.
+- Provides support for the legacy datasets used in DCASE2020, 2021, 2022, and 2023.
+
+## Truth attribute of evaluation data
+
+### Public ground truth
+
+The following code was used to calculate the official score. It also includes the ground truth for the evaluation datasets.
+
+- [dcase2023_task2_evaluator](https://github.com/nttcslab/dcase2023_task2_evaluator)
+
+### In this repository
+
+This repository contains a CSV file with the ground truth for the evaluation data. This CSV file is used to rename the evaluation dataset files.
+You can calculate AUC and other scores once the ground truth has been added to the evaluation dataset file names. *Usually, the rename function is executed along with the [download script](#description) and the [auto download function](#41-enable-auto-download-dataset).
+
+- [DCASE2023 task2](datasets/eval_data_list_2023.csv)
 
 ## Truth attribute of evaluation data
 

@@ -309,6 +361,3 @@ If you use this system, please cite all the following four papers:
 + Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, Shoichiro Saito, "ToyADMOS2: Another Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection under Domain Shift Conditions," in Proc. DCASE 2021 Workshop, 2021. [URL](https://dcase.community/documents/workshop2021/proceedings/DCASE2021Workshop_Harada_6.pdf)
 + Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, and Yohei Kawaguchi, "MIMII DG: sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task," in Proc. DCASE 2022 Workshop, 2022. [URL](https://dcase.community/documents/workshop2022/proceedings/DCASE2022Workshop_Dohi_62.pdf)
 + Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi and Masahiro Yasuda, "First-Shot Anomaly Sound Detection for Machine Condition Monitoring: A Domain Generalization Baseline," 2023 31st European Signal Processing Conference (EUSIPCO), Helsinki, Finland, 2023, pp. 191-195, doi: 10.23919/EUSIPCO58844.2023.10289721. [URL](https://ieeexplore.ieee.org/document/10289721)
-
-
-
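
The file names listed in the README's directory tree follow a regular pattern, which could be parsed along these lines. This is a minimal sketch, not part of the repository: the function name and regular expression are hypothetical, and the pattern is inferred only from the example names shown in the README (it accepts any text between the trailing underscore and `.wav`).

```python
import re

# Pattern inferred from names such as
# "section_00_source_train_normal_0000_.wav"; this is an assumption,
# not a specification taken from the dataset documentation.
TRAIN_FILENAME_RE = re.compile(
    r"^section_(?P<section>\d{2})"
    r"_(?P<domain>source|target)"
    r"_train_normal_(?P<index>\d{4})_.*\.wav$"
)


def parse_train_filename(name):
    """Split a training clip name into section, domain, and clip index.

    Returns None when the name does not follow the assumed pattern.
    """
    match = TRAIN_FILENAME_RE.match(name)
    if match is None:
        return None
    return {
        "section": match["section"],
        "domain": match["domain"],
        "index": int(match["index"]),
    }


if __name__ == "__main__":
    print(parse_train_filename("section_00_target_train_normal_0009_.wav"))
```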

data_download_2024add.sh

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+mkdir -p "data/dcase2024t2/eval_data/raw"
+
+# download eval data
+cd "data/dcase2024t2/eval_data/raw"
+for machine_type in 3DPrinter AirCompressor Scanner ToyCircuit HoveringDrone HairDryer ToothBrush RoboticArm BrushlessMotor; do
+    wget "https://zenodo.org/records/11183284/files/eval_data_${machine_type}_train.zip"
+    unzip "eval_data_${machine_type}_train.zip"
+done
+
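
For readers who prefer to stay in Python, the same download loop might look roughly like the sketch below. It is an illustration using only the standard library, not the repository's own downloader; the function names `archive_url` and `download_additional_data` are hypothetical, while the record URL, archive naming, and target directory are taken from the script above.

```python
import urllib.request
import zipfile
from pathlib import Path

# Base URL and machine types as they appear in data_download_2024add.sh.
ZENODO_FILES = "https://zenodo.org/records/11183284/files"
MACHINE_TYPES = [
    "3DPrinter", "AirCompressor", "Scanner", "ToyCircuit", "HoveringDrone",
    "HairDryer", "ToothBrush", "RoboticArm", "BrushlessMotor",
]


def archive_url(machine_type):
    """Build the archive URL for one machine type (hypothetical helper)."""
    return f"{ZENODO_FILES}/eval_data_{machine_type}_train.zip"


def download_additional_data(dest="data/dcase2024t2/eval_data/raw"):
    """Download and unpack every archive into dest (hypothetical helper)."""
    dest_dir = Path(dest)
    # Create the same directory that the archives are extracted into.
    dest_dir.mkdir(parents=True, exist_ok=True)
    for machine_type in MACHINE_TYPES:
        url = archive_url(machine_type)
        zip_path = dest_dir / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, str(zip_path))
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(dest_dir)


if __name__ == "__main__":
    download_additional_data()
```

Deriving the created directory and the extraction directory from one variable avoids the two paths drifting apart.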

datasets/datasets.py

Lines changed: 9 additions & 0 deletions
@@ -104,6 +104,15 @@ def __init__(self, args):
 
 class Datasets:
     DatasetsDic = {
+        'DCASE2024T23DPrinter':DCASE202XT2,
+        'DCASE2024T2AirCompressor':DCASE202XT2,
+        'DCASE2024T2Scanner':DCASE202XT2,
+        'DCASE2024T2ToyCircuit':DCASE202XT2,
+        'DCASE2024T2HoveringDrone':DCASE202XT2,
+        'DCASE2024T2HairDryer':DCASE202XT2,
+        'DCASE2024T2ToothBrush':DCASE202XT2,
+        'DCASE2024T2RoboticArm':DCASE202XT2,
+        'DCASE2024T2BrushlessMotor':DCASE202XT2,
         'DCASE2024T2ToyCar':DCASE202XT2,
         'DCASE2024T2ToyTrain':DCASE202XT2,
         'DCASE2024T2bearing':DCASE202XT2,
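
Because every 2024 entry maps to the same loader class, the registry could also be generated from a list of machine types. The sketch below only illustrates that idea and is not how the repository defines `DatasetsDic`; the `DCASE202XT2` class here is an empty stand-in for the real loader, and `MACHINE_TYPES_2024` is a hypothetical name.

```python
class DCASE202XT2:
    """Empty stand-in for the real loader class (placeholder only)."""

    def __init__(self, args):
        self.args = args


# Development and additional-training machine types named in this commit.
MACHINE_TYPES_2024 = [
    "bearing", "fan", "gearbox", "slider", "ToyCar", "ToyTrain", "valve",
    "3DPrinter", "AirCompressor", "Scanner", "ToyCircuit", "HoveringDrone",
    "HairDryer", "ToothBrush", "RoboticArm", "BrushlessMotor",
]

# One registry entry per machine type, all pointing at the same class.
DatasetsDic = {
    f"DCASE2024T2{machine_type}": DCASE202XT2
    for machine_type in MACHINE_TYPES_2024
}

if __name__ == "__main__":
    loader_class = DatasetsDic["DCASE2024T2HairDryer"]
    print(loader_class.__name__)
```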

datasets/download_path_2024.yaml

Lines changed: 27 additions & 0 deletions
@@ -20,3 +20,30 @@ DCASE2024T2:
   valve:
     dev:
       - https://zenodo.org/record/10902294/files/dev_valve.zip
+  3DPrinter:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_3DPrinter_train.zip
+  AirCompressor:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_AirCompressor_train.zip
+  Scanner:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_Scanner_train.zip
+  ToyCircuit:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_ToyCircuit_train.zip
+  HoveringDrone:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_HoveringDrone_train.zip
+  HairDryer:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_HairDryer_train.zip
+  ToothBrush:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_ToothBrush_train.zip
+  RoboticArm:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_RoboticArm_train.zip
+  BrushlessMotor:
+    eval:
+      - https://zenodo.org/records/11183284/files/eval_data_BrushlessMotor_train.zip

datasets/loader_common.py

Lines changed: 3 additions & 3 deletions
@@ -414,7 +414,7 @@ def download_raw_data(
             for split_data_path in split_data_path_list:
                 shutil.copytree(split_data_path, test_data_path, dirs_exist_ok=True)
 
-        if data_type == "eval":
+        if data_type == "eval" and dataset != "DCASE2024T2":
             rename_wav(
                 dataset_parent_dir=root,
                 dataset_type=dataset,
@@ -468,6 +468,7 @@ def is_enabled_pickle(pickle_path):
     "DCASE2023T2_dev":"datasets/machine_type_2023_dev.yaml",
     "DCASE2023T2_eval":"datasets/machine_type_2023_eval.yaml",
     "DCASE2024T2_dev":"datasets/machine_type_2024_dev.yaml",
+    "DCASE2024T2_eval":"datasets/machine_type_2024_eval.yaml",
 }
 
 def get_machine_type_dict(dataset_name, mode=True):
@@ -480,8 +481,7 @@ def get_machine_type_dict(dataset_name, mode=True):
     elif dataset_name == "DCASE2024T2" and mode:
         yaml_path = YAML_PATH["DCASE2024T2_dev"]
     elif dataset_name == "DCASE2024T2" and not mode:
-        raise ValueError("DCASE2024T2 eval data has not been published yet.")
-        # yaml_path = YAML_PATH["DCASE2024T2_eval"]
+        yaml_path = YAML_PATH["DCASE2024T2_eval"]
     else:
         raise KeyError()
 
487487

datasets/machine_type_2024_eval.yaml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+DCASE2024T2:
+  machine_type:
+    3DPrinter:
+      eval:
+        - "00"
+    AirCompressor:
+      eval:
+        - "00"
+    Scanner:
+      eval:
+        - "00"
+    ToyCircuit:
+      eval:
+        - "00"
+    HoveringDrone:
+      eval:
+        - "00"
+    HairDryer:
+      eval:
+        - "00"
+    ToothBrush:
+      eval:
+        - "00"
+    RoboticArm:
+      eval:
+        - "00"
+    BrushlessMotor:
+      eval:
+        - "00"
+  section_keyword: section
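
Once loaded, this file amounts to a nested mapping from machine type to section IDs. The sketch below shows one way such a mapping could be walked to list the training directories described in the README. It is an illustration only: the dictionary is written out by hand instead of being read from the YAML file, the nesting of `section_keyword` under `DCASE2024T2` is an assumption, and `eval_train_targets` is a hypothetical name.

```python
MACHINE_TYPES = [
    "3DPrinter", "AirCompressor", "Scanner", "ToyCircuit", "HoveringDrone",
    "HairDryer", "ToothBrush", "RoboticArm", "BrushlessMotor",
]

# Hand-written equivalent of the YAML content (assumed nesting).
MACHINE_TYPE_CONFIG = {
    "DCASE2024T2": {
        "machine_type": {name: {"eval": ["00"]} for name in MACHINE_TYPES},
        "section_keyword": "section",
    }
}


def eval_train_targets(config, root="data/dcase2024t2/eval_data/raw"):
    """List (machine type, section prefix, train directory) triples."""
    entry = config["DCASE2024T2"]
    keyword = entry["section_keyword"]
    targets = []
    for machine_type, modes in entry["machine_type"].items():
        for section_id in modes.get("eval", []):
            # File names in each train directory start with e.g. "section_00".
            prefix = f"{keyword}_{section_id}"
            targets.append((machine_type, prefix, f"{root}/{machine_type}/train"))
    return targets


if __name__ == "__main__":
    for target in eval_train_targets(MACHINE_TYPE_CONFIG):
        print(target)
```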
