
ADATime Tutorial


Tutorial: How to use the Domain Adaptation Benchmarking Suite ADATime?

Our base evaluation framework is ADATime, a benchmarking framework for unsupervised domain adaptation with a focus on time-series data. It was developed by Ragab, Eldele et al.; if you use ADATime, please cite their paper. Even though ADATime was developed for time-series data and unsupervised domain adaptation, it can be applied to further data types and domain adaptation scenarios in many ways. Moreover, its modular design makes it easy for the community to contribute by adding models, algorithms and datasets.

Figure 1: In ADATime you can benchmark different datasets, backbone networks and domain adaptation algorithms against each other.

In ADATime's terminology, a dataset consists of multiple domains; the domains of one dataset should share the same input channels and the same labels. A model consists of a Backbone Network and a Classifier Network, both of which are adapted by the domain adaptation algorithm. The output of the Backbone Network is fed to the Classifier Network, a final layer that outputs the classification; the Classifier Network is specific to the dataset, while the Backbone Network acts as a feature extractor. Finally, ADATime has algorithms, which implement the domain adaptation training strategy. These algorithms typically come with a specific loss function and may also add special layers during training.

ADATime does not only support evaluation based on the average performance over all evaluated scenarios, but also hyperparameter tuning with WandB based on risk scores.

What is already provided in ADATime?

Datasets

Since ADATime was developed to benchmark domain adaptation algorithms for time-series classification, the provided datasets are time-series datasets: three on human activity recognition (UCIHAR, HHAR, WISDM), one on machine fault diagnosis (FD) and one on sleep stage classification (Sleep-EDF).
To keep the repository small, the data is not published within it; it can be downloaded from an external provider, and all pre-processed datasets compatible with ADATime will be linked here. Each of these datasets is partitioned into subdomains; in this specific case, one domain contains the data of one subject, so the existing datasets let you benchmark subject-to-subject adaptation scenarios. You can configure time-series specific pre-processing; for now only standard normalization is provided, so feel free to extend these options. Currently, domain adaptation can only be benchmarked between subdomains of the same original dataset.

Backbone Network

For Backbone Networks, ADATime already includes a CNN, a TCN and ResNet18. A user can easily add additional Backbone Networks using PyTorch.

Algorithms

ADATime already includes discrepancy-based, generative adversarial and other unsupervised domain adaptation algorithms.

| Discrepancy Based | Generative Adversarial Based | Other |
| --- | --- | --- |
| Deep Domain Confusion (DDC) | Domain-Adversarial Training of Neural Networks (DANN) | Time Series Domain Adaptation via Sparse Associative Structure Alignment (SASA) |
| Correlation Alignment via Deep Neural Networks (Deep-CORAL) | Conditional Adversarial Domain Adaptation (CDAN) | |
| Higher-order Moment Matching (HoMM) | DIRT-T | |
| Minimum Discrepancy Estimation for Deep Domain Adaptation (MMDA) | Convolutional deep Domain Adaptation for Time Series data (CoDATS) | |
| Deep Subdomain Adaptation (DSAN) | Adversarial Spectral Kernel Matching (AdvSKM) | |

Evaluation

Evaluation in ADATime is done based on a number of runs for each scenario. One run consists of initialization based on the run_id, training a model on the source and target training data, and evaluating the trained model on the source and target test data. For each run, the accuracy, F1 score and AUROC are tracked.

Figure 2: ADATime Run

You may want to perform multiple runs for one scenario, as the random seeding based on the run_id influences the training of the model; checking several random seeds gives a more balanced performance evaluation. The accuracy, F1 score and AUROC are also averaged over all runs and scenarios, so that there is a final metric to compare the performance of the model and domain adaptation algorithm. For each scenario, the run with the lowest source risk is deemed to yield the best model, and the metrics for this specific model are tracked separately during the testing phase.

ADATime also supports hyperparameter tuning, which is done based on risks: the chosen risk is averaged over all performed runs for each scenario of the sweep. You can find a more detailed description of the risks and a guide on when to use which here.

Figure 3: ADATime - Risk based Hyperparameter Tuning

Use Case 1: Benchmarking for Model and Algorithm Selection for a specific Use Case

Datasets

Dataset Selection from already supported Datasets

To get a quick overview of the performance of the models and algorithms provided within ADATime, you can benchmark them against each other on the existing, prepared datasets. A maintained list of already supported datasets can be found here.

Download the corresponding datasets and unpack them into a folder “ADATIME_data” in the ADATime super-repository. Don’t rename the folder containing the train_x.pt and test_x.pt files; this name is used to reference all parameters specific to the dataset within ADATime.


Each number in the data file names represents a subset of the dataset; within ADATime such a subset is considered a domain. Currently it is only possible to perform domain adaptation between two domains from the same dataset. A combination of two such subdomains is called a “scenario”. Choose your desired scenarios and remember them, then prepare all datasets you want to evaluate in this way.

How to Prepare your own Dataset?

To add a new dataset (e.g., NewData), place it in a folder named NewData in the datasets directory.

Since "NewData" has several domains, each domain should be split into train/test splits with naming style as "train_x.pt" and "test_x.pt".

The data files should be dictionaries of the following form: train_x.pt = {"samples": data, "labels": labels}, and similarly for test_x.pt.
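As an illustration, the following sketch shows how such files could be created with PyTorch. The tensor shapes, class count and the domain index 0 are placeholder assumptions for this example; only the file naming and the dictionary keys come from the description above.

```python
import torch

# Hypothetical example: save one domain ("0") of a new dataset in the format
# described above. Shapes are placeholders: (num_samples, channels, seq_len).
train_samples = torch.randn(500, 3, 128)
train_labels = torch.randint(0, 6, (500,))
torch.save({"samples": train_samples, "labels": train_labels},
           "datasets/NewData/train_0.pt")   # adjust the path to where ADATime expects your data

test_samples = torch.randn(100, 3, 128)
test_labels = torch.randint(0, 6, (100,))
torch.save({"samples": test_samples, "labels": test_labels},
           "datasets/NewData/test_0.pt")
```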

A more detailed tutorial will follow.

Selection of Backbone Model, Algorithm and Scenarios?

Think about the backbone network and the algorithm you want to evaluate. You can choose them from this list. You can configure the dataset, the backbone network, discriminator network, training strategy and the algorithms.

What if I do not want to configure so many different things?

If you do not want to experiment with different settings here, you can also just configure the scenarios on which your network and algorithm combination is evaluated. Be warned, though, that this will not necessarily be the ideal configuration and you could miss the best possible performance. You can set the scenarios you want to evaluate under ./configs/data_model_configs.py: configure the attributes of your chosen dataset and alter the scenarios attribute, as in the sketch below.
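As a minimal, hypothetical excerpt (the class name and domain IDs are placeholders; edit the class of your dataset with the scenarios you chose above):

```python
# Hypothetical excerpt from ./configs/data_model_configs.py
class HHAR():
    def __init__(self):
        # each tuple is a (source_domain, target_domain) scenario
        self.scenarios = [("0", "6"), ("2", "7")]
```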

Configurations specific to your Dataset of Choice

Let’s make sure to configure your backbone network and algorithm. We start with the model and the scenarios: open ./configs/data_model_configs.py, select the class corresponding to your dataset of choice, and we will walk through the configuration together.


Dataset and Scenario Specific Configurations

The first few attributes correspond to dataset- and scenario-specific configurations. Update the scenarios list with the scenarios of your choice. The attributes class_names, sequence_len and num_classes do not need to be changed; they correspond to the dataset's class labels, the sequence length of the input time series and the number of classes. shuffle, drop_last and normalize configure the dataset's pre-processing. shuffle and drop_last correspond to parameters of PyTorch's DataLoader: if shuffle is set to True, the training data is re-shuffled at every training epoch, and drop_last specifies whether the last incomplete batch should be dropped when the dataset size is not divisible by the batch size. normalize configures whether standard normalization is applied to the training and testing data. These attributes are referenced in [./dataloader/dataloader.py] as dataset_configs.


The other attributes of the dataset class specify the configuration of the model. They are referenced in [./models/models.py] as configs. Please check the documentation of the backbone network you chose for the attributes you can configure here. The discriminator depends on the algorithm you chose; you can configure the parameters disc_hid_dim, hidden_dim and DSKN_disc_hid in the corresponding dataset class as well, in [./configs/data_model_configs.py].
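For orientation, a hypothetical dataset class might look roughly like the sketch below. The attribute names follow the description above, but the concrete values are placeholders and the full set of model attributes depends on the backbone you use, so always start from the class that already exists for your dataset.

```python
# Hypothetical, abridged dataset class in ./configs/data_model_configs.py
class NewData():
    def __init__(self):
        # dataset- and scenario-specific configurations
        self.scenarios = [("0", "1"), ("2", "3")]   # (source, target) pairs
        self.class_names = ["class_a", "class_b"]
        self.sequence_len = 128
        self.num_classes = 2
        self.shuffle = True        # re-shuffle training data every epoch
        self.drop_last = True      # drop the last incomplete batch
        self.normalize = True      # standard normalization of train/test data

        # model-specific configurations (depend on the chosen backbone)
        self.input_channels = 3
        self.kernel_size = 5
        self.mid_channels = 64
        self.final_out_channels = 128
        self.features_len = 1

        # discriminator-related parameters (depend on the chosen algorithm)
        self.disc_hid_dim = 64
        self.hidden_dim = 500
        self.DSKN_disc_hid = 128
```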

Configure Training and Algorithm specific Hyperparameters

If you want to, you can configure the training strategy and algorithm-specific parameters in [./configs/hparams.py]. Similar to [./configs/data_model_configs.py] these configurations are grouped by dataset. For each dataset there is a class named after the dataset.

Let’s first have a look at the more general training-specific attributes you can configure. They are set in the dictionary train_param. You can configure the number of training epochs (num_epochs), the batch size (batch_size), the weight decay (weight_decay), the step size (step_size) and the learning rate decay (lr_decay). These parameters are shared among all algorithms and are referred to in [./algorithms/algorithms.py] within the dictionary hparams. The learning rate is an algorithm-specific attribute.
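A hypothetical excerpt, using the dictionary names mentioned in this wiki (train_param here and alg_params in the algorithm section below); the values are placeholders, so check ./configs/hparams.py in your checkout for the exact names and defaults:

```python
# Hypothetical, abridged class in ./configs/hparams.py
class NewData():
    def __init__(self):
        # training parameters shared by all algorithms
        self.train_param = {
            "num_epochs": 40,
            "batch_size": 32,
            "weight_decay": 1e-4,
            "step_size": 50,
            "lr_decay": 0.5,
        }
        # algorithm-specific parameters, one entry per algorithm
        self.alg_params = {
            "DANN": {"learning_rate": 1e-3, "src_cls_loss_wt": 1.0, "domain_loss_wt": 1.0},
        }
```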

You can also specify algorithm-specific parameters. For these please check out the documentation of the algorithms.

Evaluation

Congratulations, you have completed all data preparation and configuration settings. Each model and algorithm combination is evaluated separately. So, if you want to compare different model and algorithm configurations against each other, you need to evaluate each combination separately and then make your own conclusions based on their performance.

There are two options for the evaluation. You can either train and evaluate a single model with a specific set of parameters, or you can benchmark the model and algorithm using a WandB sweep. A WandB sweep performs hyperparameter optimization and visualizes the search, giving you better insight into the behaviour of the model (and algorithm). WandB is an experiment-tracking tool similar to MLflow.

Simple Evaluation - No WandB Sweep

Training and testing a configuration are two separate processes.
Open the command line in the ADATime folder.

Training

You first need to train your model on each scenario. Afterwards the trained models can be loaded in the testing phase.

python3 main.py \
    --phase train \
    --save_dir experiments_log \
    --exp_name exp2 \
    --da_method DANN \
    --data_path ../ADATime_data \
    --dataset HHAR \
    --backbone CNN \
    --num_runs 2 \
    --device cuda

The backbone network (--backbone) will be trained using the specified algorithm (--da_method) on the dataset (--dataset) for each scenario specified in the dataset's configs (./configs/data_model_configs.py). For each scenario there is command line output in which you can follow the training progress in each epoch. num_runs determines how often a model is trained and evaluated (once for each run_id) for each scenario; no hyperparameter tuning is performed here. For each scenario, the results of the training are saved within one folder: ./experiments_log/dataset/DATASET/ALGORITHM/SOURCE_to_TARGET_run_RUN_ID. The log file marks the exact time the training was finished, and the training also produces a checkpoint.pt file containing the parameters of the trained model.
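If you want to inspect a trained model outside the suite, a checkpoint can be loaded with plain PyTorch. This is a minimal sketch assuming the checkpoint stores regular state dicts; the exact keys depend on the ADATime version you use.

```python
import torch

# Hypothetical path following the folder structure described above.
ckpt = torch.load("experiments_log/.../checkpoint.pt", map_location="cpu")
print(type(ckpt))          # usually a dict
if isinstance(ckpt, dict):
    print(ckpt.keys())     # inspect which state dicts are stored
```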


After training you can see details on the performance of each run and scenario in results.csv and the reported risk scores in risks.csv.


Testing
python3 main.py \
    --phase test \
    --save_dir experiments_log \
    --exp_name exp2 \
    --da_method DANN \
    --data_path ../ADATime_data \
    --dataset HHAR \
    --backbone CNN \
    --num_runs 2 \
    --device cuda

It is important to use the exact same configurations as for the training, otherwise the suite will not find the trained models.


The metrics here correspond to the best average accuracy, F1 score and area under the curve over all scenarios; the averaged results of the last run are displayed as well. The evaluation is also documented in the files last_results.csv and best_results.csv.


Evaluation and Hyperparameter Tuning with WandB Sweep

Configuring a WandB Sweep

You can define the search space for hyperparameter tuning with WandB in ./configs/sweep_params.py. You can set the search space for general training parameters as well as algorithm-specific search parameters; which parameters you can set here of course depends on your chosen algorithm.
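For orientation, a search space in WandB's sweep-configuration format might look like the following hypothetical excerpt; the parameter names mirror the training and algorithm parameters discussed earlier, and the concrete ranges are placeholders:

```python
# Hypothetical excerpt from ./configs/sweep_params.py
sweep_train_hparams = {
    "num_epochs": {"values": [30, 40]},
    "batch_size": {"values": [32, 64]},
    "weight_decay": {"distribution": "uniform", "min": 1e-5, "max": 1e-3},
}

sweep_alg_hparams = {
    "DANN": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-4, "max": 1e-2},
        "domain_loss_wt": {"distribution": "uniform", "min": 0.1, "max": 10.0},
    },
}
```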

Start the Sweep

Create a project on WandB in which you want to visualize your sweep and remember its name. Then open a command line in the ADATime folder:

python3 main_sweep.py \
    --da_method DANN \
    --data_path ../ADATime_data \
    --dataset HHAR \
    --backbone CNN \
    --num_runs 1 \
    --num_sweeps 3 \
    --device cuda \
    --exp_name sweep_exp1 \
    --sweep_project_wandb TEST \
    --wandb_entity xxx \
    --hp_search_strategy random \
    --metric_to_minimize src_risk \
    --save_dir experiments/sweep_logs

If everything was successful, you will see the sweep starting in your command line and an active run in your WandB project space.

After a successful sweep, the final results are printed to the command line, and you can inspect your runs in WandB.

Use Case 2: Benchmarking for Methods Development

How to add a new Model (Backbone/Classifier)?

As further explained in the introduction, in ADATime machine learning models consist of two parts: the Backbone Network and the Classifier Network(s).

Adding a model is straightforward as long as you conform to the Backbone and Classifier Network structure. You can add your model to the models/models.py file. There are a few things to keep in mind when implementing a new Backbone Network:

  • PyTorch: the model should be implemented using PyTorch, as certain loss functions supported by ADATime expect their input to be a torch.nn.Module. So, if you want to reuse those, keep that in mind.
  • The model outputs not the final classification, but the input for the Classifier model.

Config

  • It is good practice not to hard-code your models, but to make them configurable. So, design their initialization function with a config dictionary in mind; this influences how re-usable and flexible your model is. Of course, all parameters that can be set via your config dictionary are natural candidates for hyperparameter tuning.
  • For inspiration on configuration-setting you can take a look at ./configs/data_model_configs.py.
  • If you want your model to be usable out-of-the-box with the existing benchmarking datasets, you should provide some baseline configs there. The name of the class refers to the corresponding dataset.

Finally you need to register your new model with the app. Add your model to the backbone_options in app/app.py.

In order for the app to find your model, if you did not add it directly to the models/models.py file, add a wrapper class there with the same name as the name you registered in backbone_options.

Like in any other language that supports inheritance, you are of course more than welcome to inherit from other models.
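To make the structure concrete, here is a hypothetical minimal backbone. The config attributes it reads (input_channels, mid_channels, features_len) are assumptions modelled on the dataset configs discussed earlier, and the class is not part of ADATime itself.

```python
import torch.nn as nn

# Hypothetical backbone for models/models.py. It returns features for the
# Classifier network, not class scores.
class MyBackbone(nn.Module):
    def __init__(self, configs):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(configs.input_channels, configs.mid_channels,
                      kernel_size=8, stride=1, padding=4),
            nn.BatchNorm1d(configs.mid_channels),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(configs.features_len),
        )

    def forward(self, x):
        # x: (batch, input_channels, sequence_len)
        feats = self.encoder(x)
        # flatten to (batch, mid_channels * features_len) for the classifier
        return feats.reshape(feats.size(0), -1)
```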

How to add an Algorithm?

The hardest part about adding a new algorithm is probably coming up with one 😉 Similar to adding a new model, you can add the implementation of your algorithm to ./algorithms/algorithms.py. Your algorithm is a class that inherits from the superclass Algorithm. As such you need to implement the two (abstract) functions __init__(self, backbone, configs, hparams, device) and training_epoch(self, src_loader, trg_loader, avg_meter, epoch). __init__(self, backbone, configs, hparams, device) is the init function of your algorithm; like any other __init__() function it is called when your algorithm class is constructed. You need to set the following attributes:

  • optimizer – select your preferred optimizer from the PyTorch environment
  • lr_scheduler – select your preferred learning rate scheduler from the PyTorch environment
  • hparams – you should set this to the hparams parameter. hparams contains the configurations for your algorithm; when running ADATime, it is set in ./configs/hparams.py. In that config file, the hparams specific to your algorithm should be added to each dataset-specific class as a dictionary entry named after your algorithm in the self.alg_params attribute. Add this entry to each dataset so your algorithm is compatible with all supported datasets. Similarly, you may add an entry to the sweep_alg_hparams of ./configs/sweep_params.py to support WandB sweeps.
  • device – you should set this to the device parameter; this is the device the algorithm will run on, e.g. your CPU or a CUDA device. If your algorithm needs additional attributes, please add them here.

training_epoch(self, src_loader, trg_loader, avg_meter, epoch) is the function in which your domain adaptation algorithm is performed. So, implementing this is on you 😊
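The following is an illustrative sketch of such a class, not a reference implementation: it assumes the import paths, the classifier helper, the hparams keys and the avg_meter interface used by the existing algorithms, all of which you should verify against ./algorithms/algorithms.py in your checkout.

```python
import torch
import torch.nn as nn

from algorithms.algorithms import Algorithm   # assumed location of the base class
from models.models import classifier          # assumed classifier head used by existing algorithms


class MyAlgorithm(Algorithm):
    def __init__(self, backbone, configs, hparams, device):
        super().__init__(configs)
        # backbone + classifier make up the full network
        self.feature_extractor = backbone(configs)
        self.classifier = classifier(configs)
        self.network = nn.Sequential(self.feature_extractor, self.classifier)

        # required attributes
        self.optimizer = torch.optim.Adam(
            self.network.parameters(),
            lr=hparams["learning_rate"],
            weight_decay=hparams["weight_decay"],
        )
        self.lr_scheduler = torch.optim.lr_scheduler.StepLR(
            self.optimizer, step_size=hparams["step_size"], gamma=hparams["lr_decay"])
        self.hparams = hparams
        self.device = device

        self.cross_entropy = nn.CrossEntropyLoss()

    def training_epoch(self, src_loader, trg_loader, avg_meter, epoch):
        for (src_x, src_y), (trg_x, _) in zip(src_loader, trg_loader):
            src_x, src_y = src_x.to(self.device), src_y.to(self.device)
            trg_x = trg_x.to(self.device)

            # source classification loss only; add your adaptation loss
            # (e.g. an alignment term between source and target features) here
            loss = self.cross_entropy(self.network(src_x), src_y)

            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()

            # assumed interface: avg_meter maps loss names to AverageMeter objects
            avg_meter["Src_cls_loss"].update(loss.item(), src_x.size(0))

        self.lr_scheduler.step()
```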

Many domain adaptation algorithms, like DANN or CORAL, rely on special loss functions that combine losses from the source and target domain. Please add any special loss functions you require, but add them under ./models/loss.py. Like in any other language that supports inheritance, you are of course more than welcome to inherit from other loss functions.
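As an example of the kind of loss that belongs in ./models/loss.py, here is a minimal CORAL-style covariance-alignment loss. This is an illustrative sketch; the losses already shipped with ADATime may differ in naming and normalization.

```python
import torch

def coral_style_loss(source_features: torch.Tensor, target_features: torch.Tensor) -> torch.Tensor:
    """Penalize the difference between source and target feature covariances."""
    d = source_features.size(1)

    # center the features of each domain
    src = source_features - source_features.mean(dim=0, keepdim=True)
    trg = target_features - target_features.mean(dim=0, keepdim=True)

    # empirical covariance matrices
    cov_src = src.t() @ src / (src.size(0) - 1)
    cov_trg = trg.t() @ trg / (trg.size(0) - 1)

    # squared Frobenius norm of the difference, scaled as in the CORAL paper
    return ((cov_src - cov_trg) ** 2).sum() / (4 * d * d)
```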
