Benchmarking Suites
We aim to collect a curated set of benchmarking suites that measure the performance of different methods in different out-of-distribution scenarios. These scenarios may come from domain adaptation, few-shot learning, or zero-shot learning/domain generalization. If you feel that the benchmarking suite or evaluation framework you are developing could fit, please feel free to contact us or submit a pull request with a link to your software to this wiki page. After discussion, we will be happy to share your contribution on this platform.
TBD Definition of Benchmarking Suite
From our point of view, a benchmarking suite consists of the following components (see the sketch after this list):
- datasets
- machine learning models
- possibly adaptation mechanisms to account for out-of-distribution data
- metrics that measure out-of-distribution performance
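A minimal sketch, in Python, of how these four components might be wired together. The names (`BenchmarkSuite`, `evaluate`, and the callable signatures) are illustrative assumptions and do not refer to any existing framework:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BenchmarkSuite:
    """Hypothetical container for the four components listed above."""
    datasets: Dict[str, object]              # e.g. source/target splits per dataset
    models: Dict[str, Callable]              # model factories
    adaptation_methods: Dict[str, Callable]  # optional mechanisms for out-of-distribution data
    metrics: Dict[str, Callable]             # measures of out-of-distribution performance

    def evaluate(self) -> List[dict]:
        """Run every model x adaptation x metric combination on every dataset."""
        results = []
        for data_name, data in self.datasets.items():
            for model_name, build_model in self.models.items():
                for adapt_name, adapt in self.adaptation_methods.items():
                    model = adapt(build_model(), data)  # adapt the model to the new distribution
                    for metric_name, metric in self.metrics.items():
                        results.append({
                            "dataset": data_name,
                            "model": model_name,
                            "adaptation": adapt_name,
                            "metric": metric_name,
                            "score": metric(model, data),
                        })
        return results
```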
ADATIME
ADATIME is a benchmarking suite designed to evaluate unsupervised domain adaptation (UDA) methods on time series data systematically and fairly. It standardizes neural network architectures and datasets to ensure consistent evaluations across various UDA techniques. The suite includes implementations of 11 state-of-the-art UDA algorithms and introduces realistic model selection approaches that do not rely on labeled target data. Evaluations span five representative datasets, covering 50 cross-domain scenarios. Findings indicate that, with appropriate hyper-parameter tuning, visual domain adaptation methods can perform comparably to those specifically designed for time series data. ADATIME is implemented in PyTorch and is publicly available, providing a valuable resource for advancing domain adaptation in time series analysis.
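To make the notion of "cross-domain scenarios" concrete, here is a hedged sketch of a generic evaluation loop over source/target domain pairs with unlabeled target data during training. The helper names (`load_domain`, `train_uda`, `accuracy`) are assumptions for illustration and do not correspond to ADATIME's actual API:

```python
from itertools import permutations


def evaluate_uda(method, domains, load_domain, train_uda, accuracy):
    """Train a UDA method on each source->target pair and report target accuracy."""
    results = {}
    for source, target in permutations(domains, 2):
        source_data = load_domain(source, labeled=True)
        target_data = load_domain(target, labeled=False)  # no target labels during training
        model = train_uda(method, source_data, target_data)
        # Realistic model selection without labeled target data would plug in here,
        # e.g. via source risk or other unsupervised proxies.
        results[(source, target)] = accuracy(model, load_domain(target, labeled=True))
    return results
```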
Guidelines
TBD. This will contain a list of curated guidelines for the development of evaluation frameworks, with a focus on evaluating performance in out-of-distribution settings.
Contact: ekaterina.kutafina at uni-koeln.de, mayra.elwes at uk-koeln.de