The omics technology revolution has generated massive volumes of biological data that require specialized analysis to be interpreted correctly. Such analysis depends on computational tools and methodologies that extract biological meaning from these data across diverse biological contexts.
This 32-hour intensive course offers practical and up-to-date training in data science applied to the analysis of omics data, such as metagenomics, transcriptomics, and proteomics.
The primary objective is to train researchers, bioinformaticians, and professionals in the biological and health sciences to use omics data analysis tools and methodologies to extract meaningful information. Through interactive lectures, practical exercises, and work with real data, participants will develop skills to explore, visualize, and interpret omics data, and to apply biological network models to questions about the data being analyzed.
The course is structured over four days, beginning with an introduction to the fundamentals of data science and the particular characteristics of omics data. Topics will include data processing techniques, analysis of metagenomic, transcriptomic, and proteomic data, visualization, functional enrichment, and multi-omics integration. A dedicated module covers the construction, analysis, and visualization of biological networks derived from the data used and analyzed in the course.
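Functional enrichment is frequently run as an over-representation analysis: given a list of genes of interest (for example, differentially expressed genes), one asks whether a gene set such as a pathway contains more of them than chance would predict, using a one-sided hypergeometric test. The sketch below is a minimal illustration under that assumption, written with the standard library only; the function names and toy gene identifiers are hypothetical and not taken from the course material.

```python
from math import comb


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """Return P(X >= k) for a hypergeometric random variable X.

    population: number of genes in the background (N)
    successes:  background genes annotated to the gene set (K)
    draws:      genes in the list of interest (n)
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    # Sum the probability mass from k up to the largest possible overlap
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


def enrichment(
    genes_of_interest: set[str], gene_set: set[str], background: set[str]
) -> tuple[int, float]:
    """Return the overlap size and the one-sided over-representation p-value."""
    # Restrict both lists to the background so all counts are consistent
    hits = genes_of_interest & background
    annotated = gene_set & background
    overlap = len(hits & annotated)
    p_value = hypergeom_sf(overlap, len(background), len(annotated), len(hits))
    return overlap, p_value


# Toy example (hypothetical identifiers): 20 background genes, a 5-gene pathway
background = {f"G{i}" for i in range(20)}
pathway = {f"G{i}" for i in range(5)}
hits = {"G0", "G1", "G2", "G10"}
print(enrichment(hits, pathway, background))
```

In this toy case 3 of the 4 genes of interest fall in the pathway, giving p = 155/4845 ≈ 0.032. `scipy.stats.hypergeom` provides the same distribution; when many gene sets are tested at once, the p-values still need a multiple-testing correction such as Benjamini-Hochberg.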
The course culminates in a module where participants apply everything they have learned to a real-world case study using public data. The practical sessions will be conducted in Python, using Jupyter Notebooks and Python visualization tools.
Keywords: computational microbiology, networks, databases, Python, programming, data, pipelines, data science.
- Empowering bioinformatics communities with Nextflow and nf-core. Björn E Langer, Andreia Amaral, Marie-Odile Baudement, Franziska Bonath, Mathieu Charles, Praveen Krishna Chitneedi, Emily L Clark, Paolo Di Tommaso, Sarah Djebali, Philip A Ewels, Sonia Eynard, James A Fellows Yates, Daniel Fischer, Evan W Floden, Sylvain Foissac, Gisela Gabernet, Maxime U Garcia, Gareth Gillard, Manu Kumar Gundappa, Cervin Guyomar, Christopher Hakkaart, Friederike Hanssen, Peter W Harrison, Matthias Hörtenhuber, Cyril Kurylo, Christa Kühn, Sandrine Lagarrigue, Delphine Lallias, Daniel J Macqueen, Edmund Miller, Júlia Mir-Pedrol, Gabriel Costa Monteiro Moreira, Sven Nahnsen, Harshil Patel, Alexander Peltzer, Frederique Pitel, Yuliaxis Ramayo-Caldas, Marcel da Câmara Ribeiro-Dantas, Dominique Rocha, Mazdak Salavati, Alexey Sokolov, Jose Espinosa-Carrasco, Cedric Notredame, the nf-core Community.
- nf-core/taxprofiler. Sofia Stamouli, Moritz E. Beber, Tanja Normark, Thomas A. Christensen II, Lili Andersson-Li, Maxime Borry, Mahwash Jamy, nf-core community, James A. Fellows Yates.
- nf-core/rnaseq. Philip A. Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso, Sven Nahnsen.
- quantms: a cloud-based pipeline for quantitative proteomics enables the reanalysis of public proteomics data. Chengxin Dai, Julianus Pfeuffer, Hong Wang, Ping Zheng, Lukas Käll, Timo Sachsenberg, Vadim Demichev, Mingze Bai, Oliver Kohlbacher, Yasset Perez-Riverol.
- A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches. Ana R Baião, Zhaoxiang Cai, Rebecca C Poulos, Phillip J Robinson, Roger R Reddel, Qing Zhong, Susana Vinga, Emanuel Gonçalves.
- scikit-bio: a community-driven Python library for bioinformatics, providing versatile data structures, algorithms, and educational resources for biology.
- HMDB 5.0: the Human Metabolome Database for 2022. David S Wishart, AnChi Guo, Eponine Oler, Fei Wang, Afia Anjum, Harrison Peters, Raynard Dizon, Zinat Sayeeda, Siyang Tian, Brian L Lee, Mark Berjanskii, Robert Mah, Mai Yamamoto, Juan Jovel, Claudia Torres-Calzada, Mickel Hiebert-Giesbrecht, Vicki W Lui, Dorna Varshavi, Dorsa Varshavi, Dana Allen, David Arndt, Nitya Khetarpal, Aadhavya Sivakumaran, Karxena Harford, Selena Sanford, Kristen Yee, Xuan Cao, Zachary Budinski, Jaanus Liigand, Lun Zhang, Jiamin Zheng, Rupasri Mandal, Naama Karu, Maija Dambrova, Helgi B Schiöth, Russell Greiner, Vasuk Gautam.
- MicroPhenoDB Associates Metagenomic Data with Pathogenic Microbes, Microbial Core Genes, and Human Disease Phenotypes. Guocai Yao, Wenliang Zhang, Minglei Yang, Huan Yang, Jianbo Wang, Haiyue Zhang, Lai Wei, Zhi Xie, Weizhong Li.
- The National Microbiome Data Collaborative: enabling microbiome science. Elisha M Wood-Charlson, Anubhav, Deanna Auberry, Hannah Blanco, Mark I Borkum, Yuri E Corilo, Karen W Davenport, Shweta Deshpande, Ranjeet Devarakonda, Meghan Drake, William D Duncan, Mark C Flynn, David Hays, Bin Hu, Marcel Huntemann, Po-E Li, Mary Lipton, Chien-Chi Lo, David Millard, Kayd Miller, Paul D Piehowski, Samuel Purvine, T B K Reddy, Migun Shakya, Jagadish Chandrabose Sundaramurthi, Pajau Vangay, Yaxing Wei, Bruce E Wilson, Shane Canon, Patrick S G Chain, Kjiersten Fagnan, Stanton Martin, Lee Ann McCue, Christopher J Mungall, Nigel J Mouncey, Mary E Maxon, Emiley A Eloe-Fadrosh.
- Basics:
- Data Science:
- Visualization:
- Interactive Python basics tutorial
- Springboard: Data Analysis with Python, SQL, and R
  - Starts with Solo Learn and Design of Computer Programs
- Python introduction with a focus on scientific computing
In this course we use Google Colab to run notebooks. Notebooks are text files that combine formatted text, code, and the output of that code. Colab provides an extensive set of preinstalled tools. See the tutorial series.
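Under the hood, a notebook (`.ipynb`) file is JSON in the nbformat 4 layout: a list of cells, each with a `cell_type` (`"markdown"` or `"code"`) and a `source`, where code cells also store their `outputs`. The minimal sketch below reads such a file with the standard library only to show that structure; the tiny notebook and the `summarize` helper are hypothetical and made up for illustration.

```python
import json

# A minimal, hand-written notebook in the nbformat 4 layout (illustrative only)
NOTEBOOK_JSON = """
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "id": "a1", "metadata": {},
     "source": ["# Counting reads\\n"]},
    {"cell_type": "code", "id": "b2", "metadata": {}, "execution_count": 1,
     "source": ["print(sum([120, 80, 45]))\\n"],
     "outputs": [{"output_type": "stream", "name": "stdout", "text": ["245\\n"]}]}
  ]
}
"""


def summarize(nb: dict) -> list[tuple[str, str, int]]:
    """Return (cell_type, first source line, number of outputs) for each cell."""
    rows = []
    for cell in nb["cells"]:
        # "source" may be stored either as a list of lines or as one string
        source = cell["source"]
        text = "".join(source) if isinstance(source, list) else source
        first_line = text.splitlines()[0] if text else ""
        # Only code cells carry an "outputs" list; markdown cells have none
        n_outputs = len(cell.get("outputs", []))
        rows.append((cell["cell_type"], first_line, n_outputs))
    return rows


notebook = json.loads(NOTEBOOK_JSON)
for row in summarize(notebook):
    print(row)
```

For real notebooks, the `nbformat` package (`nbformat.read(fp, as_version=4)`) handles validation and version differences; reading the raw JSON here only makes the text, code, and output structure visible.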
For your own computer, Anaconda offers an extended installation that includes most of the tools you will need for Python.
Some of the slides and notebooks were inspired by or reused from the Data Science Platform at the Informatics Platform of the Novo Nordisk Foundation Center for Biosustainability at the Technical University of Denmark. Other relevant courses can be found in the Biosustain GitHub organization (e.g., R viz, Python viz, Nextflow training, Proteomics, Transcriptomics, Metagenomics, Bash, ...).
Some notebooks have been inspired by the course Python Tsunami at the Center for Health Data Science (HeaDS) at the University of Copenhagen.