Banzai Dada2 Pipeline

Bioinformatic pipeline for processing amplicon data using Dada2. Initially based on the Banzai pipeline (https://github.com/jimmyodonnell/banzai), this shell script has been adapted at MBARI to use a different set of programs and a different taxonomic annotation strategy.

MBARI Authors and Contributors

Reiko Michisaki, Kathleen Pitz, Markus Min

Input Files:

Metadata File (.csv)
Parameter File
Fastq files of forward and reverse read amplicon sequence data, organized by sample in folders
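
As an illustration of the expected layout, the following minimal Python sketch collects one forward and one reverse fastq file per sample folder. It is not part of the pipeline; the function name `find_read_pairs` and the `_R1`/`_R2` filename tags are assumptions made for this example, so adjust them to match your own file names.

```python
from pathlib import Path


def find_read_pairs(root, fwd_tag="_R1", rev_tag="_R2"):
    """Map each sample folder name to its (forward, reverse) fastq paths.

    Assumes one sub-folder per sample and that forward/reverse files are
    distinguished by fwd_tag/rev_tag in the file name (hypothetical
    convention, not taken from the pipeline itself).
    """
    pairs = {}
    sample_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for sample_dir in sample_dirs:
        fastqs = sorted(f for f in sample_dir.iterdir() if ".fastq" in f.name)
        fwd = [f for f in fastqs if fwd_tag in f.name]
        rev = [f for f in fastqs if rev_tag in f.name]
        if len(fwd) != 1 or len(rev) != 1:
            raise ValueError(
                f"{sample_dir.name}: expected one forward and one reverse fastq file"
            )
        pairs[sample_dir.name] = (fwd[0], rev[0])
    return pairs
```

A folder with a missing or duplicated read file raises an error rather than being skipped silently, which makes layout problems visible before a run starts.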

Overview:

Primer sequences are removed from fastq files with Atropos (https://github.com/jdidion/atropos)
Fastq files are then fed into the Dada2 program for quality filtering, denoising, chimera removal, read merging and ASV creation (https://benjjneb.github.io/dada2/)
Taxonomy is assigned through blastn searches against NCBI GenBank’s non-redundant nucleotide database (nt); hits are then filtered by MEGAN6 (https://github.com/husonlab/megan-ce) and custom scripts, as described below

Output Folder:

An analysis folder, named by the datetime of the run, that contains all intermediate and final products as well as copies of the input files used to initiate the run.
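
The output convention can be sketched in a few lines of Python. This is only an illustration, not the pipeline's own code: the function name `make_analysis_dir` and the `Analysis_YYYYMMDD_HHMM` naming scheme are assumptions, and the real script may format the datetime differently.

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_analysis_dir(parent, input_files, now=None):
    """Create a datetime-named analysis folder and copy the run inputs into it."""
    now = now or datetime.now()
    # Hypothetical naming scheme; the pipeline's actual format may differ.
    run_dir = Path(parent) / f"Analysis_{now:%Y%m%d_%H%M}"
    # exist_ok=False: refuse to overwrite the folder of an earlier run.
    run_dir.mkdir(parents=True, exist_ok=False)
    for src in input_files:
        shutil.copy2(src, run_dir / Path(src).name)
    return run_dir
```

Copying the metadata and parameter files next to the results keeps each run self-describing, so the inputs that produced a given output can be recovered later.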

Detailed Description

The Banzai Dada2 Pipeline is a custom shell script adapted from the banzai pipeline and run on a per-plate basis ¹. Within the script, primer sequences are first removed from fastq files with the program Atropos ². Fastq files are then fed into the Dada2 program ³. Dada2 models error on a per-Illumina-run basis, controlling for read quality and picking amplicon sequence variant (ASV) sequences that represent biological variability rather than sequencing error ³. Within Dada2, reads are trimmed to remove low-quality regions and filtered by quality score, sequencing errors are modeled and removed, reads are merged, and chimeric sequences are removed.

Taxonomy is assigned to the resulting ASV sequences through blastn searches against NCBI GenBank’s non-redundant nucleotide database (nt) ⁴,⁵. Blast results are filtered using MEGAN6’s lowest common ancestor (LCA) algorithm ⁶. Only hits with ≥80% sequence identity and with bitscores within the top 2% of the highest bitscore value for each ASV are considered by MEGAN6. The MEGAN6 parameter LCA percent is set to 0.8, allowing up to 20% of top hits to be off target while the majority taxonomy is still assigned. This value was chosen to tolerate small numbers of incorrectly annotated GenBank entries: an ASV with many high-quality hits to one taxon is still assigned to that taxon even if there is a high-bitscore hit to another GenBank sequence annotated to an unrelated taxon. We decided that this benefit outweighed the cost of ignoring small numbers of true, closely related sequences.

Post-MEGAN6 filtering then ensures that only ASVs with a hit of ≥97% sequence identity are annotated to the species level, and only ASVs with a hit of ≥95% sequence identity are annotated to the genus level. Annotations for ASVs that fail these conditions are elevated to the next-highest taxonomic level.
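
The three filtering rules described above can be made concrete with a short Python sketch. It is a simplified illustration under stated assumptions, not MEGAN6's implementation or the pipeline's custom scripts. The names `Hit`, `prefilter_hits`, `lca_with_percent` and `cap_rank` are hypothetical. The sketch assumes that the best bitscore is taken among hits that already pass the identity cutoff, that lineages are plain root-to-species lists, and that a species-level annotation failing both identity thresholds is raised twice (to family).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    taxon: str
    pident: float    # percent sequence identity of the blastn hit
    bitscore: float


def prefilter_hits(hits, min_pident=80.0, top_percent=2.0):
    """Keep hits with >= min_pident identity and a bitscore within
    top_percent of the best remaining bitscore."""
    kept = [h for h in hits if h.pident >= min_pident]
    if not kept:
        return []
    cutoff = max(h.bitscore for h in kept) * (1 - top_percent / 100)
    return [h for h in kept if h.bitscore >= cutoff]


def lca_with_percent(lineages, percent=0.8):
    """Walk down from the root and return the deepest lineage prefix that is
    still supported by at least `percent` of all input lineages."""
    assigned = []
    remaining = list(lineages)
    depth = 0
    while remaining:
        counts = Counter(l[depth] for l in remaining if len(l) > depth)
        if not counts:
            break
        taxon, n = counts.most_common(1)[0]
        if n < percent * len(lineages):
            break  # too few hits agree at this depth; stop one level above
        assigned.append(taxon)
        remaining = [l for l in remaining if len(l) > depth and l[depth] == taxon]
        depth += 1
    return assigned


def cap_rank(assigned_rank, best_pident, species_min=97.0, genus_min=95.0):
    """Raise species- or genus-level annotations that lack a sufficiently
    identical hit; other ranks are returned unchanged."""
    if assigned_rank == "species" and best_pident < species_min:
        assigned_rank = "genus"
    if assigned_rank == "genus" and best_pident < genus_min:
        assigned_rank = "family"
    return assigned_rank
```

With five top hits of which four share one species-level lineage, `lca_with_percent` returns that full lineage because 4 of 5 meets the 0.8 threshold; with only three of five in agreement, it stops at the deepest level that all five hits share. This mirrors the trade-off described above: a single mis-annotated GenBank entry does not pull the assignment up to a higher rank.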

1. O’Donnell JL, Kelly RP, Lowell NC, Port JA. Indexed PCR primers induce template-specific bias in large-scale DNA sequencing studies. PLoS One. 2016;11(3):1–11.
2. Didion JP, Martin M, Collins FS. Atropos: Specific, sensitive, and speedy trimming of sequencing reads. PeerJ. 2017 Aug 30;2017(8):e3720.
3. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods. 2016 Jun 29;13(7):581–3.
4. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009 Dec 15;10(1):421.
5. Agarwala R, Barrett T, Beck J, Benson DA, Bollin C, Bolton E, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2018 Jan 1;46(D1):D8–13.
6. Huson DH, Beier S, Flade I, Górska A, El-Hadidi M, Mitra S, et al. MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data. PLoS Comput Biol. 2016;12(6):1–12.
