
Config File Details

emehinovic72 edited this page Jun 1, 2021 · 8 revisions

  • genomesdb: The path to the .txt file that lists the names of all genomes the user will use in the pipeline.

    • Example: ENSEMBL_AND_VGP_TOGETHER_FILE.txt
    • Pre-generated files containing species names are provided in the GitHub repository. One of these files can be used as-is, or a variation of one can be used instead.
  • query: The directory and file name of your input file, which is used as the query for BLASTn. It is recommended that the user place this file inside the pre-generated folder named USERS query Files. If the user chooses not to use this folder, the full file path, along with the file name, must be provided in the config.json file. The input file may not be a repeat sequence and may not be larger than 1 MB.

    • Must be a FASTA file
  • dbs: The directory in which all '-unmasked.fa' and '.dna.toplevel.fa' files are located. The path should NOT end with '/'.

  • threshold: Default value 0.00001. The maximum e-value a BLASTn hit may have to be kept; the user can enter a different value to change this cutoff. Decimal format is recommended.

  • queryLengthPer: Default value 0.3 (30%). Must be a decimal or the value 1. This sets a minimum length requirement, relative to the query length, to help eliminate short misaligned sequences that BLASTn reports even though they meet the e-value threshold. The pipeline determines both the length of the query input sequence and the length of each subject sequence, then applies the minimum: subject sequences shorter than the given fraction of the query length are excluded from the results. Setting the value to 1 allows all sequences found by BLASTn that meet the e-value threshold to be used.
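The parameters above can be illustrated with a minimal Python sketch of how a config.json using these keys might be loaded and checked, and how the threshold and queryLengthPer rules might be applied to a single BLASTn hit. This is not the pipeline's own code. It assumes config.json is a flat JSON object keyed by the names on this page, that the genomesdb file lists one genome name per line, that "1 MB" means 1 MiB, and that the e-value comparison is inclusive. The helper names (`load_config`, `parse_genome_names`, `is_genome_file`, `looks_like_fasta`, `check_query_file`, `keep_hit`) are hypothetical. The repeat-sequence restriction on the query is not checked.

```python
import json
import os

# Defaults quoted on this wiki page; the other three keys are treated as required.
DEFAULTS = {"threshold": 0.00001, "queryLengthPer": 0.3}
REQUIRED = ("genomesdb", "query", "dbs")
GENOME_SUFFIXES = ("-unmasked.fa", ".dna.toplevel.fa")
MAX_QUERY_BYTES = 1024 * 1024  # assumption: "1 MB" read as 1 MiB


def load_config(text):
    """Parse config.json text (assumed flat JSON object) and apply defaults."""
    cfg = {**DEFAULTS, **json.loads(text)}
    missing = [key for key in REQUIRED if key not in cfg]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    if cfg["dbs"].endswith("/"):
        raise ValueError("dbs must not end with '/'")
    # float() accepts both JSON numbers and numeric strings such as "0.00001".
    cfg["threshold"] = float(cfg["threshold"])
    cfg["queryLengthPer"] = float(cfg["queryLengthPer"])
    if not 0 < cfg["queryLengthPer"] <= 1:  # assumed valid range
        raise ValueError("queryLengthPer must be a decimal in (0, 1]")
    return cfg


def parse_genome_names(text):
    """Genome names from the genomesdb file, assuming one name per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_genome_file(filename):
    """True for the file types expected inside the dbs directory."""
    return filename.endswith(GENOME_SUFFIXES)


def looks_like_fasta(text):
    """Rough FASTA check: a '>' header line followed by a sequence line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return len(lines) >= 2 and lines[0].startswith(">") and not lines[1].startswith(">")


def check_query_file(path):
    """Reject query files over the size limit or not in FASTA format."""
    size = os.path.getsize(path)
    if size > MAX_QUERY_BYTES:
        raise ValueError(f"query file is {size} bytes; limit is {MAX_QUERY_BYTES}")
    with open(path) as fh:
        if not looks_like_fasta(fh.read()):
            raise ValueError("query file does not look like FASTA")


def keep_hit(evalue, subject_len, query_len, cfg):
    """Apply the e-value threshold, then the queryLengthPer minimum length."""
    if evalue > cfg["threshold"]:
        return False
    fraction = cfg["queryLengthPer"]
    if fraction == 1:
        # As described on this page, 1 keeps every hit that passed the e-value cut.
        return True
    return subject_len >= fraction * query_len
```

With the default queryLengthPer of 0.3 and a 1,000 bp query, `keep_hit` keeps a hit with an e-value of 1e-10 and a 500 bp subject sequence, but drops the same hit if the subject is only 299 bp long.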
