ncbiSraAccessions occasionally contain sample accessions (ERS) instead of the usual run accessions (SRR/ERR/DRR) #516

@corneliusroemer

Working on Pathoplexus input validation, I discovered an inconsistency in the types of SRA accessions returned by the ncbiSraAccessions field via the NCBI datasets CLI.

Most records return SRA Run accessions (SRR/ERR/DRR), but 5 sequences return SRA Sample accessions (ERS) instead.

Mpox records returning SRA Sample accessions (ERS*):
* OP696838.1: ERS13580841, ERS13580842
* OP696840.1: ERS13580844, ERS13580845, ERS13580846
* OP696841.1: ERS13580847, ERS13580848, ERS13580849
* OP696839.1: ERS13580843
* OP696842.1: ERS13580850, ERS13580851, ERS13580852, ERS13580853

These are the only mpox sequences with ERS accessions; around 1200 other mpox sequences have ERR/SRR/DRR accessions.

(see e.g. https://pathoplexus.org/mpox/search?column_insdcRawReadsAccession=true&orderBy=insdcRawReadsAccession&page=11&order=descending)
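For reference, here is a minimal sketch of how such records could be flagged on the consumer side. It assumes the accession type can be read off the prefix alone (first letter for the archive, S/E/D; third letter for the object type, R = run, S = sample, X = experiment, P = study). The function names `classify_sra_accession` and `non_run_accessions` are made up for illustration, and this is not the actual Pathoplexus validation code.

```python
import re

# INSDC SRA accession patterns. First letter: archive (S = NCBI, E = ENA,
# D = DDBJ). Third letter: object type.
SRA_PATTERNS = {
    "run": re.compile(r"^[SED]RR\d+$"),
    "sample": re.compile(r"^[SED]RS\d+$"),
    "experiment": re.compile(r"^[SED]RX\d+$"),
    "study": re.compile(r"^[SED]RP\d+$"),
}


def classify_sra_accession(accession: str) -> str:
    """Return the accession type by prefix, or "unknown" if nothing matches."""
    acc = accession.strip().upper()
    for kind, pattern in SRA_PATTERNS.items():
        if pattern.match(acc):
            return kind
    return "unknown"


def non_run_accessions(records: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each sequence accession to its SRA accessions that are not runs."""
    flagged = {}
    for sequence_accession, sra_accessions in records.items():
        not_runs = [
            acc for acc in sra_accessions
            if classify_sra_accession(acc) != "run"
        ]
        if not_runs:
            flagged[sequence_accession] = not_runs
    return flagged


records = {
    # Taken from the list above.
    "OP696839.1": ["ERS13580843"],
    # Hypothetical placeholder record, only here to show the run case.
    "EXAMPLE.1": ["SRR0000001"],
}

for sequence_accession, accs in non_run_accessions(records).items():
    print(sequence_accession, accs)
```

A prefix check like this only detects the mismatch; mapping a sample accession to its run accessions would still need a lookup against SRA/ENA.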

  • Is this inconsistency known and intended? Should the ncbiSraAccessions field sometimes return Sample accessions and other times Run accessions?
  • Could the field behavior be standardized to always return Run accessions, or alternatively, could the field name reflect when Sample vs Run accessions are provided?

Thank you for your help and work!

Labels: bug
