While working on Pathoplexus input validation, I found an inconsistency in the type of SRA accession that the NCBI datasets CLI returns in the ncbiSraAccessions field.
Most records return SRA Run accessions (SRR/ERR/DRR), but 5 mpox sequences return SRA Sample accessions (ERS) instead.
Mpox records returning SRA Sample accessions (ERS*):
* OP696838.1: ERS13580841, ERS13580842
* OP696840.1: ERS13580844, ERS13580845, ERS13580846
* OP696841.1: ERS13580847, ERS13580848, ERS13580849
* OP696839.1: ERS13580843
* OP696842.1: ERS13580850, ERS13580851, ERS13580852, ERS13580853
These are the only mpox sequences with ERS accessions; around 1200 mpox sequences have ERR/SRR/DRR accessions.
(see e.g. https://pathoplexus.org/mpox/search?column_insdcRawReadsAccession=true&orderBy=insdcRawReadsAccession&page=11&order=descending)
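
The Run vs. Sample distinction can be read off the accession prefix alone. Below is a minimal Python sketch of such a check. It is not the actual Pathoplexus validation code: the names `classify_sra_accession` and `non_run_accessions` are made up for illustration, and it assumes the usual INSDC prefix scheme (archive letter S/E/D, followed by RR for runs, RS for samples, RX for experiments, RP for studies).

```python
import re

# Assumed INSDC SRA prefix scheme: first letter is the archive
# (S = NCBI, E = ENA, D = DDBJ), the next two letters give the object type.
_PATTERNS = {
    "run": re.compile(r"^[SED]RR\d+$"),
    "sample": re.compile(r"^[SED]RS\d+$"),
    "experiment": re.compile(r"^[SED]RX\d+$"),
    "study": re.compile(r"^[SED]RP\d+$"),
}


def classify_sra_accession(accession: str) -> str:
    """Return 'run', 'sample', 'experiment', 'study', or 'unknown'."""
    acc = accession.strip().upper()
    for kind, pattern in _PATTERNS.items():
        if pattern.match(acc):
            return kind
    return "unknown"


def non_run_accessions(accessions):
    """Return the accessions that are not Run accessions, in input order."""
    return [a for a in accessions if classify_sra_accession(a) != "run"]


# Values taken from the OP696839.1 record listed above, plus a made-up Run accession.
print(classify_sra_accession("ERS13580843"))
print(non_run_accessions(["ERR0000001", "ERS13580843"]))
```

Applied to the values of ncbiSraAccessions for each record, a check like this flags exactly the five records listed above as carrying Sample rather than Run accessions.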
- Is this inconsistency known and intended? Should the ncbiSraAccessions field sometimes return Sample accessions and other times Run accessions?
- Could the field behavior be standardized to always return Run accessions, or alternatively, could the field name reflect when Sample vs Run accessions are provided?
Thank you for your help and work!