Inconsistent taxonomy queries with --rank included #532

@xonq


Before opening an issue, please:

  • Make sure you are using the latest version (check with datasets --version)
  • Review our documentation

Describe the bug
Taxonomy queries return inconsistent results when --rank is specified: most attempts print nothing, and only an occasional attempt returns a record.

To Reproduce
Here is an example using a while loop where it took 13 attempts to retrieve the expected metadata:

while true; do echo "attempting"; datasets summary taxonomy taxon 138948 --rank species --as-json-lines; done 
attempting
attempting
attempting
attempting
attempting
attempting
attempting
attempting
attempting
attempting
attempting
attempting
attempting
{"query":["138948"],"taxonomy":{"children":[138948],"classification":{"acellular_root":{"id":10239,"name":"Viruses"},"class":{"id":2732506,"name":"Pisoniviricetes"},"family":{"id":12058,"name":"Picornaviridae"},"genus":{"id":12059,"name":"Enterovirus"},"kingdom":{"id":2732396,"name":"Orthornavirae"},"order":{"id":464095,"name":"Picornavirales"},"phylum":{"id":2732408,"name":"Pisuviricota"},"realm":{"id":2559587,"name":"Riboviria"},"species":{"id":3428500,"name":"Enterovirus alphacoxsackie"}},"counts":[{"count":43,"type":"COUNT_TYPE_ASSEMBLY"},{"count":3,"type":"COUNT_TYPE_GENE"},{"count":3,"type":"COUNT_TYPE_PROTEIN_CODING"}],"current_scientific_name":{"name":"Enterovirus alphacoxsackie"},"genomic_moltype":"ssRNA(+)","group_name":"viruses","parents":[1,10239,2559587,2732396,2732408,2732506,464095,12058,2946630,2960224,12059],"rank":"SPECIES","tax_id":3428500}}
attempting
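
To put a number on the failure rate instead of reading it off an endless loop, a bounded counter along these lines may help. This is only a sketch: `count_nonempty` is a hypothetical helper name, not part of the datasets CLI, and it treats any run that writes nothing to stdout as a failed attempt.

```shell
# count_nonempty N CMD [ARGS...]   (hypothetical helper, not part of datasets)
# Runs CMD N times and prints how many of those runs wrote anything to stdout.
# The function body is a subshell, so its variables do not leak to the caller.
count_nonempty() (
  attempts=$1
  shift
  hits=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    i=$((i + 1))
    # Capture stdout only; a non-zero exit status is ignored on purpose.
    out=$("$@") || true
    if [ -n "$out" ]; then
      hits=$((hits + 1))
    fi
  done
  echo "$hits"
)
```

For example, `count_nonempty 20 datasets summary taxonomy taxon 138948 --rank species --as-json-lines` would print how many of 20 attempts returned a record; the same call without `--rank species` gives a baseline for comparison.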

This behavior does NOT occur without --rank:

while true; do echo "attempting"; datasets summary taxonomy taxon 138948 --as-json-lines; done

attempting

{"query":["138948"],"taxonomy":{"children":[1530249,469959,306587,297248,306586,306588,39054,2884179,2870395,2870394,2870393,2487724,1435148,1530250,2760809,42788,42787,86107,42786,42785,42784,33757,31704,42773,42771,42769,172022],"classification":{"acellular_root":{"id":10239,"name":"Viruses"},"class":{"id":2732506,"name":"Pisoniviricetes"},"family":{"id":12058,"name":"Picornaviridae"},"genus":{"id":12059,"name":"Enterovirus"},"kingdom":{"id":2732396,"name":"Orthornavirae"},"order":{"id":464095,"name":"Picornavirales"},"phylum":{"id":2732408,"name":"Pisuviricota"},"realm":{"id":2559587,"name":"Riboviria"},"species":{"id":3428500,"name":"Enterovirus alphacoxsackie"}},"counts":[{"count":43,"type":"COUNT_TYPE_ASSEMBLY"},{"count":3,"type":"COUNT_TYPE_GENE"},{"count":3,"type":"COUNT_TYPE_PROTEIN_CODING"}],"current_scientific_name":{"name":"Enterovirus A"},"genomic_moltype":"ssRNA(+)","group_name":"viruses","parents":[1,10239,2559587,2732396,2732408,2732506,464095,12058,2946630,2960224,12059,3428500],"secondary_tax_ids":[29269],"tax_id":138948}}

attempting
{"query":["138948"],"taxonomy":{"children":[1530249,469959,306587,297248,306586,306588,39054,2884179,2870395,2870394,2870393,2487724,1435148,1530250,2760809,42788,42787,86107,42786,42785,42784,33757,31704,42773,42771,42769,172022],"classification":{"acellular_root":{"id":10239,"name":"Viruses"},"class":{"id":2732506,"name":"Pisoniviricetes"},"family":{"id":12058,"name":"Picornaviridae"},"genus":{"id":12059,"name":"Enterovirus"},"kingdom":{"id":2732396,"name":"Orthornavirae"},"order":{"id":464095,"name":"Picornavirales"},"phylum":{"id":2732408,"name":"Pisuviricota"},"realm":{"id":2559587,"name":"Riboviria"},"species":{"id":3428500,"name":"Enterovirus alphacoxsackie"}},"counts":[{"count":43,"type":"COUNT_TYPE_ASSEMBLY"},{"count":3,"type":"COUNT_TYPE_GENE"},{"count":3,"type":"COUNT_TYPE_PROTEIN_CODING"}],"current_scientific_name":{"name":"Enterovirus A"},"genomic_moltype":"ssRNA(+)","group_name":"viruses","parents":[1,10239,2559587,2732396,2732408,2732506,464095,12058,2946630,2960224,12059,3428500],"secondary_tax_ids":[29269],"tax_id":138948}}

attempting
{"query":["138948"],"taxonomy":{"children":[1530249,469959,306587,297248,306586,306588,39054,2884179,2870395,2870394,2870393,2487724,1435148,1530250,2760809,42788,42787,86107,42786,42785,42784,33757,31704,42773,42771,42769,172022],"classification":{"acellular_root":{"id":10239,"name":"Viruses"},"class":{"id":2732506,"name":"Pisoniviricetes"},"family":{"id":12058,"name":"Picornaviridae"},"genus":{"id":12059,"name":"Enterovirus"},"kingdom":{"id":2732396,"name":"Orthornavirae"},"order":{"id":464095,"name":"Picornavirales"},"phylum":{"id":2732408,"name":"Pisuviricota"},"realm":{"id":2559587,"name":"Riboviria"},"species":{"id":3428500,"name":"Enterovirus alphacoxsackie"}},"counts":[{"count":43,"type":"COUNT_TYPE_ASSEMBLY"},{"count":3,"type":"COUNT_TYPE_GENE"},{"count":3,"type":"COUNT_TYPE_PROTEIN_CODING"}],"current_scientific_name":{"name":"Enterovirus A"},"genomic_moltype":"ssRNA(+)","group_name":"viruses","parents":[1,10239,2559587,2732396,2732408,2732506,464095,12058,2946630,2960224,12059,3428500],"secondary_tax_ids":[29269],"tax_id":138948}}
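
The records above also differ in content, not just in how reliably they arrive: the one returned with --rank species reports tax_id 3428500 and the name "Enterovirus alphacoxsackie", while those returned without --rank report tax_id 138948 and "Enterovirus A". A rough way to compare many captured lines at a glance is sketched below. `tax_summary` is a hypothetical helper; it relies on the field layout shown in the output above and uses sed pattern matching rather than a real JSON parser, so a tool such as jq would be more robust if available.

```shell
# tax_summary   (hypothetical helper)
# Reads loop output on stdin, skips lines that are not JSON records
# (such as "attempting"), and prints "<tax_id><TAB><current_scientific_name>".
tax_summary() (
  while IFS= read -r line; do
    case $line in
      '{'*) ;;          # looks like a JSON record, keep it
      *) continue ;;    # anything else is skipped
    esac
    # Top-level "tax_id" (does not match "secondary_tax_ids").
    id=$(printf '%s\n' "$line" | sed -n 's/.*"tax_id":\([0-9]*\).*/\1/p')
    # Name nested under "current_scientific_name".
    name=$(printf '%s\n' "$line" |
      sed -n 's/.*"current_scientific_name":{"name":"\([^"]*\)".*/\1/p')
    printf '%s\t%s\n' "$id" "$name"
  done
)
```

Piping a saved copy of the loop output through `tax_summary` gives one short line per record, which makes it easier to see whether the tax_id and name change between runs.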

Steps to reproduce the behavior:
Run datasets summary taxonomy taxon with --rank specified, as in the loop above. These runs were started with no preceding queries, so the initial failures are unlikely to be caused by querying the API too often.

Expected behavior
Queries should be successful and consistent. Does this have to do with the February 2026 deprecation schedule being implemented early on some servers?

Labels: bug (Something isn't working)
