Is --gzip supposed to fetch compressed files or trigger local compression? #534

@ezherman

Before opening an issue, please:

  • Make sure you are using the latest version (check with datasets --version)
  • Review our documentation

Describe the bug
I compared rehydration with and without --gzip, expecting the option to fetch an already-compressed genome and therefore to run faster. Instead, --gzip increases the wall-clock runtime, which suggests that compression is happening on my machine. I am opening this issue to confirm: is --gzip supposed to fetch compressed files, or does it trigger local compression?

To Reproduce

datasets download genome accession GCF_000001405.40 --dehydrated --filename human_GRCh38_dataset.zip
unzip human_GRCh38_dataset.zip -d my_human_dataset
time datasets rehydrate --directory my_human_dataset/ --gzip
time datasets rehydrate --directory my_human_dataset/
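
One way to confirm what each run leaves on disk is to inspect the first two bytes of the rehydrated files, since gzip streams begin with the bytes 0x1f 0x8b. The following is a minimal sketch; the helper names `is_gzip` and `report_gzip` are hypothetical and not part of the datasets CLI, and the check says nothing about where compression took place.

```shell
#!/usr/bin/env bash
# is_gzip FILE: succeed if FILE starts with the gzip magic bytes 1f 8b.
# Hypothetical helper for illustration; not part of the datasets CLI.
is_gzip() {
  local magic
  magic=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# report_gzip DIR: label every regular file under DIR as gzip or plain.
report_gzip() {
  find "$1" -type f | while IFS= read -r f; do
    if is_gzip "$f"; then
      echo "gzip   $f"
    else
      echo "plain  $f"
    fi
  done
}
```

After each rehydration run, `report_gzip my_human_dataset/` would list which files are stored compressed.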

Expected behavior

time datasets rehydrate --directory my_human_dataset/ --gzip
Found 1 of 1 files for rehydration
Completed 1 of 1 [================================================] 100%

real	5m39.572s
user	0m4.918s
sys	0m10.745s

time datasets rehydrate --directory my_human_dataset/
Found 1 of 1 files for rehydration
Completed 1 of 1 [================================================] 100%

real	4m46.089s
user	0m28.370s
sys	0m10.848s
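
To compare the two runs, it can help to look at user CPU time alongside wall-clock time, because local compression or decompression would show up mainly as user time. The sketch below converts the `time` values reported above to seconds and prints the differences; the helper names `to_seconds` and `diff_seconds` are hypothetical.

```shell
#!/usr/bin/env bash
# to_seconds DURATION: convert a `time`-style value such as "5m39.572s" to seconds.
to_seconds() {
  LC_ALL=C awk -v t="$1" 'BEGIN {
    split(t, p, "m")          # p[1] = minutes, p[2] = seconds with trailing "s"
    sub(/s$/, "", p[2])
    printf "%.3f\n", p[1] * 60 + p[2]
  }'
}

# diff_seconds A B: print A - B in seconds.
diff_seconds() {
  LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.3f\n", a - b }'
}

# Values copied from the two runs above (--gzip first, then without --gzip).
echo "real difference: $(diff_seconds 5m39.572s 4m46.089s) s"
echo "user difference: $(diff_seconds 0m4.918s 0m28.370s) s"
```

With these numbers, the --gzip run takes about 53 s longer in wall-clock time but uses about 23 s less user CPU time than the run without --gzip.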

Thanks!

Labels: bug