Open
Labels
help wanted: Extra attention is needed
Description
🚀 The feature
Currently, IterToMap starts loading all data from the prior IterDataPipe when the first __getitem__ is invoked, here:
https://github.com/pytorch/data/blob/13b574c80e8732744fee6ab9cb7e35b5afc34a3c/torchdata/datapipes/iter/util/converter.py#L78
We can stop loading data from the prior IterDataPipe as soon as we find the requested index. We might also need to add a flag to prevent loading the data multiple times.
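A minimal sketch of the idea, not the torchdata implementation: the class name LazyIterToMap, its attributes, and the choice of KeyError for a missing key are all hypothetical and only illustrate the proposal. It assumes the prior pipe yields (key, value) pairs, keeps one iterator alive across calls, and uses an exhausted flag so the source is never read a second time.

```python
from typing import Any, Dict, Hashable, Iterable, Iterator, Optional, Tuple


class LazyIterToMap:
    """Hypothetical stand-in for IterToMap that loads lazily (illustration only)."""

    def __init__(self, source: Iterable[Tuple[Hashable, Any]]) -> None:
        self._source = source
        self._iterator: Optional[Iterator[Tuple[Hashable, Any]]] = None
        self._cache: Dict[Hashable, Any] = {}
        # Flag that prevents loading data from the source multiple times.
        self._exhausted = False

    def __getitem__(self, key: Hashable) -> Any:
        # Serve anything that was already pulled from the source.
        if key in self._cache:
            return self._cache[key]
        if not self._exhausted:
            if self._iterator is None:
                self._iterator = iter(self._source)
            # Resume where the previous call stopped; stop again as soon
            # as the requested key shows up.
            for k, v in self._iterator:
                self._cache[k] = v
                if k == key:
                    return v
            self._exhausted = True
        raise KeyError(key)

    def __len__(self) -> int:
        # The length is only known after the whole source has been read.
        if not self._exhausted:
            if self._iterator is None:
                self._iterator = iter(self._source)
            for k, v in self._iterator:
                self._cache[k] = v
            self._exhausted = True
        return len(self._cache)
```

With this shape, requesting keys in the same order the source produces them reads one element per call, while a request for a late key still falls back to loading everything before it.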
Motivation, pitch
This would improve performance when users simply iterate over the MapDataPipe, since we would not need to pre-load everything at the beginning of the iteration. In effect, it simulates the behavior of IterDataPipe.
Alternatives
No response
Additional context
No response
NivekT, ArXen42, pmeier and linminhtoo