Open
Describe the bug
When interleave_datasets is called with a seed and the result is loaded with multiple dataloader workers, every worker samples from the source datasets in the same order.
Should the seed be offset by the worker id so that each worker draws a different sampling order?
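To make the question concrete, here is a minimal pure-Python sketch of the effect. It is a simplified model and not the library's implementation: `sampling_order`, `worker_orders`, and the `seed + worker_id` offset are hypothetical names and choices used only for illustration. It assumes each worker builds its own RNG from the seed it is given and uses it to pick which source dataset to read from next.

```python
import random


def sampling_order(seed, probabilities, num_draws):
    """Return the sequence of source-dataset indices drawn by a seeded RNG."""
    rng = random.Random(seed)
    sources = range(len(probabilities))
    return rng.choices(sources, weights=probabilities, k=num_draws)


def worker_orders(seed, num_workers, offset_by_worker_id, num_draws=64):
    """Simulate the sampling order seen by each worker (hypothetical model)."""
    orders = []
    for worker_id in range(num_workers):
        # Assumption: one possible fix is to add the worker id to the seed.
        worker_seed = seed + worker_id if offset_by_worker_id else seed
        orders.append(sampling_order(worker_seed, [0.5, 0.5], num_draws))
    return orders


# Every worker gets the same seed: all sampling orders are identical.
shared = worker_orders(seed=42, num_workers=4, offset_by_worker_id=False)

# Seed offset by worker id: each worker draws from its own RNG stream.
offset = worker_orders(seed=42, num_workers=4, offset_by_worker_id=True)

print(all(order == shared[0] for order in shared))
print(len({tuple(order) for order in offset}))
```

In this model the shared-seed case always yields one sampling order repeated across workers, which matches the behavior described above. Whether a plain additive offset is the right fix for the library is an open question.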
Steps to reproduce the bug
See above
Expected behavior
See above
Environment info
- datasets version: 3.5.1
- Platform: macOS-15.4.1-arm64-arm-64bit
- Python version: 3.12.9
- huggingface_hub version: 0.30.2
- PyArrow version: 19.0.1
- Pandas version: 2.2.3
- fsspec version: 2024.12.0