Open
Describe the bug
AttributeError: type object 'tqdm' has no attribute '_lock'
It occurs when I load datasets in a thread pool.
Issue #6066 and PRs #6067 and #6068 tried to fix this.
Steps to reproduce the bug
You may need to run this several times to reproduce the error, because it depends on thread timing.
- Save some datasets for testing:
import os
from datasets import Dataset, DatasetDict

os.makedirs("test_dataset_shards", exist_ok=True)
for i in range(10):
    # Each shard holds one million short strings in a single "train" split
    data = Dataset.from_dict({"text": [f"example {j}" for j in range(1000000)]})
    data = DatasetDict({"train": data})
    data.save_to_disk(f"test_dataset_shards/shard_{i}")
- Load them in a thread pool:
import glob
from concurrent.futures import ThreadPoolExecutor, as_completed

from datasets import load_from_disk
from tqdm import tqdm

paths = glob.glob("test_dataset_shards/shard_*")
with ThreadPoolExecutor(max_workers=10) as pool:
    # One load_from_disk call per shard, all running concurrently
    futures = [pool.submit(load_from_disk, path) for path in paths]
    datas = []
    for future in tqdm(as_completed(futures), total=len(futures)):
        # The AttributeError surfaces here, re-raised by future.result()
        datas.append(future.result())
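A possible workaround, not verified against this bug: if the error comes from several threads creating or tearing down tqdm's shared class-level lock at the same time (an assumption on my part), then creating that lock once in the main thread before the pool starts may avoid it. The sketch below uses only the standard library. `load_in_threads` and its `warmup` parameter are hypothetical names, not part of `datasets` or `tqdm`.

```python
from concurrent.futures import ThreadPoolExecutor


def load_in_threads(paths, load_fn, warmup=None, max_workers=10):
    """Run load_fn over paths in a thread pool, after an optional warm-up.

    warmup is a hypothetical hook: a zero-argument callable that runs once
    in the calling thread before any worker starts, e.g. tqdm.get_lock.
    """
    if warmup is not None:
        warmup()  # create shared state up front, outside the workers
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as paths
        return list(pool.map(load_fn, paths))


# Intended use with the reproduction above (assumes datasets and tqdm are
# installed; shard_paths is the list returned by glob.glob):
#   from datasets import load_from_disk
#   from tqdm import tqdm
#   shards = load_in_threads(shard_paths, load_from_disk, warmup=tqdm.get_lock)
```

`tqdm.get_lock()` is a real classmethod that creates the class-level lock if it does not exist yet. Whether calling it early actually prevents this `AttributeError` in datasets==2.19.0 is untested here.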
Expected behavior
No exception is raised.
Environment info
datasets==2.19.0
python==3.10