AttributeError: type object 'tqdm' has no attribute '_lock' #7660

@Hypothesis-Z

Describe the bug

AttributeError: type object 'tqdm' has no attribute '_lock'

It occurs when I load datasets in a thread pool.

Issue #6066 and PRs #6067 and #6068 tried to fix this.
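The traceback is not included here, so the exact code path is unknown. As a general illustration only, the sketch below shows how a shared class attribute that is created and deleted without synchronization can disappear while another worker still expects it. `FakeBar`, `observe`, `ensure`, `leave`, and `replay_interleaving` are hypothetical names, and whether `datasets` or `tqdm` actually follow this pattern is an assumption. The two workers are replayed sequentially so the outcome is deterministic.

```python
class FakeBar:
    """Hypothetical stand-in for a progress-bar class; this is not tqdm."""


def observe(cls):
    # Step 1 of a worker: remember whether a shared lock already exists.
    return hasattr(cls, "_lock")


def ensure(cls):
    # Step 2 of a worker: create the shared lock if it is missing.
    if not hasattr(cls, "_lock"):
        cls._lock = object()


def leave(cls, had_lock):
    # Step 3 of a worker: remove the lock if it did not exist on entry.
    if not had_lock:
        del cls._lock


def replay_interleaving():
    """Replay one unlucky ordering of two workers, A and B."""
    a_had = observe(FakeBar)  # A sees no lock
    b_had = observe(FakeBar)  # B also sees no lock
    ensure(FakeBar)           # A creates the lock
    ensure(FakeBar)           # B finds it already there
    leave(FakeBar, a_had)     # A deletes the lock
    try:
        leave(FakeBar, b_had)  # B tries to delete it again
    except AttributeError as exc:
        return exc
    return None
```

With real threads the ordering is not fixed, which would be consistent with the error showing up only on some runs.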

Steps to reproduce the bug

The error is intermittent because it depends on thread timing, so the steps below may need to be run several times to reproduce it.

  1. Save some datasets for testing
import os

from datasets import Dataset, DatasetDict

os.makedirs("test_dataset_shards", exist_ok=True)

for i in range(10):
    data = Dataset.from_dict({"text": [f"example {j}" for j in range(1000000)]})
    data = DatasetDict({'train': data})
    data.save_to_disk(f"test_dataset_shards/shard_{i}")
  2. Load them in a thread pool
from datasets import load_from_disk
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor, as_completed
import glob

datas = glob.glob('test_dataset_shards/shard_*')
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(load_from_disk, it) for it in datas]
datas = []
for future in tqdm(as_completed(futures), total=len(futures)):
    datas.append(future.result())
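
Since the failure is intermittent, it can help to repeat step 2 automatically until the error appears. The helper below is a minimal sketch of that idea. `run_until_error` is a hypothetical name, not part of `datasets` or `tqdm`, and it only uses the standard library.

```python
from typing import Callable, Optional, Tuple


def run_until_error(
    attempt: Callable[[], object], max_tries: int = 20
) -> Tuple[Optional[AttributeError], int]:
    """Call `attempt` repeatedly until it raises AttributeError.

    Returns (error, tries): the first AttributeError caught and the number of
    calls made, or (None, max_tries) if no call raised it.
    """
    for tries in range(1, max_tries + 1):
        try:
            attempt()
        except AttributeError as exc:
            return exc, tries
    return None, max_tries
```

To use it, wrap the thread-pool code from step 2 in a function (for example a hypothetical `load_all_shards()`) and call `run_until_error(load_all_shards)`. A non-`None` first element means the error was reproduced within `max_tries` runs.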

Expected behavior

No exception is raised.

Environment info

datasets==2.19.0
python==3.10
