⚠️ This only applies to files written by the `datasets` library (e.g., Arrow files and indices).
It does **not** affect files downloaded from the Hugging Face Hub (such as models, tokenizers, or raw dataset sources), which are located in `~/.cache/huggingface/hub` by default and controlled separately via the `HF_HUB_CACHE` variable:

```
$ export HF_HUB_CACHE="/path/to/hub_cache"
```

💡 To relocate all Hugging Face caches, including datasets and hub downloads, use the `HF_HOME` variable instead:

```
$ export HF_HOME="/path/to/cache_root"
```

This results in:
- datasets cache → `/path/to/cache_root/datasets`
- hub cache → `/path/to/cache_root/hub`

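The fallback order described above can be sketched in a few lines of Python. This is a minimal illustration, not the libraries' actual code: `resolve_cache_dirs` is a hypothetical helper, it only looks at the three variables named in this section, and it assumes each cache falls back to a subdirectory of the `HF_HOME` root when its own variable is unset.

```python
import posixpath


def resolve_cache_dirs(env, home="~"):
    """Return (datasets_cache, hub_cache) for a mapping of environment variables.

    Hypothetical helper for illustration; it mirrors the precedence described
    in this section and ignores any other variables the libraries may honor.
    """
    # Root for all Hugging Face caches; HF_HOME overrides the default root.
    hf_home = env.get("HF_HOME", posixpath.join(home, ".cache", "huggingface"))
    # Each cache uses its own variable if set, else a subdirectory of the root.
    datasets_cache = env.get("HF_DATASETS_CACHE", posixpath.join(hf_home, "datasets"))
    hub_cache = env.get("HF_HUB_CACHE", posixpath.join(hf_home, "hub"))
    return datasets_cache, hub_cache


# HF_HOME alone moves both caches.
print(resolve_cache_dirs({"HF_HOME": "/path/to/cache_root"}))
# Only HF_DATASETS_CACHE set: the hub cache stays at its default location.
print(resolve_cache_dirs({"HF_DATASETS_CACHE": "/path/to/datasets_cache"}))
```

The second call shows why setting only `HF_DATASETS_CACHE` can be surprising: Hub downloads still land under the default `~/.cache/huggingface/hub` unless `HF_HUB_CACHE` or `HF_HOME` is also set.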
These distinctions are especially useful when working in shared environments or networked file systems (e.g., NFS).
See [issue #7480](https://github.com/huggingface/datasets/issues/7480) for discussion on how users encountered unexpected cache locations when `HF_HUB_CACHE` was not set alongside `HF_DATASETS_CACHE`.

When you load a dataset, you can also change where the data is cached by setting the `cache_dir` parameter to the path you want:

```py
@@ -82,6 +108,6 @@ If you want to reuse a dataset from scratch, try setting the `download_mode` par
Disabling the cache and copying the dataset in-memory will speed up dataset operations. There are two options for copying the dataset in-memory:

1. Set `datasets.config.IN_MEMORY_MAX_SIZE` to a nonzero value (in bytes) that fits in your RAM.

2. Set the environment variable `HF_DATASETS_IN_MEMORY_MAX_SIZE` to a nonzero value. Note that the first method takes precedence.
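
As a rough sketch, the two options could look like this in a script. The 4 GiB budget is an arbitrary illustrative value, and the example assumes the environment variable is read when `datasets` is imported, so it is set beforehand. The `try`/`except` only keeps the snippet runnable where `datasets` is not installed.

```python
import os

# Illustrative budget for in-memory copies, in bytes (4 GiB; pick what fits your RAM).
max_size_bytes = 4 * 1024**3

# Option 2: environment variable. Assumption: it is read at import time,
# so set it before `datasets` is first imported.
os.environ["HF_DATASETS_IN_MEMORY_MAX_SIZE"] = str(max_size_bytes)

try:
    import datasets

    # Option 1: set the config value directly; per the text above,
    # this method takes precedence over the environment variable.
    datasets.config.IN_MEMORY_MAX_SIZE = max_size_bytes
except ImportError:
    # `datasets` is not installed here; only the environment variable is set.
    datasets = None
```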