Commit e199f19

minor docs data aug (#7621)
1 parent 660c645 commit e199f19

1 file changed: +6 -8 lines changed
docs/source/use_dataset.mdx

Lines changed: 6 additions & 8 deletions
@@ -177,9 +177,7 @@ Most image models expect the image to be in the RGB mode. The Beans images are a
 
 **3**. Now let's apply data augmentations to your images. 🤗 Datasets works with any augmentation library, and in this example we'll use Albumentations.
 
-### Using Albumentations
-
-[Albumentations](https://albumentations.ai) is a popular image augmentation library that provides a [rich set of transforms](https://albumentations.ai/docs/reference/supported-targets-by-transform/) including spatial-level transforms, pixel-level transforms, and mixing-level transforms. When running on CPU, which is typical for transformers pipelines, Albumentations is [faster than torchvision](https://albumentations.ai/docs/benchmarks/image-benchmarks/).
+[Albumentations](https://albumentations.ai) is a popular image augmentation library that provides a [rich set of transforms](https://albumentations.ai/docs/reference/supported-targets-by-transform/) including spatial-level transforms, pixel-level transforms, and mixing-level transforms.
 
 Install Albumentations:
 
@@ -201,7 +199,7 @@ pip install albumentations
 ... ])
 ```
 
-**5**. Since 🤗 Datasets uses PIL images but Albumentations expects OpenCV format (numpy arrays), you need to convert between formats:
+**5**. Since 🤗 Datasets uses PIL images but Albumentations expects NumPy arrays, you need to convert between formats:
 
 ```py
 >>> def albumentations_transforms(examples):
@@ -222,16 +220,16 @@ pip install albumentations
 ...     return examples
 ```
 
-**6**. Apply the transform using [`~Dataset.set_transform`]:
+**6**. Apply the transform using [`~Dataset.with_transform`]:
 
 ```py
->>> dataset.set_transform(albumentations_transforms)
+>>> dataset = dataset.with_transform(albumentations_transforms)
 >>> dataset[0]["pixel_values"]
 ```
 
 **Key points when using Albumentations with 🤗 Datasets:**
-- Convert PIL images to numpy arrays before applying transforms
+- Convert PIL images to NumPy arrays before applying transforms
 - Albumentations returns a dictionary with the transformed image under the "image" key
 - Convert the result back to PIL format after transformation
 
-**7**. The dataset is now ready for training with your machine learning framework!
+**7**. The dataset is now ready for training with your machine learning framework!
