Commit 9d3dee9

Fix typos in PDF and Video documentation (#7579)
Fix typo
1 parent 53f958e commit 9d3dee9

3 files changed: +9 -9 lines changed

docs/source/document_dataset.mdx

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Create a document dataset

-This guide will show you how to create a document with `PdfFolder` and some metadata. This is a no-code solution for quickly creating a document with several thousand pdfs.
+This guide will show you how to create a document dataset with `PdfFolder` and some metadata. This is a no-code solution for quickly creating a document dataset with several thousand pdfs.

 <Tip>

@@ -10,7 +10,7 @@ You can control access to your dataset by requiring users to share their contact

 ## PdfFolder

-The `PdfFolder` is a dataset builder designed to quickly load a document with several thousand pdfs without requiring you to write any code.
+The `PdfFolder` is a dataset builder designed to quickly load a document dataset with several thousand pdfs without requiring you to write any code.

 <Tip>

docs/source/document_load.mdx

Lines changed: 4 additions & 4 deletions
@@ -14,7 +14,7 @@ To work with pdf datasets, you need to have the `pdfplumber` package installed.

 </Tip>

-When you load an pdf dataset and call the pdf column, the pdfs are decoded as `pdfplumber` Pdfs:
+When you load a pdf dataset and call the pdf column, the pdfs are decoded as `pdfplumber` Pdfs:

 ```py
 >>> from datasets import load_dataset, Pdf
@@ -26,15 +26,15 @@ When you load an pdf dataset and call the pdf column, the pdfs are decoded as `p

 <Tip warning={true}>

-Index into an pdf dataset using the row index first and then the `pdf` column - `dataset[0]["pdf"]` - to avoid creating all the pdf objects in the dataset. Otherwise, this can be a slow and time-consuming process if you have a large dataset.
+Index into a pdf dataset using the row index first and then the `pdf` column - `dataset[0]["pdf"]` - to avoid creating all the pdf objects in the dataset. Otherwise, this can be a slow and time-consuming process if you have a large dataset.

 </Tip>

 For a guide on how to load any type of dataset, take a look at the <a class="underline decoration-sky-400 decoration-2 font-semibold" href="./loading">general loading guide</a>.

 ## Read pages

-Access pages directly from a pdf using the `PDF` using `.pages`.
+Access pages directly from a pdf using the `.pages` attribute.

 Then you can use the `pdfplumber` functions to read texts, tables and images, e.g.:

@@ -168,7 +168,7 @@ To ignore the information in the metadata file, set `drop_metadata=True` in [`lo

 If you don't have a metadata file, `PdfFolder` automatically infers the label name from the directory name.
 If you want to drop automatically created labels, set `drop_labels=True`.
-In this case, your dataset will only contain an pdf column:
+In this case, your dataset will only contain a pdf column:

 ```py
 >>> from datasets import load_dataset

docs/source/video_load.mdx

Lines changed: 3 additions & 3 deletions
@@ -14,7 +14,7 @@ To work with video datasets, you need to have the `torchvision` and `av` package

 </Tip>

-When you load an video dataset and call the video column, the videos are decoded as `torchvision` Videos:
+When you load a video dataset and call the video column, the videos are decoded as `torchvision` Videos:

 ```py
 >>> from datasets import load_dataset, Video
@@ -26,7 +26,7 @@ When you load an video dataset and call the video column, the videos are decoded

 <Tip warning={true}>

-Index into an video dataset using the row index first and then the `video` column - `dataset[0]["video"]` - to avoid creating all the video objects in the dataset. Otherwise, this can be a slow and time-consuming process if you have a large dataset.
+Index into a video dataset using the row index first and then the `video` column - `dataset[0]["video"]` - to avoid creating all the video objects in the dataset. Otherwise, this can be a slow and time-consuming process if you have a large dataset.

 </Tip>

@@ -136,7 +136,7 @@ To ignore the information in the metadata file, set `drop_metadata=True` in [`lo

 If you don't have a metadata file, `VideoFolder` automatically infers the label name from the directory name.
 If you want to drop automatically created labels, set `drop_labels=True`.
-In this case, your dataset will only contain an video column:
+In this case, your dataset will only contain a video column:

 ```py
 >>> from datasets import load_dataset
