Fix typos in PDF and Video documentation (#7579)

AndreaFrancis · web-flow · commit 9d3dee91f2e0 · 2025-05-22T14:53:47.000+02:00
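Fix article typos ("an pdf" → "a pdf", "an video" → "a video"), replace "document" with "document dataset" in the `PdfFolder` guide, and reword the `.pages` sentence.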
diff --git a/docs/source/document_dataset.mdx b/docs/source/document_dataset.mdx
@@ -1,6 +1,6 @@
 # Create a document dataset
 
-This guide will show you how to create a document with `PdfFolder` and some metadata. This is a no-code solution for quickly creating a document with several thousand pdfs.
+This guide will show you how to create a document dataset with `PdfFolder` and some metadata. This is a no-code solution for quickly creating a document dataset with several thousand pdfs.
 
 <Tip>
 
@@ -10,7 +10,7 @@ You can control access to your dataset by requiring users to share their contact
 
 ## PdfFolder
 
-The `PdfFolder` is a dataset builder designed to quickly load a document with several thousand pdfs without requiring you to write any code.
+The `PdfFolder` is a dataset builder designed to quickly load a document dataset with several thousand pdfs without requiring you to write any code.
 
 <Tip>
 
diff --git a/docs/source/document_load.mdx b/docs/source/document_load.mdx
@@ -14,7 +14,7 @@ To work with pdf datasets, you need to have the `pdfplumber` package installed.
 
 </Tip>
 
-When you load an pdf dataset and call the pdf column, the pdfs are decoded as `pdfplumber` Pdfs:
+When you load a pdf dataset and call the pdf column, the pdfs are decoded as `pdfplumber` Pdfs:
 
 ```py
 >>> from datasets import load_dataset, Pdf
@@ -26,15 +26,15 @@ When you load an pdf dataset and call the pdf column, the pdfs are decoded as `p
 
 <Tip warning={true}>
 
-Index into an pdf dataset using the row index first and then the `pdf` column - `dataset[0]["pdf"]` - to avoid creating all the pdf objects in the dataset. Otherwise, this can be a slow and time-consuming process if you have a large dataset.
+Index into a pdf dataset using the row index first and then the `pdf` column - `dataset[0]["pdf"]` - to avoid creating all the pdf objects in the dataset. Otherwise, this can be a slow and time-consuming process if you have a large dataset.
 
 </Tip>
 
 For a guide on how to load any type of dataset, take a look at the <a class="underline decoration-sky-400 decoration-2 font-semibold" href="./loading">general loading guide</a>.
 
 ## Read pages
 
-Access pages directly from a pdf using the `PDF` using `.pages`.
+Access pages directly from a pdf using the `.pages` attribute.
 
 Then you can use the `pdfplumber` functions to read texts, tables and images, e.g.:
 
@@ -168,7 +168,7 @@ To ignore the information in the metadata file, set `drop_metadata=True` in [`lo
 
 If you don't have a metadata file, `PdfFolder` automatically infers the label name from the directory name.
 If you want to drop automatically created labels, set `drop_labels=True`.
-In this case, your dataset will only contain an pdf column:
+In this case, your dataset will only contain a pdf column:
 
 ```py
 >>> from datasets import load_dataset
diff --git a/docs/source/video_load.mdx b/docs/source/video_load.mdx
@@ -14,7 +14,7 @@ To work with video datasets, you need to have the `torchvision` and `av` package
 
 </Tip>
 
-When you load an video dataset and call the video column, the videos are decoded as `torchvision` Videos:
+When you load a video dataset and call the video column, the videos are decoded as `torchvision` Videos:
 
 ```py
 >>> from datasets import load_dataset, Video
@@ -26,7 +26,7 @@ When you load an video dataset and call the video column, the videos are decoded
 
 <Tip warning={true}>
 
-Index into an video dataset using the row index first and then the `video` column - `dataset[0]["video"]` - to avoid creating all the video objects in the dataset. Otherwise, this can be a slow and time-consuming process if you have a large dataset.
+Index into a video dataset using the row index first and then the `video` column - `dataset[0]["video"]` - to avoid creating all the video objects in the dataset. Otherwise, this can be a slow and time-consuming process if you have a large dataset.
 
 </Tip>
 
@@ -136,7 +136,7 @@ To ignore the information in the metadata file, set `drop_metadata=True` in [`lo
 
 If you don't have a metadata file, `VideoFolder` automatically infers the label name from the directory name.
 If you want to drop automatically created labels, set `drop_labels=True`.
-In this case, your dataset will only contain an video column:
+In this case, your dataset will only contain a video column:
 
 ```py
 >>> from datasets import load_dataset