Commit f6c01d8

TensorFlow Datasets Team authored and copybara-github committed
Automated documentation update.
PiperOrigin-RevId: 336145075
1 parent 07f547b commit f6c01d8

3 files changed, 92 additions, 0 deletions

docs/catalog/_toc.yaml

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@ toc:
- path: /datasets/catalog/overview
title: Overview
- section:
+- path: /datasets/catalog/accentdb
+status: nightly
+title: accentdb
- path: /datasets/catalog/common_voice
title: common_voice
- path: /datasets/catalog/crema_d
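The hunk headers on this page can be cross-checked against the reported totals. In the unified diff format, a header `@@ -a,b +c,d @@` means the hunk spans `b` lines of the old file and `d` lines of the new one (a missing count defaults to 1), so `d - b` equals additions minus deletions. The stdlib sketch below applies that rule to the three headers in this commit; `net_line_change` is an illustrative helper, not part of git or any other tool.

```python
import re

# Unified-diff hunk header: "@@ -<old_start>[,<old_len>] +<new_start>[,<new_len>] @@"
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def net_line_change(header: str) -> int:
    """Return new_len - old_len for one hunk header (additions minus deletions)."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_len = int(m.group(2) or 1)  # omitted count means a 1-line range
    new_len = int(m.group(4) or 1)
    return new_len - old_len


# Hunk headers copied from the three files in this commit.
headers = {
    "docs/catalog/_toc.yaml": "@@ -2,6 +2,9 @@ toc:",
    "docs/catalog/accentdb.md": "@@ -0,0 +1,87 @@",
    "docs/catalog/overview.md": "@@ -24,6 +24,8 @@ for ex in tfds.load('cifar10', split='train'):",
}

changes = {path: net_line_change(h) for path, h in headers.items()}
total = sum(changes.values())
print(changes)
print(total)  # 92
```

Because every file reports 0 deletions, each net change equals its additions: 3 + 87 + 2 = 92, matching the commit summary.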

docs/catalog/accentdb.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
<div itemscope itemtype="http://schema.org/Dataset">
<div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog">
<meta itemprop="name" content="TensorFlow Datasets" />
</div>

<meta itemprop="name" content="accentdb" />
<meta itemprop="description" content="AccentDB is a multi-pairwise parallel corpus of structured and labelled&#10;accented speech. It contains speech samples from speakers of 4 non-native&#10;accents of English (8 speakers, 4 Indian languages), and also has a compilation&#10;of 4 native accents of English (4 countries, 13 speakers) and a metropolitan&#10;Indian accent (2 speakers). The dataset available here corresponds to the release&#10;titled accentdb_extended on https://accentdb.github.io/#dataset.&#10;&#10;To use this dataset:&#10;&#10;```python&#10;import tensorflow_datasets as tfds&#10;&#10;ds = tfds.load(&#x27;accentdb&#x27;, split=&#x27;train&#x27;)&#10;for ex in ds.take(4):&#10; print(ex)&#10;```&#10;&#10;See [the guide](https://www.tensorflow.org/datasets/overview) for more&#10;information on [tensorflow_datasets](https://www.tensorflow.org/datasets).&#10;&#10;" />
<meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/accentdb" />
<meta itemprop="sameAs" content="https://accentdb.github.io/" />
<meta itemprop="citation" content="@InProceedings{ahamad-anand-bhargava:2020:LREC,&#10; author = {Ahamad, Afroz and Anand, Ankit and Bhargava, Pranesh},&#10; title = {AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition},&#10; booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},&#10; month = {May},&#10; year = {2020},&#10; address = {Marseille, France},&#10; publisher = {European Language Resources Association},&#10; pages = {5353--5360},&#10; url = {https://www.aclweb.org/anthology/2020.lrec-1.659}&#10;}" />
</div>

# `accentdb`

Note: This dataset was added recently and is only available in our
`tfds-nightly` package
<span class="material-icons" title="Available only in the tfds-nightly package">nights_stay</span>.

* **Description**:

AccentDB is a multi-pairwise parallel corpus of structured and labelled accented
speech. It contains speech samples from speakers of 4 non-native accents of
English (8 speakers, 4 Indian languages), and also has a compilation of 4 native
accents of English (4 countries, 13 speakers) and a metropolitan Indian accent
(2 speakers). The dataset available here corresponds to the release titled
accentdb_extended on https://accentdb.github.io/#dataset.

* **Homepage**: [https://accentdb.github.io/](https://accentdb.github.io/)

* **Source code**:
[`tfds.audio.Accentdb`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/audio/accentdb.py)

* **Versions**:

    * **`1.0.0`** (default): No release notes.

* **Download size**: `3.56 GiB`

* **Dataset size**: `19.47 GiB`

* **Auto-cached**
([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)):
No

* **Splits**:

Split | Examples
:-------- | -------:
`'train'` | 17,313

* **Features**:

```python
FeaturesDict({
    'audio': Audio(shape=(None,), dtype=tf.int64),
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=9),
    'speaker_id': tf.string,
})
```
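The feature spec above can be mirrored in plain Python to show what one example holds. This is an illustration only: the dict stands in for a decoded example (the sample values and speaker id are made up, and real values are TensorFlow tensors), and `to_supervised` is a hypothetical helper that mimics the `('audio', 'label')` pair listed under the supervised keys. It also checks that `num_classes=9` is consistent with the accents in the description (4 non-native, 4 native and 1 metropolitan Indian), assuming each class is one accent.

```python
# Hypothetical stand-in for one decoded `accentdb` example. Real examples come
# from tfds.load("accentdb", split="train") and hold TensorFlow tensors.
example = {
    "audio": [0, 12, -7, 3],     # made-up integer samples; real clips are far longer
    "label": 5,                  # ClassLabel index, expected in [0, NUM_CLASSES)
    "speaker_id": "speaker_01",  # made-up speaker id
}

# Supervised keys listed for this dataset: (input, target).
SUPERVISED_KEYS = ("audio", "label")

# Accent counts from the dataset description (assumption: one class per accent).
NON_NATIVE_ACCENTS = 4
NATIVE_ACCENTS = 4
METROPOLITAN_INDIAN_ACCENTS = 1
NUM_CLASSES = NON_NATIVE_ACCENTS + NATIVE_ACCENTS + METROPOLITAN_INDIAN_ACCENTS


def to_supervised(ex, keys=SUPERVISED_KEYS):
    """Hypothetical helper: keep only the (input, target) pair, dropping speaker_id."""
    return tuple(ex[k] for k in keys)


audio, label = to_supervised(example)
if not 0 <= label < NUM_CLASSES:
    raise ValueError(f"label {label} outside [0, {NUM_CLASSES})")
print(NUM_CLASSES, len(audio), label)  # 9 4 5
```

With TFDS itself, passing `as_supervised=True` to `tfds.load('accentdb', split='train')` yields these `(audio, label)` pairs directly instead of the full feature dict.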
* **Supervised keys** (See
[`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)):
`('audio', 'label')`

* **Citation**:

```
@InProceedings{ahamad-anand-bhargava:2020:LREC,
author = {Ahamad, Afroz and Anand, Ankit and Bhargava, Pranesh},
title = {AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition},
booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
month = {May},
year = {2020},
address = {Marseille, France},
publisher = {European Language Resources Association},
pages = {5353--5360},
url = {https://www.aclweb.org/anthology/2020.lrec-1.659}
}
```

* **Figure**
([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)):
Not supported.

* **Examples**
([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)):
Missing.
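The schema.org `description` and `citation` meta tags at the top of `accentdb.md` pack multi-line text into a single HTML attribute, using character references such as `&#10;` (newline) and `&#x27;` (apostrophe). Python's standard `html.unescape` turns them back into plain text; the sketch below applies it to a fragment copied from the `description` tag.

```python
import html

# Fragment copied verbatim from the `description` meta tag of accentdb.md.
encoded = (
    "ds = tfds.load(&#x27;accentdb&#x27;, split=&#x27;train&#x27;)&#10;"
    "for ex in ds.take(4):&#10; print(ex)"
)

# &#x27; -> "'" and &#10; -> "\n"
decoded = html.unescape(encoded)
print(decoded)
```

The decoded text is the same `tfds.load` snippet the page shows as its usage example, split back onto separate lines.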

docs/catalog/overview.md

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,8 @@ for ex in tfds.load('cifar10', split='train'):

### `Audio`

+* [`accentdb`](accentdb.md)
+<span class="material-icons" title="Available only in the tfds-nightly package">nights_stay</span>
* [`common_voice`](common_voice.md)
* [`crema_d`](crema_d.md)
* [`dementiabank`](dementiabank.md)
