| 1 | +<div itemscope itemtype="http://schema.org/Dataset"> |
| 2 | + <div itemscope itemprop="includedInDataCatalog" itemtype="http://schema.org/DataCatalog"> |
| 3 | + <meta itemprop="name" content="TensorFlow Datasets" /> |
| 4 | + </div> |
| 5 | + |
| 6 | + <meta itemprop="name" content="accentdb" /> |
| 7 | + <meta itemprop="description" content="AccentDB is a multi-pairwise parallel corpus of structured and labelled accented speech. It contains speech samples from speakers of 4 non-native accents of English (8 speakers, 4 Indian languages); and also has a compilation of 4 native accents of English (4 countries, 13 speakers) and a metropolitan Indian accent (2 speakers). The dataset available here corresponds to the release titled accentdb_extended on https://accentdb.github.io/#dataset. To use this dataset: ```python import tensorflow_datasets as tfds ds = tfds.load('accentdb', split='train') for ex in ds.take(4): print(ex) ``` See [the guide](https://www.tensorflow.org/datasets/overview) for more information on [tensorflow_datasets](https://www.tensorflow.org/datasets). " /> |
| 8 | + <meta itemprop="url" content="https://www.tensorflow.org/datasets/catalog/accentdb" /> |
| 9 | + <meta itemprop="sameAs" content="https://accentdb.github.io/" /> |
| 10 | + <meta itemprop="citation" content="@InProceedings{ahamad-anand-bhargava:2020:LREC, author = {Ahamad, Afroz and Anand, Ankit and Bhargava, Pranesh}, title = {AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition}, booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference}, month = {May}, year = {2020}, address = {Marseille, France}, publisher = {European Language Resources Association}, pages = {5353--5360}, url = {https://www.aclweb.org/anthology/2020.lrec-1.659} }" /> |
| 11 | +</div> |
| 12 | + |
| 13 | +# `accentdb` |
| 14 | + |
| 15 | +Note: This dataset was added recently and is only available in our |
| 16 | +`tfds-nightly` package |
| 17 | +<span class="material-icons" title="Available only in the tfds-nightly package">nights_stay</span>. |
| 18 | + |
| 19 | +* **Description**: |
| 20 | + |
| 21 | +AccentDB is a multi-pairwise parallel corpus of structured and labelled accented |
| 22 | +speech. It contains speech samples from speakers of 4 non-native accents of |
| 23 | +English (8 speakers, 4 Indian languages), as well as a compilation of 4 native |
| 24 | +accents of English (4 countries, 13 speakers) and a metropolitan Indian accent |
| 25 | +(2 speakers). The dataset available here corresponds to the release titled |
| 26 | +`accentdb_extended` on https://accentdb.github.io/#dataset. |
| 27 | + |
| 28 | +* **Homepage**: [https://accentdb.github.io/](https://accentdb.github.io/) |
| 29 | + |
| 30 | +* **Source code**: |
| 31 | + [`tfds.audio.Accentdb`](https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/audio/accentdb.py) |
| 32 | + |
| 33 | +* **Versions**: |
| 34 | + |
| 35 | + * **`1.0.0`** (default): No release notes. |
| 36 | + |
| 37 | +* **Download size**: `3.56 GiB` |
| 38 | + |
| 39 | +* **Dataset size**: `19.47 GiB` |
| 40 | + |
| 41 | +* **Auto-cached** |
| 42 | + ([documentation](https://www.tensorflow.org/datasets/performances#auto-caching)): |
| 43 | + No |
| 44 | + |
| 45 | +* **Splits**: |
| 46 | + |
| 47 | +Split | Examples |
| 48 | +:-------- | -------: |
| 49 | +`'train'` | 17,313 |
| 50 | + |
| 51 | +* **Features**: |
| 52 | + |
| 53 | +```python |
| 54 | +FeaturesDict({ |
| 55 | + 'audio': Audio(shape=(None,), dtype=tf.int64), |
| 56 | + 'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=9), |
| 57 | + 'speaker_id': tf.string, |
| 58 | +}) |
| 59 | +``` |
| 60 | + |
| 61 | +* **Supervised keys** (See |
| 62 | + [`as_supervised` doc](https://www.tensorflow.org/datasets/api_docs/python/tfds/load#args)): |
| 63 | + `('audio', 'label')` |
| 64 | + |
| 65 | +* **Citation**: |
| 66 | + |
| 67 | +``` |
| 68 | +@InProceedings{ahamad-anand-bhargava:2020:LREC, |
| 69 | + author = {Ahamad, Afroz and Anand, Ankit and Bhargava, Pranesh}, |
| 70 | + title = {AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition}, |
| 71 | + booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference}, |
| 72 | + month = {May}, |
| 73 | + year = {2020}, |
| 74 | + address = {Marseille, France}, |
| 75 | + publisher = {European Language Resources Association}, |
| 76 | + pages = {5353--5360}, |
| 77 | + url = {https://www.aclweb.org/anthology/2020.lrec-1.659} |
| 78 | +} |
| 79 | +``` |
| 80 | + |
| 81 | +* **Figure** |
| 82 | + ([tfds.show_examples](https://www.tensorflow.org/datasets/api_docs/python/tfds/visualization/show_examples)): |
| 83 | + Not supported. |
| 84 | + |
| 85 | +* **Examples** |
| 86 | + ([tfds.as_dataframe](https://www.tensorflow.org/datasets/api_docs/python/tfds/as_dataframe)): |
| 87 | + Missing. |
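The page lists a single `'train'` split and the supervised keys `('audio', 'label')`. The following is a minimal sketch, not part of the catalog entry, of how a held-out slice might be carved out and consumed as `(audio, label)` pairs. The helper names `holdout_split_specs`, `peak_normalize`, and `load_supervised` are hypothetical, and the 90/10 ratio is an arbitrary assumption. The `tensorflow_datasets` import is deferred into the loader because the dataset is noted above as available only in `tfds-nightly` and the download is about 3.56 GiB.

```python
def holdout_split_specs(train_percent=90):
    """Build TFDS split strings that carve a validation slice out of 'train'.

    Hypothetical helper: the catalog defines only a 'train' split, so the
    held-out portion is expressed with the TFDS percent-slicing syntax.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")
    return f"train[:{train_percent}%]", f"train[{train_percent}%:]"


def peak_normalize(samples):
    """Scale integer audio samples to floats in [-1.0, 1.0] by peak magnitude.

    Makes no assumption about bit depth; it divides by the largest
    absolute sample value found in the clip.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        # Silent or empty clip: nothing to scale.
        return [0.0 for _ in samples]
    return [s / peak for s in samples]


def load_supervised(train_percent=90):
    """Return [train_ds, val_ds], each yielding (audio, label) tuples."""
    # Imported lazily: assumes the tfds-nightly package is installed.
    import tensorflow_datasets as tfds

    train_spec, val_spec = holdout_split_specs(train_percent)
    return tfds.load(
        "accentdb",
        split=[train_spec, val_spec],
        as_supervised=True,  # yields ('audio', 'label') per the supervised keys
    )
```

A possible usage, assuming the dataset has already been downloaded and prepared: call `train_ds, val_ds = load_supervised()`, then iterate `tfds.as_numpy(train_ds.take(4))` and pass `audio.tolist()` to `peak_normalize` before feeding a model.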