Commit 3d5aac6

Merge pull request #1341 from Jackwaterveg/r0.1
[Version]r0.1.1
2 parents 1656fde + bd1300d commit 3d5aac6

326 files changed: 8342 additions and 6089 deletions.

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -15,6 +15,7 @@
 build

 docs/build/
+docs/topic/ctc/warp-ctc/

 tools/venv
 tools/kenlm

.mergify.yml

Lines changed: 29 additions & 5 deletions

@@ -32,6 +32,12 @@ pull_request_rules:
 actions:
 label:
 remove: ["conflicts"]
+- name: "auto add label=Dataset"
+conditions:
+- files~=^dataset/
+actions:
+label:
+add: ["Dataset"]
 - name: "auto add label=S2T"
 conditions:
 - files~=^paddlespeech/s2t/
@@ -50,18 +56,30 @@ pull_request_rules:
 actions:
 label:
 add: ["Audio"]
-- name: "auto add label=TextProcess"
+- name: "auto add label=Vector"
+conditions:
+- files~=^paddlespeech/vector/
+actions:
+label:
+add: ["Vector"]
+- name: "auto add label=Text"
 conditions:
 - files~=^paddlespeech/text/
 actions:
 label:
-add: ["TextProcess"]
+add: ["Text"]
 - name: "auto add label=Example"
 conditions:
 - files~=^examples/
 actions:
 label:
 add: ["Example"]
+- name: "auto add label=CLI"
+conditions:
+- files~=^paddlespeech/cli
+actions:
+label:
+add: ["CLI"]
 - name: "auto add label=Demo"
 conditions:
 - files~=^demos/
@@ -70,13 +88,13 @@ pull_request_rules:
 add: ["Demo"]
 - name: "auto add label=README"
 conditions:
-- files~=README.md
+- files~=(README.md|READEME_cn.md)
 actions:
 label:
 add: ["README"]
 - name: "auto add label=Documentation"
 conditions:
-- files~=^docs/
+- files~=^(docs/|CHANGELOG.md|paddleaudio/CHANGELOG.md)
 actions:
 label:
 add: ["Documentation"]
@@ -88,10 +106,16 @@ pull_request_rules:
 add: ["CI"]
 - name: "auto add label=Installation"
 conditions:
-- files~=^(tools/|setup.py|setup.sh)
+- files~=^(tools/|setup.py|setup.cfg|setup_audio.py)
 actions:
 label:
 add: ["Installation"]
+- name: "auto add label=Test"
+conditions:
+- files~=^(tests/)
+actions:
+label:
+add: ["Test"]
 - name: "auto add label=mergify"
 conditions:
 - files~=^.mergify.yml
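
The label rules above match changed file paths against regular expressions. The sketch below replays the visible `files~=` conditions locally, so that a list of paths can be checked before a PR is opened. It assumes Mergify evaluates `files~=` as an unanchored regular-expression search on each path, like Python's `re.search`. `LABEL_RULES` and `labels_for` are hypothetical names. The Audio and CI rules are left out because their conditions are not shown in this diff.

```python
import re

# (pattern, label) pairs copied from the `files~=` conditions in the diff above.
# Assumption: `files~=` behaves like an unanchored re.search on each changed path.
LABEL_RULES = [
    (r"^dataset/", "Dataset"),
    (r"^paddlespeech/s2t/", "S2T"),
    (r"^paddlespeech/vector/", "Vector"),
    (r"^paddlespeech/text/", "Text"),
    (r"^examples/", "Example"),
    (r"^paddlespeech/cli", "CLI"),
    (r"^demos/", "Demo"),
    (r"(README.md|READEME_cn.md)", "README"),
    (r"^(docs/|CHANGELOG.md|paddleaudio/CHANGELOG.md)", "Documentation"),
    (r"^(tools/|setup.py|setup.cfg|setup_audio.py)", "Installation"),
    (r"^(tests/)", "Test"),
    (r"^.mergify.yml", "mergify"),
]


def labels_for(changed_files):
    """Return the sorted labels whose pattern matches at least one changed path."""
    labels = set()
    for path in changed_files:
        for pattern, label in LABEL_RULES:
            if re.search(pattern, path):
                labels.add(label)
    return sorted(labels)


if __name__ == "__main__":
    print(labels_for(["README.md", "paddlespeech/cli/README.md", "CHANGELOG.md"]))
    # ['CLI', 'Documentation', 'README']
    print(labels_for(["README_cn.md"]))
    # []
```

Under the same assumption, `READEME_cn.md` looks like a misspelling of `README_cn.md`, so a change that touches only `README_cn.md` gets no label. By contrast, the unanchored `README.md` pattern also matches READMEs in subdirectories such as `paddlespeech/cli/README.md`.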

CHANGELOG.md

Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+# Changelog
+
+
+Date: 2022-1-10, Author: Jackwaterveg.
+Add features to: CLI:
+- Support English (librispeech/asr1/transformer).
+- Support choosing `decode_method` for conformer and transformer models.
+- Refactor the config, using the unified config.
+- PRLink: https://github.com/PaddlePaddle/PaddleSpeech/pull/1297
+
+***
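
The README diff below adds Quick Start commands such as `paddlespeech tts --input ... --output output.wav` and `paddlespeech text --task punc --input ...`. To script these commands from Python, you can pass each one as an argument list, which avoids shell quoting of the Chinese input text. This is a minimal sketch. It assumes the `paddlespeech` executable is installed and on `PATH`, and it uses only the subcommands and flags that appear in this commit. `paddlespeech_argv` and `run` are hypothetical helpers.

```python
import shlex
import subprocess


def paddlespeech_argv(task, *options):
    """Build an argv list for the `paddlespeech` CLI; no shell is involved."""
    return ["paddlespeech", task, *options]


# Commands mirrored from the README Quick Start in this commit.
TTS_ARGV = paddlespeech_argv(
    "tts", "--input", "你好,欢迎使用飞桨深度学习框架", "--output", "output.wav"
)
PUNC_ARGV = paddlespeech_argv(
    "text", "--task", "punc", "--input", "今天的天气真不错啊你下午有空吗我想约你一起去吃饭"
)
ST_ARGV = paddlespeech_argv("st", "--input", "input_16k.wav")


def run(argv):
    """Run one CLI command and return its stdout.

    Assumption: `paddlespeech` is on PATH. check=True raises
    subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Show the equivalent shell command line for each example (nothing is executed).
    for argv in (TTS_ARGV, PUNC_ARGV, ST_ARGV):
        print(shlex.join(argv))
```

Calling `run(PUNC_ARGV)` would execute the punctuation-restoration command. Keeping the argument list separate from the runner makes the commands easy to inspect or log before anything is executed.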

README.md

Lines changed: 113 additions & 36 deletions

@@ -7,7 +7,7 @@

 <h3>
 <a href="#quick-start"> Quick Start </a>
-| <a href="#tutorials"> Tutorials </a>
+| <a href="#documents"> Documents </a>
 | <a href="#model-list"> Models List </a>
 </div>

@@ -25,14 +25,6 @@
 <a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue"></a>
 </p>

-<!---
-from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readmes-readable.md
-1.What is this repo or project? (You can reuse the repo description you used earlier because this section doesn’t have to be long.)
-2.How does it work?
-3.Who will use this repo or project?
-4.What is the goal of this project?
--->
-

 **PaddleSpeech** is an open-source toolkit on [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform for a variety of critical tasks in speech and audio, with the state-of-art and influential models.

@@ -61,7 +53,6 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
 </td>
 <td>我认为跑步最重要的就是给我带来了身体健康。</td>
 </tr>
-
 </tbody>
 </table>

@@ -95,7 +86,7 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
 <table style="width:100%">
 <thead>
 <tr>
-<th><img width="200" height="1"> Input Text <img width="200" height="1"> </th>
+<th width="550" > Input Text</th>
 <th>Synthetic Audio</th>
 </tr>
 </thead>
@@ -114,14 +105,53 @@ from https://github.com/18F/open-source-guide/blob/18f-pages/pages/making-readme
 <img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
 </td>
 </tr>
+<tr>
+<td >季姬寂,集鸡,鸡即棘鸡。棘鸡饥叽,季姬及箕稷济鸡。鸡既济,跻姬笈,季姬忌,急咭鸡,鸡急,继圾几,季姬急,即籍箕击鸡,箕疾击几伎,伎即齑,鸡叽集几基,季姬急极屐击鸡,鸡既殛,季姬激,即记《季姬击鸡记》。</td>
+<td align = "center">
+<a href="https://paddlespeech.bj.bcebos.com/Parakeet/docs/demos/jijiji.wav" rel="nofollow">
+<img align="center" src="./docs/images/audio_icon.png" width="200" style="max-width: 100%;"></a><br>
+</td>
+</tr>
 </tbody>
 </table>

 </div>

 For more synthesized audios, please refer to [PaddleSpeech Text-to-Speech samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html).

-### Features:
+##### Punctuation Restoration
+<div align = "center">
+<table style="width:100%">
+<thead>
+<tr>
+<th width="390"> Input Text </th>
+<th width="390"> Output Text </th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>今天的天气真不错啊你下午有空吗我想约你一起去吃饭</td>
+<td>今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。</td>
+</tr>
+</tbody>
+</table>
+
+</div>
+
+### ⭐ Examples
+- **[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo): Use PaddleSpeech TTS to generate virtual human voice.**
+
+<div align="center"><a href="https://www.bilibili.com/video/BV1cL411V71o?share_source=copy_web"><img src="https://ai-studio-static-online.cdn.bcebos.com/06fd746ab32042f398fb6f33f873e6869e846fe63c214596ae37860fe8103720" / width="500px"></a></div>
+
+### 🔥 Hot Activities
+
+- 2021.12.21~12.24
+
+4 Days Live Courses: Depth interpretation of PaddleSpeech!
+
+**Courses videos and related materials: https://aistudio.baidu.com/aistudio/education/group/info/25130**
+
+### Features

 Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. To be more specific, this toolkit features at:
 - 📦 **Ease of Use**: low barriers to install, and [CLI](#quick-start) is available to quick-start your journey.
@@ -132,34 +162,30 @@ Via the easy-to-use, efficient, flexible and scalable implementation, our vision
 - 🔬 *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details.
 - 🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural language processing (NLP) and Computer Vision (CV).

-
-### Recent Update:
+### Recent Update

 <!---
 2021.12.14: We would like to have an online courses to introduce basics and research of speech, as well as code practice with `paddlespeech`. Please pay attention to our [Calendar](https://www.paddlepaddle.org.cn/live).
 --->
 - 🤗 2021.12.14: Our PaddleSpeech [ASR](https://huggingface.co/spaces/KPatrick/PaddleSpeechASR) and [TTS](https://huggingface.co/spaces/akhaliq/paddlespeech) Demos on Hugging Face Spaces are available!
 - 👏🏻 2021.12.10: PaddleSpeech CLI is available for Audio Classification, Automatic Speech Recognition, Speech Translation (English to Chinese) and Text-to-Speech.

-### Communication
-If you are in China, we recommend you to join our WeChat group to contact directly with our team members!
+### Community
+- Scan the QR code below with your Wechat (reply【语音】after your friend's application is approved), you can access to official technical exchange group. Look forward to your participation.

 <div align="center">
-<img src="./docs/images/wechat_group.png" width = "400" />
-
+<img src="https://raw.githubusercontent.com/yt605155624/lanceTest/main/images/wechat_4.jpg" width = "300" />
 </div>

 ## Installation

-We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7*, where `paddlespeech` can be easily installed with `pip`:
-```python
-pip install paddlepaddle paddlespeech
-```
-Up to now, **Linux** supports CLI for the all our tasks, **Mac OSX and Windows** only supports PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. Please see [installation](./docs/source/install.md) for other alternatives.
+We strongly recommend our users to install PaddleSpeech in **Linux** with *python>=3.7*.
+Up to now, **Linux** supports CLI for the all our tasks, **Mac OSX** and **Windows** only supports PaddleSpeech CLI for Audio Classification, Speech-to-Text and Text-to-Speech. To install `PaddleSpeech`, please see [installation](./docs/source/install.md).

+<a name="quickstart"></a>
 ## Quick Start

-Developers can have a try of our models with [PaddleSpeech Command Line](./demos/README.md). Change `--input` to test your own audio/text.
+Developers can have a try of our models with [PaddleSpeech Command Line](./paddlespeech/cli/README.md). Change `--input` to test your own audio/text.

 **Audio Classification**
 ```shell
@@ -177,11 +203,20 @@ paddlespeech st --input input_16k.wav
 ```
 **Text-to-Speech**
 ```shell
-paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架" --output output.wav
+paddlespeech tts --input "你好,欢迎使用飞桨深度学习框架" --output output.wav
 ```
 - web demo for Text to Speech is integrated to [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See Demo: [TTS Demo](https://huggingface.co/spaces/akhaliq/paddlespeech)

+**Text Postprocessing**
+- Punctuation Restoration
+```bash
+paddlespeech text --task punc --input 今天的天气真不错啊你下午有空吗我想约你一起去吃饭
+```
+

+
+For more command lines, please see: [demos](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/demos)
+
 If you want to try more functions like training and tuning, please have a look at [Speech-to-Text Quick Start](./docs/source/asr/quick_start.md) and [Text-to-Speech Quick Start](./docs/source/tts/quick_start.md).

 ## Model List
@@ -190,10 +225,6 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r

 **Speech-to-Text** contains *Acoustic Model*, *Language Model*, and *Speech Translation*, with the following details:

-<!---
-The current hyperlinks redirect to [Previous Parakeet](https://github.com/PaddlePaddle/Parakeet/tree/develop/examples).
--->
-
 <table style="width:100%">
 <thead>
 <tr>
@@ -313,7 +344,7 @@ The current hyperlinks redirect to [Previous Parakeet](https://github.com/Paddle
 </td>
 </tr>
 <tr>
-<td rowspan="3">Vocoder</td>
+<td rowspan="5">Vocoder</td>
 <td >WaveFlow</td>
 <td >LJSpeech</td>
 <td>
@@ -333,7 +364,21 @@ The current hyperlinks redirect to [Previous Parakeet](https://github.com/Paddle
 <td>
 <a href = "./examples/csmsc/voc3">Multi Band MelGAN-csmsc</a>
 </td>
-</tr>
+</tr>
+<tr>
+<td >Style MelGAN</td>
+<td >CSMSC</td>
+<td>
+<a href = "./examples/csmsc/voc4">Style MelGAN-csmsc</a>
+</td>
+</tr>
+<tr>
+<td >HiFiGAN</td>
+<td >CSMSC</td>
+<td>
+<a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a>
+</td>
+<tr>
 <tr>
 <td rowspan="3">Voice Cloning</td>
 <td>GE2E</td>
@@ -383,11 +428,37 @@ The current hyperlinks redirect to [Previous Parakeet](https://github.com/Paddle
 </tbody>
 </table>

+**Punctuation Restoration**
+
+<table style="width:100%">
+<thead>
+<tr>
+<th> Task </th>
+<th> Dataset </th>
+<th> Model Type </th>
+<th> Link </th>
+</tr>
+</thead>
+<tbody>
+
+<tr>
+<td>Punctuation Restoration</td>
+<td>IWLST2012_zh</td>
+<td>Ernie Linear</td>
+<td>
+<a href = "./examples/iwslt2012/punc0">iwslt2012-punc0</a>
+</td>
+</tr>
+</tbody>
+</table>
+
 ## Documents

 Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](https://paperswithcode.com/area/audio) and [Music SoTA](https://paperswithcode.com/area/music) give you an overview of the hot academic topics in the related area. To focus on the tasks in PaddleSpeech, you will find the following guidelines are helpful to grasp the core ideas.

 - [Installation](./docs/source/install.md)
+- [Quick Start](#quickstart)
+- [Some Demos](./demos/README.md)
 - Tutorials
 - [Automatic Speech Recognition](./docs/source/asr/quick_start.md)
 - [Introduction](./docs/source/asr/models_introduction.md)
@@ -399,9 +470,12 @@ Normally, [Speech SoTA](https://paperswithcode.com/area/speech), [Audio SoTA](ht
 - [Advanced Usage](./docs/source/tts/advanced_usage.md)
 - [Chinese Rule Based Text Frontend](./docs/source/tts/zh_text_frontend.md)
 - [Test Audio Samples](https://paddlespeech.readthedocs.io/en/latest/tts/demo.html)
-- Audio Classification
-- Speech Translation
+- [Audio Classification](./demos/audio_tagging/README.md)
+- [Speech Translation](./demos/speech_translation/README.md)
 - [Released Models](./docs/source/released_model.md)
+- [Community](#Community)
+- [Welcome to contribute](#contribution)
+- [License](#License)

 The Text-to-Speech module is originally called [Parakeet](https://github.com/PaddlePaddle/Parakeet), and now merged with this repository. If you are interested in academic research about this task, please see [TTS research overview](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/docs/source/tts#overview). Also, [this document](https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/tts/models_introduction.md) is a good guideline for the pipeline components.

@@ -416,7 +490,7 @@ howpublished = {\url{https://github.com/PaddlePaddle/PaddleSpeech}},
 year={2021}
 }
 ```
-
+<a name="contribution"></a>
 ## Contribute to PaddleSpeech

 You are warmly welcome to submit questions in [discussions](https://github.com/PaddlePaddle/PaddleSpeech/discussions) and bug reports in [issues](https://github.com/PaddlePaddle/PaddleSpeech/issues)! Also, we highly appreciate if you are willing to contribute to this project!
@@ -460,13 +534,16 @@ You are warmly welcome to submit questions in [discussions](https://github.com/P

 ## Acknowledgement

-- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling) for years of attention, constructive advice and great help.
+
+- Many thanks to [yeyupiaoling](https://github.com/yeyupiaoling)/[PPASR](https://github.com/yeyupiaoling/PPASR)/[PaddlePaddle-DeepSpeech](https://github.com/yeyupiaoling/PaddlePaddle-DeepSpeech)/[VoiceprintRecognition-PaddlePaddle](https://github.com/yeyupiaoling/VoiceprintRecognition-PaddlePaddle)/[AudioClassification-PaddlePaddle](https://github.com/yeyupiaoling/AudioClassification-PaddlePaddle) for years of attention, constructive advice and great help.
 - Many thanks to [AK391](https://github.com/AK391) for TTS web demo on Huggingface Spaces using Gradio.
 - Many thanks to [mymagicpower](https://github.com/mymagicpower) for the Java implementation of ASR upon [short](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_sdk) and [long](https://github.com/mymagicpower/AIAS/tree/main/3_audio_sdks/asr_long_audio_sdk) audio files.
-
+- Many thanks to [JiehangXie](https://github.com/JiehangXie)/[PaddleBoBo](https://github.com/JiehangXie/PaddleBoBo) for developing Virtual Uploader(VUP)/Virtual YouTuber(VTuber) with PaddleSpeech TTS function.
+- Many thanks to [745165806](https://github.com/745165806)/[PaddleSpeechTask](https://github.com/745165806/PaddleSpeechTask) for contributing Punctuation Restoration model.

 Besides, PaddleSpeech depends on a lot of open source repositories. See [references](./docs/source/reference.md) for more information.

+<a name="License"></a>
 ## License

 PaddleSpeech is provided under the [Apache-2.0 License](./LICENSE).
