PaddleRec Milestone #2
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
*.o | ||
output | ||
.idea/ | ||
paddlerec.egg-info/ | ||
*~ | ||
*.pyc |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,219 @@ | ||
# PaddleRec | ||
Recommendation algorithms with support for large-scale parallel training | ||
<p align="center"> | ||
<img align="center" src="doc/imgs/logo.png"> | ||
<p> | ||
|
||
<p align="center"> | ||
<br> | ||
<img alt="Release" src="https://img.shields.io/badge/Release-0.1.0-yellowgreen"> | ||
<img alt="License" src="https://img.shields.io/github/license/PaddlePaddle/Serving"> | ||
<img alt="Slack" src="https://img.shields.io/badge/Join-Slack-green"> | ||
<br> | ||
<p> | ||
|
||
|
||
<h2 align="center">What is PaddleRec</h2> | ||
|
||
<p align="center"> | ||
<img align="center" src="doc/imgs/structure.png"> | ||
<p> | ||
|
||
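- A **one-stop, out-of-the-box toolkit** for search and recommendation models, built on the PaddlePaddle ecosystem | ||
- An end-to-end solution for beginners, developers, and researchers, covering survey, training, inference, and deployment | ||
- A library of recommendation and search algorithms covering semantic understanding, recall, pre-ranking, ranking, multi-task learning, fusion, and other tasks | ||
- Configure custom options in **yaml** to quickly get started with single-machine training, large-scale distributed training, offline inference, and online deployment | ||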
|
||
|
||
<h2 align="center">PaddleRec Overview</h2> | ||
|
||
<p align="center"> | ||
<img align="center" src="doc/imgs/overview.png"> | ||
<p> | ||
|
||
|
||
<h2 align="center">Recommender System: Pipeline Overview</h2> | ||
|
||
<p align="center"> | ||
<img align="center" src="doc/imgs/rec-overview.png"> | ||
<p> | ||
|
||
<h2 align="center">Installation</h2> | ||
|
||
### Requirements | ||
* Python 2.7 / 3.5 / 3.6 / 3.7 | ||
* PaddlePaddle >= 1.7.2 | ||
* Operating system: Windows/Mac/Linux | ||
|
||
> On Windows only single-machine training is currently available; Linux is recommended | ||
|
||
### Installation Commands | ||
|
||
- Method 1 (install directly with pip): | ||
```bash | ||
python -m pip install paddle-rec | ||
``` | ||
|
||
- Method 2 | ||
|
||
Build and install from source | ||
1. Install PaddlePaddle **Note: a PaddlePaddle version > 1.7.2 is required** | ||
|
||
```shell | ||
python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple | ||
``` | ||
|
||
2. Install PaddleRec from source | ||
|
||
``` | ||
git clone https://github.com/PaddlePaddle/PaddleRec/ | ||
cd PaddleRec | ||
python setup.py install | ||
``` | ||
|
||
|
||
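<h2 align="center">Quick Start</h2> | ||
|
||
### Run a Built-in Model with Its Default Configuration | ||
|
||
The framework currently ships with several built-in models; a single command starts single-machine training or locally simulated distributed training with one of them. | ||
> Locally simulated distributed training (`local_cluster`) runs in parameter-server mode with `1 server + 1 trainer` | ||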
|
||
|
||
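We use the `dnn` ranking model as an example of basic PaddleRec usage. The training data comes from the [Criteo dataset](https://www.kaggle.com/c/criteo-display-ad-challenge/); we took 100 samples from it so that you can quickly try out the complete PaddleRec workflow. | ||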
|
||
```bash | ||
# Single-machine training on CPU | ||
python -m paddlerec.run -m paddlerec.models.rank.dnn | ||
Review comment: Should the training data used here be described? |
||
``` | ||
|
||
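### Run a Built-in Model with a Custom Configuration | ||
|
||
If you reuse a built-in model and have modified its **yaml** configuration file, for example by changing hyperparameters or reconfiguring the data, you can run that yaml file directly with paddlerec. | ||
|
||
Taking the dnn model as an example, from the paddlerec code directory: | ||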
```bash | ||
cd paddlerec | ||
``` | ||
|
||
Modify the dnn model's [hyperparameter configuration](./models/rank/dnn/config.yaml), for example by changing the number of training epochs from 10 to 5: | ||
```yaml | ||
train: | ||
  # epochs: 10 | ||
  epochs: 5 | ||
``` | ||
|
||
On Linux, you can edit the yaml file with a text editor such as `vim`: | ||
|
||
```bash | ||
vim ./models/rank/dnn/config.yaml | ||
# Press i to enter insert mode | ||
# Edit the yaml configuration | ||
# When finished, press Esc to leave insert mode | ||
# Type :wq to save the file and quit | ||
``` | ||
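As an alternative to editing by hand, the same change could be scripted. The following is a minimal Python sketch and not part of PaddleRec: `set_yaml_scalar` is a hypothetical helper that performs a line-based substitution on the first uncommented `key: value` line. It is not a YAML parser, so it assumes the key sits on its own line, and it drops any inline comment on that line.

```python
import re


def set_yaml_scalar(text, key, value):
    """Return `text` with the first uncommented `key: ...` line set to `value`.

    Hypothetical helper for illustration: a line-based edit, not a YAML parser.
    """
    # Match "<indent>key: <something>" at the start of a line. Commented lines
    # ("# key: ...") and bare mapping keys ("key:") are skipped.
    pattern = re.compile(
        r"^(?P<indent>[ \t]*)" + re.escape(key) + r":[ \t]*\S.*$",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda m: "{}{}: {}".format(m.group("indent"), key, value),
        text,
        count=1,
    )
    if count == 0:
        raise KeyError("key {!r} not found".format(key))
    return new_text


if __name__ == "__main__":
    config = "train:\n  # epochs: 10\n  epochs: 10\n"
    print(set_yaml_scalar(config, "epochs", 5))
```

For anything beyond a single scalar, a real YAML library is the safer choice, since this sketch knows nothing about nesting or duplicate keys.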
|
||
After modifying the dnn model's `models/rank/dnn/config.yaml`, run the `dnn` model: | ||
```bash | ||
# Train with the custom configuration | ||
python -m paddlerec.run -m ./models/rank/dnn/config.yaml | ||
``` | ||
|
||
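### Distributed Training | ||
|
||
Distributed training requires editing `config.yaml`: add the `engine` option, or change it, to `cluster` for distributed training on a cluster or to `local_cluster` for locally simulated distributed training. | ||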
|
||
|
||
#### Locally Simulated Distributed Training | ||
|
||
Taking the dnn model as an example, from the paddlerec code directory, modify the dnn model's `config.yaml` file: | ||
|
||
```yaml | ||
train: | ||
  #engine: single | ||
  engine: local_cluster | ||
``` | ||
Then start paddlerec training: | ||
|
||
```bash | ||
# Run locally simulated distributed training | ||
python -m paddlerec.run -m ./models/rank/dnn/config.yaml | ||
``` | ||
|
||
#### Distributed Training on a Cluster | ||
|
||
Taking the dnn model as an example, from the paddlerec code directory, first modify the dnn model's `config.yaml` file: | ||
|
||
```yaml | ||
train: | ||
  #engine: single | ||
  engine: cluster | ||
``` | ||
Next, add the distributed launch configuration file `backend.yaml`; its configuration rules are described in the [Distributed Training](doc/distributed_train.md) tutorial. Finally, start paddlerec training: | ||
Review comment: If this yaml usage is the one we mainly promote, it should be reflected in the README. |
||
|
||
```bash | ||
# After setting up the mpi/k8s/paddlecloud cluster environment | ||
python -m paddlerec.run -m ./models/rank/dnn/config.yaml -b backend.yaml | ||
``` | ||
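The launch commands in this section differ only in the value passed to `-m` and in the optional `-b` flag. As a rough illustration, they could be assembled from Python as sketched here. `build_run_command` and `run_paddlerec` are hypothetical helpers, not part of PaddleRec; only the `-m` and `-b` flags are taken from the commands shown in this README.

```python
import subprocess


def build_run_command(model, backend=None, python="python"):
    """Assemble the `paddlerec.run` command line as an argument list.

    `model` is either a built-in model name (e.g. "paddlerec.models.rank.dnn")
    or the path to a config.yaml; `backend` is an optional backend.yaml path.
    """
    cmd = [python, "-m", "paddlerec.run", "-m", model]
    if backend is not None:
        cmd += ["-b", backend]
    return cmd


def run_paddlerec(model, backend=None):
    """Run the command and return its exit code (requires PaddleRec installed)."""
    return subprocess.call(build_run_command(model, backend))


if __name__ == "__main__":
    # Only prints the commands; nothing is launched here.
    print(build_run_command("paddlerec.models.rank.dnn"))
    print(build_run_command("./models/rank/dnn/config.yaml", "backend.yaml"))
```

Passing a list to `subprocess.call` avoids shell quoting issues when a config path contains spaces.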
|
||
|
||
<h2 align="center">Supported Models</h2> | ||
|
||
|
||
| Category | Model | Single-machine CPU training | Single-machine GPU training | Distributed CPU training | | ||
| :------: | :-----------------------------------------------------------------------: | :---------: | :---------: | :-----------: | | ||
| Content understanding | [Text-Classification](models/contentunderstanding/classification/model.py) | ✓ | x | ✓ | | ||
| Content understanding | [TagSpace](models/contentunderstanding/tagspace/model.py) | ✓ | x | ✓ | | ||
| Recall | [DSSM](models/match/dssm/model.py) | ✓ | x | ✓ | | ||
| Recall | [MultiView-Simnet](models/match/multiview-simnet/model.py) | ✓ | x | ✓ | | ||
| Recall | [TDM](models/treebased/tdm/model.py) | ✓ | x | ✓ | | ||
| Recall | [Word2Vec](models/recall/word2vec/model.py) | ✓ | x | ✓ | | ||
| Recall | [SSR](models/recall/ssr/model.py) | ✓ | ✓ | ✓ | | ||
| Recall | [Gru4Rec](models/recall/gru4rec/model.py) | ✓ | ✓ | ✓ | | ||
| Recall | [Youtube_dnn](models/recall/youtube_dnn/model.py) | ✓ | ✓ | ✓ | | ||
| Recall | [NCF](models/recall/ncf/model.py) | ✓ | ✓ | ✓ | | ||
| Rank | [Dnn](models/rank/dnn/model.py) | ✓ | x | ✓ | | ||
| Rank | [DeepFM](models/rank/deepfm/model.py) | ✓ | x | ✓ | | ||
| Rank | [xDeepFM](models/rank/xdeepfm/model.py) | ✓ | x | ✓ | | ||
| Rank | [DIN](models/rank/din/model.py) | ✓ | x | ✓ | | ||
| Rank | [Wide&Deep](models/rank/wide_deep/model.py) | ✓ | x | ✓ | | ||
| Multi-task | [ESMM](models/multitask/esmm/model.py) | ✓ | ✓ | ✓ | | ||
| Multi-task | [MMOE](models/multitask/mmoe/model.py) | ✓ | ✓ | ✓ | | ||
| Multi-task | [ShareBottom](models/multitask/share-bottom/model.py) | ✓ | ✓ | ✓ | | ||
|
||
|
||
|
||
|
||
<h2 align="center">Documentation</h2> | ||
|
||
### Background | ||
* [Introduction to Recommender Systems](doc/rec_background.md) | ||
* [Introduction to Distributed Deep Learning](doc/ps_background.md) | ||
|
||
### Getting Started | ||
* [Requirements](#环境要求) | ||
* [Installation Commands](#安装命令) | ||
* [Quick Start](#启动内置模型的默认配置) | ||
|
||
### Advanced Tutorials | ||
* [Custom Datasets and Readers](doc/custom_dataset_reader.md) | ||
* [Distributed Training](doc/distributed_train.md) | ||
|
||
### Developer Tutorials | ||
* [PaddleRec Design Document](doc/design.md) | ||
|
||
### PaddleRec Performance | ||
* [Benchmark](doc/benchmark.md) | ||
|
||
### FAQ | ||
* [Frequently Asked Questions](doc/faq.md) | ||
|
||
|
||
<h2 align="center">Community</h2> | ||
|
||
### Feedback | ||
If you have comments, suggestions, or bugs to report, please submit them as a `GitHub Issue`. | ||
|
||
### Release History | ||
- 2020.5.14 - PaddleRec v0.1 | ||
|
||
### License | ||
This project is released under the [Apache 2.0 license](LICENSE). | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
#!/bin/bash | ||
Review comment: license |
||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
|
||
################################################### | ||
# Usage: submit.sh | ||
# Description: client-side implementation for submitting MPI jobs | ||
################################################### | ||
|
||
# ---------------------------------------------------------------------------- # | ||
# variable define # | ||
# ---------------------------------------------------------------------------- # | ||
|
||
#----------------------------------------------------------------------------------------------------------------- | ||
#fun : package | ||
#param : N/A | ||
#return : 0 -- success; not 0 -- failure | ||
#----------------------------------------------------------------------------------------------------------------- | ||
function package_hook() { | ||
g_run_stage="package" | ||
package | ||
} | ||
|
||
#----------------------------------------------------------------------------------------------------------------- | ||
#fun : before hook submit to cluster | ||
#param : N/A | ||
#return : 0 -- success; not 0 -- failure | ||
#----------------------------------------------------------------------------------------------------------------- | ||
function _before_submit() { | ||
echo "before_submit" | ||
before_submit_hook | ||
} | ||
|
||
#----------------------------------------------------------------------------------------------------------------- | ||
#fun : after hook submit to cluster | ||
#param : N/A | ||
#return : 0 -- success; not 0 -- failure | ||
#----------------------------------------------------------------------------------------------------------------- | ||
function _after_submit() { | ||
echo "after_submit" | ||
after_submit_hook | ||
} | ||
|
||
#----------------------------------------------------------------------------------------------------------------- | ||
#fun : submit to cluster | ||
#param : N/A | ||
#return : 0 -- success; not 0 -- failure | ||
#----------------------------------------------------------------------------------------------------------------- | ||
function _submit() { | ||
g_run_stage="submit" | ||
|
||
cd ${engine_temp_path} | ||
|
||
paddlecloud job --ak ${engine_submit_ak} --sk ${engine_submit_sk} train --cluster-name ${engine_submit_cluster} \ | ||
--job-version ${engine_submit_version} \ | ||
--mpi-priority ${engine_submit_priority} \ | ||
--mpi-wall-time 300:59:00 \ | ||
--mpi-nodes ${engine_submit_nodes} --is-standalone 0 \ | ||
--mpi-memory 110Gi \ | ||
--job-name ${engine_submit_jobname} \ | ||
--start-cmd "${g_run_cmd}" \ | ||
--group-name ${engine_submit_group} \ | ||
--job-conf ${engine_submit_config} \ | ||
--files ${g_submitfiles} \ | ||
--json | ||
|
||
cd - | ||
} | ||
|
||
function submit_hook() { | ||
_before_submit | ||
_submit | ||
_after_submit | ||
} | ||
|
||
function main() { | ||
source ${engine_submit_scrpit} | ||
|
||
package_hook | ||
submit_hook | ||
} | ||
|
||
main |
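For readers more comfortable with Python, the argument list that `_submit` passes to `paddlecloud` can be sketched as follows. This is an illustration only: `build_submit_args` is a hypothetical helper, and the flags are copied from the script above rather than checked against the `paddlecloud` CLI. Building a list instead of a shell string means values containing spaces stay single arguments, which the unquoted `${...}` expansions in the script do not guarantee.

```python
def build_submit_args(env, run_cmd, submit_files):
    """Mirror the `paddlecloud job ... train` call in `_submit` as an argv list.

    `env` maps the `engine_submit_*` names used by the shell script to values,
    `run_cmd` plays the role of `g_run_cmd`, and `submit_files` is the list of
    files that the script passes via `g_submitfiles`.
    """
    return [
        "paddlecloud", "job",
        "--ak", env["engine_submit_ak"],
        "--sk", env["engine_submit_sk"],
        "train",
        "--cluster-name", env["engine_submit_cluster"],
        "--job-version", env["engine_submit_version"],
        "--mpi-priority", env["engine_submit_priority"],
        "--mpi-wall-time", "300:59:00",
        "--mpi-nodes", env["engine_submit_nodes"],
        "--is-standalone", "0",
        "--mpi-memory", "110Gi",
        "--job-name", env["engine_submit_jobname"],
        "--start-cmd", run_cmd,
        "--group-name", env["engine_submit_group"],
        "--job-conf", env["engine_submit_config"],
        "--files",
    ] + list(submit_files) + ["--json"]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    names = ["ak", "sk", "cluster", "version", "priority",
             "nodes", "jobname", "group", "config"]
    demo_env = {"engine_submit_" + n: "<" + n + ">" for n in names}
    print(build_submit_args(demo_env, "python -m paddlerec.run -m config.yaml",
                            ["a.py", "b.yaml"]))
```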
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
from __future__ import print_function | ||
from __future__ import unicode_literals | ||
|
||
import copy | ||
Review comment: Please order the imports alphabetically. Reply: done. |
||
import os | ||
import subprocess | ||
|
||
from paddlerec.core.engine.engine import Engine | ||
from paddlerec.core.factory import TrainerFactory | ||
from paddlerec.core.utils import envs | ||
|
||
|
||
class ClusterEngine(Engine): | ||
    def __init_impl__(self): | ||
        abs_dir = os.path.dirname(os.path.abspath(__file__)) | ||
        backend = envs.get_runtime_environ("engine_backend") | ||
        if backend == "PaddleCloud": | ||
            self.submit_script = os.path.join(abs_dir, "cloud/cluster.sh") | ||
        else: | ||
            raise ValueError("{} is not supported yet".format(backend)) | ||
|
||
    def start_worker_procs(self): | ||
        trainer = TrainerFactory.create(self.trainer) | ||
        trainer.run() | ||
|
||
    def start_master_procs(self): | ||
        # Launch the submit script without the proxy variables | ||
        default_env = os.environ.copy() | ||
        current_env = copy.copy(default_env) | ||
        current_env.pop("http_proxy", None) | ||
        current_env.pop("https_proxy", None) | ||
|
||
        # Pass the arguments as a list so a script path containing spaces is not split | ||
        cmd = ["bash", self.submit_script] | ||
        proc = subprocess.Popen(cmd, env=current_env, cwd=os.getcwd()) | ||
        proc.wait() | ||
|
||
    def run(self): | ||
        role = envs.get_runtime_environ("engine_role") | ||
|
||
        if role == "MASTER": | ||
            self.start_master_procs() | ||
|
||
        elif role == "WORKER": | ||
            self.start_worker_procs() | ||
|
||
        else: | ||
            raise ValueError("invalid role {}, must be MASTER or WORKER".format(role)) |
Review comment: Are Mac and Windows supported?
Reply: Mac and Windows are supported. On Windows only single-machine training is currently available; this will be noted in the documentation.
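The two ideas in `ClusterEngine`, launching the submit script with the proxy variables removed and dispatching on the `engine_role` value, can be tried in isolation. The sketch below is a standalone approximation rather than the PaddleRec implementation: `env_without_proxies` and `dispatch` are hypothetical names, and the role handlers are plain callables standing in for `start_master_procs` and `start_worker_procs`.

```python
import os


def env_without_proxies(environ=None):
    """Return a copy of the environment with the proxy variables removed.

    The mapping passed in (or os.environ) is left unmodified.
    """
    env = dict(os.environ if environ is None else environ)
    env.pop("http_proxy", None)
    env.pop("https_proxy", None)
    return env


def dispatch(role, handlers):
    """Call the handler registered for `role`; reject unknown roles."""
    if role not in handlers:
        raise ValueError(
            "invalid role {!r}, must be one of {}".format(role, sorted(handlers))
        )
    return handlers[role]()


if __name__ == "__main__":
    handlers = {
        "MASTER": lambda: "submit job",   # stands in for start_master_procs
        "WORKER": lambda: "run trainer",  # stands in for start_worker_procs
    }
    print(dispatch("MASTER", handlers))
    print(sorted(env_without_proxies({"http_proxy": "p", "PATH": "/bin"})))
```

A table of handlers keeps the error message in sync with the supported roles, at the cost of being slightly less direct than the `if`/`elif` chain in the diff.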