
Commit 5d7e913

Merge branch 'dev' of github.com:PaddlePaddle/PaddleHelix into dev
2 parents 922944a + 351a5f1 commit 5d7e913

40 files changed

+1110
-1195
lines changed

README_cn.md

Lines changed: 5 additions & 6 deletions
@@ -16,12 +16,12 @@
1616
1717
## 特色
1818

19-
* **高性能**提供了 LinearRNA 系列高性能算法助力 RNA 结构预测和分析。例如,LinearFold 和 LinearPartition 能够迅速准确定位能量较低 RNA 二级结构,性能相比传统方法提升数百甚至上千倍。
19+
* **高性能**提供了LinearRNA系列高性能算法助力 RNA 结构预测和分析。例如,LinearFold 和 LinearPartition 能够迅速准确定位能量较低 RNA 二级结构,性能相比传统方法提升数百甚至上千倍。
2020
<p align="center">
2121
<img src="./.github/LinearRNA.jpg" align="middle" />
2222
</p>
2323

24-
* 由大规模 **表示预训练**支撑的生物计算工具:随着自监督学习用于分子表示训练的进展,为样本量非常稀少的很多生物计算任务带来了全新的突破,这些任务包括分子性质预测,药物-靶点相互作用,蛋白质-蛋白质相互作用,RNA-RNA 相互作用,蛋白质折叠,RNA 折叠等等领域。螺旋桨广泛提供了业界最领先的表示学习方法和模型,使得开发者可以基于大规模模型快速切入需求的任务,站在巨人的肩膀上。
24+
* 由大规模**表示预训练**支撑的生物计算工具:随着自监督学习用于分子表示训练的进展,为样本量非常稀少的很多生物计算任务带来了全新的突破,这些任务包括分子性质预测,药物-靶点相互作用,蛋白质-蛋白质相互作用,RNA-RNA 相互作用,蛋白质折叠,RNA 折叠等等领域。螺旋桨广泛提供了业界最领先的表示学习方法和模型,使得开发者可以基于大规模模型快速切入需求的任务,站在巨人的肩膀上。
2525
<p align="center">
2626
<img src="./.github/paddlehelix_features.jpg" align="middle" />
2727
</p>
@@ -37,19 +37,18 @@
3737

3838
### 教学
3939
* 我们提供了大量的[教学实例](./tutorials)以方便开发者快速了解和使用该框架
40-
* PaddleHelix 基于[飞桨(PaddlePaddle)](https://github.com/paddlepaddle/paddle)开源深度学习框架实现,该框架在性能表现上尤其出色。
40+
* PaddleHelix基于[飞桨(PaddlePaddle)](https://github.com/paddlepaddle/paddle)开源深度学习框架实现,该框架在性能表现上尤其出色。
4141

4242
### 使用示例
4343
* [表示学习 - 化合物](./apps/pretrained_compound/README_cn.md)
4444
* [表示学习 - 蛋白质](./apps/pretrained_protein/README_cn.md)
4545
* [药物-分子作用预测](./apps/drug_target_interaction/README_cn.md)
4646
* [分子生成](./apps/molecular_generation/README_cn.md)
4747
* [药物联用](./apps/drug_drug_synergy/README_cn.md)
48-
* [药物-分子作用预测](./apps/drug_target_interaction/README_cn.md)
4948
* [LinearRNA](./c/pahelix/toolkit/linear_rna/README_cn.md)
5049

5150
### API 文档
52-
* 如果你对 PaddleHelix 的详细接口感兴趣,请查阅 [API 文档](https://paddlehelix.readthedocs.io/en/dev/)
51+
* 如果你对PaddleHelix的详细接口感兴趣,请查阅[API 文档](https://paddlehelix.readthedocs.io/en/dev/)
5352

5453
### 开发者指南
55-
* 如果你需要修改 PaddleHelix 的源代码,请查阅我们提供的[开发者指南](./developer_guide_cn.md)
54+
* 如果你需要修改PaddleHelix的源代码,请查阅我们提供的[开发者指南](./developer_guide_cn.md)

apps/drug_drug_synergy/README.md

Lines changed: 3 additions & 54 deletions
@@ -1,58 +1,7 @@
1-
# DDs(Drug Drug synergy)
1+
# drug_drug_synergy
22

33
[中文版本](./README_cn.md) [English Version](./README.md)
44

5-
* [Background](#background)
6-
* [Datasets](#datasets)
7-
* [ddi](#ddi)
8-
* [dti](#dti)
9-
* [ppi](#ppi)
10-
* [features](#features)
11-
* [Instructions](#instructions)
12-
* [Training and Evaluation](#train-and-evaluation)
13-
* [Reference](#reference)
5+
We provide the following drug-drug synergy prediction models.
146

15-
## Background
16-
17-
Drug combinations, also known as combinatorial therapy, are frequently prescribed to treat patients with complex diseases. A graph convolutional network (GCN) can be used to predict drug-drug synergy by integrating multiple biological networks.
18-
19-
## Datasets
20-
Drug-drug synergy information and drug physicochemical features go under the `data` folder. First create `ddi`, `dti` and `ppi` folders under `data`.
21-
### ddi
22-
23-
```sh
24-
cd data/ddi && wget "http://www.bioinf.jku.at/software/DeepSynergy/labels.csv"
25-
```
26-
27-
### dti
28-
29-
### ppi
30-
31-
32-
## Instructions
33-
For illustration, we provide a Python script, `train.py`.
34-
Its usage is:
35-
```
36-
CUDA_VISIBLE_DEVICES=0 python3 train.py
37-
--ddi ./data/ddi/DDs.csv
38-
--dti ./data/dti/drug_protein_links.tsv
39-
--ppi ./data/ppi/protein_protein_links.txt
40-
--d_feat ./data/all_drugs_name.fet
41-
--epochs 10
42-
--num_graph 10
43-
--sub_neighbours 10 10
44-
--cuda
45-
```
46-
Note that on a CPU-only machine, you can simply remove `--cuda`.
47-
48-
## Reference
49-
**RGCN**
50-
> @article{jiang2020deep,
51-
title={Deep graph embedding for prioritizing synergistic anticancer drug combinations},
52-
author={Jiang, Peiran and Huang, Shujun and Fu, Zhenyuan and Sun, Zexuan and Lakowski, Ted M and Hu, Pingzhao},
53-
journal={Computational and structural biotechnology journal},
54-
volume={18},
55-
pages={427--438},
56-
year={2020},
57-
publisher={Elsevier}
58-
}
7+
* [RGCN](./RGCN/README.md)

apps/drug_drug_synergy/README_cn.md

Lines changed: 3 additions & 47 deletions
@@ -1,51 +1,7 @@
1-
# DDs(Drug Drug synergy)
1+
# 药物协同预测模型
22

33
[中文版本](./README_cn.md) [English Version](./README.md)
44

5-
* [背景介绍](#背景介绍)
6-
* [数据集](#数据集)
7-
* [ddi](#ddi)
8-
* [dti](#dti)
9-
* [ppi](#ppi)
10-
* [特征集](#特征集)
11-
* [使用说明](#使用说明)
12-
* [训练与评估](#训练与评估)
13-
* [引用](#引用)
5+
我们提供以下双药联用协同性预测模型。
146

15-
## 背景
16-
药物联用,也叫做协同治疗,通常在应对复杂疾病时使用。而图神经网络能够结合多种生物学网络从而来预测药物的协同作用。
17-
## 数据集
18-
药物协同的分值文件和药物的理化特征信息文件在 `data` 文件夹下. 首先在`data` 文件夹下创建`ddi`, `dti``ppi`文件夹。
19-
### ddi
20-
```sh
21-
cd data/ddi && wget "http://www.bioinf.jku.at/software/DeepSynergy/labels.csv"
22-
```
23-
### dti
24-
### ppi
25-
26-
## 使用说明
27-
为了方便展示,我们构建了一个脚本, `train.py`.
28-
用法如下:
29-
```
30-
CUDA_VISIBLE_DEVICES=0 python3 train.py
31-
--ddi ./data/ddi/DDs.csv
32-
--dti ./data/dti/drug_protein_links.tsv
33-
--ppi ./data/ppi/protein_protein_links.txt
34-
--d_feat ./data/all_drugs_name.fet
35-
--epochs 10
36-
--num_graph 10
37-
--sub_neighbours 10 10
38-
--cuda
39-
```
40-
请注意,如果训练环境没有GPU,去掉`--cuda`即可。
41-
## 引用
42-
**RGCN**
43-
> @article{jiang2020deep,
44-
title={Deep graph embedding for prioritizing synergistic anticancer drug combinations},
45-
author={Jiang, Peiran and Huang, Shujun and Fu, Zhenyuan and Sun, Zexuan and Lakowski, Ted M and Hu, Pingzhao},
46-
journal={Computational and structural biotechnology journal},
47-
volume={18},
48-
pages={427--438},
49-
year={2020},
50-
publisher={Elsevier}
51-
}
7+
* [RGCN](./RGCN/README_cn.md)

apps/drug_drug_synergy/RGCN/README.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
# DDs(Drug Drug synergy)
2+
3+
[中文版本](./README_cn.md) [English Version](./README.md)
4+
5+
* [Background](#background)
6+
* [Datasets](#datasets)
7+
* [ddi](#ddi)
8+
* [dti](#dti)
9+
* [ppi](#ppi)
10+
* [features](#features)
11+
* [Instructions](#instructions)
12+
* [Training and Evaluation](#train-and-evaluation)
13+
* [Reference](#reference)
14+
15+
## Background
16+
17+
Drug combinations, also known as combinatorial therapy, are frequently prescribed to treat patients with complex diseases. A graph convolutional network (GCN) can be used to predict drug-drug synergy by integrating multiple biological networks.
18+
19+
## Datasets
20+
Drug-drug synergy information and drug physicochemical features go under the `data` folder.
21+
### ddi
22+
23+
```sh
24+
mkdir -p data/DDI && cd data/DDI && wget "http://www.bioinf.jku.at/software/DeepSynergy/labels.csv"
25+
```
26+
27+
### dti
28+
```sh
29+
cd data && wget "https://baidu-nlp.bj.bcebos.com/PaddleHelix/datasets/drug_synergy_datasets/dti.tgz" && tar xzvf dti.tgz
30+
```
31+
32+
### ppi
33+
```sh
34+
cd data && wget "https://baidu-nlp.bj.bcebos.com/PaddleHelix/datasets/drug_synergy_datasets/ppi.tgz" && tar xzvf ppi.tgz
35+
```
36+
37+
### drug features
38+
```sh
39+
cd data && wget "https://baidu-nlp.bj.bcebos.com/PaddleHelix/datasets/drug_synergy_datasets/drug_feat.tgz" && tar xzvf drug_feat.tgz
40+
```
41+
42+
## Instructions
43+
For illustration, we provide a Python script, `train.py`.
44+
Its usage is:
45+
```
46+
CUDA_VISIBLE_DEVICES=0 python3 train.py \
47+
--ddi ./data/DDI/DDs.csv \
48+
--dti ./data/DTI/drug_protein_links.tsv \
49+
--ppi ./data/PPI/protein_protein_links.txt \
50+
--d_feat ./data/all_drugs_name.fet \
51+
--epochs 10 \
52+
--num_graph 10 \
53+
--sub_neighbours 10 10 \
54+
--cuda
55+
```
56+
Note that on a CPU-only machine, you can simply remove `--cuda`.
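Before launching training, it can help to confirm that the four input files named in the command are actually in place. The following is a minimal sketch, not part of this commit: `EXPECTED_INPUTS` and `missing_inputs` are hypothetical names, and the relative paths are simply copied from the command above.

```python
from pathlib import Path

# Relative paths copied from the train.py command in this README.
# The dictionary and function names are illustrative assumptions.
EXPECTED_INPUTS = {
    "ddi": "DDI/DDs.csv",
    "dti": "DTI/drug_protein_links.tsv",
    "ppi": "PPI/protein_protein_links.txt",
    "d_feat": "all_drugs_name.fet",
}


def missing_inputs(data_dir):
    """Return the sorted flag names whose input file is absent under data_dir."""
    root = Path(data_dir)
    return sorted(
        name for name, rel in EXPECTED_INPUTS.items()
        if not (root / rel).is_file()
    )


if __name__ == "__main__":
    missing = missing_inputs("./data")
    if missing:
        print("Missing inputs for flags:", ", ".join("--" + m for m in missing))
    else:
        print("All training inputs found.")
```

Running the script from the directory that contains `data` lists the flags whose files still need to be downloaded or extracted.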
57+
58+
## Reference
59+
**RGCN**
60+
> @article{jiang2020deep,
61+
title={Deep graph embedding for prioritizing synergistic anticancer drug combinations},
62+
author={Jiang, Peiran and Huang, Shujun and Fu, Zhenyuan and Sun, Zexuan and Lakowski, Ted M and Hu, Pingzhao},
63+
journal={Computational and structural biotechnology journal},
64+
volume={18},
65+
pages={427--438},
66+
year={2020},
67+
publisher={Elsevier}
68+
}
apps/drug_drug_synergy/RGCN/README_cn.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
1+
# DDs(Drug Drug synergy)
2+
3+
[中文版本](./README_cn.md) [English Version](./README.md)
4+
5+
* [背景介绍](#背景介绍)
6+
* [数据集](#数据集)
7+
* [ddi](#ddi)
8+
* [dti](#dti)
9+
* [ppi](#ppi)
10+
* [特征集](#特征集)
11+
* [使用说明](#使用说明)
12+
* [训练与评估](#训练与评估)
13+
* [引用](#引用)
14+
15+
## 背景
16+
药物联用,也叫做协同治疗,通常在应对复杂疾病时使用。而图神经网络能够结合多种生物学网络从而来预测药物的协同作用。
17+
## 数据集
18+
药物协同的分值文件和药物的理化特征信息文件在 `data` 文件夹下。
19+
### ddi
20+
```sh
21+
mkdir -p data/DDI && cd data/DDI && wget "http://www.bioinf.jku.at/software/DeepSynergy/labels.csv"
22+
```
23+
### dti
24+
```sh
25+
cd data && wget "https://baidu-nlp.bj.bcebos.com/PaddleHelix/datasets/drug_synergy_datasets/dti.tgz" && tar xzvf dti.tgz
26+
```
27+
28+
### ppi
29+
```sh
30+
cd data && wget "https://baidu-nlp.bj.bcebos.com/PaddleHelix/datasets/drug_synergy_datasets/ppi.tgz" && tar xzvf ppi.tgz
31+
```
32+
33+
### drug features
34+
```sh
35+
cd data && wget "https://baidu-nlp.bj.bcebos.com/PaddleHelix/datasets/drug_synergy_datasets/drug_feat.tgz" && tar xzvf drug_feat.tgz
36+
```
37+
38+
## 使用说明
39+
为了方便展示,我们构建了一个脚本, `train.py`.
40+
用法如下:
41+
```
42+
CUDA_VISIBLE_DEVICES=0 python3 train.py \
43+
--ddi ./data/DDI/DDs.csv \
44+
--dti ./data/DTI/drug_protein_links.tsv \
45+
--ppi ./data/PPI/protein_protein_links.txt \
46+
--d_feat ./data/all_drugs_name.fet \
47+
--epochs 10 \
48+
--num_graph 10 \
49+
--sub_neighbours 10 10 \
50+
--cuda
51+
```
52+
请注意,如果训练环境没有GPU,去掉`--cuda`即可。
53+
## 引用
54+
**RGCN**
55+
> @article{jiang2020deep,
56+
title={Deep graph embedding for prioritizing synergistic anticancer drug combinations},
57+
author={Jiang, Peiran and Huang, Shujun and Fu, Zhenyuan and Sun, Zexuan and Lakowski, Ted M and Hu, Pingzhao},
58+
journal={Computational and structural biotechnology journal},
59+
volume={18},
60+
pages={427--438},
61+
year={2020},
62+
publisher={Elsevier}
63+
}

apps/drug_drug_synergy/R_model.py renamed to apps/drug_drug_synergy/RGCN/R_model.py

Lines changed: 3 additions & 2 deletions
@@ -32,7 +32,7 @@
3232
import pandas as pd
3333
import numpy as np
3434

35-
def Decagon_norm(graph, feature, edges):
35+
def decagon_norm(graph, feature, edges):
3636
"""
3737
Relation Graph Neural Network degree normalization method
3838
"""
@@ -96,6 +96,7 @@ def __init__(self, in_dim, out_dim, etypes, num_bases=0, act='relu', norm=True):
9696
)
9797
self.act = act
9898
self.norm = norm
99+
99100
def forward(self, graph, feat):
100101
"""Forward
101102
Args:
@@ -221,4 +222,4 @@ def negative_Sampling(label):
221222
val_neg_idx = np.random.choice(len(neg_pos[0]), num_neg)
222223
valid[(neg_pos[0][val_neg_idx], neg_pos[1][val_neg_idx])] = -1
223224

224-
return paddle.to_tensor(valid.astype('float32'))
225+
return valid.astype('float32')
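The last hunk shows only the tail of `negative_Sampling`: indices are drawn from `neg_pos`, the chosen entries are set to `-1`, and the function now returns a NumPy `float32` array instead of a Paddle tensor. A minimal sketch of that pattern follows. It assumes the label matrix uses `1` for known synergistic pairs and `0` for unlabeled ones; `sample_negatives` is a hypothetical name, and unlike the hunk it samples without replacement so that exactly `num_neg` distinct entries are marked.

```python
import numpy as np


def sample_negatives(label, num_neg, seed=None):
    """Return a float32 copy of label with num_neg zero entries set to -1.

    Assumption: 1 marks a positive pair and 0 an unlabeled pair.
    """
    rng = np.random.default_rng(seed)
    valid = np.array(label, dtype="float32")  # np.array copies by default
    neg_pos = np.where(valid == 0)            # row and column indices of candidates
    num_neg = min(num_neg, len(neg_pos[0]))   # cannot mark more than exist
    idx = rng.choice(len(neg_pos[0]), size=num_neg, replace=False)
    valid[(neg_pos[0][idx], neg_pos[1][idx])] = -1
    return valid


if __name__ == "__main__":
    label = [[1, 0, 0], [0, 1, 0]]
    print(sample_negatives(label, num_neg=2, seed=0))
```

Returning a plain array leaves the decision of when to convert to a tensor to the caller, which is what the changed `return` line suggests.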

apps/drug_drug_synergy/graphsage_sampling.py renamed to apps/drug_drug_synergy/RGCN/graphsage_sampling.py

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ def graphsage_sampling(hg, start_nodes, num_neighbours=10, etype='dti'):
5353

5454
return qualified_neighs, qualified_eids
5555

56+
5657
def subgraph_gen(hg, label_idx, neighbours=[10, 10]):
5758
"""
5859
Subgraph sampling by graphsage_sampling
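The bodies of `graphsage_sampling` and `subgraph_gen` are not shown in this hunk, so the following is only a sketch of the general GraphSAGE-style idea their signatures suggest: sample a bounded number of neighbours per node, one fan-out value per hop. It works on a plain adjacency dictionary rather than the heterogeneous graph object `hg`, and both function names are hypothetical.

```python
import random


def sample_neighbours(adj, start_nodes, num_neighbours=10, seed=None):
    """Uniformly sample up to num_neighbours neighbours for each start node."""
    rng = random.Random(seed)
    sampled = {}
    for node in start_nodes:
        neighs = list(adj.get(node, []))
        if len(neighs) > num_neighbours:
            neighs = rng.sample(neighs, num_neighbours)
        sampled[node] = neighs
    return sampled


def multi_hop_nodes(adj, start_nodes, neighbours=(10, 10), seed=None):
    """Collect every node reached by sampling hop by hop, one fan-out per hop."""
    frontier = list(start_nodes)
    seen = set(frontier)
    for hop, fanout in enumerate(neighbours):
        hop_seed = None if seed is None else seed + hop
        sampled = sample_neighbours(adj, frontier, fanout, seed=hop_seed)
        # The next frontier is the set of newly discovered nodes only.
        frontier = sorted({n for ns in sampled.values() for n in ns} - seen)
        seen.update(frontier)
    return seen


if __name__ == "__main__":
    adj = {0: [1, 2, 3], 1: [4], 2: [5], 3: [6]}
    print(sorted(multi_hop_nodes(adj, [0], neighbours=(2, 2), seed=0)))
```

The sketch uses a tuple default for `neighbours`; a list default such as `[10, 10]` is shared between calls in Python, which is harmless only as long as the function never mutates it.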

apps/drug_drug_synergy/train.py renamed to apps/drug_drug_synergy/RGCN/train.py

Lines changed: 4 additions & 3 deletions
@@ -41,8 +41,7 @@
4141
from pahelix.featurizers import het_gnn_featurizer
4242

4343

44-
45-
def Train(num_subgraph, graph, label_idx, epochs, sub_neighbours=[10, 10], init=True):
44+
def train(num_subgraph, graph, label_idx, epochs, sub_neighbours=[10, 10], init=True):
4645
"""
4746
Model training for one epoch and return training loss and validation loss.
4847
"""
@@ -105,6 +104,7 @@ def eval(model, graph, label, sub_neighbours, criterion):
105104
loss = criterion(pred, label)
106105
return pred, loss
107106

107+
108108
def train_val_plot(training_loss, val_loss, figure_name='loss_figure.pdf'):
109109
"""
110110
Plot the training loss figure.
@@ -115,6 +115,7 @@ def train_val_plot(training_loss, val_loss, figure_name='loss_figure.pdf'):
115115
axx.legend(['training loss', 'val loss'])
116116
fig.savefig(figure_name)
117117

118+
118119
def main(ddi, dti, ppi, d_feat, epochs=10, num_subgraph=20, sub_neighbours=[10, 10], cuda=False):
119120
"""
120121
Args:
@@ -141,7 +142,7 @@ def main(ddi, dti, ppi, d_feat, epochs=10, num_subgraph=20, sub_neighbours=[10,
141142
value = drug_feat.collate_fn(ddi, dti, ppi, d_feat)
142143
hg, nodes_dict, label, label_idx = value['rt']
143144

144-
trained_model = Train(num_subgraph, hg, label_idx, epochs, [25, 25])
145+
trained_model = train(num_subgraph, hg, label_idx, epochs, sub_neighbours)
145146

146147
return trained_model
147148
