Commit 29bcb56

Merge pull request #288 from PaddlePaddle/helixdock
Update README and latest news
2 parents b3974fd + 871c7cb commit 29bcb56

4 files changed

+126
-0
lines changed

README.md

+5
@@ -14,6 +14,11 @@ English | [简体中文](README_cn.md)
## Latest News
`2024.05.23` PaddleHelix released the code of HelixDock, a model pre-trained on large-scale generated docking conformations to unlock the potential of protein-ligand structure prediction, significantly improving prediction accuracy and generalizability. Please refer to the [paper](https://arxiv.org/abs/2310.13913) and [code](./apps/molecular_docking/helixdock) for more details. You are welcome to try the online structure prediction service on the [PaddleHelix website](https://paddlehelix.baidu.com/app/drug/helix-dock/forecast).

`2024.05.13` The paper "Multi-purpose RNA Language Modeling with Motif-aware Pre-training and Type-guided Fine-tuning" was accepted by Nature Machine Intelligence. Please refer to the [paper](https://www.nature.com/articles/s42256-024-00836-4) and [code](https://github.com/CatIIIIIIII/RNAErnie) for more details.

`2024.04.16` PaddleHelix released the technical report of HelixFold-Multimer, a protein complex structure prediction model that achieves remarkable success in antigen-antibody and peptide-protein structure prediction. Please refer to the [report](https://arxiv.org/abs/2404.10260v2) for more details. Online structure prediction services for general and antigen-antibody protein complexes are now available on the PaddleHelix platform at [link1](https://paddlehelix.baidu.com/app/drug/protein-complex/forecast) and [link2](https://paddlehelix.baidu.com/app/drug/KYKT/forecast), respectively.

`2022.12.08` The paper "HelixMO: Sample-Efficient Molecular Optimization in Scene-Sensitive Latent Space" was accepted by **BIBM 2022**. Please refer to [link1](https://www.computer.org/csdl/proceedings-article/bibm/2022/09995561/1JC23yWxizC) or [link2](https://aps.arxiv.org/abs/2112.00905) for more details. We also deployed the drug design service on the [PaddleHelix](https://paddlehelix.baidu.com/app/drug/drugdesign/forecast) website.

`2022.08.11` PaddleHelix released the code of HelixGEM-2, a novel molecular property prediction network that models full-range many-body interactions. It ranked 1st on the OGB [PCQM4Mv2](https://ogb.stanford.edu/docs/lsc/leaderboards/) leaderboard. Please refer to the [paper](https://arxiv.org/abs/2208.05863) and [code](./apps/pretrained_compound/ChemRL/GEM-2) for more details.

README_cn.md

+6
@@ -10,6 +10,12 @@
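![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)

## Latest News
`2024.05.23` The PaddleHelix team open-sourced the code of HelixDock, a model pre-trained on large-scale generated docking conformations that aims to unlock the potential of protein-ligand structure prediction and significantly improves prediction accuracy and generalization. For more details, see the [paper](https://arxiv.org/abs/2310.13913) and [code](./apps/molecular_docking/helixdock). You are welcome to try the online structure prediction service on the [PaddleHelix website](https://paddlehelix.baidu.com/app/drug/helix-dock/forecast).

`2024.05.13` The paper "Multi-purpose RNA Language Modeling with Motif-aware Pre-training and Type-guided Fine-tuning" was accepted by the journal Nature Machine Intelligence. For more details, see the [paper](https://www.nature.com/articles/s42256-024-00836-4) and [code](https://github.com/CatIIIIIIII/RNAErnie).

`2024.04.16` The PaddleHelix team released the HelixFold-Multimer technical report. HelixFold-Multimer is a protein complex structure prediction model that achieves remarkable success in antigen-antibody and peptide-protein structure prediction. For more details, see the [report](https://arxiv.org/abs/2404.10260v2). Online structure prediction services for general and antigen-antibody protein complexes are now available on the PaddleHelix platform at [link 1](https://paddlehelix.baidu.com/app/drug/protein-complex/forecast) and [link 2](https://paddlehelix.baidu.com/app/drug/KYKT/forecast), respectively.

`2022.12.08` The paper "HelixMO: Sample-Efficient Molecular Optimization in Scene-Sensitive Latent Space" was accepted by **BIBM 2022**. For more information, see [link 1](https://www.computer.org/csdl/proceedings-article/bibm/2022/09995561/1JC23yWxizC) or [link 2](https://aps.arxiv.org/abs/2112.00905). You are also welcome to try the drug design service on our platform, [PaddleHelix](https://paddlehelix.baidu.com/app/drug/drugdesign/forecast).

`2022.08.11` The PaddleHelix team open-sourced the code of HelixGEM-2, a novel small-molecule property prediction framework based on long-range many-body modeling, which ranked first on the OGB [PCQM4Mv2](https://ogb.stanford.edu/docs/lsc/leaderboards/) leaderboard. For details, see the [paper](https://arxiv.org/abs/2208.05863) and [code](./apps/pretrained_compound/ChemRL/GEM-2).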

apps/molecular_docking/helixdock/README.md

+1
@@ -1,3 +1,4 @@
English | [简体中文](README_cn.md)
# HelixDock: Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models
This repository contains the implementation for our [paper](https://arxiv.org/abs/2310.13913).

@@ -0,0 +1,114 @@
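[English](README.md) | 简体中文
# HelixDock: Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models
This repository contains the code implementation of our [paper](https://arxiv.org/abs/2310.13913).

Protein-ligand structure prediction is essential in drug discovery: it determines how small molecules (ligands) interact with target proteins (receptors). Traditional physics-based docking tools are widely used, but their accuracy is limited by restricted conformational sampling and imprecise scoring functions. Some recent work applies advances in deep learning to improve prediction accuracy, but limited training data leaves room for improvement.

HelixDock addresses these challenges by pre-training on large-scale docking conformations generated by traditional physics-based docking tools and then fine-tuning on a limited set of experimentally validated receptor-ligand complexes. This approach significantly improves prediction accuracy and model generalization. In rigorous comparisons against physics-based and deep-learning-based baselines, HelixDock demonstrates superior accuracy and strong transferability.
HelixDock also performs well on cross-docking and structure-based virtual screening benchmarks, and it successfully identified highly active inhibitors in a real-world virtual screening project.

## Online Service
We also provide an installation-free online prediction service: [PaddleHelix HelixDock Prediction](https://paddlehelix.baidu.com/app/drug/helix-dock/forecast)
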
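## Environment

## Installation
In addition to the tools listed in `requirements.txt`, the `openbabel` tool is required to compute the aligned RMSD between the predicted and crystal conformations. You can set up the environment with the following commands.
```bash
conda create -n helixdock python=3.7
conda activate helixdock
pip install -r requirements.txt
conda install openbabel==2.4.1 -c conda-forge
```
Note that the rdkit version should be 2022.3.3; other versions may cause errors when loading the model parameters.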
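
The version requirement above can be checked before running anything else. The sketch below is a minimal illustration, not part of the repository: `normalize_version` and `same_version` are hypothetical helper names, and the normalization assumes that `rdkit.__version__` may report a zero-padded component (for example `2022.03.3`) and that every component is purely numeric.

```bash
# Hypothetical helper (not part of the repository): strip leading zeros from
# each dot-separated component, so "2022.03.3" and "2022.3.3" compare equal.
normalize_version() {
    printf '%s\n' "$1" | sed -E 's/(^|\.)0+([0-9])/\1\2/g'
}

# Succeeds (exit status 0) when the two versions match after normalization.
same_version() {
    [ "$(normalize_version "$1")" = "$(normalize_version "$2")" ]
}

# Example use inside the activated environment (requires rdkit to be installed):
#   installed=$(python -c "import rdkit; print(rdkit.__version__)")
#   same_version "$installed" "2022.3.3" || echo "found rdkit $installed, expected 2022.3.3" >&2
```

Comparing normalized strings rather than raw ones avoids a false mismatch when the package metadata and the module report the same release with different zero padding.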

## Download Trained Model Parameters
We provide the model parameters needed to reproduce the results in our paper.

```bash
mkdir -p model
wget https://paddlehelix.bd.bcebos.com/HelixDock/helixdock.pdparams
mv helixdock.pdparams ./model/
```
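
An interrupted download can leave a missing or zero-byte file that only fails later, when the parameters are loaded. The following is a small sketch of a check that could be run after the download step; `check_nonempty` is a hypothetical helper name, and the check only confirms that the file exists and is non-empty, not that its contents are valid.

```bash
# Hypothetical helper (not part of the repository): report whether a
# downloaded file exists and is non-empty.
check_nonempty() {
    if [ -s "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}

# Example use after the download step above:
#   check_nonempty ./model/helixdock.pdparams
```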

## Download Raw Data
```
# PDBbind core set
wget https://paddlehelix.bd.bcebos.com/HelixDock/pdbbind_core_raw.tgz
tar xzf pdbbind_core_raw.tgz
mkdir -p ../data/PDBbind_v2020/complex/
mv pdbbind_core/* ../data/PDBbind_v2020/complex/

# PoseBusters dataset
wget https://paddlehelix.bd.bcebos.com/HelixDock/posebuster_raw.tgz
tar xzf posebuster_raw.tgz
```
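
The `mv pdbbind_core/*` step above implies that the archive unpacks into a `pdbbind_core` directory. As a sketch, the top-level entries of an archive can be listed without extracting it; `top_level_entries` is a hypothetical helper name, and the actual layout of the published archives is an assumption drawn only from the commands above.

```bash
# Hypothetical helper (not part of the repository): print the unique top-level
# entries of a gzip-compressed tar archive without extracting it.
top_level_entries() {
    tar tzf "$1" | sed -e 's|^\./||' -e 's|/.*||' -e '/^$/d' | sort -u
}

# Example use before extracting:
#   top_level_entries pdbbind_core_raw.tgz
```

If the listing shows a different directory name, the `mv` source path would need to be adjusted accordingly.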

## Download Processed Data
```
mkdir -p data/processed/
# PDBbind core set
wget https://paddlehelix.bd.bcebos.com/HelixDock/pdbbind_core_processed.tgz
tar xzf pdbbind_core_processed.tgz
mv pdbbind_core_processed data/processed/

# PoseBusters dataset
wget https://paddlehelix.bd.bcebos.com/HelixDock/posebuster_processed.tgz
tar xzf posebuster_processed.tgz
mv posebuster_processed data/processed/
```
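
After the steps above, both processed datasets should sit under `data/processed/`. The sketch below checks for the expected directories before any reproduction script is run; `report_missing_dirs` is a hypothetical helper name, and the directory names are taken from the `mv` commands above rather than from the scripts themselves.

```bash
# Hypothetical helper (not part of the repository): print every expected
# directory that is missing under a base directory. The exit status is
# non-zero if at least one directory is missing.
report_missing_dirs() {
    base=$1
    shift
    status=0
    for name in "$@"; do
        if [ ! -d "$base/$name" ]; then
            echo "missing: $base/$name"
            status=1
        fi
    done
    return "$status"
}

# Example use after the download steps above:
#   report_missing_dirs data/processed pdbbind_core_processed posebuster_processed
```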

## Usage
To reproduce the results reported in our paper, we provide the following scripts:

```bash
# Reproduce the results on the PDBbind core set
sh reproduce_core.sh
```

The output is organized as follows:
```
./log/reproduce_core/save_output/step-1
mol_name.sdf
```

where `mol_name.sdf` is the predicted conformation of the input molecule.

```bash
# Reproduce the PoseBusters results
# Note that reproducing the PoseBusters results requires sampling multiple times and ranking the samples by RTMScore and posebuster scores.
sh reproduce_posebuster.sh
```

The output is organized as follows:
```
./log/reproduce_posebuster/save_output/step-1
mol_name.sdf
```

where `mol_name.sdf` is the predicted conformation of the input molecule.
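
As a quick sanity check on a finished run, the predicted conformations in an output directory can be counted and compared with the number of input molecules. This is a minimal sketch: `count_sdf` is a hypothetical helper name, and it assumes each prediction is written as a separate `.sdf` file somewhere under the given directory.

```bash
# Hypothetical helper (not part of the repository): count the .sdf files
# found anywhere under a directory.
count_sdf() {
    find "$1" -type f -name '*.sdf' | wc -l | tr -d ' '
}

# Example use after running the reproduction scripts:
#   count_sdf ./log/reproduce_core/save_output/step-1
#   count_sdf ./log/reproduce_posebuster/save_output/step-1
```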
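
## Data Access
To advance frontier research in small-molecule drug discovery and give academic researchers as much support as possible, the latest HelixDock technology is fully open to academic researchers, including the code and training data on the scale of hundreds of millions of samples. The goal is to help accelerate the adoption of AI in small-molecule drug development and promote progress in the field. (Commercial customers can ask about the specific terms of commercial use through the "合作咨询" (cooperation inquiry) entry on the official website.)

The training data can be obtained free of charge by contacting the PaddleHelix team through the following link (please state the name of your organization): https://paddlehelix.baidu.com/partnership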

## Citation

If you use the code or data from this repository in your research, please cite:

```bibtex
@article{liu2024pretraining,
title={Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models},
author={Lihang Liu and Shanzhuo Zhang and Donglong He and Xianbin Ye and Jingbo Zhou and Xiaonan Zhang and Yaoyao Jiang and Weiming Diao and Hang Yin and Hua Chai and Fan Wang and Jingzhou He and Liang Zheng and Yonghui Li and Xiaomin Fang},
year={2024},
eprint={2310.13913},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
