[PPMix No.4] POINTS-Qwen-2-5-7B-Chat inference alignment #1241
Merged
13 commits

- edcd17c add points_qwen2_5 (zhaop-l)
- 1d23b67 fix readme (zhaop-l)
- e68b591 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleMIX i… (zhaop-l)
- a317165 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleMIX i… (zhaop-l)
- e970b76 add points_qwen2_5 (zhaop-l)
- 8c836bc fix readme (zhaop-l)
- 3261fca add convert script (zhaop-l)
- ede4f0c Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleMIX i… (zhaop-l)
- c7135ab add convert script (zhaop-l)
- e9615a2 delete unnecessary images (zhaop-l)
- 0750d65 fix infer (zhaop-l)
- ba5de17 fix infer (zhaop-l)
- dc39c52 Merge branch 'develop' into local_dev (lyuwenyu)
@@ -0,0 +1,104 @@
# POINTS-Qwen-2-5

## 1. Model Introduction

[POINTS-Qwen](https://huggingface.co/WePOINTS/POINTS-Qwen-2-5-7B-Chat) integrates recent advances in vision-language models and adopts new techniques proposed by the WeChat AI team.

- **Strong baseline**: integrates recent advances in vision-language models, namely CapFusion, dual vision encoders, and dynamic high resolution, into POINTS.

- **Pre-training dataset filtering**: uses perplexity as the metric for filtering the pre-training dataset. This filtering strategy significantly reduces the size of the pre-training dataset while improving model performance.

- **Model soup**: applies model soup to models fine-tuned on different visual instruction tuning datasets, which further improves model performance significantly.

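Perplexity-based filtering and model soup can be illustrated with a minimal sketch. It is not the POINTS implementation: the data layout (`(sample_id, token_logprobs)` pairs, parameters stored as flat lists of floats), the function names, the `keep_fraction` parameter, the choice to keep the lowest-perplexity samples, and the uniform averaging are all assumptions made for illustration.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of one sample from its per-token natural-log probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def filter_by_perplexity(samples, keep_fraction):
    """Return the ids of the `keep_fraction` of samples with the lowest perplexity.

    `samples` is a list of (sample_id, token_logprobs) pairs (assumed layout).
    """
    ranked = sorted(samples, key=lambda s: perplexity(s[1]))
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return [sample_id for sample_id, _ in ranked[:n_keep]]


def uniform_soup(state_dicts):
    """Average same-named parameters (flat lists of floats) across models."""
    n = len(state_dicts)
    return {
        name: [sum(vals) / n for vals in zip(*(sd[name] for sd in state_dicts))]
        for name in state_dicts[0]
    }


if __name__ == "__main__":
    # Hypothetical log-probabilities for four samples.
    data = [
        ("a", [-0.1, -0.2]),
        ("b", [-2.0, -3.0]),
        ("c", [-0.5, -0.5]),
        ("d", [-1.0, -1.0]),
    ]
    print(filter_by_perplexity(data, keep_fraction=0.5))
    print(uniform_soup([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]))
```

In practice the log-probabilities would come from a language model scoring each sample, and the averaged parameters would be real tensors rather than Python lists.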
**Model weights supported by this repository:**

| Model |
|--------------------|
| WePOINTS/POINTS-Qwen-2-5-7B-Chat |


## 2. Environment Setup
1) [Install PaddlePaddle](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
- **python >= 3.10**
- **paddlepaddle-gpu must be 3.0.0b2 or the develop version**
```bash
# Three example ways to install PaddlePaddle; you can also follow the installation guide on the PaddleMIX home page

# 3.0.0b2 install example (CUDA 11.8)
python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# Develop version install example
python -m pip install paddlepaddle-gpu==0.0.0.post118 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html

# Quick install via shell script
sh build_paddle_env.sh
```

> Review comment on the 3.0.0b2 install line: suggest also trying an upgrade to 3.0.0b4.

2) [Install the PaddleMIX dependencies](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
- **paddlenlp >= 3.0.0b3**

```bash
# Two example ways to install the PaddleMIX dependencies

# pip install example: installs paddlemix, ppdiffusers, the project requirements, and paddlenlp
python -m pip install -e . --user
python -m pip install -e ppdiffusers --user
python -m pip install -r requirements.txt --user
python -m pip install paddlenlp==3.0.0b4 --user

# Quick install via shell script
sh build_env.sh
```

> Note:
* Make sure the dependencies above are installed; otherwise the example will not run. You also need to install the custom ops under paddlemix/external_ops with `python setup.py install`. If the operators still cannot be found after installation, set PYTHONPATH as well.
* flash_attn is enabled by default and requires an A100/A800 or H20 GPU. On a V100, run inference in float16.

## 3. Model Conversion

To convert the torch model into a Paddle model, run the following command.

```bash
# Convert the torch weights to Paddle format
python paddlemix/examples/points_qwen2_5/convert_torch_to_paddle.py --torch_model_path ./models/POINTS-Qwen-2-5-7B-Chat/ --paddle_model_path ./models/POINTS-Qwen-2-5-7B-Chat_pd
```
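
The conversion script transposes the 2-D weights whose names match its `need_transpose` list. The reason is a layout difference: `torch.nn.Linear` stores its weight as `[out_features, in_features]`, while `paddle.nn.Linear` stores it as `[in_features, out_features]`. The sketch below shows the idea with NumPy only; the helper name `to_paddle_linear_weight` and the toy shapes are assumptions for illustration and are not part of the script.

```python
import numpy as np


def to_paddle_linear_weight(w_torch):
    """Transpose a 2-D torch-style Linear weight into Paddle's layout.

    Non-2-D arrays are returned unchanged. Deciding which named tensors
    are Linear weights is left to the caller.
    """
    if w_torch.ndim != 2:
        return w_torch
    return np.ascontiguousarray(w_torch.T)


rng = np.random.default_rng(0)
w_torch = rng.standard_normal((3, 4))  # [out_features=3, in_features=4]
x = rng.standard_normal((2, 4))        # batch of 2 inputs
w_paddle = to_paddle_linear_weight(w_torch)

# torch computes x @ W.T and paddle computes x @ W, so both layouts
# produce the same output once the weight has been transposed.
assert np.allclose(x @ w_torch.T, x @ w_paddle)
```

Tensors that are not Linear weights (biases, normalization parameters) keep their layout, which is why the script only transposes keys on its list.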

## 4. Quick Start

### Inference

```bash
# Single-image inference
python paddlemix/examples/points_qwen2_5/image_infer.py --model_path ./models/POINTS-Qwen-2-5-7B-Chat_pd/ --image_file ./paddlemix/demo_images/examples_image2.jpg
```

![](../../demo_images/examples_image2.jpg)

**Prompt:**

>please describe the image in detail

**Result:**

>The image features a giant panda sitting amidst a lush environment. The panda, with its distinctive black and white fur, is holding a bamboo shoot, which is a staple in its diet. The panda's eyes are looking slightly to the side, giving it a contemplative expression. Surrounding the panda are various green plants, including bamboo shoots and other foliage, which contribute to the natural of a natural habitat. The ground is covered with what appears to be a layer of mulch or soil, and the overall setting suggests a well-maintained enclosure, likely within a zoo or conservation area.

### References

```BibTeX
@article{liu2024points,
  title={POINTS: Improving Your Vision-language Model with Affordable Strategies},
  author={Liu, Yuan and Zhao, Zhongyin and Zhuang, Ziyuan and Tian, Le and Zhou, Xiao and Zhou, Jie},
  journal={arXiv preprint arXiv:2409.04828},
  year={2024}
}

@article{liu2024rethinking,
  title={Rethinking Overlooked Aspects in Vision-Language Models},
  author={Liu, Yuan and Tian, Le and Zhou, Xiao and Zhou, Jie},
  journal={arXiv preprint arXiv:2405.11850},
  year={2024}
}

```
paddlemix/examples/points_qwen2_5/convert_torch_to_paddle.py
173 changes: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/25 11:33 PM
# @Author : zhaop-l(zhaop-l@glocon.com)
import argparse
import copy
import json
import os
import shutil

import paddle
import torch
from safetensors.numpy import save_file
from safetensors.torch import load_file

from paddlemix.utils.log import logger

need_transpose = {
    # Language model (CustomLlamaForCausalLM)
    "attention.query_dense.weight",
    "attention.key_value_dense.weight",
    "attention.dense.weight",
    "mlp.dense_h_to_4h.weight",
    "mlp.dense_4h_to_h.weight",
    "llm.embed_out.weight",
    # Dual vision encoders (general_vit + ocr_vit)
    # self_attn
    "self_attn.k_proj.weight",
    "self_attn.v_proj.weight",
    "self_attn.q_proj.weight",
    "self_attn.out_proj.weight",
    # mlp
    "mlp.fc1.weight",
    "mlp.fc2.weight",
    # vision_projector resampling / projection layers
    "vision_projector.0.weight",
    "vision_projector.2.weight",
}

rename_layers = {
    "embeddings.class_embedding": "class_embedding",
    "embeddings.patch_embedding.weight": "conv1.weight",
    "embeddings.position_embedding": "positional_embedding",
    "pre_layrnorm": "ln_pre",
    "vision_model.encoder": "vision_model.transformer",
    "layer_norm1": "norm1",
    "layer_norm2": "norm2",
    "mlp.fc1": "linear1",
    "mlp.fc2": "linear2",
    "post_layernorm": "ln_post",
}


def execute_cmd(cmd, file_path):
    cmd = cmd + " " + file_path
    os.system(cmd)


def check_trans(key, _need_transpose):
    precess_list = []
    for x in _need_transpose:
        if x in key:
            precess_list.append(x)
    if len(precess_list) > 0:
        return True, precess_list
    else:
        return False, None


def translate_one_safetensors(file_name: str, dst_path: str, model_path: str):
    tensors = load_file(os.path.join(model_path, file_name))
    for key in list(tensors.keys()):
        dst_key = key
        shape_ = tensors[key].shape
        rename_flag, rename_key = check_trans(key, rename_layers)
        if rename_flag:
            for _r in rename_key:
                dst_key = dst_key.replace(_r, rename_layers[_r])
        t_flag, _ = check_trans(key, need_transpose)
        if t_flag and len(shape_) == 2:
            t = tensors.pop(key).cuda().t().contiguous()
            capsule = torch.utils.dlpack.to_dlpack(t)
            t = paddle.utils.dlpack.from_dlpack(capsule)
            tensors[dst_key] = t.numpy()
        else:
            t = tensors.pop(key).cuda()
            capsule = torch.utils.dlpack.to_dlpack(t)
            t = paddle.utils.dlpack.from_dlpack(capsule)
            tensors[dst_key] = t.numpy()

    save_file(tensors, os.path.join(dst_path, file_name), metadata={"format": "np"})


def main(args):
    model_path = args.torch_model_path
    if args.paddle_model_path is not None:
        dst_path = args.paddle_model_path
    else:
        dst_path = model_path.rstrip("/") + "_pd"
    os.makedirs(dst_path, exist_ok=True)

    logger.info(f"torch model path: {model_path}, paddle model path: {dst_path}")
    logger.info("start convert torch model to paddle model")

    if os.path.exists(os.path.join(model_path, "model.safetensors.index.json")):
        index = json.load(open(os.path.join(model_path, "model.safetensors.index.json")))
        dst_index = copy.deepcopy(index)
        files = set(index["weight_map"].values())

        for key in list(dst_index["weight_map"].keys()):
            rename_flag, rename_key = check_trans(key, rename_layers)
            dst_key = key
            if rename_flag:
                for _r in rename_key:
                    dst_key = dst_key.replace(_r, rename_layers[_r])
            dst_index["weight_map"][dst_key] = dst_index["weight_map"].pop(key)

        for file_name in sorted(os.listdir(model_path)):
            # skip hidden files
            if file_name.startswith("."):
                continue

            if file_name in files:
                # convert safetensors to safetensors(paddle)
                logger.info(f"start convert {file_name}")
                translate_one_safetensors(file_name, dst_path, model_path)
            else:
                # copy config.json and other files
                shutil.copy(os.path.join(model_path, file_name), os.path.join(dst_path, file_name))

        json.dump(dst_index, open(os.path.join(dst_path, "model.safetensors.index.json"), "w"), indent=2)

    else:
        for file_name in sorted(os.listdir(model_path)):
            # skip hidden files
            if file_name.startswith("."):
                continue

            logger.info(file_name)
            if file_name == "model.safetensors":
                # convert safetensors to safetensors(paddle)
                translate_one_safetensors(file_name, dst_path, model_path)
            else:
                # copy config.json and other files
                shutil.copy(os.path.join(model_path, file_name), os.path.join(dst_path, file_name))

    execute_cmd(cmd="sed -i -e 's/torch_dtype/dtype/g' ", file_path=os.path.join(dst_path, "config.json"))

    execute_cmd(cmd="sed -i /transformers_version/d ", file_path=os.path.join(dst_path, "config.json"))

    logger.info(f"convert torch model to paddle model success, paddle model path: {dst_path}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--torch_model_path", type=str, default="POINTS-Qwen-2-5-7B-Chat")
    parser.add_argument("--paddle_model_path", type=str, default=None)
    args = parser.parse_args()
    main(args)
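
Two steps of the script above can be sketched in isolation: the substring-based key renaming, and the `config.json` patch that the script performs with `sed`. The sketch below is an illustration, not a replacement taken from the PR. `rename_key`, `patch_config`, and the example weight key are hypothetical names, `RENAME_TABLE` is a three-entry subset of `rename_layers`, and `patch_config` only edits top-level JSON keys, whereas the `sed` commands act on every matching line of the file.

```python
import json

# Subset of the script's rename_layers table, copied for illustration.
RENAME_TABLE = {
    "vision_model.encoder": "vision_model.transformer",
    "layer_norm1": "norm1",
    "mlp.fc1": "linear1",
}


def rename_key(key, table):
    """Apply every rename whose pattern occurs as a substring of the original key."""
    matched = [old for old in table if old in key]
    new_key = key
    for old in matched:
        new_key = new_key.replace(old, table[old])
    return new_key


def patch_config(config_path):
    """Rename top-level `torch_dtype` to `dtype` and drop `transformers_version`."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    if "torch_dtype" in config:
        config["dtype"] = config.pop("torch_dtype")
    config.pop("transformers_version", None)
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
    return config


if __name__ == "__main__":
    # Hypothetical key, chosen only to exercise two entries of the table.
    print(rename_key("vision_model.encoder.layers.0.mlp.fc1.weight", RENAME_TABLE))
```

Editing the file through the `json` module avoids depending on an external `sed` binary, whose `-i` option behaves differently between GNU and BSD implementations. Note that the renames are matched against the original key and then applied one after another, so overlapping patterns in the table would make the result depend on iteration order.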
@@ -0,0 +1,58 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/19 8:37 PM
# @Author : zhaop-l(zhaopuzxjc@126.com)
import argparse

from paddlenlp.transformers import CLIPImageProcessor, Qwen2Tokenizer
from PIL import Image

from paddlemix.models.points_qwen2_5 import POINTSChatModel


def main(args):
    model_path = args.model_path

    model = POINTSChatModel.from_pretrained(model_path)
    tokenizer = Qwen2Tokenizer.from_pretrained(model_path)
    image_processor = CLIPImageProcessor.from_pretrained(model_path)

    image_path = args.image_file
    pil_image = Image.open(image_path)
    question = args.question

    generation_config = {
        "max_new_tokens": args.max_new_tokens,
        "temperature": args.temperature,
        "top_p": args.top_p,
        "num_beams": 1,
    }
    res = model.chat(pil_image, question, tokenizer, image_processor, True, generation_config)

    print(f"User: {question}\nAssistant: {res}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_path", type=str, default="./models/POINTS-Qwen-2-5-7B-Chat_pd")
    parser.add_argument("--question", type=str, default="please describe the image in detail")
    parser.add_argument("--image_file", type=str, default="paddlemix/demo_images/examples_image2.jpg")
    parser.add_argument("--top_p", type=float, default=0.0)
    parser.add_argument("--temperature", type=float, default=0.0)
    parser.add_argument("--max_new_tokens", type=int, default=1024)
    args = parser.parse_args()
    main(args)
@@ -0,0 +1,19 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/19 8:37 PM
# @Author : zhaop-l(zhaopuzxjc@126.com)
from .modeling_points_chat import *
Have the weights been uploaded?
How do I upload the weights? Haha, the reference document is too brief.
Haha, no big deal. Were these weights converted from torch? If so, please upload the conversion script.
Updated as requested.