【PPMix No.4】POINTS-Qwen-2-5-7B-Chat inference alignment #1241


Merged
merged 13 commits into from
Apr 28, 2025
104 changes: 104 additions & 0 deletions paddlemix/examples/points_qwen2_5/README.md
@@ -0,0 +1,104 @@
# POINTS-Qwen-2-5-7B-Chat
@lyuwenyu (Collaborator), Apr 27, 2025: Should this name be updated in the docs as well?

Contributor Author: Oops, embarrassing; I didn't notice that.


## 1. Model Introduction

[POINTS-Qwen](https://huggingface.co/WePOINTS/POINTS-Qwen-2-5-7B-Chat) integrates the latest advances in vision-language models with cutting-edge innovations proposed by the WeChat AI team.

- **Strong baseline**: integrates the latest advances in the vision-language field, namely CapFusion, a dual vision encoder, and dynamic high resolution, into POINTS.

- **Pre-training dataset filtering**: proposes using perplexity as a metric to filter the pre-training dataset. This filtering strategy significantly reduces the size of the pre-training dataset while improving model performance.

- **Model Soup**: proposes applying model-soup techniques to models fine-tuned on different visual instruction-tuning datasets, which further improves performance significantly.
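The model-soup idea above amounts to a uniform average of the parameters of several fine-tuned checkpoints. A minimal sketch with toy list-valued parameters (illustrative only; `average_model_soup` and the checkpoint names are hypothetical, not part of the POINTS code):

```python
def average_model_soup(state_dicts):
    """Uniformly average parameter values across checkpoints (a 'uniform soup')."""
    keys = state_dicts[0].keys()
    return {
        k: [sum(vals) / len(vals) for vals in zip(*(sd[k] for sd in state_dicts))]
        for k in keys
    }

# two hypothetical checkpoints of the same architecture, fine-tuned on
# different instruction datasets, with parameters flattened to plain lists
ckpt_a = {"proj.weight": [1.0, 2.0, 3.0]}
ckpt_b = {"proj.weight": [3.0, 0.0, 1.0]}

soup = average_model_soup([ckpt_a, ckpt_b])
print(soup["proj.weight"])  # [2.0, 1.0, 2.0]
```

In practice the same element-wise mean is taken over every tensor in the checkpoints' state dicts.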

**Model weights supported by this repository:**

| Model |
|--------------------|
| WePOINTS/POINTS-Qwen-2-5-7B-Chat |
Collaborator: Have the weights been uploaded?

Contributor Author: How do I upload the weights? Haha, that reference doc is too brief.

Collaborator: Haha, no big deal. Were these weights converted from torch? If so, please also upload the conversion script.

Contributor Author: Modified as requested.



## 2 Environment Setup
1) [Install PaddlePaddle](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
- **python >= 3.10**
- **paddlepaddle-gpu must be 3.0.0b2 or the develop version**
```bash
# Three example PaddlePaddle install commands are provided; you can also follow the install guide on the PaddleMIX homepage

# Example: install 3.0.0b2 (CUDA 11.8)
python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
Collaborator: I'd suggest also trying an upgrade to 3.0.0b4.

@zhaop-l, Apr 24, 2025 (Contributor Author): My local tests used 3.0.0; I haven't seen a 3.0.0b4 release of paddlepaddle. (screenshot omitted)

For paddlenlp, my local tests did use 3.0.0b4.


# Example: install the develop version
python -m pip install paddlepaddle-gpu==0.0.0.post118 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html

# Quick install via the shell script
sh build_paddle_env.sh
```

2) [Install the PaddleMIX dependency packages](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
- **paddlenlp >= 3.0.0b3**

```bash
# Two example commands for installing the PaddleMIX dependencies

# pip install example: installs paddlemix, ppdiffusers, the project requirements, and paddlenlp
python -m pip install -e . --user
python -m pip install -e ppdiffusers --user
python -m pip install -r requirements.txt --user
python -m pip install paddlenlp==3.0.0b4 --user

# Quick install via the shell script
sh build_env.sh
```

> Note:
> * Make sure the dependencies above are installed, otherwise the examples will not run. You also need to install the custom ops under paddlemix/external_ops via `python setup.py install`; if the ops still cannot be found afterwards, set PYTHONPATH accordingly.
> * flash_attn is enabled by default and requires an A100/A800 or H20 GPU. On V100, run inference in float16.

## 3 Model Conversion

Use the following command to convert the torch weights to paddle weights.

```bash
# Convert torch weights to paddle format
python paddlemix/examples/points_qwen2_5/convert_torch_to_paddle.py --torch_model_path ./models/POINTS-Qwen-2-5-7B-Chat/ --paddle_model_path ./models/POINTS-Qwen-2-5-7B-Chat_pd
```
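The conversion script mainly renames parameters and transposes 2-D Linear weights, since torch's `nn.Linear` stores its weight as `[out_features, in_features]` while paddle's `nn.Linear` expects `[in_features, out_features]`. A minimal sketch of that transpose step on plain nested lists (illustrative only; `transpose_2d` is not part of the repository's script, which operates on GPU tensors via DLPack):

```python
def transpose_2d(mat):
    """Transpose a 2-D weight stored as a list of rows."""
    return [list(col) for col in zip(*mat)]

# torch-style Linear weight with out_features=3, in_features=2
torch_w = [[1, 2], [3, 4], [5, 6]]   # shape [3, 2]
paddle_w = transpose_2d(torch_w)     # shape [2, 3], paddle layout
print(paddle_w)  # [[1, 3, 5], [2, 4, 6]]
```

Keys that are not 2-D Linear weights (biases, norms, embeddings with matching layout) are copied through with only a rename.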

## 4 Quick Start

### Inference

```bash
# Single-image inference
python paddlemix/examples/points_qwen2_5/image_infer.py --model_path ./models/POINTS-Qwen-2-5-7B-Chat_pd/ --image_file ./paddlemix/demo_images/examples_image2.jpg
```

![](../../demo_images/examples_image2.jpg)

**Prompt:**

>please describe the image in detail

**Result:**

>The image features a giant panda sitting amidst a lush environment. The panda, with its distinctive black and white fur, is holding a bamboo shoot, which is a staple in its diet. The panda's eyes are looking slightly to the side, giving it a contemplative expression. Surrounding the panda are various green plants, including bamboo shoots and other foliage, which contribute to the natural of a natural habitat. The ground is covered with what appears to be a layer of mulch or soil, and the overall setting suggests a well-maintained enclosure, likely within a zoo or conservation area.



### References

```BibTeX
@article{liu2024points,
  title={POINTS: Improving Your Vision-language Model with Affordable Strategies},
  author={Liu, Yuan and Zhao, Zhongyin and Zhuang, Ziyuan and Tian, Le and Zhou, Xiao and Zhou, Jie},
  journal={arXiv preprint arXiv:2409.04828},
  year={2024}
}

@article{liu2024rethinking,
  title={Rethinking Overlooked Aspects in Vision-Language Models},
  author={Liu, Yuan and Tian, Le and Zhou, Xiao and Zhou, Jie},
  journal={arXiv preprint arXiv:2405.11850},
  year={2024}
}

```
173 changes: 173 additions & 0 deletions paddlemix/examples/points_qwen2_5/convert_torch_to_paddle.py
@@ -0,0 +1,173 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/25 11:33 PM
# @Author : zhaop-l(zhaop-l@glocon.com)
import argparse
import copy
import json
import os
import shutil

import paddle
import torch
from safetensors.numpy import save_file
from safetensors.torch import load_file

from paddlemix.utils.log import logger

need_transpose = {
    # -- language model (CustomLlamaForCausalLM) --
    "attention.query_dense.weight",
    "attention.key_value_dense.weight",
    "attention.dense.weight",
    "mlp.dense_h_to_4h.weight",
    "mlp.dense_4h_to_h.weight",
    "llm.embed_out.weight",
    # -- dual vision encoders (general_vit + ocr_vit) --
    # self_attn
    "self_attn.k_proj.weight",
    "self_attn.v_proj.weight",
    "self_attn.q_proj.weight",
    "self_attn.out_proj.weight",
    # mlp
    "mlp.fc1.weight",
    "mlp.fc2.weight",
    # -- vision_projector resampling / projection layers --
    "vision_projector.0.weight",
    "vision_projector.2.weight",
}

rename_layers = {
    "embeddings.class_embedding": "class_embedding",
    "embeddings.patch_embedding.weight": "conv1.weight",
    "embeddings.position_embedding": "positional_embedding",
    "pre_layrnorm": "ln_pre",
    "vision_model.encoder": "vision_model.transformer",
    "layer_norm1": "norm1",
    "layer_norm2": "norm2",
    "mlp.fc1": "linear1",
    "mlp.fc2": "linear2",
    "post_layernorm": "ln_post",
}


def execute_cmd(cmd, file_path):
    cmd = cmd + " " + file_path
    os.system(cmd)


def check_trans(key, _need_transpose):
    process_list = []
    for x in _need_transpose:
        if x in key:
            process_list.append(x)
    if len(process_list) > 0:
        return True, process_list
    else:
        return False, None


def translate_one_safetensors(file_name: str, dst_path: str, model_path: str):
    tensors = load_file(os.path.join(model_path, file_name))
    for key in list(tensors.keys()):
        dst_key = key
        shape_ = tensors[key].shape
        rename_flag, rename_key = check_trans(key, rename_layers)
        if rename_flag:
            for _r in rename_key:
                dst_key = dst_key.replace(_r, rename_layers[_r])
        t_flag, _ = check_trans(key, need_transpose)
        # move the tensor to GPU; transpose 2-D Linear weights because
        # torch stores them as [out, in] while paddle expects [in, out]
        t = tensors.pop(key).cuda()
        if t_flag and len(shape_) == 2:
            t = t.t().contiguous()
        # hand the tensor from torch to paddle via DLPack without a host copy
        capsule = torch.utils.dlpack.to_dlpack(t)
        t = paddle.utils.dlpack.from_dlpack(capsule)
        tensors[dst_key] = t.numpy()

    save_file(tensors, os.path.join(dst_path, file_name), metadata={"format": "np"})


def main(args):
    model_path = args.torch_model_path
    if args.paddle_model_path is not None:
        dst_path = args.paddle_model_path
    else:
        dst_path = model_path.rstrip("/") + "_pd"
    os.makedirs(dst_path, exist_ok=True)

    logger.info(f"torch model path: {model_path}, paddle model path: {dst_path}")
    logger.info("start convert torch model to paddle model")

    if os.path.exists(os.path.join(model_path, "model.safetensors.index.json")):
        index = json.load(open(os.path.join(model_path, "model.safetensors.index.json")))
        dst_index = copy.deepcopy(index)
        files = set(index["weight_map"].values())

        for key in list(dst_index["weight_map"].keys()):
            rename_flag, rename_key = check_trans(key, rename_layers)
            dst_key = key
            if rename_flag:
                for _r in rename_key:
                    dst_key = dst_key.replace(_r, rename_layers[_r])
            dst_index["weight_map"][dst_key] = dst_index["weight_map"].pop(key)

        for file_name in sorted(os.listdir(model_path)):
            # skip hidden files
            if file_name.startswith("."):
                continue

            if file_name in files:
                # convert safetensors to safetensors (paddle)
                logger.info(f"start convert {file_name}")
                translate_one_safetensors(file_name, dst_path, model_path)
            else:
                # copy config.json and other files
                shutil.copy(os.path.join(model_path, file_name), os.path.join(dst_path, file_name))

        json.dump(dst_index, open(os.path.join(dst_path, "model.safetensors.index.json"), "w"), indent=2)

    else:
        for file_name in sorted(os.listdir(model_path)):
            # skip hidden files
            if file_name.startswith("."):
                continue

            logger.info(file_name)
            if file_name == "model.safetensors":
                # convert safetensors to safetensors (paddle)
                translate_one_safetensors(file_name, dst_path, model_path)
            else:
                # copy config.json and other files
                shutil.copy(os.path.join(model_path, file_name), os.path.join(dst_path, file_name))

    # rewrite config.json: paddle configs use "dtype" rather than "torch_dtype"
    execute_cmd(cmd="sed -i -e 's/torch_dtype/dtype/g' ", file_path=os.path.join(dst_path, "config.json"))

    # drop the transformers_version field, which paddlenlp does not use
    execute_cmd(cmd="sed -i /transformers_version/d ", file_path=os.path.join(dst_path, "config.json"))

    logger.info(f"convert torch model to paddle model success, paddle model path: {dst_path}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--torch_model_path", type=str, default="POINTS-Qwen-2-5-7B-Chat")
    parser.add_argument("--paddle_model_path", type=str, default=None)
    args = parser.parse_args()
    main(args)
58 changes: 58 additions & 0 deletions paddlemix/examples/points_qwen2_5/image_infer.py
@@ -0,0 +1,58 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/19 8:37 PM
# @Author : zhaop-l(zhaopuzxjc@126.com)
import argparse

from paddlenlp.transformers import CLIPImageProcessor, Qwen2Tokenizer
from PIL import Image

from paddlemix.models.points_qwen2_5 import POINTSChatModel


def main(args):
    model_path = args.model_path

    model = POINTSChatModel.from_pretrained(model_path)
    tokenizer = Qwen2Tokenizer.from_pretrained(model_path)
    image_processor = CLIPImageProcessor.from_pretrained(model_path)

    image_path = args.image_file
    pil_image = Image.open(image_path)
    question = args.question

    generation_config = {
        "max_new_tokens": args.max_new_tokens,
        "temperature": args.temperature,
        "top_p": args.top_p,
        "num_beams": 1,
    }
    res = model.chat(pil_image, question, tokenizer, image_processor, True, generation_config)

    print(f"User: {question}\nAssistant: {res}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_path", type=str, default="./models/POINTS-Qwen-2-5-7B-Chat_pd")
    parser.add_argument("--question", type=str, default="please describe the image in detail")
    parser.add_argument("--image_file", type=str, default="paddlemix/demo_images/examples_image2.jpg")
    parser.add_argument("--top_p", type=float, default=0.0)
    parser.add_argument("--temperature", type=float, default=0.0)
    parser.add_argument("--max_new_tokens", type=int, default=1024)
    args = parser.parse_args()
    main(args)
19 changes: 19 additions & 0 deletions paddlemix/models/points_qwen2_5/__init__.py
@@ -0,0 +1,19 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/19 8:37 PM
# @Author : zhaop-l(zhaopuzxjc@126.com)
from .modeling_points_chat import *