【PPMix No.4】POINTS-Qwen-2-5-7B-Chat Inference Alignment #1241
@@ -0,0 +1,104 @@
# POINTS-Qwen-2-5-7B-Chat

## 1. Model Introduction

[POINTS-Qwen](https://huggingface.co/WePOINTS/POINTS-Qwen-2-5-7B-Chat) integrates the latest advances in vision-language models with cutting-edge techniques proposed by the WeChat AI team.

- **Strong baseline**: integrates recent advances in vision-language modeling, namely CapFusion, dual vision encoders, and dynamic high resolution, into POINTS.

- **Pre-training dataset filtering**: proposes using perplexity as a metric to filter the pre-training dataset. This filtering strategy significantly reduces the size of the pre-training dataset while improving model performance.

- **Model Soup**: proposes applying model-soup weight averaging to models fine-tuned on different visual instruction-tuning datasets, which further yields a significant boost in performance (see the sketch after this list).
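A minimal, hypothetical sketch of the model-soup idea in Paddle is shown below. It is not part of this PR; the `make_model_soup` helper and the checkpoint paths are illustrative assumptions. The state dicts of several checkpoints fine-tuned on different visual instruction-tuning datasets are simply averaged element-wise.

```python
# Hypothetical model-soup sketch: average the parameters of N fine-tuned checkpoints.
import paddle

def make_model_soup(checkpoint_paths):
    """Return a state dict that is the element-wise mean of the given checkpoints."""
    soup = None
    for path in checkpoint_paths:
        state = paddle.load(path)  # one fine-tuned model's state dict (.pdparams)
        if soup is None:
            soup = {k: v.astype("float32") for k, v in state.items()}
        else:
            for k, v in state.items():
                soup[k] += v.astype("float32")
    return {k: v / len(checkpoint_paths) for k, v in soup.items()}

# Usage (paths are placeholders):
# soup = make_model_soup(["ft_data_a.pdparams", "ft_data_b.pdparams", "ft_data_c.pdparams"])
# paddle.save(soup, "points_soup.pdparams")
```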
**Model weights supported by this repository:**

| Model |
|--------------------|
| WePOINTS/POINTS-Qwen-2-5-7B-Chat |

> **Review thread:**
> - Have the weights been uploaded?
> - How do we upload the weights? Haha, that reference doc is really brief.
> - Haha, no big deal. Were these weights converted from torch? Please also upload the conversion script.
> - Done, updated as requested.

## 2 Environment Setup
1) [Install PaddlePaddle](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
- **python >= 3.10**
- **paddlepaddle-gpu must be 3.0.0b2 or the develop version**
```bash
# Three example PaddlePaddle install commands; the install guide on the PaddleMIX homepage also works

# 3.0.0b2 install example (CUDA 11.8)
python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# Develop install example
python -m pip install paddlepaddle-gpu==0.0.0.post118 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html

# Quick install via shell script
sh build_paddle_env.sh
```

> Review comment: it is suggested to also try upgrading to 3.0.0b4.

2) [Install the PaddleMIX dependencies](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
- **paddlenlp >= 3.0.0b3**

```bash
# Two example ways to install the PaddleMIX dependencies

# pip install example: installs paddlemix, ppdiffusers, the project requirements, and paddlenlp
python -m pip install -e . --user
python -m pip install -e ppdiffusers --user
python -m pip install -r requirements.txt --user
python -m pip install paddlenlp==3.0.0b4 --user

# Quick install via shell script
sh build_env.sh
```

> Note:
> * Make sure the dependencies above are installed, otherwise the examples will not run. You also need to install the custom ops under paddlemix/external_ops with `python setup.py install`; if the ops still cannot be found afterwards, set PYTHONPATH explicitly.
> * flash_attn is enabled by default and requires an A100/A800 or H20 GPU. On V100, run inference in float16.

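Optionally, a quick sanity check of the installed versions can be run from Python before continuing. This is a minimal sketch, not part of this PR:

```python
# Optional environment check: confirm framework versions and CUDA support.
import paddle
import paddlenlp

print(paddle.__version__)              # expect 3.0.0b2+ or a develop build
print(paddlenlp.__version__)           # expect 3.0.0b3+
print(paddle.is_compiled_with_cuda())  # True is required for GPU inference
```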
## 3 Model Conversion

Use the following command to convert the torch weights to paddle weights.

```bash
# model conversion
python paddlemix/examples/points_qwen2_5/convert_torch_to_paddle.py --torch_model_path ./models/POINTS-Qwen-2-5-7B-Chat/ --paddle_model_path ./models/POINTS-Qwen-2-5-7B-Chat_pd
```

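The conversion script transposes the 2-D weights of linear layers because the two frameworks store them in opposite layouts: torch.nn.Linear keeps `weight` as `[out_features, in_features]`, while paddle.nn.Linear keeps it as `[in_features, out_features]`. A minimal illustration of this (the snippet itself is not part of this PR):

```python
# Why linear weights are transposed during torch -> paddle conversion.
import paddle
import torch

print(torch.nn.Linear(4, 8).weight.shape)   # torch.Size([8, 4])
print(paddle.nn.Linear(4, 8).weight.shape)  # [4, 8]
```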
## 4 Quick Start

### Inference

```bash
# single-image inference
python paddlemix/examples/points_qwen2_5/image_infer.py --model_path ./models/POINTS-Qwen-2-5-7B-Chat_pd/ --image_file ./paddlemix/demo_images/examples_image2.jpg
```



**Prompt:**

> please describe the image in detail

**Result:**

> The image features a giant panda sitting amidst a lush environment. The panda, with its distinctive black and white fur, is holding a bamboo shoot, which is a staple in its diet. The panda's eyes are looking slightly to the side, giving it a contemplative expression. Surrounding the panda are various green plants, including bamboo shoots and other foliage, which contribute to the natural of a natural habitat. The ground is covered with what appears to be a layer of mulch or soil, and the overall setting suggests a well-maintained enclosure, likely within a zoo or conservation area.

### References

```BibTeX
@article{liu2024points,
  title={POINTS: Improving Your Vision-language Model with Affordable Strategies},
  author={Liu, Yuan and Zhao, Zhongyin and Zhuang, Ziyuan and Tian, Le and Zhou, Xiao and Zhou, Jie},
  journal={arXiv preprint arXiv:2409.04828},
  year={2024}
}

@article{liu2024rethinking,
  title={Rethinking Overlooked Aspects in Vision-Language Models},
  author={Liu, Yuan and Tian, Le and Zhou, Xiao and Zhou, Jie},
  journal={arXiv preprint arXiv:2405.11850},
  year={2024}
}
```
@@ -0,0 +1,173 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/25 11:33 PM
# @Author : zhaop-l(zhaop-l@glocon.com)
import argparse
import copy
import json
import os
import shutil

import paddle
import torch
from safetensors.numpy import save_file
from safetensors.torch import load_file

from paddlemix.utils.log import logger

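# Note on the transposes below (clarifying comment, not from the original PR):
# torch.nn.Linear stores `weight` as [out_features, in_features], while
# paddle.nn.Linear expects [in_features, out_features], so every 2-D weight
# whose name matches a suffix in `need_transpose` is transposed during conversion.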
need_transpose = {
    # —— Language model (CustomLlamaForCausalLM) ——
    "attention.query_dense.weight",
    "attention.key_value_dense.weight",
    "attention.dense.weight",
    "mlp.dense_h_to_4h.weight",
    "mlp.dense_4h_to_h.weight",
    "llm.embed_out.weight",
    # —— Dual vision encoders (general_vit + ocr_vit) ——
    # self_attn
    "self_attn.k_proj.weight",
    "self_attn.v_proj.weight",
    "self_attn.q_proj.weight",
    "self_attn.out_proj.weight",
    # mlp
    "mlp.fc1.weight",
    "mlp.fc2.weight",
    # —— vision_projector resampler / projection layers ——
    "vision_projector.0.weight",
    "vision_projector.2.weight",
}

# Key renames: map the original (HuggingFace-style) parameter names to the
# names used by the Paddle implementation.
rename_layers = {
    "embeddings.class_embedding": "class_embedding",
    "embeddings.patch_embedding.weight": "conv1.weight",
    "embeddings.position_embedding": "positional_embedding",
    "pre_layrnorm": "ln_pre",
    "vision_model.encoder": "vision_model.transformer",
    "layer_norm1": "norm1",
    "layer_norm2": "norm2",
    "mlp.fc1": "linear1",
    "mlp.fc2": "linear2",
    "post_layernorm": "ln_post",
}

def execute_cmd(cmd, file_path):
    cmd = cmd + " " + file_path
    os.system(cmd)


def check_trans(key, _need_transpose):
    # Return whether `key` matches any entry in `_need_transpose`, plus the matches.
    process_list = []
    for x in _need_transpose:
        if x in key:
            process_list.append(x)
    if len(process_list) > 0:
        return True, process_list
    else:
        return False, None


def translate_one_safetensors(file_name: str, dst_path: str, model_path: str):
    tensors = load_file(os.path.join(model_path, file_name))
    for key in list(tensors.keys()):
        dst_key = key
        shape_ = tensors[key].shape
        rename_flag, rename_key = check_trans(key, rename_layers)
        if rename_flag:
            for _r in rename_key:
                dst_key = dst_key.replace(_r, rename_layers[_r])
        t_flag, _ = check_trans(key, need_transpose)
        if t_flag and len(shape_) == 2:
            # Transpose 2-D linear weights, hand the tensor from torch to
            # paddle via DLPack, then store it as numpy for safetensors.
            t = tensors.pop(key).cuda().t().contiguous()
            capsule = torch.utils.dlpack.to_dlpack(t)
            t = paddle.utils.dlpack.from_dlpack(capsule)
            tensors[dst_key] = t.numpy()
        else:
            t = tensors.pop(key).cuda()
            capsule = torch.utils.dlpack.to_dlpack(t)
            t = paddle.utils.dlpack.from_dlpack(capsule)
            tensors[dst_key] = t.numpy()

    save_file(tensors, os.path.join(dst_path, file_name), metadata={"format": "np"})


def main(args):
    model_path = args.torch_model_path
    if args.paddle_model_path is not None:
        dst_path = args.paddle_model_path
    else:
        dst_path = model_path.rstrip("/") + "_pd"
    os.makedirs(dst_path, exist_ok=True)

    logger.info(f"torch model path: {model_path}, paddle model path: {dst_path}")
    logger.info("start convert torch model to paddle model")

    if os.path.exists(os.path.join(model_path, "model.safetensors.index.json")):
        index = json.load(open(os.path.join(model_path, "model.safetensors.index.json")))
        dst_index = copy.deepcopy(index)
        files = set(index["weight_map"].values())

        for key in list(dst_index["weight_map"].keys()):
            rename_flag, rename_key = check_trans(key, rename_layers)
            dst_key = key
            if rename_flag:
                for _r in rename_key:
                    dst_key = dst_key.replace(_r, rename_layers[_r])
            dst_index["weight_map"][dst_key] = dst_index["weight_map"].pop(key)

        for file_name in sorted(os.listdir(model_path)):
            # skip hidden files
            if file_name.startswith("."):
                continue

            if file_name in files:
                # convert safetensors to safetensors(paddle)
                logger.info(f"start convert {file_name}")
                translate_one_safetensors(file_name, dst_path, model_path)
            else:
                # copy config.json and other files
                shutil.copy(os.path.join(model_path, file_name), os.path.join(dst_path, file_name))

        json.dump(dst_index, open(os.path.join(dst_path, "model.safetensors.index.json"), "w"), indent=2)

    else:
        for file_name in sorted(os.listdir(model_path)):
            # skip hidden files
            if file_name.startswith("."):
                continue

            logger.info(file_name)
            if file_name == "model.safetensors":
                # convert safetensors to safetensors(paddle)
                translate_one_safetensors(file_name, dst_path, model_path)
            else:
                # copy config.json and other files
                shutil.copy(os.path.join(model_path, file_name), os.path.join(dst_path, file_name))

    execute_cmd(cmd="sed -i -e 's/torch_dtype/dtype/g' ", file_path=os.path.join(dst_path, "config.json"))

    execute_cmd(cmd="sed -i /transformers_version/d ", file_path=os.path.join(dst_path, "config.json"))

    logger.info(f"convert torch model to paddle model success, paddle model path: {dst_path}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--torch_model_path", type=str, default="POINTS-Qwen-2-5-7B-Chat")
    parser.add_argument("--paddle_model_path", type=str, default=None)
    args = parser.parse_args()
    main(args)
@@ -0,0 +1,58 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/19 8:37 PM
# @Author : zhaop-l(zhaopuzxjc@126.com)
import argparse

from paddlenlp.transformers import CLIPImageProcessor, Qwen2Tokenizer
from PIL import Image

from paddlemix.models.points_qwen2_5 import POINTSChatModel


def main(args):
    model_path = args.model_path

    # Load the converted paddle weights, tokenizer, and image processor.
    model = POINTSChatModel.from_pretrained(model_path)
    tokenizer = Qwen2Tokenizer.from_pretrained(model_path)
    image_processor = CLIPImageProcessor.from_pretrained(model_path)

    image_path = args.image_file
    pil_image = Image.open(image_path)
    question = args.question

    generation_config = {
        "max_new_tokens": args.max_new_tokens,
        "temperature": args.temperature,
        "top_p": args.top_p,
        "num_beams": 1,
    }
    res = model.chat(pil_image, question, tokenizer, image_processor, True, generation_config)

    print(f"User: {question}\nAssistant: {res}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_path", type=str, default="./models/POINTS-Qwen-2-5-7B-Chat_pd")
    parser.add_argument("--question", type=str, default="please describe the image in detail")
    parser.add_argument("--image_file", type=str, default="paddlemix/demo_images/examples_image2.jpg")
    parser.add_argument("--top_p", type=float, default=0.0)
    parser.add_argument("--temperature", type=float, default=0.0)
    parser.add_argument("--max_new_tokens", type=int, default=1024)
    args = parser.parse_args()
    main(args)
@@ -0,0 +1,19 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/19 8:37 PM
# @Author : zhaop-l(zhaopuzxjc@126.com)
from .modeling_points_chat import *
> **Review thread:**
> - Could you also change this name in the doc?
> - Oops, embarrassing, I hadn't noticed that.