【PPMix No.4】POINTS-Qwen-2-5-7B-Chat inference alignment #1241


Merged · 13 commits · Apr 28, 2025
Binary file added paddlemix/demo_images/minicpm_demo.jpeg
81 changes: 81 additions & 0 deletions paddlemix/examples/points_qwen2_5/README.md
@@ -0,0 +1,81 @@
# POINTS-Qwen-2-5-7B-Chat
> **lyuwenyu** (Collaborator, Apr 27, 2025): Could you update this name in the doc as well?
> **zhaop-l** (Contributor, author): Oops, embarrassing; I hadn't noticed that.


## 1. Model Introduction

[POINTS-Qwen](https://huggingface.co/WePOINTS/POINTS-Qwen-2-5-7B-Chat) integrates the latest advances in vision-language models with cutting-edge techniques proposed by the WeChat AI team.

- **Strong baseline**: integrates recent advances in vision-language modeling, namely CapFusion, a dual vision encoder, and dynamic high resolution, into POINTS.

- **Pre-training dataset filtering**: proposes using perplexity as a metric to filter the pre-training dataset. This filtering strategy significantly shrinks the pre-training dataset while improving model performance.

- **Model soup**: proposes applying model-soup merging to models fine-tuned on different visual instruction tuning datasets, which yields a further significant performance gain.
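The model-soup idea above can be sketched as uniform weight averaging across fine-tuned checkpoints (a minimal toy illustration with plain Python lists, not the repository's implementation):

```python
# Sketch of "model soup": uniformly average the parameters of several
# models fine-tuned from the same base on different instruction datasets.
def uniform_soup(state_dicts):
    """Element-wise average of parameter lists keyed by parameter name."""
    souped = {}
    for name in state_dicts[0]:
        stacks = [sd[name] for sd in state_dicts]
        souped[name] = [sum(vals) / len(stacks) for vals in zip(*stacks)]
    return souped

ckpt_a = {"w": [1.0, 2.0], "b": [0.0]}  # fine-tuned on dataset A (toy values)
ckpt_b = {"w": [3.0, 4.0], "b": [2.0]}  # fine-tuned on dataset B (toy values)
print(uniform_soup([ckpt_a, ckpt_b]))  # {'w': [2.0, 3.0], 'b': [1.0]}
```

In practice the same averaging is applied per tensor over real checkpoints; the soup costs nothing extra at inference time since only one merged model is served.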

**Model weights supported in this repository:**

| Model |
|--------------------|
| WePOINTS/POINTS-Qwen-2-5-7B-Chat |
> **lyuwenyu** (Collaborator): Have the weights been uploaded?
>
> **zhaop-l** (Contributor, author): How do I upload the weights? Haha, the reference doc is very brief on that.
>
> **lyuwenyu** (Collaborator): Haha, no big deal. Were these weights converted from the torch checkpoint? Please also upload the conversion script.
>
> **zhaop-l** (Contributor, author): Updated as requested.



## 2. Environment Setup

1) [Install PaddlePaddle](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
- **python >= 3.10**
- **paddlepaddle-gpu must be 3.0.0b2 or a develop build**
```bash
# Three example PaddlePaddle install commands; the install guide on the PaddleMIX homepage also works

# 3.0.0b2 install example (CUDA 11.8)
python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

# develop build install example
python -m pip install paddlepaddle-gpu==0.0.0.post118 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html

# quick install via shell script
sh build_paddle_env.sh
```

> **lyuwenyu** (Collaborator): Suggest also trying an upgrade to 3.0.0b4.
>
> **zhaop-l** (Contributor, Apr 24, 2025): My local tests used 3.0.0; I don't see a 3.0.0b4 release of paddlepaddle. For paddlenlp, my local tests do use 3.0.0b4.

2) [Install the PaddleMIX dependencies](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle)
- **paddlenlp >= 3.0.0b3**

```bash
# Two example ways to install the PaddleMIX dependencies

# pip install example: installs paddlemix, ppdiffusers, project requirements, and paddlenlp
python -m pip install -e . --user
python -m pip install -e ppdiffusers --user
python -m pip install -r requirements.txt --user
python -m pip install paddlenlp==3.0.0b3 --user

# quick install via shell script
sh build_env.sh
```

> Note:
> * Make sure the dependencies above are installed, otherwise the examples will not run. You also need to install the custom ops under `paddlemix/external_ops` with `python setup.py install`; if the ops still cannot be found afterwards, set `PYTHONPATH` accordingly.
> * flash_attn is enabled by default and requires an A100/A800 or H20 GPU; on V100, run inference in float16.
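The GPU note above can be captured in a small helper (a hypothetical sketch; `pick_dtype` and its device tags are illustrative, not part of this repository):

```python
# Hypothetical helper: pick an inference dtype from the GPU name.
# flash_attn needs A100/A800/H20-class hardware; V100 falls back to float16.
def pick_dtype(gpu_name: str) -> str:
    flash_attn_ok = any(tag in gpu_name for tag in ("A100", "A800", "H20"))
    return "bfloat16" if flash_attn_ok else "float16"

print(pick_dtype("NVIDIA A100-SXM4-80GB"))  # bfloat16
print(pick_dtype("Tesla V100-SXM2-32GB"))   # float16
```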

## 3. Quick Start

### Inference
```bash
# single-image inference
python paddlemix/examples/points_qwen2_5/image_infer.py
```

> **lyuwenyu** (Collaborator): Suggest including the inference results here as well.

### References
```BibTeX
@article{liu2024points,
title={POINTS: Improving Your Vision-language Model with Affordable Strategies},
author={Liu, Yuan and Zhao, Zhongyin and Zhuang, Ziyuan and Tian, Le and Zhou, Xiao and Zhou, Jie},
journal={arXiv preprint arXiv:2409.04828},
year={2024}
}

@article{liu2024rethinking,
title={Rethinking Overlooked Aspects in Vision-Language Models},
author={Liu, Yuan and Tian, Le and Zhou, Xiao and Zhou, Jie},
journal={arXiv preprint arXiv:2405.11850},
year={2024}
}

```
43 changes: 43 additions & 0 deletions paddlemix/examples/points_qwen2_5/image_infer.py
@@ -0,0 +1,43 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/19 20:37
# @Author : zhaop-l(zhaopuzxjc@126.com)

from paddlenlp.transformers import CLIPImageProcessor, Qwen2Tokenizer
from PIL import Image

from paddlemix.models.points_qwen2_5 import POINTSChatModel

model_path = "WePOINTS/POINTS-Qwen-2-5-7B-Chat"
> **lyuwenyu** (Collaborator, Apr 27, 2025): Shouldn't this use the path of the converted Paddle weights? Suggest using argparse to pass in the model path and image path.
>
> **zhaop-l** (Contributor, author): Updated as requested.

model = POINTSChatModel.from_pretrained(model_path)
tokenizer = Qwen2Tokenizer.from_pretrained(model_path)
image_processor = CLIPImageProcessor.from_pretrained(model_path)

image_path = "paddlemix/demo_images/minicpm_demo.jpeg"
pil_image = Image.open(image_path)
prompt = "please describe the image in detail"

generation_config = {
"max_new_tokens": 1024,
"temperature": 0.0,
"top_p": 0.0,
"num_beams": 1,
}
res = model.chat(pil_image, prompt, tokenizer, image_processor, True, generation_config)

print(res)
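Following the review suggestion above, the hard-coded paths could be exposed on the command line (a hypothetical sketch; the flag names are illustrative, not the merged script's interface):

```python
import argparse

def parse_args(argv=None):
    # Illustrative flags only; the merged script may use different names.
    parser = argparse.ArgumentParser(description="POINTS-Qwen single-image inference")
    parser.add_argument("--model_path", default="WePOINTS/POINTS-Qwen-2-5-7B-Chat",
                        help="path to the converted Paddle weights")
    parser.add_argument("--image_path", default="paddlemix/demo_images/minicpm_demo.jpeg")
    parser.add_argument("--question", default="please describe the image in detail")
    return parser.parse_args(argv)

args = parse_args(["--image_path", "demo.jpeg"])
print(args.model_path, args.image_path)
```

The parsed values would then replace `model_path`, `image_path`, and `prompt` in the script above.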
19 changes: 19 additions & 0 deletions paddlemix/models/points_qwen2_5/__init__.py
@@ -0,0 +1,19 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/19 20:37
# @Author : zhaop-l(zhaopuzxjc@126.com)
from .modeling_points_chat import *
223 changes: 223 additions & 0 deletions paddlemix/models/points_qwen2_5/catty.py
@@ -0,0 +1,223 @@
# -*- coding: utf-8 -*-

# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# @Time : 2025/4/19 20:37
# @Author : zhaop-l(zhaopuzxjc@126.com)

import os
from typing import List, Tuple

from PIL import Image

from .dynamic_high_resolution import factorize_number


def construct_mapping_dict(max_splits: int = 12) -> dict:
"""Construct a mapping dictionary for the given max_splits.

Args:
max_splits (int, optional): The maximum number of splits.
Defaults to 12.

Returns:
dict: A mapping dictionary for the given max_splits.
"""
mapping_dict = {}
for i in range(1, max_splits + 1):
factor_list = factorize_number(i)
for factor in factor_list:
ratio = factor[0] / factor[1]
if ratio not in mapping_dict:
mapping_dict[ratio] = [factor]
else:
mapping_dict[ratio].append(factor)
return mapping_dict
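A self-contained illustration of the mapping this builds. The stand-in `factorize_number` here is an assumption inferred from how the import is used (every `(width_slices, height_slices)` factor pair of `n`), not the actual `dynamic_high_resolution` implementation:

```python
# Stand-in for dynamic_high_resolution.factorize_number (an assumption based
# on its usage above): all (width_slices, height_slices) factor pairs of n.
def factorize_number(n):
    return [(a, n // a) for a in range(1, n + 1) if n % a == 0]

def construct_mapping_dict(max_splits=12):
    # aspect ratio (cols / rows) -> every tiling layout with that ratio
    mapping = {}
    for i in range(1, max_splits + 1):
        for w, h in factorize_number(i):
            mapping.setdefault(w / h, []).append((w, h))
    return mapping

m = construct_mapping_dict(4)
print(m[1.0])  # [(1, 1), (2, 2)] -- square tilings reachable with <= 4 tiles
print(m[0.5])  # [(1, 2)] -- one tile wide, two tall
```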


def save_image_list(image_list: List[Image.Image], save_folder: str) -> None:
"""Save a list of images to a folder.

Args:
image_list (List[Image.Image]): A list of images.
save_folder (str): The folder to save the images to.
"""
os.makedirs(save_folder, exist_ok=True)
for i, image in enumerate(image_list):
image.save(os.path.join(save_folder, f"{i}.png"))


def resize_to_best_size(
image: Image.Image,
best_slices: tuple,
width_slices: int,
height_slices: int,
sub_image_size: int,
) -> Image.Image:
"""Resize an image to the best size for the given number of slices.

Args:
image (Image.Image): The image to resize.
best_slices (tuple): The best number of slices for the image.
width_slices (int): The number of horizontal slices.
height_slices (int): The number of vertical slices.
sub_image_size (int): The size of the sub-images.

Returns:
Image.Image: The resized image.
"""
width, height = image.size
best_width_slices, best_height_slices = best_slices
if width_slices < height_slices:
new_image_width = best_width_slices * sub_image_size
new_image_height = int(height / width * new_image_width)
else:
new_image_height = best_height_slices * sub_image_size
new_image_width = int(width / height * new_image_height)
new_image = image.resize((new_image_width, new_image_height), resample=2)
return new_image


def compute_strides(height: int, width: int, sub_image_size: int, slices: Tuple[int, int]) -> Tuple[int, int]:
"""Compute the strides for the given image size and slices.

Args:
height (int): The height of the image.
width (int): The width of the image.
sub_image_size (int): The size of the sub-images.
slices (Tuple[int, int]): The number of horizontal and vertical slices.

Returns:
Tuple[int, int]: The strides for the given image size and slices.
"""
slice_width, slice_height = slices
if slice_width > 1:
stride_x = (width - sub_image_size) // (slice_width - 1)
else:
stride_x = 0
if slice_height > 1:
stride_y = (height - sub_image_size) // (slice_height - 1)
else:
stride_y = 0
return stride_x, stride_y
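For example, fitting three 336-pixel windows across a 1000-pixel-wide strip spaces them evenly with a slight overlap (a self-contained restatement of the function above):

```python
def compute_strides(height, width, sub_image_size, slices):
    # Same logic as above: spread the windows evenly, overlapping when needed.
    slice_width, slice_height = slices
    stride_x = (width - sub_image_size) // (slice_width - 1) if slice_width > 1 else 0
    stride_y = (height - sub_image_size) // (slice_height - 1) if slice_height > 1 else 0
    return stride_x, stride_y

# 3 windows across 1000 px: starts at x = 0, 332, 664 (4 px overlap per pair)
print(compute_strides(336, 1000, 336, (3, 1)))  # (332, 0)
```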


def sliding_window_crop(image: Image.Image, window_size: int, slices: Tuple[int, int]) -> List[Image.Image]:
"""Crop an image into sub-images using a sliding window.

Args:
image (Image.Image): The image to crop.
window_size (int): The size of the sub-images.
slices (Tuple[int, int]): The number of horizontal and vertical slices.

Returns:
List[Image]: A list of cropped images.
"""
width, height = image.size
stride_x, stride_y = compute_strides(height, width, window_size, slices)
sub_images = []
if stride_x == 0:
stride_x = window_size
if stride_y == 0:
stride_y = window_size
for y in range(0, height - window_size + 1, stride_y):
for x in range(0, width - window_size + 1, stride_x):
sub_image = image.crop((x, y, x + window_size, y + window_size))
sub_images.append(sub_image)
return sub_images
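The crop boxes this produces can be previewed without PIL (a coordinate-only sketch of the loop above; `window_boxes` is an illustrative helper, not part of the module):

```python
def window_boxes(width, height, window, stride_x, stride_y):
    # Mirrors the crop loop above: (left, upper, right, lower) boxes.
    sx = stride_x or window  # a zero stride falls back to the window size
    sy = stride_y or window
    return [(x, y, x + window, y + window)
            for y in range(0, height - window + 1, sy)
            for x in range(0, width - window + 1, sx)]

print(window_boxes(672, 336, 336, 336, 0))
# [(0, 0, 336, 336), (336, 0, 672, 336)]
```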


def find_best_slices(width_slices: int, height_slices: int, aspect_ratio: float, max_splits: int = 12) -> list:
"""Find the best slices for the given image size and aspect ratio.

Args:
width_slices (int): The number of horizontal slices.
height_slices (int): The number of vertical slices.
aspect_ratio (float): The aspect ratio of the image.
max_splits (int, optional): The maximum number of splits.
Defaults to 12.

Returns:
list: the best slices for the given image.
"""
mapping_dict = construct_mapping_dict(max_splits)
if aspect_ratio < 1:
mapping_dict = {k: v for k, v in mapping_dict.items() if k <= aspect_ratio}
elif aspect_ratio > 1:
mapping_dict = {k: v for k, v in mapping_dict.items() if k >= aspect_ratio}
best_ratio = min(mapping_dict.keys(), key=lambda x: abs(x - aspect_ratio))
best_image_sizes = mapping_dict[best_ratio]
best_slices = min(best_image_sizes, key=lambda x: abs(x[0] * x[1] - width_slices * height_slices))
return best_slices
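Self-contained, the selection behaves like this (again assuming `factorize_number(n)` returns all factor pairs of `n`; that stand-in is an inference from usage, not the real import):

```python
def factorize_number(n):
    # stand-in: all (cols, rows) pairs with cols * rows == n
    return [(a, n // a) for a in range(1, n + 1) if n % a == 0]

def find_best_slices(width_slices, height_slices, aspect_ratio, max_splits=12):
    # Build ratio -> layouts, keep ratios on the image's side of 1:1,
    # pick the closest ratio, then the layout closest in tile count.
    mapping = {}
    for i in range(1, max_splits + 1):
        for w, h in factorize_number(i):
            mapping.setdefault(w / h, []).append((w, h))
    if aspect_ratio < 1:
        mapping = {k: v for k, v in mapping.items() if k <= aspect_ratio}
    elif aspect_ratio > 1:
        mapping = {k: v for k, v in mapping.items() if k >= aspect_ratio}
    best_ratio = min(mapping, key=lambda k: abs(k - aspect_ratio))
    return min(mapping[best_ratio],
               key=lambda s: abs(s[0] * s[1] - width_slices * height_slices))

# a 2:1 landscape image roughly two tiles wide and one tall
print(find_best_slices(2.0, 1.0, 2.0))  # (2, 1)
```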


def split_image_with_catty(
pil_image: Image.Image,
image_size: int = 336,
max_crop_slices: int = 8,
save_folder: str = None,
add_thumbnail: bool = True,
do_resize: bool = False,
**kwargs,
) -> List[Image.Image]:
"""Split an image into sub-images using Catty.

Args:
pil_image (Image.Image): The image to split.
image_size (int, optional): The size of the image.
Defaults to 336.
max_crop_slices (int, optional): The maximum number of slices.
Defaults to 8.
save_folder (str, optional): The folder to save the sub-images.
Defaults to None.
add_thumbnail (bool, optional): Whether to add a thumbnail.
Defaults to True.
do_resize (bool, optional): Whether to resize the image to fit the
maximum number of slices. Defaults to False.

Returns:
List[Image.Image]: A list of cropped images.
"""
width, height = pil_image.size
ratio = width / height
if ratio > max_crop_slices or ratio < 1 / max_crop_slices:
if do_resize:
print(f"Resizing image to fit maximum number of slices ({max_crop_slices})")
if width > height:
new_width = max_crop_slices * height
new_height = height
else:
new_width = width
new_height = max_crop_slices * width
pil_image = pil_image.resize((new_width, new_height), resample=2)
width, height = pil_image.size
ratio = width / height
else:
print(
f"Image aspect ratio ({ratio:.2f}) is out of range: ({1 / max_crop_slices:.2f}, {max_crop_slices:.2f})"
)
return None
width_slices = width / image_size
height_slices = height / image_size
best_slices = find_best_slices(width_slices, height_slices, ratio, max_crop_slices)
pil_image = resize_to_best_size(pil_image, best_slices, width_slices, height_slices, image_size)
width, height = pil_image.size
sub_images = sliding_window_crop(pil_image, image_size, best_slices)
if add_thumbnail:
thumbnail_image = pil_image.resize((image_size, image_size), resample=2)
sub_images.append(thumbnail_image)
if save_folder is not None:
save_image_list(sub_images, save_folder)
return sub_images