[PPMix No.4] POINTS-Qwen-2-5-7B-Chat inference alignment #1241
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
# POINTS-Qwen-2-5-7B-Chat | ||
|
||
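## 1. Model Introduction | ||
|
||
[POINTS-Qwen](https://huggingface.co/WePOINTS/POINTS-Qwen-2-5-7B-Chat) integrates recent research advances in vision-language models with new techniques proposed by the WeChat AI team. | ||
|
||
- **Strong baseline**: integrates recent advances in vision-language models, namely CapFusion, a dual vision encoder, and dynamic high resolution, into POINTS. | ||
|
||
- **Pre-training dataset filtering**: proposes using perplexity as the metric for filtering the pre-training dataset. This filtering strategy substantially reduces the size of the pre-training dataset while improving model performance. | ||
|
||
- **Model soup**: proposes applying model soup to models fine-tuned on different visual instruction tuning datasets, which further improves model performance significantly. | ||
|
||
**Model weights supported by this repository:** | ||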
|
||
| Model | | ||
|--------------------| | ||
| WePOINTS/POINTS-Qwen-2-5-7B-Chat | | ||
Comment: Have the weights been uploaded?
Comment: How do I upload the weights? Haha, the reference document is too brief.
Comment: Haha, no big deal. Were these parameters converted from torch? Please upload the parameter conversion script as well.
Comment: Modified as requested.
||
|
||
|
||
## 2. Environment Setup | ||
1) [Install PaddlePaddle](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle) | ||
- **python >= 3.10** | ||
- **paddlepaddle-gpu must be version 3.0.0b2 or develop** | ||
```bash | ||
# Three example PaddlePaddle installation commands; you can also follow the installation guide on the PaddleMIX home page | ||
|
||
# Example: install version 3.0.0b2 (CUDA 11.8) | ||
python -m pip install paddlepaddle-gpu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/ | ||
Comment: Suggest also trying an upgrade to 3.0.0b4.
||
|
||
# Example: install the develop version | ||
python -m pip install paddlepaddle-gpu==0.0.0.post118 -f https://www.paddlepaddle.org.cn/whl/linux/gpu/develop.html | ||
|
||
# Quick install via shell script | ||
sh build_paddle_env.sh | ||
``` | ||
|
||
2) [Install the PaddleMIX dependencies](https://github.com/PaddlePaddle/PaddleMIX?tab=readme-ov-file#3-%EF%B8%8F%E5%AE%89%E8%A3%85paddlepaddle) | ||
- **paddlenlp >= 3.0.0b3** | ||
|
||
```bash | ||
# Two example commands for installing the PaddleMIX dependencies | ||
|
||
# pip example: install paddlemix, ppdiffusers, the project requirements, and paddlenlp | ||
python -m pip install -e . --user | ||
python -m pip install -e ppdiffusers --user | ||
python -m pip install -r requirements.txt --user | ||
python -m pip install paddlenlp==3.0.0b3 --user | ||
|
||
# Quick install via shell script | ||
sh build_env.sh | ||
``` | ||
|
||
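> Notes: | ||
* Make sure the dependencies above are installed; otherwise the model will not run. You also need to install the custom ops under paddlemix/external_ops with `python setup.py install`. If the operators still cannot be found after installation, set PYTHONPATH as well. | ||
* (flash_attn is enabled by default.) flash_attn requires an A100/A800 or H20 GPU. On V100, run inference in float16. | ||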
|
||
## 3. Quick Start | ||
|
||
### Inference | ||
```bash | ||
# Single-image inference | ||
python paddlemix/examples/points_qwen2_5/image_infer.py | ||
Comment: Suggest including the inference output as well.
||
``` | ||
|
||
### References | ||
```BibTeX | ||
@article{liu2024points, | ||
  title={POINTS: Improving Your Vision-language Model with Affordable Strategies}, | ||
  author={Liu, Yuan and Zhao, Zhongyin and Zhuang, Ziyuan and Tian, Le and Zhou, Xiao and Zhou, Jie}, | ||
  journal={arXiv preprint arXiv:2409.04828}, | ||
  year={2024} | ||
} | ||
|
||
@article{liu2024rethinking, | ||
  title={Rethinking Overlooked Aspects in Vision-Language Models}, | ||
  author={Liu, Yuan and Tian, Le and Zhou, Xiao and Zhou, Jie}, | ||
  journal={arXiv preprint arXiv:2405.11850}, | ||
  year={2024} | ||
} | ||
|
||
``` |
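Editorial sketch: the README's pre-training filtering bullet describes ranking samples by perplexity and keeping only part of the dataset. The snippet below is a minimal illustration of that idea, not the POINTS implementation. The helper names, the `keep_fraction` value, and the toy log-probabilities are all assumptions made for this example; in practice the per-token log-probabilities would come from a language model.

```python
import math
from typing import Dict, List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity is exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def filter_by_perplexity(samples: List[Dict], keep_fraction: float = 0.2) -> List[Dict]:
    """Keep the `keep_fraction` of samples with the lowest perplexity.

    `keep_fraction` is a hypothetical knob for this sketch.
    """
    ranked = sorted(samples, key=lambda s: perplexity(s["logprobs"]))
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


# Toy data: natural-log token probabilities (made-up values).
samples = [
    {"id": "a", "logprobs": [-0.1, -0.2, -0.1]},
    {"id": "b", "logprobs": [-2.0, -3.0, -2.5]},
    {"id": "c", "logprobs": [-0.5, -0.4, -0.6]},
    {"id": "d", "logprobs": [-1.0, -1.2, -0.9]},
    {"id": "e", "logprobs": [-4.0, -3.5, -4.2]},
]
kept = filter_by_perplexity(samples, keep_fraction=0.4)
print([s["id"] for s in kept])
```

Sorting once and slicing keeps the sketch simple; a real pipeline would likely stream scores and pick a threshold instead of holding every sample in memory.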
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
# -*- coding: utf-8 -*- | ||
|
||
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
# @Time : 2025/4/19 8:37 PM | ||
# @Author : zhaop-l(zhaopuzxjc@126.com) | ||
|
||
from paddlenlp.transformers import CLIPImageProcessor, Qwen2Tokenizer | ||
from PIL import Image | ||
|
||
from paddlemix.models.points_qwen2_5 import POINTSChatModel | ||
|
||
model_path = "WePOINTS/POINTS-Qwen-2-5-7B-Chat" | ||
Comment: Shouldn't this use the path to the converted Paddle weights? Suggest using argparse to pass in the model path and image path.
Comment: Modified as requested.
||
|
||
model = POINTSChatModel.from_pretrained(model_path) | ||
tokenizer = Qwen2Tokenizer.from_pretrained(model_path) | ||
image_processor = CLIPImageProcessor.from_pretrained(model_path) | ||
|
||
image_path = "paddlemix/demo_images/minicpm_demo.jpeg" | ||
pil_image = Image.open(image_path) | ||
prompt = "please describe the image in detail" | ||
|
||
generation_config = { | ||
    "max_new_tokens": 1024, | ||
    "temperature": 0.0, | ||
    "top_p": 0.0, | ||
    "num_beams": 1, | ||
} | ||
res = model.chat(pil_image, prompt, tokenizer, image_processor, True, generation_config) | ||
|
||
print(res) |
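Editorial sketch: the review comment on this script suggests passing the model path and image path through argparse instead of hard-coding them. The snippet below shows one way that could look. The flag names (`--model_path`, `--image_path`, `--prompt`) and the `build_parser` helper are assumptions for this example, not the names used in the merged code; the defaults simply reuse the values hard-coded in the script above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser; all flag names here are hypothetical."""
    parser = argparse.ArgumentParser(description="Single-image inference with POINTS-Qwen-2-5-7B-Chat")
    parser.add_argument(
        "--model_path",
        type=str,
        default="WePOINTS/POINTS-Qwen-2-5-7B-Chat",
        help="Model name or local directory holding the converted Paddle weights",
    )
    parser.add_argument(
        "--image_path",
        type=str,
        default="paddlemix/demo_images/minicpm_demo.jpeg",
        help="Path to the input image",
    )
    parser.add_argument(
        "--prompt",
        type=str,
        default="please describe the image in detail",
        help="Text prompt sent with the image",
    )
    return parser


# An explicit argument list is passed here so the sketch runs anywhere;
# a real script would call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["--image_path", "my_image.jpeg"])
print(args.model_path, args.image_path, args.prompt)
```

The parsed `args.model_path` and `args.image_path` would then replace the `model_path` and `image_path` constants in the script, leaving the rest of the inference code unchanged.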
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# -*- coding: utf-8 -*- | ||
|
||
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
# @Time : 2025/4/19 8:37 PM | ||
# @Author : zhaop-l(zhaopuzxjc@126.com) | ||
from .modeling_points_chat import * |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,223 @@ | ||
# -*- coding: utf-8 -*- | ||
|
||
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
# @Time : 2025/4/19 8:37 PM | ||
# @Author : zhaop-l(zhaopuzxjc@126.com) | ||
|
||
import os | ||
from typing import List, Tuple | ||
|
||
from PIL import Image | ||
|
||
from .dynamic_high_resolution import factorize_number | ||
|
||
|
||
def construct_mapping_dict(max_splits: int = 12) -> dict: | ||
    """Construct a mapping dictionary for the given max_splits. | ||
|
||
    Args: | ||
        max_splits (int, optional): The maximum number of splits. | ||
            Defaults to 12. | ||
|
||
    Returns: | ||
        dict: A mapping dictionary for the given max_splits. | ||
    """ | ||
    mapping_dict = {} | ||
    for i in range(1, max_splits + 1): | ||
        factor_list = factorize_number(i) | ||
        for factor in factor_list: | ||
            ratio = factor[0] / factor[1] | ||
            if ratio not in mapping_dict: | ||
                mapping_dict[ratio] = [factor] | ||
            else: | ||
                mapping_dict[ratio].append(factor) | ||
    return mapping_dict | ||
|
||
|
||
def save_image_list(image_list: List[Image.Image], save_folder: str) -> None: | ||
    """Save a list of images to a folder. | ||
|
||
    Args: | ||
        image_list (List[Image.Image]): A list of images. | ||
        save_folder (str): The folder to save the images to. | ||
    """ | ||
    os.makedirs(save_folder, exist_ok=True) | ||
    for i, image in enumerate(image_list): | ||
        image.save(os.path.join(save_folder, f"{i}.png")) | ||
|
||
|
||
def resize_to_best_size( | ||
    image: Image.Image, | ||
    best_slices: tuple, | ||
    width_slices: int, | ||
    height_slices: int, | ||
    sub_image_size: int, | ||
) -> Image.Image: | ||
    """Resize an image to the best size for the given number of slices. | ||
|
||
    Args: | ||
        image (Image.Image): The image to resize. | ||
        best_slices (tuple): The best number of slices for the image. | ||
        width_slices (int): The number of horizontal slices. | ||
        height_slices (int): The number of vertical slices. | ||
        sub_image_size (int): The size of the sub-images. | ||
|
||
    Returns: | ||
        Image.Image: The resized image. | ||
    """ | ||
    width, height = image.size | ||
    best_width_slices, best_height_slices = best_slices | ||
    if width_slices < height_slices: | ||
        new_image_width = best_width_slices * sub_image_size | ||
        new_image_height = int(height / width * new_image_width) | ||
    else: | ||
        new_image_height = best_height_slices * sub_image_size | ||
        new_image_width = int(width / height * new_image_height) | ||
    new_image = image.resize((new_image_width, new_image_height), resample=2) | ||
    return new_image | ||
|
||
|
||
def compute_strides(height: int, width: int, sub_image_size: int, slices: Tuple[int, int]) -> Tuple[int, int]: | ||
    """Compute the strides for the given image size and slices. | ||
|
||
    Args: | ||
        height (int): The height of the image. | ||
        width (int): The width of the image. | ||
        sub_image_size (int): The size of the sub-images. | ||
        slices (Tuple[int, int]): The number of horizontal and vertical slices. | ||
|
||
    Returns: | ||
        Tuple[int, int]: The strides for the given image size and slices. | ||
    """ | ||
    slice_width, slice_height = slices | ||
    if slice_width > 1: | ||
        stride_x = (width - sub_image_size) // (slice_width - 1) | ||
    else: | ||
        stride_x = 0 | ||
    if slice_height > 1: | ||
        stride_y = (height - sub_image_size) // (slice_height - 1) | ||
    else: | ||
        stride_y = 0 | ||
    return stride_x, stride_y | ||
|
||
|
||
def sliding_window_crop(image: Image.Image, window_size: int, slices: Tuple[int, int]) -> List[Image.Image]: | ||
    """Crop an image into sub-images using a sliding window. | ||
|
||
    Args: | ||
        image (Image.Image): The image to crop. | ||
        window_size (int): The size of the sub-images. | ||
        slices (Tuple[int, int]): The number of horizontal and vertical slices. | ||
|
||
    Returns: | ||
        List[Image.Image]: A list of cropped images. | ||
    """ | ||
    width, height = image.size | ||
    stride_x, stride_y = compute_strides(height, width, window_size, slices) | ||
    sub_images = [] | ||
    if stride_x == 0: | ||
        stride_x = window_size | ||
    if stride_y == 0: | ||
        stride_y = window_size | ||
    for y in range(0, height - window_size + 1, stride_y): | ||
        for x in range(0, width - window_size + 1, stride_x): | ||
            sub_image = image.crop((x, y, x + window_size, y + window_size)) | ||
            sub_images.append(sub_image) | ||
    return sub_images | ||
|
||
|
||
def find_best_slices(width_slices: int, height_slices: int, aspect_ratio: float, max_splits: int = 12) -> list: | ||
    """Find the best slices for the given image size and aspect ratio. | ||
|
||
    Args: | ||
        width_slices (int): The number of horizontal slices. | ||
        height_slices (int): The number of vertical slices. | ||
        aspect_ratio (float): The aspect ratio of the image. | ||
        max_splits (int, optional): The maximum number of splits. | ||
            Defaults to 12. | ||
|
||
    Returns: | ||
        list: the best slices for the given image. | ||
    """ | ||
    mapping_dict = construct_mapping_dict(max_splits) | ||
    if aspect_ratio < 1: | ||
        mapping_dict = {k: v for k, v in mapping_dict.items() if k <= aspect_ratio} | ||
    elif aspect_ratio > 1: | ||
        mapping_dict = {k: v for k, v in mapping_dict.items() if k >= aspect_ratio} | ||
    best_ratio = min(mapping_dict.keys(), key=lambda x: abs(x - aspect_ratio)) | ||
    best_image_sizes = mapping_dict[best_ratio] | ||
    best_slices = min(best_image_sizes, key=lambda x: abs(x[0] * x[1] - width_slices * height_slices)) | ||
    return best_slices | ||
|
||
|
||
def split_image_with_catty( | ||
    pil_image: Image.Image, | ||
    image_size: int = 336, | ||
    max_crop_slices: int = 8, | ||
    save_folder: str = None, | ||
    add_thumbnail: bool = True, | ||
    do_resize: bool = False, | ||
    **kwargs, | ||
) -> List[Image.Image]: | ||
    """Split an image into sub-images using Catty. | ||
|
||
    Args: | ||
        pil_image (Image.Image): The image to split. | ||
        image_size (int, optional): The side length of each square | ||
            sub-image. Defaults to 336. | ||
        max_crop_slices (int, optional): The maximum number of slices. | ||
            Defaults to 8. | ||
        save_folder (str, optional): The folder to save the sub-images. | ||
            Defaults to None. | ||
        add_thumbnail (bool, optional): Whether to add a thumbnail. | ||
            Defaults to True. | ||
        do_resize (bool, optional): Whether to resize the image to fit the | ||
            maximum number of slices. Defaults to False. | ||
|
||
    Returns: | ||
        List[Image.Image]: A list of cropped images, or None when the | ||
            aspect ratio is out of range and do_resize is False. | ||
    """ | ||
    width, height = pil_image.size | ||
    ratio = width / height | ||
    if ratio > max_crop_slices or ratio < 1 / max_crop_slices: | ||
        if do_resize: | ||
            print(f"Resizing image to fit maximum number of slices ({max_crop_slices})") | ||
            if width > height: | ||
                new_width = max_crop_slices * height | ||
                new_height = height | ||
            else: | ||
                new_width = width | ||
                new_height = max_crop_slices * width | ||
            pil_image = pil_image.resize((new_width, new_height), resample=2) | ||
            width, height = pil_image.size | ||
            ratio = width / height | ||
        else: | ||
            print( | ||
                f"Image aspect ratio ({ratio:.2f}) is out of range: ({1 / max_crop_slices:.2f}, {max_crop_slices:.2f})" | ||
            ) | ||
            return None | ||
    width_slices = width / image_size | ||
    height_slices = height / image_size | ||
    best_slices = find_best_slices(width_slices, height_slices, ratio, max_crop_slices) | ||
    pil_image = resize_to_best_size(pil_image, best_slices, width_slices, height_slices, image_size) | ||
    width, height = pil_image.size | ||
    sub_images = sliding_window_crop(pil_image, image_size, best_slices) | ||
    if add_thumbnail: | ||
        thumbnail_image = pil_image.resize((image_size, image_size), resample=2) | ||
        sub_images.append(thumbnail_image) | ||
    if save_folder is not None: | ||
        save_image_list(sub_images, save_folder) | ||
    return sub_images |
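Editorial sketch: to make the stride arithmetic in `compute_strides` and `sliding_window_crop` concrete, the snippet below recomputes only the crop boxes, without PIL, for two image sizes. The `crop_boxes` helper is a hypothetical name introduced for this example; its logic mirrors the functions in the diff above, including the fallback to `window_size` when a stride is zero.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]


def compute_strides(height: int, width: int, sub_image_size: int, slices: Tuple[int, int]) -> Tuple[int, int]:
    """Same arithmetic as in the diff: spread the windows evenly per axis."""
    slice_width, slice_height = slices
    stride_x = (width - sub_image_size) // (slice_width - 1) if slice_width > 1 else 0
    stride_y = (height - sub_image_size) // (slice_height - 1) if slice_height > 1 else 0
    return stride_x, stride_y


def crop_boxes(width: int, height: int, window_size: int, slices: Tuple[int, int]) -> List[Box]:
    """Return the (left, upper, right, lower) boxes the sliding window visits."""
    stride_x, stride_y = compute_strides(height, width, window_size, slices)
    # A zero stride means a single window along that axis; step by the
    # window size so that range() still yields exactly one position.
    stride_x = stride_x or window_size
    stride_y = stride_y or window_size
    return [
        (x, y, x + window_size, y + window_size)
        for y in range(0, height - window_size + 1, stride_y)
        for x in range(0, width - window_size + 1, stride_x)
    ]


# A 672x336 image cut into a 2x1 grid of 336-pixel windows.
wide = crop_boxes(672, 336, 336, (2, 1))
print(wide)

# A 1008x672 image cut into a 3x2 grid.
grid = crop_boxes(1008, 672, 336, (3, 2))
print(len(grid), grid[-1])
```

Because the image has already been resized to a multiple of the window size along the constrained axis, the windows tile it without overlap in these two cases; other sizes can produce overlapping windows, since the stride is rounded down.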
Comment: Should this name in the documentation be changed as well?
Comment: How embarrassing, I hadn't noticed.
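Editorial sketch: the slice-selection logic in `find_best_slices` is easier to follow with numbers. The snippet below reproduces it for two image sizes. `factorize_number` is imported from `dynamic_high_resolution` in the diff and is not shown there, so the version here is an assumption (it returns every ordered factor pair of `n`); the actual helper may differ.

```python
from typing import Dict, List, Tuple

Grid = Tuple[int, int]


def factorize_number(n: int) -> List[Grid]:
    """Assumed behaviour: all ordered pairs (a, b) with a * b == n."""
    return [(a, n // a) for a in range(1, n + 1) if n % a == 0]


def construct_mapping_dict(max_splits: int = 12) -> Dict[float, List[Grid]]:
    """Map each width/height ratio to the grids that realise it."""
    mapping: Dict[float, List[Grid]] = {}
    for i in range(1, max_splits + 1):
        for factor in factorize_number(i):
            mapping.setdefault(factor[0] / factor[1], []).append(factor)
    return mapping


def find_best_slices(width_slices: float, height_slices: float, aspect_ratio: float, max_splits: int = 12) -> Grid:
    """Pick the grid whose ratio and tile count best match the image."""
    mapping = construct_mapping_dict(max_splits)
    if aspect_ratio < 1:
        mapping = {k: v for k, v in mapping.items() if k <= aspect_ratio}
    elif aspect_ratio > 1:
        mapping = {k: v for k, v in mapping.items() if k >= aspect_ratio}
    best_ratio = min(mapping, key=lambda k: abs(k - aspect_ratio))
    target_tiles = width_slices * height_slices
    return min(mapping[best_ratio], key=lambda g: abs(g[0] * g[1] - target_tiles))


image_size = 336

# 1000x500 image: ratio 2.0, roughly 2.98 x 1.49 windows.
wide = find_best_slices(1000 / image_size, 500 / image_size, 1000 / 500)
print(wide)

# 672x672 image: ratio 1.0, exactly 2 x 2 windows.
square = find_best_slices(672 / image_size, 672 / image_size, 1.0)
print(square)
```

For the wide image, the candidate grids with ratio 2.0 are (2, 1) and (4, 2); the image covers about 4.43 windows, which is closer to 2 tiles than to 8, so (2, 1) is selected.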