
Add Cpp Doc Generate tools #5900

Merged on Jun 13, 2023 (17 commits)
Changes from 16 commits
34 changes: 34 additions & 0 deletions ci_scripts/CAPItools/README.md
@@ -0,0 +1,34 @@
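# CAPI tools
CAPI tools generates the rst documentation for the Paddle C++ API in one step.

## Usage
```shell
python main.py [source dir] [target dir]
```

Where:
- `source dir` is the path to the installed Paddle C++ API declarations, for example `venv/Lib/site-packages/paddle/include/paddle`.
- `target dir` is the directory where the generated files are saved.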

The generated output looks like this:
```text
target dir
|- cn
    |- index.rst
    |- Paddle
        |- fluid
        |- phi
        |- ...
|- en
    |- index.rst
    |- Paddle
        |- fluid
        |- phi
        |- ...
```

## Getting the latest PaddlePaddle
`python -m pip install paddlepaddle==0.0.0 -f https://www.paddlepaddle.org.cn/whl/windows/cpu-mkl-avx/develop.html`

## Notes
A small number of errors during generation are expected for now and will be fixed in a later update.
140 changes: 140 additions & 0 deletions ci_scripts/CAPItools/main.py
@@ -0,0 +1,140 @@
# python main.py [source dir] [target dir]
# python main.py ../paddle .


import CppHeaderParser
import json
import os
import traceback
import sys

from utils_helper import func_helper, class_helper, generate_overview
from utils import get_PADDLE_API_class, get_PADDLE_API_func

# TODO: locate the include directory through the installed paddle package
# import paddle
# import inspect
#
# # Get the path of the installed paddle package
# print(os.path.dirname(inspect.getsourcefile(paddle)))
Contributor

This feature is nice because it avoids installing paddle repeatedly. Will it be followed up? With the current rst generation logic, does the paddle install command have to be run before every parse?

Member

@gouzil Jun 3, 2023

For now the .h files are imported manually. During design we discussed whether CI could keep the package built by PR-CI-Build or PR-CI-Py3 (compiling the whole project just to generate docs is a waste of resources).

We also introduced another solution: passing arguments at run time.

python main.py [source dir] [target dir]
python main.py ../paddle .
if __name__ == "__main__":
    if len(sys.argv) == 3:
        root_dir = sys.argv[1]
        save_dir = sys.argv[2]
    else:
        # for simple run
        root_dir = '../paddle'
        save_dir = '.'

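Contributor

> whether CI could keep the package built by PR-CI-Build or PR-CI-Py3 (compiling the whole project just to generate docs is a waste of resources)

The website probably does not need to be that up to date; parsing the nightly build package should be enough.
What puzzles me here is that when paddle doc generates the English Python API docs, the paddle package should already have been installed once. Can the path of that package be reused? If it cannot, does the script need to run the paddle install command itself?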

Member

Done, waiting for @Liyulingyue to update.

The current priority is: command-line arguments -> installed paddle environment -> the paddle directory next to the script.
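The priority order described in this reply could be sketched roughly as follows. This is only an illustration, not the code that was merged: the helper name `resolve_source_dir` is hypothetical, and the `include/paddle` sub-path is assumed from the example path given in the README.

```python
import importlib.util
import os


def resolve_source_dir(argv, fallback="../paddle"):
    """Pick the header directory: CLI argument, installed paddle, then fallback."""
    # 1. Explicit [source dir] [target dir] arguments win.
    if len(argv) == 3:
        return argv[1]

    # 2. Otherwise look for an installed paddle package without importing it.
    spec = importlib.util.find_spec("paddle")
    if spec is not None and spec.origin:
        # Assumed layout: <site-packages>/paddle/include/paddle
        include_dir = os.path.join(
            os.path.dirname(spec.origin), "include", "paddle"
        )
        if os.path.isdir(include_dir):
            return include_dir

    # 3. Fall back to a paddle directory next to the script.
    return fallback


print(resolve_source_dir(["main.py", "../paddle", "."]))
```

Using `find_spec` rather than `import paddle` keeps the lookup cheap, since the package is only located and never loaded.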



# TODO: declarations like the following need separate handling
"""
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
/**
* Get the current CUDA stream for the passed CUDA device.
*/
PADDLE_API phi::CUDAStream* GetCurrentCUDAStream(const phi::Place& place);
#endif
"""
Contributor

I do not see any special matching for this pattern.

If the TODO is already done, please update it.

Member

We discussed this and will handle it in the comments. The declaration parses fine; the docs just need to state that it can only be used in a GPU environment.
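One way to find the declarations that need such a GPU-only note is to track preprocessor guards while scanning the header text. The sketch below is a simplified assumption, not part of this PR: the function `find_gpu_only_declarations` and the declaration `Foo` are hypothetical, and `#else` / `#elif` branches are ignored.

```python
import re

GPU_GUARD = re.compile(r"PADDLE_WITH_(CUDA|HIP)")


def find_gpu_only_declarations(header_text):
    """Return PADDLE_API lines that sit inside a CUDA/HIP #if block."""
    guard_stack = []  # one bool per open #if: True when it requires CUDA/HIP
    gpu_only = []
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#if"):  # also covers #ifdef and #ifndef
            is_gpu = bool(GPU_GUARD.search(stripped))
            # "#ifndef PADDLE_WITH_CUDA" guards the non-GPU branch.
            if stripped.startswith("#ifndef"):
                is_gpu = False
            guard_stack.append(is_gpu)
        elif stripped.startswith("#endif"):
            if guard_stack:
                guard_stack.pop()
        elif "PADDLE_API " in stripped and any(guard_stack):
            gpu_only.append(stripped)
    return gpu_only


sample = """
PADDLE_API void Foo();
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
/**
 * Get the current CUDA stream for the passed CUDA device.
 */
PADDLE_API phi::CUDAStream* GetCurrentCUDAStream(const phi::Place& place);
#endif
"""

print(find_gpu_only_declarations(sample))
```

Only the `GetCurrentCUDAStream` line is reported for this sample, because `Foo` is declared outside the guard.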



# Get the namespace
# Not thread-safe; do not call this from multiple threads
def analysis_file(path):
    header = CppHeaderParser.CppHeader(path, encoding='utf8')
    data = json.loads(header.toJSON())
    return data


# Generate the files
def generate_docs(
    all_funcs, all_class, cpp2py_api_list, save_dir, LANGUAGE="cn"
):
    for item in all_funcs:
        path = item["filename"].replace("../", "").replace(".h", "")
        dir_path = os.path.join(save_dir, LANGUAGE, path)
        if not os.path.exists(dir_path):
            os.makedirs(dir_path)

        # The slash needs separate handling on Linux
        func_name = item["name"].replace("/", "")
        rst_dir = os.path.join(save_dir, LANGUAGE, path, func_name + ".rst")
        # avoid a filename such as operate*.rst (Windows only)
        try:
            helper = func_helper(item, cpp2py_api_list)
            helper.create_file(rst_dir, LANGUAGE)
        except:
            print(traceback.format_exc())
            print('FAULT GENERATE:' + rst_dir)

    for item in all_class:
        path = item["filename"].replace("../", "").replace(".h", "")
        dir_path = os.path.join(save_dir, LANGUAGE, path)
        if not os.path.exists(dir_path):
            os.makedirs(dir_path)

        func_name = item["name"].replace("PADDLE_API", "")
        rst_dir = os.path.join(save_dir, LANGUAGE, path, func_name + ".rst")
        try:
            helper = class_helper(item)
            helper.create_file(rst_dir, LANGUAGE)
        except:
            print(traceback.format_exc())
            print('FAULT GENERATE:' + rst_dir)
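To make the path handling in `generate_docs` easier to follow, here is a small stand-alone sketch of how a header file name and an API name map to an rst path. The helper `rst_path_for` is hypothetical and the inputs are illustrative; `posixpath` is used instead of `os.path` only so the printed result is the same on every platform.

```python
import posixpath


def rst_path_for(header_filename, api_name, save_dir, language):
    """Mirror the string handling used for function entries in generate_docs."""
    # Drop the leading "../" and the ".h" suffix to get a directory per header.
    rel_dir = header_filename.replace("../", "").replace(".h", "")
    # "/" cannot be part of a file name, so it is removed (e.g. "operator/").
    file_name = api_name.replace("/", "") + ".rst"
    return posixpath.join(save_dir, language, rel_dir, file_name)


print(rst_path_for("../paddle/phi/api/include/tensor.h", "abs", ".", "cn"))
# ./cn/paddle/phi/api/include/tensor/abs.rst
print(rst_path_for("../paddle/phi/api/include/tensor.h", "operator/", ".", "en"))
# ./en/paddle/phi/api/include/tensor/operator.rst
```

Note that `.replace(".h", "")` removes every occurrence of `.h` in the path, not just the suffix, so a directory or extension containing `.h` elsewhere would also be altered.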


# C++ APIs that correspond to Python APIs
def cpp2py(data: dict):
    cpp2py_api_list = []
    for i in data["using"]:
        cpp2py_api_list.append(i.replace("paddle::", ""))

    return cpp2py_api_list


if __name__ == "__main__":
    assert len(sys.argv) == 3

    root_dir = sys.argv[1]
    save_dir = sys.argv[2]

    all_funcs = []
    all_class = []
    cpp2py_api_list = []
    overview_list = []
    for home, dirs, files in os.walk(root_dir):
        for file_name in files:
            file_path = os.path.join(home, file_name)
            # Handle the file that maps C++ APIs to Python APIs
            if file_name == "tensor_compat.h":
                cpp2py_data = analysis_file(file_path)
                cpp2py_api_list = cpp2py(cpp2py_data).copy()

            # Skip files that do not contain PADDLE_API
            with open(file_path, encoding='utf8') as f:
                if 'PADDLE_API ' not in f.read():
                    continue

            print("Parsing: ", file_path)
            data = analysis_file(file_path)

            # Extract the information
            current_func = get_PADDLE_API_func(data)
            current_class = get_PADDLE_API_class(data)

            # Record the information
            all_funcs.extend(current_func)
            all_class.extend(current_class)
            overview_list.append(
                {
                    'h_file': file_path,
                    'class': current_class,
                    'function': current_func,
                }
            )

    generate_docs(all_funcs, all_class, cpp2py_api_list, save_dir, "cn")
    generate_docs(all_funcs, all_class, cpp2py_api_list, save_dir, "en")

    # TODO: delete the try-except once everything is ready
    try:
        generate_overview(overview_list, save_dir, "cn")
        generate_overview(overview_list, save_dir, "en")
    except:
        print('index error')

    print("PADDLE_API func count: ", len(all_funcs))
    print("PADDLE_API class count: ", len(all_class))
    print("cpp2py api count: ", len(cpp2py_api_list))
2 changes: 2 additions & 0 deletions ci_scripts/CAPItools/requirements.txt
@@ -0,0 +1,2 @@
robotpy-cppheaderparser==5.1.0
# paddle
84 changes: 84 additions & 0 deletions ci_scripts/CAPItools/utils.py
@@ -0,0 +1,84 @@
# Get the list of functions marked with PADDLE_API
def get_PADDLE_API_func(data: dict):
    result = []
    for i in data["functions"]:
        if 'PADDLE_API' in i['debug']:
            result.append(i)
    return result


# Get the list of classes marked with PADDLE_API
def get_PADDLE_API_class(data: dict):
    result = []
    for classname in data["classes"]:
        # TODO: no struct is marked with PADDLE_API at the moment
        if data["classes"][classname]["declaration_method"] == "struct":
            continue

        # TODO: this needs handling, because the class name and PADDLE_API end up glued together, e.g. PADDLE_APIDeviceContextPool
        if "PADDLE_API" in classname:
            result.append(data["classes"][classname])
    return result
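The two filters above can be tried on a hand-written dictionary. The sketch below uses condensed, lower-case copies of the functions; the sample data is made up for illustration and contains only the keys the filters read (`debug`, `declaration_method`, `name`), so it is an assumption about the parser output rather than a real dump.

```python
def get_paddle_api_func(data):
    # Keep functions whose raw declaration text mentions PADDLE_API.
    return [f for f in data["functions"] if "PADDLE_API" in f["debug"]]


def get_paddle_api_class(data):
    # Keep non-struct classes whose (glued) name contains PADDLE_API.
    return [
        cls
        for name, cls in data["classes"].items()
        if cls["declaration_method"] != "struct" and "PADDLE_API" in name
    ]


sample = {
    "functions": [
        {"name": "exported", "debug": "PADDLE_API void exported();"},
        {"name": "internal", "debug": "void internal();"},
    ],
    "classes": {
        "PADDLE_APIDeviceContextPool": {
            "name": "PADDLE_APIDeviceContextPool",
            "declaration_method": "class",
        },
        "Internal": {"name": "Internal", "declaration_method": "class"},
        "PADDLE_APIOptions": {
            "name": "PADDLE_APIOptions",
            "declaration_method": "struct",
        },
    },
}

print([f["name"] for f in get_paddle_api_func(sample)])
# ['exported']
print([c["name"].replace("PADDLE_API", "") for c in get_paddle_api_class(sample)])
# ['DeviceContextPool']
```

The final `replace` mirrors what `generate_docs` does in `main.py` to separate the glued macro from the class name.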


# Get the parameters of a function
def get_parameters(parameters):
Contributor

A small suggestion: the comment above each function could be more detailed. That helps a lot in lowering later maintenance cost, especially for text-parsing functions like this one. Could you add a comment before the function with an example, so the code is easier to follow? For instance:
Before parsing: ***
After parsing: ***

    # parameter_api = ""  # this string is built for the API signature (unused for now)
    parameter_dict = {}
    for i in parameters:
        parameter_type_tmp = i['type'].replace(" &", "").replace(" *", "")
        # handle the * and & cases
        # parameter_api += parameter_type_tmp
        if i["reference"] == 1:
            # parameter_api += "&"
            parameter_type_tmp += "&"
        if i["pointer"] == 1:
            # parameter_api += "*"
            parameter_type_tmp += "*"
        if i["constant"] == 1 and not parameter_type_tmp.startswith('const'):
            parameter_type_tmp = "const " + parameter_type_tmp
        # parameter_api += f" {i['name']}, "
        desc = i.get('desc', '').replace(' ', '')

        # special handling for parameters that have no name
        if i['name'] == '&':
            continue
        else:
            parameter_dict[i['name']] = {
                'type': parameter_type_tmp,
                'intro': desc,
            }
        # parameter += f"\t- **{i['name']}** ({parameter_type_tmp}) - {desc}\n"
    # strip the trailing comma
    # parameter_api = parameter_api[:-2]
    # return parameter, parameter_api
    return parameter_dict
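In the spirit of the "before parsing / after parsing" suggestion in the review comment, the following sketch runs a condensed copy of `get_parameters` on a hand-written parameter list. The input entries are illustrative assumptions that contain only the keys the function reads, and the whitespace handling of `desc` is left out for brevity.

```python
def get_parameters(parameters):
    """Condensed copy of get_parameters (desc whitespace handling omitted)."""
    parameter_dict = {}
    for p in parameters:
        # Normalise the type, then re-attach "&" / "*" without a space.
        type_str = p["type"].replace(" &", "").replace(" *", "")
        if p["reference"] == 1:
            type_str += "&"
        if p["pointer"] == 1:
            type_str += "*"
        if p["constant"] == 1 and not type_str.startswith("const"):
            type_str = "const " + type_str
        # Entries whose name is "&" are treated as unnamed and skipped.
        if p["name"] == "&":
            continue
        parameter_dict[p["name"]] = {"type": type_str, "intro": p.get("desc", "")}
    return parameter_dict


# Before parsing: a list of per-parameter dictionaries.
before = [
    {"type": "const phi::Place &", "name": "place",
     "reference": 1, "pointer": 0, "constant": 1},
    {"type": "int *", "name": "out",
     "reference": 0, "pointer": 1, "constant": 0},
    {"type": "const Tensor &", "name": "&",
     "reference": 1, "pointer": 0, "constant": 1},
]

# After parsing: a dictionary keyed by parameter name.
after = get_parameters(before)
print(after)
# {'place': {'type': 'const phi::Place&', 'intro': ''},
#  'out': {'type': 'int*', 'intro': ''}}
```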


def parse_doxygen(doxygen):
    doxygen_dict = {
        'intro': '',
        'returns': '',
        'param_intro': {},
        'note': '',
    }

    if '@' in doxygen:
        doxygen = doxygen[doxygen.find('@') :]
        for doxygen_part in doxygen.split('@'):
            if doxygen_part.startswith('brief '):
                doxygen_dict['intro'] = doxygen_part.replace('brief ', '', 1)
            elif doxygen_part.startswith('return '):
                doxygen_dict['returns'] = doxygen_part.replace('return ', '', 1)
            elif doxygen_part.startswith('param '):
                param_intro = doxygen_part.replace('param ', '', 1)
                param_name = param_intro[: param_intro.find(' ')]
                doxygen_dict['param_intro'][param_name] = param_intro[
                    param_intro.find(' ') + 1 :
                ]
            elif doxygen_part.startswith('note '):
                doxygen_dict['note'] = doxygen_part.replace('note ', '', 1)
            else:
                pass

    return doxygen_dict
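A before/after example may also help for `parse_doxygen`. The sketch below uses a condensed copy of the function (it behaves the same for `@param` lines that contain a space) and a made-up comment block. It also shows that the raw values keep the comment decoration, which the hypothetical helper `strip_decoration` removes; that helper is not part of this PR.

```python
import re


def parse_doxygen(doxygen):
    """Condensed copy of parse_doxygen from utils.py."""
    result = {"intro": "", "returns": "", "param_intro": {}, "note": ""}
    if "@" not in doxygen:
        return result
    doxygen = doxygen[doxygen.find("@"):]
    for part in doxygen.split("@"):
        if part.startswith("brief "):
            result["intro"] = part.replace("brief ", "", 1)
        elif part.startswith("return "):
            result["returns"] = part.replace("return ", "", 1)
        elif part.startswith("param "):
            rest = part.replace("param ", "", 1)
            name, _, text = rest.partition(" ")
            result["param_intro"][name] = text
        elif part.startswith("note "):
            result["note"] = part.replace("note ", "", 1)
    return result


def strip_decoration(text):
    # Hypothetical clean-up: drop a trailing "\n * " or "\n */" fragment.
    return re.sub(r"\s*\*/?\s*$", "", text)


sample = (
    "/**\n"
    " * @brief Get the current CUDA stream.\n"
    " * @param place The target place.\n"
    " * @return The stream.\n"
    " */"
)

parsed = parse_doxygen(sample)
print(repr(parsed["intro"]))              # 'Get the current CUDA stream.\n * '
print(strip_decoration(parsed["intro"]))  # Get the current CUDA stream.
print(strip_decoration(parsed["param_intro"]["place"]))  # The target place.
print(strip_decoration(parsed["returns"]))               # The stream.
```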