[CI] Add kunlun and change name with XPU #72068

Merged on Apr 21, 2025 (83 commits)

Commits
2d287f3 test (swgu98, Apr 2, 2025)
a6ad67a test (swgu98, Apr 3, 2025)
8e2a63b test=document_fix (swgu98, Apr 3, 2025)
6eeb38e test=document-fix (swgu98, Apr 7, 2025)
4fda625 test=document_fix (swgu98, Apr 7, 2025)
c61dcc2 test=document_fix (swgu98, Apr 7, 2025)
1764eae test=document_fix (swgu98, Apr 7, 2025)
ae4e5ec test=document_fix (swgu98, Apr 7, 2025)
b9890e3 test=document_fix (swgu98, Apr 7, 2025)
d8f177f test=document_fix (swgu98, Apr 7, 2025)
9206ba6 test=document_fix (swgu98, Apr 7, 2025)
a04487a test=document_fix (swgu98, Apr 7, 2025)
e30b7cd test=document_fix (swgu98, Apr 7, 2025)
1038121 test=document_fix (swgu98, Apr 7, 2025)
e2db7ab test=document_fix safe submodule (swgu98, Apr 8, 2025)
2fc7c0a test (swgu98, Apr 8, 2025)
a4b89e5 test=document_fix, build (swgu98, Apr 8, 2025)
9e65c12 test=document_fix build no tag (swgu98, Apr 8, 2025)
4497d54 test=document_fix nospace rmswap (swgu98, Apr 8, 2025)
95775ab test=document_fix all args (swgu98, Apr 8, 2025)
0eff947 test=document_fix ssd1 (swgu98, Apr 9, 2025)
3ed38b4 test=document_fix proxy (swgu98, Apr 9, 2025)
f96a96e test=document_fix ulimit (swgu98, Apr 9, 2025)
87508f6 test=document_fix ulimit (swgu98, Apr 9, 2025)
a02bfa9 test=document_fix test (swgu98, Apr 9, 2025)
52c459f test=document_fix test (swgu98, Apr 9, 2025)
05b9981 test=document_fix tar (swgu98, Apr 9, 2025)
b066bb7 test=document_fix run0 (swgu98, Apr 9, 2025)
e7537b4 test=document_fix run0 (swgu98, Apr 9, 2025)
94e208f test=document_fix run0 (swgu98, Apr 9, 2025)
e723857 test=document_fix (swgu98, Apr 9, 2025)
38ff8e4 test=document_fix test docker (swgu98, Apr 10, 2025)
213d91b test=document_fix no cap-add (swgu98, Apr 10, 2025)
effc1a6 test=document_fix no cap-add (swgu98, Apr 10, 2025)
62854fc test=document_fix docker no all (swgu98, Apr 10, 2025)
366dcd3 test=document_fix docker no all (swgu98, Apr 10, 2025)
2f17812 test=document_fix docker no all no core (swgu98, Apr 10, 2025)
90c44f2 test=document_fix docker no all no core (swgu98, Apr 10, 2025)
48e4150 test=document_fix docker no all no core no device (swgu98, Apr 10, 2025)
0440000 test=document_fix docker no all no core no device (swgu98, Apr 10, 2025)
1d08a39 test=document_fix mkdir 0 (swgu98, Apr 10, 2025)
990ddde test=document_fix (swgu98, Apr 10, 2025)
96ceb92 test=document_fix fix conflict (swgu98, Apr 10, 2025)
21394f8 test=document_fix all (swgu98, Apr 10, 2025)
fe3b87f test=dpcument_fix (swgu98, Apr 10, 2025)
3bf0b86 test=dpcument_fix (swgu98, Apr 10, 2025)
8deff66 test=dpcument_fix (swgu98, Apr 10, 2025)
31f88e9 test=document_fix xpu docker path (swgu98, Apr 11, 2025)
8092d12 test=document_fix cd P (swgu98, Apr 14, 2025)
3cfaf6b test=document_fix ci (swgu98, Apr 14, 2025)
ae88820 test=document_fix rerun (swgu98, Apr 14, 2025)
b5ba2ec test=document_fix all (swgu98, Apr 14, 2025)
199a0d5 test=document_fix ci name (swgu98, Apr 14, 2025)
9e4e818 test=document_fix merge develop (swgu98, Apr 14, 2025)
20690c4 test=document_fix upload (swgu98, Apr 14, 2025)
723bc87 test=document_fix upload aksk (swgu98, Apr 14, 2025)
2153b9c test=document_fix bos (swgu98, Apr 14, 2025)
6cb4ff1 test=document_fix no build (swgu98, Apr 15, 2025)
6c1bb88 test=document_fix download noxly (swgu98, Apr 15, 2025)
c44c24a test=document_fix nobuild (swgu98, Apr 15, 2025)
ff40431 test=document_fix nobuild (swgu98, Apr 15, 2025)
3907a5a test=document_fix nobuild (swgu98, Apr 15, 2025)
07bab38 test=document_fix ssd1 cache (swgu98, Apr 15, 2025)
4c106ef Merge branch 'develop' into kunlun (swgu98, Apr 15, 2025)
140490d test=document_fix merge develop (swgu98, Apr 15, 2025)
f15abf7 test=document_fix merge develop (swgu98, Apr 15, 2025)
9e3b471 test=document_fix all (swgu98, Apr 15, 2025)
15e1745 test=document_fix out up (swgu98, Apr 15, 2025)
9e10849 test=document_fix out up (swgu98, Apr 15, 2025)
c30892b test=document_fix dist (swgu98, Apr 16, 2025)
a58543d test=document_fix dist (swgu98, Apr 16, 2025)
4e1fc4d test=document_fix FLAGS_use_stride_kernel (swgu98, Apr 16, 2025)
c9e1f6b test=document_fix x (swgu98, Apr 16, 2025)
c3c4963 test=document_fix (swgu98, Apr 16, 2025)
01f91db merge dev (swgu98, Apr 17, 2025)
c9c21b4 merge rerun (swgu98, Apr 17, 2025)
07415c2 no "" (swgu98, Apr 17, 2025)
3645084 sudo mkdir 0 (swgu98, Apr 18, 2025)
4f37b4a test=document_fix rename (swgu98, Apr 21, 2025)
756654c test=document_fix rename (swgu98, Apr 21, 2025)
7784e98 test=document_fix ci (swgu98, Apr 21, 2025)
e36bd59 test=document_fix rerun (swgu98, Apr 21, 2025)
9385e87 test=document_fix merge dev (swgu98, Apr 21, 2025)

.github/workflows/CI.yml (5 additions, 0 deletions)
@@ -24,6 +24,11 @@ jobs:
    uses: ./.github/workflows/_SOT.yml
    needs: clone

  xpu:
    name: Linux-XPU
    uses: ./.github/workflows/_Linux-XPU.yml
    needs: clone

  inference:
    name: PR-CI-Inference
    uses: ./.github/workflows/_Inference.yml
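The hunk above adds an `xpu` job that, like `inference`, declares `needs: clone`, so it starts only after `clone` succeeds and may then run in parallel with the other jobs that depend on `clone`. A minimal Python sketch of that ordering, using the standard-library `graphlib` module, groups jobs into batches whose dependencies are satisfied. The graph is an assumption limited to the jobs visible in this hunk (the job that calls `_SOT.yml` is left out because its key is not shown), so it illustrates the scheduling rule rather than the full CI graph.

```python
from graphlib import TopologicalSorter

# Illustrative subset of CI.yml: each job maps to the jobs it `needs`.
# The real workflow defines more jobs than are shown here.
CI_JOBS = {
    "clone": set(),
    "xpu": {"clone"},
    "inference": {"clone"},
}


def waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group jobs into batches; every job in a batch has all its needs met."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sort for a stable, readable order
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for index, batch in enumerate(waves(CI_JOBS)):
        print(index, batch)
```

The same model applies inside `_Linux-XPU.yml`, where `build` needs `check-bypass` and `test` needs `build`, giving three single-job batches. Per the GitHub Actions documentation, a job whose needed job fails or is skipped is itself skipped unless its `if:` condition says otherwise, so when `build` is skipped by its bypass check, `test` is skipped as well.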

.github/workflows/_Linux-XPU.yml (new file, 266 additions, 0 deletions)
@@ -0,0 +1,266 @@
name: Linux-XPU

on:
  workflow_call:

env:
  dockerfile: dockerfile
  docker_image: aa13dc110ab3
  PR_ID: ${{ github.event.pull_request.number }}
  COMMIT_ID: ${{ github.event.pull_request.head.sha }}
  ci_scripts: /paddle/ci
  ci_scripts_runner: ${{ github.workspace }}/ci
  work_dir: /paddle
  PADDLE_ROOT: /paddle
  BRANCH: ${{ github.event.pull_request.base.ref }}
  CI_name: xpu

defaults:
  run:
    shell: bash

jobs:
  check-bypass:
    name: Check bypass for XPU
    uses: ./.github/workflows/check-bypass.yml
    with:
      workflow-name: 'xpu'
    secrets:
      github-token: ${{ secrets.GITHUB_TOKEN }}

  build:
    name: Build
    needs: check-bypass
    if: ${{ github.repository_owner == 'PaddlePaddle' && needs.check-bypass.outputs.can-skip != 'true' }}
    env:
      TASK: paddle-CI-${{ github.event.pull_request.number }}-xpu_build
    runs-on:
      group: Kunlun-CPU

    steps:
      - name: Download paddle.tar.gz and update test branch
        run: |
          set -e
          echo "Downloading Paddle.tar.gz"
          wget -q --no-proxy https://paddle-github-action.bj.bcebos.com/PR/Paddle/${PR_ID}/${COMMIT_ID}/Paddle.tar.gz --no-check-certificate
          echo "Extracting Paddle.tar.gz"
          tar xf Paddle.tar.gz --strip-components=1
          rm Paddle.tar.gz
          git config --global user.name "PaddleCI"
          git config --global user.email "paddle_ci@example.com"
          git remote add upstream https://github.com/PaddlePaddle/Paddle.git
          source ${{ github.workspace }}/../../../proxy
          git config pull.rebase false
          git checkout test
          echo "Pull upstream develop or target branch"
          git pull upstream $BRANCH --no-edit

      - name: Check docker image and run container
        env:
          WITH_SHARED_PHI: "ON"
          WITH_XPU: "ON"
          COVERALLS_UPLOAD: "OFF"
          WITH_AVX: "OFF"
          GIT_PR_ID: ${{ github.event.pull_request.number }}
          PADDLE_VERSION: 0.0.0
          WITH_TESTING: "ON"
          WITH_DISTRIBUTE: "ON"
          PY_VERSION: "3.10"
          XPU_VISIBLE_DEVICES: "0,1"
          CUDA_VERSION:
          CUDNN_VERSION:
          WITH_XPU_BKCL: "ON"
          WITH_XPU_XRE5: "ON"
          CACHE_DIR: /root/.cache
          CCACHE_DIR: /root/.ccache
          CCACHE_MAXSIZE: 150G
          CCACHE_LIMIT_MULTIPLE: 0.8
          no_proxy: "bcebos.com,apiin.im.baidu.com,gitee.com,aliyun.com,.baidu.com,.tuna.tsinghua.edu.cn"
          GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          home_dir: ${{ github.workspace }}/../../../..
        run: |
          container_name=${TASK}-$(date +%Y%m%d-%H%M%S)
          echo "container_name=${container_name}" >> ${{ github.env }}
          docker run --privileged --ulimit nofile=102400:102400 -d -t --name ${container_name} \
            -v $home_dir/.cache:/root/.cache \
            -v $home_dir/.ccache:/root/.ccache \
            -v ${{ github.workspace }}/../../..:${{ github.workspace }}/../../.. \
            -v ${{ github.workspace }}:/paddle \
            -e BRANCH \
            -e PR_ID \
            -e COMMIT_ID \
            -e work_dir \
            -e PADDLE_ROOT \
            -e WITH_SHARED_PHI \
            -e WITH_XPU \
            -e COVERALLS_UPLOAD \
            -e GIT_PR_ID \
            -e PADDLE_VERSION \
            -e WITH_TESTING \
            -e WITH_DISTRIBUTE \
            -e PY_VERSION \
            -e XPU_VISIBLE_DEVICES \
            -e CUDA_VERSION \
            -e CUDNN_VERSION \
            -e WITH_XPU_BKCL \
            -e WITH_XPU_XRE5 \
            -e WITH_INFERENCE_API_TEST \
            -e CACHE_DIR \
            -e CCACHE_DIR \
            -e CCACHE_MAXSIZE \
            -e CCACHE_LIMIT_MULTIPLE \
            -e ci_scripts \
            -e WITH_AVX \
            -e no_proxy \
            -e GITHUB_API_TOKEN \
            -w /paddle --network host ${docker_image} /bin/bash

      - name: Run build
        env:
          work_dir: ${{ github.workspace }}
          PADDLE_ROOT: ${{ github.workspace }}
        run: |
          docker exec -t ${{ env.container_name }} /bin/bash -c '
            source ${{ github.workspace }}/../../../proxy
            ulimit -n 102400
            git config --global --add safe.directory ${work_dir}
            git submodule foreach "git config --global --add safe.directory \$toplevel/\$sm_path"
            bash -x ${ci_scripts}/run_setup.sh bdist_wheel
            EXCODE=$?
            exit $EXCODE
          '

      - name: Upload build.tar.gz and paddle_whl to bos
        env:
          AK: paddle
          SK: paddle
          home_path: ${{ github.workspace }}/..
          bos_file: ${{ github.workspace }}/../bos/BosClient.py
          paddle_whl: paddlepaddle_xpu-0.0.0-cp310-cp310-linux_x86_64.whl
        run: |
          if [ ! -f "${{ env.bos_file }}" ]; then
            wget -q --no-proxy -O ${{ env.home_path }}/bos_new.tar.gz https://xly-devops.bj.bcebos.com/home/bos_new.tar.gz --no-check-certificate
            mkdir ${{ env.home_path }}/bos
            tar xf ${{ env.home_path }}/bos_new.tar.gz -C ${{ env.home_path }}/bos
          fi
          cd ..
          tar --use-compress-program="pigz" -cpf build.tar.gz Paddle
          # source /home/opt/deck/1.0/etc/bashrc
          python3 ${bos_file} build.tar.gz paddle-github-action/PR/xpu/${{ env.PR_ID }}/${{ env.COMMIT_ID }}
          rm build.tar.gz
          cp ${{ github.workspace }}/dist/$paddle_whl .
          python3 ${bos_file} ${paddle_whl} paddle-github-action/PR/xpu/${{ env.PR_ID }}/${{ env.COMMIT_ID }}
          rm ${paddle_whl}

      - name: Terminate and delete the container
        if: always()
        run: |
          docker exec -t ${container_name} /bin/bash -c 'rm -rf * .[^.]*'
          docker stop ${container_name}
          docker rm ${container_name}

  test:
    name: Test
    needs: build
    env:
      TASK: paddle-CI-${{ github.event.pull_request.number }}-xpu_test
    runs-on:
      group: Kunlun

    steps:

      - name: Download build.tar.gz
        run: |
          sudo rm -rf * .[^.]*
          wget -q --no-proxy https://paddle-github-action.bj.bcebos.com/PR/xpu/${PR_ID}/${COMMIT_ID}/build.tar.gz --no-check-certificate
          # wget -q --no-proxy https://paddle-github-action.bj.bcebos.com/PR/xpu/72068/a58543d2f92bbe5c817e185e9cc24b549c919601/build.tar.gz --no-check-certificate
          tar --use-compress-program="pigz" -xpf build.tar.gz --strip-components=1
          rm build.tar.gz

      - name: Determine the runner
        run: |
          runner_name=`(echo $PWD|awk -F '/' '{print $3}')`
          echo $runner_name
          source ${ci_scripts_runner}/utils.sh
          determine_kunlun_runner ${runner_name}

      - name: Check docker image and run container
        env:
          WITH_XPU: "ON"
          COVERALLS_UPLOAD: "OFF"
          CMAKE_BUILD_TYPE: Release
          WITH_AVX: "OFF"
          GIT_PR_ID: ${{ github.event.pull_request.number }}
          PADDLE_VERSION: 0.0.0
          WITH_TESTING: "ON"
          WITH_DISTRIBUTE: "ON"
          PY_VERSION: "3.10"
          XPU_VISIBLE_DEVICES: ${{ env.CUDA_VISIBLE_DEVICES }}
          CUDA_VISIBLE_DEVICES: ${{ env.CUDA_VISIBLE_DEVICES }}
          CUDA_VERSION:
          CUDNN_VERSION:
          WITH_XPU_BKCL: "ON"
          CACHE_DIR: /root/.cache
          CCACHE_DIR: /root/.ccache
          CCACHE_MAXSIZE: 150G
          CCACHE_LIMIT_MULTIPLE: 0.8
          no_proxy: "bcebos.com,apiin.im.baidu.com,gitee.com,aliyun.com,.baidu.com,.tuna.tsinghua.edu.cn"
          GITHUB_API_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          home_dir: ${{ github.workspace }}/../../../..
          FLAGS_use_stride_kernel: "0"
        run: |
          container_name=${TASK}-$(date +%Y%m%d-%H%M%S)
          echo "container_name=${container_name}" >> ${{ github.env }}
          sudo mkdir -p /run/user/0
          docker run --cap-add=SYS_PTRACE --privileged --ulimit nofile=102400 --ulimit core=-1 --shm-size=32g -d -t --name ${container_name} \
            -v /ssd1/cibuild/.cache:/root/.cache \
            -v /ssd1/cibuild/.ccache:/root/.ccache \
            -v ${{ github.workspace }}/../../..:${{ github.workspace }}/../../.. \
            -v ${{ github.workspace }}:/paddle \
            --device ${XPU_CODE_1} \
            --device ${XPU_CODE_2} \
            --shm-size=32g \
            -e BRANCH \
            -e PR_ID \
            -e COMMIT_ID \
            -e work_dir \
            -e PADDLE_ROOT \
            -e WITH_XPU \
            -e COVERALLS_UPLOAD \
            -e CMAKE_BUILD_TYPE \
            -e GIT_PR_ID \
            -e PADDLE_VERSION \
            -e WITH_TESTING \
            -e WITH_DISTRIBUTE \
            -e PY_VERSION \
            -e XPU_VISIBLE_DEVICES \
            -e CUDA_VISIBLE_DEVICES \
            -e CUDA_VERSION \
            -e CUDNN_VERSION \
            -e WITH_XPU_BKCL \
            -e CACHE_DIR \
            -e CCACHE_DIR \
            -e WITH_INFERENCE_API_TEST \
            -e CCACHE_MAXSIZE \
            -e CCACHE_LIMIT_MULTIPLE \
            -e ci_scripts \
            -e WITH_AVX \
            -e no_proxy \
            -e GITHUB_API_TOKEN \
            -e FLAGS_use_stride_kernel \
            -w /paddle --network host ${docker_image} /bin/bash

      - name: Run test
        run: |
          sudo mkdir -p /run/user/0
          docker exec -t ${{ env.container_name }} /bin/bash -c '
            bash ${ci_scripts}/kunlun_test.sh
          '

      - name: Terminate and delete the container
        if: always()
        run: |
          sudo mkdir -p /run/user/0
          docker exec -t ${container_name} /bin/bash -c 'rm -rf * .[^.]*'
          docker stop ${container_name}
          docker rm ${container_name}
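The Build and Test jobs hand artifacts to each other through BOS and derive several names from shell one-liners. The Python sketch below mirrors a few of those expressions so their behavior is easy to check: the `build.tar.gz` URL the Test job downloads (the same `paddle-github-action/PR/xpu/<PR_ID>/<COMMIT_ID>` prefix the Build job uploads to), the `awk -F '/' '{print $3}'` runner-name extraction, the `${TASK}-$(date +%Y%m%d-%H%M%S)` container name, and which entries the `rm -rf * .[^.]*` cleanup removes. All helper names are hypothetical and do not exist in the Paddle CI scripts; the runner path in the demo is an assumed layout, not the real one.

```python
from datetime import datetime
from fnmatch import fnmatchcase

# Hypothetical helpers mirroring shell expressions in _Linux-XPU.yml.
BOS_HOST = "https://paddle-github-action.bj.bcebos.com"  # host used by the wget steps


def build_artifact_url(pr_id: str, commit_id: str, filename: str = "build.tar.gz") -> str:
    """URL fetched by the Test job; Build uploads to the matching BOS prefix."""
    return f"{BOS_HOST}/PR/xpu/{pr_id}/{commit_id}/{filename}"


def runner_name_from_pwd(pwd: str) -> str:
    """Equivalent of: echo $PWD | awk -F '/' '{print $3}'.
    The leading '/' makes awk's $1 empty, so $3 is the second path component."""
    parts = pwd.split("/")
    return parts[2] if len(parts) > 2 else ""


def container_name(task: str, now: datetime) -> str:
    """Equivalent of: ${TASK}-$(date +%Y%m%d-%H%M%S)."""
    return f"{task}-{now:%Y%m%d-%H%M%S}"


def removed_by_cleanup(name: str) -> bool:
    """Would `rm -rf * .[^.]*` (bash, dotglob off) remove this top-level entry?
    '*' skips names starting with '.'; '.[^.]*' matches dot-names whose
    second character is not '.', so '.', '..' and '..x' survive."""
    if not name.startswith("."):
        return True
    # fnmatch writes bracket negation as [!...] rather than [^...]
    return fnmatchcase(name, ".[!.]*")


if __name__ == "__main__":
    print(build_artifact_url("72068", "<COMMIT_ID>"))
    print(runner_name_from_pwd("/home/runner-01/actions-runner/_work/Paddle/Paddle"))  # assumed layout
    print(container_name("paddle-CI-72068-xpu_test", datetime(2025, 4, 21, 9, 5, 3)))
    for entry in ["python", ".git", ".", "..", "..cache"]:
        print(entry, removed_by_cleanup(entry))
```

The cleanup check shows why the pattern pairs `*` with `.[^.]*`: the first glob never expands to hidden files such as `.git`, and the second adds them while avoiding `.` and `..`.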

.github/workflows/re-run.yml (30 additions, 0 deletions)
@@ -118,3 +118,33 @@ jobs:
          OWNER: ${{ github.repository_owner }}
          REPO: ${{ github.event.repository.name }}
          JOB_NAME: 'PR-CI-Inference / Test'

      - name: Rerun XPU
        if: ${{ contains(env.comment_body, 'xpu') && !contains(env.comment_body, 'build') && !contains(env.comment_body, 'test') }}
        uses: ./.github/actions/rerun-workflow
        with:
          PR_ID: ${{ github.event.issue.number }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OWNER: ${{ github.repository_owner }}
          REPO: ${{ github.event.repository.name }}
          JOB_NAME: 'Linux-XPU / Check bypass for XPU / Check bypass'

      - name: Rerun XPU build
        if: ${{ contains(env.comment_body, 'xpu') && contains(env.comment_body, 'build') }}
        uses: ./.github/actions/rerun-workflow
        with:
          PR_ID: ${{ github.event.issue.number }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OWNER: ${{ github.repository_owner }}
          REPO: ${{ github.event.repository.name }}
          JOB_NAME: 'Linux-XPU / Build'

      - name: Rerun XPU test
        if: ${{ contains(env.comment_body, 'xpu') && contains(env.comment_body, 'test') }}
        uses: ./.github/actions/rerun-workflow
        with:
          PR_ID: ${{ github.event.issue.number }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OWNER: ${{ github.repository_owner }}
          REPO: ${{ github.event.repository.name }}
          JOB_NAME: 'Linux-XPU / Test'
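The three new steps decide which XPU job to re-run purely from substrings of the triggering comment. The Python sketch below reproduces those `if:` conditions, treating GitHub's `contains()` as a case-insensitive substring test, as the expressions documentation describes. The function names and sample comments are hypothetical; how `env.comment_body` is populated is defined elsewhere in `re-run.yml` and is not part of this hunk.

```python
# JOB_NAME values copied from the re-run.yml hunk.
BYPASS_JOB = "Linux-XPU / Check bypass for XPU / Check bypass"
BUILD_JOB = "Linux-XPU / Build"
TEST_JOB = "Linux-XPU / Test"


def contains(search: str, item: str) -> bool:
    """GitHub Actions contains() on strings: case-insensitive substring test."""
    return item.lower() in search.lower()


def xpu_rerun_targets(comment_body: str) -> list[str]:
    """Return the JOB_NAME of every 'Rerun XPU*' step whose if: condition holds."""
    if not contains(comment_body, "xpu"):
        return []
    has_build = contains(comment_body, "build")
    has_test = contains(comment_body, "test")
    targets = []
    if not has_build and not has_test:  # 'Rerun XPU' step
        targets.append(BYPASS_JOB)
    if has_build:  # 'Rerun XPU build' step
        targets.append(BUILD_JOB)
    if has_test:  # 'Rerun XPU test' step
        targets.append(TEST_JOB)
    return targets


if __name__ == "__main__":
    for body in ["rerun xpu", "rerun XPU build", "rerun xpu test", "rerun xpu build test"]:
        print(repr(body), "->", xpu_rerun_targets(body))
```

Because the checks are plain substring matches, a comment mentioning both "build" and "test" re-runs both jobs, and any word that merely contains "test" or "build" (for example "latest" or "rebuild") also counts as a match.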