---
title: Environment setup
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---


### Python Environment Setup

Before building ExecuTorch, it is highly recommended to create an isolated Python environment.
This prevents dependency conflicts with your system Python installation and ensures a clean build environment.

```bash
cd $WORKSPACE
python3 -m venv pyenv
source pyenv/bin/activate
```
All subsequent steps should be executed within this Python virtual environment.
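
To confirm that the virtual environment is active, you can run a quick check with Python's standard library (a minimal sketch; it only reports whether the current interpreter comes from a venv):

```python
import sys

# Inside a venv, sys.prefix points at the environment directory while
# sys.base_prefix still points at the system Python installation.
in_venv = sys.prefix != sys.base_prefix
print("virtual environment active" if in_venv else "WARNING: system Python in use")
```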

### Download the ExecuTorch Source Code

Clone the ExecuTorch repository from GitHub. The following command checks out the stable v1.0.0 release and ensures all required submodules are fetched.

```bash
cd $WORKSPACE
git clone -b v1.0.0 --recurse-submodules https://github.com/pytorch/executorch.git
```

> **Note:**
> The instructions in this guide are based on **ExecuTorch v1.0.0**.
> Commands or configuration options may differ in later releases.
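
You can confirm that the expected tag is checked out and that submodules were fetched (a quick sanity check; paths assume the clone above):

```shell
cd $WORKSPACE/executorch
# Should print v1.0.0
git describe --tags
# Submodules that are still uninitialized are prefixed with '-' in this output
git submodule status | head
```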

### Build and Install the ExecuTorch Python Components

Next, build the Python bindings and install them into your environment. The following command uses the provided installation script to configure, compile, and install ExecuTorch with developer tools enabled.

```bash
cd $WORKSPACE/executorch
CMAKE_ARGS="-DEXECUTORCH_BUILD_DEVTOOLS=ON" ./install_executorch.sh
```

This will build ExecuTorch and its dependencies using CMake, enabling optional developer utilities such as ETDump and Inspector.

After installation completes successfully, you can verify the environment by running:

```bash
python -c "import executorch; print('ExecuTorch installed successfully.')"
```
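
If you also want to confirm that the developer tools were included, a hedged check (assuming the `executorch.devtools` module name used by v1.0.0) is:

```python
import importlib.util

# find_spec returns None when the (sub)module is not importable.
have_et = importlib.util.find_spec("executorch") is not None
have_devtools = have_et and importlib.util.find_spec("executorch.devtools") is not None
print("devtools available" if have_devtools else "devtools not found")
```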

---
title: Cross-Compile ExecuTorch for the AArch64 platform
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---


This section describes how to cross-compile ExecuTorch for an AArch64 target platform with XNNPACK and KleidiAI support enabled.
All commands below are intended to be executed on an x86-64 Linux host with an appropriate cross-compilation toolchain installed (for example, `aarch64-linux-gnu-gcc`).


### Run CMake Configuration

Use CMake to configure the ExecuTorch build for AArch64. The example below enables key extensions, developer tools, and XNNPACK with KleidiAI acceleration:

```bash
cd $WORKSPACE
mkdir -p build-arm64
cd build-arm64

cmake -GNinja \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_SYSTEM_NAME=Linux \
  -DCMAKE_SYSTEM_PROCESSOR=aarch64 \
  -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
  -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
  -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
  -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=BOTH \
  -DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=ONLY \
  -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
  -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_EXTENSION_NAMED_DATA_MAP=ON \
  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
  -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
  -DEXECUTORCH_BUILD_XNNPACK=ON \
  -DEXECUTORCH_BUILD_DEVTOOLS=ON \
  -DEXECUTORCH_ENABLE_EVENT_TRACER=ON \
  -DEXECUTORCH_ENABLE_LOGGING=ON \
  -DEXECUTORCH_LOG_LEVEL=debug \
  -DEXECUTORCH_XNNPACK_ENABLE_KLEIDI=ON \
  ../executorch
```

#### Key Build Options

| **CMake Option** | **Description** |
| ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `EXECUTORCH_BUILD_XNNPACK` | Builds the **XNNPACK backend**, which provides highly optimized CPU operators (GEMM, convolution, etc.) for Arm64 platforms. |
| `EXECUTORCH_XNNPACK_ENABLE_KLEIDI` | Enables **Arm KleidiAI** acceleration for XNNPACK kernels, providing further performance improvements on Armv8.2+ CPUs. |
| `EXECUTORCH_BUILD_DEVTOOLS` | Builds **developer tools** such as the ExecuTorch Inspector and diagnostic utilities for profiling and debugging. |
| `EXECUTORCH_BUILD_EXTENSION_MODULE` | Builds the **Module API** extension, which provides a high-level abstraction for model loading and execution using `Module` objects. |
| `EXECUTORCH_BUILD_EXTENSION_TENSOR` | Builds the **Tensor API** extension, providing convenience functions for creating, manipulating, and managing tensors in C++ runtime. |
| `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | Enables building **optimized kernel implementations** for better performance on supported architectures. |
| `EXECUTORCH_ENABLE_EVENT_TRACER` | Enables the **event tracing** feature, which records performance and operator timing information for runtime analysis. |



### Build ExecuTorch

```bash
cmake --build . -j$(nproc)
```

If the build completes successfully, you should find the `executor_runner` binary at:

```bash
build-arm64/executor_runner
```
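
You can verify that the binary was built for the right architecture with `file`; it should report an AArch64 ELF executable:

```shell
# A native x86-64 binary here would indicate the cross-compiler flags were not picked up
file build-arm64/executor_runner
```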

This binary can be used to run ExecuTorch models on the AArch64 target device using the XNNPACK backend with KleidiAI acceleration.
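
As a sketch of the final step, copy the binary and an exported model to the target and run it there (`model.pte` is a placeholder for your own exported model file):

```shell
# On the AArch64 target device
./executor_runner --model_path model.pte
```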

---
title: KleidiAI micro-kernels support in ExecuTorch
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---
ExecuTorch uses XNNPACK as its primary CPU backend for operator execution and performance optimization.

Within this architecture, only a subset of KleidiAI SME (Scalable Matrix Extension) micro-kernels has been integrated into XNNPACK to provide additional acceleration on supported Arm platforms.

These specialized micro-kernels are designed to accelerate operators with specific data types and quantization configurations in ExecuTorch models.

When an operator matches one of the supported configurations, ExecuTorch automatically dispatches it through the KleidiAI-optimized path.

Operators that are not covered by KleidiAI fall back to the standard XNNPACK implementations during inference, ensuring functional correctness across all models.

In ExecuTorch v1.0.0, the following operator types are implemented through the XNNPACK backend and can potentially benefit from KleidiAI acceleration:
- XNNFullyConnected – Fully connected (dense) layers
- XNNConv2d – Standard 2D convolution layers
- XNNBatchMatrixMultiply – Batched matrix multiplication operations

However, not all instances of these operators are accelerated by KleidiAI.

Acceleration eligibility depends on several operator attributes and backend support, including:
- Data types (e.g., float32, int8, int4)
- Quantization schemes (e.g., symmetric/asymmetric, per-tensor/per-channel)
- Tensor memory layout and alignment
- Kernel dimensions and stride settings

The following section provides detailed information on which operator configurations can benefit from KleidiAI acceleration, along with their corresponding data type and quantization support.
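
The dispatch behavior described above can be sketched as a simple lookup (illustrative only; the real selection logic lives inside XNNPACK and also checks attributes such as layout, alignment, and strides):

```python
# Hypothetical sketch of KleidiAI dispatch eligibility for fully connected
# layers, keyed on (activation dtype, weight dtype). Variant names follow
# the tables in this section; any unmatched configuration falls back to the
# standard XNNPACK kernels.
KLEIDIAI_FC_VARIANTS = {
    ("fp16", "fp16"): "pf16_gemm",
    ("fp32", "fp32"): "pf32_gemm",
    ("int8_per_row_asym", "int8_per_channel_sym"): "qp8_f32_qc8w_gemm",
}

def select_gemm(act_dtype: str, weight_dtype: str) -> str:
    """Return the KleidiAI-backed variant if eligible, else the fallback."""
    return KLEIDIAI_FC_VARIANTS.get((act_dtype, weight_dtype), "xnnpack_default")

print(select_gemm("fp32", "fp32"))        # pf32_gemm
print(select_gemm("fp32", "int16_sym"))   # xnnpack_default
```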


### XNNFullyConnected

| XNNPACK GEMM Variant | Activations DataType| Weights DataType | Output DataType |
| ------------------ | ---------------------------- | --------------------------------------- | ---------------------------- |
| pf16_gemm | FP16 | FP16 | FP16 |
| pf32_gemm | FP32 | FP32 | FP32 |
| qp8_f32_qc8w_gemm | Asymmetric INT8 per-row quantization | Per-channel symmetric INT8 quantization | FP32 |
| pqs8_qc8w_gemm | Asymmetric INT8 quantization | Per-channel symmetric INT8 quantization | Asymmetric INT8 quantization |
| qp8_f32_qb4w_gemm | Asymmetric INT8 per-row quantization | INT4 (signed), shared blockwise quantization | FP32 |


### XNNConv2d
| XNNPACK GEMM Variant | Input DataType| Filter DataType | Output DataType |
| ------------------ | ---------------------------- | --------------------------------------- | ---------------------------- |
| pf32_gemm | FP32 | FP32, pointwise (1×1) | FP32 |
| pqs8_qc8w_gemm | Asymmetric INT8 quantization (NHWC) | Per-channel or per-tensor symmetric INT8 quantization | Asymmetric INT8 quantization(NHWC) |


### XNNBatchMatrixMultiply
| XNNPACK GEMM Variant | Input A DataType| Input B DataType |Output DataType |
| ------------------ | ---------------------------- | --------------------------------------- |--------------------------------------- |
| pf32_gemm | FP32 | FP32 | FP32 |
| pf16_gemm | FP16 | FP16 | FP16 |


