[Draft] Add a new stage to generate `zebin` to align CUDA stages in triton.compile #5189

chengjunlu · 2025-09-25T05:47:12Z

Add a new stage to generate zebin for XPU.

anmyachev · 2025-09-25T13:49:38Z

third_party/intel/backend/compiler.py

            stages["ttgir"] = lambda src, metadata: self.gluon_to_ttgir(src, metadata, options)
        stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options)
        stages["spv"] = lambda src, metadata: self.make_spv(src, metadata, options, self.device_arch)
+        stages["zebin"] = lambda src, metadata: self.make_zebin(src, metadata, options, self.device_arch)


We can't make this step mandatory yet (due to #5153 (comment)), but if we make it optional using options.generate_native_code, we can do a good refactoring right now.

Signed-off-by: Lu,Chengjun <chengjun.lu@intel.com>

Copilot

Pull Request Overview

This PR adds a new "zebin" compilation stage for XPU backend to align with CUDA compilation stages in triton.compile. The change introduces zebin as a binary format alternative to SPIRV for Intel XPU targets.

Adds make_zebin method to generate zebin binary format from SPIRV input
Updates binary extension from "spv" to "zebin" for XPU backend
Modifies compilation pipeline to handle zebin as a binary format alongside cubin and hsaco

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
third_party/intel/backend/compiler.py	Adds zebin compilation stage and updates binary extension
python/triton/compiler/compiler.py	Updates file parsing and compilation pipeline to support zebin format

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-09-25T14:59:33Z

third_party/intel/backend/compiler.py

+            # assert options.generate_native_code, "Only support native code generation with shader dump"
            shader_dump_opt = f" -igc_opts ',DumpToCustomDir={metadata['cache_dir']},ShaderDumpEnable=1'"

-        metadata["generate_native_code"] = options.generate_native_code
-
-        if options.generate_native_code:
-            with tempfile.TemporaryDirectory() as temp_dir:
-                with tempfile.NamedTemporaryFile(mode='wb', suffix='.spv', dir=temp_dir, delete=False) as fsrc:
-                    fsrc.write(spirv)
-                fbin = fsrc.name + '.o'
-
-                ocloc_cmd = [
-                    'ocloc', 'compile', '-file', fsrc.name, '-o', fbin, '-spirv_input', '-device', device_arch,
-                    '-options', metadata["build_flags"] + shader_dump_opt
-                ]
-
-                try:
-                    output = subprocess.check_output(ocloc_cmd, stderr=subprocess.STDOUT, text=True)
-                    if 'spilled' in output and metadata["build_flags"].find("-cl-intel-256-GRF-per-thread") == -1:
-                        """
-                        The exact message is something like:
-                            warning: kernel matmul_kernel  compiled SIMD16 allocated 128 regs and spilled around 217
-                        is "spilled" enough for now?
-                        """
-                        metadata["build_flags"] += " -cl-intel-256-GRF-per-thread"
-                        # re-run with new build flags
-                        ocloc_cmd[-1] = metadata["build_flags"] + shader_dump_opt
-                        subprocess.check_output(ocloc_cmd, stderr=subprocess.STDOUT, text=True)
-                except subprocess.CalledProcessError as e:
-                    if e.returncode == 255:
-                        error = 'Internal Triton ZEBIN codegen error'
-                    elif e.returncode == 128 + signal.SIGSEGV:
-                        error = '`ocloc` raised SIGSEGV'
-                    else:
-                        error = f'`ocloc` failed with error code {e.returncode}'
-
-                    raise RuntimeError(f'{error}\n'
-                                       f'`ocloc` stderr:\n{e.output}\n'
-                                       f'Repro command: {ocloc_cmd}\n') from e
-
-                with open(fbin, 'rb') as f:
-                    zebin = f.read()
-            return zebin
-        return spirv
+        # metadata["generate_native_code"] = options.generate_native_code
+
+        # if options.generate_native_code:
+        with tempfile.TemporaryDirectory() as temp_dir:
+            with tempfile.NamedTemporaryFile(mode='wb', suffix='.spv', dir=temp_dir, delete=False) as fsrc:


These commented-out lines should be removed rather than left as commented code. If this logic is no longer needed, clean it up completely.

Copilot · 2025-09-25T14:59:33Z

third_party/intel/backend/compiler.py

+
+            with open(fbin, 'rb') as f:
+                zebin = f.read()
+        return zebin


The unreachable return spirv statement on line 442 was removed, but this creates inconsistent indentation. The return statement should be properly aligned with the method body.

Copilot · 2025-09-25T14:59:34Z

python/triton/compiler/compiler.py

        self.module, self.function, self.n_regs, self.n_spills, self.n_max_threads = driver.active.utils.load_binary(
-            self.name, self.kernel, self.metadata.shared, self.metadata.build_flags,
-            not self.metadata.generate_native_code, device)
+            self.name, self.kernel, self.metadata.shared, self.metadata.build_flags, False, device)


The hardcoded False value replaces not self.metadata.generate_native_code. This magic boolean should be documented or use a named constant to clarify its purpose.

etiotto

Instead of using ocloc to generate the native binary, can we use L0 to generate it ?

How about trying: https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/PROG.html#module-caching-with-native-binaries

etiotto · 2025-09-25T15:01:12Z

third_party/intel/backend/compiler.py

        self.device_arch = knobs.intel.device_arch or mod.parse_device_arch(target.arch.get('architecture', 0))
        self.properties = self.parse_target(target.arch)
-        self.binary_ext = "spv"
+        self.binary_ext = "zebin"


Why are we changing from spv to zebin, we want to generate a spv file still, will this affect the generation of the SPIRV binary ?

The SPIRV file is still generated and saved in the Triton cache director.
This is an extra step to generate the zebin to the triton.compile stages.

etiotto · 2025-09-25T15:06:12Z

third_party/intel/backend/compiler.py

            # The IGC (Intel Graphic Compiler) only parses the options at first time in JIT-ing the binary per process.
            # Have to use the `ocloc` to generate the binary in sub-process to work around the limitation.
-            assert options.generate_native_code, "Only support native code generation with shader dump"
+            # assert options.generate_native_code, "Only support native code generation with shader dump"


remove commented out code

chengjunlu requested review from anmyachev and whitneywhtsang September 25, 2025 05:47

chengjunlu changed the title ~~Add a new stage to generate zebin to align CUDA stages in triton.compile~~ [Draft] Add a new stage to generate zebin to align CUDA stages in triton.compile Sep 25, 2025

chengjunlu linked an issue Sep 25, 2025 that may be closed by this pull request

[Pytorch upstream] Feature request: Save SPIR-V Build flag to CompiledKernel metadata for Inductor. #5153

Open

chengjunlu force-pushed the chengjun/add_zebin_stage branch from 4cba65d to a943a26 Compare September 25, 2025 06:00

anmyachev reviewed Sep 25, 2025

View reviewed changes

Add a new stage to generate zebin for XPU.

a943a26

Signed-off-by: Lu,Chengjun <chengjun.lu@intel.com>

etiotto requested a review from Copilot September 25, 2025 14:58

Copilot AI reviewed Sep 25, 2025

View reviewed changes

etiotto reviewed Sep 25, 2025

View reviewed changes

etiotto marked this pull request as draft October 9, 2025 14:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Draft] Add a new stage to generate `zebin` to align CUDA stages in triton.compile #5189

[Draft] Add a new stage to generate `zebin` to align CUDA stages in triton.compile #5189

chengjunlu commented Sep 25, 2025

Uh oh!

anmyachev Sep 25, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Sep 25, 2025

Uh oh!

Copilot AI Sep 25, 2025

Uh oh!

Copilot AI Sep 25, 2025

Uh oh!

etiotto left a comment

Uh oh!

etiotto Sep 25, 2025

Uh oh!

chengjunlu Sep 26, 2025

Uh oh!

etiotto Sep 25, 2025

Uh oh!

Uh oh!

[Draft] Add a new stage to generate zebin to align CUDA stages in triton.compile #5189

Are you sure you want to change the base?

[Draft] Add a new stage to generate zebin to align CUDA stages in triton.compile #5189

Conversation

chengjunlu commented Sep 25, 2025

Uh oh!

anmyachev Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

etiotto left a comment

Choose a reason for hiding this comment

Uh oh!

etiotto Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

chengjunlu Sep 26, 2025

Choose a reason for hiding this comment

Uh oh!

etiotto Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

[Draft] Add a new stage to generate `zebin` to align CUDA stages in triton.compile #5189

[Draft] Add a new stage to generate `zebin` to align CUDA stages in triton.compile #5189