Skip to content

Conversation

chengjunlu
Copy link
Contributor

Add a new stage to generate zebin for XPU.

@chengjunlu chengjunlu changed the title Add a new stage to generate zebin to align CUDA stages in triton.compile [Draft] Add a new stage to generate zebin to align CUDA stages in triton.compile Sep 25, 2025
@chengjunlu chengjunlu force-pushed the chengjun/add_zebin_stage branch from 4cba65d to a943a26 Compare September 25, 2025 06:00
stages["ttgir"] = lambda src, metadata: self.gluon_to_ttgir(src, metadata, options)
stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options)
stages["spv"] = lambda src, metadata: self.make_spv(src, metadata, options, self.device_arch)
stages["zebin"] = lambda src, metadata: self.make_zebin(src, metadata, options, self.device_arch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't make this step mandatory yet (due to #5153 (comment)), but if we make it optional using options.generate_native_code, we can do a good refactoring right now.

Signed-off-by: Lu,Chengjun <chengjun.lu@intel.com>
@etiotto etiotto requested a review from Copilot September 25, 2025 14:58
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a new "zebin" compilation stage for XPU backend to align with CUDA compilation stages in triton.compile. The change introduces zebin as a binary format alternative to SPIRV for Intel XPU targets.

  • Adds make_zebin method to generate zebin binary format from SPIRV input
  • Updates binary extension from "spv" to "zebin" for XPU backend
  • Modifies compilation pipeline to handle zebin as a binary format alongside cubin and hsaco

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
third_party/intel/backend/compiler.py Adds zebin compilation stage and updates binary extension
python/triton/compiler/compiler.py Updates file parsing and compilation pipeline to support zebin format

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +399 to +406
# assert options.generate_native_code, "Only support native code generation with shader dump"
shader_dump_opt = f" -igc_opts ',DumpToCustomDir={metadata['cache_dir']},ShaderDumpEnable=1'"

metadata["generate_native_code"] = options.generate_native_code

if options.generate_native_code:
with tempfile.TemporaryDirectory() as temp_dir:
with tempfile.NamedTemporaryFile(mode='wb', suffix='.spv', dir=temp_dir, delete=False) as fsrc:
fsrc.write(spirv)
fbin = fsrc.name + '.o'

ocloc_cmd = [
'ocloc', 'compile', '-file', fsrc.name, '-o', fbin, '-spirv_input', '-device', device_arch,
'-options', metadata["build_flags"] + shader_dump_opt
]

try:
output = subprocess.check_output(ocloc_cmd, stderr=subprocess.STDOUT, text=True)
if 'spilled' in output and metadata["build_flags"].find("-cl-intel-256-GRF-per-thread") == -1:
"""
The exact message is something like:
warning: kernel matmul_kernel compiled SIMD16 allocated 128 regs and spilled around 217
is "spilled" enough for now?
"""
metadata["build_flags"] += " -cl-intel-256-GRF-per-thread"
# re-run with new build flags
ocloc_cmd[-1] = metadata["build_flags"] + shader_dump_opt
subprocess.check_output(ocloc_cmd, stderr=subprocess.STDOUT, text=True)
except subprocess.CalledProcessError as e:
if e.returncode == 255:
error = 'Internal Triton ZEBIN codegen error'
elif e.returncode == 128 + signal.SIGSEGV:
error = '`ocloc` raised SIGSEGV'
else:
error = f'`ocloc` failed with error code {e.returncode}'

raise RuntimeError(f'{error}\n'
f'`ocloc` stderr:\n{e.output}\n'
f'Repro command: {ocloc_cmd}\n') from e

with open(fbin, 'rb') as f:
zebin = f.read()
return zebin
return spirv
# metadata["generate_native_code"] = options.generate_native_code

# if options.generate_native_code:
with tempfile.TemporaryDirectory() as temp_dir:
with tempfile.NamedTemporaryFile(mode='wb', suffix='.spv', dir=temp_dir, delete=False) as fsrc:
Copy link
Preview

Copilot AI Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These commented-out lines should be removed rather than left as commented code. If this logic is no longer needed, clean it up completely.

Copilot uses AI. Check for mistakes.


with open(fbin, 'rb') as f:
zebin = f.read()
return zebin
Copy link
Preview

Copilot AI Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unreachable return spirv statement on line 442 was removed, but this creates inconsistent indentation. The return statement should be properly aligned with the method body.

Copilot uses AI. Check for mistakes.

Comment on lines 483 to +484
self.module, self.function, self.n_regs, self.n_spills, self.n_max_threads = driver.active.utils.load_binary(
self.name, self.kernel, self.metadata.shared, self.metadata.build_flags,
not self.metadata.generate_native_code, device)
self.name, self.kernel, self.metadata.shared, self.metadata.build_flags, False, device)
Copy link
Preview

Copilot AI Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded False value replaces not self.metadata.generate_native_code. This magic boolean should be documented or use a named constant to clarify its purpose.

Copilot uses AI. Check for mistakes.

Copy link
Contributor

@etiotto etiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using ocloc to generate the native binary, can we use L0 to generate it ?

How about trying: https://oneapi-src.github.io/level-zero-spec/level-zero/latest/core/PROG.html#module-caching-with-native-binaries

self.device_arch = knobs.intel.device_arch or mod.parse_device_arch(target.arch.get('architecture', 0))
self.properties = self.parse_target(target.arch)
self.binary_ext = "spv"
self.binary_ext = "zebin"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we changing from spv to zebin, we want to generate a spv file still, will this affect the generation of the SPIRV binary ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SPIRV file is still generated and saved in the Triton cache director.
This is an extra step to generate the zebin to the triton.compile stages.

# The IGC (Intel Graphic Compiler) only parses the options at first time in JIT-ing the binary per process.
# Have to use the `ocloc` to generate the binary in sub-process to work around the limitation.
assert options.generate_native_code, "Only support native code generation with shader dump"
# assert options.generate_native_code, "Only support native code generation with shader dump"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove commented out code

@etiotto etiotto marked this pull request as draft October 9, 2025 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Pytorch upstream] Feature request: Save SPIR-V Build flag to CompiledKernel metadata for Inductor.
3 participants