Description
By default, the Rust compiler targets the x86-64-v1 feature level, which does not include AVX instructions. cg_gcc, however, emits assembly files containing those instructions.
use std::hint::black_box;

#[unsafe(no_mangle)]
fn reverse_u8() {
    // odd length and offset by 1 to be as unaligned as possible
    let mut v: Vec<_> = (0..1 + (0xFFFFF / size_of::<u8>() as u64))
        .map(|x| x as u8)
        .collect();
    black_box(black_box(&mut v[1..]).reverse());
}
Comparing the assembly produced by GCC and LLVM for this function shows that GCC uses AVX (VEX-encoded) instructions, while LLVM does not.
; Sample of GCC code
mov r11d, 4
vpunpcklqdq xmm9, xmm6, xmm6
vmovdqa xmm14, XMMWORD PTR .LC8[rip]
vpunpcklqdq xmm5, xmm5, xmm5
vmovq xmm6, rsi
sal rdi, 4
lea rax, -16[r9+rsi]
vmovdqa XMMWORD PTR 80[rsp], xmm5
vmovq xmm5, r11
mov r11d, 6
vpunpcklqdq xmm6, xmm6, xmm6
vpunpcklqdq xmm5, xmm5, xmm5
mov r10, rax
vmovdqa XMMWORD PTR 64[rsp], xmm6
vmovdqa xmm15, XMMWORD PTR .LC0[rip]
vmovdqa XMMWORD PTR 96[rsp], xmm5
; Sample of LLVM code
movdqa xmm0, xmmword ptr [rip + .LCPI0_0]
mov ecx, 112
movdqa xmm1, xmmword ptr [rip + .LCPI0_1]
movdqa xmm2, xmmword ptr [rip + .LCPI0_2]
movdqa xmm3, xmmword ptr [rip + .LCPI0_3]
movdqa xmm4, xmmword ptr [rip + .LCPI0_4]
movdqa xmm5, xmmword ptr [rip + .LCPI0_5]
movdqa xmm6, xmmword ptr [rip + .LCPI0_6]
movdqa xmm7, xmmword ptr [rip + .LCPI0_7]
movdqa xmm8, xmmword ptr [rip + .LCPI0_8]
LLVM-based rustc will only emit AVX instructions when they are explicitly enabled (for example with -Ctarget-cpu=x86-64-v3).
; LLVM code compiled with AVX enabled
je .LBB0_18
mov ecx, 480
vmovdqa ymm0, ymmword ptr [rip + .LCPI0_0]
vmovdqa ymm1, ymmword ptr [rip + .LCPI0_1]
vmovdqa ymm2, ymmword ptr [rip + .LCPI0_2]
vmovdqa ymm3, ymmword ptr [rip + .LCPI0_3]
vmovdqa ymm4, ymmword ptr [rip + .LCPI0_4]
vmovaps ymm5, ymmword ptr [rip + .LCPI0_5]
vmovaps ymm6, ymmword ptr [rip + .LCPI0_6]
vmovaps ymm7, ymmword ptr [rip + .LCPI0_7]
cg_gcc should mimic this behaviour. Currently, it seems that cg_gcc simply inherits the target features of the host processor, which is incorrect: a processor with AVX should still build code that does not depend on AVX.
Moreover, -Ctarget-cpu does not currently seem to be respected. Despite explicitly setting -Ctarget-cpu=nehalem, I can see that cg_gcc still uses AVX instructions, which that CPU does not support.
https://godbolt.org/z/7s6Gss8bE
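One way to see which target features the rustc front end considers enabled for a given set of flags is to print them from a small program. The sketch below is only illustrative: the function name compile_time_features is made up for this example, and cfg!(target_feature = ...) reflects what the front end believes is enabled, which may differ from what a backend actually emits (that mismatch is what this issue describes).

```rust
/// Hypothetical helper: compile-time view of a few x86 target features,
/// as reported by the rustc front end via `cfg!`.
fn compile_time_features() -> [(&'static str, bool); 3] {
    [
        ("sse2", cfg!(target_feature = "sse2")),
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
    ]
}

fn main() {
    // Each flag is fixed at compile time; rebuilding with a different
    // -Ctarget-cpu or -Ctarget-feature changes what is printed.
    for (name, enabled) in compile_time_features() {
        println!("{name}: {enabled}");
    }
}
```

With a default x86-64 build, avx would be expected to print as false, and building with -Ctarget-cpu=x86-64-v3 should flip it to true. If the emitted assembly contains AVX instructions while this reports avx as disabled, the backend is not following the requested target.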
Emitting AVX instructions regardless of the requested target will cause code compiled by cg_gcc to crash on CPUs that lack the build host's features.
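To illustrate the failure mode, the following sketch checks at run time whether the CPU supports AVX and compares that with what the binary was (nominally) compiled for. The helper names host_has_avx and safe_to_run are hypothetical, and the check is only a diagnostic aid: if a backend emits AVX instructions unconditionally, code that runs before the check may already fault.

```rust
/// Reports whether the CPU running this program supports AVX.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn host_has_avx() -> bool {
    std::arch::is_x86_feature_detected!("avx")
}

/// Fallback for non-x86 targets, where AVX does not exist.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn host_has_avx() -> bool {
    false
}

/// Hypothetical helper: a binary is safe to run if it does not rely on AVX,
/// or if the CPU provides it.
fn safe_to_run(binary_uses_avx: bool, cpu_has_avx: bool) -> bool {
    !binary_uses_avx || cpu_has_avx
}

fn main() {
    // Assumption: the compile-time flag matches what the backend emitted.
    // The issue above describes a case where it does not.
    let binary_uses_avx = cfg!(target_feature = "avx");
    let cpu_has_avx = host_has_avx();

    if !safe_to_run(binary_uses_avx, cpu_has_avx) {
        eprintln!("this binary expects AVX, but the CPU does not report it");
        std::process::exit(1);
    }
    println!("CPU reports AVX support: {cpu_has_avx}");
}
```

On a CPU such as Nehalem, host_has_avx would return false, so a binary containing unconditional AVX instructions would be expected to fail with an illegal-instruction fault rather than reach a check like this.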