cg_gcc does not respect -Ctarget-cpu, and uses the host feature set by default.  #752

@FractalFir

By default, the Rust compiler uses the x86-64-v1 feature level, which does not include the AVX instructions.

cg_gcc, however, creates assembly files containing those instructions.

Example.

use std::hint::black_box;
#[unsafe(no_mangle)]
fn reverse_u8() {
    // odd length and offset by 1 to be as unaligned as possible
    let mut v: Vec<_> = (0..1 + (0xFFFFF / size_of::<u8>() as u64))
        .map(|x| x as u8)
        .collect();
    black_box(black_box(&mut v[1..]).reverse());
}

As you can see by analysing the assembly produced by GCC and LLVM, GCC uses AVX instructions (the VEX-encoded, v-prefixed forms such as vmovdqa and vpunpcklqdq), while LLVM does not.

; Sample of GCC code
        mov     r11d, 4
        vpunpcklqdq     xmm9, xmm6, xmm6
        vmovdqa xmm14, XMMWORD PTR .LC8[rip]
        vpunpcklqdq     xmm5, xmm5, xmm5
        vmovq   xmm6, rsi
        sal     rdi, 4
        lea     rax, -16[r9+rsi]
        vmovdqa XMMWORD PTR 80[rsp], xmm5
        vmovq   xmm5, r11
        mov     r11d, 6
        vpunpcklqdq     xmm6, xmm6, xmm6
        vpunpcklqdq     xmm5, xmm5, xmm5
        mov     r10, rax
        vmovdqa XMMWORD PTR 64[rsp], xmm6
        vmovdqa xmm15, XMMWORD PTR .LC0[rip]
        vmovdqa XMMWORD PTR 96[rsp], xmm5

; Sample of LLVM code
        movdqa  xmm0, xmmword ptr [rip + .LCPI0_0]
        mov     ecx, 112
        movdqa  xmm1, xmmword ptr [rip + .LCPI0_1]
        movdqa  xmm2, xmmword ptr [rip + .LCPI0_2]
        movdqa  xmm3, xmmword ptr [rip + .LCPI0_3]
        movdqa  xmm4, xmmword ptr [rip + .LCPI0_4]
        movdqa  xmm5, xmmword ptr [rip + .LCPI0_5]
        movdqa  xmm6, xmmword ptr [rip + .LCPI0_6]
        movdqa  xmm7, xmmword ptr [rip + .LCPI0_7]
        movdqa  xmm8, xmmword ptr [rip + .LCPI0_8]

LLVM-based rustc will only emit AVX instructions when they are explicitly enabled (for example, with -Ctarget-cpu=x86-64-v3).

; LLVM code compiled with AVX enabled 
        je      .LBB0_18
        mov     ecx, 480
        vmovdqa ymm0, ymmword ptr [rip + .LCPI0_0]
        vmovdqa ymm1, ymmword ptr [rip + .LCPI0_1]
        vmovdqa ymm2, ymmword ptr [rip + .LCPI0_2]
        vmovdqa ymm3, ymmword ptr [rip + .LCPI0_3]
        vmovdqa ymm4, ymmword ptr [rip + .LCPI0_4]
        vmovaps ymm5, ymmword ptr [rip + .LCPI0_5]
        vmovaps ymm6, ymmword ptr [rip + .LCPI0_6]
        vmovaps ymm7, ymmword ptr [rip + .LCPI0_7]
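
To check which features rustc itself assumes for a given set of flags, one option is to query `cfg!(target_feature = ...)` from inside a program. The sketch below is only illustrative: the helper name `assumed_features` is made up for this example, and the values reflect what the compiler front end reports, which (as this issue shows) may not match what the backend actually emits.

```rust
/// Compile-time view of a few x86 target features, as reported by rustc.
/// `assumed_features` is a hypothetical helper, not part of any library.
fn assumed_features() -> [(&'static str, bool); 3] {
    [
        ("sse2", cfg!(target_feature = "sse2")),
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")),
    ]
}

fn main() {
    // Build once without flags and once with -Ctarget-cpu=x86-64-v3
    // and compare the two reports.
    for (name, enabled) in assumed_features() {
        println!("{name}: {}", if enabled { "assumed" } else { "not assumed" });
    }
}
```

The same information is available without compiling a program through `rustc --print cfg`, optionally combined with `-Ctarget-cpu=...`.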

cg_gcc should mimic this behaviour. Currently, it seems that cg_gcc simply inherits the target features of the host processor, which is incorrect: a build running on a processor with AVX should still, by default, produce code that does not depend on AVX.

Moreover, -Ctarget-cpu does not currently seem to be respected.

Despite explicitly setting -Ctarget-cpu=nehalem, I can see that cg_gcc still emits AVX instructions. Nehalem predates AVX and does not support them.

https://godbolt.org/z/7s6Gss8bE
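
A check like the one done by eye above could be automated by scanning the emitted assembly for AVX-looking instructions. The following is a rough heuristic sketch, not a real disassembler: the function names `looks_like_avx` and `count_avx` are invented for this example, and it assumes Intel-syntax output like the samples in this issue. It treats a line as AVX if the mnemonic starts with `v` and the operands mention an xmm/ymm/zmm register.

```rust
/// Heuristic: does this Intel-syntax assembly line look like an AVX instruction?
/// (Hypothetical helper; a real check should decode the machine code instead.)
fn looks_like_avx(line: &str) -> bool {
    // Drop trailing `;` comments and surrounding whitespace.
    let code = line.split(';').next().unwrap_or("").trim();
    let Some(mnemonic) = code.split_whitespace().next() else {
        return false; // empty or comment-only line
    };
    // Skip labels (`.LBB0_18:`) and assembler directives (`.text`).
    if mnemonic.ends_with(':') || mnemonic.starts_with('.') {
        return false;
    }
    let operands = code[mnemonic.len()..].to_ascii_lowercase();
    let touches_vector_reg =
        operands.contains("xmm") || operands.contains("ymm") || operands.contains("zmm");
    // VEX-encoded instructions use v-prefixed mnemonics (vmovdqa, vpunpcklqdq, ...).
    mnemonic.to_ascii_lowercase().starts_with('v') && touches_vector_reg
}

/// Counts the lines in `asm` that look like AVX instructions.
fn count_avx(asm: &str) -> usize {
    asm.lines().filter(|l| looks_like_avx(l)).count()
}

fn main() {
    // A few lines taken from the GCC sample in this issue.
    let gcc_sample = "mov     r11d, 4\n\
                      vpunpcklqdq     xmm9, xmm6, xmm6\n\
                      vmovdqa xmm14, XMMWORD PTR .LC8[rip]\n";
    println!("AVX-looking lines: {}", count_avx(gcc_sample));
}
```

With -Ctarget-cpu=nehalem, a correct backend should produce output for which a scan of this kind finds nothing.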

This will cause code compiled by cg_gcc to crash, typically with an illegal-instruction fault, on CPUs that lack instruction-set extensions present on the build host.
