tsaarni
Member

@tsaarni tsaarni commented Sep 23, 2025

Commit Message: build: use GNU ld instead of gold for GCC build
Additional Description:
When building with GCC using --config=gcc, the build system now uses GNU ld. Gold has been deprecated since GNU binutils 2.44. It is no longer included in the standard binutils release and is scheduled for complete removal. See https://lwn.net/Articles/1007541/.
Risk Level: Low
Testing:
Docs Changes:
Release Notes: GCC builds now use GNU ld. The gold linker was deprecated in binutils 2.44.
Platform Specific Features:

Fixes #41171

@tsaarni
Member Author

tsaarni commented Sep 23, 2025

I still need to figure out why the matrix test is failing, but I was able to build successfully on Ubuntu 25.04, which ships with binutils 2.44 and no gold linker (though a separate binutils-gold package is still available).

Here are the steps I used for a manual build:

Start the build container:

docker run --rm -it \
    --env USER_UID=$(id -u) \
    --env USER_GID=$(id -g) \
    --volume "$(pwd)":/envoy \
    --workdir /envoy \
    ubuntu:25.04 /bin/bash

Inside the container:

# Install build dependencies
export DEBIAN_FRONTEND=noninteractive
apt-get update
apt-get install -y --no-install-recommends build-essential python3 curl ca-certificates git
curl --location --output /usr/local/bin/bazel https://github.com/bazelbuild/bazelisk/releases/latest/download/bazelisk-linux-amd64
chmod +x /usr/local/bin/bazel

# Confirm that gold is NOT installed
gold --version

# Create a non-root user for the build
userdel ubuntu
groupadd --gid $USER_GID envoybuild
useradd --uid $USER_UID --gid $USER_GID --create-home --shell /bin/bash envoybuild
su - envoybuild

# Build Envoy
cd /envoy
bazel build //source/exe:envoy-static

# Verify the binary
bazel-bin/source/exe/envoy-static --version
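
The "confirm that gold is NOT installed" step above could also be scripted so it reports a result instead of relying on a "command not found" error. This is only a sketch: `have_linker` and `report_linkers` are hypothetical helper names, not part of the Envoy repository, and the list of executable names is an assumption about how the linkers are installed.

```shell
# Return success if the named executable is on PATH.
# have_linker is a hypothetical helper, not part of the Envoy repository.
have_linker() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per linker name given as an argument.
report_linkers() {
    local name
    for name in "$@"; do
        if have_linker "$name"; then
            echo "$name: found at $(command -v "$name")"
        else
            echo "$name: not installed"
        fi
    done
}

# Assumed executable names: GNU ld as ld.bfd, gold as ld.gold or gold.
report_linkers ld.bfd ld.gold gold
```

On a system without gold, the last two names should be reported as not installed.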

@phlax phlax self-assigned this Sep 23, 2025
@phlax
Member

phlax commented Sep 23, 2025

gcc is also failing - not sure if same reason - in that case it fails the configure step - annoyingly it swallows the actual failure

the matrix test is basically a test of a minimal build environment - to ensure what defaults are picked - and that the bazelrc/etc doesn't break anything in those cases

error is

collect2: fatal error: cannot find 'ld'

which is strange as it's there i think - and ultimately points to the correct binary afaict

@phlax
Member

phlax commented Sep 23, 2025

this is the likely culprit

removing the gold fuse-ld from the toolchain config makes it default to using lld (or trying to)

not sure if it should be doing that but i would say the safer thing in that case is to explicitly set it to bfd
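
One way to check whether a linker is set explicitly for the gcc config is to look for `-fuse-ld=` on the `build:gcc` lines of a bazelrc file. The sketch below assumes the setting is passed through `--linkopt`; `gcc_fuse_ld` is a hypothetical helper, and the sample bazelrc lines are made up for illustration rather than copied from the Envoy repository.

```shell
# Print every -fuse-ld=<name> value found on build:gcc lines of a bazelrc file.
# gcc_fuse_ld is a hypothetical helper name.
gcc_fuse_ld() {
    grep -E '^build:gcc[[:space:]]' "$1" \
        | grep -oE -- '-fuse-ld=[A-Za-z0-9_.-]+' \
        | sed 's/^-fuse-ld=//'
}

# Demo on a made-up bazelrc (these lines are assumptions, not the real file).
tmp=$(mktemp)
printf '%s\n' \
    'build:gcc --linkopt=-fuse-ld=bfd' \
    'build:clang --linkopt=-fuse-ld=lld' > "$tmp"
gcc_fuse_ld "$tmp"
rm -f "$tmp"
```

If the function prints nothing for a given file, no linker is pinned for the gcc config there and the toolchain default applies.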

@tsaarni tsaarni force-pushed the gcc-remove-gold branch 3 times, most recently from a69bb36 to 2b7167b Compare September 23, 2025 15:16
@tsaarni
Member Author

tsaarni commented Sep 23, 2025

I had to reintroduce the explicit linker configuration to the build:gcc options in .bazelrc to pass the matrix tests, so I added it back everywhere.

However, the build still fails elsewhere with:

/usr/bin/ld.bfd: unrecognized option '--start-lib'

see failure log.

This may be coming from Bazel's rules_cc, specifically from the function _find_linker_path() (link), which is called by configure_unix_toolchain() (link). This did not seem to happen in a local build.

@phlax
Member

phlax commented Sep 23, 2025

is it not this line:

$ git grep start-lib
...
bazel/rbe/toolchains/configs/linux/gcc/cc/cc_toolchain_config.bzl:                                flags = ["-Wl,--start-lib"],

@phlax
Member

phlax commented Sep 23, 2025

fwiw we did just update rules_cc - so it could be your local is using a different version to ci

@tsaarni
Member Author

tsaarni commented Sep 23, 2025

Ah I somehow missed the flag in bazel/rbe/toolchains/configs/linux/gcc/cc/cc_toolchain_config.bzl 😳 Surely it is that.
I have the rules_cc bump to 0.2.8 locally, which I guess is the update you mean, but is this toolchain config only for remote Bazel builds? Anyway, I will try removing that flag from the gcc toolchain config.

@phlax
Member

phlax commented Sep 23, 2025

i think it gets used locally also afaict at least

@tsaarni tsaarni force-pushed the gcc-remove-gold branch 2 times, most recently from 564bd43 to 84f8960 Compare September 24, 2025 04:37
@phlax
Member

phlax commented Sep 24, 2025

i think this is close to the finish line - current error

collect2: fatal error: ld terminated with signal 9 [Killed]

which is an OOM in an RBE worker - we've hit similar in the past - not sure if just upping the machine for the failing test will resolve it but probably worth a try

try adding:

    rbe_pool = "6gig",

to config_fuzz_test

@phlax
Member

phlax commented Sep 24, 2025

hmm - unfortunately it now looks like a game of whack-a-mole

also if there are more than one or two of these tests that only require more memory for gcc linking then we would need to add selects so the (many more) llvm tests aren't also using bigger machines

(perhaps unreliably) chatgpt suggests there are ways to optimize mem for the bfd linker - that might be an option

@phlax
Member

phlax commented Sep 24, 2025

the other option that i'm wondering about ... just stick with gold until we can't, in the hope that some better option magically appears

cc @jwendell as maintainer on project that builds with gcc

@tsaarni
Member Author

tsaarni commented Sep 24, 2025

I can still try optimizations like --no-keep-memory and --reduce-memory-overheads, but yeah, this seems trickier than expected. It would be nice to have at least an option to compile without gold, since my local build did succeed, maybe because it had enough memory.

I guess yet another option could be to try mold but I don't see it included in all distros, e.g. I'm currently trying to find out a "future-proof" way to compile Envoy on SLES using only supported packages.

@jwendell
Member

@tsaarni I'm trying to build your branch locally, but it's failing:

/build/bazel_root/install/5309d864f9edb3a2e8380ffc84e6b95c/process-wrapper '--timeout=0' '--kill_delay=15' '--stats=/build/bazel_root/base/sandbox/processwrapper-sandbox/358/stats.out' /usr/bin/gcc @bazel-out/k8-opt-exec-ST-a828a81199fe/bin/external/com_google_protobuf/upb_generator/c/protoc-gen-upb_stage0-2.params)
/usr/bin/ld.bfd: unrecognized option '--start-lib'

the params file: https://gist.github.com/jwendell/43533d974bfc379cd85afe963564d3ce
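
To see whether a given params file still carries the flags that ld.bfd rejects, it can be scanned directly. The sketch below assumes the file holds one argument per line; `find_lib_group_flags` is a hypothetical helper name, and the sample file contents are made up rather than taken from the gist above.

```shell
# Print line numbers and text of --start-lib / --end-lib flags in a params file.
# gold and lld accept these flags; GNU ld (ld.bfd) does not.
# find_lib_group_flags is a hypothetical helper name.
find_lib_group_flags() {
    grep -nE -- '--(start|end)-lib' "$1"
}

# Demo on a made-up params file (contents are illustrative assumptions).
params=$(mktemp)
printf '%s\n' '-o' 'bazel-out/example' '-Wl,--start-lib' 'foo.o' '-Wl,--end-lib' > "$params"
find_lib_group_flags "$params"
# prints:
# 3:-Wl,--start-lib
# 5:-Wl,--end-lib
rm -f "$params"
```

An empty result means the flags are not in that params file, so the error would have to come from another link action.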

@tsaarni
Member Author

tsaarni commented Sep 24, 2025

Since GNU ld does not support --start-lib / --end-lib, I removed those options from the RBE toolchain config in this PR. I’m not sure where else they might still be coming from. How are you starting the build, @jwendell?

@jwendell
Member

@tsaarni

./ci/run_envoy_docker.sh './ci/do_ci.sh gcc //test/...'

@jwendell
Member

@tsaarni I figured it out, feel free to incorporate my changes in this PR: jwendell@5301a3d

Gold is deprecated beginning from GNU binutils 2.44.  It is no longer
included in standard binutils release and it is scheduled for complete
removal.

Signed-off-by: Tero Saarni <tero.saarni@est.tech>
@tsaarni
Member Author

tsaarni commented Sep 26, 2025

Thanks @jwendell! I was aware of that toolchain configuration option, but I didn’t realize it could also be set in .bazelrc.

I’ve also tried adding the --no-keep-memory and --reduce-memory-overheads linker options to save memory, but the remote build still fails with an OOM (fatal error: ld terminated with signal 9). I suppose GNU ld is not up to the task without giving the EngFlow remote workers more memory.
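
For anyone who wants to repeat the experiment, the two GNU ld options can be passed through GCC with `-Wl,` and handed to Bazel as `--linkopt` arguments. This is a sketch of one possible way to do it, not necessarily how it was wired in this PR; `low_memory_linkopts` is a hypothetical helper name.

```shell
# Print Bazel --linkopt arguments that forward GNU ld's memory-saving
# options through the GCC driver. low_memory_linkopts is a hypothetical helper.
low_memory_linkopts() {
    local opt
    for opt in --no-keep-memory --reduce-memory-overheads; do
        printf '%s%s\n' '--linkopt=-Wl,' "$opt"
    done
}

low_memory_linkopts
# Possible use (word splitting of the output is intentional):
#   bazel build --config=gcc $(low_memory_linkopts) //source/exe:envoy-static
```

As noted above, these options were not enough to avoid the OOM on the remote workers, so they are at best a partial mitigation.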

@phlax
Member

phlax commented Sep 26, 2025

> I suppose GNU ld is not up to the task without giving the EngFlow remote workers more memory.

hmm, yeah - i think that's going to be a blocker tbh

been thinking about this a bit the last few days

the plan is to move to hermetic toolchains for both llvm and gcc - at that point i think it would be feasible to use lld for the gcc build - as it wouldn't rely on the host install - or potentially some other linker that can give us comparable performance/resource usage

Successfully merging this pull request may close these issues.

GCC build uses deprecated gold linker
3 participants