Skip to content

Start building Clang runtimes on-demand #5338

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 15 commits into
base: trunk
Choose a base branch
from

Conversation

chandlerc
Copy link
Contributor

This is the first step to having Clang's runtime libraries fully available for the Carbon toolchain. This PR focuses on the lowest level runtimes, the CRT files and the builtins library.

The goal is to intercept Clang runs where it needs these target-dependent pieces to be available, and build them on demand using our Clang-running infrastructure. This avoids most of the subprocess overhead, but there is still some due to missing features in Clang.

This requires exporting the sources for these runtimes from the Bazel build, and installing them in our target-independent resource directory. We then build a simplified "build" of these sources within the ClangRunner itself to produce the specific artifacts and layout expected by Clang.

It also required fixing our use of Clang on macOS to have a default system root in order to successfully compile or link.

It also required cleaning up how the ClangRunner used target information more generally -- instead of taking the target as a constructor parameter, it manages its target internally and relies on the Clang target-specifying command line flags.

Note that I looked at whether we could split this into another layer separate from the ClangRunner, but that proved frustratingly difficult to manage. While we support building these on-demand as part of a detected link, that doesn't seem feasible as we don't have the necessary separation between compilation runs of Clang and link runs of Clang. However, I have tried to factor the internals to provide as clear of separation as I could across these.

Currently, the only part of the commandline that is detected and forwarded to the runtimes build is the target. Eventually, the plan is to expand this so that we can build a maximally tailored set of runtimes for a given compilation.

The other big TODO here is to actually implement caching storage of these runtimes so they aren't built on every execution. Right now, this uses a somewhat hack-y build of a temporary directory, but this isn't expected to be suitable long-term. Building these runtimes on every link makes those commands take approximately 15 seconds with an ASan build like our default development build, and just over 2 seconds in an optimized build. Hopefully this is OK in the brief interim until I can put caching in place.

@github-actions github-actions bot requested a review from dwblaikie April 20, 2025 08:18
@chandlerc chandlerc force-pushed the runtimes-on-demand branch 2 times, most recently from 79c20c6 to 398d21f Compare April 20, 2025 09:51
@chandlerc chandlerc changed the title Start building Clang runtimes runtimes on-demand Start building Clang runtimes on-demand Apr 20, 2025
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work getting this done!

@dwblaikie dwblaikie removed their request for review April 21, 2025 18:21
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update, PTAL.

Comment on lines +430 to +157
generate_runtime_sources_h(name = "runtime_sources.h")
cc_library(
name = name,
hdrs = ["runtime_sources.h"],
**kwargs
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm following the same pattern as the command line tool macro, and would prefer to keep them somewhat consistent... I can make the header file name a parameter to fully match that rule if that would work better? Or maybe there is a follow-up to both of these? I don't have a particularly strong opinion about where the rule lives - all of the options feel a bit awkward.

Comment on lines +430 to +157
generate_runtime_sources_h(name = "runtime_sources.h")
cc_library(
name = name,
hdrs = ["runtime_sources.h"],
**kwargs
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you link to the pattern you're referring to, to help me understand the consistency? I initially thought you meant the run_tool rule, or maybe the install directory's symlink rules, but you must mean something else?

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filegroup change looks great!

Comment on lines 155 to 158
srcs = [
runtime
for runtime in CRT_FILES.values() + BUILTINS_FILEGROUPS.values()
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
srcs = [
runtime
for runtime in CRT_FILES.values() + BUILTINS_FILEGROUPS.values()
],
srcs = CRT_FILES.values() + BUILTINS_FILEGROUPS.values(),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@chandlerc chandlerc force-pushed the runtimes-on-demand branch 2 times, most recently from 54aa139 to 9a999e2 Compare April 25, 2025 22:41
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL!

Comment on lines +430 to +157
generate_runtime_sources_h(name = "runtime_sources.h")
cc_library(
name = name,
hdrs = ["runtime_sources.h"],
**kwargs
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 155 to 158
srcs = [
runtime
for runtime in CRT_FILES.values() + BUILTINS_FILEGROUPS.values()
],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@chandlerc chandlerc requested a review from jonmeow April 28, 2025 22:26
Comment on lines +430 to +157
generate_runtime_sources_h(name = "runtime_sources.h")
cc_library(
name = name,
hdrs = ["runtime_sources.h"],
**kwargs
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. FWIW I think llvm_tools seems easier to understand in the BUILD because it includes the filename as a parameter, rather than hardcoding it. To put this differently, it feels more reusable when it's a parameter, and bzl is for reusable or special things beyond the rule invocations of a BUILD. Also the filename and target name seem closely tied, so setting them in the same place is nice.

That said, for things that aren't reusable, I lean more in the run_tool direction: use bzl functions as little as possible, putting more in the BUILD. That also would still have the effect of setting both the filename and target name in the same place. If you're okay with that, maybe it's more of a separate PR for cleanup?

i.e., maybe no changes here, but I do think we should align how we're using one-off build rules:

  • Invoke one-off rule directly from BUILD, as in run_tool (and the simpler pkg_naming_variables)
  • Invoke one-off rule using wrapper function in bzl
    • With one name in BUILD and one name in bzl, as here
    • With both names in BUILD, as in llvm_tools

@@ -193,7 +205,7 @@ make_install_filegroups(

py_test(
name = "llvm_symlinks_test",
size = "small",
size = "medium",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is taking too long to remain small, perhaps it could be sharded to try to keep bazel test //... fast?

Note, not something I think needs to be done here.

chandlerc and others added 13 commits May 12, 2025 16:25
This is the first step to having Clang's runtime libraries fully
available for the Carbon toolchain. This PR focuses on the lowest level
runtimes, the CRT files and the builtins library.

The goal is to intercept Clang runs where it needs these
target-dependent pieces to be available, and build them on demand using
our Clang-running infrastructure. This avoids most of the subprocess
overhead, but there is still some due to missing features in Clang.

This requires exporting the sources for these runtimes from the Bazel
build, and installing them in our target-independent resource directory.
We then build a simplified "build" of these sources within the
`ClangRunner` itself to produce the specific artifacts and layout
expected by Clang.

It also required fixing our use of Clang on macOS to have a default
system root in order to successfully compile or link.

It also required cleaning up how the `ClangRunner` used target
information more generally -- instead of taking the target as
a constructor parameter, it manages its target internally and relies on
the Clang target-specifying command line flags.

Note that I looked at whether we could split this into another layer
separate from the `ClangRunner`, but that proved frustratingly difficult
to manage. While we support building these on-demand as part of
a detected link, that doesn't seem feasible as we don't have the
necessary separation between compilation runs of Clang and link runs of
Clang. However, I have tried to factor the internals to provide as clear
of separation as I could across these.

Currently, the only part of the commandline that is detected and
forwarded to the runtimes build is the target. Eventually, the plan is
to expand this so that we can build a maximally tailored set of runtimes
for a given compilation.

The other big TODO here is to actually implement caching storage of
these runtimes so they aren't built on every execution. Right now, this
uses a somewhat hack-y build of a temporary directory, but this isn't
expected to be suitable long-term. Building these runtimes on *every*
link makes those commands take approximately 15 seconds with an ASan
build like our default development build, and just over 2 seconds in an
optimized build. Hopefully this is OK in the brief interim until I can
put caching in place.
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
chandlerc and others added 2 commits May 12, 2025 16:25
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
@chandlerc chandlerc force-pushed the runtimes-on-demand branch from ac76c2c to 130e1ad Compare May 12, 2025 23:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants