Skip to content

[clang] Delay normalization of -fmodules-cache-path #150123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jansvoboda11
Copy link
Contributor

This PR is part of an effort to remove file system usage from the command line parsing code. The reason for that is that it's impossible to do file system access correctly without a configured VFS, and the VFS can only be configured after the command line is parsed. I don't want to intertwine command line parsing and VFS configuration, so I decided to perform the file system access after the command line is parsed and the VFS is configured - ideally right before the file system entity is used for the first time.

This patch delays normalization of the module cache path until CompilerInstance is asked for the cache path in the current compilation context. This needed some rewriting of the serialization function that makes a path relative, which was triggering an OOB access for paths in the form of "<BasePath>/.".

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels Jul 22, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 22, 2025

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Jan Svoboda (jansvoboda11)

Changes

This PR is part of an effort to remove file system usage from the command line parsing code. The reason for that is that it's impossible to do file system access correctly without a configured VFS, and the VFS can only be configured after the command line is parsed. I don't want to intertwine command line parsing and VFS configuration, so I decided to perform the file system access after the command line is parsed and the VFS is configured - ideally right before the file system entity is used for the first time.

This patch delays normalization of the module cache path until CompilerInstance is asked for the cache path in the current compilation context. This needed some rewriting of the serialization function that makes a path relative, which was triggering an OOB access for paths in the form of "&lt;BasePath&gt;/.".


Full diff: https://github.com/llvm/llvm-project/pull/150123.diff

5 Files Affected:

  • (modified) clang/include/clang/Driver/Options.td (+2-1)
  • (modified) clang/lib/Frontend/CompilerInstance.cpp (+6-1)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+2-18)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+27-50)
  • (added) clang/test/Modules/modules-cache-path-canonicalization-output.c (+18)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 97b2eb9c44a88..5f2f69195a7bd 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3226,7 +3226,8 @@ defm declspec : BoolOption<"f", "declspec",
 def fmodules_cache_path : Joined<["-"], "fmodules-cache-path=">, Group<i_Group>,
   Flags<[]>, Visibility<[ClangOption, CC1Option]>,
   MetaVarName<"<directory>">,
-  HelpText<"Specify the module cache path">;
+  HelpText<"Specify the module cache path">,
+  MarshallingInfoString<HeaderSearchOpts<"ModuleCachePath">>;
 def fmodules_user_build_path : Separate<["-"], "fmodules-user-build-path">, Group<i_Group>,
   Flags<[]>, Visibility<[ClangOption, CC1Option]>,
   MetaVarName<"<directory>">,
diff --git a/clang/lib/Frontend/CompilerInstance.cpp b/clang/lib/Frontend/CompilerInstance.cpp
index c7b82db292a14..82c80832bfeef 100644
--- a/clang/lib/Frontend/CompilerInstance.cpp
+++ b/clang/lib/Frontend/CompilerInstance.cpp
@@ -538,9 +538,14 @@ void CompilerInstance::createPreprocessor(TranslationUnitKind TUKind) {
 }
 
 std::string CompilerInstance::getSpecificModuleCachePath(StringRef ModuleHash) {
+  if (getHeaderSearchOpts().ModuleCachePath.empty())
+    return "";
+
   // Set up the module path, including the hash for the module-creation options.
   SmallString<256> SpecificModuleCache(getHeaderSearchOpts().ModuleCachePath);
-  if (!SpecificModuleCache.empty() && !getHeaderSearchOpts().DisableModuleHash)
+  FileMgr->makeAbsolutePath(SpecificModuleCache);
+  llvm::sys::path::remove_dots(SpecificModuleCache);
+  if (!getHeaderSearchOpts().DisableModuleHash)
     llvm::sys::path::append(SpecificModuleCache, ModuleHash);
   return std::string(SpecificModuleCache);
 }
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 3a36250da57a3..a172602a8c73b 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -3306,9 +3306,6 @@ static void GenerateHeaderSearchArgs(const HeaderSearchOptions &Opts,
   if (Opts.UseLibcxx)
     GenerateArg(Consumer, OPT_stdlib_EQ, "libc++");
 
-  if (!Opts.ModuleCachePath.empty())
-    GenerateArg(Consumer, OPT_fmodules_cache_path, Opts.ModuleCachePath);
-
   for (const auto &File : Opts.PrebuiltModuleFiles)
     GenerateArg(Consumer, OPT_fmodule_file, File.first + "=" + File.second);
 
@@ -3411,8 +3408,7 @@ static void GenerateHeaderSearchArgs(const HeaderSearchOptions &Opts,
 }
 
 static bool ParseHeaderSearchArgs(HeaderSearchOptions &Opts, ArgList &Args,
-                                  DiagnosticsEngine &Diags,
-                                  const std::string &WorkingDir) {
+                                  DiagnosticsEngine &Diags) {
   unsigned NumErrorsBefore = Diags.getNumErrors();
 
   HeaderSearchOptions *HeaderSearchOpts = &Opts;
@@ -3425,17 +3421,6 @@ static bool ParseHeaderSearchArgs(HeaderSearchOptions &Opts, ArgList &Args,
   if (const Arg *A = Args.getLastArg(OPT_stdlib_EQ))
     Opts.UseLibcxx = (strcmp(A->getValue(), "libc++") == 0);
 
-  // Canonicalize -fmodules-cache-path before storing it.
-  SmallString<128> P(Args.getLastArgValue(OPT_fmodules_cache_path));
-  if (!(P.empty() || llvm::sys::path::is_absolute(P))) {
-    if (WorkingDir.empty())
-      llvm::sys::fs::make_absolute(P);
-    else
-      llvm::sys::fs::make_absolute(WorkingDir, P);
-  }
-  llvm::sys::path::remove_dots(P);
-  Opts.ModuleCachePath = std::string(P);
-
   // Only the -fmodule-file=<name>=<file> form.
   for (const auto *A : Args.filtered(OPT_fmodule_file)) {
     StringRef Val = A->getValue();
@@ -5064,8 +5049,7 @@ bool CompilerInvocation::CreateFromArgsImpl(
   InputKind DashX = Res.getFrontendOpts().DashX;
   ParseTargetArgs(Res.getTargetOpts(), Args, Diags);
   llvm::Triple T(Res.getTargetOpts().Triple);
-  ParseHeaderSearchArgs(Res.getHeaderSearchOpts(), Args, Diags,
-                        Res.getFileSystemOpts().WorkingDir);
+  ParseHeaderSearchArgs(Res.getHeaderSearchOpts(), Args, Diags);
   if (Res.getFrontendOpts().GenReducedBMI ||
       Res.getFrontendOpts().ProgramAction ==
           frontend::GenerateReducedModuleInterface ||
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index a6957e54b66f1..4148caf93fd34 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -1172,51 +1172,36 @@ static bool cleanPathForOutput(FileManager &FileMgr,
   return Changed | llvm::sys::path::remove_dots(Path);
 }
 
-/// Adjusts the given filename to only write out the portion of the
-/// filename that is not part of the system root directory.
+/// Adjusts the given filename to only write out the portion of the filename
+/// that is not part of the base directory.
 ///
 /// \param Filename the file name to adjust.
 ///
-/// \param BaseDir When non-NULL, the PCH file is a relocatable AST file and
-/// the returned filename will be adjusted by this root directory.
+/// \param BasePath When not empty, the AST file is relocatable and the returned
+/// filename will be adjusted to be relative to this path.
 ///
-/// \returns either the original filename (if it needs no adjustment) or the
-/// adjusted filename (which points into the @p Filename parameter).
-static const char *
-adjustFilenameForRelocatableAST(const char *Filename, StringRef BaseDir) {
-  assert(Filename && "No file name to adjust?");
-
-  if (BaseDir.empty())
-    return Filename;
-
-  // Verify that the filename and the system root have the same prefix.
-  unsigned Pos = 0;
-  for (; Filename[Pos] && Pos < BaseDir.size(); ++Pos)
-    if (Filename[Pos] != BaseDir[Pos])
-      return Filename; // Prefixes don't match.
-
-  // We hit the end of the filename before we hit the end of the system root.
-  if (!Filename[Pos])
-    return Filename;
-
-  // If there's not a path separator at the end of the base directory nor
-  // immediately after it, then this isn't within the base directory.
-  if (!llvm::sys::path::is_separator(Filename[Pos])) {
-    if (!llvm::sys::path::is_separator(BaseDir.back()))
-      return Filename;
-  } else {
-    // If the file name has a '/' at the current position, skip over the '/'.
-    // We distinguish relative paths from absolute paths by the
-    // absence of '/' at the beginning of relative paths.
-    //
-    // FIXME: This is wrong. We distinguish them by asking if the path is
-    // absolute, which isn't the same thing. And there might be multiple '/'s
-    // in a row. Use a better mechanism to indicate whether we have emitted an
-    // absolute or relative path.
-    ++Pos;
-  }
+/// \returns true when \c Filename was adjusted, false otherwise.
+static bool adjustFilenameForRelocatableAST(SmallVectorImpl<char> &Filename,
+                                            StringRef BasePath) {
+  auto FileIt = llvm::sys::path::begin({Filename.begin(), Filename.size()});
+  auto FileEnd = llvm::sys::path::end({Filename.begin(), Filename.size()});
+
+  auto BaseIt = llvm::sys::path::begin(BasePath);
+  auto BaseEnd = llvm::sys::path::end(BasePath);
+
+  for (; FileIt != FileEnd && BaseIt != BaseEnd; ++FileIt, ++BaseIt)
+    if (*FileIt != *BaseIt)
+      return false;
+
+  if (FileIt == FileEnd)
+    return false;
+
+  SmallString<128> Clean;
+  for (; FileIt != FileEnd; ++FileIt)
+    llvm::sys::path::append(Clean, *FileIt);
 
-  return Filename + Pos;
+  Filename.assign(Clean.begin(), Clean.end());
+  return true;
 }
 
 std::pair<ASTFileSignature, ASTFileSignature>
@@ -1699,7 +1684,7 @@ void ASTWriter::WriteControlBlock(Preprocessor &PP, StringRef isysroot) {
 
   AddString(HSOpts.Sysroot, Record);
   AddString(HSOpts.ResourceDir, Record);
-  AddString(HSOpts.ModuleCachePath, Record);
+  AddPath(HSOpts.ModuleCachePath, Record);
   AddString(HSOpts.ModuleUserBuildPath, Record);
   Record.push_back(HSOpts.DisableModuleHash);
   Record.push_back(HSOpts.ImplicitModuleMaps);
@@ -5329,16 +5314,8 @@ bool ASTWriter::PreparePathForOutput(SmallVectorImpl<char> &Path) {
     return false;
 
   bool Changed = cleanPathForOutput(PP->getFileManager(), Path);
-
-  // Remove a prefix to make the path relative, if relevant.
-  const char *PathBegin = Path.data();
-  const char *PathPtr =
-      adjustFilenameForRelocatableAST(PathBegin, BaseDirectory);
-  if (PathPtr != PathBegin) {
-    Path.erase(Path.begin(), Path.begin() + (PathPtr - PathBegin));
+  if (adjustFilenameForRelocatableAST(Path, BaseDirectory))
     Changed = true;
-  }
-
   return Changed;
 }
 
diff --git a/clang/test/Modules/modules-cache-path-canonicalization-output.c b/clang/test/Modules/modules-cache-path-canonicalization-output.c
new file mode 100644
index 0000000000000..ad71b69e5299e
--- /dev/null
+++ b/clang/test/Modules/modules-cache-path-canonicalization-output.c
@@ -0,0 +1,18 @@
+// This checks that implicitly-built modules produce identical PCM
+// files regardless of the spelling of the same module cache path.
+
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -fsyntax-only %t/tu.c \
+// RUN:   -fmodules-cache-path=%t/cache -fdisable-module-hash
+// RUN: mv %t/cache/M.pcm %t/M.pcm
+// RUN: %clang_cc1 -fmodules -fimplicit-module-maps -fsyntax-only %t/tu.c \
+// RUN:   -fmodules-cache-path=%t/./cache -fdisable-module-hash
+// RUN: diff %t/./cache/M.pcm %t/M.pcm
+
+//--- tu.c
+#include "M.h"
+//--- M.h
+//--- module.modulemap
+module M { header "M.h" }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants