Skip to content

Conversation

rmaz
Copy link
Contributor

@rmaz rmaz commented Aug 29, 2025

Clang modules sort the umbrella dir headers by name before adding to the
module's includes to ensure deterministic output across different file systems.
This is insufficient however, as the header search table is also serialized.
This includes all the loaded headers by file reference, which are allocated
incrementally. To ensure stable output we have to also create the file
references in sorted order.

Clang modules sort the umbrella dir headers by name before adding
to the module's includes to ensure deterministic output across
different file systems. This is insufficent however, as the header
search table is also serialized. This includes all the loaded
headers by file reference, which are allocated incrementally. To
ensure stable output we have to also create the file references in
sorted order.
@rmaz rmaz requested a review from jansvoboda11 August 29, 2025 21:25
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels Aug 29, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 29, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Richard Howell (rmaz)

Changes

Clang modules sort the umbrella dir headers by name before adding to the
module's includes to ensure deterministic output across different file systems.
This is insufficient however, as the header search table is also serialized.
This includes all the loaded headers by file reference, which are allocated
incrementally. To ensure stable output we have to also create the file
references in sorted order.


Full diff: https://github.com/llvm/llvm-project/pull/156108.diff

9 Files Affected:

  • (modified) clang/lib/Frontend/FrontendAction.cpp (+21-19)
  • (added) clang/test/Modules/Inputs/umbrella_header_order/module.modulemap (+3)
  • (added) clang/test/Modules/Inputs/umbrella_header_order/umbrella/A.h ()
  • (added) clang/test/Modules/Inputs/umbrella_header_order/umbrella/B.h ()
  • (added) clang/test/Modules/Inputs/umbrella_header_order/umbrella/C.h ()
  • (added) clang/test/Modules/Inputs/umbrella_header_order/umbrella/D.h ()
  • (added) clang/test/Modules/Inputs/umbrella_header_order/umbrella/E.h ()
  • (added) clang/test/Modules/Inputs/umbrella_header_order/umbrella/F.h ()
  • (added) clang/test/Modules/umbrella_dir_order.m (+11)
diff --git a/clang/lib/Frontend/FrontendAction.cpp b/clang/lib/Frontend/FrontendAction.cpp
index a7d6a068fe2d0..6b1fcac75ac2b 100644
--- a/clang/lib/Frontend/FrontendAction.cpp
+++ b/clang/lib/Frontend/FrontendAction.cpp
@@ -623,7 +623,7 @@ static std::error_code collectModuleHeaderIncludes(
     llvm::sys::path::native(UmbrellaDir->Entry.getName(), DirNative);
 
     llvm::vfs::FileSystem &FS = FileMgr.getVirtualFileSystem();
-    SmallVector<std::pair<std::string, FileEntryRef>, 8> Headers;
+    SmallVector<std::pair<std::string, std::string>, 8> HeaderPaths;
     for (llvm::vfs::recursive_directory_iterator Dir(FS, DirNative, EC), End;
          Dir != End && !EC; Dir.increment(EC)) {
       // Check whether this entry has an extension typically associated with
@@ -633,17 +633,6 @@ static std::error_code collectModuleHeaderIncludes(
                .Default(false))
         continue;
 
-      auto Header = FileMgr.getOptionalFileRef(Dir->path());
-      // FIXME: This shouldn't happen unless there is a file system race. Is
-      // that worth diagnosing?
-      if (!Header)
-        continue;
-
-      // If this header is marked 'unavailable' in this module, don't include
-      // it.
-      if (ModMap.isHeaderUnavailableInModule(*Header, Module))
-        continue;
-
       // Compute the relative path from the directory to this file.
       SmallVector<StringRef, 16> Components;
       auto PathIt = llvm::sys::path::rbegin(Dir->path());
@@ -655,20 +644,33 @@ static std::error_code collectModuleHeaderIncludes(
            ++It)
         llvm::sys::path::append(RelativeHeader, *It);
 
-      std::string RelName = RelativeHeader.c_str();
-      Headers.push_back(std::make_pair(RelName, *Header));
+      HeaderPaths.push_back(
+          std::make_pair(Dir->path().str(), RelativeHeader.c_str()));
     }
 
     if (EC)
       return EC;
 
     // Sort header paths and make the header inclusion order deterministic
-    // across different OSs and filesystems.
-    llvm::sort(Headers, llvm::less_first());
-    for (auto &H : Headers) {
+    // across different OSs and filesystems. As the header search table
+    // serialization order depends on the file reference UID, we need to create
+    // file references in deterministic order too.
+    llvm::sort(HeaderPaths, llvm::less_first());
+    for (auto &[Path, RelPath] : HeaderPaths) {
+      auto Header = FileMgr.getOptionalFileRef(Path);
+      // FIXME: This shouldn't happen unless there is a file system race. Is
+      // that worth diagnosing?
+      if (!Header)
+        continue;
+
+      // If this header is marked 'unavailable' in this module, don't include
+      // it.
+      if (ModMap.isHeaderUnavailableInModule(*Header, Module))
+        continue;
+
       // Include this header as part of the umbrella directory.
-      Module->addTopHeader(H.second);
-      addHeaderInclude(H.first, Includes, LangOpts, Module->IsExternC);
+      Module->addTopHeader(*Header);
+      addHeaderInclude(RelPath, Includes, LangOpts, Module->IsExternC);
     }
   }
 
diff --git a/clang/test/Modules/Inputs/umbrella_header_order/module.modulemap b/clang/test/Modules/Inputs/umbrella_header_order/module.modulemap
new file mode 100644
index 0000000000000..5c64e33068822
--- /dev/null
+++ b/clang/test/Modules/Inputs/umbrella_header_order/module.modulemap
@@ -0,0 +1,3 @@
+module x {
+    umbrella "umbrella"
+}
\ No newline at end of file
diff --git a/clang/test/Modules/Inputs/umbrella_header_order/umbrella/A.h b/clang/test/Modules/Inputs/umbrella_header_order/umbrella/A.h
new file mode 100644
index 0000000000000..e69de29bb2d1d
diff --git a/clang/test/Modules/Inputs/umbrella_header_order/umbrella/B.h b/clang/test/Modules/Inputs/umbrella_header_order/umbrella/B.h
new file mode 100644
index 0000000000000..e69de29bb2d1d
diff --git a/clang/test/Modules/Inputs/umbrella_header_order/umbrella/C.h b/clang/test/Modules/Inputs/umbrella_header_order/umbrella/C.h
new file mode 100644
index 0000000000000..e69de29bb2d1d
diff --git a/clang/test/Modules/Inputs/umbrella_header_order/umbrella/D.h b/clang/test/Modules/Inputs/umbrella_header_order/umbrella/D.h
new file mode 100644
index 0000000000000..e69de29bb2d1d
diff --git a/clang/test/Modules/Inputs/umbrella_header_order/umbrella/E.h b/clang/test/Modules/Inputs/umbrella_header_order/umbrella/E.h
new file mode 100644
index 0000000000000..e69de29bb2d1d
diff --git a/clang/test/Modules/Inputs/umbrella_header_order/umbrella/F.h b/clang/test/Modules/Inputs/umbrella_header_order/umbrella/F.h
new file mode 100644
index 0000000000000..e69de29bb2d1d
diff --git a/clang/test/Modules/umbrella_dir_order.m b/clang/test/Modules/umbrella_dir_order.m
new file mode 100644
index 0000000000000..179a59eb94609
--- /dev/null
+++ b/clang/test/Modules/umbrella_dir_order.m
@@ -0,0 +1,11 @@
+// RUN: cd %S
+// RUN: %clang_cc1 -fmodules -fno-implicit-modules -x objective-c -fmodule-name=x -emit-module Inputs/umbrella_header_order/module.modulemap -o %t/mod.pcm
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/mod.pcm | FileCheck %s
+
+// CHECK: <INPUT_FILE abbrevid=4 op0=1 op1=36 op2=0 op3=0 op4=0 op5=1 op6=1 op7=16/> blob data = 'module.modulemap'
+// CHECK: <INPUT_FILE abbrevid=4 op0=2 op1=0 op2=0 op3=0 op4=0 op5=0 op6=0 op7=12/> blob data = 'umbrella/A.h'
+// CHECK: <INPUT_FILE abbrevid=4 op0=3 op1=0 op2=0 op3=0 op4=0 op5=0 op6=0 op7=12/> blob data = 'umbrella/B.h'
+// CHECK: <INPUT_FILE abbrevid=4 op0=4 op1=0 op2=0 op3=0 op4=0 op5=0 op6=0 op7=12/> blob data = 'umbrella/C.h'
+// CHECK: <INPUT_FILE abbrevid=4 op0=5 op1=0 op2=0 op3=0 op4=0 op5=0 op6=0 op7=12/> blob data = 'umbrella/D.h'
+// CHECK: <INPUT_FILE abbrevid=4 op0=6 op1=0 op2=0 op3=0 op4=0 op5=0 op6=0 op7=12/> blob data = 'umbrella/E.h'
+// CHECK: <INPUT_FILE abbrevid=4 op0=7 op1=0 op2=0 op3=0 op4=0 op5=0 op6=0 op7=12/> blob data = 'umbrella/F.h'

@rmaz rmaz requested a review from drodriguez August 29, 2025 21:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants