Skip to content

Convert several UTF-8 related functions into macros #23459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: blead
Choose a base branch
from

Conversation

khwilliamson
Copy link
Contributor

These functions are all essentially one-liners that just call something else. Now that 93f23f0 is in blead, it is easy to convert them to macros that don't have extra overhead

  • This set of changes does not require a perldelta entry.

@tonycoz
Copy link
Contributor

tonycoz commented Jul 28, 2025

but macros require fewer resources

do they?

One serious problem with macros (and more macros and more macros) is they have global scope, and I don't mean top-level scope - I mean they replace anywhere.

I know the existing macros break C++ headers, I've had to work around this by making sure I include the perl headers after anything else when writing XS.

(names like "seed" and "Copy" don't help)

In file included from /usr/lib/llvm-21/include/llvm/ADT/STLExtras.h:21,
                 from /usr/lib/llvm-21/include/llvm/ADT/DenseMap.h:20,
                 from /usr/lib/llvm-21/include/llvm/ExecutionEngine/Orc/CoreCont
ainers.h:16,
                 from /usr/lib/llvm-21/include/llvm/ExecutionEngine/Orc/Material
izationUnit.h:17,
                 from /usr/lib/llvm-21/include/llvm/ExecutionEngine/Orc/Absolute
Symbols.h:16,
                 from /usr/lib/llvm-21/include/llvm/ExecutionEngine/Orc/LLJIT.h:
17,
                 from LLVM.xs:10:
/usr/lib/llvm-21/include/llvm/ADT/Hashing.h:500:32: error: macro "seed" passed 1
 arguments, but takes just 0
  500 |     : seed(get_execution_seed()) {}
      |                                ^
/usr/lib/llvm-21/include/llvm/IR/ValueMap.h:266:34: error: macro "Copy" requires 4 arguments, but only 1 given
  266 |     ValueMapCallbackVH Copy(*this);
      |                                  ^
In file included from /usr/lib/x86_64-linux-gnu/perl/5.36/CORE/perl.h:2999:
/usr/lib/x86_64-linux-gnu/perl/5.36/CORE/handy.h:2791: note: macro "Copy" defined here
 2791 | #define Copy(s,d,n,t)   (MEM_WRAP_CHECK_(n,t) perl_assert_ptr(d), perl_assert_ptr(s), (void)memcpy((char*)(d),(const char*)(s), (n) * sizeof(t)))
      | 
/usr/lib/llvm-21/include/llvm/IR/ValueMap.h:279:34: error: macro "Copy" requires 4 arguments, but only 1 given
  279 |     ValueMapCallbackVH Copy(*this);
      |                                  ^
In file included from /usr/lib/llvm-21/include/llvm/IR/IRBuilder.h:34,
                 from LLVM.xs:12:
/usr/lib/llvm-21/include/llvm/IR/Instructions.h:2695:34: error: macro "block_end" requires 2 arguments, but only 1 given
 2695 |   const_block_iterator block_end() const {
      |                                  ^
/usr/lib/x86_64-linux-gnu/perl/5.36/CORE/embed.h:75: note: macro "block_end" defined here
   75 | #define block_end(a,b)          Perl_block_end(aTHX_ a,b)
      | 
/usr/lib/llvm-21/include/llvm/IR/Instructions.h:2700:48: error: macro "block_end" requires 2 arguments, but only 1 given
 2700 |     return make_range(block_begin(), block_end());
      |                                                ^

The function is hereby removed in favor of calling the plain
uvoffuni_to_utf8_flags macro that already exists
These were inline functions, but macros require fewer resources;  I
intented them to be macros all along, but until
93f23f0, that was harder.
This was an inline function that just called another function; a macro
uses fewer resources
@khwilliamson
Copy link
Contributor Author

In the case of this p.r., there is less runtime overhead, and the names are very unlikely to conflict with others.

But what we could do is change handy.h to

#define Perl_Copy (PERL_MEM_WRAP_CHECK_(n,t) perl_assert_ptr(d), perl_assert_ptr(s), (void)Perl_memcpy((char*)(d),(const char*)(s), (n) * sizeof(t)))

And change embed.h so it looks more like


#ifndef Copy
#  define Copy Perl_Copy
#endif

When you compiled your code using plain Copy it wouldn't work, so you'd have to change to use the long name. Any macros we define that use plain Copy would also have to change to use the long name.

The long name calls would have to include aTHX_ when appropriate.

I believe I know how to fix regen/embed.pl to automatically DTRT for thread context; it would use the new scheme only if the macro name begins with Perl_

This requies extra work, and would be impractical if a significant number of our short names have conflicts.

@tonycoz
Copy link
Contributor

tonycoz commented Jul 29, 2025

In the case of this p.r., there is less runtime overhead, and

evidence?

Testing this change vs blead, the exact same code was generated for the call to uv_to_utf8() in Perl_moreswitches for the in '0' case in perl.c.

If you think this will improve runtime overhead in some other specific case, let me know and I'll take a look.

the names are very unlikely to conflict with others.

I agree in this case.

But what we could do is change handy.h to

I think the ship has sailed here.

My point is that macros are pretty bad, claiming they're better than an inline function where an inline function works needs some evidence.

@khwilliamson
Copy link
Contributor Author

I think I see part of the disconnect. I can see that the inline functions would not require more resources than macros. And I converted them to inline at the last minute in 5.41, as a result of having to revert some commits. This was intended to be a temporary measure. But they work well enough there, so we could leave them. I also am still influenced by the times when inlined functions were a precious resource because compilers had a very finite limit on how many you could have. Making them inline functions adds more boiler plate to the code base.

The other function is in utf8.c, and merely implements the long form of uvoffuni_to_utf8_flags, Calling the long form adds an indirection. By changing the flag on an embed.fnc entry, that code in utf8.c can be removed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants