Skip to content

Conversation

@tmc
Copy link
Contributor

@tmc tmc commented Oct 28, 2025

Darwin ARM64 uses byte-aligned argument packing, not 8-byte slots.
The current code uses 8-byte slots for everything, causing two problems:

  1. Functions with >8 parameters fail when using int32, float32, etc.
  2. Struct arguments are packed incorrectly.

For example:

// C: int32_t sum(int32_t a1, ..., int32_t a11)
var fn func(int32, int32, int32, int32, int32, int32, int32, int32, int32, int32, int32) int32
purego.RegisterLibFunc(&fn, lib, "sum_11_int32")
result := fn(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
// fails: stack arguments misaligned

The fix implements proper Darwin ARM64 calling conventions:

  • Stack arguments use natural alignment (int32 = 4 bytes, not 8).
  • Multiple small values pack into single 8-byte slots.
  • Struct arguments use byte-level copying, not slot alignment.
  • Callbacks unpack arguments using Darwin-specific rules.
  • Register allocation tries registers before falling back to stack.

The code splits ARM64 struct handling into darwin and non-darwin versions
to preserve correct behavior on Linux and other ARM64 systems.

Fixes #352

@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from 3553dfa to 61448b8 Compare October 28, 2025 16:02
@tmc
Copy link
Contributor Author

tmc commented Oct 28, 2025

Rebased and fixed this up.

@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from 0e9a914 to 4aa156e Compare October 28, 2025 16:35
@tmc
Copy link
Contributor Author

tmc commented Oct 28, 2025

note: I may have a clean way to extend this abi change to callbacks, will push once I validate.

@tmc
Copy link
Contributor Author

tmc commented Oct 28, 2025

note: I may have a clean way to extend this abi change to callbacks, will push once I validate.

Success! I've included the c-packed arguments for callbacks as well on darwin/arm64.

@hajimehoshi hajimehoshi marked this pull request as draft October 28, 2025 17:33
@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from 2a73c27 to 0ee3fd6 Compare October 29, 2025 15:53
@tmc tmc changed the title Fix Darwin ARM64 stack parameter packing for sub-8-byte types purego: fix Darwin ARM64 calling convention for struct arguments, callbacks, and stack packing Oct 29, 2025
@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch 2 times, most recently from 3d21862 to c39e130 Compare October 29, 2025 17:08
@tmc tmc marked this pull request as ready for review October 29, 2025 18:16
@hajimehoshi hajimehoshi requested a review from Copilot October 30, 2025 04:34
@hajimehoshi
Copy link
Member

Awesome! Would it be possible to split the PR?

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements support for Darwin ARM64 calling convention for stack argument packing in purego. The key change is handling the difference between Darwin ARM64's byte-level packing (where arguments use their natural size) vs. the 8-byte slot approach used on other platforms.

  • Darwin ARM64 now uses C-style byte-level packing for stack arguments with proper alignment
  • Callback handling split into platform-specific implementations for Darwin ARM64 vs. generic
  • Smart stack limit checking based on actual byte requirements rather than simple argument counts

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
func.go Added Darwin ARM64 stack bundling logic with C-style packing and calculateStackBytesNeeded helper
func_darwin_arm64.go Platform-specific helpers for determining when stack packing is needed and struct register fitting
func_notdarwin.go Stub implementations for non-Darwin ARM64 platforms
syscall_sysv.go Split callback handling into callbackWrapGeneric and callbackWrapDarwinARM64
struct_darwin_arm64.go Darwin ARM64-specific struct register placement with byte-level copying for non-HFA/HVA
struct_notdarwin_arm64.go Delegates to standard AArch64 logic for non-Darwin ARM64
struct_arm64.go Renamed placeRegisters to placeRegistersAArch64 for clarity
testdata/abitest/abi_test.c Added comprehensive test cases for struct passing and stack arguments
testdata/libcbtest/callback_packing_test.c New test file for callback argument packing
func_test.go Added extensive tests for argument passing and stack limits
callback_test.go Added callback packing tests for int32, mixed types, and small types

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from ee5135d to ed55e2d Compare October 30, 2025 14:37
@tmc
Copy link
Contributor Author

tmc commented Oct 30, 2025

Awesome! Would it be possible to split the PR?

Open to that, can you suggest what you'd like to see separated?

edit: I can pull the callback fixes out to a separate PR, sound good?

edit2: The changes to argument packing cause a failure in the callback tests which is making it difficult to separate this into two logical PRs.

@hajimehoshi
Copy link
Member

Open to that, can you suggest what you'd like to see separated?

I'll take a look tomorrow, but I'd like to know @TotallyGamerJet's opinion

@tmc
Copy link
Contributor Author

tmc commented Oct 30, 2025

Going to mark this as draft and see if I can split it up successfully.

@tmc tmc marked this pull request as draft October 30, 2025 19:30
@tmc tmc requested a review from Copilot October 30, 2025 19:46
@tmc
Copy link
Contributor Author

tmc commented Oct 30, 2025

Separating the callback handling would require refactoring some tests to not do the somewhat artificial Go->Go vs the Go->c pathway with a change in how the values are arranged. I've tried to simplify and improve the organization of this so it's easier to review. If it still appears too large let me know and I can work on refactoring the callback tests.

IMO these changes go hand in hand and this rounds out the purego darwin amd64 ABI support.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

syscall_sysv.go:208

  • [nitpick] The blank line at line 200 was removed between the default: case and the conditional logic. While this is a minor formatting change, consider whether this improves or reduces readability. The blank line previously helped separate the case label from its logic.
			if intsN >= numOfIntegerRegisters() {
				pos = stack
				stack++
			} else {
				// the integers begin after the floats in frame
				pos = intsN + numOfFloatRegisters
			}
			intsN++

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from 8e967c2 to 0c9abc9 Compare October 31, 2025 19:21
@tmc
Copy link
Contributor Author

tmc commented Oct 31, 2025

I figured out a way to split this out with some temporary handling that avoids the packing for go callbacks until the callback fixes go in.

@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch 4 times, most recently from 435d33e to 3c78dff Compare October 31, 2025 19:52
tmc added 2 commits October 31, 2025 12:59
Implement comprehensive improvements to Darwin ARM64 stack argument handling
with C-style packing and smarter validation.

Key improvements:
- Replace simple argument counting with byte-based stack validation that
  accounts for Darwin ARM64's efficient argument packing
- Add C-style stack argument bundling that creates properly aligned packed
  structs for stack arguments, matching Darwin's calling convention
- Separate Darwin ARM64 struct handling from other ARM64 platforms with
  dedicated files (struct_darwin_arm64.go, struct_notdarwin_arm64.go)
- Implement shouldBundleStackArgs() to detect when to start stack packing
- Add collectStackArgs() to separate register vs stack arguments
- Add bundleStackArgs() for proper C-style struct packing with alignment

For Darwin ARM64 specifically:
- Non-HFA/HVA structs use byte-level copying in 8-byte chunks
- HFA/HVA structs continue using element-by-element placement
- Stack arguments are bundled into aligned structs before copying
- Supports complex mixed argument scenarios (primitives + structs)

Testing improvements:
- Add comprehensive TestABI_ArgumentPassing with 15+ test cases
- Test various combinations: 10+ int32s, mixed types, HFA structs
- Add TestABI_TooManyArguments to verify improved error handling
- Test cases cover register exhaustion and stack overflow scenarios

This enables Darwin ARM64 to handle significantly more arguments (e.g., 20
int32s) while maintaining ABI compatibility and providing better error
messages for truly excessive argument counts.
Adds callback detection to avoid applying C-style tight packing to callback
functions, since callback unpacking still uses the traditional 8-byte slot
convention and would break with tightly-packed arguments.

Key changes:
- Add isCallbackFunction() to detect purego callback addresses
- Skip tight packing in RegisterFunc when cfn is a callback
- Add helper functions getCallbackStart() and getMaxCB() for address range checks
- Include TODO comments marking this as temporary until callback unpacking
  is updated to handle tight packing

This ensures callbacks continue working correctly while the tight packing
feature is being developed for Darwin ARM64.
@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from 3c78dff to f497229 Compare October 31, 2025 19:59
@tmc tmc changed the title purego: fix Darwin ARM64 calling convention for struct arguments, callbacks, and stack packing Fix Darwin ARM64 calling convention for struct arguments, callbacks, and stack packing Oct 31, 2025
@tmc tmc marked this pull request as ready for review October 31, 2025 20:06
@tmc
Copy link
Contributor Author

tmc commented Oct 31, 2025

OK I separated the callback portion of this to #364 -- this required adding some temporary shims to avoid this arg packing when a destination is a go callback.

This commit addresses review feedback from @TotallyGamerJet:

1. Merge struct_others.go and struct_other.go into struct_notdarwin.go
   - Consolidates platform-specific stubs for non-Darwin platforms
   - Reduces confusion from having similarly-named files
   - Uses build tag !darwin to cover all non-Darwin platforms

2. Refactor shouldBundleStackArgs to avoid multi-level if statements
   - Restructures control flow with early returns
   - Improves readability by reducing nesting depth
   - Makes the logic flow more linear and easier to follow

3. Extract copyStruct8ByteChunks helper function
   - Deduplicates code for copying struct memory in 8-byte chunks
   - Used by both placeRegisters and bundleStackArgs
   - Follows DRY principle for Darwin ARM64 byte-level packing
@tmc tmc force-pushed the improve-darwin-arm64-stack-packing branch from 93945f3 to 6563f9f Compare November 16, 2025 02:19
Improve build tag specificity and organization for struct handling files:

- Add explicit architecture build tags to struct_amd64.go and struct_arm64.go
- Add explicit darwin && arm64 tag to struct_darwin_arm64.go
- Refine struct_notdarwin.go to exclude arm64 and amd64 (!darwin && !arm64 && !amd64)
- Refine struct_notdarwin_arm64.go to be ARM64-specific (!darwin && arm64)
- Create struct_notdarwin_amd64.go for non-Darwin AMD64 stub functions
- Move architecture-specific stub functions from struct_notdarwin.go to arch-specific files
  (shouldBundleStackArgs, structFitsInRegisters, collectStackArgs, bundleStackArgs)
- Remove generic estimateStackBytes stub from struct_notdarwin.go

This ensures each file has precise build constraints matching its actual
platform coverage and places architecture-specific stubs in the appropriate files.

Fixes Linux ARM64 and AMD64 compilation which previously failed with function
redeclaration errors due to missing build tags.
@tmc
Copy link
Contributor Author

tmc commented Nov 17, 2025

I think i've addressed all comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Apple ARM64 ABI for stack arguments

3 participants