Skip to content

Conversation

@fearful-symmetry
Copy link
Contributor

Tinkering with an alternative to elastic/beats#42933, by trying to drain the socket on close if we get an error while trying to unset the pid.

This is meant as an alternative fix for elastic/beats#26031, where we do a non-blocking write and try to drain the socket instead of starting a second goroutine.

In draft, as I'm still testing and figuring out how netlink handles errors.

@fearful-symmetry fearful-symmetry self-assigned this Mar 10, 2025
@haesbaert
Copy link

Our assumption is that send blocks because netlink is congested, so it stops reading from the send buffer until we drain the receive buffer, first I wanted to confirm this:

The path is:
userland does sendmsg(2):
in kernel:
netlink_sendmsg -> netlink_unicast -> netlink_attachskb
https://elixir.bootlin.com/linux/v4.18.20/source/net/netlink/af_netlink.c#L1217-L1255

See comments inline

int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
		      long *timeo, struct sock *ssk)
{
	struct netlink_sock *nlk;

	nlk = nlk_sk(sk);

        /* haesbaert: if netlink is having a hard time */
	if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
	     test_bit(NETLINK_S_CONGESTED, &nlk->state))) {
		DECLARE_WAITQUEUE(wait, current);
                /* haesbaert: if there is no timeout (case for nonblocking) return EAGAIN */
		if (!*timeo) {
			if (!ssk || netlink_is_kernel(ssk))
				netlink_overrun(sk);
			sock_put(sk);
			kfree_skb(skb);
			return -EAGAIN;
		}

                /* haesbaert: otherwise block for *timeo */
		__set_current_state(TASK_INTERRUPTIBLE);
		add_wait_queue(&nlk->wait, &wait);

                /* haesbaert: interlocked in queue, recover from the sleep race */
		if ((atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf ||
		     test_bit(NETLINK_S_CONGESTED, &nlk->state)) &&
		    !sock_flag(sk, SOCK_DEAD))
			*timeo = schedule_timeout(*timeo); /* haesbaert: actual sleep is here */

		__set_current_state(TASK_RUNNING);
		remove_wait_queue(&nlk->wait, &wait);
		sock_put(sk);

		if (signal_pending(current)) {
			kfree_skb(skb);
			return sock_intr_errno(*timeo);
		}
		return 1;
	}
	netlink_skb_set_owner_r(skb, sk);
	return 0;
}

So on our case the socket is blocking, and default timeout is MAX_SCHEDULE_TIMEOUT:
https://elixir.bootlin.com/linux/v4.18.20/source/net/core/sock.c#L2807
which is infinity.

We can verify this by doing a getopt SO_SNDTIME@SOL_SOCKET, if timeout is zero, it's actually infinity (sockgetopt in the kernel translates this for us):

	fd = socket(PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_AUDIT);
	if (fd == -1)
		err(1, "socket");
	optlen = sizeof(tv);
	bzero(&tv, sizeof(tv));
	if (getsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, &optlen) == -1)
		err(1, "getsockopt");
	printf("get %ld:%ld (optlen=%d/%zd)\n",
	    tv.tv_sec, tv.tv_usec, optlen, sizeof(tv));

produces: get 0:0 (optlen=16/16)

So the sender blocks when netlink is congested, so if we drain the receive buffer it should recover.

Copy link

@haesbaert haesbaert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments.

The loop is missing some error checking and it's unbound, it's also requesting for ACKs when it shouldn't.

I took a stab, it compiles but I didn't test it, I don't see a season to keep trying to send multiple times, try once, if it got EAGAIN, drain, try again, done. I wrote this, feel free to use it or not:

	sendit := func(msg syscall.NetlinkMessage) error {
	again:
		_, err := c.Netlink.SendNoWait(msg)
		if errors.Is(err, syscall.EINTR) {
			goto again
		}
		return err
	}

	// If our request to unset the PID would block, then try to
	// drain events from the netlink socket, resend, try again.
	// In netlink, EAGAIN on the send side means the kernel
	// stopped reading the send buffer since the receive buffer is
	// full.
	err := sendit(msg)
	if !errors.Is(err, syscall.EAGAIN) {
		return
	}
	// Drain until drainMax or we get EAGAIN
	for drainMax := 10000; drainMax > 0; drainMax-- {
		_, err = c.Netlink.Receive(true, noParse)
		if err == nil || errors.Is(err, syscall.EINTR) || errors.Is(err, syscall.ENOBUFS) {
			continue
		} else if errors.Is(err, syscall.EAGAIN) {
			break
		}

		return
	}
	_ = sendit(msg)

Copy link

@haesbaert haesbaert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a return for other errors of send, otherwise this looks fine.

@haesbaert
Copy link

LGTM

@fearful-symmetry fearful-symmetry marked this pull request as ready for review March 12, 2025 21:28
audit.go Outdated
// if we get an interrupt, retry the send
if err == nil {
return nil
} else if errors.Is(err, syscall.EINTR) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(*NetlinkClient).SendNoWait is a thin wrapper around syscall.SendTo. No functions in syscall ever wrap errors and given go1compat, this will not change (go1compat says that syscall is not covered, but this relates to changes forced by changes in the underlying OS and any change to types returned by syscalls would neither be likely from the Go team, nor quietly accepted by external reviewer).

tl;dr; No errors.Is is required here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, even if syscall.* isn't going to be wrapping anything, I feel like it's safer to assume that c.Netlink.* might wrap something in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it shouldn't, but it feels a little safer to be defensive. Also the linter complains if you don't use errors.Is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preferred approach would be to ensure that the behaviour is covered by tests, and then leave it without the extra weight. This would find regressions and ensure that bloat is not added to SendNoWait. I think the linter is wrong here. However, I'm not an owner, so it doesn't really matter.

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @fearful-symmetry

@andrewkroh andrewkroh added the Team:Security-Linux Platform Linux Platform Team in Security Solution label Mar 17, 2025
@fearful-symmetry fearful-symmetry merged commit 1916e11 into elastic:main Mar 18, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Team:Security-Linux Platform Linux Platform Team in Security Solution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants