For 4.1.4: Fix channel number reuse (backport #14317) #14318

Merged
ansd merged 1 commit into v4.1.x from mergify/bp/v4.1.x/pr-14317 on Aug 4, 2025

@mergify mergify bot commented Jul 31, 2025

This commit fixes the following test flake that occurred in CI:

make -C deps/rabbit ct-amqp_dotnet t=cluster_size_1:redelivery

After receiving an end frame from the client, the server session process replies with its own end frame.

In the usual, passing case, the server connection process then receives a DOWN message for the session process and untracks its channel number, so that a subsequent begin frame for the same channel number creates a new session process in the server.

In the flake, however, the client receives the end frame and immediately pipelines new begin, attach, and flow frames on the same channel number. These frames arrive in the server connection's mailbox before the DOWN message from the monitor on the old session process. The connection therefore routes them to the old, terminating session process, and the test case fails.
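The ordering problem can be reproduced with a small deterministic model. The sketch below is Python, not the RabbitMQ code: the `Connection` class, the message tuples, and the `run` helper are all hypothetical and only mimic a connection process that routes frames by channel number and untracks a channel when it sees a DOWN for the owning session.

```python
class Connection:
    """Toy model of a connection process that routes frames by channel number."""

    def __init__(self):
        self.sessions = {0: 1}  # channel 0 is owned by old session 1, which has sent end
        self.next_id = 1
        self.log = []           # (frame, session id) routing decisions

    def handle(self, msg):
        if msg[0] == "DOWN":
            # Monitor fired: forget every channel mapped to the dead session.
            dead = msg[1]
            self.sessions = {c: s for c, s in self.sessions.items() if s != dead}
            return
        _, channel, frame = msg
        if frame == "begin" and channel not in self.sessions:
            # Untracked channel: begin creates a fresh session.
            self.next_id += 1
            self.sessions[channel] = self.next_id
        self.log.append((frame, self.sessions[channel]))


def run(mailbox):
    """Process the mailbox in FIFO order and return the routing log."""
    conn = Connection()
    for msg in mailbox:
        conn.handle(msg)
    return conn.log


new_frames = [("frame", 0, f) for f in ("begin", "attach", "flow")]

# Passing case: DOWN is seen first, so begin creates session 2.
passing = run([("DOWN", 1)] + new_frames)

# Flake: the pipelined frames overtake DOWN and land on old session 1.
flaky = run(new_frames + [("DOWN", 1)])
```

In this model `passing` routes all three frames to session 2, while `flaky` routes them to session 1, which is the misrouting described above.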

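This reveals a bug in the server.
This commit fixes the bug in the same way the AMQP 0.9.1 channel already handles it in https://github.com/rabbitmq/rabbitmq-server/blob/94b4a6aafdfac6b6cae102f50b188e5ea4a32c0e/deps/rabbit/src/rabbit_channel.erl#L1146-L1155: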

%% We issue the channel.close_ok response after a handshake with
%% the reader, the other half of which is ready_for_close. That
%% way the reader forgets about the channel before we send the
%% response (and this channel process terminates). If we didn't do
%% that, a channel.open for the same channel number, which a
%% client is entitled to send as soon as it has received the
%% close_ok, might be received by the reader before it has seen
%% the termination and hence be sent to the old, now dead/dying
%% channel process, instead of a new process, and thus lost.
ReaderPid ! {channel_closing, self()},
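The effect of that handshake can be sketched with a toy model. This is Python rather than the actual Erlang implementation, and it is an assumption-laden illustration: the `Reader` class and `close_with_handshake` function are hypothetical, and only the message names `channel_closing` and `ready_for_close` are taken from the comment above. The point it shows is that the reader forgets the channel before the client can learn that the channel is closed, so a pipelined reuse of the channel number cannot reach the old process.

```python
from collections import deque


class Reader:
    """Toy model of the reader: maps channel numbers to channel/session processes."""

    def __init__(self):
        self.channels = {0: "old"}  # channel 0 is owned by the closing process
        self.mailbox = deque()
        self.routed = []            # (frame, process name) routing decisions
        self.spawned = 0

    def step(self):
        msg = self.mailbox.popleft()
        if msg[0] == "channel_closing":
            # First half of the handshake: untrack the channel, then acknowledge.
            closing = msg[1]
            self.channels = {c: p for c, p in self.channels.items() if p != closing}
            return ("ready_for_close", closing)
        _, channel, frame = msg
        if frame == "begin" and channel not in self.channels:
            self.spawned += 1
            self.channels[channel] = f"new-{self.spawned}"
        self.routed.append((frame, self.channels[channel]))
        return None


def close_with_handshake(reader):
    # 1. The closing process notifies the reader before answering the client.
    reader.mailbox.append(("channel_closing", "old"))
    # 2. It waits for the reader's acknowledgement.
    reply = reader.step()
    assert reply == ("ready_for_close", "old")
    # 3. Only now is the close response sent, so the client's pipelined frames
    #    are necessarily queued after the channel has been untracked.
    for frame in ("begin", "attach", "flow"):
        reader.mailbox.append(("frame", 0, frame))
    while reader.mailbox:
        reader.step()
    return reader.routed


routed = close_with_handshake(Reader())
```

Here every reused-channel frame ends up on the newly created process `new-1`, without relying on when a monitor's DOWN message arrives.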

Channel reuse by the client is valid and in fact common: if channel-max is 0, for example, channel 0 is the only channel number available, so every new session must reuse it.


This is an automatic backport of pull request #14317 done by Mergify.

(cherry picked from commit 6413d2d)
@mergify mergify bot assigned ansd Jul 31, 2025
@michaelklishin michaelklishin added this to the 4.1.4 milestone Jul 31, 2025
@michaelklishin michaelklishin changed the title Fix channel number reuse (backport #14317) For 4.1.4: Fix channel number reuse (backport #14317) Jul 31, 2025
@ansd ansd merged commit 863f033 into v4.1.x Aug 4, 2025
815 of 818 checks passed
@ansd ansd deleted the mergify/bp/v4.1.x/pr-14317 branch August 4, 2025 07:21