For 4.1.4: Fix channel number reuse (backport #14317) #14318

mergify · 2025-07-31T11:54:51Z

This commit fixes the following test flake that occurred in CI:

make -C deps/rabbit ct-amqp_dotnet t=cluster_size_1:redelivery

After receiving an end frame, the server session proc replies with an end frame of its own.

Usually, when the test case succeeds, the server connection process receives a DOWN for the session proc and untracks its channel number, so that a subsequent begin frame for the same channel number creates a new session proc in the server.

In the flake, however, the client receives the end frame and immediately pipelines new begin, attach, and flow frames. These frames arrive in the server connection's mailbox before the monitor for the old session proc fires, so they are sent to the old session proc, causing the test case to fail.
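The two orderings described above can be modelled with a small, pure sketch. This is not RabbitMQ code: the module name, the event tuples (`{frame, Channel, begin_frame}`, `{down, Channel}`) and the integer session ids standing in for pids are all assumptions made for illustration. It only shows how the order of messages in the connection's mailbox decides which session a begin frame reaches.

```erlang
-module(channel_reuse_sketch).
-export([route/1, demo/0]).

%% Hypothetical model of a connection process that tracks
%% Channel => SessionId and handles its mailbox in FIFO order.
%% Events (all names are assumptions, not RabbitMQ's real messages):
%%   {frame, Channel, begin_frame} - a begin frame read from the socket
%%   {down, Channel}               - the monitor for the old session fired

%% Fold over the mailbox and return, in order, where each begin frame went.
route(Events) ->
    {_Sessions, _NextId, Log} =
        lists:foldl(fun handle/2, {#{}, 1, []}, Events),
    lists:reverse(Log).

handle({frame, Ch, begin_frame}, {Sessions, NextId, Log}) ->
    case Sessions of
        #{Ch := Old} ->
            %% Channel is still tracked: the frame reaches the old session.
            {Sessions, NextId, [{Ch, {stale_session, Old}} | Log]};
        _ ->
            %% Channel is free: a new session is created and tracked.
            {Sessions#{Ch => NextId}, NextId + 1,
             [{Ch, {new_session, NextId}} | Log]}
    end;
handle({down, Ch}, {Sessions, NextId, Log}) ->
    %% Only now does the connection forget the channel number.
    {maps:remove(Ch, Sessions), NextId, Log}.

demo() ->
    %% Usual case: DOWN is handled before the client's next begin frame.
    Usual = [{frame, 0, begin_frame}, {down, 0}, {frame, 0, begin_frame}],
    %% Flake: the pipelined begin frame overtakes the DOWN message.
    Flake = [{frame, 0, begin_frame}, {frame, 0, begin_frame}, {down, 0}],
    {route(Usual), route(Flake)}.
```

In this model, `channel_reuse_sketch:demo()` returns `{[{0,{new_session,1}},{0,{new_session,2}}], [{0,{new_session,1}},{0,{stale_session,1}}]}`: in the second ordering the reused channel number is still mapped to the old session when the new begin frame is handled.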

This reveals a bug in the server.
This commit fixes the bug in the same way as the AMQP 0.9.1 channel does in

rabbitmq-server/deps/rabbit/src/rabbit_channel.erl

Lines 1146 to 1155 in 94b4a6a

    %% We issue the channel.close_ok response after a handshake with
    %% the reader, the other half of which is ready_for_close. That
    %% way the reader forgets about the channel before we send the
    %% response (and this channel process terminates). If we didn't do
    %% that, a channel.open for the same channel number, which a
    %% client is entitled to send as soon as it has received the
    %% close_ok, might be received by the reader before it has seen
    %% the termination and hence be sent to the old, now dead/dying
    %% channel process, instead of a new process, and thus lost.
    ReaderPid ! {channel_closing, self()},

Channel reuse by the client is valid and actually common, e.g. if channel-max is 0.
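A minimal sketch of the handshake idea follows, assuming the fix mirrors the AMQP 0.9.1 pattern quoted above. The module, the message names (`session_closing`, `ready_for_close`, `begin_frame`, `end_frame`, `end_reply`, `session_started`) and the process layout are hypothetical and do not reflect the actual RabbitMQ implementation. The point is only the ordering: the session asks the connection to forget the channel before it replies to the client, so a begin frame that reuses the channel number cannot reach the old session.

```erlang
-module(close_handshake_sketch).
-export([demo/0]).

%% Toy connection process: tracks Channel => SessionPid.
%% All message names here are assumptions for illustration only.
conn_loop(Sessions) ->
    receive
        {begin_frame, Ch, Client} ->
            case Sessions of
                #{Ch := Pid} ->
                    %% Channel still tracked: the old session would get the frame.
                    Client ! {session_started, Ch, Pid},
                    conn_loop(Sessions);
                _ ->
                    Conn = self(),
                    {Pid, _Ref} =
                        spawn_monitor(fun() -> session(Conn, Ch, Client) end),
                    Client ! {session_started, Ch, Pid},
                    conn_loop(Sessions#{Ch => Pid})
            end;
        {session_closing, Ch, Pid} ->
            %% Forget the channel first, then let the session reply.
            Pid ! ready_for_close,
            conn_loop(maps:remove(Ch, Sessions));
        {'DOWN', _Ref, process, _Pid, _Reason} ->
            %% The channel was already untracked during the handshake.
            conn_loop(Sessions)
    end.

%% Toy session process: replies to the client only after the handshake.
session(Conn, Ch, Client) ->
    receive end_frame -> ok end,
    Conn ! {session_closing, Ch, self()},
    receive ready_for_close -> ok end,
    Client ! {end_reply, Ch}.

demo() ->
    Client = self(),
    Conn = spawn(fun() -> conn_loop(#{}) end),
    Conn ! {begin_frame, 0, Client},
    Pid1 = receive {session_started, 0, P1} -> P1 end,
    Pid1 ! end_frame,
    receive {end_reply, 0} -> ok end,
    %% The client reuses channel 0 as soon as it has seen the end reply.
    Conn ! {begin_frame, 0, Client},
    Pid2 = receive {session_started, 0, P2} -> P2 end,
    %% Clean up the toy processes.
    exit(Pid2, kill),
    exit(Conn, kill),
    Pid1 =/= Pid2.
```

Here `close_handshake_sketch:demo()` returns `true`: because the connection removes the channel before the session sends its reply, the second begin frame for channel 0 always starts a new session, regardless of when the DOWN message arrives.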

This is an automatic backport of pull request #14317 done by Mergify.

(cherry picked from commit 6413d2d)

mergify bot assigned ansd Jul 31, 2025

michaelklishin added this to the 4.1.4 milestone Jul 31, 2025

michaelklishin changed the title ~~Fix channel number reuse (backport #14317)~~ For 4.1.4: Fix channel number reuse (backport #14317) Jul 31, 2025

ansd merged commit 863f033 into v4.1.x Aug 4, 2025
815 of 818 checks passed

ansd deleted the mergify/bp/v4.1.x/pr-14317 branch August 4, 2025 07:21
