-
Notifications
You must be signed in to change notification settings - Fork 8k
Bluetooth: Mesh: Make network msg cache netkey aware #96866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Bluetooth: Mesh: Make network msg cache netkey aware #96866
Conversation
61b34a8
to
f37629f
Compare
Improve the network message cache to be aware of network keys to prevent false duplicate detection across different subnets. This ensures that messages with the same source address and sequence number but from different network keys are not incorrectly identified as duplicates, as it can happen in certain cases. See ES-27446. The approach uses credential pointer which never changes during runtime of the node as a proxy for netkey identification. This approach is simplified but still solves the need and avoids major changes to msg_cache_add() function as NID/NetKey index information is not sufficiently unambiguous at the time when this API is called. Because the information is hidden in two sub[] arrays within rx context and to decipher this info (to arrive at correct NID/NetKey) you will need several checks to figure out which state of the key refresh procedure is active, and which friendship/LPN function is active, and exactly which credential is being used. Signed-off-by: Omkar Kulkarni <omkar.kulkarni@nordicsemi.no>
f37629f
to
b6c3dda
Compare
|
LGTM, please convert it to "Ready for review" |
@@ -402,6 +406,7 @@ static void bt_mesh_net_local(struct k_work *work) | |||
while ((node = sys_slist_get(&bt_mesh.local_queue))) { | |||
struct loopback_buf *buf = CONTAINER_OF(node, struct loopback_buf, node); | |||
struct bt_mesh_net_rx rx = { | |||
.cred = NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary.
C99 Standard, ISO/IEC 9899:1999, Section 6.7.8 (Initialization), paragraph 21:
"If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration."
|
||
return bt_mesh_net_decrypt(&cred->enc, out, BT_MESH_NET_IVI_RX(rx), | ||
proxy) == 0; | ||
int err = bt_mesh_net_decrypt(&cred->enc, out, BT_MESH_NET_IVI_RX(rx), proxy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move err
definition to the beginning of the function.
uint32_t src : 15, /* MSb of source is always 0 */ | ||
seq : 17; | ||
seq : 17; | ||
const struct bt_mesh_net_cred *cred; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When cache is quite big, we can run into this situation:
- device A receives message M from device B (in direct range) over subnet A with cred pointer A
- right after this, Configuration Manager removes device A from subnet A and adds to subnet B
- subnets in the code is an array, so the code will allocate first empty space in the array, which will be the address of subnet A.
- a subnet bridge device which is in range of device A (but not in range of device B), eventually with some delay receives message M from device B and bridges it over subnet B.
- device A receives the bridged message B over subnet B, it sees that SRC and SEQ are the same as for previously received message, cred pointer points to the same address and drops the message.
- the first message M over subnet A was rejected by device A because there is no appkey attached to the netkey of this subnet A. But message B sent over subnet B should have been processed because it has appkey attached to the netkey of that subnet B.
This a very theoretical case, but there is a risk of using pointer to the key here as it points to an address that can be reused. Just wanted to highlight this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using array pointer can also let a message pass the message cache if secondary key was used to decrypt message M for the first time, but the KR procedure completed, key pointers were swapped and now the same message M relayed with some delay will go in again because cred
pointers won't match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using array pointer can also let a message pass the message cache if secondary key was used to decrypt message M for the first time, but the KR procedure completed, key pointers were swapped and now the same message M relayed with some delay will go in again because
cred
pointers won't match.
I guess after swapping keys, deobfuscation gives garbage address and sequence number since privacy is swapped too. There will not be relaying rings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess after swapping keys, deobfuscation gives garbage address and sequence number since privacy is swapped too. There will not be relaying rings.
The same key will be used. It will be on a different place (keys[0]
instead of keys[1]
). But pointer will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is not issues of pointer usage. Network cache should be purged after removing subnetwork (friendship termination???). It prevents considering deprecated data for networking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Network cache should be purged after removing subnetwork (friendship termination???).
But this is not the case for key refresh.
Purging or updating message cache may work, but I guess there is some extra risk in this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, then storing subnetwork array index instead should work. It will solve swapping keys issue.
Additionally, BT_MESH_KEY_DELETED event handler in net.c should purge network cache.
It solves your described scenarios
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use NetKey index the same way here as pointers but instead of them? If we assume that the probability of generating SEQ and SRC as garbage is unlikely by any other key, then using netkey index is the same as using pointer to keys that were used to successfully decrypt the last message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems NetKey index should be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with NetKey index is arriving at it in the API in a way that does not break API is problematic (because of friendship and LPNs). I had made prototype with that first but discarded it for simplicity of this approach. Concern about using old pointer address as index is valid, but this really far-fetched case. The problem gets resolved as soon as a the same message with a new SEQ number arrives. So, even if message is incorrectly ignored at this case, the system continues function as expected on the next same (or different) model message from same source with different SEQ. What this solves is a systematic failure that always happens when stable network configuration exists and implementations are behaving in certain ways.
Improve the network message cache to be aware of network keys to prevent false duplicate detection across different subnets. This ensures that messages with the same source address and sequence number but from different network keys are not incorrectly identified as duplicates, as it can happen in certain cases. See ES-27446.
The approach uses credential pointer which never changes during runtime of the node as a proxy for netkey identification. This approach is simplified but still solves the need and avoids major changes to msg_cache_add() function as NID/NetKey index information is not sufficiently unambiguous at the time when this API is called. Because the information is hidden in two sub[] arrays within rx context and to decipher this info (to arrive at correct NID/NetKey) you will need several checks to figure out which state of the key refresh procedure is active, and which friendship/LPN function is active, and exactly which credential is being used.