
Conversation

@A-Henning

The plugin was unconditionally exiting when connected to secondary MongoDB nodes, preventing any piggyback data generation and causing the loss of all MongoDB services on the dummy host.

Fix: Only exit if there is truly no primary node available.


Expected vs Observed Behavior

Expected: Plugin should generate piggyback data for MongoDB monitoring services even when connected to a secondary node, as long as a primary exists in the replica set.

Observed: Plugin exits early via an unconditional return statement when connected to a secondary node, preventing any piggyback data generation and causing the complete loss of all MongoDB services on the dummy host.

Operating System

Ubuntu 20.04/22.04 LTS (CheckMK agent host)
CheckMK version: 2.4.0p7.cce

Local Setup

  • MongoDB Atlas cluster behind Azure Private Endpoint
  • Load balancer (managed by MongoDB Atlas) routing connections to secondary MongoDB nodes
  • CheckMK agent with mk_mongodb.py plugin for piggybacking

Reproduce (routing is managed by MongoDB Atlas, so it's only reproducible if you are being routed to a secondary node)

  1. Set up MongoDB Atlas cluster with Azure Private Endpoint
  2. Configure CheckMK mk_mongodb.py plugin to connect via Private Endpoint
  3. MongoDB Atlas Load Balancer may route the connection to a secondary MongoDB node (the snippet after this list shows how to verify the node's role)
  4. Plugin exits early on line 985 without generating piggyback data
  5. All MongoDB services disappear from the target dummy host
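
To confirm where the connection actually landed, you can ask the server for its role (a minimal sketch using pymongo; the URI is a placeholder for the Atlas Private Endpoint connection string):

    from pymongo import MongoClient

    # Placeholder URI: substitute the Atlas Private Endpoint connection string.
    client = MongoClient("mongodb://pe-host.example:27017/")
    # "hello" reports the node's role ("isMaster" on servers before 4.4.2).
    info = client.admin.command("hello")
    # On a secondary, isWritablePrimary is False and secondary is True.
    print(info.get("isWritablePrimary"), info.get("secondary"))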

Root Cause

Line 985 contains return without checking if a primary exists in the replica set.

Solution

Replace unconditional return with conditional logic that only exits if no primary is available.
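
In code terms, the change looks roughly like this (simplified sketch based on the diff discussed below; repl_info holds the replica-set status the plugin already queries):

    # Before (simplified): the return is outside the if block,
    # so the plugin exits even when a primary exists.
    if "primary" in repl_info and not repl_info.get("primary"):
        _write_section_replica(None)
    return

    # After (as proposed in this PR): return only if there is truly no primary.
    if "primary" in repl_info and not repl_info.get("primary"):
        _write_section_replica(None)
    # Fixed: Only return if there is truly no primary
    if "primary" in repl_info and not repl_info.get("primary"):
        return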

Changes

  • Modified line 985 in agents/plugins/mk_mongodb.py
  • Added condition to check for primary existence before returning
  • Allows monitoring to continue from secondary nodes when primary is available

Testing

  • Tested with MongoDB Atlas behind Azure Private Endpoint
  • Verified piggyback data generation from secondary nodes
  • Confirmed all services appear correctly on target hosts

Impact

Fixes MongoDB Atlas monitoring for users with:

  • Azure Private Endpoints
  • MongoDB Atlas load balancers routing to secondary nodes
  • Any setup where connection lands on secondary first

@github-actions

github-actions bot commented Aug 28, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@A-Henning
Author

A-Henning commented Aug 28, 2025

I have read the CLA Document and I hereby sign the CLA or my organization already has a signed CLA.

@BenediktSeidl
Contributor

Thanks for your contribution! I'm trying to fully understand your situation: If I read the code and the comment correctly, the primary node vanishes completely, is this correct? So it's not just that the load balancer switches to another node, but the primary node is removed from the cluster?

@A-Henning
Author

> Thanks for your contribution! I'm trying to fully understand your situation: If I read the code and the comment correctly, the primary node vanishes completely, is this correct? So it's not just that the load balancer switches to another node, but the primary node is removed from the cluster?

The node does not vanish from the cluster. The load balancer switches to a secondary node, which causes the MongoDB plugin to exit early. It therefore does not generate piggyback data for my MongoDB dummy host, which results in the loss of all CheckMK services that this plugin provides.
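
For context: the agent wraps piggyback sections in four-bracket host markers so the Checkmk server assigns them to the dummy host rather than the agent host; when the plugin returns early, none of this is written. Illustrative output (host and section names are examples):

    <<<<mongodb-dummy-host>>>>
    <<<mongodb_instance>>>
    ...
    <<<<>>>>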

@BenediktSeidl
Contributor

Thanks for the clarification. We will have an internal discussion about this, and then come back to you.

Comment on lines 983 to +987

 if "primary" in repl_info and not repl_info.get("primary"):
     _write_section_replica(None)
-return
+# Fixed: Only return if there is truly no primary
+if "primary" in repl_info and not repl_info.get("primary"):
+    return
Contributor


Instead of duplicating the if check, how about just indenting the return statement?

Suggested change

-if "primary" in repl_info and not repl_info.get("primary"):
-    _write_section_replica(None)
-# Fixed: Only return if there is truly no primary
-if "primary" in repl_info and not repl_info.get("primary"):
-    return
+if "primary" in repl_info and not repl_info.get("primary"):
+    _write_section_replica(None)
+    return

@BenediktSeidl
Contributor

@A-Henning thanks for your interest in improving Checkmk. We discussed this internally (and briefly lost the ticket we assigned this task to, so sorry about the delay) and came to the conclusion that the current change would alter the behavior of the MongoDB agent for all our customers: before your change, only the primary node outputs the full data; after your change, the other nodes would return the complete information as well.

Also, after your modifications the agent crashes for arbiterOnly nodes in agents/plugins/mk_mongodb.py, line 72, in get_database_info (db_names = client.list_database_names()):

    pymongo.errors.NotPrimaryError: node is not in primary or recovering state, full error: {'ok': 0.0, 'errmsg': 'node is not in primary or recovering state', 'code': 13436, 'codeName': 'NotPrimaryOrSecondary'}

In order to accept your changes, you have to make sure that the behavior for existing customers does not change and that the agent plugin does not crash on arbiter nodes.
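
One direction that would satisfy both constraints (a sketch only, assuming repl_info carries the hello/isMaster reply; not an accepted implementation):

    # Emit the full sections only where the current agent does (the primary),
    # and never on arbiters, which hold no data and make
    # client.list_database_names() raise NotPrimaryError.
    def should_emit_full_sections(repl_info):
        if repl_info.get("arbiterOnly"):
            return False
        # "ismaster" (isMaster) / "isWritablePrimary" (hello) is True only on the primary.
        return bool(repl_info.get("ismaster") or repl_info.get("isWritablePrimary"))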

github-actions bot locked and limited conversation to collaborators on Nov 13, 2025