Skip to content

Conversation

michalismeng
Copy link
Contributor

@michalismeng michalismeng commented Aug 26, 2025

What's Changed

  1. Add methods to query MCP server health endpoints:
    • Health info of all servers
    • Health info of a single server
    • Is-healthy check
  2. Cache server health responses using a TTL cache.

Notes

  • Added the cachetools library for the caching mechanism (see https://github.com/tkem/cachetools/). The library is actively maintained, MIT-licensed and requires Python 3.9+, so it seems ok for use with the mcpd SDK.
  • Let's address "health checks as pre-requisites for other calls" in a follow up PR, since this is already a lengthy one.

Refs #11

@michalismeng michalismeng marked this pull request as ready for review September 8, 2025 07:36
Copy link
Contributor

@peteski22 peteski22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @michalismeng I added a set of comments/recommendations I think we need to change, but this is a great effort so far. If you're OK with checking out the feedback and doing some updates I think this is definitely something that will help the SDK users.

🙏🏼

@peteski22
Copy link
Contributor

@michalismeng I should add: please feel free to discuss things on the PR comments, tag me for re-reviews, ask for help etc.

I'd be happy to help contribute to your PR/branch if you haven't got the bandwidth at the moment, as this is good and super helpful work! 😄 just let me know.

@michalismeng
Copy link
Contributor Author

@peteski22 Thanks for your detailed comments and review! Indeed I was a bit busy last week. I think I've addressed most of your comments.

Regarding

Start local/small
I don't think we need these utility methods yet, so let's start with them private and part of McpdClient?

My initial thought of introducing the cache_utils module was because the utilities weren't directly related to McpdClient. But I don't have a strong opinion, so I moved them to there (unfortunately it becomes a bit tricky to handle decorators in classes).

Let me know if you'd like any more changes and also feel free to make any improvements as you see fit!

@peteski22
Copy link
Contributor

@peteski22 Thanks for your detailed comments and review! Indeed I was a bit busy last week. I think I've addressed most of your comments.

Regarding

Start local/small
I don't think we need these utility methods yet, so let's start with them private and part of McpdClient?

My initial thought of introducing the cache_utils module was because the utilities weren't directly related to McpdClient. But I don't have a strong opinion, so I moved them to there (unfortunately it becomes a bit tricky to handle decorators in classes).

Let me know if you'd like any more changes and also feel free to make any improvements as you see fit!

Thanks again @michalismeng, much appreciated. I’ve forked your fork and opened a PR with a few small tweaks so we don’t end up in long review loops while you’re busy. If it looks good, we can merge it into your repo, then this PR will be ready to go.

peteski22 and others added 7 commits September 18, 2025 13:01
* Move assertions outside pytest.raises() context to ensure execution
* Fix assertion to check exception type instead of ExceptionInfo instance
* Add length check to ensure test covers all cacheable exceptions
* Make patching consistent with rest of file using @patch.object decorator
* Add mock reset between test iterations
* Add _SERVER_HEALTH_CACHE_MAXSIZE constant for maintainability
* Add proper docstrings for both class constants
* Add test to verify constant value
* Follows same pattern as _CACHEABLE_EXCEPTIONS
* Add comprehensive test coverage for is_transient() method
* Test both transient (timeout, unknown) and non-transient (ok, unreachable) states
* Also improve test coverage for is_healthy() method
* Add usage example in server_health() docstring showing retry logic pattern
* Add complete docstring for _raise_for_server_health() with Args and Raises sections
* Enhance _get_server_health() docs with reference to public API
* Improve _get_tool_definitions() with comprehensive exception documentation
* Helps developers working on public APIs understand private method contracts
* Only catch ServerUnhealthyError and ServerNotFoundError as 'not healthy'
* Let connection/auth/timeout errors propagate to caller
* Update docstring to document which exceptions can be raised
* Update test to verify new error propagation behavior

This improves error visibility - infrastructure issues (connection, auth) are
now distinguishable from server health status issues.
* Add threading.RLock to protect cache access
* Pass lock to cached decorator for synchronized operations
* Wrap cache.clear() and cache.pop() operations with lock
* Document thread safety in class docstring

The performance impact is negligible since network I/O dominates
execution time. Cache stampede risk is minimal due to 10-second TTL
and typically small number of MCP servers.
@michalismeng
Copy link
Contributor Author

michalismeng commented Sep 18, 2025

Thank you as well @peteski22! Everything looks good, I just added an additional test case for test_is_healthy_error.

Copy link
Contributor

@peteski22 peteski22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work on this @michalismeng 🙏🏼

LGTM

@peteski22 peteski22 merged commit b48657b into mozilla-ai:main Sep 18, 2025
7 checks passed
@peteski22
Copy link
Contributor

@michalismeng 🥳 https://github.com/mozilla-ai/mcpd-sdk-python/releases/tag/0.0.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants