-
Notifications
You must be signed in to change notification settings - Fork 5
Scheduler refactor and callbacks' first iteration first. #128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dynatrace_extension/sdk/extension.py
Outdated
if not self._is_fastcheck: | ||
try: | ||
self._heartbeat_iteration() | ||
# TODO: is it surely okay to shedule hearbeat this way? Originaly it was scheduled in the very same scheduler, which would starve heartbeat if any callback took too long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some doubts about this part. I feel like the heartbeat should have the lowest priority, but it doesn't make much sense if most callbacks have been run in separate threads.
For now and for me it seems to be okayish implementation to the previous one, but I'm curious about your opinion.
3e822d9
to
41dea63
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the scheduler implementation to use APScheduler instead of the standard library's sched
module, which also fixes a race condition bug where callbacks were being scheduled for execution based on a potentially stale timestamp during the first iteration.
- Replaced custom scheduler with APScheduler BackgroundScheduler and IntervalTrigger
- Fixed race condition in callback first iteration by removing manual timestamp tracking
- Updated tests to use the new scheduler API and removed threading test dependencies
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
dynatrace_extension/sdk/extension.py | Core scheduler refactor replacing sched with APScheduler and removing manual timestamp management |
dynatrace_extension/sdk/callback.py | Removed manual iteration tracking and timestamp calculation methods |
dynatrace_extension/sdk/status.py | Added mockable datetime function for testing |
tests/sdk/test_extension.py | Updated tests to use new scheduler API and removed threading dependencies |
tests/sdk/test_status.py | Updated test setup/teardown and scheduler calls for new API |
tests/sdk/test_endpoints_sfm.py | Replaced freezegun with manual datetime mocking and updated scheduler usage |
pyproject.toml | Removed freezegun dependency |
Comments suppressed due to low confidence (3)
tests/sdk/test_extension.py:138
- This comment indicates the test expectation changed due to implementation details. Consider if this test is still validating the intended behavior or if it needs adjustment.
# call_count expected to be 2 because scheduled `query` method used the original, nonmocked _run_callback
tests/sdk/test_extension.py:236
- The test logic has changed significantly from the original version. Verify that this still properly tests the callback scheduling behavior it was intended to validate.
self.assertEqual(extension._scheduled_callbacks[2].executions_total, 1)
41dea63
to
4560f98
Compare
I am concerned about this, it is a major rewrite of the scheduling for very little benefit. It also introduces a dependency, we have agreed that the SDK should not have any dependencies, this can introduce conflicts for extension dependencies. |
b359f8a
to
3bcd629
Compare
Hey @dlopes7 , thanks for the comment, I partly understand your doubts. Let's split those into to separate discussions maybe.
Let me know how all of that resonates with you. I'm not going to fight for this PR just for a sake of what's been already implemented.
That's for sure something to consider, but I'm not really familiar with discussions you had. Could you shed some more light on the topic? Please add anyone you want from DXS if you'd like involve someone into the discussion. |
We had discussions with @PiotrMeller and @MaciejGrynda at the beginning of the SDK development. What happened in EF1 is that the dependencies that were included with the RemotePluginModule eventually caused conflicts with extension dependencies. That is one of the reasons why we don't use "requests" for instance to make HTTP calls, that would not only increase the bundle size but also cause people to either NOT declare requests as a dependency of their extension, or be unable to pin a different version of requests, or even worse break the SDK because they introduce a specific version of requests. This dependency (APScheduler) is not as wide used as requests, but we are breaking an interesting rule of having no dependencies on the SDK to fix a minor inconvenience (the timestamp race condition), which I believe could be fixed directly in the code. And although APSCheduler fixes some things we do manually it also does a ton of other things we will never use, seems like an overkill for a simple issue |
@dlopes7 , thanks for the answer. I agree with this reasoning and will close the PR after sync with team. |
Fixed callbacks' first iteration bug with scheduler refactor to use APScheduler.
While analyzing the bug I realized we could simplify the scheduler implementation and use APScheduler, which handles ThreadPoolExecutors on its own. Scheduling jobs in intervals allowed to remove iterative scheduling of each consecutive callback in their implementation.
Bug description:
The _callback_iteration method performs the following tasks:
Submits the callback for execution to the ThreadPoolExecutor
Schedules the next iteration of the callback
Both tasks run in parallel in separate threads.
During the scheduling of the next iteration, the method relies on a start_timestamp, which is initially set when the callback instance is created. On the first execution of the callback, this timestamp is updated to the current time.
Since callback execution and iteration scheduling occur in parallel, there is a possibility of reading the original start_timestamp (set during instance creation) instead of its updated value after the first execution.
If the difference between the initial start_timestamp and the current time exceeds the configured interval between callback iterations, the next iteration might be scheduled in the past. This situation causes the callback method to execute immediately, which is unintended.