Scheduler refactor and callbacks' first iteration first. #128

DtDamianGrzes · 2025-07-25T18:21:47Z

Fixed callbacks' first iteration bug with scheduler refactor to use APScheduler.
While analyzing the bug I realized we could simplify the scheduler implementation and use APScheduler, which handles ThreadPoolExecutors on its own. Scheduling jobs in intervals allowed to remove iterative scheduling of each consecutive callback in their implementation.

Bug description:
The _callback_iteration method performs the following tasks:

Submits the callback for execution to the ThreadPoolExecutor

Schedules the next iteration of the callback

Both tasks run in parallel in separate threads.

During the scheduling of the next iteration, the method relies on a start_timestamp, which is initially set when the callback instance is created. On the first execution of the callback, this timestamp is updated to the current time.

Since callback execution and iteration scheduling occur in parallel, there is a possibility of reading the original start_timestamp (set during instance creation) instead of its updated value after the first execution.

If the difference between the initial start_timestamp and the current time exceeds the configured interval between callback iterations, the next iteration might be scheduled in the past. This situation causes the callback method to execute immediately, which is unintended.

DtDamianGrzes · 2025-07-25T18:25:42Z

dynatrace_extension/sdk/extension.py

        if not self._is_fastcheck:
            try:
-                self._heartbeat_iteration()
+                # TODO: is it surely okay to shedule hearbeat this way? Originaly it was scheduled in the very same scheduler, which would starve heartbeat if any callback took too long


I have some doubts about this part. I feel like the heartbeat should have the lowest priority, but it doesn't make much sense if most callbacks have been run in separate threads.
For now and for me it seems to be okayish implementation to the previous one, but I'm curious about your opinion.

Copilot

Pull Request Overview

This PR refactors the scheduler implementation to use APScheduler instead of the standard library's sched module, which also fixes a race condition bug where callbacks were being scheduled for execution based on a potentially stale timestamp during the first iteration.

Replaced custom scheduler with APScheduler BackgroundScheduler and IntervalTrigger
Fixed race condition in callback first iteration by removing manual timestamp tracking
Updated tests to use the new scheduler API and removed threading test dependencies

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
dynatrace_extension/sdk/extension.py	Core scheduler refactor replacing sched with APScheduler and removing manual timestamp management
dynatrace_extension/sdk/callback.py	Removed manual iteration tracking and timestamp calculation methods
dynatrace_extension/sdk/status.py	Added mockable datetime function for testing
tests/sdk/test_extension.py	Updated tests to use new scheduler API and removed threading dependencies
tests/sdk/test_status.py	Updated test setup/teardown and scheduler calls for new API
tests/sdk/test_endpoints_sfm.py	Replaced freezegun with manual datetime mocking and updated scheduler usage
pyproject.toml	Removed freezegun dependency

Comments suppressed due to low confidence (3)

tests/sdk/test_extension.py:138

This comment indicates the test expectation changed due to implementation details. Consider if this test is still validating the intended behavior or if it needs adjustment.

        # call_count expected to be 2 because scheduled `query` method used the original, nonmocked _run_callback

tests/sdk/test_extension.py:236

The test logic has changed significantly from the original version. Verify that this still properly tests the callback scheduling behavior it was intended to validate.

        self.assertEqual(extension._scheduled_callbacks[2].executions_total, 1)

…PScheduler

dlopes7 · 2025-08-11T15:07:22Z

I am concerned about this, it is a major rewrite of the scheduling for very little benefit.

It also introduces a dependency, we have agreed that the SDK should not have any dependencies, this can introduce conflicts for extension dependencies.

DtDamianGrzes · 2025-08-12T08:25:46Z

Hey @dlopes7 , thanks for the comment, I partly understand your doubts. Let's split those into to separate discussions maybe.

I am concerned about this, it is a major rewrite of the scheduling for very little benefit.
I agree that's not a little change, but I also believe that, looking closely, it's a simplification of what we already did. sched doesn't provide a way to schedule callbacks in intervals. We do that on our own, additionally manually implementing queuing, holding and calculating "next timestamps", delegating jobs to their executors (which required additional level of callbacks) and so on. Altogether I saw that as something a bit overcomplicated and possibly prone to bugs (as in the described example). While reading the current version it seemed to me that's quite a lot of manual implementation for something rather simple and I expected there should be some module that provides that in a ready-to-use fashion. APScheduler seems to do exactly what we did manually and is quite commonly used.
Regarding the risk of this change, I'd say that if any issues occurred after merging this commit I'd opt for reverting it and trying to fix the issues. That would require rebuilding of extensions though, so I'm not sure if it's a good approach.

Let me know how all of that resonates with you. I'm not going to fight for this PR just for a sake of what's been already implemented.

It also introduces a dependency, we have agreed that the SDK should not have any dependencies, this can introduce conflicts for extension dependencies.

That's for sure something to consider, but I'm not really familiar with discussions you had. Could you shed some more light on the topic?

Please add anyone you want from DXS if you'd like involve someone into the discussion.

dlopes7 · 2025-08-12T12:46:29Z

We had discussions with @PiotrMeller and @MaciejGrynda at the beginning of the SDK development. What happened in EF1 is that the dependencies that were included with the RemotePluginModule eventually caused conflicts with extension dependencies.

That is one of the reasons why we don't use "requests" for instance to make HTTP calls, that would not only increase the bundle size but also cause people to either NOT declare requests as a dependency of their extension, or be unable to pin a different version of requests, or even worse break the SDK because they introduce a specific version of requests.

This dependency (APScheduler) is not as wide used as requests, but we are breaking an interesting rule of having no dependencies on the SDK to fix a minor inconvenience (the timestamp race condition), which I believe could be fixed directly in the code.

And although APSCheduler fixes some things we do manually it also does a ton of other things we will never use, seems like an overkill for a simple issue

DtDamianGrzes · 2025-08-12T13:34:58Z

@dlopes7 , thanks for the answer. I agree with this reasoning and will close the PR after sync with team.

DtDamianGrzes requested review from Copilot and afurche July 25, 2025 18:22

This comment was marked as outdated.

Sign in to view

DtDamianGrzes commented Jul 25, 2025

View reviewed changes

DtDamianGrzes force-pushed the feature/scheduler_refactor_and_callbacks_fix branch from 3e822d9 to 41dea63 Compare July 25, 2025 18:27

dynatrace-extensions deleted a comment from Copilot AI Jul 25, 2025

DtDamianGrzes requested a review from Copilot July 25, 2025 18:28

Copilot AI reviewed Jul 25, 2025

View reviewed changes

Fixed callbacks' first iteration bug with scheduler refactor to use A…

4560f98

…PScheduler

DtDamianGrzes force-pushed the feature/scheduler_refactor_and_callbacks_fix branch from 41dea63 to 4560f98 Compare July 25, 2025 18:32

dynatrace-extensions deleted a comment from Copilot AI Jul 25, 2025

Added apscheduler to dependencies

345fea4

Linter fixes

3bcd629

DtDamianGrzes force-pushed the feature/scheduler_refactor_and_callbacks_fix branch from b359f8a to 3bcd629 Compare August 11, 2025 15:36

DtDamianGrzes requested review from dlopes7, MaciejGrynda and pczapran August 12, 2025 08:27

DtDamianGrzes closed this Aug 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Scheduler refactor and callbacks' first iteration first. #128

Scheduler refactor and callbacks' first iteration first. #128

Uh oh!

DtDamianGrzes commented Jul 25, 2025

Uh oh!

This comment was marked as outdated.

Uh oh!

DtDamianGrzes Jul 25, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

dlopes7 commented Aug 11, 2025

Uh oh!

DtDamianGrzes commented Aug 12, 2025 •

edited

Loading

Uh oh!

dlopes7 commented Aug 12, 2025

Uh oh!

DtDamianGrzes commented Aug 12, 2025 •

edited

Loading

Uh oh!

Uh oh!

Scheduler refactor and callbacks' first iteration first. #128

Scheduler refactor and callbacks' first iteration first. #128

Uh oh!

Conversation

DtDamianGrzes commented Jul 25, 2025

Uh oh!

This comment was marked as outdated.

Uh oh!

DtDamianGrzes Jul 25, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

dlopes7 commented Aug 11, 2025

Uh oh!

DtDamianGrzes commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dlopes7 commented Aug 12, 2025

Uh oh!

DtDamianGrzes commented Aug 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

DtDamianGrzes commented Aug 12, 2025 •

edited

Loading

DtDamianGrzes commented Aug 12, 2025 •

edited

Loading