Skip to content

feat: add periodic status logging and status_message_callback parameter for customization #1265

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jul 2, 2025

Conversation

Mantisus
Copy link
Collaborator

Description

  • Add periodic logging of the current crawler status
  • Add status_message_logging_interval parameter to configure the logging interval
  • Add status_message_callback parameter to allow custom status message handling
  • Add a new CRAWLER_STATUS Event that emits status messages through EventManager

Issues

@Mantisus Mantisus self-assigned this Jun 24, 2025
@Mantisus Mantisus requested a review from Copilot June 24, 2025 00:17
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces periodic status logging and the ability to customize status messages via a new callback parameter. Key changes include:

  • Adding the status_message_callback and status_message_logging_interval parameters to the BasicCrawler.
  • Emitting a new CRAWLER_STATUS event and associated data type.
  • Extending the event manager and updating logging configuration to support custom log levels.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

File Description
tests/unit/crawlers/_basic/test_basic_crawler.py Added tests for the new status message callback and event emission.
src/crawlee/events/_types.py, src/crawlee/events/_event_manager.py, src/crawlee/events/init.py Introduced and exposed the new EventCrawlerStatusData and related overloads.
src/crawlee/crawlers/_basic/_basic_crawler.py Integrated periodic status logging via a RecurringTask and configurable callback along with corresponding documentation updates.
src/crawlee/_log_config.py Added a helper function (string_to_log_level) and updated log level configuration.
Comments suppressed due to low confidence (2)

src/crawlee/crawlers/_basic/_basic_crawler.py:1590

  • The docstring for 'status_message_callback' suggests that the callback should call 'crawler.setStatusMessage()' explicitly, but in _crawler_state_task the callback is invoked without any further logging. Consider clarifying the documentation or adjusting the implementation to ensure consistent behavior.
        if self._status_message_callback:

tests/unit/crawlers/_basic/test_basic_crawler.py:1429

  • [nitpick] The test defines a synchronous 'status_callback', but if real-world usage may require asynchronous processing for the status callback, it might be beneficial to either update the callback's signature or document the expectation clearly.
    def status_callback(state: StatisticsState, previous_state: StatisticsState | None, message: str) -> None:

@Mantisus Mantisus requested review from janbuchar and vdusek June 24, 2025 00:33
@Mantisus Mantisus requested a review from janbuchar June 24, 2025 23:03
@janbuchar janbuchar self-requested a review June 25, 2025 12:48
@Mantisus Mantisus requested a review from vdusek June 27, 2025 22:47
Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please resolve conflicts?

Mantisus and others added 2 commits July 2, 2025 11:12
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
@Mantisus Mantisus requested a review from vdusek July 2, 2025 08:22
Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last comment, otherwise LGTM

@vdusek vdusek merged commit b992fb2 into apify:master Jul 2, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BasicCrawler status logging
3 participants