-
Notifications
You must be signed in to change notification settings - Fork 419
feat: add periodic status logging and status_message_callback
parameter for customization
#1265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces periodic status logging and the ability to customize status messages via a new callback parameter. Key changes include:
- Adding the status_message_callback and status_message_logging_interval parameters to the BasicCrawler.
- Emitting a new CRAWLER_STATUS event and associated data type.
- Extending the event manager and updating logging configuration to support custom log levels.
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
File | Description |
---|---|
tests/unit/crawlers/_basic/test_basic_crawler.py | Added tests for the new status message callback and event emission. |
src/crawlee/events/_types.py, src/crawlee/events/_event_manager.py, src/crawlee/events/init.py | Introduced and exposed the new EventCrawlerStatusData and related overloads. |
src/crawlee/crawlers/_basic/_basic_crawler.py | Integrated periodic status logging via a RecurringTask and configurable callback along with corresponding documentation updates. |
src/crawlee/_log_config.py | Added a helper function (string_to_log_level) and updated log level configuration. |
Comments suppressed due to low confidence (2)
src/crawlee/crawlers/_basic/_basic_crawler.py:1590
- The docstring for 'status_message_callback' suggests that the callback should call 'crawler.setStatusMessage()' explicitly, but in _crawler_state_task the callback is invoked without any further logging. Consider clarifying the documentation or adjusting the implementation to ensure consistent behavior.
if self._status_message_callback:
tests/unit/crawlers/_basic/test_basic_crawler.py:1429
- [nitpick] The test defines a synchronous 'status_callback', but if real-world usage may require asynchronous processing for the status callback, it might be beneficial to either update the callback's signature or document the expectation clearly.
def status_callback(state: StatisticsState, previous_state: StatisticsState | None, message: str) -> None:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please resolve conflicts?
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last comment, otherwise LGTM
Description
status_message_logging_interval
parameter to configure the logging intervalstatus_message_callback
parameter to allow custom status message handlingCRAWLER_STATUS
Event
that emits status messages throughEventManager
Issues