[Feature Request] Dynamically Update Environment Parameters (and Reset Transforms) in MultiaSyncDataCollector Workers #2896
Comments
Hello! cc @Darktex re: the discussion on how to handle the weight sync within the collector.
Hi, @vmoens Thank you for your response—I’m really encouraged to hear that you’re working on a weight updater API. This approach seems promising in avoiding the need to shut down and restart the entire MultiaSyncDataCollector when updating environment configurations. I’d also like to share some additional performance data I gathered with different worker counts. For clarity, I’ve organized the numbers into the following table:
One aspect I find somewhat puzzling is that when using a higher number of workers, the data collection step after switching the environment takes noticeably longer. I would have expected better scalability with more workers. This could indicate some additional overheads—perhaps related to inter-worker communication, synchronization, or resource contention—that might be affecting performance. I also want to clarify that these experimental results were obtained by forcibly injecting messages into the workers’ communication channels. Here’s a brief explanation of the approach I used:
A few points I’d like to clarify further:
Could you also share any additional details on the API's progress or expected timeline, or point me to any early documentation/examples? I'm eager to test these changes and provide further feedback if needed. Thank you again for your hard work on addressing these issues; I look forward to hearing more!
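For concreteness, the message-injection idea mentioned above (sending commands to collector workers over their communication channels) can be sketched with a stdlib-only analogue. This is an illustrative sketch, not the author's actual code: the message names (`"update_config"`, `"collect"`) are invented here, a thread stands in for a worker process, and `queue.Queue` stands in for a `multiprocessing` pipe.

```python
import threading
import queue

def worker_loop(cmd_q, out_q, env):
    # Stand-in for a collector worker's message loop. A real collector
    # worker is a separate process reading commands from a pipe; the
    # protocol shape is the same: dispatch on a (message, payload) tuple.
    while True:
        msg, payload = cmd_q.get()
        if msg == "collect":
            # Pretend rollout: report the env parameter currently in effect.
            out_q.put(("data", env["difficulty"]))
        elif msg == "update_config":
            # In-place env update: no process restart needed.
            env.update(payload)
            out_q.put(("ok", None))
        elif msg == "close":
            break

env = {"difficulty": 1}
cmd_q, out_q = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker_loop, args=(cmd_q, out_q, env))
t.start()

cmd_q.put(("collect", None))
first = out_q.get()                               # ("data", 1)
cmd_q.put(("update_config", {"difficulty": 5}))
ack = out_q.get()                                 # ("ok", None)
cmd_q.put(("collect", None))
second = out_q.get()                              # ("data", 5)
cmd_q.put(("close", None))
t.join()
```

The point of the sketch is that the update is an extra message type in the existing worker loop; doing this against a real collector requires the worker loop itself to understand the new message, which is exactly what the feature request below asks to be supported officially.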
Motivation
Currently, training paradigms that require dynamically changing environment configuration parameters during a single training run (e.g., curriculum learning, adaptive difficulty, or switching between environment variants/tasks within a bandit framework) face significant performance bottlenecks when using `MultiaSyncDataCollector`.

My specific problem is that I need to periodically update a configuration parameter within the custom Gymnasium environment instances running on the worker processes. The only reliable way to achieve this currently is to shut down the entire `MultiaSyncDataCollector` and create a new one with an updated `create_env_fn`. This incurs a substantial time cost (tens of seconds per configuration switch in my case), making training loops with frequent updates impractically slow. The frustrating part is that the core computation (environment steps and policy inference) is fast, but the infrastructure management (process shutdown/restart) dominates the wall-clock time during these transitions.

Solution
I propose adding a mechanism to `torchrl`, specifically for `MultiaSyncDataCollector` (and potentially other parallel collectors), that allows users to:

1. Update environment parameters in place: send new configuration values to the environment instances running in the worker processes (e.g., via an `env.update_config(new_param=value)` method).
2. Reset stateful transforms: trigger transforms (`ObservationNorm`, `Compose`, etc.) within the worker environments to re-initialize their state based on the new environment configuration. This might involve allowing users to specify which transform methods (e.g., `transform.init_stats()`, a custom `transform.reset_state()`) should be called after the environment parameters are updated.

This would allow efficient, in-place updates of the environment setup across all workers, eliminating the costly shutdown/restart cycle.
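The two steps above can be sketched with toy stand-ins. Everything here is hypothetical: `update_env_config`, `reset_state`, and the classes are invented names illustrating the desired call pattern, not existing `torchrl` APIs.

```python
class ToyObservationNorm:
    # Stand-in for a stateful normalization transform.
    def __init__(self):
        self.loc, self.scale = 0.0, 1.0

    def reset_state(self, env):
        # Re-derive normalization stats for the new env configuration
        # (a real ObservationNorm would re-collect samples, e.g. via
        # init_stats; here we just read the toy env's config).
        self.loc, self.scale = env.config["mean"], env.config["std"]

class ToyEnv:
    # Stand-in for a worker-side environment with mutable config.
    def __init__(self, config):
        self.config = dict(config)

    def update_config(self, **new_params):
        self.config.update(new_params)

class ToyCollector:
    # Stand-in for the collector; real workers live in other processes.
    def __init__(self, envs, transforms):
        self.envs, self.transforms = envs, transforms

    def update_env_config(self, **new_params):
        # Desired behavior: push new params to every worker env, then
        # let stateful transforms rebuild their state -- no restart.
        for env in self.envs:
            env.update_config(**new_params)
        for env, tf in zip(self.envs, self.transforms):
            tf.reset_state(env)

envs = [ToyEnv({"mean": 0.0, "std": 1.0}) for _ in range(2)]
tfs = [ToyObservationNorm() for _ in range(2)]
collector = ToyCollector(envs, tfs)
collector.update_env_config(mean=3.0, std=2.0)
```

The second loop is the part the current workarounds miss: without it, a transform would keep normalizing with `loc=0.0, scale=1.0` after the switch, mis-scaling every observation from the reconfigured environment.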
Alternatives

1. Full shutdown/re-creation (current workaround): call `collector.shutdown()` and instantiate a new `MultiaSyncDataCollector` with the updated configuration baked into `create_env_fn`. This works, but the restart cost dominates wall-clock time.
2. `collector.reset()` with arguments: I attempted to pass the new parameter via `collector.reset(my_param=new_value)`, hoping it would be forwarded to `env.reset()` in the workers. This fails (`TypeError: reset() got an unexpected keyword argument 'my_param'`) and, more importantly, it wouldn't address the re-initialization requirement for stateful transforms like `ObservationNorm`.
3. Manual messaging via `collector.pipes`: one could theoretically send custom messages through `collector.pipes` to the workers, but this requires modifying `torchrl`'s internal worker loop logic to handle those messages, making it brittle, hard to maintain, and breaking encapsulation. It also requires careful handling of synchronization and transform state.

Additional context
The need for dynamic updates is particularly relevant for complex training procedures where the environment's characteristics change over time based on agent progress or predefined schedules. Correct handling of stateful transforms (`ObservationNorm` being a key example) is essential for stability, as using stale normalization statistics after a parameter change can lead to incorrect observations and poor learning. The error message `TypeError: MultiaSyncDataCollector.reset() got an unexpected keyword argument '...'` confirms the limitation of the current `reset` approach for passing parameters.

Checklist