Description
I'm using BenchMARL for my project with a custom environment and custom actor/critic models. My setup uses the MASAC algorithm in a continuous, centralized environment with 3 agents in a single group, and share_param_critic=False (each agent has its own critic).
In my critic model design, each critic needs to see all agents' states and actions. My critic has 3 input heads:
Head 1: the current agent's state + action
Head 2: the first other agent's state + action
Head 3: the second other agent's state + action
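A minimal, framework-agnostic sketch of how the three heads could be fed, assuming per-agent data arrives ordered as [agent0, agent1, agent2] (exactly the assumption this issue asks about). Nested Python lists stand in for tensors; in TorchRL, per-group tensors are typically shaped [..., n_agents, features], so the same indexing would apply along dim -2. The function name build_critic_heads is hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def build_critic_heads(
    states: List[Vector], actions: List[Vector], agent_idx: int
) -> Tuple[Vector, Vector, Vector]:
    """Split global state/action into (own, other_a, other_b) head inputs.

    Hypothetical helper. ASSUMES states[k] and actions[k] belong to agent k,
    i.e. the ordering [agent0, agent1, agent2] is preserved end to end.
    """
    n_agents = len(states)
    if len(actions) != n_agents or n_agents != 3:
        raise ValueError("sketch assumes exactly 3 agents with matching actions")
    # Head 1: the agent this critic belongs to.
    own = states[agent_idx] + actions[agent_idx]
    # Heads 2 and 3: the remaining agents, kept in ascending index order
    # so every critic uses a deterministic head assignment.
    others = [k for k in range(n_agents) if k != agent_idx]
    other_a = states[others[0]] + actions[others[0]]
    other_b = states[others[1]] + actions[others[1]]
    return own, other_a, other_b
```

For example, with states = [[0.0], [1.0], [2.0]] and actions = [[10.0], [11.0], [12.0]], the critic for agent 1 receives ([1.0, 11.0], [0.0, 10.0], [2.0, 12.0]). If the agent ordering were ever permuted upstream, head 1 would silently receive another agent's data, which is why the question below matters.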
For this to work correctly, I need to know the exact agent ordering of the data passed to the critic's forward function.
I receive global_action through the input keys that MASAC passes to the critic, and I construct global_state from the observations (following the pattern in the MPE example).
My key question: does the agent ordering remain consistent throughout the pipeline (algorithm → replay buffer → loss computation in TorchRL)?
Expected ordering:
[act0, act1, act2] for actions
[obs0, obs1, obs2] for observations
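Under the expected ordering, building global_state amounts to concatenating the per-agent blocks in index order, so agent k's observation always occupies the same slice. The sketch below assumes every agent has the same observation size; flatten_in_agent_order and agent_slice are hypothetical helpers. On a torch tensor shaped [..., n_agents, obs_dim], `tensor.flatten(-2, -1)` produces the same agent-major layout.

```python
from typing import List


def flatten_in_agent_order(per_agent: List[List[float]]) -> List[float]:
    """Concatenate per-agent vectors as [agent0 | agent1 | agent2]."""
    flat: List[float] = []
    for vec in per_agent:  # iteration order == agent index order
        flat.extend(vec)
    return flat


def agent_slice(flat: List[float], agent_idx: int, dim: int) -> List[float]:
    """Recover agent_idx's block from the flattened global vector.

    ASSUMES every agent contributes exactly `dim` features.
    """
    start = agent_idx * dim
    return flat[start:start + dim]
```

With obs = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]], the global state is [0.0, 0.1, 1.0, 1.1, 2.0, 2.1] and agent_slice(global_state, 2, 2) returns [2.0, 2.1]. The same layout applies to global_action as [act0, act1, act2].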
I tried to test this ordering with a fixed-action actor and a fixed-state task, but could not reach a definitive answer. Since this is critical to my project's correctness, I am asking the team directly.
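One way to make the fixed-state test conclusive, sketched under assumptions: append each agent's index as an extra observation feature (a hypothetical debug-only "id" slot), then check inside the critic's forward that block k of global_state still carries id k after the data has gone through the replay buffer. Because the id travels with the data, any permutation of the agent dimension would show up as a mismatch. The same trick works for actions if the fixed-action actor emits a distinct constant per agent. tag_observation and check_agent_order are hypothetical names.

```python
from typing import List


def tag_observation(obs: List[float], agent_idx: int) -> List[float]:
    """Append the agent's index as a hypothetical trailing debug 'id' feature."""
    return obs + [float(agent_idx)]


def check_agent_order(global_state: List[float], per_agent_dim: int) -> bool:
    """Return True if block k of the flattened state carries id k.

    per_agent_dim includes the appended id feature (last slot of each block).
    """
    if len(global_state) % per_agent_dim != 0:
        raise ValueError("global_state length is not a multiple of per_agent_dim")
    n_agents = len(global_state) // per_agent_dim
    for k in range(n_agents):
        block = global_state[k * per_agent_dim:(k + 1) * per_agent_dim]
        if block[-1] != float(k):
            return False  # agent k's slot holds another agent's data
    return True
```

For example, tagging [0.5] for agents 0, 1, 2 and flattening gives [0.5, 0.0, 0.5, 1.0, 0.5, 2.0], which passes the check; swapping the first two blocks to [0.5, 1.0, 0.5, 0.0, 0.5, 2.0] fails it.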
Environment Details
Algorithm: MASAC
Environment: Custom continuous, centralized
Agents: 3 agents, single group
Critic: Individual critics (share_param_critic=False)
Architecture: Each critic processes global state + global actions
Specific Request
Can you confirm that the data ordering [agent0, agent1, agent2] remains consistent throughout the entire training pipeline, or does it change at any point during algorithm execution, buffer storage, or loss computation?
Thank you for your attention and cooperation.