Description
Hi, I'm trying to work on a simple multi-agent navigation task where agents need to reach a goal position.
Here is a bit of context:
Whenever an agent reaches its goal, its done flag becomes true. From then on, whenever an action is passed for that agent, I skip it and return a special observation filled with -1 and a reward of 0.
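A simplified sketch of this per-agent handling, in plain Python rather than the actual environment code. The names `masked_agent_step`, `step_fn`, `obs_dim`, and `PAD_VALUE` are hypothetical and only serve the illustration:

```python
from typing import Callable, List, Tuple

PAD_VALUE = -1.0  # hypothetical filler for observations of finished agents

StepResult = Tuple[List[float], float, bool]  # (observation, reward, done)


def masked_agent_step(
    done: bool,
    action: float,
    obs_dim: int,
    step_fn: Callable[[float], StepResult],
) -> StepResult:
    """Step one agent, or return a padded result if it has already finished."""
    if done:
        # The agent already reached its goal: ignore the action and
        # return an all -1 observation with zero reward.
        return [PAD_VALUE] * obs_dim, 0.0, True
    # Otherwise defer to the agent's real transition function.
    return step_fn(action)
```

With this scheme the per-agent done flag stays true after the goal is reached, so any global flag has to be derived from the per-agent flags separately.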
I've noticed that in the tensordict generated by a rollout, there are multiple keys {a1,a2,..., done, termination, truncation}.
Since I'm not setting the global done, termination, and truncation keys manually, I was wondering how they are set. In my case, I would like the global done to be true only when all the agents have reached the goal, i.e., {a1: {done==True}, a2: {done==True}, ...}
Is this already the case?
If not, is it possible to change it? I would like the episode to terminate only when all the agents reach the goal or when the max_timestep is reached.
I'm asking this because my metrics show that agents are able to reach the goal individually, but they have never once reached the goal at the same time.
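To make the termination rule I'm after concrete, here is a minimal sketch in plain Python. It does not use the library's API; `global_flags`, `step_count`, and the dictionary layout are hypothetical, and the key names simply mirror the ones I see in the rollout:

```python
from typing import Dict


def global_flags(
    agent_done: Dict[str, bool],
    step_count: int,
    max_timestep: int,
) -> Dict[str, bool]:
    """Derive the global flags from the per-agent done flags (assumed rule)."""
    # Terminate only when every agent has reached its goal.
    termination = bool(agent_done) and all(agent_done.values())
    # Truncate when the step budget runs out before that happens.
    truncation = (not termination) and step_count >= max_timestep
    return {
        "done": termination or truncation,
        "termination": termination,
        "truncation": truncation,
    }
```

For example, `global_flags({"a1": True, "a2": False}, 3, 10)` leaves all three flags false, so the episode would continue until a2 also finishes or the step limit is hit.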