Hello,
How does `evaluation_deterministic_actions` in the experiment config affect the agents' behaviour during the evaluation phase?
- My understanding of policy-based algorithms is that they learn a policy and sample actions from it. How can they choose an action deterministically from a probability distribution? Do they pick the action with the highest probability?
- For value-based algorithms like IQL, if I turn off `evaluation_deterministic_actions`, does that mean IQL will still use the epsilon-greedy scheme to choose actions during the evaluation phase?
- Is it possible to make IQL use a randomization scheme other than epsilon-greedy during training and/or evaluation?
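
To make the question concrete, this is roughly what I imagine the two modes doing. It is only a sketch of my understanding, not the library's actual code: the function names `policy_action` and `q_action`, the `deterministic` argument, and the `epsilon` parameter are made up for illustration.

```python
import numpy as np


def policy_action(probs, deterministic, rng):
    """Hypothetical action selection for a policy-based agent.

    probs: action probabilities output by the policy (must sum to 1).
    """
    probs = np.asarray(probs, dtype=float)
    if deterministic:
        # My guess: take the most probable action instead of sampling.
        return int(np.argmax(probs))
    # Otherwise sample an action from the categorical distribution.
    return int(rng.choice(len(probs), p=probs))


def q_action(q_values, epsilon, deterministic, rng):
    """Hypothetical action selection for a value-based agent such as IQL."""
    q_values = np.asarray(q_values, dtype=float)
    if deterministic or rng.random() >= epsilon:
        # Greedy choice: the action with the highest Q-value.
        return int(np.argmax(q_values))
    # Exploration branch of epsilon-greedy: a uniformly random action.
    return int(rng.integers(len(q_values)))


rng = np.random.default_rng(0)
probs = [0.1, 0.7, 0.2]

greedy = policy_action(probs, deterministic=True, rng=rng)
sampled = [policy_action(probs, deterministic=False, rng=rng) for _ in range(1000)]
```

If this picture is right, `greedy` is always the same action, while `sampled` contains a mix of actions in proportion to `probs`. What I would like to know is whether the flag switches between these two branches, for both kinds of algorithm.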
Thank you.