Probabilistic actors during evaluation

Hello,

How does evaluation_deterministic_actions in the experiment config affect the agents' behaviour during the evaluation phase?

1. My understanding of policy-based algorithms is that they learn a policy, and they sample actions from the policy. How can they deterministically choose an action from a random distribution? Do they deterministically pick the action which has the highest probability?
2. For value-based algorithms like IQL, if I turn off evaluation_deterministic_actions, does that mean IQL will still use the e-greedy scheme to choose actions during the evaluation phase?
3. Is it possible to make IQL use a randomization scheme other than e-greedy during training and/or evaluation?
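
To make the three questions concrete, here is a minimal Python sketch of the behaviours I have in mind. It is not taken from the library: every function name below is hypothetical, and the Boltzmann (softmax) rule is just one example of an alternative randomization scheme that I am assuming for illustration.

```python
import math
import random


def act_from_policy(probs, deterministic, rng):
    """Pick an action index from a categorical policy (hypothetical helper)."""
    if deterministic:
        # Question 1: take the mode, i.e. the highest-probability action.
        return max(range(len(probs)), key=lambda a: probs[a])
    # Otherwise sample an action in proportion to its probability.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def act_epsilon_greedy(q_values, epsilon, rng):
    """Question 2: explore uniformly with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def act_boltzmann(q_values, temperature, rng):
    """Question 3: one possible alternative, sampling from a softmax over Q-values."""
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]


rng = random.Random(0)
print(act_from_policy([0.1, 0.7, 0.2], deterministic=True, rng=rng))
print(act_epsilon_greedy([1.0, 3.0, 2.0], epsilon=0.1, rng=rng))
print(act_boltzmann([1.0, 3.0, 2.0], temperature=0.5, rng=rng))
```

In these terms, I am asking whether the flag switches between the two branches of act_from_policy, whether turning it off leaves IQL on something like act_epsilon_greedy, and whether something like act_boltzmann can be plugged in instead.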

Thank you.


Probabilistic actors during evaluation #208
