
Probabilistic actors during evaluation #208

@Fool-Yang


Hello,

How does `evaluation_deterministic_actions` in the experiment config affect the agents' behaviour during the evaluation phase?

  1. My understanding is that policy-based algorithms learn a policy and sample actions from it. How can they choose an action deterministically when the policy is a probability distribution? Do they simply pick the action with the highest probability?
  2. For value-based algorithms such as IQL, if I turn off `evaluation_deterministic_actions`, will IQL still use the ε-greedy scheme to choose actions during the evaluation phase?
  3. Is it possible to make IQL follow a randomization scheme other than ε-greedy during training and/or evaluation?
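
For concreteness, here is a minimal plain-Python sketch of the three cases as I currently picture them. It is my own toy illustration, not the library's code. The helper names (`select_policy_action`, `select_q_action`, `boltzmann_action`) and the `deterministic` argument are hypothetical, and whether `evaluation_deterministic_actions` actually maps onto that argument is exactly what I am asking. Boltzmann (softmax) exploration over Q-values is used only as one example of an alternative to ε-greedy for question 3.

```python
import math
import random
from typing import Sequence


def softmax(values: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax; temperature must be > 0 (lower = greedier)."""
    scaled = [v / temperature for v in values]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (Python's max keeps the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Question 1 (hypothetical helper): actor with a categorical policy over logits.
def select_policy_action(logits: Sequence[float], deterministic: bool,
                         rng: random.Random) -> int:
    if deterministic:
        return argmax(logits)  # mode of the distribution: most probable action
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]  # sample


# Question 2 (hypothetical helper): value-based agent with epsilon-greedy.
def select_q_action(q_values: Sequence[float], deterministic: bool,
                    epsilon: float, rng: random.Random) -> int:
    if deterministic or rng.random() >= epsilon:
        return argmax(q_values)  # greedy action
    return rng.randrange(len(q_values))  # uniformly random action


# Question 3 (hypothetical helper): Boltzmann exploration over Q-values.
def boltzmann_action(q_values: Sequence[float], temperature: float,
                     rng: random.Random) -> int:
    probs = softmax(q_values, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In this sketch, a "deterministic" actor always returns the argmax (the mode), whereas the stochastic branches sample from the policy, from ε-greedy, or from a temperature-scaled softmax over Q-values. As the temperature goes to zero, the Boltzmann case approaches the greedy choice.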

Thank you.
