1 parent 3dbd84c  commit 795e362
tutorials/sphinx-tutorials/multiagent_ppo.py
@@ -57,7 +57,7 @@
#
# This type of algorithms is usually trained *on-policy*. This means that, at every learning iteration, we have a
# **sampling** and a **training** phase. In the **sampling** phase of iteration :math:`t`, rollouts are collected
-# form agents' interactions in the environment using the current policies :math:`\mathbf{\pi}_t`.
+# from agents' interactions in the environment using the current policies :math:`\mathbf{\pi}_t`.
# In the **training** phase, all the collected rollouts are immediately fed to the training process to perform
# backpropagation. This leads to updated policies which are then used again for sampling.
# The execution of this process in a loop constitutes *on-policy learning*.
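The comment edited above describes the alternation between a sampling phase (rollouts gathered with the current policy) and a training phase (those rollouts are used once for updates, then discarded). A minimal sketch of that loop follows; the `env_step`, `env_reset`, `policy`, and `update` callables are hypothetical stand-ins for illustration, not the tutorial's actual TorchRL components.

import random


def collect_rollout(env_step, env_reset, policy, horizon):
    """Sampling phase: gather transitions using the *current* policy."""
    rollout = []
    obs = env_reset()
    for _ in range(horizon):
        action = policy(obs)
        obs_next, reward = env_step(obs, action)
        rollout.append((obs, action, reward))
        obs = obs_next
    return rollout


def train_on_policy(env_step, env_reset, policy, update, iterations, horizon):
    """Alternate sampling and training; rollouts are never reused across iterations."""
    for _ in range(iterations):
        rollout = collect_rollout(env_step, env_reset, policy, horizon)  # sampling phase
        policy = update(policy, rollout)                                 # training phase
    return policy


if __name__ == "__main__":
    # Dummy single-agent stand-ins so the loop runs end to end.
    env_reset = lambda: 0.0
    env_step = lambda obs, act: (obs + act, -abs(obs + act))  # reward for staying near 0
    policy = lambda obs: random.choice([-1.0, 1.0])
    update = lambda pol, rollout: pol  # placeholder for the backpropagation step
    train_on_policy(env_step, env_reset, policy, update, iterations=3, horizon=5)

In the real tutorial, the sampling and training phases are handled by TorchRL components rather than these placeholder callables; the sketch only mirrors the loop structure the comment describes.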