1 parent 3dbd84c  commit 795e362
tutorials/sphinx-tutorials/multiagent_ppo.py
@@ -57,7 +57,7 @@
#
# This type of algorithms is usually trained *on-policy*. This means that, at every learning iteration, we have a
# **sampling** and a **training** phase. In the **sampling** phase of iteration :math:`t`, rollouts are collected
-# form agents' interactions in the environment using the current policies :math:`\mathbf{\pi}_t`.
+# from agents' interactions in the environment using the current policies :math:`\mathbf{\pi}_t`.
# In the **training** phase, all the collected rollouts are immediately fed to the training process to perform
# backpropagation. This leads to updated policies which are then used again for sampling.
# The execution of this process in a loop constitutes *on-policy learning*.
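The comment edited above describes the alternation between a sampling phase (rollouts gathered with the current policy) and a training phase (those rollouts are used once for updates, then discarded). A minimal sketch of that loop follows; the `env_step`, `env_reset`, `policy`, and `update` callables are hypothetical stand-ins for illustration, not the tutorial's actual TorchRL components.

import random


def collect_rollout(env_step, env_reset, policy, horizon):
    """Sampling phase: gather transitions using the *current* policy."""
    rollout = []
    obs = env_reset()
    for _ in range(horizon):
        action = policy(obs)
        obs_next, reward = env_step(obs, action)
        rollout.append((obs, action, reward))
        obs = obs_next
    return rollout


def train_on_policy(env_step, env_reset, policy, update, iterations, horizon):
    """Alternate sampling and training; rollouts are never reused across iterations."""
    for _ in range(iterations):
        rollout = collect_rollout(env_step, env_reset, policy, horizon)  # sampling phase
        policy = update(policy, rollout)                                 # training phase
    return policy


if __name__ == "__main__":
    # Dummy single-agent stand-ins so the loop runs end to end.
    env_reset = lambda: 0.0
    env_step = lambda obs, act: (obs + act, -abs(obs + act))  # reward for staying near 0
    policy = lambda obs: random.choice([-1.0, 1.0])
    update = lambda pol, rollout: pol  # placeholder for the backpropagation step
    train_on_policy(env_step, env_reset, policy, update, iterations=3, horizon=5)

In the real tutorial, the sampling and training phases are handled by TorchRL components rather than these placeholder callables; the sketch only mirrors the loop structure the comment describes.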