Fix: The Alphazero demo title #5737

Merged

merged 18 commits into from Mar 16, 2023
4 changes: 2 additions & 2 deletions docs/practices/reinforcement_learning/AlphaZero.ipynb
@@ -493,7 +493,7 @@
"\n",
"传统的AI博弈树搜索算法效率都很低,因为这些算法在做出最终选择前需要穷尽每一种走法。即便很少的分支因子的游戏,其每一步的搜索空间也会爆炸式增长。分支因子就是所有可能的走法的数量,这个数量会随着游戏的进行不断变化。因此,你可以试着计算一个游戏的平均分支因子数,国际象棋的平均分支因子是35,而围棋则是250。这意味着,在国际象棋中,仅走两步就有1,225(35²)种可能的棋面,而在围棋中,这个数字会变成62,500(250²)。因此,上述的价值策略神经网络将指导并告诉我们哪些博弈路径值得探索,从而避免被许多无用的搜索路径所淹没。再结合蒙特卡洛树选择最佳的走法。\n",
"\n",
"## 棋类游戏的蒙特卡洛树搜索(MCTS)\n",
"### 棋类游戏的蒙特卡洛树搜索(MCTS)\n",
"使用MCTS的具体做法是这样的,给定一个棋面,MCTS共进行N次模拟。主要的搜索阶段有4个:选择,扩展,仿真和回溯\n",
"\n",
"![](https://ai-studio-static-online.cdn.bcebos.com/73384055df364b44a49e7e206a9015790be7b3c0aa1942d0a4e57aa617fad087)\n",
@@ -507,7 +507,7 @@
"* 第四步是回溯 (backpropagation), 将我们最后得到的胜负结果回溯加到MCTS树结构上。注意除了之前的MCTS树要回溯外,新加入的节点也要加上一次胜负历史记录。\n",
"\n",
"以上就是MCTS搜索的整个过程。这4步一般是通用的,但是MCTS树结构上保存的内容而一般根据要解决的问题和建模的复杂度而不同。\n",
"## 基于神经网络的蒙特卡洛树搜索(MCTS)\n",
"### 基于神经网络的蒙特卡洛树搜索(MCTS)\n",
"N(s,a) :记录边的访问次数;\n",
"W(s,a): 合计行动价值;\n",
"Q(s,a) :平均行动价值;\n",