
Commit e113669

Fix typos
1 parent c599bdf commit e113669

10 files changed, +16 −16 lines changed

docs/advanced_guide/performance_improving/amp/amp.md

Lines changed: 1 addition & 1 deletion

@@ -22,7 +22,7 @@ Automatic Mixed Precision (AMP) is a method that automatically mixes half precision (FP16)

  As mentioned above, using the FP16 data type can introduce some loss of computational precision. In deep learning, however, not every computation demands high precision: small local precision losses have only a marginal effect on the final training result, while throughput and training speed improve substantially. Hence the need for mixed-precision computation. Concretely, operations that are insensitive to precision loss and can be accelerated by Tensor Cores are run in half precision during training, while precision-sensitive parts keep FP32, maximizing memory-access and compute efficiency.

- To avoid manually designing and tuning a precision-mixing scheme for every individual model, the PaddlePaadle framework provides automatic mixed precision (AMP) training, freeing the practitioner's hands. Using AMP training in PaddlePaddle is very easy: adding a single line of code turns the original FP32 training into AMP training. The following uses `MNIST` as an example of how to use PaddlePaddle's AMP feature.
+ To avoid manually designing and tuning a precision-mixing scheme for every individual model, the PaddlePaddle framework provides automatic mixed precision (AMP) training, freeing the practitioner's hands. Using AMP training in PaddlePaddle is very easy: adding a single line of code turns the original FP32 training into AMP training. The following uses `MNIST` as an example of how to use PaddlePaddle's AMP feature.

  **MNIST network definition**
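The changed paragraph above says that one extra line of code turns FP32 training into AMP training. As a rough sketch of what that typically looks like in dynamic-graph PaddlePaddle (the toy model, batch, and hyper-parameters below are made up for illustration, not taken from the MNIST example in the doc):

```python
import paddle

# Hypothetical toy setup, only to illustrate the AMP wiring.
model = paddle.nn.Linear(784, 10)
opt = paddle.optimizer.Adam(parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)  # loss scaling guards against FP16 underflow

x = paddle.randn([16, 784])
label = paddle.randint(0, 10, [16, 1])

with paddle.amp.auto_cast():  # run precision-insensitive ops in FP16
    loss = paddle.nn.functional.cross_entropy(model(x), label)

scaled = scaler.scale(loss)   # scale the loss before backward
scaled.backward()
scaler.minimize(opt, scaled)  # unscale gradients and apply the update
opt.clear_grad()
```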

docs/advanced_guide/performance_improving/inference_improving/paddle_xpu_infer_cn.md

Lines changed: 3 additions & 3 deletions

@@ -73,11 +73,11 @@ def main():
      predictor = create_predictor(config)

      input_names = predictor.get_input_names()
-     input_hanlde = predictor.get_input_handle(input_names[0])
+     input_handle = predictor.get_input_handle(input_names[0])

      fake_input = np.ones((args.batch_size, 3, 224, 224)).astype("float32")
-     input_hanlde.reshape([args.batch_size, 3, 224, 224])
-     input_hanlde.copy_from_cpu(fake_input)
+     input_handle.reshape([args.batch_size, 3, 224, 224])
+     input_handle.copy_from_cpu(fake_input)

      for i in range(args.warmup):
          predictor.run()
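For context on how such a snippet usually continues after `predictor.run()`: the output is fetched back to the host through an output handle. A minimal sketch, assuming the model has a single output tensor and reusing the `predictor` from the hunk above:

```python
# Continuation sketch; `predictor` comes from the snippet above.
output_names = predictor.get_output_names()
output_handle = predictor.get_output_handle(output_names[0])
result = output_handle.copy_to_cpu()  # numpy array on the host
print(result.shape)
```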

docs/api_guides/low_level/distributed/cluster_train_data_en.rst

Lines changed: 1 addition & 1 deletion

@@ -59,6 +59,6 @@ After the data is split, you can define a file_dispatcher function that determines
      trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
      files_pattern = "cluster/housing.data.*"

-     my_files = file_dispatcher(files_pattern, triners, trainer_id)
+     my_files = file_dispatcher(files_pattern, trainers, trainer_id)

  In the example above, `files_pattern` is a `glob expression <https://docs.python.org/2.7/library/glob.html>`_ of the training file and can generally be represented by a wildcard.
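The hunk calls `file_dispatcher` without showing its body. A plausible sketch, assuming it simply splits the globbed file list round-robin across trainers (the implementation below is hypothetical, not the one from the guide, and `PADDLE_TRAINERS_NUM` is an assumed environment variable):

```python
import glob
import os


def file_dispatcher(files_pattern, trainers, trainer_id):
    """Return the files this trainer should read (hypothetical sketch)."""
    files = sorted(glob.glob(files_pattern))  # sort for a deterministic order
    return files[trainer_id::trainers]        # round-robin split across trainers


trainers = int(os.getenv("PADDLE_TRAINERS_NUM", "1"))
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
my_files = file_dispatcher("cluster/housing.data.*", trainers, trainer_id)
```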

docs/api_guides/low_level/layers/detection.rst

Lines changed: 1 addition & 1 deletion

@@ -65,7 +65,7 @@ SSD

  * multi_box_head: obtains the locations and confidences of the different prior boxes. For the API Reference, see :ref:`cn_api_fluid_layers_multi_box_head`

- * detection_output: decodes the prioir boxes and obtains the detection results through multi-class NMS. For the API Reference, see :ref:`cn_api_fluid_layers_detection_output`
+ * detection_output: decodes the prior boxes and obtains the detection results through multi-class NMS. For the API Reference, see :ref:`cn_api_fluid_layers_detection_output`

  * ssd_loss: computes the loss from the predicted location offsets, the confidences, the detection box locations, the ground-truth box locations, and the labels. For the API Reference, see :ref:`cn_api_fluid_layers_ssd_loss`

docs/api_guides/low_level/layers/learning_rate_scheduler_en.rst

Lines changed: 2 additions & 2 deletions

@@ -39,8 +39,8 @@ The following content describes the APIs related to the learning rate scheduler:

  * :code:`LambdaDecay`: Decay the learning rate by lambda function. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_LambdaDecay`

- * :code:`ReduceOnPlateau`: Adjuge the learning rate according to monitoring index(In general, it's loss), and decay the learning rate when monitoring index becomes stable. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_ReduceOnPlateau`
+ * :code:`ReduceOnPlateau`: Adjust the learning rate according to monitoring index(In general, it's loss), and decay the learning rate when monitoring index becomes stable. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_ReduceOnPlateau`

  * :code:`OneCycleLR`: One cycle decay. That is, the initial learning rate first increases to maximum learning rate, and then it decreases to minimum learning rate which is much less than initial learning rate. For related API Reference please refer to :ref:`cn_api_paddle_optimizer_lr_OneCycleLR`

- * :code:`CyclicLR`: Cyclic decay. That is, the learning rate cycles between minimum and maximum learning rate with a constant frequency in specified a sacle method. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_CyclicLR`
+ * :code:`CyclicLR`: Cyclic decay. That is, the learning rate cycles between minimum and maximum learning rate with a constant frequency in specified a scale method. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_CyclicLR`
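As a quick illustration of the `ReduceOnPlateau` scheduler named in the fixed line, here is a minimal sketch using `paddle.optimizer.lr.ReduceOnPlateau` (the toy model and hyper-parameters are assumptions for illustration, not taken from the guide):

```python
import paddle

model = paddle.nn.Linear(10, 1)
# Halve the learning rate after 3 epochs without improvement (assumed settings).
scheduler = paddle.optimizer.lr.ReduceOnPlateau(learning_rate=0.1, factor=0.5, patience=3)
opt = paddle.optimizer.SGD(learning_rate=scheduler, parameters=model.parameters())

for epoch in range(10):
    x = paddle.randn([4, 10])
    loss = paddle.mean((model(x) - 1.0) ** 2)
    loss.backward()
    opt.step()
    opt.clear_grad()
    scheduler.step(loss)  # ReduceOnPlateau monitors the metric passed to step()
```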

docs/design/concepts/cpp_data_feeding.md

Lines changed: 1 addition & 1 deletion

@@ -163,7 +163,7 @@ while_op(not_completed) {
      } else {
          reset_op(double_buffer_reader)
          increase_op(pass_count)
-         not_completed = less_than_op(pass_count, reqiured_pass_num)
+         not_completed = less_than_op(pass_count, required_pass_num)
      }
  }
  ```

docs/design/concepts/functions_operators_layers.md

Lines changed: 1 addition & 1 deletion

@@ -71,7 +71,7 @@ def layer.fc(X):
      return operator.add(operator.mul(X, W), b)
  ```

- If we don't have `operator.mul` and `operator.add`, the definiton of `layer.fc` would be complicated:
+ If we don't have `operator.mul` and `operator.add`, the definition of `layer.fc` would be complicated:

  ```python
  def layer.fc(X):

docs/design/concepts/index.html

Lines changed: 1 addition & 1 deletion

@@ -199,7 +199,7 @@

  Here's a brief comparison between Godep and Glide
  : https://github.com/Masterminds/glide/wiki/Go-Package-Manager-Comparison. There are
- also many complaints about using `Godep`. There's also a new "official" pakcage
+ also many complaints about using `Godep`. There's also a new "official" package
  management tool has been started at: https://github.com/golang/dep to resolve
  such problems, but it's currently at Alpha stage. So the best choice now is
  glide obviously.

docs/design/concepts/parallel_executor.md

Lines changed: 4 additions & 4 deletions

@@ -37,7 +37,7 @@ The performance of `ResNeXt152` on `TitanX` which `batch_size=12` is shown below

  [Static single assignment form](https://en.wikipedia.org/wiki/Static_single_assignment_form)(`SSA` for short) is a common form for compiler optimization. To implement concurrent execution, we uses an `SSA` graph as an intermedia representation of `ProgramDesc`.

- The `Program` is a directed acyclic graph, since a variable can be assigned multiple times. We enforce a variable will be assigned once, by adding version number to varaibles. We parsing the `Program` into a `SSA` graph. Also, ProgramExecutor duplicate `Program` into multi-devices. We also add a device number to varaibles and insert `NCCLAllReduce` into Graph.
+ The `Program` is a directed acyclic graph, since a variable can be assigned multiple times. We enforce a variable will be assigned once, by adding version number to variables. We parsing the `Program` into a `SSA` graph. Also, ProgramExecutor duplicate `Program` into multi-devices. We also add a device number to variables and insert `NCCLAllReduce` into Graph.

  The data structure of `SSA` graph is:

@@ -94,11 +94,11 @@ The `wait` are implemented by two strategies:
  1. Invoke `DeviceContext->Wait()`, It will wait all operators on this device contexts complete.
  2. Uses `cudaStreamWaitEvent` to sending a event to the stream. It is a non-blocking call. The wait operators will be executed in GPU.

- Generally, the `cudaStreamWaitEvent` will have a better perforamnce. However, `DeviceContext->Wait()` strategy is easier to debug. The strategy can be changed in runtime.
+ Generally, the `cudaStreamWaitEvent` will have a better performance. However, `DeviceContext->Wait()` strategy is easier to debug. The strategy can be changed in runtime.

  ## What's next?

  * Merging gradient of dense parameters has been done. However, the merging of sparse parameters has not been done.
- * The CPU version of Parallel Executor has not been implemented. The out-of-order logic will make CPU compuatation faster, too.
+ * The CPU version of Parallel Executor has not been implemented. The out-of-order logic will make CPU computation faster, too.
  * A better strategy to merge gradients can be introduced. We can shrink the gradients from `float32` to `int8` or `int4` while merging. It will significantly speed up multi-GPUs training without much loss of precision.
- * Combine multi-Nodes implementation. By the benifit of out-of-order, sending and recving operator can be an blocking operator, and the transpiler does not need to concern about the best position of operator.
+ * Combine multi-Nodes implementation. By the benefit of out-of-order, sending and receiving operator can be an blocking operator, and the transpiler does not need to concern about the best position of operator.
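For readers unfamiliar with SSA renaming mentioned in the first hunk of this file, here is a toy sketch of the variable-versioning idea (hypothetical illustration, not Paddle's actual graph-building code):

```python
# Toy sketch of SSA-style variable versioning (hypothetical, not Paddle's code).
from collections import defaultdict

ops = [
    ("mul", ["x", "w"], ["y"]),
    ("add", ["y", "b"], ["y"]),   # "y" is reassigned here
    ("relu", ["y"], ["z"]),
]

versions = defaultdict(int)
ssa_ops = []
for op_type, inputs, outputs in ops:
    # Inputs read the latest version of each variable.
    in_names = [f"{v}@{versions[v]}" for v in inputs]
    # Each assignment creates a new version, so every name is written exactly once.
    out_names = []
    for v in outputs:
        versions[v] += 1
        out_names.append(f"{v}@{versions[v]}")
    ssa_ops.append((op_type, in_names, out_names))

print(ssa_ops)
# [('mul', ['x@0', 'w@0'], ['y@1']), ('add', ['y@1', 'b@0'], ['y@2']), ('relu', ['y@2'], ['z@1'])]
```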

docs/design/concepts/tensor_array.md

Lines changed: 1 addition & 1 deletion

@@ -75,7 +75,7 @@ Segmenting the `LoDTensor` is much more complicated than splitting a tensor, tha

  As the next step in RNN support, `dynamic_recurrent_op` should be introduced to handle inputs with variable-length sequences.

  The implementation is similar to `recurrent_op`.
- The key difference is the way **the original input `LoDTensors` and outupts are split to get the `input_segments` and the `output_segments`.**
+ The key difference is the way **the original input `LoDTensors` and outputs are split to get the `input_segments` and the `output_segments`.**


  Though it can't be built over `recurrent_op` or `dynamic_recurrent_op` directly,
