
Commit e113669

Fix typos
1 parent c599bdf commit e113669

10 files changed, +16 −16 lines changed

docs/advanced_guide/performance_improving/amp/amp.md

Lines changed: 1 addition & 1 deletion

@@ -22,7 +22,7 @@ Automatic Mixed Precision (AMP) is a method that automatically mixes half precision (FP16)

  As mentioned above, using the FP16 data type can introduce some loss of computational precision. In deep learning, however, not every computation demands high precision: small local precision losses have only a marginal effect on the final training result, while throughput and training speed improve substantially. Hence the need for mixed-precision computation. Concretely, operations that are insensitive to precision loss and can be accelerated by Tensor Cores are run in half precision during training, while precision-sensitive parts keep FP32, maximizing memory-access and compute efficiency.

- To avoid manually designing and tuning a precision-mixing scheme for every individual model, the PaddlePaadle framework provides automatic mixed precision (AMP) training, freeing the practitioner's hands. Using AMP training in PaddlePaddle is very easy: adding a single line of code turns the original FP32 training into AMP training. The following uses `MNIST` as an example of how to use PaddlePaddle's AMP feature.
+ To avoid manually designing and tuning a precision-mixing scheme for every individual model, the PaddlePaddle framework provides automatic mixed precision (AMP) training, freeing the practitioner's hands. Using AMP training in PaddlePaddle is very easy: adding a single line of code turns the original FP32 training into AMP training. The following uses `MNIST` as an example of how to use PaddlePaddle's AMP feature.

  **MNIST network definition**
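The changed paragraph above says that one extra line of code turns FP32 training into AMP training. As a rough sketch of what that typically looks like in dynamic-graph PaddlePaddle (the toy model, batch, and hyper-parameters below are made up for illustration, not taken from the MNIST example in the doc):

```python
import paddle

# Hypothetical toy setup, only to illustrate the AMP wiring.
model = paddle.nn.Linear(784, 10)
opt = paddle.optimizer.Adam(parameters=model.parameters())
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)  # loss scaling guards against FP16 underflow

x = paddle.randn([16, 784])
label = paddle.randint(0, 10, [16, 1])

with paddle.amp.auto_cast():  # run precision-insensitive ops in FP16
    loss = paddle.nn.functional.cross_entropy(model(x), label)

scaled = scaler.scale(loss)   # scale the loss before backward
scaled.backward()
scaler.minimize(opt, scaled)  # unscale gradients and apply the update
opt.clear_grad()
```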

docs/advanced_guide/performance_improving/inference_improving/paddle_xpu_infer_cn.md

Lines changed: 3 additions & 3 deletions

@@ -73,11 +73,11 @@ def main():
      predictor = create_predictor(config)

      input_names = predictor.get_input_names()
-     input_hanlde = predictor.get_input_handle(input_names[0])
+     input_handle = predictor.get_input_handle(input_names[0])

      fake_input = np.ones((args.batch_size, 3, 224, 224)).astype("float32")
-     input_hanlde.reshape([args.batch_size, 3, 224, 224])
-     input_hanlde.copy_from_cpu(fake_input)
+     input_handle.reshape([args.batch_size, 3, 224, 224])
+     input_handle.copy_from_cpu(fake_input)

      for i in range(args.warmup):
          predictor.run()
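For context on how such a snippet usually continues after `predictor.run()`: the output is fetched back to the host through an output handle. A minimal sketch, assuming the model has a single output tensor and reusing the `predictor` from the hunk above:

```python
# Continuation sketch; `predictor` comes from the snippet above.
output_names = predictor.get_output_names()
output_handle = predictor.get_output_handle(output_names[0])
result = output_handle.copy_to_cpu()  # numpy array on the host
print(result.shape)
```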

docs/api_guides/low_level/distributed/cluster_train_data_en.rst

Lines changed: 1 addition & 1 deletion

@@ -59,6 +59,6 @@ After the data is split, you can define a file_dispatcher function that determines
      trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
      files_pattern = "cluster/housing.data.*"

-     my_files = file_dispatcher(files_pattern, triners, trainer_id)
+     my_files = file_dispatcher(files_pattern, trainers, trainer_id)

  In the example above, `files_pattern` is a `glob expression <https://docs.python.org/2.7/library/glob.html>`_ of the training file and can generally be represented by a wildcard.
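The hunk calls `file_dispatcher` without showing its body. A plausible sketch, assuming it simply splits the globbed file list round-robin across trainers (the implementation below is hypothetical, not the one from the guide, and `PADDLE_TRAINERS_NUM` is an assumed environment variable):

```python
import glob
import os


def file_dispatcher(files_pattern, trainers, trainer_id):
    """Return the files this trainer should read (hypothetical sketch)."""
    files = sorted(glob.glob(files_pattern))  # sort for a deterministic order
    return files[trainer_id::trainers]        # round-robin split across trainers


trainers = int(os.getenv("PADDLE_TRAINERS_NUM", "1"))
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
my_files = file_dispatcher("cluster/housing.data.*", trainers, trainer_id)
```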

docs/api_guides/low_level/layers/detection.rst

Lines changed: 1 addition & 1 deletion

@@ -65,7 +65,7 @@ SSD

  * multi_box_head: obtains the locations and confidences of the different prior boxes. For the API Reference, see :ref:`cn_api_fluid_layers_multi_box_head`

- * detection_output: decodes the prioir boxes and obtains the detection results through multi-class NMS. For the API Reference, see :ref:`cn_api_fluid_layers_detection_output`
+ * detection_output: decodes the prior boxes and obtains the detection results through multi-class NMS. For the API Reference, see :ref:`cn_api_fluid_layers_detection_output`

  * ssd_loss: computes the loss from the predicted location offsets, the confidences, the detection box locations, the ground-truth box locations, and the labels. For the API Reference, see :ref:`cn_api_fluid_layers_ssd_loss`

docs/api_guides/low_level/layers/learning_rate_scheduler_en.rst

Lines changed: 2 additions & 2 deletions

@@ -39,8 +39,8 @@ The following content describes the APIs related to the learning rate scheduler:

  * :code:`LambdaDecay`: Decay the learning rate by lambda function. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_LambdaDecay`

- * :code:`ReduceOnPlateau`: Adjuge the learning rate according to monitoring index(In general, it's loss), and decay the learning rate when monitoring index becomes stable. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_ReduceOnPlateau`
+ * :code:`ReduceOnPlateau`: Adjust the learning rate according to monitoring index(In general, it's loss), and decay the learning rate when monitoring index becomes stable. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_ReduceOnPlateau`

  * :code:`OneCycleLR`: One cycle decay. That is, the initial learning rate first increases to maximum learning rate, and then it decreases to minimum learning rate which is much less than initial learning rate. For related API Reference please refer to :ref:`cn_api_paddle_optimizer_lr_OneCycleLR`

- * :code:`CyclicLR`: Cyclic decay. That is, the learning rate cycles between minimum and maximum learning rate with a constant frequency in specified a sacle method. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_CyclicLR`
+ * :code:`CyclicLR`: Cyclic decay. That is, the learning rate cycles between minimum and maximum learning rate with a constant frequency in specified a scale method. For related API Reference please refer to :ref:`api_paddle_optimizer_lr_CyclicLR`
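As a quick illustration of the `ReduceOnPlateau` scheduler named in the fixed line, here is a minimal sketch using `paddle.optimizer.lr.ReduceOnPlateau` (the toy model and hyper-parameters are assumptions for illustration, not taken from the guide):

```python
import paddle

model = paddle.nn.Linear(10, 1)
# Halve the learning rate after 3 epochs without improvement (assumed settings).
scheduler = paddle.optimizer.lr.ReduceOnPlateau(learning_rate=0.1, factor=0.5, patience=3)
opt = paddle.optimizer.SGD(learning_rate=scheduler, parameters=model.parameters())

for epoch in range(10):
    x = paddle.randn([4, 10])
    loss = paddle.mean((model(x) - 1.0) ** 2)
    loss.backward()
    opt.step()
    opt.clear_grad()
    scheduler.step(loss)  # ReduceOnPlateau monitors the metric passed to step()
```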

docs/design/concepts/cpp_data_feeding.md

Lines changed: 1 addition & 1 deletion

@@ -163,7 +163,7 @@ while_op(not_completed) {
      } else {
          reset_op(double_buffer_reader)
          increase_op(pass_count)
-         not_completed = less_than_op(pass_count, reqiured_pass_num)
+         not_completed = less_than_op(pass_count, required_pass_num)
      }
  }
  ```

docs/design/concepts/functions_operators_layers.md

Lines changed: 1 addition & 1 deletion

@@ -71,7 +71,7 @@ def layer.fc(X):
      return operator.add(operator.mul(X, W), b)
  ```

- If we don't have `operator.mul` and `operator.add`, the definiton of `layer.fc` would be complicated:
+ If we don't have `operator.mul` and `operator.add`, the definition of `layer.fc` would be complicated:

  ```python
  def layer.fc(X):

docs/design/concepts/index.html

Lines changed: 1 addition & 1 deletion

@@ -199,7 +199,7 @@

  Here's a brief comparison between Godep and Glide
  : https://github.com/Masterminds/glide/wiki/Go-Package-Manager-Comparison. There are
- also many complaints about using `Godep`. There's also a new "official" pakcage
+ also many complaints about using `Godep`. There's also a new "official" package
  management tool has been started at: https://github.com/golang/dep to resolve
  such problems, but it's currently at Alpha stage. So the best choice now is
  glide obviously.

docs/design/concepts/parallel_executor.md

Lines changed: 4 additions & 4 deletions

@@ -37,7 +37,7 @@ The performance of `ResNeXt152` on `TitanX` which `batch_size=12` is shown below

  [Static single assignment form](https://en.wikipedia.org/wiki/Static_single_assignment_form)(`SSA` for short) is a common form for compiler optimization. To implement concurrent execution, we uses an `SSA` graph as an intermedia representation of `ProgramDesc`.

- The `Program` is a directed acyclic graph, since a variable can be assigned multiple times. We enforce a variable will be assigned once, by adding version number to varaibles. We parsing the `Program` into a `SSA` graph. Also, ProgramExecutor duplicate `Program` into multi-devices. We also add a device number to varaibles and insert `NCCLAllReduce` into Graph.
+ The `Program` is a directed acyclic graph, since a variable can be assigned multiple times. We enforce a variable will be assigned once, by adding version number to variables. We parsing the `Program` into a `SSA` graph. Also, ProgramExecutor duplicate `Program` into multi-devices. We also add a device number to variables and insert `NCCLAllReduce` into Graph.

  The data structure of `SSA` graph is:

@@ -94,11 +94,11 @@ The `wait` are implemented by two strategies:
  1. Invoke `DeviceContext->Wait()`, It will wait all operators on this device contexts complete.
  2. Uses `cudaStreamWaitEvent` to sending a event to the stream. It is a non-blocking call. The wait operators will be executed in GPU.

- Generally, the `cudaStreamWaitEvent` will have a better perforamnce. However, `DeviceContext->Wait()` strategy is easier to debug. The strategy can be changed in runtime.
+ Generally, the `cudaStreamWaitEvent` will have a better performance. However, `DeviceContext->Wait()` strategy is easier to debug. The strategy can be changed in runtime.

  ## What's next?

  * Merging gradient of dense parameters has been done. However, the merging of sparse parameters has not been done.
- * The CPU version of Parallel Executor has not been implemented. The out-of-order logic will make CPU compuatation faster, too.
+ * The CPU version of Parallel Executor has not been implemented. The out-of-order logic will make CPU computation faster, too.
  * A better strategy to merge gradients can be introduced. We can shrink the gradients from `float32` to `int8` or `int4` while merging. It will significantly speed up multi-GPUs training without much loss of precision.
- * Combine multi-Nodes implementation. By the benifit of out-of-order, sending and recving operator can be an blocking operator, and the transpiler does not need to concern about the best position of operator.
+ * Combine multi-Nodes implementation. By the benefit of out-of-order, sending and receiving operator can be an blocking operator, and the transpiler does not need to concern about the best position of operator.
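For readers unfamiliar with SSA renaming mentioned in the first hunk of this file, here is a toy sketch of the variable-versioning idea (hypothetical illustration, not Paddle's actual graph-building code):

```python
# Toy sketch of SSA-style variable versioning (hypothetical, not Paddle's code).
from collections import defaultdict

ops = [
    ("mul", ["x", "w"], ["y"]),
    ("add", ["y", "b"], ["y"]),   # "y" is reassigned here
    ("relu", ["y"], ["z"]),
]

versions = defaultdict(int)
ssa_ops = []
for op_type, inputs, outputs in ops:
    # Inputs read the latest version of each variable.
    in_names = [f"{v}@{versions[v]}" for v in inputs]
    # Each assignment creates a new version, so every name is written exactly once.
    out_names = []
    for v in outputs:
        versions[v] += 1
        out_names.append(f"{v}@{versions[v]}")
    ssa_ops.append((op_type, in_names, out_names))

print(ssa_ops)
# [('mul', ['x@0', 'w@0'], ['y@1']), ('add', ['y@1', 'b@0'], ['y@2']), ('relu', ['y@2'], ['z@1'])]
```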

docs/design/concepts/tensor_array.md

Lines changed: 1 addition & 1 deletion

@@ -75,7 +75,7 @@ Segmenting the `LoDTensor` is much more complicated than splitting a tensor, tha

  As the next step in RNN support, `dynamic_recurrent_op` should be introduced to handle inputs with variable-length sequences.

  The implementation is similar to `recurrent_op`.
- The key difference is the way **the original input `LoDTensors` and outupts are split to get the `input_segments` and the `output_segments`.**
+ The key difference is the way **the original input `LoDTensors` and outputs are split to get the `input_segments` and the `output_segments`.**


  Though it can't be built over `recurrent_op` or `dynamic_recurrent_op` directly,
