Commit db37981

Merge pull request #606 from luotao1/doc3
refine dataprovider related rst
2 parents 0d39b11 + 3d5060a commit db37981

8 files changed, +250 -313 lines changed

+13
@@ -0,0 +1,13 @@
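+Introduction to DataProvider
+============================
+
+DataProvider is the PaddlePaddle module responsible for supplying data. It loads data into host memory or GPU memory so that the neural network can train or predict. Users can customize how data is fed in through the simple Python interface `PyDataProvider2 <pydataprovider2.html>`_ . For more complex use cases, or when higher efficiency is required, users can also implement a custom ``DataProvider`` in C++.
+
+PaddlePaddle requires users to specify, in the network configuration (trainer_config.py), which DataProvider to use, and to implement in that DataProvider how the training file list (train.list) or the test file list (test.list) is accessed.
+
+- train.list and test.list are stored locally (storing them directly in the training directory and referring to them by relative path is recommended). In general, both are plain text files in which each line is the address of one data file:
+
+  - If the data file is on the local disk, the address is its absolute path or its relative path (relative to the directory PaddlePaddle runs from).
+  - The address can also be an HDFS file path, a database connection path, and so on.
+  - Because the DataProvider consumes this address, how to parse it is also something users must consider when writing a custom DataProvider.
+- If test.list is not set, or is set to None, no testing is performed during training; otherwise, testing is performed during training in the manner specified by the command-line arguments, which helps guard against overfitting.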
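
The file-list convention described in the added text can be illustrated with a small sketch. It is plain Python and not part of PaddlePaddle: the helper names `read_file_list` and `classify_address` and the three category labels are hypothetical. It assumes only what the text states, namely that each non-empty line of the list is one data file address, which may be a local path or something like an HDFS path.

```python
def read_file_list(list_text):
    """Return the addresses in train.list/test.list style text, one per non-empty line."""
    return [line.strip() for line in list_text.splitlines() if line.strip()]


def classify_address(address):
    """Roughly classify an address (hypothetical categories, for illustration only)."""
    if "://" in address:
        # Addresses such as hdfs://... carry a scheme before "://".
        scheme = address.split("://", 1)[0].lower()
        return "hdfs" if scheme == "hdfs" else "other"
    # Anything without a scheme is treated as a local absolute or relative path.
    return "local"


if __name__ == "__main__":
    sample = "data/part-0.txt\nhdfs://namenode/data/part-1.txt\n"
    for addr in read_file_list(sample):
        print(addr, classify_address(addr))
```

A custom DataProvider would need logic of this kind wherever it opens the address it is handed. How a non-local address is actually read is left to the user, as the text notes.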

doc_cn/ui/data_provider/index.rst

-17
This file was deleted.

doc_cn/ui/data_provider/mnist_config.py

+1
@@ -5,5 +5,6 @@
     test_list=None,
     module='mnist_provider',
     obj='process')
+
 img = data_layer(name='pixel', size=784)
 label = data_layer(name='label', size=10)

doc_cn/ui/data_provider/mnist_provider.py

-22
This file was deleted.

doc_cn/ui/data_provider/pydataprovider2.rst

+227 -257
Large diffs are not rendered by default.

doc_cn/ui/data_provider/sentimental_provider.py

+6 -9
@@ -8,19 +8,16 @@ def on_init(settings, dictionary, **kwargs):
 
     # set input types in runtime. It will do the same thing as
     # @provider(input_types) will do, but it is set dynamically during runtime.
-    settings.input_types = [
+    settings.input_types = {
         # The text is a sequence of integer values, and each value is a word id.
         # The whole sequence is the sentences that we want to predict its
         # sentimental.
-        integer_value(
-            len(dictionary), seq_type=SequenceType),  # text input
+        'data': integer_value_sequence(len(dictionary)),  # text input
+        'label': integer_value(2)  # label positive/negative
+    }
 
-        # label positive/negative
-        integer_value(2)
-    ]
-
-    # save dictionary as settings.dictionary. It will be used in process
-    # method.
+    # save dictionary as settings.dictionary.
+    # It will be used in process method.
     settings.dictionary = dictionary
 
 
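
This hunk changes `settings.input_types` from a positional list to a dict keyed by `'data'` and `'label'`. A plausible consequence is that the process method would yield samples keyed by the same names. The sketch below shows that shape in plain Python so it runs without PaddlePaddle. The function `process_lines` is hypothetical, and the `label<TAB>text` line format is an assumption, not something shown in the diff.

```python
def process_lines(dictionary, lines):
    """Yield one dict per sample, keyed like the dict-style input_types.

    dictionary maps word -> integer id; words missing from it are skipped.
    Each line is assumed (hypothetically) to look like "label<TAB>text".
    """
    for line in lines:
        label, text = line.rstrip("\n").split("\t", 1)
        word_ids = [dictionary[w] for w in text.split() if w in dictionary]
        # 'data' is a sequence of word ids; 'label' is 0 or 1.
        yield {"data": word_ids, "label": int(label)}


if __name__ == "__main__":
    vocab = {"good": 0, "bad": 1, "movie": 2}
    for sample in process_lines(vocab, ["1\tgood movie\n", "0\tbad movie\n"]):
        print(sample)
```

Keying samples by name means the order of fields in each yielded sample no longer has to match the order in which the inputs were declared.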

doc_cn/ui/data_provider/write_new_dataprovider.rst

-4
This file was deleted.

doc_cn/ui/index.rst

+3 -4
@@ -8,8 +8,8 @@
 .. toctree::
   :maxdepth: 1
 
-  data_provider/index.rst
-
+  data_provider/dataprovider.rst
+  data_provider/pydataprovider2.rst
 
 Commands and Command-Line Arguments
 ===================================
@@ -23,9 +23,8 @@
 * `Argument categories <../../doc/ui/cmd_argument/argument_outline.html>`_
 * `Argument descriptions <../../doc/ui/cmd_argument/detail_introduction.html>`_
 
-
 Predict
-====
+=======
 
 .. toctree::
   :maxdepth: 1
