
Commit 77b0bf4

Author: sweetsky0901 (committed)
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into cross_channel_norm
2 parents d18d7aa + ea5d6ea commit 77b0bf4

File tree: 152 files changed (+1416 / -877 lines)


CMakeLists.txt

Lines changed: 4 additions & 2 deletions
@@ -16,8 +16,6 @@ cmake_minimum_required(VERSION 3.0)
 set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
 set(PADDLE_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
 set(PADDLE_BINARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
-SET(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG")
-SET(CMAKE_C_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG")
 
 include(system)
 
@@ -201,6 +199,10 @@ if(WITH_GOLANG)
 endif(WITH_GOLANG)
 
 set(PADDLE_PYTHON_BUILD_DIR "${CMAKE_CURRENT_BINARY_DIR}/python/build")
+
+SET(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG")
+SET(CMAKE_C_FLAGS_RELWITHDEBINFO "-O3 -g -DNDEBUG")
+
 add_subdirectory(paddle)
 if(WITH_PYTHON)
   add_subdirectory(python)

doc/design/block.md

Lines changed: 2 additions & 2 deletions
@@ -291,10 +291,10 @@ public:
   }
 
   void Run(const framework::Scope& scope,
-           const platform::DeviceContext& dev_ctx) const override {
+           const platform::Place& place) const override {
     PADDLE_ENFORCE(symbols_ready_, "operators and variables should be created first.");
     for (auto& op : runtime_table_.ops()) {
-      op->Run(scope, dev_ctx);
+      op->Run(scope, place);
     }
   }

doc/design/support_new_device.md

Lines changed: 9 additions & 7 deletions
@@ -25,13 +25,14 @@ There are mainly three parts that we have to consider while integrating a new de
 
 ### Place and DeviceContext
 
+Please remind that device and computing library are not one-to-one corresponding. A device can have a lot of computing libraries and a computing library can also support several devices.
 
 #### Place
-Fluid uses class [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55) to represent different devices and computing libraries. There are inheritance relationships between different kinds of `Place`.
+Fluid uses class [Place](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/place.h#L55) to represent the device memory where data is located. If we add another device, we have to add corresponding `DevicePlace`.
 
 ```
-        | CPUPlace   --> MKLDNNPlace
-Place --| CUDAPlace  --> CUDNNPlace
+        | CPUPlace
+Place --| CUDAPlace
         | FPGAPlace
 ```
 
@@ -43,7 +44,7 @@ typedef boost::variant<CUDAPlace, CPUPlace, FPGAPlace> Place;
 
 #### DeviceContext
 
-Fluid uses class [DeviceContext](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30) to manage the resources in different hardwares, such as CUDA stream in `CDUADeviceContext`. There are also inheritance relationships between different kinds of `DeviceContext`.
+Fluid uses class [DeviceContext](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/device_context.h#L30) to manage the resources in different libraries, such as CUDA stream in `CDUADeviceContext`. There are also inheritance relationships between different kinds of `DeviceContext`.
 
 
 ```
@@ -106,7 +107,7 @@ template <typename Place>
 size_t Used(Place place);
 ```
 
-To implementing these interfaces, we have to implement MemoryAllocator for different Devices
+To implement these interfaces, we have to implement MemoryAllocator for different Devices.
 
 
 #### Tensor
@@ -243,6 +244,7 @@ REGISTER_OP_CUDA_KERNEL(
 Generally, we will impelement OpKernel for all Device/Library of an Operator. We can easily train a Convolutional Neural Network in GPU. However, some OpKernel is not sutibale on a specific Device. For example, crf operator can only run on CPU, whereas most other operators can run at GPU. To achieve high performance in such circumstance, we have to switch between different Device/Library.
 
 
-We will discuss how to implement an efficient OpKernel switch policy.
+For more details, please refer to following docs:
 
-- TBD
+- operator kernel type [doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/operator_kernel_type.md)
+- switch kernel [doc](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/switch_kernel.md)

paddle/framework/CMakeLists.txt

Lines changed: 5 additions & 2 deletions
@@ -30,7 +30,7 @@ cc_test(op_proto_maker_test SRCS op_proto_maker_test.cc DEPS op_proto_maker)
 cc_library(op_info SRCS op_info.cc DEPS attribute framework_proto)
 cc_library(shape_inference SRCS shape_inference.cc DEPS ddim attribute)
 cc_library(operator SRCS operator.cc DEPS op_info device_context tensor scope glog shape_inference)
-cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry)
+cc_test(operator_test SRCS operator_test.cc DEPS operator op_registry init)
 cc_library(proto_desc SRCS var_desc.cc op_desc.cc block_desc.cc program_desc.cc DEPS shape_inference op_info operator glog)
 
 cc_library(op_registry SRCS op_registry.cc DEPS op_proto_maker op_info operator glog proto_desc)
@@ -59,5 +59,8 @@ cc_test(var_type_inference_test SRCS var_type_inference_test.cc DEPS op_registry
 cc_library(selected_rows SRCS selected_rows.cc DEPS tensor)
 cc_test(selected_rows_test SRCS selected_rows_test.cc DEPS selected_rows)
 
-cc_library(init SRCS init.cc DEPS gflags executor place stringpiece)
+cc_test(threadpool_test SRCS threadpool_test.cc)
+cc_library(init SRCS init.cc DEPS gflags device_context place stringpiece)
 cc_test(init_test SRCS init_test.cc DEPS init)
+
+cc_test(op_kernel_type_test SRCS op_kernel_type_test.cc DEPS place device_context)

paddle/framework/data_layout.h

Lines changed: 21 additions & 0 deletions
@@ -14,6 +14,9 @@ limitations under the License. */
 
 #pragma once
 
+#include <iostream>
+#include "paddle/platform/enforce.h"
+
 namespace paddle {
 namespace framework {
 
@@ -33,5 +36,23 @@ inline DataLayout StringToDataLayout(const std::string& str) {
   }
 }
 
+inline std::string DataLayoutToString(const DataLayout& data_layout) {
+  switch (data_layout) {
+    case kNHWC:
+      return "NHWC";
+    case kNCHW:
+      return "NCHW";
+    case kAnyLayout:
+      return "ANY_LAYOUT";
+    default:
+      PADDLE_THROW("unknown DataLayout %d", data_layout);
+  }
+}
+
+inline std::ostream& operator<<(std::ostream& out, DataLayout l) {
+  out << DataLayoutToString(l);
+  return out;
+}
+
 }  // namespace framework
 }  // namespace paddle

paddle/framework/executor.cc

Lines changed: 2 additions & 9 deletions
@@ -33,13 +33,7 @@ namespace framework {
 const std::string kFeedOpType = "feed";
 const std::string kFetchOpType = "fetch";
 
-DeviceContextPool* DeviceContextPool::pool = nullptr;
-
-Executor::Executor(const std::vector<platform::Place>& places) {
-  DeviceContextPool& pool = DeviceContextPool::Get();
-  auto borrowed_contexts = pool.Borrow(places);
-  device_contexts_.swap(borrowed_contexts);
-}
+Executor::Executor(const platform::Place& place) : place_(place) {}
 
 static void CreateTensor(Variable* var, proto::VarDesc::VarType var_type) {
   if (var_type == proto::VarDesc::LOD_TENSOR) {
@@ -71,7 +65,6 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id,
   // - will change to use multiple blocks for RNN op and Cond Op
   PADDLE_ENFORCE_LT(static_cast<size_t>(block_id), pdesc.Size());
   auto& block = pdesc.Block(block_id);
-  auto& device = device_contexts_[0];
 
   Scope* local_scope = scope;
   if (create_vars) {
@@ -107,7 +100,7 @@ void Executor::Run(const ProgramDesc& pdesc, Scope* scope, int block_id,
   for (auto& op_desc : block.AllOps()) {
     auto op = paddle::framework::OpRegistry::CreateOp(*op_desc);
     VLOG(3) << op->DebugString();
-    op->Run(*local_scope, *device);
+    op->Run(*local_scope, place_);
   }
   if (create_local_scope) {
     scope->DeleteScope(local_scope);

paddle/framework/executor.h

Lines changed: 3 additions & 89 deletions
@@ -14,9 +14,6 @@ limitations under the License. */
 
 #pragma once
 
-#include <map>
-#include <unordered_map>
-
 #include "paddle/framework/op_info.h"
 #include "paddle/framework/program_desc.h"
 #include "paddle/framework/scope.h"
@@ -26,96 +23,13 @@ limitations under the License. */
 namespace paddle {
 namespace framework {
 
-class DeviceContextPool {
- public:
-  static DeviceContextPool& Get() {
-    PADDLE_ENFORCE_NOT_NULL(pool, "Need to Create DeviceContextPool first!");
-    return *pool;
-  }
-
-  static DeviceContextPool& Create(const std::vector<platform::Place>& places) {
-    if (pool == nullptr) {
-      pool = new DeviceContextPool(places);
-    }
-    return *pool;
-  }
-
-  const platform::DeviceContext* Borrow(const platform::Place& place) {
-    auto range = device_contexts_.equal_range(place);
-    if (range.first == range.second) {
-      PADDLE_THROW(
-          "'Place' is not supported, Please re-compile with WITH_GPU "
-          "option");
-    }
-    return range.first->second;
-  }
-
-  std::vector<const platform::DeviceContext*> Borrow(
-      const std::vector<platform::Place>& places) {
-    PADDLE_ENFORCE_GT(places.size(), 0);
-    PADDLE_ENFORCE_LE(places.size(), device_contexts_.size());
-    std::vector<const platform::DeviceContext*> borrowed_contexts;
-    for (auto& place : places) {
-      auto range = device_contexts_.equal_range(place);
-      if (range.first == range.second) {
-        PADDLE_THROW(
-            "'Place' is not supported, Please re-compile with WITH_GPU "
-            "option");
-      }
-      // TODO(dzhwinter) : assign the first found device. Will enhanced later.
-      // device load balancer maybe useful here.
-      borrowed_contexts.emplace_back(range.first->second);
-    }
-    return borrowed_contexts;
-  }
-
-  explicit DeviceContextPool(const std::vector<platform::Place>& places) {
-    PADDLE_ENFORCE_GT(places.size(), 0);
-    for (size_t i = 0; i < places.size(); i++) {
-      if (platform::is_cpu_place(places[i])) {
-        device_contexts_.emplace(
-            places[i], new platform::CPUDeviceContext(
-                           boost::get<platform::CPUPlace>(places[i])));
-      } else if (platform::is_gpu_place(places[i])) {
-#ifdef PADDLE_WITH_CUDA
-        device_contexts_.emplace(
-            places[i], new platform::CUDADeviceContext(
-                           boost::get<platform::GPUPlace>(places[i])));
-#else
-        PADDLE_THROW(
-            "'GPUPlace' is not supported, Please re-compile with WITH_GPU "
-            "option");
-#endif
-      }
-    }
-  }
-
-  ~DeviceContextPool() {}
-
- private:
-  static DeviceContextPool* pool;
-  struct Hash {
-    std::hash<int> hash_;
-    size_t operator()(const platform::Place& place) const {
-      return hash_(place.which());
-    }
-  };
-  std::unordered_multimap<const platform::Place, const platform::DeviceContext*,
-                          Hash>
-      device_contexts_;
-  DISABLE_COPY_AND_ASSIGN(DeviceContextPool);
-};
-
 class Executor {
  public:
   // TODO(dzhwinter) : Do not rely on this function, it will be removed
   explicit Executor(const platform::DeviceContext& device)
-      : Executor(std::vector<platform::Place>({device.GetPlace()})) {}
-
-  explicit Executor(const platform::Place& place)
-      : Executor(std::vector<platform::Place>({place})) {}
+      : Executor(device.GetPlace()) {}
 
-  explicit Executor(const std::vector<platform::Place>& places);
+  explicit Executor(const platform::Place& place);
 
   /* @Brief
    * Runtime evaluation of the given ProgramDesc under certain Scope
@@ -128,7 +42,7 @@ class Executor {
                    bool create_vars = true);
 
  private:
-  std::vector<const platform::DeviceContext*> device_contexts_;
+  const platform::Place place_;
 };
 
 }  // namespace framework

paddle/framework/init.cc

Lines changed: 5 additions & 6 deletions
@@ -14,8 +14,8 @@
 #include <algorithm>
 #include <string>
 
-#include "paddle/framework/executor.h"
 #include "paddle/framework/init.h"
+#include "paddle/platform/device_context.h"
 #include "paddle/platform/place.h"
 #include "paddle/string/piece.h"
 
@@ -48,13 +48,13 @@ bool InitDevices(const std::vector<std::string> &devices) {
   std::vector<platform::Place> places;
   for (auto &device : devices) {
     auto p = string::Piece(device);
-    if (string::Find(p, ':', 0) == string::Piece::npos) {
+    if (string::HasPrefix(p, "CPU")) {
       places.emplace_back(platform::CPUPlace());
     } else if (string::HasPrefix(p, "GPU")) {
 #ifdef PADDLE_WITH_CUDA
       auto pos = string::RFind(p, ':', string::Piece::npos);
       auto number = device.substr(pos + 1);
-      places.emplace_back(platform::GPUPlace(std::stoi(number)));
+      places.emplace_back(platform::CUDAPlace(std::stoi(number)));
 #else
       LOG(WARNING)
           << "'GPU' is not supported, Please re-compile with WITH_GPU option";
@@ -69,10 +69,9 @@ bool InitDevices(const std::vector<std::string> &devices) {
            return platform::is_cpu_place(place);
          }) == places.end()) {
     places.emplace_back(platform::CPUPlace());
-    LOG(WARNING) << "Not specified any device, use CPU by Default.";
+    LOG(WARNING) << "Not specified CPU device, create CPU by Default.";
   }
-  DeviceContextPool::Create(places);
-  return true;
+  platform::DeviceContextPool::Create(places);
   return true;
 }

paddle/framework/init_test.cc

Lines changed: 4 additions & 0 deletions
@@ -23,5 +23,9 @@ TEST(Init, InitDevices) {
 #ifdef PADDLE_WITH_CUDA
   std::vector<std::string> ds2 = {"CPU", "GPU:0", "GPU:1"};
   ASSERT_EQ(InitDevices(ds2), true);
+
+  // test re-init
+  std::vector<std::string> ds3 = {"GPU:0", "GPU:1"};
+  ASSERT_EQ(InitDevices(ds3), true);
 #endif
 }

paddle/framework/library_type.h

Lines changed: 19 additions & 1 deletion
@@ -20,7 +20,25 @@ namespace framework {
 // For more details about the design of LibraryType, Please refer to
 // https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/operator_kernel_type.md#library
 
-enum LibraryType { kPlain = 0; kMKLDNN = 1; kCUDNN = 2; }
+enum LibraryType { kPlain = 0, kMKLDNN = 1, kCUDNN = 2 };
+
+inline std::string LibraryTypeToString(const LibraryType& library_type) {
+  switch (library_type) {
+    case kPlain:
+      return "PLAIN";
+    case kMKLDNN:
+      return "MKLDNN";
+    case kCUDNN:
+      return "CUDNN";
+    default:
+      PADDLE_THROW("unknown LibraryType %d", library_type);
+  }
+}
+
+inline std::ostream& operator<<(std::ostream& out, LibraryType l) {
+  out << LibraryTypeToString(l);
+  return out;
+}
 
 }  // namespace
 }  // framework

paddle/framework/lod_tensor.cc

Lines changed: 1 addition & 1 deletion
@@ -224,7 +224,7 @@ void SerializeToStream(std::ostream &os, const LoDTensor &tensor,
     while (size != 0) {
       size_t size_to_write = std::min(kBufSize, static_cast<size_t>(size));
       memory::Copy(cpu, buf.get(),
-                   boost::get<platform::GPUPlace>(tensor.place()),
+                   boost::get<platform::CUDAPlace>(tensor.place()),
                    reinterpret_cast<const void *>(data), size_to_write,
                    gpu_dev_ctx.stream());
       gpu_dev_ctx.Wait();

paddle/framework/lod_tensor_test.cu

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ __global__ void test(size_t* a, int size) {
 
 TEST(LoDTensor, LoDInGPU) {
   paddle::framework::LoDTensor lod_tensor;
-  paddle::platform::GPUPlace place(0);
+  paddle::platform::CUDAPlace place(0);
 
   paddle::framework::LoD src_lod;
   src_lod.push_back(std::vector<size_t>{0, 2, 4, 6, 8, 10, 12, 14});
