Commit db0df8f

Merge pull request #1235 from putcn/k8s_aws
added region check step, route53 config.
2 parents fa9c623 + 3f02ede commit db0df8f

1 file changed

+34
-5
lines changed

doc/howto/usage/k8s/k8s_aws_en.md

@@ -1,3 +1,4 @@
1+
12
# Distributed PaddlePaddle Training on AWS with Kubernetes
23

34
We will show you, step by step, how to run distributed PaddlePaddle training on an AWS cluster with Kubernetes. Let's start with the core concepts.
@@ -43,6 +44,12 @@ We rank each pod by sorting them by their ips. The rank of each pod could be the
4344

4445
## PaddlePaddle on AWS with Kubernetes
4546

47+
### Choose AWS Service Region
48+
This tutorial requires several AWS services to work in the same region. Before creating anything in AWS, please check the following link:
49+
https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/
50+
Choose a region in which all of the following services are available: EC2, EFS, VPC, CloudFormation, KMS, S3.
51+
In this tutorial, we use "Oregon (us-west-2)" as an example.
52+
4653
### Create AWS Account and IAM Account
4754

4855
Under each AWS account, we can create multiple [IAM](http://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) users. This allows us to grant some privileges to each IAM user and to create/operate AWS clusters as an IAM user.
@@ -73,7 +80,8 @@ Please be aware that this tutorial needs the following privileges for the user i
7380
#### kube-aws
7481

7582
[kube-aws](https://github.com/coreos/kube-aws) is a CLI tool to automate cluster deployment to AWS.
76-
83+
##### Verify kube-aws integrity
84+
Note: if you are using a non-official release of kube-aws (e.g., an RC release), you can skip this step.
7785
Import the CoreOS Application Signing Public Key:
7886

7987
```
@@ -98,7 +106,7 @@ PLATFORM=darwin-amd64
98106
99107
gpg2 --verify kube-aws-${PLATFORM}.tar.gz.sig kube-aws-${PLATFORM}.tar.gz
100108
```
101-
109+
##### Install kube-aws
102110
Extract the binary:
103111

104112
```
@@ -241,22 +249,23 @@ Paste into following inline policies:
241249
]
242250
}
243251
```
244-
252+
`Version` : Its value has to be exactly "2012-10-17".
245253
`AWS_ACCOUNT_ID`: You can get it with the following command:
246254

247255
```
248256
aws sts get-caller-identity --output text --query Account
249257
```
250258

251-
`MY_CLUSTER_NAME`: Pick a MY_CLUSTER_NAME that you like, you will use it later as well.
259+
`MY_CLUSTER_NAME`: Pick a MY_CLUSTER_NAME that you like; you will use it later as well.
260+
Please note that the stack name must satisfy the regular expression pattern `[a-zA-Z][-a-zA-Z0-9*]*`. The pattern rejects "_"; to be safe, avoid "-" in the stack name as well, or kube-aws may throw an error in later steps.
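A quick local check can catch a bad name before kube-aws does. The sketch below is only an illustration: `is_valid_stack_name` is a hypothetical helper (not part of kube-aws), and the sample names are made up. It tests a candidate against the pattern quoted above, anchored to the whole string. Read literally, that pattern accepts "-" and rejects "_"; avoiding both remains the conservative choice.

```shell
# Hypothetical helper: succeeds only if the whole name matches the
# pattern quoted in this tutorial, [a-zA-Z][-a-zA-Z0-9*]*
is_valid_stack_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z][-a-zA-Z0-9*]*$'
}

# Example names are assumptions for illustration only.
for name in "paddlecluster" "paddle_cluster" "1paddle"; do
  if is_valid_stack_name "$name"; then
    echo "$name: matches the pattern"
  else
    echo "$name: does not match the pattern"
  fi
done
```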
252261

253262
#### External DNS name
254263

255264
When the cluster is created, the controller will expose the TLS-secured API on a DNS name.
256265

257266
The DNS name should have a CNAME record pointing to the cluster DNS name or an A record pointing to the cluster IP address.
258267

259-
We will need to use DNS name later in tutorial.
268+
We will need to use the DNS name later in the tutorial. If you don't already own one, you can choose any DNS name (e.g., `paddle`) and modify `/etc/hosts` to associate the cluster IP with that DNS name on your local machine. Then add a name service (Route53) in AWS to associate the IP with `paddle` inside the cluster. We will find the cluster IP in later steps.
260269

261270
#### S3 bucket
262271

@@ -364,6 +373,26 @@ paddle-cl-ElbAPISe-EEOI3EZPR86C-531251350.us-west-2.elb.amazonaws.com. 59 IN A 5
364373

365374
In the above output, both IPs, `54.241.164.52` and `54.67.102.112`, will work.
366375

376+
*If you own a DNS name*, set its A record to either of the above IPs. Then you can skip to the step "Access the cluster".
377+
378+
*If you do not own a DNS name*:
379+
##### Update local DNS association
380+
Edit `/etc/hosts` to associate one of the above IPs with the DNS name.
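As a minimal sketch of that edit, the snippet below builds the line to add, assuming the example name `paddle` and one of the IPs shown earlier. `hosts_entry` is a hypothetical helper introduced here for illustration; it only prints the entry, so nothing is changed until you append the output yourself.

```shell
# Assumed values: substitute the IP from your own output and your chosen name.
CLUSTER_IP="54.241.164.52"
DNS_NAME="paddle"

# Hypothetical helper: prints one /etc/hosts entry as "<ip><TAB><name>".
hosts_entry() {
  printf '%s\t%s\n' "$1" "$2"
}

# Preview the entry without touching any file.
hosts_entry "$CLUSTER_IP" "$DNS_NAME"
```

To apply it, the printed line can be appended with elevated privileges, for example by piping it to `sudo tee -a /etc/hosts`.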
381+
##### Add Route53 private name service in VPC
382+
- Open [Route53 Console](https://console.aws.amazon.com/route53/home)
383+
- Create hosted zone with following config
384+
- Domain name: "paddle"
385+
- Type: "Private hosted zone for amazon VPC"
386+
- VPC ID: <Your VPC ID>
387+
- Add A record
388+
- Click on the zone "paddle" just created
389+
- Click the button "Create record set"
390+
- Name : leave blank
391+
- type: "A"
392+
- Value: <kube-controller ec2 private ip>
393+
- Verify name service
394+
- Connect to any instance created by kube-aws via ssh
395+
- Run the command `host paddle` and check that the IP returned is the private IP of kube-controller
367396

368397
#### Access the cluster
369398
