A full-stack AWS application that automatically backs up GitHub repositories to S3 when code is pushed to main or develop branches. Built with AWS CDK (TypeScript) and Java Lambda.
sequenceDiagram
GitHub->>API Gateway: Webhook (push event)
API Gateway->>Lambda: Invoke function
Lambda->>AWS Secrets Manager: Read GitHub PAT
Lambda->>GitHub API: Fetch repo zipball
Lambda->>S3: Store archive
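The "Fetch repo zipball" step in the diagram could look roughly like the Java sketch below. It only builds the request for GitHub's REST endpoint `GET /repos/{owner}/{repo}/zipball/{ref}`. The class name `ZipballRequest`, the method `build`, and the sample values are assumptions for illustration and are not taken from the project's handler.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper: builds the GitHub zipball request. Not the project's actual code. */
final class ZipballRequest {

    private ZipballRequest() {}

    /**
     * @param owner repository owner, e.g. "test-owner"
     * @param repo  repository name, e.g. "test-repo"
     * @param ref   branch name, tag, or commit SHA, e.g. "main"
     * @param token GitHub PAT, assumed to have been read from Secrets Manager already
     */
    static HttpRequest build(String owner, String repo, String ref, String token) {
        URI uri = URI.create(
                "https://api.github.com/repos/" + owner + "/" + repo + "/zipball/" + ref);
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/vnd.github+json")
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("test-owner", "test-repo", "main", "dummy-token");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

GitHub answers this endpoint with a redirect to the archive location, so a client sending the request would need to follow redirects, for example a `java.net.http.HttpClient` built with `followRedirects(HttpClient.Redirect.NORMAL)`.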
- AWS CDK (TypeScript): Infrastructure as Code
- API Gateway: Webhook endpoint for GitHub events
- Lambda Function (Java 17): Processes webhooks and fetches repository snapshots
- S3 Bucket: Stores repository ZIP files
- Secrets Manager: Securely stores GitHub Personal Access Token
- CloudWatch Logs: Comprehensive logging for monitoring
- Automatic Backup: Triggers on push to main or develop branches
- Secure: Uses AWS Secrets Manager for GitHub PAT storage
- Efficient: Uses GitHub Zipball API for direct streaming to S3
- Serverless: Fully managed AWS services
- Monitored: CloudWatch logging and API Gateway access logs
- Infrastructure as Code: Complete CDK deployment
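The branch trigger in the feature list can be sketched in Java as a small check on the push event's `ref` field. This is a minimal illustration, assuming the handler compares the ref against a fixed set of branch names; `BranchFilter` and `shouldBackup` are hypothetical names, not the project's own.

```java
import java.util.Set;

/** Hypothetical helper illustrating the branch filter; names are not from the project. */
final class BranchFilter {

    private static final String HEADS_PREFIX = "refs/heads/";
    private static final Set<String> BACKUP_BRANCHES = Set.of("main", "develop");

    private BranchFilter() {}

    /** Returns true when the push event's "ref" points at a branch that should be backed up. */
    static boolean shouldBackup(String ref) {
        if (ref == null || !ref.startsWith(HEADS_PREFIX)) {
            return false; // tags ("refs/tags/...") and malformed refs are ignored
        }
        String branch = ref.substring(HEADS_PREFIX.length());
        return BACKUP_BRANCHES.contains(branch);
    }

    public static void main(String[] args) {
        System.out.println(shouldBackup("refs/heads/main"));      // true
        System.out.println(shouldBackup("refs/heads/feature/x")); // false
    }
}
```

Matching on the full `refs/heads/` prefix rather than a suffix avoids treating a tag or a branch such as `feature/main` as a backup target.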
- AWS CLI configured with appropriate permissions
- Node.js 18+ and npm
- Java 17 JDK
- Maven 3.6+
- GitHub Personal Access Token with repo scope
git clone git@github.com:pravin-ba/RepositorytoS3.git
cd RepositorytoS3
npm install
cd lambda
mvn clean package
cd ..
npx cdk bootstrap # First time only
npx cdk deploy
aws secretsmanager put-secret-value \
--secret-id github/pat \
--secret-string "your-github-personal-access-token"
- Go to your GitHub repository → Settings → Webhooks
- Add webhook URL: https://[api-id].execute-api.[region].amazonaws.com/prod/webhook
- Content type: application/json
- Events: Select "Just the push event"
RepositorytoS3/
├── lib/
│   └── repositoryto_s3-stack.ts              # CDK infrastructure definition
├── lambda/
│   ├── src/main/java/com/example/lambda/
│   │   └── GitHubSnapshotHandler.java        # Lambda handler
│   ├── src/test/java/com/example/lambda/
│   │   └── GitHubSnapshotHandlerTest.java    # Unit tests
│   ├── pom.xml                               # Maven dependencies
│   └── Dockerfile                            # Docker config (optional)
├── cdk.json                                  # CDK configuration
├── package.json                              # Node.js dependencies
└── README.md                                 # This file
The Lambda function uses these environment variables:
- BUCKET_NAME: S3 bucket for storing repository snapshots
- GITHUB_PAT_SECRET_ARN: ARN of the Secrets Manager secret containing the GitHub PAT
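One way a handler could read these two variables is to load them once and fail fast when either is missing. The sketch below is an assumption about how that might be structured, not the project's code; `BackupConfig`, `fromEnv`, and the sample values are hypothetical.

```java
import java.util.Map;

/** Hypothetical configuration holder; the record and method names are illustrative only. */
record BackupConfig(String bucketName, String patSecretArn) {

    /** Builds the config from an environment map, failing fast on missing values. */
    static BackupConfig fromEnv(Map<String, String> env) {
        return new BackupConfig(
                require(env, "BUCKET_NAME"),
                require(env, "GITHUB_PAT_SECRET_ARN"));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        // In the Lambda runtime the map would be System.getenv(); sample values are placeholders.
        BackupConfig config = fromEnv(Map.of(
                "BUCKET_NAME", "example-bucket",
                "GITHUB_PAT_SECRET_ARN", "example-secret-arn"));
        System.out.println(config.bucketName());
    }
}
```

Taking the environment as a `Map` parameter keeps the lookup testable without setting real process environment variables.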
- Runtime: Java 17
- Memory: 1024 MB
- Timeout: 5 minutes
- Handler: com.example.lambda.GitHubSnapshotHandler::handleRequest
cd lambda
mvn test
Test the webhook endpoint with a sample payload:
curl -X POST https://[api-id].execute-api.[region].amazonaws.com/prod/webhook \
-H "Content-Type: application/json" \
-d '{
"ref": "refs/heads/main",
"repository": {
"name": "test-repo",
"owner": {
"name": "test-owner"
}
}
}'
- Lambda Logs: /aws/lambda/RepositorytoS3Stack-RepoToS3Lambda
- API Gateway Logs: /aws/apigateway/RepositorytoS3Stack-RepoWebhookApi
- Lambda invocation count and duration
- API Gateway request count and latency
- S3 bucket storage usage
- Error rates and 4xx/5xx responses
- GitHub PAT stored in AWS Secrets Manager
- IAM roles with least privilege access
- API Gateway with proper authentication
- S3 bucket with appropriate access controls
npx cdk deploy
npx cdk deploy --require-approval never
npx cdk destroy
Star this repository if you find it helpful!