
A full-stack AWS application that automatically backs up GitHub repositories to S3 when code is pushed to main or develop branches. Built with AWS CDK (TypeScript) and Java Lambda.

pravin-ba/RepositorytoS3

🚀 GitHub Repository to S3 Backup Service

A full-stack AWS application that automatically backs up GitHub repositories to S3 when code is pushed to main or develop branches. Built with AWS CDK (TypeScript) and Java Lambda.

🏗️ Architecture

sequenceDiagram
    GitHub->>API Gateway: Webhook (push event)
    API Gateway->>Lambda: Invoke function
    Lambda->>AWS Secrets Manager: Read GitHub PAT
    Lambda->>GitHub API: Fetch repo zipball
    Lambda->>S3: Store archive

Components:

  • AWS CDK (TypeScript): Infrastructure as Code
  • API Gateway: Webhook endpoint for GitHub events
  • Lambda Function (Java 17): Processes webhooks and fetches repository snapshots
  • S3 Bucket: Stores repository ZIP files
  • Secrets Manager: Securely stores GitHub Personal Access Token
  • CloudWatch Logs: Comprehensive logging for monitoring
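
The "Fetch repo zipball" step in the diagram can be sketched as follows. This is a minimal illustration, not the repository's actual handler: the class and method names are hypothetical, and it assumes the Lambda calls GitHub's REST endpoint `GET /repos/{owner}/{repo}/zipball/{ref}` with the PAT sent as a bearer token. It uses only the JDK's `java.net.http` API and builds the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: names are illustrative and not taken from the repository.
final class ZipballRequests {

    private ZipballRequests() {}

    // Builds the GitHub REST URL for downloading a repository archive at a given ref.
    static URI zipballUri(String owner, String repo, String ref) {
        return URI.create("https://api.github.com/repos/" + owner + "/" + repo + "/zipball/" + ref);
    }

    // Builds (but does not send) an authenticated GET request for the archive.
    // Assumption: the PAT read from Secrets Manager is passed in as `token`.
    static HttpRequest zipballRequest(String owner, String repo, String ref, String token) {
        return HttpRequest.newBuilder(zipballUri(owner, repo, ref))
                .header("Authorization", "Bearer " + token)
                .header("Accept", "application/vnd.github+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = zipballRequest("test-owner", "test-repo", "main", "example-token");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

GitHub answers this endpoint with a redirect to the archive location, so a client sending the request would likely need redirect following enabled (for example `HttpClient.Redirect.NORMAL`), since the JDK client does not follow redirects by default.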

✨ Features

  • 🔄 Automatic Backup: Triggers on push to main or develop branches
  • 🔒 Secure: Uses AWS Secrets Manager for GitHub PAT storage
  • 📦 Efficient: Uses GitHub Zipball API for direct streaming to S3
  • 🚀 Serverless: Fully managed AWS services
  • 📊 Monitored: CloudWatch logging and API Gateway access logs
  • 🏗️ Infrastructure as Code: Complete CDK deployment
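
The branch trigger described above could be implemented as a small filter on the webhook's `ref` field. The sketch below is an assumption about how such a check might look, not code from the repository; `BranchFilter` and `shouldBackUp` are hypothetical names. Push events carry refs of the form `refs/heads/<branch>`, so tag pushes and other branches are rejected.

```java
import java.util.Set;

// Hypothetical filter: decides whether a push event's ref should trigger a backup.
final class BranchFilter {

    private static final String HEADS_PREFIX = "refs/heads/";
    private static final Set<String> BACKED_UP_BRANCHES = Set.of("main", "develop");

    private BranchFilter() {}

    // Returns true only for pushes to the main or develop branch.
    static boolean shouldBackUp(String ref) {
        if (ref == null || !ref.startsWith(HEADS_PREFIX)) {
            return false; // tags (refs/tags/...) and malformed refs are ignored
        }
        return BACKED_UP_BRANCHES.contains(ref.substring(HEADS_PREFIX.length()));
    }

    public static void main(String[] args) {
        System.out.println(shouldBackUp("refs/heads/main"));          // prints true
        System.out.println(shouldBackUp("refs/heads/feature/login")); // prints false
        System.out.println(shouldBackUp("refs/tags/v1.0"));           // prints false
    }
}
```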

📋 Prerequisites

  • AWS CLI configured with appropriate permissions
  • Node.js 18+ and npm
  • Java 17 JDK
  • Maven 3.6+
  • GitHub Personal Access Token with repo scope

🚀 Quick Start

1. Clone and Setup

git clone git@github.com:pravin-ba/RepositorytoS3.git
cd RepositorytoS3
npm install

2. Build Lambda

cd lambda
mvn clean package
cd ..

3. Deploy Infrastructure

npx cdk bootstrap  # First time only
npx cdk deploy

4. Configure GitHub PAT

aws secretsmanager put-secret-value \
  --secret-id github/pat \
  --secret-string "your-github-personal-access-token"

5. Setup GitHub Webhook

  1. Go to your GitHub repository → Settings → Webhooks
  2. Add webhook URL: https://[api-id].execute-api.[region].amazonaws.com/prod/webhook
  3. Content type: application/json
  4. Events: Select "Just the push event"

📁 Project Structure

RepositorytoS3/
├── lib/
│   └── repositoryto_s3-stack.ts    # CDK infrastructure definition
├── lambda/
│   ├── src/main/java/com/example/lambda/
│   │   └── GitHubSnapshotHandler.java  # Lambda handler
│   ├── src/test/java/com/example/lambda/
│   │   └── GitHubSnapshotHandlerTest.java  # Unit tests
│   ├── pom.xml                     # Maven dependencies
│   └── Dockerfile                  # Docker config (optional)
├── cdk.json                        # CDK configuration
├── package.json                    # Node.js dependencies
└── README.md                       # This file

🔧 Configuration

Environment Variables

The Lambda function uses these environment variables:

  • BUCKET_NAME: S3 bucket for storing repository snapshots
  • GITHUB_PAT_SECRET_ARN: ARN of the Secrets Manager secret containing GitHub PAT

Lambda Configuration

  • Runtime: Java 17
  • Memory: 1024 MB
  • Timeout: 5 minutes
  • Handler: com.example.lambda.GitHubSnapshotHandler::handleRequest

🧪 Testing

Unit Tests

cd lambda
mvn test

Manual Testing

Test the webhook endpoint with a sample payload:

curl -X POST https://[api-id].execute-api.[region].amazonaws.com/prod/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "ref": "refs/heads/main",
    "repository": {
      "name": "test-repo",
      "owner": {
        "name": "test-owner"
      }
    }
  }'
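
The same smoke test can be run from Java, which may be convenient alongside the Maven test suite. The sketch below mirrors the curl command using the JDK's built-in HTTP client; `WebhookSmokeTest` is a hypothetical class name, and the webhook URL is passed as an argument because the `[api-id]` and `[region]` placeholders must be replaced with real values before the URL is valid.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical smoke test: posts the sample push payload to a deployed webhook URL.
final class WebhookSmokeTest {

    // Same minimal payload as the curl example.
    static final String SAMPLE_PUSH_PAYLOAD = """
            {
              "ref": "refs/heads/main",
              "repository": {
                "name": "test-repo",
                "owner": { "name": "test-owner" }
              }
            }
            """;

    private WebhookSmokeTest() {}

    // Builds (but does not send) the POST request for the given webhook URL.
    static HttpRequest samplePush(String webhookUrl) {
        return HttpRequest.newBuilder(URI.create(webhookUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(SAMPLE_PUSH_PAYLOAD))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java WebhookSmokeTest <webhook-url>");
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(samplePush(args[0]), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```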

📊 Monitoring

CloudWatch Logs

  • Lambda Logs: /aws/lambda/RepositorytoS3Stack-RepoToS3Lambda
  • API Gateway Logs: /aws/apigateway/RepositorytoS3Stack-RepoWebhookApi

Key Metrics

  • Lambda invocation count and duration
  • API Gateway request count and latency
  • S3 bucket storage usage
  • Error rates and 4xx/5xx responses

🔒 Security

  • GitHub PAT stored in AWS Secrets Manager
  • IAM roles with least privilege access
  • API Gateway with proper authentication
  • S3 bucket with appropriate access controls

🚀 Deployment

Development

npx cdk deploy

Production

npx cdk deploy --require-approval never

Cleanup

npx cdk destroy

⭐ Star this repository if you find it helpful!
