Version 2.0 | Last Updated: July 11, 2025 | Production-Ready
A comprehensive, enterprise-grade Terraform project implementing production-ready AWS infrastructure with automatic failover, multi-layered health checks, centralized logging via ELK stack, secure bastion access, and modular architecture designed for high availability and DevSecOps best practices.
Featured Architecture: This project pairs production-grade design patterns with monitoring, security, and automation capabilities, and documents them with architecture diagrams.
- 9 Specialized Terraform Modules - Modular architecture for maintainable infrastructure
- Multi-AZ High Availability - Automatic failover across availability zones with zero downtime
- Auto Scaling Groups - Dynamic scaling based on demand with health monitoring
- Application Load Balancer - Traffic distribution with advanced health checks
- Launch Templates - Versioned instance configurations with GP3 storage
- Secure Bastion Host - SSH access to private instances with Elastic IP
- IAM Best Practices - Centralized roles and policies with least privilege
- VPC Security Groups - Granular network access controls between tiers
- EBS Encryption - Encrypted storage volumes with IMDSv2 enforcement
- Private Subnets - Application instances isolated from direct internet access
- ELK Stack Integration - Centralized logging with OpenSearch and Kibana dashboards
- CloudWatch Monitoring - Comprehensive metrics, alarms, and dashboards
- SNS Notifications - Email alerts for critical infrastructure events
- Multi-Layer Health Checks - ALB, Route 53, and instance-level monitoring
- Cost Tracking - Resource tagging for detailed cost allocation
- Ansible Integration - Complete instance configuration from GitHub repository
- GitHub Synchronization - Daily automated sync of configuration updates
- Self-Configuring Instances - Automatic software installation and service setup
- Role-Based Development - Structured guidance for different engineering disciplines
- Idempotent Operations - Safe, repeatable configuration management
- Route 53 Integration - DNS management with health check routing
- NAT Gateway - Secure outbound internet access for private instances
- Multi-AZ Deployment - Resources distributed across availability zones
- CIDR Management - Organized subnet allocation and network planning
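
The launch template features listed above (GP3 storage, EBS encryption, IMDSv2 enforcement) could be expressed roughly as follows. This is a minimal sketch, not the project's actual module: the variable names, the default instance type, the volume size, and the root device name are assumptions for illustration.

```hcl
# Hypothetical inputs; the project's real variable names may differ.
variable "ami_id" {
  type        = string
  description = "AMI for the application instances"
}

variable "instance_type" {
  type    = string
  default = "t3.micro" # assumed default, for illustration only
}

resource "aws_launch_template" "app" {
  name_prefix   = "ec2-failover-"
  image_id      = var.ami_id
  instance_type = var.instance_type

  # Enforce IMDSv2: metadata requests must carry a session token.
  metadata_options {
    http_endpoint = "enabled"
    http_tokens   = "required"
  }

  # Encrypted GP3 root volume.
  block_device_mappings {
    device_name = "/dev/xvda" # assumed; the root device name depends on the AMI

    ebs {
      volume_type           = "gp3"
      volume_size           = 20 # GiB, assumed
      encrypted             = true
      delete_on_termination = true
    }
  }
}
```

Setting `http_tokens = "required"` is what rejects unauthenticated IMDSv1 requests; leaving it at `"optional"` would allow both versions.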
| Version | Date | Key Features | Status |
|---|---|---|---|
| 2.0 | July 11, 2025 | ELK Stack, Bastion Host, Enhanced Security, Ansible Automation | Current |
| 1.5 | June 2025 | Modular Architecture, Auto Scaling, Load Balancer Integration | Stable |
| 1.0 | May 2025 | Basic EC2 Failover, CloudWatch Monitoring, Initial Terraform Setup | Legacy |
- ELK Stack: Centralized logging with OpenSearch, Kibana dashboards, and log shipping
- Bastion Host: Secure SSH access with Elastic IP and proper security group configuration
- Ansible Integration: Complete configuration management with GitHub synchronization
- Enhanced Security: IAM centralization, encryption, and DevSecOps best practices
- Advanced Monitoring: Multi-layer health checks and comprehensive alerting
- Module Expansion: 9 specialized modules for enterprise-grade infrastructure
graph TB
subgraph "Internet"
Users[Users]
DNS[Route 53<br/>DNS + Health Checks]
end
subgraph "AWS VPC - Multi-AZ"
subgraph "Public Subnets"
ALB[Application<br/>Load Balancer]
NAT1[NAT Gateway<br/>AZ-1a]
NAT2[NAT Gateway<br/>AZ-1b]
end
subgraph "Private Subnets"
subgraph "AZ-1a"
EC2_1[EC2 Instance<br/>Auto Scaling Group]
end
subgraph "AZ-1b"
EC2_2[EC2 Instance<br/>Auto Scaling Group]
end
end
subgraph "Infrastructure Modules"
LT[Launch Template<br/>Module]
ASG[Auto Scaling<br/>Module]
IAM[IAM Module<br/>Roles & Policies]
MON[Monitoring<br/>Module]
end
end
Users --> DNS
DNS --> ALB
ALB --> EC2_1
ALB --> EC2_2
LT --> ASG
ASG --> EC2_1
ASG --> EC2_2
IAM --> EC2_1
IAM --> EC2_2
MON --> ALB
MON --> ASG
EC2_1 --> NAT1
EC2_2 --> NAT2
sequenceDiagram
participant User as Users
participant R53 as Route 53
participant ALB as Load Balancer
participant ASG as Auto Scaling Group
participant EC2_OLD as Failed Instance
participant LT as Launch Template
participant EC2_NEW as New Instance
participant CW as CloudWatch
Note over EC2_OLD: Service Failure Occurs
ALB->>EC2_OLD: Health Check (HTTP GET /)
EC2_OLD-->>ALB: No Response (Timeout)
Note over ALB: After 2 failed checks (60s)
ALB->>ALB: Mark Instance Unhealthy
ALB->>User: Stop routing traffic
ASG->>ALB: Monitor target health status
Note over ASG: Grace Period: 300s (5 minutes)
ASG->>ASG: Instance still unhealthy
ASG->>LT: Request new instance config
LT-->>ASG: Instance configuration
ASG->>EC2_NEW: Launch replacement instance
Note over EC2_NEW: User data script runs<br/>Web server starts
ALB->>EC2_NEW: Health Check (HTTP GET /)
EC2_NEW-->>ALB: 200 OK
Note over ALB: After 2 healthy checks (60s)
ALB->>ALB: Mark Instance Healthy
ALB->>User: Resume traffic routing
ASG->>EC2_OLD: Terminate failed instance
CW->>CW: Log metrics and alerts
Note over User: Service Restored<br/>Zero downtime achieved
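
The health check timings in the sequence above (HTTP GET on `/`, two failed checks over 60 seconds to mark a target unhealthy, two successful checks to mark it healthy) correspond to a target group configuration along these lines. This is a sketch under stated assumptions: the resource name, the `vpc_id` variable, port 80, and the 5-second timeout are illustrative and not taken from the project.

```hcl
variable "vpc_id" {
  type        = string
  description = "VPC that hosts the application instances (hypothetical input)"
}

resource "aws_lb_target_group" "app" {
  name_prefix = "app-"
  port        = 80 # assumed application port
  protocol    = "HTTP"
  vpc_id      = var.vpc_id

  health_check {
    path                = "/"
    protocol            = "HTTP"
    matcher             = "200" # only a 200 response counts as healthy
    interval            = 30    # seconds between checks
    timeout             = 5     # assumed; must be shorter than the interval
    healthy_threshold   = 2     # 2 x 30s = 60s to mark a target healthy
    unhealthy_threshold = 2     # 2 x 30s = 60s to mark a target unhealthy
  }
}
```

With a 30-second interval and thresholds of 2, detection and recovery each take roughly 60 seconds, which matches the notes in the diagram.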
Visual overview: The Mermaid diagrams in this section illustrate the infrastructure relationships, data flows, and operational procedures.
graph TB
subgraph "External Access"
Users[Users]
DNS[Route 53<br/>Health Checks & DNS]
Bastion[Bastion Host<br/>10.0.1.138<br/>EIP: 13.223.40.186]
end
subgraph "AWS VPC - 10.0.0.0/16"
subgraph "Public Subnets - DMZ"
subgraph "us-east-1a - 10.0.1.0/24"
ALB[Application<br/>Load Balancer<br/>Port 80/443]
NAT1[NAT Gateway<br/>AZ-1a]
Bastion
end
subgraph "us-east-1b - 10.0.2.0/24"
NAT2[NAT Gateway<br/>AZ-1b]
end
end
subgraph "Private Subnets - Application Tier"
subgraph "us-east-1a - 10.0.10.0/24"
EC2_1[EC2 Instance<br/>10.0.20.205<br/>Auto Scaling Group]
end
subgraph "us-east-1b - 10.0.20.0/24"
EC2_2[EC2 Instance<br/>10.0.20.241<br/>Auto Scaling Group]
end
end
subgraph "Data & Analytics"
ELK[OpenSearch/ELK<br/>Centralized Logging<br/>& Analytics]
CloudWatch[CloudWatch<br/>Metrics & Alarms]
SNS[SNS Topics<br/>Alert Notifications]
end
end
subgraph "Infrastructure Modules"
LT[Launch Template<br/>GP3, IMDSv2, Encryption]
ASG[Auto Scaling Group<br/>Min:1, Max:5, Desired:2]
IAM[IAM Module<br/>Roles & Policies]
MON[Monitoring Module<br/>Dashboards & Alerts]
BastionMod[Bastion Module<br/>Secure SSH Access]
ELKMod[ELK Module<br/>Log Aggregation]
end
subgraph "Automation & Config"
Ansible[Ansible<br/>Configuration Management]
GitHub[GitHub<br/>Ansible Playbooks]
end
Users --> DNS
Users -.-> Bastion
DNS --> ALB
ALB --> EC2_1
ALB --> EC2_2
Bastion -.-> EC2_1
Bastion -.-> EC2_2
LT --> ASG
ASG --> EC2_1
ASG --> EC2_2
IAM --> EC2_1
IAM --> EC2_2
BastionMod --> Bastion
ELKMod --> ELK
EC2_1 --> NAT1
EC2_2 --> NAT2
EC2_1 --> ELK
EC2_2 --> ELK
EC2_1 --> CloudWatch
EC2_2 --> CloudWatch
CloudWatch --> SNS
EC2_1 -.-> GitHub
EC2_2 -.-> GitHub
GitHub -.-> Ansible
Ansible -.-> EC2_1
Ansible -.-> EC2_2
style Bastion fill:#e1f5fe
style ELK fill:#f3e5f5
style ALB fill:#e8f5e8
style EC2_1 fill:#fff3e0
style EC2_2 fill:#fff3e0
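
The network layout in the diagram above (a 10.0.0.0/16 VPC with one public and one private /24 per availability zone) might be declared as follows. This is a minimal sketch: the CIDR blocks and zone names come from the diagram, while the resource names and the `for_each` layout are assumptions rather than the networking module's actual code.

```hcl
locals {
  # Availability zone => CIDR block, as shown in the diagram.
  public_subnets = {
    "us-east-1a" = "10.0.1.0/24"
    "us-east-1b" = "10.0.2.0/24"
  }
  private_subnets = {
    "us-east-1a" = "10.0.10.0/24"
    "us-east-1b" = "10.0.20.0/24"
  }
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
}

# Public tier: ALB, NAT gateways, and the bastion host.
resource "aws_subnet" "public" {
  for_each                = local.public_subnets
  vpc_id                  = aws_vpc.main.id
  availability_zone       = each.key
  cidr_block              = each.value
  map_public_ip_on_launch = true
}

# Private tier: application instances with no public IPs.
resource "aws_subnet" "private" {
  for_each          = local.private_subnets
  vpc_id            = aws_vpc.main.id
  availability_zone = each.key
  cidr_block        = each.value
}
```

Keying the maps by availability zone keeps each subnet tied to its zone, so adding a third zone is a one-line change per tier.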
sequenceDiagram
participant User as Users
participant R53 as Route 53
participant ALB as Load Balancer
participant ASG as Auto Scaling Group
participant EC2_OLD as Failed Instance<br/>10.0.20.25
participant LT as Launch Template
participant EC2_NEW as New Instance<br/>10.0.20.205
participant CW as CloudWatch
participant ELK as ELK Stack
participant Bastion as Bastion Host
Note over EC2_OLD: Service Failure Detected
ALB->>EC2_OLD: Health Check (HTTP GET /)
EC2_OLD-->>ALB: Connection Timeout
Note over ALB: 30s: First failed check
ALB->>EC2_OLD: Health Check Retry
EC2_OLD-->>ALB: Still failing
Note over ALB: 60s: Second failed check
ALB->>ALB: Mark Instance Unhealthy
ALB->>User: Stop routing traffic to failed instance
ALB->>ASG: Report instance unhealthy
Note over ASG: 300s: Health check grace period
ASG->>ASG: Confirm instance still unhealthy
ASG->>LT: Request new instance configuration
LT-->>ASG: Instance config (AMI, security groups, etc.)
ASG->>EC2_NEW: Launch replacement instance
Note over EC2_NEW: User data script executes<br/>Ansible pulls from GitHub<br/>Configure services automatically
EC2_NEW->>GitHub: Pull Ansible configuration
EC2_NEW->>EC2_NEW: Run playbooks (web server, monitoring, etc.)
EC2_NEW->>ELK: Start shipping logs
EC2_NEW->>CW: Begin sending metrics
ALB->>EC2_NEW: Initial health check
EC2_NEW-->>ALB: Still starting up...
Note over EC2_NEW: 120s: Services fully started
ALB->>EC2_NEW: Health Check (HTTP GET /)
EC2_NEW-->>ALB: 200 OK - Ready to serve
Note over ALB: 60s: Second successful check
ALB->>ALB: Mark Instance Healthy
ALB->>User: Resume full traffic routing
ASG->>EC2_OLD: Terminate failed instance
CW->>SNS: Send recovery notification
ELK->>ELK: Log complete recovery timeline
Note over User: Service Fully Restored<br/>Zero downtime achieved<br/>All metrics normalized
Note over Bastion: SSH access available for<br/>troubleshooting throughout process
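
The Auto Scaling behaviour in this sequence depends on the group using the load balancer's health status rather than EC2 status checks alone. A sketch of such a group is shown here, using the sizes from the diagrams (min 1, max 5, desired 2) and the 300-second grace period; the input variable names are hypothetical and the project's autoscaling module may be wired differently.

```hcl
# Hypothetical inputs; in the project these would likely come from other modules.
variable "private_subnet_ids" {
  type = list(string)
}

variable "target_group_arn" {
  type = string
}

variable "launch_template_id" {
  type = string
}

resource "aws_autoscaling_group" "app" {
  name_prefix         = "ec2-failover-"
  min_size            = 1
  max_size            = 5
  desired_capacity    = 2
  vpc_zone_identifier = var.private_subnet_ids
  target_group_arns   = [var.target_group_arn]

  # Replace instances that the ALB reports as unhealthy,
  # not only those failing EC2 status checks.
  health_check_type = "ELB"

  # Seconds after launch during which failed health checks are ignored,
  # so a new instance has time to finish its user data and Ansible run.
  health_check_grace_period = 300

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }
}
```

The grace period applies to newly launched instances, so it should be at least as long as the time the instance needs to become healthy (the diagram notes about 120 seconds for services to start).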
graph TB
subgraph "External Layer"
Dev[Developer]
User[End Users]
GitHub[GitHub Repository<br/>Ansible Playbooks]
end
subgraph "AWS VPC - Production Environment"
subgraph "Public DMZ - 10.0.1.0/24, 10.0.2.0/24"
Bastion[Bastion Host<br/>EIP: 13.223.40.186<br/>SSH Gateway]
ALB[Application Load Balancer<br/>Health Checks<br/>Traffic Distribution]
NAT[NAT Gateways<br/>Outbound Internet Access]
end
subgraph "Private App Tier - 10.0.10.0/24, 10.0.20.0/24"
ASG[Auto Scaling Group<br/>Min: 1, Max: 5, Desired: 2]
EC2_1[Instance 1<br/>10.0.20.205<br/>Web Server + Ansible]
EC2_2[Instance 2<br/>10.0.20.241<br/>Web Server + Ansible]
end
subgraph "Data & Analytics Layer"
ELK[OpenSearch Cluster<br/>vpc-ec2-failover-dev-elk<br/>Centralized Logging]
CW[CloudWatch<br/>Metrics & Dashboards<br/>Log Groups]
SNS[SNS Topics<br/>Alert Distribution]
end
subgraph "DNS & Routing"
R53[Route 53<br/>Health Check Routing<br/>DNS Management]
end
end
%% User Traffic Flow
User -->|HTTP/HTTPS| R53
R53 -->|DNS Resolution| ALB
ALB -->|Load Balance| EC2_1
ALB -->|Load Balance| EC2_2
%% Developer Access Flow
Dev -.->|SSH Key Auth| Bastion
Bastion -.->|SSH Forward| EC2_1
Bastion -.->|SSH Forward| EC2_2
%% Configuration Management Flow
GitHub -->|Pull Configs| EC2_1
GitHub -->|Pull Configs| EC2_2
EC2_1 -->|Apply Ansible| EC2_1
EC2_2 -->|Apply Ansible| EC2_2
%% Monitoring & Logging Flow
EC2_1 -->|Logs & Metrics| CW
EC2_2 -->|Logs & Metrics| CW
EC2_1 -->|Application Logs| ELK
EC2_2 -->|Application Logs| ELK
ALB -->|Access Logs| ELK
CW -->|Alerts| SNS
ELK -->|Storage Alerts| SNS
%% Auto Scaling Flow
ALB -->|Health Status| ASG
ASG -->|Launch/Terminate| EC2_1
ASG -->|Launch/Terminate| EC2_2
CW -->|Metrics| ASG
%% Internet Access Flow
EC2_1 -->|Outbound HTTPS| NAT
EC2_2 -->|Outbound HTTPS| NAT
style Bastion fill:#ffecb3,stroke:#ff6f00,stroke-width:3px
style ELK fill:#e8f5e8,stroke:#2e7d32,stroke-width:3px
style ALB fill:#dcedc8,stroke:#388e3c,stroke-width:3px
style ASG fill:#bbdefb,stroke:#1976d2,stroke-width:3px
style CW fill:#fff3e0,stroke:#ef6c00,stroke-width:3px
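
The `CW -->|Alerts| SNS` edge in the data flow above can be illustrated with a CloudWatch alarm that publishes to an SNS topic with an email subscription. This is a sketch, not the monitoring module's actual contents: the resource names, the input variables, and the choice of the `UnHealthyHostCount` metric with a two-minute evaluation window are assumptions.

```hcl
# Hypothetical inputs for illustration.
variable "alert_email" {
  type = string
}

variable "target_group_arn_suffix" {
  type = string
}

variable "load_balancer_arn_suffix" {
  type = string
}

resource "aws_sns_topic" "alerts" {
  name = "ec2-failover-alerts"
}

# Email subscriptions must be confirmed by the recipient before delivery starts.
resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = var.alert_email
}

# Fire when the ALB reports any unhealthy target for two consecutive minutes.
resource "aws_cloudwatch_metric_alarm" "unhealthy_hosts" {
  alarm_name          = "ec2-failover-unhealthy-hosts"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "UnHealthyHostCount"
  statistic           = "Maximum"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  period              = 60
  evaluation_periods  = 2

  dimensions = {
    TargetGroup  = var.target_group_arn_suffix
    LoadBalancer = var.load_balancer_arn_suffix
  }

  alarm_actions = [aws_sns_topic.alerts.arn] # notify on failure
  ok_actions    = [aws_sns_topic.alerts.arn] # notify on recovery
}
```

Including `ok_actions` is one way to produce the recovery notification shown in the failover sequence.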
graph LR
subgraph "Security Features"
BastionF[Bastion Host<br/>Secure SSH Access<br/>Key-based Authentication]
IAMF[IAM Security<br/>Centralized Policies<br/>Least Privilege]
EncryptF[Encryption<br/>EBS Volumes<br/>Data at Rest]
end
subgraph "Automation Features"
AnsibleF[Ansible Automation<br/>Configuration Management<br/>GitHub Integration]
ASGF[Auto Scaling<br/>Health-based Scaling<br/>Instance Replacement]
LaunchF[Launch Templates<br/>Versioned Configs<br/>GP3 Storage]
end
subgraph "Monitoring Features"
ELKF[ELK Stack<br/>Centralized Logging<br/>Real-time Analytics]
CWF[CloudWatch<br/>Metrics & Alarms<br/>Custom Dashboards]
SNSF[SNS Alerts<br/>Email Notifications<br/>Event-driven]
end
subgraph "High Availability Features"
ALBF[Load Balancer<br/>Health Checks<br/>Traffic Distribution]
MultiAZF[Multi-AZ<br/>Cross-AZ Deployment<br/>Fault Tolerance]
R53F[Route 53<br/>DNS Failover<br/>Health Routing]
end
BastionF --> AnsibleF
IAMF --> ASGF
AnsibleF --> ELKF
ASGF --> ALBF
ELKF --> CWF
CWF --> SNSF
ALBF --> MultiAZF
MultiAZF --> R53F
style BastionF fill:#ffcdd2
style AnsibleF fill:#c8e6c9
style ELKF fill:#bbdefb
style ALBF fill:#dcedc8
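
The bastion pattern among the security features (SSH to private instances only through the bastion host) is usually enforced with security group references. A minimal sketch follows, assuming hypothetical `vpc_id` and `admin_cidr` inputs and resource names; ALB ingress and application egress rules are left out for brevity.

```hcl
variable "vpc_id" {
  type = string
}

variable "admin_cidr" {
  type        = string
  description = "CIDR allowed to SSH to the bastion, for example an office IP as a /32"
}

resource "aws_security_group" "bastion" {
  name_prefix = "bastion-"
  vpc_id      = var.vpc_id

  ingress {
    description = "SSH from the admin network only"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.admin_cidr]
  }

  egress {
    description = "Allow the bastion to reach instances and package mirrors"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "app" {
  name_prefix = "app-"
  vpc_id      = var.vpc_id

  # Referencing the bastion's security group (rather than a CIDR) means only
  # traffic that originates from the bastion host is accepted on port 22.
  ingress {
    description     = "SSH only from the bastion"
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    security_groups = [aws_security_group.bastion.id]
  }
}
```

Restricting `admin_cidr` to a narrow range, instead of `0.0.0.0/0`, keeps the bastion's public Elastic IP from being an open SSH endpoint.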
ec2-failover/                         # Enterprise Infrastructure Project
├── modules/                          # 9 Specialized Infrastructure Modules
│   ├── networking/                   # Core VPC Infrastructure
│   │   ├── main.tf                   # VPC, Subnets, NAT, IGW, Security Groups
│   │   ├── variables.tf              # CIDR blocks, AZ configuration
│   │   └── outputs.tf                # VPC ID, subnet IDs, security group IDs
│   │
│   ├── load_balancer/                # Application Load Balancer
│   │   ├── main.tf                   # ALB, target groups, listeners
│   │   ├── variables.tf              # Health check settings, ports
│   │   └── outputs.tf                # ALB DNS, target group ARNs
│   │
│   ├── route53/                      # DNS Management & Health Checks
│   │   ├── main.tf                   # Hosted zones, health checks
│   │   ├── variables.tf              # Domain configuration
│   │   └── outputs.tf                # Zone ID, DNS records
│   │
│   ├── launch_template/              # Instance Configuration Templates
│   │   ├── main.tf                   # Launch template, GP3, IMDSv2, encryption
│   │   ├── variables.tf              # Instance specs, storage, security
│   │   └── outputs.tf                # Template ID, ARN, versions
│   │
│   ├── autoscaling/                  # Auto Scaling & Health Management
│   │   ├── main.tf                   # ASG, scaling policies, CloudWatch alarms
│   │   ├── variables.tf              # Min/max size, health check config
│   │   └── outputs.tf                # ASG details, policy ARNs
│   │
│   ├── ec2/                          # EC2 Instance Management
│   │   ├── main.tf                   # Instance configuration, user data
│   │   ├── variables.tf              # AMI, instance type, key pairs
│   │   └── outputs.tf                # Instance IDs, private IPs
│   │
│   ├── iam/                          # Centralized IAM Security
│   │   ├── main.tf                   # EC2 roles, CloudWatch/SSM policies
│   │   ├── variables.tf              # SNS publishing, environment config
│   │   └── outputs.tf                # Role ARNs, instance profiles
│   │
│   ├── bastion/                      # Secure SSH Access Gateway
│   │   ├── main.tf                   # Bastion instance, EIP, security groups
│   │   ├── variables.tf              # SSH access configuration, key pairs
│   │   ├── outputs.tf                # Bastion IP, SSH commands
│   │   └── user_data.sh              # Bastion initialization script
│   │
│   ├── monitoring/                   # CloudWatch & SNS Monitoring
│   │   ├── main.tf                   # CloudWatch alarms, SNS topics
│   │   ├── variables.tf              # Alert thresholds, email config
│   │   └── outputs.tf                # Alarm ARNs, topic ARNs
│   │
│   └── elk/                          # ELK Stack Centralized Logging
│       ├── main.tf                   # OpenSearch cluster, log groups
│       ├── variables.tf              # ELK configuration, retention
│       └── outputs.tf                # OpenSearch endpoints, Kibana URLs
│
├── environments/                     # Multi-Environment Orchestration
│   ├── dev/                          # Development Environment
│   │   ├── main.tf                   # Module integration & configuration
│   │   ├── variables.tf              # Environment-specific variables
│   │   ├── outputs.tf                # Environment outputs
│   │   ├── terraform.tfvars          # Actual variable values
│   │   ├── terraform.tfvars.example  # Template for configuration
│   │   └── terraform.tfstate         # State management
│   │
│   ├── staging/                      # Staging Environment (Template)
│   └── prod/                         # Production Environment (Template)
│
├── ansible/                          # Configuration Management
│   ├── playbooks/                    # Ansible Playbooks
│   │   └── site.yml                  # Main configuration playbook
│   ├── roles/                        # Modular Ansible Roles
│   │   ├── common/                   # Base system configuration
│   │   ├── webserver/                # Apache/Nginx setup
│   │   ├── monitoring/               # CloudWatch agent
│   │   ├── docker/                   # Container runtime
│   │   ├── nodejs/                   # Node.js applications
│   │   └── security/                 # Security hardening
│   ├── group_vars/                   # Global variables
│   ├── inventory/                    # Host inventories
│   ├── templates/                    # Configuration templates
│   ├── ansible.cfg                   # Ansible configuration
│   ├── run-playbook.sh               # Playbook execution script
│   └── sync-from-github.sh           # GitHub synchronization
│
├── scripts/                          # Automation & Deployment Scripts
│   ├── deploy.sh                     # Complete infrastructure deployment
│   ├── cleanup.sh                    # Resource cleanup and teardown
│   └── health-check.sh               # Infrastructure health validation
│
├── docs/                             # Comprehensive Documentation
│   ├── architecture.md               # Detailed architecture decisions
│   ├── cost.md                       # Cost analysis & optimization
│   ├── getting-started.md            # Setup and deployment guide
│   ├── security.md                   # Security best practices
│   ├── monitoring.md                 # Monitoring and alerting guide
│   └── change_log.md                 # Version history and changes
│
├── copilot_roles/                    # Role-Based Development Guidance
│   ├── aws_architect.md              # Infrastructure design guidance
│   ├── sre.md                        # Site reliability engineering
│   ├── devsecops.md                  # Security & compliance practices
│   ├── devops_engineer.md            # Deployment & automation
│   ├── linux_admin.md                # System administration
│   ├── python_dev.md                 # Python development practices
│   └── logging.md                    # Logging and monitoring
│
├── README.md                         # This comprehensive guide
├── Makefile                          # Build automation commands
├── versions.tf                       # Terraform version constraints
└── .gitignore                        # Git ignore patterns
Total: 9 Infrastructure Modules | 60+ Configuration Files | Production-Ready
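
Given the layout above, an environment's `main.tf` would typically compose the modules by relative path and pass one module's outputs into the next. The fragment below is only a sketch of that wiring: the `source` paths follow the directory tree, but every input and output name (`vpc_cidr`, `private_subnet_ids`, `subnet_ids`, `launch_template_id`, `id`) is hypothetical and would need to match what each module actually declares.

```hcl
# environments/dev/main.tf (illustrative fragment; input and output names are assumed)

module "networking" {
  source   = "../../modules/networking"
  vpc_cidr = "10.0.0.0/16" # hypothetical input name
}

module "launch_template" {
  source = "../../modules/launch_template"
}

module "autoscaling" {
  source = "../../modules/autoscaling"

  # Hypothetical outputs: each module must export these from its outputs.tf.
  subnet_ids         = module.networking.private_subnet_ids
  launch_template_id = module.launch_template.id
}
```

Keeping the environment directory limited to module calls and variable values is what allows `staging/` and `prod/` to reuse the same modules with different settings.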