This deployment provides a production-ready OpenEMR system on Amazon EKS, using EKS Auto Mode for fully managed EC2 infrastructure with automatic provisioning, configurable autoscaling, and a complete alerting, monitoring, and observability stack.
⚠️ HIPAA Compliance Notice: No matter what you're deploying to AWS, full HIPAA compliance requires:
- Executed Business Associate Agreement (BAA) with AWS
- Organizational policies and procedures
- Staff training and access controls
- Regular security audits and risk assessments
⚠️ End-to-End Test Warning: The end-to-end test script (scripts/test-end-to-end-backup-restore.sh) will create and delete AWS resources (including backup buckets and RDS snapshots) and automatically reset Kubernetes manifests to their default state. Only run it in development AWS accounts, and commit or stash any uncommitted changes to the k8s/ manifests before testing.
- EKS Auto Mode Pricing
- Why Auto Mode is Worth the 12% Markup
- Cost Optimization Strategies
- Monthly Cost Breakdown by Organization Size
graph TB
subgraph "AWS Cloud"
subgraph "VPC - Private Network"
subgraph "EKS Auto Mode Cluster"
AM[Auto Mode Controller<br/>Kubernetes 1.34]
BN[Bottlerocket Nodes<br/>SELinux Enforced]
OP[OpenEMR Pods<br/>PHI Processing]
end
subgraph "Data Layer"
RDS[Aurora Serverless V2<br/>MySQL 8.0]
CACHE[Valkey Serverless<br/>Session Cache]
EFS[EFS Storage<br/>Encrypted PHI]
end
subgraph "Security Layer"
KMS[6 KMS Keys<br/>Granular Encryption]
SG[Security Groups]
NP[Network Policies]
WAF[WAFv2<br/>DDoS & Bot Protection]
end
end
subgraph "Compliance & Monitoring"
CW[CloudWatch Logs<br/>365-Day Audit Retention]
CT[CloudTrail<br/>API Auditing]
VFL[VPC Flow Logs<br/>Network Monitoring]
end
end
OP --> RDS
OP --> CACHE
OP --> EFS
AM --> BN
KMS --> OP
BN --> OP
WAF --> OP
SG --> OP
NP --> OP
- Features
- Fully Managed Compute:
- AWS EKS Auto Mode Documentation
- EC2 instances provisioned automatically with 12% management fee
- Kubernetes 1.34:
- Kubernetes v1.34 "Of Wind & Will" Release Blog
- Latest stable version with Auto Mode support
- Bottlerocket OS:
- Bottlerocket OS Github
- Rust-based, immutable, security-hardened Linux with SELinux enforcement and no SSH access
# Option 1: Install via Homebrew (macOS/Linux)
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
# Option 2: Download directly from HashiCorp (All platforms)
# Visit: https://releases.hashicorp.com/terraform/1.13.4/
# Download the appropriate binary for your OS and architecture
# Extract and add to your PATH
# Option 3: Use tfenv for version management (Recommended)
brew install tfenv
tfenv install 1.13.4
tfenv use 1.13.4
# Verify installation
terraform --version  # Should show v1.13.4

# Install via Homebrew (macOS/Linux)
brew install kubectl helm awscli jq
# Verify installations
kubectl version --client
helm version
aws --version # Must be 2.15.0 or higher
jq --version

# Minimum AWS CLI version
aws --version # Must be 2.15.0 or higher
# Required IAM permissions
- eks:CreateCluster (with Kubernetes 1.29+)
- iam:CreateRole (with specific Auto Mode trust policies)
- ec2:CreateVpc (with required CIDR blocks)
- kms:CreateKey (for encryption requirements)
# EKS Auto Mode specific requirements
- Authentication mode: API or API_AND_CONFIG_MAP
- Kubernetes version: 1.29 or higher (1.34 configured)

(Recommended) Configure GitHub OIDC → AWS IAM role for CI/CD. See docs/GITHUB_AWS_CREDENTIALS.md.
For production deployments, configure branch rulesets to ensure code quality and enable proper code review:
- Navigate to Repository Settings:
  - Go to your GitHub repository → Settings → Rules → Rulesets
- Create New Branch Ruleset:
  - Click New ruleset → New branch ruleset
  - Name: Main Branch Protection
  - Target: main branch
  - Enable: Block force pushes
  - Enable: Require linear history
  - Important: Add Repository admin Role to bypass list (allows automated workflows)
  - Click Create to activate
- Benefits of Rulesets:
  - Modern Approach: More flexible than traditional branch protection
  - Granular Control: Define specific rules for different branches
  - Bypass Permissions: Grant trusted users ability to bypass rules
  - Code Quality: All changes reviewed before merging
  - Testing: Required tests must pass before merge
  - Security: Prevents accidental pushes to main
  - Compliance: Meets enterprise security requirements
# Create feature branch
git checkout -b feature/your-feature-name
# Make changes and commit
git add .
git commit -m "feat: add your feature"
git push origin feature/your-feature-name
# Create pull request on GitHub
# Wait for reviews and status checks
# Merge after approval

# Clone repository
git clone <repository-url>
cd openemr-on-eks
# Install Homebrew (https://brew.sh/) on macOS if necessary
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install required tools on macOS
brew install terraform kubectl helm awscli jq
# Install Docker if running pre-commit hooks locally by following the instructions here: https://docs.docker.com/engine/install/
# Alternative: Install latest Terraform directly from HashiCorp
# Download from: https://releases.hashicorp.com/terraform/1.13.4/
# Or use tfenv for version management:
# brew install tfenv
# tfenv install 1.13.4
# tfenv use 1.13.4
# Configure AWS credentials
aws configure
# Verify installations
terraform --version
kubectl version --client
helm version
aws --version
# Run comprehensive pre-flight checks
cd scripts
./validate-deployment.sh
# Expected output:
OpenEMR Deployment Validation
================================
1. Checking prerequisites...
✅ kubectl is installed
✅ aws is installed
✅ helm is installed
✅ jq is installed
2. Checking AWS credentials...
Checking AWS credential sources...
✅ AWS credentials valid
Account ID: <AWS_ACCOUNT_NUMBER>
User/Role: arn:aws:sts::<AWS_ACCOUNT_NUMBER>:assumed-role/<ROLE_NAME>/<USER_NAME>
Source: Environment variables
Source: Credentials file found at /path/to/.aws/credentials
Available profiles: default,
Current profile: default
Config file found at path/to/.aws/config
Current region: us-west-2
✅ Credential sources detected: 2
3. Checking Terraform state...
✅ Terraform state file exists
ℹ️ Terraform state exists but no resources deployed
💡 This indicates a clean slate for deployment
💡 This is normal for first-time deployments
4. Checking cluster access...
ℹ️ EKS cluster 'openemr-eks' not found
💡 This is expected for first-time deployments
💡 This is normal for first-time deployments
5. Checking AWS resources...
Checking AWS resources...
ℹ️ VPC not found
💡 This is expected for first-time deployments
ℹ️ RDS Aurora cluster not found
💡 This is expected for first-time deployments
ℹ️ ElastiCache Valkey cluster not found
💡 This is expected for first-time deployments
ℹ️ EFS file system not found
💡 This is expected for first-time deployments
💡 This is normal for first-time deployments
6. Checking Kubernetes resources...
Checking Kubernetes resources...
⚠️ Namespace 'openemr' not found
💡 Will be created during deployment
✅ OpenEMR not yet deployed (clean deployment)
✅ EKS Auto Mode handles compute automatically
💡 No Karpenter needed - Auto Mode manages all compute
💡 This is normal for first-time deployments
7. Checking security configuration...
Checking security configuration...
ℹ️ EKS cluster not found - security configuration will be applied during deployment
Planned deployment features:
• OpenEMR 7.0.3 with HTTPS-only access (port 443)
• EKS Auto Mode for managed EC2 compute
• Aurora Serverless V2 MySQL database
• Valkey Serverless cache (Redis-compatible)
• IP-restricted cluster endpoint access
• Private subnet deployment
• 6 dedicated KMS keys (EKS, EFS, RDS, ElastiCache, S3, CloudWatch)
• Network policies and Pod Security Standards
First-time deployment validation completed!
✅ Prerequisites and AWS credentials are ready
You're all set for your first deployment!
Next steps for first-time deployment:
1. cd /path/to/openemr-on-eks/terraform
2. terraform init
3. terraform plan
4. terraform apply
5. cd /path/to/openemr-on-eks/k8s
6. ./deploy.sh
⏱️ Expected deployment time: 40-45 minutes total
• Infrastructure (Terraform): 30-32 minutes
• Application (Kubernetes): 7-11 minutes

Deployment Recommendations
=============================
Security Best Practices:
• HTTPS-only access (port 443) - HTTP traffic is refused
• Disable public access after deployment
• Use strong passwords for all services
• Enable AWS WAF for production
• Regularly update container images
• Monitor audit logs for compliance
Cost Optimization:
• Aurora Serverless V2 scales automatically
• EKS Auto Mode: EC2 costs + management fee for full automation
• Valkey Serverless provides cost-effective caching
• Monitor usage with CloudWatch dashboards
• Set up cost alerts and budgets
Monitoring Setup:
• CloudWatch logging with Fluent Bit sidecar (included in OpenEMR deployment)
• Basic deployment: CloudWatch logs only
• ✅ Logging Status: Fully functional with test logs, Apache logs, and forward protocol support
• Optional: Enhanced monitoring stack: cd /path/to/openemr-on-eks/monitoring && ./install-monitoring.sh
• Enhanced stack includes:
  - Prometheus v79.1.0 (metrics & alerting)
  - Grafana (dashboards with auto-discovery)
  - Loki v6.45.2 (log aggregation with S3 storage)
  - Jaeger v3.4.1 (distributed tracing)
  - AlertManager (Slack integration support)
  - OpenEMR-specific monitoring (ServiceMonitor, PrometheusRule)
• Loki S3 Storage: Loki uses AWS S3 for production-grade log storage. As recommended by Grafana (https://grafana.com/docs/loki/latest/setup/install/helm/configure-storage/), we configure object storage via cloud provider for production deployments. This provides better durability, scalability, and cost-effectiveness compared to filesystem storage.
• Configure alerting for critical issues
• Regular backup testing

cd ../terraform
# Copy and customize variables
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with healthcare-specific settings:
cat > terraform.tfvars <<EOF
# Cluster Configuration
cluster_name = "openemr-eks"
kubernetes_version = "1.34" # Latest stable with Auto Mode
aws_region = "us-west-2"
# OpenEMR Application Configuration
openemr_version = "7.0.3" # Latest stable OpenEMR version
# Compliance Settings
backup_retention_days = 30
audit_logs_retention_days = 365
# Healthcare Workload Scaling
aurora_min_capacity = 0.5 # Always-on minimum
aurora_max_capacity = 16 # Peak capacity
redis_max_data_storage = 20
redis_max_ecpu_per_second = 5000
# Network Configuration
vpc_cidr = "10.0.0.0/16"
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
# Security Configuration
rds_deletion_protection = false # Set to false for testing, true for production
enable_waf = true # Enable AWS WAF for additional security (recommended for production)
EOF

The enable_waf parameter controls AWS WAFv2 deployment for enhanced security:
- AWS Managed Rules: Core Rule Set (CRS), SQL Injection protection, Known Bad Inputs
- Rate Limiting: Blocks excessive requests (2000 requests per 5 minutes per IP)
- Bot Protection: Blocks suspicious User-Agent patterns (bot, scraper, crawler, spider)
- Comprehensive Logging: WAF logs stored in S3 with 90-day retention
- CloudWatch Metrics: Real-time monitoring and alerting capabilities
# Enable WAF (recommended for production)
enable_waf = true
# Disable WAF (for testing/development)
enable_waf = false

- Automatic ALB Association: WAF automatically associates with Application Load Balancer
- Kubernetes Integration: WAF ACL ARN automatically injected into ingress configuration
- Security Headers: Enhanced security headers and DDoS protection
The WAF configuration is defined in terraform/waf.tf and includes:
- Web ACL: Regional WAFv2 Web ACL with multiple security rules
- S3 Logging: Direct WAF logs to S3 bucket with lifecycle policies
- Security Rules:
- AWS Managed Rules for common attack patterns
- Rate limiting to prevent DDoS attacks
- User-Agent filtering for bot protection
- Conditional Deployment: All WAF resources are created only when enable_waf = true
- Log Destination: S3 bucket with 90-day retention
- CloudWatch Metrics: Real-time monitoring for all WAF rules
- Log Analysis: WAF logs can be analyzed for security insights and threat detection
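If you want to spot-check WAF activity from the CLI, the sketch below lists the regional Web ACLs and pulls the aggregate blocked-request count from CloudWatch. The Web ACL name is a placeholder you would look up first; the AWS/WAFV2 namespace with WebACL/Region/Rule dimensions is the standard metric layout for regional WAFv2.

```bash
# List regional Web ACLs to find the name created by terraform/waf.tf
aws wafv2 list-web-acls --scope REGIONAL --region us-west-2

# Sum of blocked requests over the last hour across all rules
# (<your-web-acl-name> is a placeholder; Rule=ALL is the per-ACL aggregate)
aws cloudwatch get-metric-statistics \
  --namespace "AWS/WAFV2" \
  --metric-name BlockedRequests \
  --dimensions Name=WebACL,Value=<your-web-acl-name> Name=Region,Value=us-west-2 Name=Rule,Value=ALL \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Sum
# Note: the date syntax above is GNU date; on macOS use: date -u -v-1H +%Y-%m-%dT%H:%M:%SZ
```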
# Initialize Terraform
terraform init -upgrade
# Validate configuration
terraform validate
# Review deployment plan
terraform plan -out=tfplan
# Deploy infrastructure (~30-32 minutes)
terraform apply tfplan
# (OPTIONAL) Deploy infrastructure and measure the time it takes
time terraform apply --auto-approve tfplan

The modular structure allows for targeted deployments and efficient development:
# Plan changes for specific services
terraform plan -target=module.vpc # VPC changes only
terraform plan -target=aws_rds_cluster.openemr # Database changes only
terraform plan -target=aws_eks_cluster.openemr       # EKS changes only

# Apply changes to specific resources
terraform apply -target=aws_kms_key.rds # Update RDS encryption
terraform apply -target=aws_efs_file_system.openemr  # Update EFS configuration

# View resources by file/service
terraform state list | grep rds # All RDS resources
terraform state list | grep kms                      # All KMS resources

# Validate specific configurations
terraform validate # Validate all files
terraform fmt -check # Check formatting
terraform fmt -recursive                             # Format all files

# Testing deployment (deletion protection disabled)
terraform apply -var-file="terraform-testing.tfvars"
# Custom configuration
terraform apply -var="rds_deletion_protection=false"cd ../k8s
# Update kubeconfig
aws eks update-kubeconfig --region us-west-2 --name openemr-eks
# For testing deployments (~7-11 minutes) (uses self-signed certificates)
./deploy.sh
# To time run for testing deployments (uses self-signed certificates)
time ./deploy.sh
# For production deployments (recommended: ACM certificate with auto-renewal)
./ssl-cert-manager.sh request openemr.yourdomain.com
./ssl-cert-manager.sh deploy <certificate-arn>
# Verify deployment
kubectl get pods -n openemr -o wide
kubectl get nodeclaim
# Verify WAF integration (if enabled)
kubectl get ingress -n openemr -o yaml | grep wafv2-acl-arn

# Get LoadBalancer URL (HTTPS-only so add "https://" to the beginning to make it work in the browser)
kubectl get svc openemr-service -n openemr
# Get admin credentials
cat openemr-credentials.txt

Security Note: The load balancer only listens on port 443 (HTTPS). HTTP traffic on port 80 will be refused by the load balancer for maximum security. All access must use HTTPS.
# Option A: Temporary security (can toggle access as needed)
cd ../scripts
./cluster-security-manager.sh disable
# Option B: Production security (recommended)
# Deploy jumpbox in private subnet and permanently disable public access
# See "Production Best Practice: Jumpbox Architecture" section belowNote: The core OpenEMR deployment includes CloudWatch logging only. This optional step installs the Prometheus/Grafana observability stack for monitoring, dashboards, and alerting.
# β οΈ IMPORTANT: If using jumpbox architecture (recommended for production):
# SSH to your jumpbox and run monitoring installation from there
# If not using jumpbox, re-enable cluster access temporarily:
cd ../scripts
./cluster-security-manager.sh enable
# Install comprehensive monitoring stack (~8 minutes)
cd ../monitoring
./install-monitoring.sh
# Optional: Install with Slack alerts
export SLACK_WEBHOOK_URL="https://hooks.slack.com/services/..."
export SLACK_CHANNEL="#openemr-alerts"
./install-monitoring.sh
# Optional: Install with ingress and basic auth
export ENABLE_INGRESS="1"
export GRAFANA_HOSTNAME="grafana.yourdomain.com"
export ENABLE_BASIC_AUTH="1"
./install-monitoring.sh
# If not using jumpbox, disable access again after monitoring installation
cd ../scripts
./cluster-security-manager.sh disable

What's included by default in OpenEMR deployment:
- ✅ CloudWatch log forwarding via Fluent Bit
- ✅ CloudWatch metrics (AWS infrastructure metrics)
What this optional monitoring stack adds:
- Prometheus: kube-prometheus-stack v79.1.0 (metrics collection & alerting)
- Grafana: 20+ pre-built Kubernetes dashboards with auto-discovery and secure credentials
- Loki: v6.45.2 single-binary (log aggregation with S3 storage and 720h retention)
  - Production-Grade Storage: Uses AWS S3 for log storage (as recommended by Grafana) instead of filesystem storage
  - Benefits: Better durability, scalability, cost-effectiveness, and lifecycle management compared to filesystem storage
  - IAM Integration: Uses IRSA (IAM Roles for Service Accounts) for secure, credential-free S3 access (see the verification sketch after this list)
- Jaeger: v3.4.1 (distributed tracing)
- AlertManager: Slack integration support with customizable notifications
- OpenEMR Integration: Continuously collects a broad set of metrics from the OpenEMR namespace so you can monitor the health and performance of your OpenEMR deployment in real time (see the monitoring documentation for guidance on creating custom dashboards)
- Optimized Storage: GP3 with 3000 IOPS for time-series data performance
- Enhanced Security: RBAC, network policies, security contexts, encrypted storage, WAFv2 protection
- Parallel Installation: Components install simultaneously for faster deployment
- Optional Ingress: NGINX ingress with TLS and basic authentication support
- Audit Logging: Audit trails for all monitoring operations
- Intelligent Autoscaling: HPA for all components integrated with EKS Auto Mode
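To confirm the IRSA wiring mentioned above, you can check for the injected role annotation and environment variables. A minimal sketch; the monitoring namespace and the app.kubernetes.io/name=loki label are assumptions that may differ depending on how install-monitoring.sh names the release:

```bash
# Look for the IAM role annotation on the monitoring service accounts
kubectl get serviceaccounts -n monitoring -o yaml | grep eks.amazonaws.com/role-arn

# IRSA injects AWS_ROLE_ARN / AWS_WEB_IDENTITY_TOKEN_FILE into the Loki pods
kubectl get pods -n monitoring -l app.kubernetes.io/name=loki -o yaml | grep -A1 AWS_ROLE_ARN
```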
openemr-on-eks/
├── .github/                              # GitHub Actions and workflows
│   ├── README.md                         # Comprehensive GitHub workflows documentation
│   └── workflows/                        # CI/CD automation workflows
│       ├── ci-cd-tests.yml               # Automated testing and quality assurance
│       ├── manual-releases.yml           # Manual release workflow for version management
│       └── version-check.yml             # Automated version awareness checking
├── terraform/                            # Infrastructure as Code (Modular Structure)
│   ├── README.md                         # Complete Terraform infrastructure documentation
│   ├── main.tf                           # Terraform providers and data sources
│   ├── variables.tf                      # Input variables and defaults (including autoscaling)
│   ├── outputs.tf                        # Output values for other components
│   ├── vpc.tf                            # VPC and networking resources
│   ├── eks.tf                            # EKS cluster with Auto Mode
│   ├── kms.tf                            # KMS keys and encryption
│   ├── rds.tf                            # Aurora Serverless V2 database
│   ├── elasticache.tf                    # Valkey Serverless cache
│   ├── efs.tf                            # EFS file system with elastic performance
│   ├── waf.tf                            # WAFv2 security configuration
│   ├── s3.tf                             # S3 buckets and policies
│   ├── cloudwatch.tf                     # CloudWatch log groups
│   ├── iam.tf                            # IAM roles and policies
│   ├── cloudtrail.tf                     # CloudTrail logging
│   ├── terraform.tfvars.example          # Example variable values with autoscaling configs
│   ├── terraform-testing.tfvars          # Testing configuration (deletion protection disabled)
│   └── terraform-production.tfvars       # Production configuration reference (deletion protection enabled)
├── oidc_provider/                        # Terraform + scripts for GitHub → AWS OIDC (preferred)
│   ├── README.md                         # OIDC provider setup and configuration guide
│   ├── main.tf                           # GitHub OIDC provider and IAM role definitions
│   ├── variables.tf                      # OIDC provider input variables (repository, branch, etc.)
│   ├── outputs.tf                        # OIDC provider outputs (role ARN, provider ARN)
│   └── scripts/                          # OIDC provider management scripts
│       ├── deploy.sh                     # Deploy OIDC provider and IAM roles
│       ├── destroy.sh                    # Destroy OIDC provider and IAM roles
│       └── validate.sh                   # Validate OIDC provider configuration
├── k8s/                                  # Kubernetes manifests
│   ├── README.md                         # Complete Kubernetes manifests documentation
│   ├── deploy.sh                         # Main deployment script (deploys OpenEMR to the EKS cluster)
│   ├── namespace.yaml                    # Namespace definitions with Pod Security Standards
│   ├── storage.yaml                      # Storage classes (EFS for OpenEMR, optimized EBS for monitoring)
│   ├── security.yaml                     # RBAC, service accounts, and security policies
│   ├── network-policies.yaml             # Network policies for our deployment
│   ├── secrets.yaml                      # OpenEMR Admin, Database and Valkey credential templates
│   ├── deployment.yaml                   # OpenEMR application deployment with MYSQL_DATABASE env var
│   ├── service.yaml                      # Defines OpenEMR service and load balancer configuration
│   ├── hpa.yaml                          # Horizontal Pod Autoscaler configuration
│   ├── ingress.yaml                      # Ingress controller configuration
│   ├── ssl-renewal.yaml                  # SSL certificate renewal automation
│   ├── logging.yaml                      # Fluent Bit sidecar configuration for log collection
│   └── openemr-credentials.txt           # OpenEMR admin credentials (created during deployment)
├── monitoring/                           # Advanced observability stack (optional)
│   ├── install-monitoring.sh             # Main installation script
│   ├── README.md                         # Comprehensive monitoring documentation
│   ├── openemr-monitoring.conf.example   # Configuration template (manual creation)
│   ├── openemr-monitoring.conf           # Configuration file (optional, manual creation)
│   ├── prometheus-values.yaml            # Generated Helm values (created during installation)
│   ├── prometheus-values.yaml.bak        # Backup of values file (created during installation)
│   ├── openemr-monitoring.log            # Installation log (created during installation)
│   ├── openemr-monitoring-audit.log      # Audit trail (created during installation)
│   ├── helm-install-kps.log              # Prometheus stack install log (created during installation)
│   ├── helm-install-loki.log             # Loki install log (created during installation)
│   ├── debug-YYYYMMDD_HHMMSS.log         # Debug info on errors (created on installation errors)
│   ├── credentials/                      # Secure credentials directory (created during installation)
│   │   ├── monitoring-credentials.txt    # Access credentials for all services (created during installation)
│   │   └── grafana-admin-password        # Grafana admin password only (created during installation)
│   └── backups/                          # Configuration backups directory (created during installation, future use)
├── scripts/                              # Operational and deployment scripts
│   ├── README.md                         # Complete scripts documentation and maintenance guide
│   ├── check-openemr-versions.sh         # OpenEMR version discovery and management
│   ├── version-manager.sh                # Comprehensive version awareness checking
│   ├── validate-deployment.sh            # Pre-deployment validation and health checks
│   ├── validate-efs-csi.sh               # EFS CSI driver validation and troubleshooting
│   ├── clean-deployment.sh               # Enhanced deployment cleanup (deletes PVCs and stale configs)
│   ├── restore-defaults.sh               # Restore deployment files to default template state
│   ├── openemr-feature-manager.sh        # OpenEMR feature configuration management
│   ├── ssl-cert-manager.sh               # SSL certificate management (ACM integration)
│   ├── ssl-renewal-manager.sh            # Self-signed certificate renewal automation
│   ├── cluster-security-manager.sh       # Cluster access security management
│   ├── backup.sh                         # Cross-region backup procedures
│   ├── restore.sh                        # Cross-region disaster recovery (with DB reconfiguration)
│   ├── destroy.sh                        # Complete infrastructure destruction (bulletproof cleanup)
│   ├── test-end-to-end-backup-restore.sh # End-to-end backup/restore testing
│   ├── run-test-suite.sh                 # CI/CD test suite runner
│   └── test-config.yaml                  # Test configuration for CI/CD framework
├── docs/                                 # Complete documentation
│   ├── README.md                         # Complete documentation index and maintenance guide
│   ├── DEPLOYMENT_GUIDE.md               # Step-by-step deployment guide
│   ├── DEPLOYMENT_TIMINGS.md             # Measured timing data for all operations (based on E2E test runs)
│   ├── AUTOSCALING_GUIDE.md              # Autoscaling configuration and optimization
│   ├── MANUAL_RELEASES.md                # Guide to the OpenEMR on EKS release system
│   ├── VERSION_MANAGEMENT.md             # Version awareness and dependency management
│   ├── TROUBLESHOOTING.md                # Troubleshooting and solutions
│   ├── BACKUP_RESTORE_GUIDE.md           # Comprehensive backup and restore guide
│   ├── LOGGING_GUIDE.md                  # OpenEMR 7.0.3.4 Enhanced Logging
│   ├── TESTING_GUIDE.md                  # Comprehensive CI/CD testing framework
│   ├── END_TO_END_TESTING_REQUIREMENTS.md # Mandatory testing procedure
│   └── GITHUB_AWS_CREDENTIALS.md         # GitHub → AWS OIDC setup and credential management
├── images/                               # Visual assets and branding materials
│   ├── README.md                         # Complete images documentation and usage guidelines
│   ├── openemr_on_eks_logo.png           # Main project logo for documentation and branding (optimized for web)
│   └── openemr_on_eks_github_banner.png  # GitHub repository banner for social media display
├── .pre-commit-config.yaml               # Pre-commit hooks configuration
├── .yamllint                             # YAML linting configuration (relaxed rules)
├── .markdownlint.json                    # Markdown linting configuration (relaxed rules)
├── VERSION                               # Current project version
├── versions.yaml                         # Version awareness configuration
└── LICENSE                               # Project license
EKS Auto Mode adds a 12% management fee on top of standard EC2 costs:
Total Cost = EC2 Instance Cost + (EC2 Instance Cost × 0.12) + EKS Control Plane ($73/month)
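As a quick sanity check, here is that formula worked through for the small-clinic compute figure used in the tables below (about $60/month of EC2); a minimal sketch using bc:

```bash
ec2=60                                 # average monthly EC2 spend (USD)
fee=$(echo "$ec2 * 0.12" | bc)         # 12% Auto Mode management fee
total=$(echo "$ec2 + $fee + 73" | bc)  # plus the EKS control plane
echo "EC2: \$$ec2  Auto Mode fee: \$$fee  Control plane: \$73  Total: \$$total/month"
# -> EC2: $60  Auto Mode fee: $7.20  Control plane: $73  Total: $140.20/month
```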
EKS Auto Mode's 12% compute markup isn't just for convenience; it pays for eliminating entire categories of operational overhead, reducing downtime risk, and often lowering total cost when factoring in efficiency gains.
- No node group management – AWS provisions, right-sizes, and manages the lifecycle of compute nodes automatically.
- Automatic OS updates and patching – Security patches and kernel upgrades without downtime.
- No AMI selection/maintenance – AWS handles image selection and maintenance.
- Zero capacity planning – Workload requirements drive provisioning; no need to over/under-provision.
This replaces the ongoing SRE/DevOps effort for node management, saving both headcount and operational complexity.
While per-vCPU costs are higher, Auto Mode can reduce total monthly spend by aligning compute supply closely with demand:
- Bin-packing efficiency – Pods are scheduled onto right-sized nodes automatically, minimizing waste from underutilized instances.
- Automatic Node Optimization with Karpenter – Karpenter dynamically launches the most efficient instance types based on pod resource requests, workload mix, and availability zone capacity. This means fewer idle resources, better spot usage (if enabled), and an optimal balance between price and performance without manual tuning.
- Ephemeral on-demand nodes – Compute is provisioned only for the duration of workload execution, then scaled down immediately when idle, eliminating costs from long-lived, underutilized nodes.
- No need for capacity planning – Teams don't need to guess at cluster sizing or maintain large safety buffers. Auto Mode reacts in real time to workloads, reducing both operational overhead and cost.
- Workload-driven elasticity – The system can scale up quickly for bursty traffic (e.g., peak patient visits in OpenEMR) and scale back down after demand subsides, ensuring spend closely tracks actual usage.
💡 Example: A medium-sized OpenEMR deployment with hundreds of concurrent users might require 6 m5.large nodes under static provisioning (~$420/month). With EKS Auto Mode and Karpenter, the same workload could run on a mix of a few optimized Graviton instances that scale down after hours, cutting costs to ~$320/month. Savings come from eliminating idle nodes, continuously resizing compute to actual demand, and, whenever possible, running workloads on the most cost-efficient nodes.
For spiky or unpredictable workloads, this often offsets the markup entirely.
- Managed upgrades – Node fleets are always kept compatible with the control plane.
- Zero-downtime replacements – AWS handles cordoning, draining, and re-scheduling pods.
- Built-in fault tolerance – Automatic AZ balancing and replacement.
These guardrails reduce the risk of human error and outages.
- Developer focus – Teams spend more time on application reliability and performance tuning.
- Faster delivery – No delays from infra maintenance or capacity planning.
- No deep infra expertise required – Avoids the need for Karpenter/EC2/AMI operational knowledge.
The real return on investment often comes from time gains and the reliability of the system.
- Small/medium teams without dedicated infra staff.
- Highly variable workloads (batch jobs, CI/CD runners, ML training).
- Security/compliance-critical environments where timely patching is non-negotiable.
- Workloads with frequent idle time – You only pay for actual usage.
- Compute Savings Plans: Commit to 1-3 year terms for up to 72% savings
- Graviton Instances: ARM-based instances with up to 20% cost reduction
- Spot Instances: Offers substantial discount versus on-demand instances
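To see whether these strategies are taking effect, you can inspect what Auto Mode has actually provisioned. A small sketch; the karpenter.sh/capacity-type label is an assumption based on Auto Mode's Karpenter lineage, so verify it against your own nodes:

```bash
# Instance type, CPU architecture (arm64 = Graviton), and capacity type per node
kubectl get nodes -L node.kubernetes.io/instance-type,kubernetes.io/arch,karpenter.sh/capacity-type
```

Nodes reporting arm64 are Graviton capacity; make sure your images are multi-arch before steering workloads there with node selectors or NodePool requirements.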
| Component | Configuration | Monthly Cost | Auto Mode Fee |
|---|---|---|---|
| EKS Control Plane | 1 cluster | $73 | N/A |
| EC2 Compute (Auto Mode) | Average ~2 t3.medium equiv. ($0.0416/hr) | $60 | $7.20 |
| Aurora Serverless V2 | 0.5-4 ACUs (AVG of 1 ACU) | $87 | N/A |
| Valkey Serverless | 0.25GB (AVG data stored; mostly user sessions), 1500 ECPUs | $19 | N/A |
| EFS Storage | 100GB | $30 | N/A |
| NAT Gateway | 3 gateways (static cost; add $0.045 per GB processed) | $99 | N/A |
| WAFv2 | 5 rules + 1 ACL | $10 | N/A |
| Total | | $385 | |
| Component | Configuration | Monthly Cost | Auto Mode Fee |
|---|---|---|---|
| EKS Control Plane | 1 cluster | $73 | N/A |
| EC2 Compute (Auto Mode) | Average ~4 t3.large equiv. ($0.0832/hr) | $243 | $29.16 |
| Aurora Serverless V2 | 0.5-8 ACUs (AVG of 2 ACU) | $174 | N/A |
| Valkey Serverless | 0.5GB (AVG data stored; mostly user sessions), 3000 ECPUs | $38 | N/A |
| EFS Storage | 500GB | $150 | N/A |
| NAT Gateway | 3 gateways (static cost; add $0.045 per GB processed) | $99 | N/A |
| WAFv2 | 5 rules + 1 ACL | $10 | N/A |
| Total | | $816 | |
| Component | Configuration | Monthly Cost | Auto Mode Fee |
|---|---|---|---|
| EKS Control Plane | 1 cluster | $73 | N/A |
| EC2 Compute (Auto Mode) | ~8 m5.xlarge equiv. ($0.192/hr) | $1,121 | $135 |
| Aurora Serverless V2 | 0.5-16 ACUs (AVG of 6 ACU) | $522 | N/A |
| Valkey Serverless | 1GB (AVG data stored; mostly user sessions), 6000 ECPUs | $76 | N/A |
| EFS Storage | 2TB | $600 | N/A |
| NAT Gateway | 3 gateways (static cost; add $0.045 per GB processed) | $99 | N/A |
| WAFv2 | 5 rules + 1 ACL | $10 | N/A |
| Total | | $2,636 | |
- Compute Pricing
- Compute Orchestration Pricing
- Database Pricing
- Web-Caching Pricing
- Data Storage Pricing
- Network Infrastructure Pricing
- Web Application Security Pricing
WAFv2 fixed pricing is based on Web ACL and rule processing (you will also pay $0.60 per 1 million requests):
# Note for detailed WAF pricing see here: https://aws.amazon.com/waf/pricing/
# Note you will also pay $0.60 per 1 million requests
# - 1 Web ACL: $5.00/month
# - 5 Rules: 5 Γ $1.00 = $5.00/month
# Total: $5.00 + $5.00 = $10.00/month

- Rule Efficiency: Minimize the number of rules while maintaining security
- Rule Consolidation: Combine similar rules to reduce rule count
- AWS Managed Rules: Use AWS Managed Rules when possible for cost-effectiveness
- Log Retention: S3 lifecycle policies for cost-effective log storage
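A hedged sketch for verifying those S3 lifecycle settings and gauging how much WAF log data you are paying to store; the bucket name is a placeholder to replace with the log bucket created by terraform/waf.tf:

```bash
# Show the lifecycle (retention/expiration) rules on the WAF log bucket
aws s3api get-bucket-lifecycle-configuration --bucket <waf-logs-bucket-name>

# Rough total volume of stored WAF logs
aws s3 ls s3://<waf-logs-bucket-name> --recursive --summarize --human-readable | tail -2
```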
After initial setup is complete, the recommended production security architecture is to:
- Permanently disable public endpoint access to the EKS cluster
- Deploy a jumpbox (bastion host) in the same private subnet as the EKS cluster
- Access the cluster only through the jumpbox for all management tasks
- Zero external attack surface - EKS API server not accessible from internet
- Centralized access control - All cluster access goes through one secure point
- Audit trail - All administrative actions logged through jumpbox
- Network isolation - Jumpbox in same VPC/subnet as EKS nodes
- Cost effective - Minimal resources needed (1 vCPU, 1GB RAM) for kubectl access
# Create jumpbox in private subnet with EKS cluster access
# - Minimum requirements: 1 vCPU, 1GB RAM (sufficient for kubectl/helm operations)
# - Subnet: Same private subnet as EKS worker nodes
# - Security group: Allow SSH from your IP, allow HTTPS to EKS API
# - IAM role: EKS cluster access permissions
# - Tools: kubectl, helm, aws-cli pre-installed

# 1. SSH to jumpbox (only entry point)
ssh -i your-key.pem ec2-user@jumpbox-private-ip
# 2. From jumpbox, manage EKS cluster
kubectl get nodes
helm list -A
terraform plan   # If Terraform state accessible from jumpbox

Since the jumpbox is in a private subnet (no direct internet access), you need secure methods to reach it:
Most secure - no SSH keys, no open ports, full audit logging
# Prerequisites: Jumpbox needs SSM agent and IAM role with SSM permissions
# Connect to jumpbox via AWS console or CLI
aws ssm start-session --target i-1234567890abcdef0
# From Session Manager session, use kubectl normally
kubectl get nodes
helm list -A

Benefits:
- ✅ No SSH keys to manage or rotate
- ✅ No inbound ports open on jumpbox
- ✅ Full session logging to CloudWatch
- ✅ Multi-factor authentication via AWS IAM
- ✅ Works from anywhere with AWS CLI access
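For one-off commands you do not even need an interactive session. A sketch using SSM Run Command, assuming the jumpbox already has kubectl and a kubeconfig configured:

```bash
# Run a single kubectl command on the jumpbox and capture the command ID
CMD_ID=$(aws ssm send-command \
  --instance-ids i-1234567890abcdef0 \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["kubectl get nodes -o wide"]' \
  --query 'Command.CommandId' --output text)

# Retrieve the output once the command has finished
aws ssm get-command-invocation \
  --command-id "$CMD_ID" \
  --instance-id i-1234567890abcdef0 \
  --query 'StandardOutputContent' --output text
```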
For teams needing persistent VPN access
# Set up AWS Client VPN endpoint in same VPC
# Download VPN client configuration
# Connect via OpenVPN client, then SSH to jumpbox
# After VPN connection:
ssh -i your-key.pem ec2-user@jumpbox-private-ip

Benefits:
- ✅ Secure tunnel into private network
- ✅ Multiple users can access simultaneously
- ✅ Works with hospital VPN policies
- ✅ Can access multiple private resources
For permanent hospital network connection
# AWS Site-to-Site VPN connects hospital network to AWS VPC
# Hospital staff access jumpbox as if it's on local network
ssh -i your-key.pem ec2-user@jumpbox-private-ip

Benefits:
- ✅ Seamless integration with hospital network
- ✅ No additional client software needed
- ✅ Consistent with existing IT policies
- ✅ High bandwidth for large operations
Two-hop architecture for maximum security
# Public bastion (minimal, hardened) -> Private jumpbox -> EKS cluster
ssh -i bastion-key.pem ec2-user@bastion-public-ip
# From bastion:
ssh -i jumpbox-key.pem ec2-user@jumpbox-private-ip

Benefits:
- ✅ Defense in depth
- ✅ Public bastion can be heavily monitored
- ✅ Private jumpbox completely isolated
- ✅ Can implement different security policies per layer

RDS Deletion Protection: For production deployments, ensure rds_deletion_protection = true in your Terraform variables to prevent accidental data loss.
# 1. Multi-Factor Authentication
# Configure MFA for all AWS IAM users accessing jumpbox
# https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_mfa.html
# 2. Time-based Access Controls
# Use IAM policies with time conditions
# https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_aws-dates.html
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "ssm:StartSession",
"Resource": "arn:aws:ec2:*:*:instance/i-jumpbox-instance-id",
"Condition": {
"DateGreaterThan": {"aws:CurrentTime": "2020-04-01T00:00:00Z"},
"DateLessThan": {"aws:CurrentTime": "2020-06-30T23:59:59Z"},
}
}
]
}
# 3. IP Restrictions (if using SSH)
# Restrict SSH access to known hospital/office IPs
# Other good guidance can be found here: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/access-a-bastion-host-by-using-session-manager-and-amazon-ec2-instance-connect.html
# 4. Session Monitoring
# Set up CloudWatch alarms for suspicious activity
aws cloudwatch put-metric-alarm \
--alarm-name "Jumpbox-Unusual-Access" \
--alarm-description "Alert on unusual jumpbox access patterns" \
--metric-name SessionCount \
--namespace AWS/SSM \
--statistic Sum \
--evaluation-periods 288 \
--period 300 \
--threshold 5 \
--comparison-operator GreaterThanThreshold

# Step 1: After initial deployment, permanently disable public access
aws eks update-cluster-config \
--region ${var.aws_region} \
--name ${var.cluster_name} \
--resources-vpc-config endpointPublicAccess=false,endpointPrivateAccess=true
# Step 2: Deploy jumpbox in same private subnet (via Terraform or console)
# Step 3: Configure secure access method (SSM Session Manager recommended)
# Step 4: Set up comprehensive logging and monitoring
# Step 5: All future cluster management through jumpbox only

- Cluster updates take 2-3 minutes to apply
- Applications continue running when public access is disabled
- Internal communication unaffected - only external kubectl/API access is blocked
- Always re-enable before running Terraform or kubectl commands (unless using jumpbox)
- Jumpbox approach eliminates need to toggle public access for routine operations
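If you are not using a jumpbox and need to toggle access by hand, the sketch below is roughly what ./cluster-security-manager.sh enable automates: re-open the public endpoint for your current IP only, then wait for the update to finish (the script's exact behavior may differ).

```bash
MY_IP="$(curl -s https://checkip.amazonaws.com)/32"
aws eks update-cluster-config \
  --region us-west-2 \
  --name openemr-eks \
  --resources-vpc-config endpointPublicAccess=true,endpointPrivateAccess=true,publicAccessCidrs="$MY_IP"

# The change takes a few minutes; wait before running kubectl or Terraform
aws eks wait cluster-active --name openemr-eks --region us-west-2
```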
The scripts/ directory contains essential operational tools for managing your OpenEMR deployment:
cd scripts && ./check-openemr-versions.sh [--latest|--count N|--search PATTERN]

Purpose: Discover available OpenEMR Docker image versions from Docker Hub
Features: Latest version check, version search, current deployment version display, OpenEMR versioning pattern awareness
When to use: Before version upgrades, checking for new releases, version planning
cd scripts && ./openemr-feature-manager.sh {enable|disable|status} {api|portal|all}

Purpose: Manage OpenEMR API and Patient Portal features with database-level enforcement
Features: Runtime feature toggling, database configuration, network policy updates, security validation
When to use: Enabling/disabling features post-deployment, security hardening, compliance requirements
cd scripts && ./validate-deployment.sh

Purpose: Comprehensive health check for the entire OpenEMR deployment
Checks: Cluster connectivity, infrastructure status, application health, SSL certificates
When to use: Before deployments, during troubleshooting, routine health checks
cd scripts && ./validate-efs-csi.sh

Purpose: Specialized validation for EFS CSI driver and storage issues
Checks: EFS CSI controller status, IAM permissions, PVC provisioning, storage accessibility
When to use: When pods are stuck in Pending, storage issues, after infrastructure changes
cd scripts && ./clean-deployment.sh [OPTIONS]

Purpose: Clean OpenEMR deployment while preserving infrastructure
Actions: Removes namespace, cleans PVCs/PVs, restarts EFS CSI controller, cleans backup files
When to use: Before fresh deployments, when deployment is corrupted, testing scenarios
Safety: Preserves EKS cluster, RDS database, and all infrastructure - only removes application layer
Options:
- --force or -f: Skip confirmation prompts (great for automated testing)
- --help or -h: Show usage information
Examples:
./clean-deployment.sh # Interactive cleanup with prompts
./clean-deployment.sh --force # Force cleanup without prompts
./clean-deployment.sh -f      # Force cleanup without prompts (short form)

cd scripts && ./restore-defaults.sh [--backup] [--force]

Purpose: Restore all deployment files to their default template state for clean git tracking
Actions: Resets YAML files to templates, removes .bak files, cleans generated files, preserves configuration
When to use: Before git commits, after deployments, when preparing for configuration changes, team collaboration
Safety: Preserves terraform.tfvars, infrastructure state, and all documentation
Requirements: Git repository (uses git checkout to restore original files)
cd scripts && ./destroy.sh [--force]

Purpose: Complete and bulletproof destruction of all OpenEMR infrastructure
Features:
- Disables RDS deletion protection automatically
- Deletes all snapshots to prevent automatic restoration
- Cleans up orphaned resources (security groups, load balancers, WAF)
- Comprehensive AWS resource cleanup
- Terraform destroy with retry logic
- Verification of complete cleanup
When to use:
- Complete infrastructure teardown
- Development environment cleanup
- End-to-end testing preparation
- Disaster recovery scenarios
- Cost optimization (removing unused resources)
Safety Features:
- Interactive confirmation prompts (unless --force is used)
- AWS credentials validation before execution
- Prerequisites checking (terraform, aws, kubectl)
- Retry logic for AWS API calls
- Comprehensive verification of cleanup completion
Options:
--force: Skip confirmation prompts and use force mode
Examples:
./destroy.sh # Interactive destruction with prompts
./destroy.sh --force          # Automated destruction (CI/CD) - no prompts

- Irreversible: This action completely destroys all infrastructure and cannot be undone
- Comprehensive: Removes ALL resources including Terraform state, RDS clusters, snapshots, S3 buckets
- Bulletproof: Handles edge cases like deletion protection, orphaned resources, and AWS API rate limits
- Verification: Confirms complete cleanup before declaring success
cd scripts && ./cluster-security-manager.sh {enable|disable|status|auto-disable|check-ip}

Purpose: Manage EKS cluster public access for security
Features: IP-based access control, auto-disable scheduling, security status monitoring
When to use: Before cluster management, security hardening, IP address changes
cd scripts && ./ssl-cert-manager.sh {request|validate|deploy|status|cleanup}

Purpose: Manage SSL certificates with automatic DNS validation
Features: ACM certificate requests, DNS validation, deployment automation
When to use: Setting up production SSL, certificate renewals, domain changes
cd scripts && ./ssl-renewal-manager.sh {deploy|status|run-now|logs|cleanup}

Purpose: Automate self-signed certificate renewal for development environments
Features: Kubernetes CronJob management, certificate rotation, renewal monitoring
When to use: Development environments, testing, when ACM certificates aren't needed
cd scripts && ./backup.sh

Purpose: Create comprehensive cross-region backups of all OpenEMR components
Features: Aurora snapshots, EFS backups, K8s configs, application data, rich metadata
Cross-Region: Automatic backup to different AWS regions for disaster recovery
When to use: Before major changes, routine backup schedules, disaster recovery preparation

Smart Polling & Timeout Management
- Intelligent Waiting: Automatically waits for RDS clusters and snapshots to be available
- Configurable Timeouts: Set custom timeouts via environment variables for different environments
- Real-Time Updates: Status updates every 30 seconds with remaining time estimates
- Production Ready: Handles large databases and busy clusters with appropriate waiting periods
Environment Variables:
export CLUSTER_AVAILABILITY_TIMEOUT=1800 # 30 min default
export SNAPSHOT_AVAILABILITY_TIMEOUT=1800 # 30 min default
export POLLING_INTERVAL=30               # 30 sec default

cd scripts && ./restore.sh <backup-bucket> <snapshot-id> [backup-region]

Purpose: Simple, reliable restore from cross-region backups during disaster recovery
Features:
- One-command restore with auto-detection
- Cross-region snapshot handling with automatic copying
- Auto-reconfiguration of database and Redis connections
- Manual fallback instructions if automated process fails
Key Improvements
- Simplified Usage: Only requires backup bucket and snapshot ID
- Auto-Detection: Automatically detects EKS cluster from Terraform
- Faster Execution: Uses existing OpenEMR pods (no temporary resources)
- Smart Reconfiguration: Automatically updates database and Redis settings
- Manual Fallback: Built-in step-by-step manual restore instructions
When to use: Disaster recovery, data corruption recovery, environment migration, testing
Environment Variables:
export CLUSTER_AVAILABILITY_TIMEOUT=1800 # 30 min default
export SNAPSHOT_AVAILABILITY_TIMEOUT=1800 # 30 min default
export POLLING_INTERVAL=30               # 30 sec default

# Health check
./validate-deployment.sh
# Security status
./cluster-security-manager.sh status

# 1. General validation
./validate-deployment.sh
# 2. Storage-specific issues
./validate-efs-csi.sh

The infrastructure is organized into modular Terraform files for better maintainability:
- main.tf - Terraform providers, required versions, and data sources
- variables.tf - All input variables with descriptions and defaults
- outputs.tf - Resource outputs for integration with Kubernetes
- vpc.tf - VPC, subnets, NAT gateways, and flow logs for regulatory compliance
- kms.tf - 6 dedicated KMS keys for granular encryption
- iam.tf - Service account roles with Auto Mode trust policies
- waf.tf - Configures WAFv2 for our ingress to our application
- eks.tf - EKS cluster with Auto Mode configuration
- efs.tf - EFS file system with elastic performance mode
- s3.tf - S3 buckets for ALB logs with lifecycle policies
- rds.tf - Aurora Serverless V2 MySQL with encryption
- elasticache.tf - Valkey Serverless cache
- cloudwatch.tf - Log groups with retention settings
- cloudtrail.tf - CloudTrail logging with encrypted S3 storage

The Kubernetes manifests are organized for clear separation of concerns:
- deployment.yaml - OpenEMR application with Auto Mode optimization (version: ${OPENEMR_VERSION})
- service.yaml - Defines OpenEMR service and the load balancer configuration (including optional AWS Certificate Manager and AWS WAF v2 integrations)
- secrets.yaml - Database credentials and Redis authentication

Version Management: OpenEMR version is always specified as ${OPENEMR_VERSION} in manifests and substituted during deployment from Terraform configuration.

- storage.yaml - EFS storage classes and PVCs
- namespace.yaml - Namespace with Pod Security Standards
- security.yaml - RBAC, service accounts, Pod Disruption Budget
- ingress.yaml - Ingress controller configuration
- network-policies.yaml - Networking policies for our deployment
- logging.yaml - Fluent Bit sidecar configuration for log collection
- hpa.yaml - Horizontal Pod Autoscaler configuration
- ssl-renewal.yaml - Automated SSL certificate renewal
# Deploy specific components
kubectl apply -f namespace.yaml # Namespaces only
kubectl apply -f storage.yaml # Storage resources
kubectl apply -f security.yaml # Security policies
kubectl apply -f network-policies.yaml # Network policies for our deployment
kubectl apply -f deployment.yaml       # Application only

# Check resource status by type
kubectl get all -n openemr # All resources
kubectl get pvc -n openemr # Storage claims
kubectl get secrets -n openemr         # Secret resources

# Application debugging
kubectl describe deployment openemr -n openemr # Deployment status
kubectl logs -f deployment/openemr -n openemr # Application logs
kubectl get events -n openemr --sort-by='.lastTimestamp' # Recent events
# Storage debugging
kubectl describe pvc openemr-sites-pvc -n openemr # Storage status
kubectl get storageclass # Available storage
# Security debugging
kubectl auth can-i --list --as=system:serviceaccount:openemr:openemr-sa # Permissions
kubectl get rolebindings -n openemr # RBAC bindings
# Network policy debugging
kubectl get networkpolicies -n openemr # Network policies
kubectl describe networkpolicy openemr-base-access -n openemr # Policy details
# WAF debugging
kubectl get ingress -n openemr -o yaml | grep wafv2-acl-arn # WAF association
terraform output waf_enabled # WAF deployment status
terraform output waf_web_acl_arn                   # WAF ACL ARN

The deployment system includes automatic OpenEMR initialization that provides resilience and recovery capabilities:
- Automatic Setup: OpenEMR containers handle their own initialization automatically
- State Persistence: Application state is preserved across container restarts
- Fault Tolerance: Built-in retry mechanisms for database and service connections
- Health Monitoring: Comprehensive health checks and readiness probes
- Automatic Recovery: Failed containers are automatically restarted
- Resource Management: Efficient resource allocation and autoscaling
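A few commands for spot-checking the self-healing behavior described above; the openemr namespace and deployment name follow this repository's manifests:

```bash
kubectl -n openemr rollout status deployment/openemr        # block until the rollout is healthy
kubectl -n openemr get pods -o wide                         # container restarts appear in RESTARTS
kubectl -n openemr describe deployment openemr | grep -E -A3 'Liveness|Readiness'
kubectl -n openemr get events --field-selector reason=Unhealthy --sort-by='.lastTimestamp'
```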
The deploy.sh script orchestrates the deployment in the correct order:
1. Prerequisites Check
   ├── kubectl, aws, helm availability
   ├── AWS credentials validation
   └── Cluster connectivity test
2. Namespace Creation
   ├── Create openemr namespace
   └── Apply Pod Security Standards
3. Storage Setup
   ├── Create EFS storage classes
   └── Provision persistent volume claims
4. Security Configuration
   ├── Apply RBAC policies
   ├── Create service accounts
   ├── Configure Pod Disruption Budget
   ├── Apply network policies for our deployment
   └── Configure WAFv2 protection (if enabled)
5. Application Deployment
   ├── Deploy OpenEMR application (config via env vars)
   └── Create services
6. Observability Setup
   ├── Deploy Fluent Bit sidecar for logging
   └── Set up CloudWatch log forwarding
7. Ingress Configuration
   ├── Configure ingress controller
   └── Set up SSL termination

The OpenEMR deployment includes a comprehensive backup and restore system with enhanced cross-region and cross-account capabilities designed for enterprise disaster recovery:
- ✅ Multiple Backup Strategies: Same-region, cross-region, and cross-account backup options
- ✅ Enhanced RDS Capabilities: Leverages new Amazon RDS cross-Region/cross-account snapshot copy
- ✅ Comprehensive Coverage: Database, EFS, Kubernetes configs, and application data
- ✅ Automated Metadata: Rich backup metadata for tracking and restoration
- ✅ Cost Optimization: Single-step operations eliminate intermediate snapshots
- ✅ Disaster Recovery: Full infrastructure restoration capabilities
- ✅ Strategy Auto-Detection: Automatically detects restore strategy from backup metadata
- Same-Region: Fastest backup, lowest cost (development/testing)
- Cross-Region: Disaster recovery with new RDS single-step copy
- Cross-Account: Compliance and data sharing between organizations
- Auto-Detection: Intelligent restore strategy detection from metadata
- Aurora Database: RDS cluster snapshots with enhanced cross-region/cross-account copy
- Kubernetes Configs: All K8s resources (deployments, services, PVCs, configmaps)
- Application Data: OpenEMR sites directory with compression
- Backup Metadata: JSON and human-readable reports with strategy tracking
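To inspect what a given backup actually captured, you can browse the backup bucket and pretty-print the metadata report. Object key layout is not guaranteed here, so list first and adjust the placeholders to what you see:

```bash
# Most recent objects written by backup.sh
aws s3 ls "s3://<backup-bucket>/" --recursive | sort | tail -20

# Pretty-print the JSON metadata once you know its key
aws s3 cp "s3://<backup-bucket>/<metadata-key>.json" - | jq .
```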
The backup and restore scripts include intelligent polling to handle AWS resource availability timing:
# RDS Cluster Availability Timeout (default: 30 minutes)
export CLUSTER_AVAILABILITY_TIMEOUT=1800
# RDS Snapshot Availability Timeout (default: 30 minutes)
export SNAPSHOT_AVAILABILITY_TIMEOUT=1800
# Polling Interval in Seconds (default: 30 seconds)
export POLLING_INTERVAL=30
# Example: Set longer timeouts for large databases
export CLUSTER_AVAILABILITY_TIMEOUT=3600 # 1 hour
export SNAPSHOT_AVAILABILITY_TIMEOUT=3600 # 1 hour
export POLLING_INTERVAL=60                 # 1 minute updates

- RDS Cluster Status: Waits for cluster to be "available" before creating snapshots
- Snapshot Creation: Monitors snapshot creation progress with real-time status updates
- Cross-Region Copy: Tracks snapshot copy progress between regions
- User Feedback: Provides status updates every 30 seconds (configurable) with remaining time estimates
- Large Databases: Multi-TB Aurora clusters may need 1-2 hours
- Busy Clusters: High-traffic databases during backup operations
- Cross-Region: Inter-region transfers can take longer depending on data size
- Network Conditions: Slower network connections between regions
# Set in your environment or .bashrc for production
export CLUSTER_AVAILABILITY_TIMEOUT=7200 # 2 hours for large clusters
export SNAPSHOT_AVAILABILITY_TIMEOUT=7200 # 2 hours for large snapshots
export POLLING_INTERVAL=60 # 1 minute updates for production
# Run backup with custom timeouts
BACKUP_REGION=us-east-1 ./scripts/backup.sh

The restore script has been significantly simplified and made more reliable:
- One-command restore - Only requires backup bucket and snapshot ID
- Auto-detection - Automatically detects EKS cluster from Terraform output
- Faster execution - Uses existing OpenEMR pods instead of creating temporary ones
- Auto-reconfiguration - Automatically updates database and Redis connections
- Manual fallback - Built-in step-by-step manual restore instructions
- Database Restore: Restore Aurora RDS from snapshots with cross-region support
- Application Data Restore: Download and extract app data from S3 to EFS
- Auto-Reconfiguration: Automatically update database and Redis connections
- Manual Instructions: Get step-by-step manual restore guide if needed
# Basic restore (most common)
./restore.sh my-backup-bucket my-snapshot-id
# Cross-region restore
./restore.sh my-backup-bucket my-snapshot-id us-east-1
# Automated restore (skip confirmations)
./restore.sh my-backup-bucket my-snapshot-id --force
# Selective restore
RESTORE_APP_DATA=false ./restore.sh my-backup-bucket my-snapshot-id
# Get manual instructions
./restore.sh --manual-instructions

The project includes a comprehensive automated end-to-end backup/restore test script that validates the entire process:
# Run the full end-to-end test
./scripts/test-end-to-end-backup-restore.sh
# Custom test configuration
./scripts/test-end-to-end-backup-restore.sh \
--cluster-name openemr-eks-test \
--aws-region us-west-2

- Infrastructure Deployment - Complete EKS cluster creation
- OpenEMR Installation - Full application deployment
- Test Data Creation - Timestamped proof files for verification
- Backup Creation - Complete backup of the installation
- Monitoring Stack Test - Validates monitoring stack installation and uninstallation
- Infrastructure Destruction - Complete resource cleanup
- Infrastructure Recreation - Rebuild from scratch
- Backup Restoration - Restore from backup
- Verification - Confirm data integrity and connectivity
- Final Cleanup - Remove all test resources
- Pre-production Validation - Verify backup/restore before going live
- Disaster Recovery Testing - Test complete recovery procedures
- Infrastructure Validation - Ensure Terraform configurations work
- Compliance Testing - Demonstrate capabilities for audits
- Automated Verification - No manual intervention required
- Duration: ~2.7 hours (160-165 minutes measured in actual test runs)
- Resources: Creates and destroys real AWS resources during testing
- Requirements: Proper AWS credentials and permissions
- Complete Backup/Restore Guide - Comprehensive documentation
- CloudWatch Logs: Application, error, and audit logs
- CloudWatch Metrics: Infrastructure and application metrics
- Fluent Bit: Log collection and forwarding
cd monitoring
./install-monitoring.sh
# Includes:
# - Prometheus: Metrics collection and alerting
# - Grafana: Dashboards and visualization
# - Loki: Log aggregation
# - Jaeger: Distributed tracing
# - AlertManager: Alert routing w/ optional Slack integration

The monitoring installation script now automatically backs up existing credentials instead of overwriting them:
# Existing credentials are automatically backed up with timestamps
# Example backup files created:
# - grafana-admin-password.backup.20250816-180000
# - monitoring-credentials.txt.backup.20250816-180000
# This prevents accidental loss of:
# - Grafana admin passwords
# - Monitoring access credentials
# - Custom configuration settings

Benefits:
- No credential loss: Existing passwords and settings are preserved
- Timestamped backups: Easy to identify when credentials were changed
- Safe reinstallation: Can reinstall monitoring without losing access
- Audit trail: Track credential changes over time
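If you ever need to roll back to one of those timestamped backups, it is a straightforward copy; the filenames below are illustrative (they follow the pattern shown above) and assume the backups sit alongside the originals in monitoring/credentials/:

```bash
# List the timestamped credential backups created by the installer
ls -1 monitoring/credentials/*backup*

# Restore a specific backup over the current file (example timestamp)
cp monitoring/credentials/grafana-admin-password.backup.20250816-180000 \
   monitoring/credentials/grafana-admin-password
```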
# 1. Make configuration changes
vim terraform/terraform.tfvars
# 2. Deploy changes
cd k8s && ./deploy.sh
# 3. Test and validate
cd ../scripts && ./validate-deployment.sh
# 4. Clean up for git commit (⚠️ See warning below)
./restore-defaults.sh --backup
# 5. Commit clean changes
git add . && git commit -m "Update configuration"

⚠️ The restore-defaults.sh script will erase any structural changes you've made to YAML files. Only use it when you're changing configuration values, not when you're actively developing or modifying the file structure itself.
# Before sharing code (unless you're trying to make structural changes to the YAML Kubernetes manifests for development purposes)
cd scripts && ./restore-defaults.sh --force
# After pulling changes
git pull
cd k8s && ./deploy.sh    # Deploy with your terraform.tfvars

# 1. Validate deployment
cd scripts && ./validate-deployment.sh
# 2. Clean and redeploy if needed
./clean-deployment.sh
cd ../k8s && ./deploy.sh
# 3. Restore clean state when done
cd ../scripts && ./restore-defaults.sh

The restore-defaults.sh script uses git checkout HEAD -- to restore files to their original repository state. This means:
- ✅ Safe for configuration changes: When you only modify values in terraform.tfvars
- ✅ Safe for deployment cleanup: Removes deployment artifacts and generated files
- ❌ DANGEROUS for structural changes: Will erase modifications to YAML file structure
- ❌ DANGEROUS during development: Will lose custom changes to deployment templates
Use Cases:
- ✅ After deployments to clean up for git commits
- ✅ When switching between different configurations
- ✅ Before sharing code with team members
- ❌ While actively developing new features in YAML files
- ❌ When you've made custom modifications to deployment structure
# Your IP has likely changed
cd scripts
./cluster-security-manager.sh check-ip
./cluster-security-manager.sh enable

# Check pod status
kubectl describe pod <pod-name> -n openemr
# Validate EFS CSI driver
cd scripts
./validate-efs-csi.sh

# Check Auto Mode status
aws eks describe-cluster --name openemr-eks \
--query 'cluster.computeConfig'
# View nodeclaims (Auto Mode)
kubectl get nodeclaim
# Debug pod scheduling
kubectl get events -n openemr --sort-by='.lastTimestamp'

# If ShellCheck fails with Docker errors, ensure Docker is running
docker --version
docker ps
# If pre-commit is not found, use the full path (example below from macOS)
/Library/Frameworks/Python.framework/Versions/<python_version>/bin/python3 -m pre_commit run --all-files
# Alternative: Skip the ShellCheck hook if Docker is not available
SKIP=shellcheck pre-commit run --all-files --verbose  # hook id may differ; check .pre-commit-config.yaml
# If yamllint is too strict, check the .yamllint configuration
cat .yamllint
# Modify .yamllint to disable specific rules if needed
# If markdownlint is too strict, check the .markdownlint.json configuration
cat .markdownlint.json
# Modify .markdownlint.json to disable specific rules if needed

Our enhanced backup and restore system provides simple, reliable, and comprehensive data protection with new Amazon RDS capabilities:
# Cross-region backup for disaster recovery (recommended)
./scripts/backup.sh --strategy cross-region --backup-region us-east-1
# Cross-account backup for compliance
./scripts/backup.sh --strategy cross-account --target-account 123456789012 --backup-region us-east-1
# Same-region backup (fastest, lowest cost)
./scripts/backup.sh --strategy same-region

# Auto-detect restore strategy (recommended)
./scripts/restore.sh <backup-bucket> <snapshot-id> <backup-region>
# Cross-region restore
./scripts/restore.sh <backup-bucket> <snapshot-id> --strategy cross-region
# Cross-account restore
./scripts/restore.sh <backup-bucket> <snapshot-id> --strategy cross-account --source-account 123456789012
# Example with auto-detection
./scripts/restore.sh openemr-backups-123456789012-openemr-eks-20250815 openemr-eks-aurora-backup-20250815-120000-us-east-1 us-east-1

- ✅ RDS Aurora snapshots - Point-in-time database recovery
- ✅ Kubernetes configurations - All resources, secrets, configs
- ✅ Application data - Patient data, files, custom configurations
- ✅ Cross-region support - Disaster recovery across AWS regions
- ✅ Comprehensive metadata - Restore instructions and audit trails
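To confirm a backup actually produced its artifacts, you can inspect them with the AWS CLI; the bucket and snapshot names below reuse the example values above and will differ in your account.

```bash
# List the objects written to the backup bucket (name reused from the example above)
aws s3 ls "s3://openemr-backups-123456789012-openemr-eks-20250815/" --recursive

# Check that the Aurora cluster snapshot completed
aws rds describe-db-cluster-snapshots \
  --db-cluster-snapshot-identifier openemr-eks-aurora-backup-20250815-120000 \
  --query 'DBClusterSnapshots[0].Status' --output text
```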
- Create Regular Backups

  # Daily automated backup (add to cron)
  0 2 * * * /path/to/scripts/backup.sh --backup-region us-east-1

- In Case of Disaster

  # Restore to disaster recovery region
  AWS_REGION=us-east-1 ./scripts/restore.sh \
    openemr-backups-123456789012-openemr-eks-20250815 \
    openemr-eks-aurora-backup-20250815-120000 \
    us-east-1

- Verify and Activate
  - Test application functionality
  - Update DNS records
  - Notify users of recovery
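A minimal post-restore check might look like the following; the namespace and deployment name (openemr) are assumptions, so adjust them to match your manifests.

```bash
# Resource names are assumptions -- adjust to your manifests
kubectl get pods -n openemr
kubectl rollout status deployment/openemr -n openemr --timeout=5m

# Recent events often surface restore problems (failed mounts, secret lookups, etc.)
kubectl get events -n openemr --sort-by='.lastTimestamp' | tail -20
```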
The project includes a manual release system that manages versions and creates GitHub releases:
- Manual releases only: Triggered when you want them
- Full user tracking: Records who triggered each release
- Complete audit trail: All release metadata includes trigger source
- Semantic versioning: Automatic version calculation and file updates
- Change detection: Only releases when there are actual changes
- User accountability: Every release shows who triggered it
- Required documentation: All releases must include meaningful release notes
- Workflow integration: Direct links to GitHub Actions runs
- Dry run mode: Test releases without creating them
# Create release via GitHub Actions
# Go to Actions > Manual Releases > Run workflow
# Choose type: major (+1.0.0) | minor (+0.1.0) | patch (+0.0.1)
# **Required**: Add release notes describing changes
# Click Run workflow

Note: Release notes are required for all manual releases to ensure proper documentation.
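If you prefer the command line, the same workflow can likely be dispatched with the GitHub CLI; the workflow file name and input names below are assumptions, so check the workflow definition under .github/workflows/ before using them.

```bash
# Hypothetical CLI equivalent of the Actions UI steps above
gh workflow run manual-release.yml \
  -f release_type=patch \
  -f release_notes="Describe what changed in this release"
```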
For complete release system documentation, see Manual Releases Guide.
Our comprehensive testing strategy ensures code quality and reliability without requiring external infrastructure access.
- Code Quality Tests - Syntax validation, best practices, and code standards
- Kubernetes Manifest Tests - K8s syntax validation and security checks
- Script Validation Tests - Shell script syntax and logic validation
- Documentation Tests - Markdown validation and link checking
# Run all tests
cd scripts
./run-test-suite.sh
# Run specific test suite
./run-test-suite.sh -s code_quality
./run-test-suite.sh -s kubernetes_manifests
./run-test-suite.sh -s script_validation
./run-test-suite.sh -s documentation

Automated quality checks run before each commit:
# Install pre-commit hooks
pip install pre-commit
pre-commit install
pre-commit install --hook-type commit-msg
# Note: Docker is required for ShellCheck pre-commit hooks
# Make sure Docker is running before running pre-commit

Available Hooks:
- Code formatting (Black, isort, flake8)
- Security scanning (Trivy)
- Validation (YAML with relaxed rules via .yamllint, JSON, Terraform, Kubernetes)
- Documentation (Markdown linting with relaxed rules via .markdownlint.json)
- Shell scripts (ShellCheck)
Note: Python-specific hooks (Black, isort, flake8, Bandit) are included for future machine learning and analytics capabilities, which will almost certainly be implemented in Python. These hooks ensure Python code quality and security from day one.
- Local Reports - Stored in the test-results/ directory
- CI/CD Integration - Automatic testing on all pull requests
- Security Scanning - Vulnerability detection with Trivy
- Quality Metrics - Code coverage and best practice compliance
Detailed Testing Guide: see the Testing Guide
Automated testing and quality assurance through GitHub Actions.
- Test Matrix - Parallel execution of all test suites
- Lint & Validate - Code quality and syntax validation
- Security Scan - Vulnerability detection and reporting
- Quality Check - Common issue detection and prevention
- Summary Report - Comprehensive test results and status
- Push to main or develop branches
- Pull Requests to main or develop branches
- Manual Trigger via workflow dispatch
- All tests must pass before merging
- Security vulnerabilities are automatically detected
- Code quality standards are enforced
- Documentation is validated for completeness
- Test Results - Stored for 7 days with detailed reports
- Security Reports - Available in GitHub Security tab
- Pull Request Comments - Automatic status updates and summaries
- Failure Notifications - Immediate feedback on test failures
Each directory now includes detailed README.md files with maintenance guidance for developers and maintainers:
- Terraform Directory - Complete infrastructure documentation with dependency graphs
- Kubernetes Directory - Kubernetes manifests documentation with deployment workflows
- Scripts Directory - Operational scripts documentation and maintenance guide
- GitHub Directory - CI/CD workflows and automation documentation
- OIDC Provider Directory - GitHub → AWS OIDC provider setup and configuration
- Images Directory - Visual assets and branding materials documentation
- Documentation Directory - Complete documentation index and maintenance guide
- Deployment Guide - Complete deployment instructions and configuration
- Deployment Timings Guide - Measured timing data for all operations (infrastructure, backup, restore, etc.)
- Autoscaling Guide - Horizontal Pod Autoscaler configuration and management
- Version Management Guide - Version awareness and dependency management
- Troubleshooting Guide - Common issues and solutions
- Backup & Restore Guide - Data backup and recovery procedures
- Manual Releases Guide - Manual release process and version management
- Logging Guide - OpenEMR 7.0.3.4 Enhanced Logging
- Testing Guide - Comprehensive CI/CD testing framework
- End-to-End Testing Requirements - MANDATORY testing procedures
- GitHub → AWS Credentials Guide - GitHub → AWS OIDC setup and credential management
- Monitoring Setup - Prometheus, Grafana, and monitoring stack configuration
- OpenEMR Community Forums Support Section
- AWS Support (with support plan)
- GitHub Issues for this deployment
This project includes comprehensive version management and awareness capabilities to help you stay up-to-date with the latest releases and security updates.
# Check for available updates
./scripts/version-manager.sh check
# Check specific component types
./scripts/version-manager.sh check --components applications
./scripts/version-manager.sh check --components terraform_modules
# Show current status
./scripts/version-manager.sh status

- Awareness Notifications: Automated monthly checks via GitHub Actions
- Comprehensive Monitoring: All dependencies tracked across the entire stack
- GitHub Issues: Automatic issue creation for available updates
- Manual Control: Read-only awareness system - no automatic updates applied
- Component Selection: Choose specific component types for targeted checks
The project features a comprehensive version check system that supports both automated and manual runs:
- Runs automatically on the 1st of every month via GitHub Actions
- Creates monthly issues titled "Version Check Report for Month of [Current Month]"
- Prevents duplicates by checking for existing monthly issues
- Uses AWS CLI for definitive version lookups when credentials are available (prefers OIDC authentication)
- Gracefully handles missing AWS credentials with fallback mechanisms
- Triggered manually via GitHub Actions workflow dispatch
- Creates timestamped issues titled "Manual Version Check Report - [YYYY-MM-DD HH:MM:SS UTC]"
- Component selection - Choose specific component types to check
- Flexible reporting - Option to run checks without creating issues
- No duplicate prevention - Always creates new timestamped issues for manual runs
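Since manual runs always create new issues, it can be useful to see which version check reports already exist before triggering another one; the GitHub CLI query below assumes gh is authenticated for this repository.

```bash
# List open version check reports (automated or manual) before triggering a new run
gh issue list --state open --search "Version Check Report in:title"
```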
The system tracks versions for:
- Applications: OpenEMR, Fluent Bit
- Infrastructure: Kubernetes, Terraform, AWS Provider
- Terraform Modules: EKS (terraform-aws-modules/eks/aws), EKS Pod Identity (terraform-aws-modules/eks-pod-identity/aws), VPC (terraform-aws-modules/vpc/aws), AWS (hashicorp/aws), Kubernetes (hashicorp/kubernetes)
- GitHub Workflows: GitHub Actions dependencies and versions
- Pre-commit Hooks: Code quality tools and versions
- Semver Packages: the Python semver package, plus Python, Terraform CLI, kubectl
- Monitoring: Prometheus, Loki, Jaeger
- Security: Cert Manager
- EKS Add-ons: EFS CSI Driver, Metrics Server
Some version checks require AWS CLI credentials to be configured:
- EKS Add-ons: EFS CSI Driver and Metrics Server versions require AWS CLI to query EKS add-on versions
- Aurora MySQL: (optional) Can use AWS credentials as a redundant source for version lookups
- Infrastructure: EKS versions require AWS access for accurate version checking via AWS CLI
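For example, the EKS add-on lookups boil down to calls like the one below; the exact query the script runs may differ, but it shows the kind of access the credentials need.

```bash
# Latest EFS CSI driver add-on version published for the configured Kubernetes release
aws eks describe-addon-versions \
  --addon-name aws-efs-csi-driver \
  --kubernetes-version 1.34 \
  --query 'addons[0].addonVersions[0].addonVersion' \
  --output text
```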
Authentication Method:
⚠️ IMPORTANT: This repository now prefers GitHub OIDC → AWS IAM role for AWS authentication in GitHub Actions workflows. Use OIDC whenever possible. Static AWS access keys are still supported for backward compatibility.
See docs/GITHUB_AWS_CREDENTIALS.md for complete setup instructions, or use the automated setup in oidc_provider/.
Note: The system is designed to work gracefully without AWS credentials. Components that cannot be checked due to missing credentials will be clearly reported, and the system will continue to check all other components.
You can run version checks manually in two ways:
- Go to Actions → Version Check & Awareness
- Click Run workflow
- Select component type:
  - all - Check all components (default)
  - applications - OpenEMR, Fluent Bit
  - infrastructure - Kubernetes, Terraform, AWS Provider
  - terraform_modules - EKS, VPC, RDS modules
  - github_workflows - GitHub Actions dependencies
  - monitoring - Prometheus, Loki, Jaeger
  - eks_addons - EFS CSI Driver, Metrics Server
- Choose whether to create GitHub issue (default: true)
- Click Run workflow
# Check all components
./scripts/version-manager.sh check
# Check specific component types
./scripts/version-manager.sh check --components applications
./scripts/version-manager.sh check --components eks_addons
./scripts/version-manager.sh check --components monitoring
# Show current status
./scripts/version-manager.sh status

- Stable Only: OpenEMR uses stable releases (not latest development versions)
- Latest Available: Other components use latest available versions
- Security Focus: Prioritizes security updates and patches
- Compatibility: Ensures component compatibility before suggesting updates
- Read-Only System: Provides awareness only - no automatic updates applied
Version information is centrally managed in versions.yaml. To configure AWS credentials for enhanced version checking:
- Deploy the OIDC provider using the Terraform module in oidc_provider/:

  cd oidc_provider/scripts
  ./deploy.sh
  # Or ...
  cd oidc_provider
  terraform init
  terraform apply

- Configure GitHub Secret:
  - Add the AWS_OIDC_ROLE_ARN secret with the role ARN from the Terraform outputs

- The workflow will automatically use OIDC - no access keys needed!
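If you manage secrets from the terminal, the GitHub CLI can set the secret directly; the role ARN below is a placeholder, so substitute the value from your Terraform outputs.

```bash
# Store the role ARN emitted by the oidc_provider module as a repository secret
gh secret set AWS_OIDC_ROLE_ARN --body "arn:aws:iam::123456789012:role/<your-oidc-role-name>"
```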
See docs/GITHUB_AWS_CREDENTIALS.md for detailed setup instructions.
For backward compatibility, static credentials are still supported:
- Set up AWS CLI credentials in your environment
- Configure GitHub Secrets for CI/CD:
  - AWS_ACCESS_KEY_ID
  - AWS_SECRET_ACCESS_KEY
  - AWS_REGION (optional, defaults to us-west-2)
Note: The workflow will automatically prefer OIDC if configured, falling back to static credentials if needed.
For more details, see the Version Management Guide.
MIT License. See full license here.
This deployment provides production-ready infrastructure. Full HIPAA compliance requires additional organizational policies, procedures, and training. Ensure you have executed a Business Associate Agreement with AWS before processing PHI.