GitHub Template Repository for deploying LabLink infrastructure to AWS
Deploy your own LabLink infrastructure for cloud-based VM allocation and management. This template uses Terraform and GitHub Actions to automate deployment of the LabLink allocator service to AWS.
📖 Main Documentation: https://talmolab.github.io/lablink/
LabLink automates deployment and management of cloud-based VMs for running research software. It provides:
- Web interface for requesting and managing VMs
- Automatic VM provisioning with your software pre-installed
- GPU support for ML/AI workloads
- Chrome Remote Desktop access to VM GUI
- Flexible configuration for different research needs
Click the "Use this template" button at the top of this repository to create your own deployment repository.
Go to your repository → Settings → Secrets and variables → Actions, and add these secrets:
| Secret Name | Description | Example Value |
|---|---|---|
| `AWS_ROLE_ARN` | IAM role ARN for GitHub Actions OIDC | arn:aws:iam::123456789012:role/github-actions-role |
| `AWS_REGION` | AWS region for deployment | us-west-2 |
| `ADMIN_PASSWORD` | Password for allocator web interface | your-secure-password |
| `DB_PASSWORD` | PostgreSQL database password | your-secure-db-password |
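If you prefer the command line, the same secrets can be set with the GitHub CLI. This is a minimal sketch assuming `gh` is installed and authenticated against your new repository; the values are placeholders to replace with your own:

```bash
# Placeholder values - substitute your real role ARN, region, and passwords
gh secret set AWS_ROLE_ARN --body "arn:aws:iam::123456789012:role/github-actions-role"
gh secret set AWS_REGION --body "us-west-2"
gh secret set ADMIN_PASSWORD --body "your-secure-password"
gh secret set DB_PASSWORD --body "your-secure-db-password"
```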
Run the automated setup script to create required AWS resources:
```bash
# 1. Copy example config
cp lablink-infrastructure/config/test.example.yaml lablink-infrastructure/config/config.yaml

# 2. Edit with your values
#    Update bucket_name, domain, region, etc.

# 3. Run setup (creates S3, DynamoDB, Route53)
./scripts/setup-aws-infrastructure.sh
```

See AWS Setup Guide for details.
Edit lablink-infrastructure/config/config.yaml:
```yaml
# Update these values for your deployment:
allocator:
  image_tag: "linux-amd64-latest-test"  # For prod, use specific version like "linux-amd64-v1.2.3"

machine:
  repository: "https://github.com/YOUR_ORG/YOUR_DATA_REPO.git"
  software: "your-software-name"
  extension: "your-file-ext"

dns:
  enabled: true  # Set to true if using custom domain
  domain: "your-domain.com"

bucket_name: "tf-state-YOUR-ORG-lablink"  # Must be globally unique
```

Important: The config file path (lablink-infrastructure/config/config.yaml) is hardcoded in the infrastructure. Do not move or rename this file.
See Configuration Reference for all options.
Via GitHub Actions (Recommended):
- Go to Actions → "Deploy LabLink Infrastructure"
- Click "Run workflow"
- Select environment (`test`, `prod`, or `ci-test`)
- Click "Run workflow"
Via Local Terraform:
```bash
cd lablink-infrastructure
../scripts/init-terraform.sh test
terraform apply -var="resource_suffix=test"
```

After deployment completes:
- Allocator URL: Check workflow output or Terraform output for the URL/IP
- SSH Access: Download the PEM key from workflow artifacts
- Web Interface: Navigate to allocator URL in your browser
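For example, from `lablink-infrastructure/` you can read the outputs with Terraform and connect over SSH. The output names, key filename, and `ubuntu` user below are assumptions; adjust them to match what the workflow actually produces:

```bash
# List the Terraform outputs (exact names depend on this template's main.tf)
terraform output

# Connect using the PEM key downloaded from the workflow artifacts (filename assumed)
chmod 600 lablink-key.pem
ssh -i lablink-key.pem ubuntu@ALLOCATOR_IP_OR_DOMAIN
```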
To use this template you need:

- AWS Account with permissions to create:
  - EC2 instances
  - Security Groups
  - Elastic IPs
  - (Optional) Route 53 records for DNS
- GitHub Account with ability to:
  - Create repositories from templates
  - Configure GitHub Actions secrets
  - Run GitHub Actions workflows
- Basic Knowledge of:
  - Terraform (helpful but not required)
  - AWS services
Before deploying, you must set up:
- S3 Bucket for Terraform state storage
- IAM Role for GitHub Actions OIDC authentication
- (Optional) Elastic IP for persistent allocator address
- (Optional) Route 53 Hosted Zone for custom domain
See AWS Setup Guide below for detailed instructions.
Create an IAM role with OIDC provider for GitHub Actions:
- Create OIDC provider in IAM (if not exists):
  - Provider URL: `https://token.actions.githubusercontent.com`
  - Audience: `sts.amazonaws.com`
- Create IAM role with trust policy:

  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Federated": "arn:aws:iam::YOUR_ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com"
        },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringLike": {
            "token.actions.githubusercontent.com:sub": "repo:YOUR_ORG/YOUR_REPO:*"
          }
        }
      }
    ]
  }
  ```

- Attach permissions: `PowerUserAccess` (or custom policy with EC2, VPC, S3, Route53, IAM permissions)
- Copy the Role ARN and add to GitHub secrets
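The same steps can be scripted with the AWS CLI. This is a sketch assuming the trust policy above is saved as `trust-policy.json` and you are comfortable attaching `PowerUserAccess`; older CLI/IAM setups may also require a `--thumbprint-list` argument:

```bash
# Create the GitHub OIDC provider (once per AWS account)
aws iam create-open-id-connect-provider \
  --url https://token.actions.githubusercontent.com \
  --client-id-list sts.amazonaws.com

# Create the role with the trust policy from the previous step
aws iam create-role \
  --role-name github-actions-role \
  --assume-role-policy-document file://trust-policy.json

# Attach permissions (or a narrower custom policy)
aws iam attach-role-policy \
  --role-name github-actions-role \
  --policy-arn arn:aws:iam::aws:policy/PowerUserAccess

# Print the Role ARN to paste into the AWS_ROLE_ARN secret
aws iam get-role --role-name github-actions-role --query 'Role.Arn' --output text
```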
The AWS region where your infrastructure will be deployed. Must match the region in your config.yaml.
Common regions:
- `us-west-2` (Oregon)
- `us-east-1` (N. Virginia)
- `eu-west-1` (Ireland)
Important: AMI IDs are region-specific. If you change regions, update the ami_id in config.yaml.
Password for accessing the allocator web interface. Choose a strong password (12+ characters, mixed case, numbers, symbols).
This password is used to log in to the admin dashboard where you can:
- Create and destroy client VMs
- View VM status
- Assign VMs to users
Password for the PostgreSQL database used by the allocator service. Choose a strong password that is different from ADMIN_PASSWORD.
This is stored securely and injected into the configuration at deployment time.
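One way to generate both passwords is with `openssl`; a quick sketch (adjust length and character requirements to your own policy):

```bash
openssl rand -base64 24   # use as ADMIN_PASSWORD
openssl rand -base64 24   # run again for an independent DB_PASSWORD
```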
Use the automated setup script to create all required AWS resources:
```bash
# 1. Configure your deployment
cp lablink-infrastructure/config/test.example.yaml lablink-infrastructure/config/config.yaml
# Edit config.yaml with your values (bucket_name, domain, region, etc.)

# 2. Run automated setup
./scripts/setup-aws-infrastructure.sh
```

What the script does:
- Checks prerequisites (AWS CLI installed, credentials configured)
- Creates S3 bucket for Terraform state (with versioning)
- Creates DynamoDB table for state locking
- Creates Route53 hosted zone (if DNS enabled) - the container that holds your domain's DNS records
- Updates config.yaml with zone_id automatically
- Idempotent (safe to run multiple times)
What the script does NOT do:
- Does NOT register domain names (you must register via Route53 registrar, CloudFlare, or other registrar)
- Does NOT create DNS records (Terraform can create these, or you create manually)
After setup, choose your DNS/SSL approach:
- Route53 + Let's Encrypt:
  - Register domain → Update nameservers → Set `dns.terraform_managed: true/false`
  - DNS records: Terraform-managed or manual in Route53 console
- CloudFlare DNS + SSL:
  - Manage domain/DNS in CloudFlare (no Route53 needed)
  - Set `ssl.provider: "cloudflare"`
  - Create A record in CloudFlare pointing to allocator IP
- IP-only (no DNS/SSL):
  - Set `dns.enabled: false`
  - Access via IP address
Note: Config will be simplified in future releases. See DNS-SSL-SIMPLIFICATION-PLAN.md for upcoming changes.
If you prefer to create resources manually:
```bash
# Create bucket (must be globally unique across ALL of AWS)
aws s3 mb s3://tf-state-YOUR-ORG-lablink --region us-west-2

# Enable versioning (recommended)
aws s3api put-bucket-versioning \
  --bucket tf-state-YOUR-ORG-lablink \
  --versioning-configuration Status=Enabled
```

Update bucket_name in lablink-infrastructure/config/config.yaml to match.
Create a DynamoDB table for Terraform state locking:

```bash
aws dynamodb create-table \
  --table-name lock-table \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-west-2
```

For persistent allocator IP address across deployments:
```bash
# Allocate EIP
aws ec2 allocate-address --domain vpc --region us-west-2

# Tag it for reuse
aws ec2 create-tags \
  --resources eipalloc-XXXXXXXX \
  --tags Key=Name,Value=lablink-eip
```

Update eip.tag_name in config.yaml if using a different tag name.
If using a custom domain:
- Create or use existing hosted zone:

  ```bash
  aws route53 create-hosted-zone --name your-domain.com --caller-reference $(date +%s)
  ```

- Update your domain's nameservers to point to the Route 53 NS records
- Update the `dns` section in `config.yaml`:

  ```yaml
  dns:
    enabled: true
    domain: "your-domain.com"
    zone_id: "Z..."  # Optional - will auto-lookup if empty
  ```
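To confirm the nameserver change has propagated before deploying, you can compare what public DNS serves against the zone's delegation set; a quick check (the zone ID is a placeholder):

```bash
# NS records currently returned for your domain
dig +short NS your-domain.com

# NS records assigned to the Route 53 hosted zone (replace the zone ID)
aws route53 get-hosted-zone --id ZXXXXXXXXXXXXX --query 'DelegationSet.NameServers'
```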
See GitHub Secrets Setup above for detailed IAM role configuration.
All configuration is in lablink-infrastructure/config/config.yaml.
```yaml
db:
  dbname: "lablink_db"
  user: "lablink"
  password: "PLACEHOLDER_DB_PASSWORD"  # Injected from GitHub secret
  host: "localhost"
  port: 5432
```

```yaml
machine:
  machine_type: "g4dn.xlarge"  # AWS instance type
  image: "ghcr.io/talmolab/lablink-client-base-image:latest"  # Docker image
  ami_id: "ami-0601752c11b394251"  # Region-specific AMI
  repository: "https://github.com/YOUR_ORG/YOUR_REPO.git"  # Your code/data repo
  software: "your-software"  # Software identifier
  extension: "ext"  # Data file extension
```

Instance Types:
- `g4dn.xlarge` - GPU instance (NVIDIA T4, good for ML)
- `t3.large` - CPU-only, cheaper
- `p3.2xlarge` - More powerful GPU (NVIDIA V100)
AMI IDs (Ubuntu 24.04 with Docker + Nvidia):
- `us-west-2`: `ami-0601752c11b394251`
- Other regions: Use AWS Console to find similar AMI or create custom
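If you switch regions, you can sanity-check that the AMI ID in your config actually exists there; a small sketch with the AWS CLI (substitute your AMI ID and region):

```bash
# An error or empty result means the AMI is not available in that region
aws ec2 describe-images \
  --image-ids ami-0601752c11b394251 \
  --region us-west-2 \
  --query 'Images[0].{Name:Name,Created:CreationDate}'
```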
```yaml
app:
  admin_user: "admin"
  admin_password: "PLACEHOLDER_ADMIN_PASSWORD"  # Injected from secret
  region: "us-west-2"  # Must match AWS_REGION secret
```

```yaml
dns:
  enabled: false  # true to use DNS, false for IP-only
  terraform_managed: false  # true = Terraform creates records
  domain: "lablink.example.com"
  zone_id: ""  # Leave empty for auto-lookup
  app_name: "lablink"
  pattern: "auto"  # "auto" or "custom"
```

DNS Patterns:
- `auto`: Creates `{env}.{app_name}.{domain}` (e.g., `test.lablink.example.com`)
- `custom`: Uses `custom_subdomain` value
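For example, a `custom` pattern config might look like this; the exact hostname produced from `custom_subdomain` is an assumption, so check the resulting DNS record after deploying:

```yaml
dns:
  enabled: true
  domain: "example.com"
  pattern: "custom"
  custom_subdomain: "lablink-demo"   # assumed to resolve as lablink-demo.example.com
```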
```yaml
ssl:
  provider: "none"  # "letsencrypt", "cloudflare", or "none"
  email: "admin@example.com"  # For Let's Encrypt notifications
  staging: true  # true = staging certs, false = production certs
```

SSL Providers:
- `none`: HTTP only (for testing)
- `letsencrypt`: Automatic SSL with Caddy
- `cloudflare`: Use CloudFlare proxy for SSL
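For instance, a production Let's Encrypt setup would look roughly like this (values illustrative; keeping `staging: true` until the deployment works avoids Let's Encrypt rate limits):

```yaml
ssl:
  provider: "letsencrypt"
  email: "ops@your-domain.com"   # certificate expiry notifications
  staging: false                 # switch to production certificates once DNS resolves
```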
```yaml
eip:
  strategy: "persistent"  # "persistent" or "dynamic"
  tag_name: "lablink-eip"  # Tag to find reusable EIP
```

The "Deploy LabLink Infrastructure" workflow deploys or updates your LabLink infrastructure.
Triggers:
- Manual: Actions → "Deploy LabLink Infrastructure" → Run workflow
- Automatic: Push to `test` branch
Inputs:
- `environment`: `test` or `prod`
What it does:
- Configures AWS credentials via OIDC
- Injects passwords from GitHub secrets into config
- Runs Terraform to create/update infrastructure
- Verifies deployment and DNS
- Uploads SSH key as artifact
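The workflow can also be triggered from the command line with the GitHub CLI; a sketch assuming the workflow file is `terraform-deploy.yml` (as in the repository structure below) and accepts the `environment` input:

```bash
gh workflow run terraform-deploy.yml -f environment=test

# Optionally follow the run that was just started
gh run watch
```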
The "Destroy LabLink Infrastructure" workflow tears down your deployment, including any client VMs.

Triggers:
- Manual only: Actions → "Destroy LabLink Infrastructure" → Run workflow
Inputs:
- `confirm_destroy`: Must type "yes" to confirm
- `environment`: `test` or `prod`
What it does:
- Creates a minimal terraform backend configuration
- Initializes Terraform with S3 backend to access client VM state
- Destroys client VMs directly from the S3 state (for test/prod/ci-test)
- Destroys the allocator infrastructure (EC2, security groups, EIP, etc.)
Note: Client VM state is stored in S3 (same bucket as infrastructure state). Terraform can destroy resources using only the state file - no terraform configuration files needed!
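As with deploy, the destroy workflow can be started from the CLI; a sketch assuming the filename `terraform-destroy.yml` and the inputs listed above:

```bash
gh workflow run terraform-destroy.yml -f environment=test -f confirm_destroy=yes
```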
If the destroy workflow fails or leaves orphaned resources, see the Manual Cleanup Guide for step-by-step procedures to:
- Remove orphaned IAM roles, policies, and instance profiles
- Clean up leftover EC2 instances, security groups, and key pairs
- Fix Terraform state file issues (checksum mismatches, corrupted state)
- Verify complete resource removal
Common scenarios covered:
- Destroy workflow failures
- "Resource in use" errors
- Orphaned client VMs
- State lock issues
To run your own software on the client VMs:

- Update `config.yaml`:

  ```yaml
  machine:
    repository: "https://github.com/your-org/your-software-data.git"
    software: "your-software-name"
    extension: "your-file-ext"  # e.g., "h5", "npy", "csv"
  ```

- (Optional) Use custom Docker image:

  ```yaml
  machine:
    image: "ghcr.io/your-org/your-custom-image:latest"
  ```

To deploy in a different AWS region:

- Update `config.yaml`:

  ```yaml
  app:
    region: "eu-west-1"  # Your region
  machine:
    ami_id: "ami-XXXXXXX"  # Region-specific AMI
  ```

- Update GitHub secret `AWS_REGION`
- Find appropriate AMI for region (Ubuntu 24.04 with Docker)

To change the client VM instance type, update `machine_type`:

```yaml
machine:
  machine_type: "t3.xlarge"  # No GPU, cheaper
  # or
  machine_type: "p3.2xlarge"  # More powerful GPU
```

See AWS EC2 Instance Types for options.
The client VMs can be configured with a custom startup script. See the LabLink Infrastructure README for more details.
Cause: Destroy workflow failed or Terraform state is out of sync with AWS resources
Solution: Use the automated cleanup script:
```bash
# Dry-run to see what would be deleted
./scripts/cleanup-orphaned-resources.sh <environment> --dry-run

# Actual cleanup
./scripts/cleanup-orphaned-resources.sh <environment>
```

The script automatically reads configuration from config.yaml, backs up Terraform state files, and deletes resources in the correct dependency order. For detailed manual cleanup procedures, see MANUAL_CLEANUP_GUIDE.md.
Cause: AMI ID doesn't exist in your region
Solution: Update ami_id in config.yaml with a region-appropriate AMI
Cause: Security group or DNS not configured
Solution:
- Check security group allows inbound traffic on port 5000
- If using DNS, verify DNS records propagated
- Try accessing via public IP first
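Two quick checks you can run from your machine; the commands are standard, but the port and the placeholder IDs are assumptions based on this README:

```bash
# Is anything answering on the allocator port? (replace with your IP or domain)
curl -v http://ALLOCATOR_IP:5000/

# Does the security group allow inbound traffic on port 5000? (replace the group ID)
aws ec2 describe-security-groups \
  --group-ids sg-XXXXXXXX \
  --query 'SecurityGroups[0].IpPermissions'
```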
Cause: Previous deployment didn't complete or clean up properly
Solution:
```bash
# In lablink-infrastructure/
terraform force-unlock LOCK_ID
```

Cause: DNS propagation delay or Route 53 not configured
Solution:
- Wait 5-10 minutes for propagation
- Verify Route 53 hosted zone exists
- Check nameservers match at domain registrar
- Use `nslookup your-domain.com` to test
- Main Documentation: https://talmolab.github.io/lablink/
- Infrastructure Docs: lablink-infrastructure/README.md
- GitHub Issues: https://github.com/talmolab/lablink/issues
- Deployment Checklist: DEPLOYMENT_CHECKLIST.md
```
lablink-template/
├── .github/workflows/           # GitHub Actions workflows
│   ├── terraform-deploy.yml     # Deploy infrastructure
│   └── terraform-destroy.yml    # Destroy infrastructure (includes client VMs)
├── lablink-infrastructure/      # Terraform infrastructure
│   ├── config/
│   │   ├── config.yaml          # Main configuration
│   │   └── *.example.yaml       # Configuration examples
│   ├── main.tf                  # Core Terraform config
│   ├── backend-*.hcl            # Environment-specific backends
│   ├── user_data.sh             # EC2 initialization script
│   ├── verify-deployment.sh     # Deployment verification
│   └── README.md                # Infrastructure documentation
├── MANUAL_CLEANUP_GUIDE.md      # Manual cleanup procedures
├── README.md                    # This file
├── DEPLOYMENT_CHECKLIST.md      # Pre-deployment checklist
└── LICENSE
```
Found an issue with the template or want to suggest improvements?
- Open an issue: https://github.com/talmolab/lablink-template/issues
- For LabLink core issues: https://github.com/talmolab/lablink/issues
BSD 2-Clause License - see LICENSE file for details.
- Main LabLink Repository: https://github.com/talmolab/lablink
- Documentation: https://talmolab.github.io/lablink/
- Template Repository: https://github.com/talmolab/lablink-template
- Example Deployment: https://github.com/talmolab/sleap-lablink (SLEAP-specific configuration)
Need Help? Check the Deployment Checklist or Troubleshooting section above.