🚀 WAIIDE Deployment Guide
Calliope Integration: This component is integrated into the Calliope AI platform. Some features and configurations may differ from the upstream project.
Complete deployment documentation for WAIIDE across different platforms and environments.
📋 Deployment Overview
WAIIDE can be deployed in multiple ways depending on your infrastructure:
| Platform | Use Case | Complexity | Scalability |
|---|---|---|---|
| Docker | Development, small teams | Low | Limited |
| Docker Compose | Multi-service setups | Medium | Medium |
| AWS ECS | Production cloud | Medium | High |
| Kubernetes | Enterprise, multi-cloud | High | Very High |
🎯 Quick Deployment Options
5-Minute Quick Start
Get WAIIDE running immediately for testing:
- Pre-built Docker images
- Minimal configuration
- Single-user setup
- Perfect for evaluation
Docker Deployment
Standard Docker deployment:
- Docker and Docker Compose
- Local and remote deployment
- Volume persistence
- Resource management
AWS ECS Deployment
Production-ready AWS deployment:
- ECS Fargate containers
- Auto-scaling configuration
- Load balancer setup
- Multi-instance support
Kubernetes Deployment
Enterprise Kubernetes deployment:
- Helm charts and manifests
- Persistent volumes
- Service mesh integration
- Multi-zone deployment
🏗️ Architecture Considerations
Single-Instance vs Multi-Instance
# Single instance per user (legacy)
/user/username/waiide/
# Multi-instance per user (modern)
/user/username/username-waiide-abc123/
/user/username/username-data-science-def456/
/user/username/username-project-a-ghi789/
Scaling Patterns
- Horizontal: Multiple WAIIDE containers per user
- Vertical: Increase resources per container
- Elastic: Auto-scale based on demand
- Geographic: Multi-region deployment
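The two URL layouts shown under Single-Instance vs Multi-Instance can be made concrete with a small helper. This is a minimal Python sketch, not WAIIDE's own code: the function name `service_prefix` and its parameters are hypothetical, and the path shapes are copied from the examples in this guide.

```python
from typing import Optional


def service_prefix(username: str, server_name: Optional[str] = None) -> str:
    """Return the URL prefix for a user's WAIIDE instance.

    Hypothetical helper: the path shapes follow the examples in this guide.
    """
    if not username:
        raise ValueError("username must not be empty")
    if server_name is None:
        # Legacy layout: a single instance per user.
        return f"/user/{username}/waiide/"
    # Multi-instance layout: every named server gets its own prefix.
    return f"/user/{username}/{server_name}/"
```

Calling `service_prefix("alice", "alice-waiide-abc123")` yields a value with the same shape as the `JUPYTERHUB_SERVICE_PREFIX` settings used in the platform examples.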
⚙️ Configuration by Platform
Docker
# docker-compose.yml
version: '3.8'
services:
  jupyterhub:
    image: jupyterhub/jupyterhub:latest
    ports:
      - "8000:8000"
    environment:
      - DOCKER_SPAWNER_IMAGE=calliopeai/waiide:latest
  waiide:
    image: calliopeai/waiide:latest
    ports:
      - "8070:8070" # Default JUPYTERHUB_PORT
    environment:
      - JUPYTERHUB_USER=${USER}
      - JUPYTERHUB_SERVICE_PREFIX=/user/${USER}/${USER}-waiide/
      # Defaults: JUPYTERHUB_PORT=8070, VSCODE_PORT=8071
ECS
{
  "family": "waiide-task",
  "containerDefinitions": [{
    "name": "waiide",
    "image": "calliopeai/waiide:latest",
    "environment": [
      {"name": "JUPYTERHUB_USER", "value": "${username}"},
      {"name": "JUPYTERHUB_SERVER_NAME", "value": "${servername}"},
      {"name": "JUPYTERHUB_SERVICE_PREFIX", "value": "/user/${username}/${servername}/"}
    ]
  }]
}
Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: waiide-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: waiide
  template:
    metadata:
      labels:
        app: waiide # Must match spec.selector.matchLabels
    spec:
      containers:
        - name: waiide
          image: calliopeai/waiide:latest
          env:
            - name: JUPYTERHUB_USER
              valueFrom:
                fieldRef:
                  # Reads the pod's own annotation; this manifest does not set it
                  fieldPath: metadata.annotations['hub.jupyter.org/username']
🔐 Security Considerations
Network Security
- TLS/SSL: Always use HTTPS in production
- Firewall: Restrict access to necessary ports only
- VPC/Subnet: Deploy in private networks where possible
- Load Balancer: Use application load balancers with SSL termination
Authentication Security
- OAuth Integration: Use enterprise identity providers
- Token Management: Secure JWT token handling
- Session Management: Configure appropriate session timeouts
- Multi-Factor Auth: Enable MFA where supported
Container Security
- Image Scanning: Scan images for vulnerabilities
- User Privileges: Run containers as non-root users
- Resource Limits: Set memory and CPU limits
- Secret Management: Use proper secret management systems
📊 Resource Planning
Minimum Requirements
| Component | CPU | Memory | Storage |
|---|---|---|---|
| JupyterHub | 1 vCPU | 2GB | 10GB |
| WAIIDE Instance | 2 vCPU | 4GB | 20GB |
| Database | 1 vCPU | 2GB | 50GB |
Production Recommendations
| Scale | Users | CPU per Instance | Memory per Instance | Storage |
|---|---|---|---|---|
| Small | 1-10 | 2 vCPU | 4GB | 100GB |
| Medium | 11-50 | 4 vCPU | 8GB | 500GB |
| Large | 51-200 | 8 vCPU | 16GB | 2TB |
| Enterprise | 201+ | 16 vCPU | 32GB | 10TB+ |
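As a worked example of reading the Production Recommendations table, the following Python sketch maps a user count to a row. The names `Tier`, `TIERS`, and `recommend_tier` are hypothetical, and the figures are copied from the table. Boundary counts of 10, 50, and 200 users are assigned to the smaller tier, which is a choice made here rather than something the guide states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_users: float  # inclusive upper bound (assumption)
    vcpu: int
    memory_gb: int
    storage: str


# Values copied from the Production Recommendations table.
TIERS = (
    Tier("Small", 10, 2, 4, "100GB"),
    Tier("Medium", 50, 4, 8, "500GB"),
    Tier("Large", 200, 8, 16, "2TB"),
    Tier("Enterprise", float("inf"), 16, 32, "10TB+"),
)


def recommend_tier(users: int) -> Tier:
    """Return the first tier whose upper bound covers the user count."""
    if users < 1:
        raise ValueError("users must be at least 1")
    for tier in TIERS:
        if users <= tier.max_users:
            return tier
    raise AssertionError("unreachable: the last tier has no upper bound")
```

For instance, `recommend_tier(35)` returns the Medium row, with 4 vCPU and 8GB per instance.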
Auto-Scaling Guidelines
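The CPU and memory thresholds in this section can be read as a small decision rule. The Python sketch below is one way to express it, under stated assumptions: metrics arrive as one percentage sample per minute (oldest first), a scale-up rule always wins over a scale-down rule, and scaling down requires both the CPU and the memory condition to hold. The guide does not say how the two metrics combine, so that part is a conservative guess, and the function names are hypothetical. The 70% and 80% target values are not used by the rule.

```python
from typing import Sequence


def _sustained_above(samples: Sequence[float], threshold: float, minutes: int) -> bool:
    """True if each of the last `minutes` samples exceeds `threshold`."""
    return len(samples) >= minutes and all(s > threshold for s in samples[-minutes:])


def _sustained_below(samples: Sequence[float], threshold: float, minutes: int) -> bool:
    """True if each of the last `minutes` samples is under `threshold`."""
    return len(samples) >= minutes and all(s < threshold for s in samples[-minutes:])


def scaling_delta(cpu: Sequence[float], mem: Sequence[float]) -> int:
    """Return the change in instance count suggested by the thresholds."""
    step_up = 0
    if _sustained_above(cpu, 80, 5):   # CPU > 80% for 5 minutes: +2
        step_up = max(step_up, 2)
    if _sustained_above(mem, 90, 3):   # Memory > 90% for 3 minutes: +1
        step_up = max(step_up, 1)
    if step_up:
        return step_up
    # Assumption: shrink only when both metrics have been low long enough.
    if _sustained_below(cpu, 50, 10) and _sustained_below(mem, 60, 15):
        return -1
    return 0
```

In practice these rules would normally live in the platform's own auto-scaling configuration; the sketch only makes the thresholds and durations explicit.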
# CPU-based scaling
Target CPU: 70%
Scale up: +2 instances when CPU > 80% for 5 minutes
Scale down: -1 instance when CPU < 50% for 10 minutes
# Memory-based scaling
Target Memory: 80%
Scale up: +1 instance when Memory > 90% for 3 minutes
Scale down: -1 instance when Memory < 60% for 15 minutes
🔄 High Availability Setup
Multi-Zone Deployment
# Distribute across availability zones
Zone A: JupyterHub Primary + WAIIDE instances
Zone B: JupyterHub Backup + WAIIDE instances
Zone C: Database Primary + WAIIDE instances
Load Balancing
- Application Load Balancer: Route to healthy instances
- Session Affinity: Maintain user-to-instance mapping
- Health Checks: Monitor instance health
- Failover: Automatic failover to healthy zones
Data Persistence
- Shared Storage: EFS, GFS, or similar for user data
- Database: RDS, PostgreSQL, or MySQL for metadata
- Backup Strategy: Regular backups with point-in-time recovery
- Disaster Recovery: Cross-region backup and recovery
🌐 Network Configuration
Port Configuration
| Port | Service | External | Internal | Purpose |
|---|---|---|---|---|
| 443 | Load Balancer | ✅ | ❌ | HTTPS traffic |
| 8000 | JupyterHub | ❌ | ✅ | Hub interface |
| 8070 | WAIIDE (Default) | ❌ | ✅ | Main service port (JUPYTERHUB_PORT default) |
| 8071 | WAIIDE (Default) | ❌ | ✅ | Internal WAIIDE port (VSCODE_PORT default) |
| 8080 | WAIIDE (Override) | ❌ | ✅ | Common override port |
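To illustrate how the default and override ports relate, here is a minimal Python sketch that resolves the two WAIIDE ports from environment variables. It assumes the defaults named in the Docker example (`JUPYTERHUB_PORT=8070`, `VSCODE_PORT=8071`); the function names are hypothetical, and how WAIIDE itself reads these variables is not covered in this guide.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_JUPYTERHUB_PORT = 8070
DEFAULT_VSCODE_PORT = 8071


def _read_port(env: Mapping[str, str], name: str, default: int) -> int:
    """Read one port from `env`, falling back to `default` when unset."""
    raw = env.get(name)
    if raw is None or raw == "":
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"{name}={port} is outside 1-65535")
    return port


def resolve_ports(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Return the effective ports, rejecting a clash between the two."""
    env = os.environ if env is None else env
    ports = {
        "JUPYTERHUB_PORT": _read_port(env, "JUPYTERHUB_PORT", DEFAULT_JUPYTERHUB_PORT),
        "VSCODE_PORT": _read_port(env, "VSCODE_PORT", DEFAULT_VSCODE_PORT),
    }
    if ports["JUPYTERHUB_PORT"] == ports["VSCODE_PORT"]:
        raise ValueError("JUPYTERHUB_PORT and VSCODE_PORT must differ")
    return ports
```

Passing `{"JUPYTERHUB_PORT": "8080"}` models the common override row in the table while leaving the second port at its default.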
DNS Configuration
# Example DNS setup
hub.company.com → Load Balancer
*.hub.company.com → Wildcard for instances
# Individual instance URLs
user1-waiide-abc123.hub.company.com → Instance container
user1-data-def456.hub.company.com → Instance container
🔍 Monitoring and Logging
Application Monitoring
- Health Endpoints: Monitor /health and /api/status
- Response Times: Track API response latencies
- User Sessions: Monitor active user sessions
- Resource Usage: CPU, memory, disk utilization
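A health-endpoint probe can be sketched with the Python standard library alone. The example below polls the two paths named above and treats HTTP 200 as healthy; that success criterion is an assumption, since this guide does not describe the response format. The function names are hypothetical, and `fetch` is injectable so the logic can be exercised without a running instance.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

HEALTH_PATHS = ("/health", "/api/status")


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 503.
        return exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return 0


def check_instance(
    base_url: str, fetch: Callable[[str], int] = http_status
) -> Dict[str, bool]:
    """Probe each health path and report whether it returned HTTP 200."""
    base = base_url.rstrip("/")
    return {path: fetch(base + path) == 200 for path in HEALTH_PATHS}
```

A scheduler or monitoring agent could call `check_instance` at a fixed interval and alert when any value is `False`.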
Infrastructure Monitoring
- Container Metrics: Docker/Kubernetes metrics
- Network Metrics: Bandwidth, connection counts
- Storage Metrics: Disk usage, IOPS
- Security Metrics: Failed authentication attempts
Logging Strategy
# Centralized logging
Application Logs → Fluentd → Elasticsearch → Kibana
Container Logs → Docker Logging Driver → CloudWatch
System Logs → rsyslog → Splunk
# Log Rotation
Max size: 100MB per file
Retention: 30 days
Compression: gzip
🚀 Performance Optimization
Container Optimization
- Image Size: Use multi-stage builds to minimize size
- Layer Caching: Optimize Docker layer caching
- Resource Limits: Set appropriate CPU/memory limits
- Health Checks: Configure proper health check intervals
Application Optimization
- Extension Loading: Lazy load WAIIDE extensions
- Caching: Cache frequently accessed data
- Compression: Enable gzip compression
- CDN: Use CDN for static assets
Database Optimization
- Connection Pooling: Use connection pooling
- Query Optimization: Optimize database queries
- Indexing: Create appropriate database indexes
- Partitioning: Partition large tables
📋 Deployment Checklist
Pre-Deployment
- Infrastructure provisioned
- DNS configured
- SSL certificates obtained
- Monitoring setup configured
- Backup strategy implemented
Deployment
- Images built and pushed to registry
- Configuration files updated
- Database migrations run
- Services deployed in correct order
- Health checks passing
Post-Deployment
- End-to-end testing completed
- User acceptance testing passed
- Monitoring alerts configured
- Documentation updated
- Team trained on new deployment
🆘 Disaster Recovery
Backup Strategy
# Daily backups
Database: Full backup daily, transaction log every 15 minutes
User Data: Incremental backup daily, full backup weekly
Configuration: Version controlled in Git
# Backup Retention
Daily: 30 days
Weekly: 12 weeks
Monthly: 12 months
Recovery Procedures
- Service Outage: Switch to backup region
- Data Corruption: Restore from latest backup
- Security Breach: Isolate, patch, restore
- Infrastructure Failure: Failover to secondary zone
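The retention schedule listed under Backup Strategy can be expressed as a pruning rule. This Python sketch is illustrative only: the names `RETENTION` and `is_expired` are hypothetical, and 12 months is approximated as 365 days, which is an assumption.

```python
from datetime import date, timedelta

# Retention windows from the Backup Retention list.
RETENTION = {
    "daily": timedelta(days=30),
    "weekly": timedelta(weeks=12),
    "monthly": timedelta(days=365),  # 12 months approximated as 365 days
}


def is_expired(kind: str, taken_on: date, today: date) -> bool:
    """Return True when a backup is older than its retention window."""
    if kind not in RETENTION:
        raise ValueError(f"unknown backup kind: {kind!r}")
    return today - taken_on > RETENTION[kind]
```

A cleanup job could apply `is_expired` to each stored backup and delete only those for which it returns `True`.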
📞 Support and Maintenance
Regular Maintenance
- Security Updates: Monthly security patching
- Dependency Updates: Quarterly dependency updates
- Performance Review: Monthly performance analysis
- Capacity Planning: Quarterly capacity review
Support Escalation
- Level 1: Basic configuration and user issues
- Level 2: Advanced troubleshooting and debugging
- Level 3: Core platform issues and security incidents
- Level 4: Vendor support and critical infrastructure
🎯 Next Steps
Choose your deployment path:
- Quick Testing: Start with 5-Minute Quick Start
- Development: Use Docker Deployment
- Production: Consider AWS ECS or Kubernetes
📚 Related Documentation
- Configuration Guide - Detailed configuration options
- Architecture Overview - System design and components
- Troubleshooting - Common deployment issues
- Security Guide - Security best practices