🚀 WAIIDE Deployment Guide

Calliope Integration: This component is integrated into the Calliope AI platform. Some features and configurations may differ from the upstream project.

Complete deployment documentation for WAIIDE across different platforms and environments.

📋 Deployment Overview

WAIIDE can be deployed in multiple ways depending on your infrastructure:

| Platform | Use Case | Complexity | Scalability |
|---|---|---|---|
| Docker | Development, small teams | Low | Limited |
| Docker Compose | Multi-service setups | Medium | Medium |
| AWS ECS | Production cloud | Medium | High |
| Kubernetes | Enterprise, multi-cloud | High | Very High |

🎯 Quick Deployment Options

5-Minute Quick Start

Get WAIIDE running immediately for testing:

  • Pre-built Docker images
  • Minimal configuration
  • Single-user setup
  • Perfect for evaluation

Docker Deployment

Standard Docker deployment:

  • Docker and Docker Compose
  • Local and remote deployment
  • Volume persistence
  • Resource management

AWS ECS Deployment

Production-ready AWS deployment:

  • ECS Fargate containers
  • Auto-scaling configuration
  • Load balancer setup
  • Multi-instance support

Kubernetes Deployment

Enterprise Kubernetes deployment:

  • Helm charts and manifests
  • Persistent volumes
  • Service mesh integration
  • Multi-zone deployment

🏗️ Architecture Considerations

Single-Instance vs Multi-Instance

# Single instance per user (legacy)
/user/username/waiide/

# Multi-instance per user (modern)
/user/username/username-waiide-abc123/
/user/username/username-data-science-def456/
/user/username/username-project-a-ghi789/
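The multi-instance layout above can be sketched in Python. This assumes a server name is built as `<username>-<label>-<suffix>` with a six-character lowercase alphanumeric suffix, as in the examples; `make_server_name`, `service_prefix`, and the suffix alphabet are hypothetical helpers, not part of WAIIDE or JupyterHub.

```python
import secrets
import string
from typing import Optional

# Assumed suffix alphabet (lowercase letters and digits), matching "abc123".
SUFFIX_ALPHABET = string.ascii_lowercase + string.digits


def make_server_name(username: str, label: str, suffix_len: int = 6) -> str:
    """Build a multi-instance server name such as 'alice-data-science-k3x9q2'.

    The random suffix keeps names unique when a user starts two instances
    with the same label.
    """
    suffix = "".join(secrets.choice(SUFFIX_ALPHABET) for _ in range(suffix_len))
    return f"{username}-{label}-{suffix}"


def service_prefix(username: str, server_name: Optional[str] = None) -> str:
    """URL prefix for a user's server.

    Without a server name, return the legacy single-instance path;
    otherwise return the multi-instance path that embeds the server name.
    """
    if server_name is None:
        return f"/user/{username}/waiide/"
    return f"/user/{username}/{server_name}/"
```

For example, `service_prefix("alice", make_server_name("alice", "data-science"))` yields a path of the form `/user/alice/alice-data-science-<suffix>/`, the same shape as `JUPYTERHUB_SERVICE_PREFIX` in the ECS example.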

Scaling Patterns

  • Horizontal: Multiple WAIIDE containers per user
  • Vertical: Increase resources per container
  • Elastic: Auto-scale based on demand
  • Geographic: Multi-region deployment

⚙️ Configuration by Platform

Docker

# docker-compose.yml
version: '3.8'
services:
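  jupyterhub:
    image: jupyterhub/jupyterhub:latest
    ports:
      - "8000:8000"
    volumes:
      # DockerSpawner starts per-user containers through the host's Docker socket
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      # Not read by JupyterHub directly; jupyterhub_config.py must pass it on,
      # e.g. c.DockerSpawner.image = os.environ["DOCKER_SPAWNER_IMAGE"]
      - DOCKER_SPAWNER_IMAGE=calliopeai/waiide:latest
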
  waiide:
    image: calliopeai/waiide:latest
    ports:
      - "8070:8070"  # Default JUPYTERHUB_PORT
    environment:
      - JUPYTERHUB_USER=${USER}
      - JUPYTERHUB_SERVICE_PREFIX=/user/${USER}/${USER}-waiide/
      # Defaults: JUPYTERHUB_PORT=8070, VSCODE_PORT=8071

ECS

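{
  "family": "waiide-task",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "2048",
  "memory": "4096",
  "containerDefinitions": [{
    "name": "waiide",
    "image": "calliopeai/waiide:latest",
    "essential": true,
    "portMappings": [{"containerPort": 8070, "protocol": "tcp"}],
    "environment": [
      {"name": "JUPYTERHUB_USER", "value": "${username}"},
      {"name": "JUPYTERHUB_SERVER_NAME", "value": "${servername}"},
      {"name": "JUPYTERHUB_SERVICE_PREFIX", "value": "/user/${username}/${servername}/"}
    ]
  }]
}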
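ECS does not expand the `${username}` and `${servername}` placeholders in a task definition; whatever launches the task has to fill them in first. A minimal sketch, assuming the template is kept as a JSON string: Python's `string.Template` uses the same `${name}` syntax, and parsing the JSON before substituting keeps unusual user names from breaking the document. `render_task_definition` is a hypothetical helper.

```python
import json
from string import Template
from typing import Any, Dict

TASK_TEMPLATE = """
{
  "family": "waiide-task",
  "containerDefinitions": [{
    "name": "waiide",
    "image": "calliopeai/waiide:latest",
    "environment": [
      {"name": "JUPYTERHUB_USER", "value": "${username}"},
      {"name": "JUPYTERHUB_SERVER_NAME", "value": "${servername}"},
      {"name": "JUPYTERHUB_SERVICE_PREFIX", "value": "/user/${username}/${servername}/"}
    ]
  }]
}
"""


def fill(node: Any, values: Dict[str, str]) -> Any:
    """Recursively substitute ${...} placeholders in every string value."""
    if isinstance(node, str):
        # substitute() raises KeyError for a placeholder with no value,
        # so a half-rendered task definition is never produced.
        return Template(node).substitute(values)
    if isinstance(node, list):
        return [fill(item, values) for item in node]
    if isinstance(node, dict):
        return {key: fill(value, values) for key, value in node.items()}
    return node


def render_task_definition(template_json: str, username: str,
                           servername: str) -> Dict[str, Any]:
    """Parse first, then substitute only inside string values."""
    values = {"username": username, "servername": servername}
    return fill(json.loads(template_json), values)
```

The keys of the rendered dict match the parameter names of boto3's `ECS.Client.register_task_definition`, so it could be passed as `ecs.register_task_definition(**task_def)`; that call is not shown or tested here.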

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: waiide-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: waiide
  template:
    metadata:
      labels:
        app: waiide   # must match spec.selector.matchLabels
    spec:
      containers:
      - name: waiide
        image: calliopeai/waiide:latest
        ports:
        - containerPort: 8070   # default JUPYTERHUB_PORT
        env:
        - name: JUPYTERHUB_USER
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['hub.jupyter.org/username']

🔐 Security Considerations

Network Security

  • TLS/SSL: Always use HTTPS in production
  • Firewall: Restrict access to necessary ports only
  • VPC/Subnet: Deploy in private networks where possible
  • Load Balancer: Use application load balancers with SSL termination

Authentication Security

  • OAuth Integration: Use enterprise identity providers
  • Token Management: Handle JWTs and API tokens securely
  • Session Management: Configure appropriate session timeouts
  • Multi-Factor Auth: Enable MFA where supported

Container Security

  • Image Scanning: Scan images for vulnerabilities
  • User Privileges: Run containers as non-root users
  • Resource Limits: Set memory and CPU limits
  • Secret Management: Use proper secret management systems

📊 Resource Planning

Minimum Requirements

| Component | CPU | Memory | Storage |
|---|---|---|---|
| JupyterHub | 1 vCPU | 2GB | 10GB |
| WAIIDE Instance | 2 vCPU | 4GB | 20GB |
| Database | 1 vCPU | 2GB | 50GB |

Production Recommendations

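| Scale | Users | CPU per Instance | Memory per Instance | Storage |
|---|---|---|---|---|
| Small | 1-10 | 2 vCPU | 4GB | 100GB |
| Medium | 10-50 | 4 vCPU | 8GB | 500GB |
| Large | 50-200 | 8 vCPU | 16GB | 2TB |
| Enterprise | 200+ | 16 vCPU | 32GB | 10TB+ |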
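A sizing lookup over this table might look like the sketch below. The table's user ranges share their endpoints (10, 50, 200); this sketch assumes each upper bound is inclusive, so exactly 10 users maps to Small. `Tier`, `PRODUCTION_TIERS`, and `recommend_tier` are illustrative names.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    max_users: Optional[int]  # inclusive upper bound; None means unbounded
    vcpu_per_instance: int
    memory_gb_per_instance: int
    storage: str


# Rows copied from the production recommendations table above.
PRODUCTION_TIERS = [
    Tier("Small", 10, 2, 4, "100GB"),
    Tier("Medium", 50, 4, 8, "500GB"),
    Tier("Large", 200, 8, 16, "2TB"),
    Tier("Enterprise", None, 16, 32, "10TB+"),
]


def recommend_tier(users: int) -> Tier:
    """Return the first tier whose inclusive user bound covers `users`."""
    if users < 1:
        raise ValueError("users must be at least 1")
    for tier in PRODUCTION_TIERS:
        if tier.max_users is None or users <= tier.max_users:
            return tier
    raise AssertionError("unreachable: the last tier is unbounded")
```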

Auto-Scaling Guidelines

# CPU-based scaling
Target CPU: 70%
Scale up: +2 instances when CPU > 80% for 5 minutes
Scale down: -1 instance when CPU < 50% for 10 minutes

# Memory-based scaling  
Target Memory: 80%
Scale up: +1 instance when Memory > 90% for 3 minutes
Scale down: -1 instance when Memory < 60% for 15 minutes
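The step rules above can be expressed as a small decision function. This is a sketch under stated assumptions: one utilization sample per minute, scale-up taking precedence over scale-down, the larger step winning when both metrics call for a scale-up, and scale-down requiring both metrics to agree. The 70% and 80% targets are target-tracking setpoints and are not modeled here; `desired_instances` and its `min_instances`/`max_instances` bounds are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rule:
    threshold: float  # utilization, in percent
    minutes: int      # how long the condition must hold
    step: int         # instances to add (positive) or remove (negative)


# Step rules from the guidelines above.
CPU_UP = Rule(80.0, 5, +2)
CPU_DOWN = Rule(50.0, 10, -1)
MEM_UP = Rule(90.0, 3, +1)
MEM_DOWN = Rule(60.0, 15, -1)


def sustained(samples: Sequence[float], rule: Rule, above: bool) -> bool:
    """True if each of the last `rule.minutes` samples breaches the threshold.

    Assumes one sample per minute, ordered oldest to newest.
    """
    if len(samples) < rule.minutes:
        return False
    window = samples[-rule.minutes:]
    if above:
        return all(s > rule.threshold for s in window)
    return all(s < rule.threshold for s in window)


def desired_instances(current: int, cpu: Sequence[float], mem: Sequence[float],
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Apply the step rules and clamp to [min_instances, max_instances]."""
    up_steps = [rule.step
                for rule, samples in ((CPU_UP, cpu), (MEM_UP, mem))
                if sustained(samples, rule, above=True)]
    if up_steps:
        # Scale-up wins; if both metrics fire, take the larger step.
        target = current + max(up_steps)
    elif sustained(cpu, CPU_DOWN, above=False) and sustained(mem, MEM_DOWN, above=False):
        # Scale down only when both metrics are comfortably low.
        target = current - 1
    else:
        target = current
    return max(min_instances, min(max_instances, target))
```

Requiring both metrics to be low before removing an instance is a conservative choice: it avoids shrinking a memory-heavy workload just because CPU happens to be idle.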

🔄 High Availability Setup

Multi-Zone Deployment

# Distribute across availability zones
Zone A: JupyterHub Primary + WAIIDE instances
Zone B: JupyterHub Backup + WAIIDE instances  
Zone C: Database Primary + WAIIDE instances

Load Balancing

  • Application Load Balancer: Route to healthy instances
  • Session Affinity: Maintain user-to-instance mapping
  • Health Checks: Monitor instance health
  • Failover: Automatic failover to healthy zones

Data Persistence

  • Shared Storage: EFS, GFS, or similar for user data
  • Database: PostgreSQL or MySQL (self-managed or on Amazon RDS) for metadata
  • Backup Strategy: Regular backups with point-in-time recovery
  • Disaster Recovery: Cross-region backup and recovery

🌐 Network Configuration

Port Configuration

| Port | Service | Purpose |
|---|---|---|
| 443 | Load Balancer | HTTPS traffic |
| 8000 | JupyterHub | Hub interface |
| 8070 | WAIIDE (default) | Main service port (JUPYTERHUB_PORT) |
| 8071 | WAIIDE (default) | Internal WAIIDE port (VSCODE_PORT) |
| 8080 | WAIIDE (override) | Common override port |

DNS Configuration

# Example DNS setup
hub.company.com → Load Balancer
*.hub.company.com → Wildcard for instances

# Individual instance URLs
user1-waiide-abc123.hub.company.com → Instance container
user1-data-def456.hub.company.com → Instance container

🔍 Monitoring and Logging

Application Monitoring

  • Health Endpoints: Monitor /health and /api/status
  • Response Times: Track API response latencies
  • User Sessions: Monitor active user sessions
  • Resource Usage: CPU, memory, disk utilization
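A health probe for the two endpoints listed above can be written with the standard library alone. This sketch assumes any 2xx response means healthy and treats errors, timeouts, and non-2xx responses as unhealthy; `probe` is a hypothetical helper, and the base URL you pass (for example a container's internal address on port 8070) depends on your deployment.

```python
import http.client
import urllib.request
from typing import Dict, Iterable

# Endpoint paths named in the monitoring list above.
DEFAULT_PATHS = ("/health", "/api/status")


def probe(base_url: str, paths: Iterable[str] = DEFAULT_PATHS,
          timeout: float = 5.0) -> Dict[str, bool]:
    """GET each path under base_url; True means it answered with a 2xx status."""
    results: Dict[str, bool] = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[path] = 200 <= resp.status < 300
        except (OSError, http.client.HTTPException):
            # urllib.error.URLError and HTTPError (raised for 4xx/5xx) subclass
            # OSError, as do refused connections and socket timeouts.
            results[path] = False
    return results
```

A scheduler or sidecar could call `probe(...)` periodically and alert when any value is `False`.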

Infrastructure Monitoring

  • Container Metrics: Docker/Kubernetes metrics
  • Network Metrics: Bandwidth, connection counts
  • Storage Metrics: Disk usage, IOPS
  • Security Metrics: Failed authentication attempts

Logging Strategy

# Centralized logging
Application Logs → Fluentd → Elasticsearch → Kibana
Container Logs → Docker Logging Driver → CloudWatch
System Logs → rsyslog → Splunk

# Log Rotation
Max size: 100MB per file
Retention: 30 days
Compression: gzip
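For services that write their own log files with Python's `logging` module, the rotation policy above can be approximated as below. This uses the documented `namer`/`rotator` hooks of `logging.handlers.RotatingFileHandler` to gzip each rotated file. Size-based rotation does not enforce an age limit, so the 30-day retention is a separate pruning step; `backup_count=100` is an assumed upper cap, and the helper names are hypothetical.

```python
import gzip
import logging
import logging.handlers
import os
import shutil
import time
from pathlib import Path
from typing import List, Optional

MAX_BYTES = 100 * 1024 * 1024  # "Max size: 100MB per file"
RETENTION_DAYS = 30            # "Retention: 30 days"


def gzip_namer(default_name: str) -> str:
    """Name rotated files app.log.1.gz, app.log.2.gz, ..."""
    return default_name + ".gz"


def gzip_rotator(source: str, dest: str) -> None:
    """Compress the file being rotated out, then delete the uncompressed copy."""
    with open(source, "rb") as f_in, gzip.open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    os.remove(source)


def make_rotating_handler(path: str, max_bytes: int = MAX_BYTES,
                          backup_count: int = 100) -> logging.handlers.RotatingFileHandler:
    """Size-based rotation whose rotated files are gzip-compressed."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count)
    handler.namer = gzip_namer
    handler.rotator = gzip_rotator
    return handler


def prune_old_logs(log_dir: str, days: int = RETENTION_DAYS,
                   now: Optional[float] = None) -> List[str]:
    """Delete compressed logs whose modification time is older than `days`."""
    cutoff = (time.time() if now is None else now) - days * 86400
    removed = []
    for p in Path(log_dir).glob("*.gz"):
        if p.stat().st_mtime < cutoff:
            p.unlink()
            removed.append(p.name)
    return sorted(removed)
```

Container logs shipped through a Docker logging driver or Fluentd are rotated by those tools instead; this sketch only covers files written directly by a Python process.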

🚀 Performance Optimization

Container Optimization

  • Image Size: Use multi-stage builds to minimize size
  • Layer Caching: Optimize Docker layer caching
  • Resource Limits: Set appropriate CPU/memory limits
  • Health Checks: Configure proper health check intervals

Application Optimization

  • Extension Loading: Lazy load WAIIDE extensions
  • Caching: Cache frequently accessed data
  • Compression: Enable gzip compression
  • CDN: Use CDN for static assets

Database Optimization

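  • Connection Pooling: Reuse database connections through a pool instead of opening one per request
  • Query Optimization: Profile slow queries and rewrite the worst offenders
  • Indexing: Index the columns used in frequent lookups and joins
  • Partitioning: Partition large tables (for example, by date)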

📋 Deployment Checklist

Pre-Deployment

  • Infrastructure provisioned
  • DNS configured
  • SSL certificates obtained
  • Monitoring setup configured
  • Backup strategy implemented

Deployment

  • Images built and pushed to registry
  • Configuration files updated
  • Database migrations run
  • Services deployed in correct order
  • Health checks passing

Post-Deployment

  • End-to-end testing completed
  • User acceptance testing passed
  • Monitoring alerts configured
  • Documentation updated
  • Team trained on new deployment

🆘 Disaster Recovery

Backup Strategy

# Daily backups
Database: Full backup daily, transaction log every 15 minutes
User Data: Incremental backup daily, full backup weekly
Configuration: Version controlled in Git

# Backup Retention
Daily: 30 days
Weekly: 12 weeks  
Monthly: 12 months
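The retention tiers above follow a grandfather-father-son scheme. A minimal sketch of deciding which backups to keep, assuming one backup date per entry, calendar (ISO) weeks and calendar months, and that the newest backup in a week or month is the one retained for that tier; `backups_to_keep` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple

DAILY_DAYS = 30      # "Daily: 30 days"
WEEKLY_WEEKS = 12    # "Weekly: 12 weeks"
MONTHLY_MONTHS = 12  # "Monthly: 12 months"


def backups_to_keep(backups: Iterable[date], today: date) -> Set[date]:
    """Return the backup dates that survive all three retention tiers."""
    dates = sorted(set(backups))  # ascending, so later dates overwrite earlier
    keep: Set[date] = set()

    # Daily tier: every backup from the last 30 days.
    daily_cutoff = today - timedelta(days=DAILY_DAYS)
    keep.update(d for d in dates if d > daily_cutoff)

    # Weekly tier: newest backup in each ISO week of the last 12 weeks.
    weekly_cutoff = today - timedelta(weeks=WEEKLY_WEEKS)
    newest_per_week: Dict[Tuple[int, int], date] = {}
    for d in dates:
        if d > weekly_cutoff:
            newest_per_week[tuple(d.isocalendar()[:2])] = d
    keep.update(newest_per_week.values())

    # Monthly tier: newest backup in each of the last 12 calendar months.
    this_month = today.year * 12 + today.month - 1
    newest_per_month: Dict[Tuple[int, int], date] = {}
    for d in dates:
        if this_month - (d.year * 12 + d.month - 1) < MONTHLY_MONTHS:
            newest_per_month[(d.year, d.month)] = d
    keep.update(newest_per_month.values())
    return keep
```

Anything not in the returned set is a candidate for deletion; running the check after each nightly backup keeps storage bounded.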

Recovery Procedures

  1. Service Outage: Switch to backup region
  2. Data Corruption: Restore from latest backup
  3. Security Breach: Isolate, patch, restore
  4. Infrastructure Failure: Failover to secondary zone

📞 Support and Maintenance

Regular Maintenance

  • Security Updates: Monthly security patching
  • Dependency Updates: Quarterly dependency updates
  • Performance Review: Monthly performance analysis
  • Capacity Planning: Quarterly capacity review

Support Escalation

  1. Level 1: Basic configuration and user issues
  2. Level 2: Advanced troubleshooting and debugging
  3. Level 3: Core platform issues and security incidents
  4. Level 4: Vendor support and critical infrastructure

🎯 Next Steps

Choose your deployment path from the Quick Deployment Options above: Docker, AWS ECS, or Kubernetes.

📚 Related Documentation