Generic JupyterHub Integration Blueprint

Calliope Integration: This component is integrated into the Calliope AI platform. Some features and configurations may differ from the upstream project.

Overview

This blueprint shows how to transform any standalone containerized service into a JupyterHub-compatible service, based on the pattern implemented for WAIIDE (WAIIDE Server → JupyterHub integration).

Core Pattern: Dual-Mode Architecture

Transform a single-purpose service container into a dual-mode container:

  • Standalone Mode: Original service + API proxy
  • JupyterHub Mode: jupyterhub-singleuser + jupyter-server-proxy + original service

Implementation Recipe

Step 1: Environment Detection & Mode Selection

Create an entrypoint script that detects JupyterHub environment:

#!/bin/bash
# entrypoint-jupyterhub.sh

# Detect JupyterHub environment
if [ -n "$JUPYTERHUB_SERVICE_PREFIX" ] || [ -n "$JUPYTERHUB_USER" ] || [ -n "$JUPYTERHUB_API_TOKEN" ]; then
    MODE="jupyterhub"
else
    MODE="standalone"
fi

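# Route to appropriate startup mode
# (start_jupyterhub_mode and start_standalone_mode must be defined earlier in
# this script; bash looks up function names when the call runs)
if [ "$MODE" = "jupyterhub" ]; then
    start_jupyterhub_mode
else
    start_standalone_mode
fi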

Key Environment Variables to Check:

  • JUPYTERHUB_SERVICE_PREFIX - URL prefix (e.g., /user/alice/myservice/)
  • JUPYTERHUB_USER - Username
  • JUPYTERHUB_API_TOKEN - API token the single-user server uses to authenticate to the Hub
  • JUPYTERHUB_SERVER_NAME - Named server identifier
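
The same detection rule can be mirrored in Python, for example when the API server needs to report its mode. This is a minimal sketch: the names `detect_mode` and `HUB_MARKERS` are illustrative and are not taken from the WAIIDE scripts.

```python
import os

# Variables whose presence the entrypoint treats as evidence of a JupyterHub spawn.
HUB_MARKERS = (
    "JUPYTERHUB_SERVICE_PREFIX",
    "JUPYTERHUB_USER",
    "JUPYTERHUB_API_TOKEN",
)


def detect_mode(env=None):
    """Return "jupyterhub" if any marker variable is set and non-empty, else "standalone"."""
    env = os.environ if env is None else env
    return "jupyterhub" if any(env.get(name) for name in HUB_MARKERS) else "standalone"
```

Passing the environment as an argument keeps the rule easy to unit-test without touching `os.environ`.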

Step 2: Permission Handling

Handle Docker user permissions properly:

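# Start as root, fix permissions, then drop to target user
# USER_HOME must be set before this block (for example, /home/$JUPYTERHUB_USER)
if [ "$(id -u)" = "0" ]; then
    echo "🔧 Running as root - fixing permissions..."

    # Create user directories
    mkdir -p "$USER_HOME/workspace"
    mkdir -p "$USER_HOME/.local/share/jupyter/runtime"

    # Fix ownership (UID 1000, GID 100 - standard for Jupyter containers)
    chown -R 1000:100 "$USER_HOME"

    # Drop to non-root user. printf %q re-quotes the script path and its
    # arguments so values containing spaces survive inside the su -c string.
    TARGET_USER="$(getent passwd 1000 | cut -d: -f1)"
    exec su -s /bin/bash -c "exec $(printf '%q ' "$0" "$@")" "$TARGET_USER"
fi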
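
If part of the startup logic lives in Python rather than bash, the directory preparation could look roughly like the sketch below. The function name `prepare_user_home` is hypothetical; the directory layout and the 1000:100 defaults are the ones used in the shell snippet, and ownership is only changed when the process actually runs as root.

```python
import os
from pathlib import Path


def prepare_user_home(user_home, uid=1000, gid=100):
    """Create the Jupyter directories under user_home and return their paths.

    Ownership is changed only when running as root, mirroring `chown -R uid:gid`.
    """
    home = Path(user_home)
    wanted = [
        home / "workspace",
        home / ".local" / "share" / "jupyter" / "runtime",
    ]
    for directory in wanted:
        directory.mkdir(parents=True, exist_ok=True)

    if os.geteuid() == 0:
        os.chown(home, uid, gid)
        for root, dirs, files in os.walk(home):
            for name in dirs + files:
                # Do not follow symlinks, so links pointing outside the home stay untouched.
                os.chown(os.path.join(root, name), uid, gid, follow_symlinks=False)
    return wanted
```

Dropping privileges afterwards is still best left to the entrypoint, since it has to re-execute the whole process as the target user.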

Step 3: Dual-Mode Service Architecture

JupyterHub Mode (Port Strategy)

┌─────────────────────────────────────────────────────────────┐
│                    Container (Port 8080)                    │
├─────────────────────────────────────────────────────────────┤
│  jupyterhub-singleuser (0.0.0.0:8080)                      │
│           ↓                                                 │
│  jupyter-server-proxy                                       │
│           ↓                                                 │
│  Original Service (127.0.0.1:8081)                         │
│                                                             │
│  URL: /user/{username}/proxy/8081/ → localhost:8081        │
└─────────────────────────────────────────────────────────────┘

Standalone Mode (Port Strategy)

┌─────────────────────────────────────────────────────────────┐
│                    Container (Port 8080)                    │
├─────────────────────────────────────────────────────────────┤
│  API Server (0.0.0.0:8080)                                 │
│           ↓                                                 │
│  Proxy to Original Service (127.0.0.1:8081)                │
│                                                             │
│  URL: /api → API endpoints                                  │
│  URL: /* → Original Service (proxied)                      │
└─────────────────────────────────────────────────────────────┘
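
In JupyterHub mode the browser-facing path is the user's service prefix followed by `proxy/<port>/`. The sketch below composes that path; `proxied_url` is an illustrative helper, and it assumes the Hub is served at the base URL `/` and that the username needs no URL escaping.

```python
def proxied_url(username, port, server_name=""):
    """Build the path that jupyter-server-proxy maps to localhost:<port>."""
    prefix = f"/user/{username}/"
    if server_name:
        # Named servers add one more path segment to the service prefix.
        prefix += f"{server_name}/"
    return f"{prefix}proxy/{port}/"
```

For the default server of user `alice` this yields `/user/alice/proxy/8081/`, which matches the URL line in the JupyterHub-mode diagram.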

Step 4: Create API Server with URL Rewriting

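#!/usr/bin/env python3
"""
Generic API server that provides JupyterHub-compatible API endpoints
and proxies requests to the original service with URL path rewriting.
"""

import os
import json
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.error import HTTPError
from urllib.request import urlopen, Request

class ServiceAPIHandler(BaseHTTPRequestHandler):
    def __init__(self, *args, **kwargs):
        # Attributes must be set before super().__init__(), which handles the request
        self.service_host = '127.0.0.1'
        self.service_port = 8081  # Original service internal port

        # Get JupyterHub service prefix for URL rewriting
        self.service_prefix = os.environ.get('JUPYTERHUB_SERVICE_PREFIX', '')
        if self.service_prefix and not self.service_prefix.endswith('/'):
            self.service_prefix += '/'
        super().__init__(*args, **kwargs)

    def strip_prefix(self, path):
        """Strip JupyterHub service prefix from path"""
        if self.service_prefix and path.startswith(self.service_prefix):
            # Keep the prefix's trailing slash so the result stays absolute
            return path[len(self.service_prefix) - 1:]
        return path

    def rewrite_content(self, content, content_type):
        """Rewrite URLs in content to include JupyterHub prefix"""
        # Implement URL rewriting for HTML/CSS/JS content
        # Pattern: Replace absolute paths with prefixed paths
        # Placeholder: content passes through unchanged until this is implemented
        return content

    def send_body(self, status, content_type, body):
        """Send a complete response with the given status, type, and bytes body"""
        self.send_response(status)
        self.send_header('Content-Type', content_type)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def send_api_response(self, payload, status=200):
        """Serialize payload as JSON and send it to the client"""
        self.send_body(status, 'application/json', json.dumps(payload).encode('utf-8'))

    def handle_api_endpoints(self, path):
        """Handle JupyterHub-compatible API endpoints"""
        route = path.split('?', 1)[0]  # ignore any query string
        if route == '/api' or route == '/api/':
            self.send_api_response({
                "status": "running",
                "user": os.environ.get('JUPYTERHUB_USER', 'unknown'),
                "server": "your-service-name",
                "version": "1.0.0",
                "mode": "jupyterhub" if self.service_prefix else "standalone",
                "service_prefix": self.service_prefix,
                "endpoints": {
                    "api": f"{self.service_prefix}api",
                    "service": f"{self.service_prefix}"
                }
            })
            return True
        return False

    def proxy_to_service(self, stripped_path):
        """Proxy request to original service with URL rewriting"""
        # Minimal GET-only sketch; WAIIDE's proxy_to_vscode method is the fuller reference
        url = f'http://{self.service_host}:{self.service_port}{stripped_path}'
        try:
            with urlopen(Request(url), timeout=30) as upstream:
                status = upstream.status
                content_type = upstream.headers.get('Content-Type', 'application/octet-stream')
                body = upstream.read()
        except HTTPError as err:
            # Relay upstream 4xx/5xx responses instead of masking them
            status = err.code
            content_type = err.headers.get('Content-Type', 'text/plain')
            body = err.read()
        except OSError:
            # Connection refused, timeout, or other network failure
            self.send_api_response({"error": "original service unreachable"}, status=502)
            return
        self.send_body(status, content_type, self.rewrite_content(body, content_type))

    def do_GET(self):
        stripped_path = self.strip_prefix(self.path)

        # Handle API endpoints
        if self.handle_api_endpoints(stripped_path):
            return

        # Proxy to original service
        self.proxy_to_service(stripped_path)

if __name__ == '__main__':
    # Listen on the container's public port (8080 in the diagrams above)
    HTTPServer(('0.0.0.0', 8080), ServiceAPIHandler).serve_forever()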
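
The `rewrite_content` method is left as a placeholder above. One possible approach for HTML is sketched below, under the assumption that only `href`, `src`, and `action` attributes carry absolute paths; the function name `rewrite_html` is illustrative, and CSS `url(...)` references or paths built in JavaScript would need separate handling.

```python
import re

# Quoted href/src/action values that start with a single "/" (not "//").
_ABSOLUTE_ATTR = re.compile(r'\b(href|src|action)=(["\'])(/(?!/)[^"\']*)\2')


def rewrite_html(content, prefix):
    """Prepend the JupyterHub service prefix to absolute paths in HTML attributes."""
    if not prefix:
        return content  # standalone mode: nothing to rewrite
    base = prefix.rstrip("/")

    def _add_prefix(match):
        attr, quote, path = match.groups()
        if path == base or path.startswith(base + "/"):
            return match.group(0)  # already prefixed, leave untouched
        return f"{attr}={quote}{base}{path}{quote}"

    return _ABSOLUTE_ATTR.sub(_add_prefix, content)
```

Skipping already-prefixed paths makes the rewrite idempotent, which matters if a response passes through more than one rewriting layer.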

Step 5: Jupyter Server Configuration

Create jupyter_server_config.py:

"""
Jupyter Server configuration for JupyterHub integration
"""
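from jupyter_server.auth.identity import IdentityProvider

c = get_config()  # noqa: F821 (provided by Jupyter when the config file is loaded)

# Configure jupyter-server-proxy
c.ServerProxy.servers = {
    'your-service': {
        # Empty command: the entrypoint starts the service, so jupyter-server-proxy
        # only proxies to the port (check the behaviour of your installed version)
        'command': [],
        'port': 8081,
        'timeout': 60,
        'absolute_url': False,
        # 'rewrite_response' takes a callable (or a list of callables), not a
        # boolean; add one here if response bodies need rewriting
    }
}

# Permissive authentication for JupyterHub
# (jupyterhub-singleuser normally sets up Hub authentication itself; keep this
# override only if your deployment needs it)
c.ServerApp.identity_provider_class = IdentityProvider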

Step 6: Docker Configuration

Update your Dockerfile:

# Install JupyterHub dependencies
RUN pip3 install --break-system-packages --no-cache-dir \
    jupyter-server-proxy[standalone] \
    jupyterhub

# Set standard Jupyter container UID/GID (1000:100)
RUN groupadd -g 100 users 2>/dev/null || true && \
    useradd -m -u 1000 -g 100 -s /bin/bash myuser && \
    usermod -aG sudo myuser

# Copy integration scripts
COPY --chmod=755 scripts/entrypoint-jupyterhub.sh /usr/local/bin/
COPY --chmod=755 scripts/api_server.py /usr/local/bin/
COPY --chmod=755 scripts/jupyter_server_config.py /usr/local/bin/

# Expose ports
EXPOSE 8080 8081

# Use bash entrypoint for flexibility
ENTRYPOINT ["/bin/bash", "-c"]
CMD ["exec /usr/local/bin/entrypoint-jupyterhub.sh"]

Step 7: JupyterHub Spawner Configuration

Configure your JupyterHub spawner:

# jupyterhub_config.py
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.image = 'your-org/your-service:latest'
c.DockerSpawner.network_name = 'jupyterhub-network'
c.DockerSpawner.volumes = {
    'jupyterhub-user-{username}': '/home/{username}'
}
c.DockerSpawner.extra_create_kwargs = {'user': 'root'}  # For permission fixing
c.DockerSpawner.cmd = ''  # Use container's entrypoint
c.Spawner.default_url = '/proxy/8081/'  # Direct to your service

Testing Strategy

Unit Tests

  • Test environment detection logic
  • Test API endpoint responses
  • Test URL rewriting functions
  • Test proxy functionality

Integration Tests

  • Test standalone mode startup
  • Test JupyterHub mode startup
  • Test API compatibility with JupyterHub
  • Test URL path handling
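
An integration test for API compatibility can run the handler on an ephemeral loopback port and query it over HTTP. The sketch below shows the harness only: `StubAPIHandler` is a hypothetical stand-in for the real handler class, and `fetch` is an illustrative helper name.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubAPIHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real API handler; answers only /api."""

    def do_GET(self):
        found = self.path.rstrip("/") == "/api"
        body = json.dumps({"status": "running" if found else "not found"}).encode("utf-8")
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def fetch(handler_cls, path):
    """Serve handler_cls on a free loopback port, GET path, return (status, parsed JSON)."""
    server = HTTPServer(("127.0.0.1", 0), handler_cls)  # port 0 = let the OS choose
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        payload = json.loads(response.read())
        conn.close()
        return response.status, payload
    finally:
        server.shutdown()
        server.server_close()
        thread.join(timeout=5)
```

Swapping `StubAPIHandler` for the real handler class lets the same harness cover both modes by setting or clearing `JUPYTERHUB_SERVICE_PREFIX` before the server starts.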

End-to-End Tests

  • Test spawning from JupyterHub
  • Test service accessibility
  • Test OAuth flows
  • Test WebSocket connections (if applicable)

Common OAuth Fixes

Named Server OAuth Issues

# oauth_named_server_fix.py
"""Fix OAuth redirect URLs for named servers"""
import re

# Matches /user/<username>/hub/ and /user/<username>/<servername>/hub/
_PREFIXED_HUB = re.compile(r'/user/[^/]+/(?:[^/]+/)?hub/')

def fix_oauth_redirect_url(url):
    # Remove service prefix from hub OAuth URLs
    if '/user/' in url and '/hub/api/oauth2' in url:
        return _PREFIXED_HUB.sub('/hub/', url, count=1)
    return url

Scope Fixes

# jupyter_scope_fix.py
"""Fix OAuth scopes for named servers"""
def patch_oauth_scopes():
    # Add proper scopes for named server access
    pass
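
The scope patch itself is left unimplemented above. One building block it would likely need is the scope string for a specific server. The sketch below assumes JupyterHub's RBAC filter syntax `!server=<user>/<servername>` (with an empty server name for the default server); both function names are illustrative, and the exact scopes should be confirmed against the Hub version in use.

```python
def server_access_scope(username, server_name=""):
    """Return the scope string that grants access to one specific server."""
    # Assumed filter syntax: "!server=<user>/<servername>"; default server has an empty name.
    return f"access:servers!server={username}/{server_name}"


def has_server_access(granted_scopes, username, server_name=""):
    """Check whether a token's scope list covers the given server."""
    accepted = {
        "access:servers",                            # unfiltered: every server
        f"access:servers!user={username}",           # every server of this user
        server_access_scope(username, server_name),  # exactly this server
    }
    return any(scope in accepted for scope in granted_scopes)
```

A check like this can help diagnose named-server failures where the token carries a scope for the default server only.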

Key Implementation Files

Based on the WAIIDE implementation, you’ll need:

  1. Core Files (~1000 lines):

    • entrypoint-jupyterhub.sh - Main orchestration script
    • api_server.py - API server with proxy functionality
    • jupyter_server_config.py - Jupyter server configuration

  2. OAuth Fixes (~200 lines):

    • oauth_named_server_fix.py - Fix OAuth redirects
    • jupyter_scope_fix.py - Fix OAuth scopes

  3. Testing (~1000 lines):

    • test_api.py - API endpoint tests
    • test_entrypoint.py - Startup logic tests
    • test_url_rewriting.py - URL rewriting tests
    • run_tests.py - Test runner

  4. Documentation (~3000 lines):

    • Configuration guides
    • Troubleshooting guides
    • Architecture documentation

Service-Specific Adaptations

For Web Services

  • Focus on URL rewriting for HTML/CSS/JS content
  • Handle WebSocket upgrades if needed
  • Implement proper CORS headers
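
Before a proxy can handle WebSocket upgrades it has to recognise them. A minimal sketch of that check follows, based on the `Upgrade` and `Connection` request headers defined by RFC 6455; the function name `is_websocket_upgrade` is illustrative.

```python
def is_websocket_upgrade(headers):
    """Return True if the request headers ask for a WebSocket upgrade."""
    # Header names are case-insensitive, so normalise them first.
    normalized = {str(key).lower(): str(value) for key, value in headers.items()}
    upgrade = normalized.get("upgrade", "").strip().lower()
    # "Connection" may list several tokens, e.g. "keep-alive, Upgrade".
    connection = [token.strip().lower() for token in normalized.get("connection", "").split(",")]
    return upgrade == "websocket" and "upgrade" in connection
```

A plain `http.server` handler cannot relay the upgraded connection itself, so a request that passes this check is usually better routed through jupyter-server-proxy or a WebSocket-capable server.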

For API Services

  • Ensure API endpoints don’t conflict with JupyterHub paths
  • Handle authentication properly
  • Consider API versioning

For Desktop Applications (via web interface)

  • May need VNC/X11 forwarding
  • Consider noVNC integration
  • Handle clipboard/file transfer

Success Metrics

  • ✅ Service starts in both modes
  • ✅ API endpoints respond correctly
  • ✅ JupyterHub can health-check the service
  • ✅ URL rewriting works correctly
  • ✅ OAuth authentication works
  • ✅ Service is accessible through JupyterHub
  • ✅ WebSocket connections work (if applicable)

Troubleshooting Checklist

  1. Environment Detection: Check if JupyterHub variables are detected
  2. Permissions: Verify container can create user directories
  3. Port Configuration: Ensure no port conflicts
  4. URL Rewriting: Test with/without service prefix
  5. OAuth: Check for named server OAuth issues
  6. Proxy: Verify requests reach the original service

Advanced Features

Service Discovery

Implement /api/services endpoint for service discovery:

{
  "services": {
    "your-service": {
      "port": 8081,
      "status": "running",
      "description": "Your Service Description"
    }
  }
}

Health Monitoring

Add health check endpoints:

import socket

def check_service_health(self):
    """Check if original service is responding"""
    try:
        with socket.create_connection((self.service_host, self.service_port), timeout=2):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False
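
The port check can also feed the `/api/services` payload shown under Service Discovery. The sketch below is one way to combine them; the helper names and the `"stopped"` status value are assumptions, since the blueprint only shows the `"running"` case.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def services_payload(services, host="127.0.0.1"):
    """Build the discovery document from a {name: (port, description)} mapping."""
    return {
        "services": {
            name: {
                "port": port,
                "status": "running" if port_is_open(host, port) else "stopped",
                "description": description,
            }
            for name, (port, description) in services.items()
        }
    }
```

A TCP connect only proves that something is listening; an HTTP request to a known path is a stronger check when the service offers one.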

Custom URL Patterns

Support custom URL patterns beyond the standard /proxy/ pattern:

# Handle custom paths like /user/{username}/myservice/
# (requires a 'myservice' entry in c.ServerProxy.servers so the path is routed)
c.DockerSpawner.default_url = '/myservice/'

Performance Considerations

  • Memory: Add ~500MB for JupyterHub components
  • CPU: Minimal overhead for proxy operations
  • Network: <5ms latency for proxy requests
  • Startup: Add 10-15 seconds for dual-mode initialization

Security Notes

  • Always start as root and drop privileges
  • Use standard Jupyter container UID/GID (1000:100)
  • Validate all URL rewrites to prevent injection
  • Implement proper CORS headers for API endpoints
  • Use secure WebSocket connections when possible
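
Validating rewritten URLs can start with a conservative allow-rule for paths. The sketch below accepts only absolute, same-origin paths without traversal segments; `is_safe_proxy_path` is an illustrative name, and a real deployment may need stricter or service-specific rules.

```python
from urllib.parse import unquote, urlsplit


def is_safe_proxy_path(path):
    """Accept only absolute, same-origin paths with no traversal segments."""
    decoded = unquote(path)  # catch encoded tricks such as %2e%2e
    if any(ch in decoded for ch in "\r\n\\"):
        return False  # header-injection characters or backslash path tricks
    parts = urlsplit(decoded)
    if parts.scheme or parts.netloc:
        return False  # full URL or protocol-relative "//host/..."
    if not decoded.startswith("/") or decoded.startswith("//"):
        return False
    return ".." not in parts.path.split("/")
```

Rejecting anything that is not a plain absolute path keeps the rewrite step from ever emitting a URL that points at another origin.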

This blueprint provides a pattern for transforming a containerized service into a JupyterHub-compatible service. A full implementation typically requires ~2000 lines of code across 10-15 files, and in return provides dual-mode operation with full JupyterHub integration.