Troubleshooting ECS Boot Errors

Calliope Integration: This component is integrated into the Calliope AI platform. Some features and configurations may differ from the upstream project.

Error: Cannot find module '/home/calliope/WAIIDE-server/out/server-main.js'

This error indicates the WAIIDE Server files are missing from the Docker image.

Possible Causes

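  1. Multi-stage build failure: The build stage that compiles WAIIDE didn’t complete, so its output was never copied into the final image
  2. Architecture mismatch: The image was built for a different architecture than the host runs (arm64 vs amd64)
  3. Incomplete image push: The Docker image was corrupted or only partially uploaded during the push to the registry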

Solutions

1. Verify Image Architecture

# Check the image architecture
docker inspect calliopeai/waiide:latest | grep Architecture

# Build for the architecture your ECS tasks run on (amd64 in this guide):
docker buildx build --platform linux/amd64 -t calliopeai/waiide:latest .
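
The architecture check can also be scripted so that a pipeline stops before pushing a mismatched image. The following is a minimal sketch, not part of WAIIDE: `require_arch` is a hypothetical helper name, and the usage lines assume the image is available locally and that amd64 is the target.

```shell
#!/bin/sh
# Hypothetical helper: compare a reported architecture with the expected one.
require_arch() {
  expected="$1"
  actual="$2"
  if [ "$actual" = "$expected" ]; then
    echo "OK: image architecture is $actual"
  else
    echo "MISMATCH: image is $actual, expected $expected"
    return 1
  fi
}

# Usage (requires Docker and a locally available image):
#   arch="$(docker image inspect --format '{{.Architecture}}' calliopeai/waiide:latest)"
#   require_arch amd64 "$arch" || exit 1
```

Keeping the comparison separate from the `docker image inspect` call makes the helper easy to reuse with a different image name or target architecture.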

2. Rebuild and Verify Locally

# Build the image locally
docker build -t waiide-test .

# Test the image
docker run -it --rm waiide-test ls -la /home/calliope/WAIIDE-server/

# Should see:
# - server-main.js or out/server-main.js
# - extensions/
# - product.json

3. Check Build Logs

During build, verify these steps complete successfully:

=== Running compile ===
=== Verifying WAIIDE Server build output ===
✓ server-main.js found at root
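
If the build output is saved to a file, the markers above can be checked automatically. This sketch assumes the log was captured to a file named `build.log` (an illustrative name) and that the marker text matches the lines listed above; `check_build_log` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: confirm that a saved build log contains each expected marker.
check_build_log() {
  log_file="$1"
  missing=0
  for marker in \
    "=== Running compile ===" \
    "=== Verifying WAIIDE Server build output ===" \
    "server-main.js found"
  do
    # -F treats the marker as a fixed string; -q suppresses grep's own output
    if grep -Fq -- "$marker" "$log_file"; then
      echo "found:   $marker"
    else
      echo "missing: $marker"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (assumes BuildKit, where --progress=plain prints the output of RUN steps):
#   docker build --progress=plain -t waiide-test . 2>&1 | tee build.log
#   check_build_log build.log
```

A non-zero exit status means at least one marker was absent, which points to the compile or verification step as the place to look in the full log.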

Error: cp: cannot stat ‘/home/calliope/scripts/jupyter_server_config.py’

This indicates the scripts directory wasn’t properly copied to the image.

Solutions

1. Verify Scripts in Image

docker run -it --rm calliopeai/waiide:latest ls -la /home/calliope/scripts/

Should contain:

  • jupyter_server_config.py
  • entrypoint-jupyterhub.sh
  • api_server.py
  • Other .py and .sh files
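
The same file list can be checked in the local build context before building, which helps distinguish a file missing from the source tree from a failed copy step. This is a sketch under the assumption that the scripts live in a `scripts` directory at the root of the build context; `check_scripts` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: report which of the expected scripts are absent from a directory.
check_scripts() {
  dir="$1"
  rc=0
  for name in jupyter_server_config.py entrypoint-jupyterhub.sh api_server.py; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $dir/$name"
      rc=1
    fi
  done
  return "$rc"
}

# Usage from the root of the build context (directory name is an assumption):
#   check_scripts ./scripts || echo "fix the build context before running docker build"
```

If every file is present locally but absent from the image, the likely culprits are the copy step in the Dockerfile or an ignore rule that excludes the directory from the build context.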

2. Emergency Fix (if scripts are missing)

Create a derived image:

FROM calliopeai/waiide:latest

# Copy missing scripts
COPY scripts /home/calliope/scripts
RUN chmod +x /home/calliope/scripts/*.sh && \
    chown -R calliope:calliope /home/calliope/scripts

Port Configuration Mismatch

Your logs show:

  • JUPYTERHUB_SERVICE_URL=http://0.0.0.0:8080/... (port 8080)
  • JUPYTERHUB_PORT=8070 (port 8070)
  • Script says “Starting ALL components on port 8080”

Solution

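The container correctly honors the PORT environment variable, so the mismatch comes from the task definition: the port in JUPYTERHUB_SERVICE_URL must match the port the container listens on. Your ECS task definition should do one of the following: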

  1. Use the default port 8070 and remove any PORT override:
{
  "environment": [
    {"name": "JUPYTERHUB_SERVICE_URL", "value": "http://0.0.0.0:8070/user/lmata/waiide/"}
  ]
}
  2. Or explicitly set PORT to 8080:
{
  "environment": [
    {"name": "PORT", "value": "8080"},
    {"name": "JUPYTERHUB_SERVICE_URL", "value": "http://0.0.0.0:8080/user/lmata/waiide/"}
  ]
}
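
Whichever option is chosen, the port in the service URL and the listening port must agree. The check can be sketched in plain shell as follows; `url_port` and `check_ports` are hypothetical helpers, and the fallback to 8070 when PORT is unset reflects the default described above rather than anything read from the container.

```shell
#!/bin/sh
# Hypothetical helper: extract the port from a URL such as http://host:8080/path.
url_port() {
  # Strip the scheme, then the path, leaving host[:port].
  hostport="${1#*://}"
  hostport="${hostport%%/*}"
  case "$hostport" in
    *:*) echo "${hostport##*:}" ;;
    *)   echo "" ;;
  esac
}

# Compare the URL port with the listening port (8070 when the second argument is empty).
check_ports() {
  url="$1"
  listen_port="${2:-8070}"
  found="$(url_port "$url")"
  if [ "$found" = "$listen_port" ]; then
    echo "OK: both use port $listen_port"
  else
    echo "MISMATCH: URL uses port ${found:-none}, container listens on $listen_port"
    return 1
  fi
}

# Usage inside the container:
#   check_ports "$JUPYTERHUB_SERVICE_URL" "$PORT"
```

Running this at the start of a debugging session shows immediately whether the task definition is still inconsistent.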

Complete Build Verification

Build Command for ECS (amd64)

# Build for amd64 architecture
docker buildx create --use
docker buildx build \
  --platform linux/amd64 \
  --push \
  -t your-registry/waiide:latest \
  .

Test Before Deploying to ECS

# Run with the same environment variables as the ECS task
docker run -it --rm \
  -e JUPYTERHUB_USER=testuser \
  -e JUPYTERHUB_SERVICE_PREFIX=/user/testuser/waiide/ \
  -e JUPYTERHUB_SERVICE_URL=http://0.0.0.0:8080/user/testuser/waiide/ \
  -e PORT=8080 \
  -p 8080:8080 \
  calliopeai/waiide:latest

# In another terminal, check:
docker exec <container> ls -la /home/calliope/WAIIDE-server/
docker exec <container> ls -la /home/calliope/scripts/
docker exec <container> ps aux | grep node

Quick Debug Commands

# Check what's actually in the running container
docker exec <container> find /home -name "server-main.js" 2>/dev/null
docker exec <container> find /home -name "jupyter_server_config.py" 2>/dev/null
docker exec <container> env | grep -E "(PORT|JUPYTER)"

Root Cause Summary

The errors suggest the Docker image is incomplete or corrupted. This typically happens when:

  1. The multi-stage build fails silently
  2. The image is built for the wrong architecture
  3. A stale build cache reuses layers from an earlier, incomplete build
  4. The image push to the registry was interrupted

Recommended Action

  1. Rebuild with no cache: docker build --no-cache --platform linux/amd64 -t calliopeai/waiide:latest .
  2. Test locally before pushing to registry
  3. Verify image contents after push
  4. Check ECS task logs for startup sequence