Troubleshooting: Successful Build but ECS Runtime Failures
Calliope Integration: This component is integrated into the Calliope AI platform. Some features and configurations may differ from the upstream project.
Symptoms
- Docker build completes successfully (30+ minutes, no errors)
- ECS container fails with:
Error: Cannot find module '/home/calliope/WAIIDE-server/out/server-main.js'
cp: cannot stat '/home/calliope/scripts/jupyter_server_config.py'
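Both messages name an absolute path that is missing at runtime. When the task logs are long, it can help to pull out just those paths. The sketch below does that with grep; the function name `missing_paths` is hypothetical, and it assumes the log quotes each path in single quotes, as in the two messages above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log text on stdin and print each single-quoted
# absolute path found on a "Cannot find module" or "cannot stat" line.
missing_paths() {
  grep -E "Cannot find module|cannot stat" \
    | grep -oE "'/[^']+'" \
    | tr -d "'" \
    | sort -u
}

# Usage: pipe the container log through the helper.
printf '%s\n' \
  "Error: Cannot find module '/home/calliope/WAIIDE-server/out/server-main.js'" \
  "cp: cannot stat '/home/calliope/scripts/jupyter_server_config.py'" \
  | missing_paths
```

The resulting list is what the later checks need to find inside the image.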
Common Causes When Build Succeeds
1. Wrong Image Tag/Version in ECS
The most common issue: ECS is pulling a different image than the one you built.
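Tags are mutable, so the reliable comparison is by digest: the `RepoDigests` entry of the image you built against the `imageDigest` that ECS reports for the task. A minimal sketch of that comparison follows; the helper names `digest_of` and `same_digest` are hypothetical, and the digest values are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing image digests.

# Strip everything up to and including "@", leaving "sha256:<hex>".
# A bare digest passes through unchanged.
digest_of() {
  printf '%s\n' "${1##*@}"
}

# Succeed only when both arguments resolve to the same digest.
same_digest() {
  [ "$(digest_of "$1")" = "$(digest_of "$2")" ]
}

# Example with placeholder digests (not real image digests):
built="your-registry/waiide@sha256:1111"   # shape of a RepoDigests entry
running="sha256:2222"                      # shape of an ECS imageDigest
if same_digest "$built" "$running"; then
  echo "ECS is running the image you built"
else
  echo "digest mismatch: ECS is running a different image"
fi
```

The commands that follow produce the two values to compare.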
# Check what image ECS is actually using
aws ecs describe-task-definition --task-definition your-task-name | grep image
# Verify the image digest
docker inspect your-registry/waiide:latest | grep -A 5 "RepoDigests"
# Compare with what ECS pulled
aws ecs describe-tasks --cluster your-cluster --tasks your-task-id | grep imageDigest
2. Registry Push/Pull Issues
Incomplete Push
# Verify the pushed image size (should be ~3-4GB)
aws ecr describe-images --repository-name waiide --image-ids imageTag=latest
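ECR reports the size as `imageSizeInBytes`, which is the compressed size in the registry and can be smaller than what `docker images` shows locally. A rough plausibility check is sketched below; the helper name `size_in_range` and the 1 to 4 GB bounds are assumptions to adjust for your image, not values taken from ECR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when a byte count lies within
# [min_gb, max_gb], using integer arithmetic with 1 GB = 1000^3 bytes.
size_in_range() {
  bytes="$1"; min_gb="$2"; max_gb="$3"
  gb=1000000000
  [ "$bytes" -ge $((min_gb * gb)) ] && [ "$bytes" -le $((max_gb * gb)) ]
}

# With the AWS CLI, the byte count could be read like this (not run here):
#   bytes=$(aws ecr describe-images --repository-name waiide \
#     --image-ids imageTag=latest \
#     --query 'imageDetails[0].imageSizeInBytes' --output text)
image_bytes=3500000000   # placeholder value for illustration
if size_in_range "$image_bytes" 1 4; then
  echo "size looks plausible"
else
  echo "size outside expected range, push may be incomplete"
fi
```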
# Or for Docker Hub
docker manifest inspect calliopeai/waiide:latest
Multi-Architecture Confusion
# Check if you accidentally pushed a multi-arch manifest
docker buildx imagetools inspect your-registry/waiide:latest
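The inspect output lists one `Platform:` line per manifest in the tag. A small sketch for pulling those out and checking for linux/amd64 is shown here; the helper names are hypothetical and the sample text only illustrates the general shape of the output, with digests elided.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `docker buildx imagetools inspect` text output
# on stdin and print one platform per line.
list_platforms() {
  awk '$1 == "Platform:" { print $2 }'
}

# Succeed when the given platform is among those listed on stdin.
has_platform() {
  list_platforms | grep -qx "$1"
}

# Illustrative sample shaped like imagetools output (digests elided):
sample='Manifests:
  Name:      your-registry/waiide:latest@sha256:...
  Platform:  linux/amd64
  Name:      your-registry/waiide:latest@sha256:...
  Platform:  linux/arm64'

printf '%s\n' "$sample" | list_platforms
printf '%s\n' "$sample" | has_platform linux/amd64 && echo "amd64 variant present"
```

In practice the input would come from piping the inspect command above into `list_platforms`. If nothing is printed, the tag most likely points at a single-platform manifest; entries such as unknown/unknown usually belong to build attestations rather than runnable images.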
# ECS might be pulling the wrong architecture
# Ensure you pushed specifically for linux/amd64
docker buildx build --platform linux/amd64 --push -t your-registry/waiide:latest-amd64 .
3. ECS Task Definition Cache
ECS might still be running an older task definition revision:
# Force new task definition revision
aws ecs register-task-definition --cli-input-json file://task-def.json
# Force new deployment on the latest ACTIVE revision of the task definition family
aws ecs update-service --cluster your-cluster --service your-service --task-definition your-task-name --force-new-deployment
4. Build vs Runtime Architecture
Your GitHub Actions runner (8GB) built for amd64, but verify:
# In your GitHub Actions workflow, add:
- name: Verify built image
  run: |
    docker run --rm your-registry/waiide:latest uname -m
    docker run --rm your-registry/waiide:latest ls -la /home/calliope/WAIIDE-server/
    docker run --rm your-registry/waiide:latest ls -la /home/calliope/scripts/
Debugging Steps
1. Pull and Test the Exact Image ECS Uses
# Pull the exact image ECS is using (substitute the tag or digest from the task definition)
docker pull your-registry/waiide:latest
# Test it locally
docker run -it --rm your-registry/waiide:latest bash -c "
echo '=== Checking WAIIDE Server ==='
ls -la /home/calliope/WAIIDE-server/ | head -10
echo '=== Checking for server-main.js ==='
find /home/calliope -name 'server-main.js' 2>/dev/null | head -5
echo '=== Checking scripts ==='
ls -la /home/calliope/scripts/ | head -10
echo '=== Image architecture ==='
uname -m
"
2. Compare Image Layers
# Check if the image has the expected layers
docker history your-registry/waiide:latest | grep -E "(WAIIDE|scripts|COPY)"
3. ECS Exec Into Container
# Enable ECS Exec (only tasks started after this change accept exec sessions)
aws ecs update-service --cluster your-cluster --service your-service --enable-execute-command --force-new-deployment
# Exec into running container
aws ecs execute-command --cluster your-cluster --task your-task-id --container waiide --interactive --command "/bin/bash"
# Inside container, check:
find / -name "server-main.js" 2>/dev/null
find / -name "jupyter_server_config.py" 2>/dev/null
ls -la /home/
Quick Fix Solutions
1. Use Explicit Image Digest
Instead of using the :latest tag, pin the image by digest in the task definition:
{
  "image": "your-registry/waiide@sha256:abc123...",
  "taskDefinitionArn": "..."
}
2. Tag with Unique Version
# In GitHub Actions
docker buildx build \
  --platform linux/amd64 \
  --push \
  -t your-registry/waiide:$(git rev-parse --short HEAD) \
  -t your-registry/waiide:latest \
  .
# Update the ECS task definition to use the specific tag
"image": "your-registry/waiide:a1b2c3d"
3. Verify Push Completion
# Add to GitHub Actions after push
- name: Verify pushed image
  run: |
    docker pull ${{ env.REGISTRY }}/waiide:latest
    docker run --rm ${{ env.REGISTRY }}/waiide:latest ls -la /home/calliope/WAIIDE-server/
GitHub Actions Build Verification
Add this to your workflow to ensure build artifacts exist:
- name: Build and verify
  run: |
    docker buildx build --platform linux/amd64 --load -t waiide:test .
    # Verify critical files before pushing
    docker run --rm waiide:test bash -c "
      test -f /home/calliope/WAIIDE-server/server-main.js || test -f /home/calliope/WAIIDE-server/out/server-main.js || exit 1
      test -f /home/calliope/scripts/jupyter_server_config.py || exit 1
      echo 'Build verification passed!'
    "
    # Only push if verification passed
    docker buildx build --platform linux/amd64 --push -t ${{ env.REGISTRY }}/waiide:latest .
Most Likely Issue
Given that your build was successful, the most likely causes are:
- ECS is pulling an older or different image than the one you just built
- The push to the registry was incomplete despite appearing successful
- A multi-architecture manifest is causing ECS to pull the wrong variant
Immediate Action: Pull the exact image URI that ECS is using and test it locally to see what’s actually in it.
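The local test comes down to confirming that the two paths from the runtime errors exist in the image. A minimal sketch of that check follows. The function name `check_required_files` is hypothetical, and the file list is taken from the error messages only, so extend it if your entrypoint needs more.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the two files named in the runtime
# errors exist under a root directory. With no argument it checks the
# live filesystem, which is what you want inside the container.
check_required_files() {
  root="${1:-}"
  missing=0
  for path in \
    /home/calliope/WAIIDE-server/out/server-main.js \
    /home/calliope/scripts/jupyter_server_config.py
  do
    if [ -f "$root$path" ]; then
      echo "ok       $path"
    else
      echo "MISSING  $path"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (not run here):
#   check_required_files              # inside the running container
#   check_required_files /tmp/rootfs  # against an extracted image filesystem
```

The non-zero return status on a missing file makes the function usable as a gate in a CI step as well as for manual inspection.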