Ticket: TBD
Date: 2026-04-12
Same server, same code, same data:
Docker's default 64MB /dev/shm causes DuckDB to spill to overlayfs (copy-on-write),
seccomp filters add overhead to WASM JIT, and the overlay filesystem multiplies I/O cost.
The result is a 6-8x slowdown that makes 30-minute data refresh cycles impossible.
The dev-evidence systemd service already proves bare-metal Evidence works on this server.
We just need to productionize that pattern.
Replace: Coolify → Docker (nginx + evidence containers) → server.cjs
With: Traefik → server.cjs via systemd (bare metal, no Docker)
Before (Docker):
Browser → Traefik (443) → Docker nginx (80) → Docker evidence:3000 (server.cjs)
↓
cron → rebuild.sh → npm run sources/build
↓
/data/build_live/ (Docker volume)
After (bare metal):
Browser → Traefik (443) → host.docker.internal:3001 (server.cjs via systemd)
↓
cron → rebuild.sh → npm run sources/build
↓
/home/dev/hc-evidence/build_live/
File: /home/dev/hc-evidence/scripts/rebuild-prod.sh
Adapted from docker/rebuild.sh but for bare-metal paths:
#!/bin/bash
# Production rebuild — bare metal, no Docker
LOCKFILE="/tmp/evidence-rebuild.lock"
if [ -f "$LOCKFILE" ]; then
echo "[$(date)] Rebuild already running, skipping."
exit 0
fi
trap 'rm -f "$LOCKFILE"' EXIT
touch "$LOCKFILE"
echo "[$(date)] Starting rebuild..."
export PATH="/home/dev/.nvm/versions/node/v22.22.2/bin:$PATH"
export NODE_OPTIONS="--max-old-space-size=16384"
cd /home/dev/hc-evidence || exit 1
# Pull latest changes
git pull origin main --ff-only -q \
&& echo "[$(date)] git pull: up to date." \
|| echo "[$(date)] git pull: skipped (conflict or detached HEAD)."
# Fetch fresh data
npm run sources
# Write build info
DEPLOYED_AT=$(cat /home/dev/hc-evidence/deployed_at.txt 2>/dev/null || echo "")
DATA_REFRESHED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
cat > /home/dev/hc-evidence/static/build-info.json << ENDJSON
{
"deployed_at": "$DEPLOYED_AT",
"data_refreshed_at": "$DATA_REFRESHED_AT"
}
ENDJSON
# Build
if npm run build:strict; then
rsync -a --delete /home/dev/hc-evidence/build/ /home/dev/hc-evidence/build_live/
echo "[$(date)] Rebuild complete."
else
echo "[$(date)] WARNING: Build failed, keeping previous build."
fi
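The check-then-touch lockfile in the script leaves a small window in which two runs can both pass the `-f` test, and a lockfile left behind by a killed run would block every later rebuild. A minimal sketch of an alternative built on `flock(1)` from util-linux, assuming that tool is installed on the server; the helper name `run_locked` and the use of file descriptor 9 are illustrative choices, not part of the existing script:

```shell
#!/bin/bash
# Sketch: run a command under an exclusive, non-blocking lock.
# run_locked is a hypothetical helper; flock(1) ships with util-linux.
run_locked() {
    local lockfile="$1"
    shift
    (
        # Open the lockfile on fd 9 and try to take the lock without waiting.
        # The lock is released when the subshell exits, even after a crash.
        flock -n 9 || { echo "[$(date)] Rebuild already running, skipping."; exit 0; }
        "$@"
    ) 9>"$lockfile"
}

# Usage (the wrapped function name is hypothetical):
#   run_locked /tmp/evidence-rebuild.lock do_rebuild
```

With this shape there is no lockfile to clean up, so the `trap` and `touch` lines would no longer be needed.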
File: /etc/systemd/system/evidence.service
[Unit]
Description=Evidence BI Production Server
After=network.target
[Service]
Type=simple
User=dev
WorkingDirectory=/home/dev/hc-evidence
Environment=PATH=/home/dev/.nvm/versions/node/v22.22.2/bin:/usr/local/bin:/usr/bin:/bin
Environment=NODE_OPTIONS=--max-old-space-size=8192
EnvironmentFile=/home/dev/hc-evidence/.env
ExecStartPre=/bin/bash -c 'test -d /home/dev/hc-evidence/build_live || mkdir -p /home/dev/hc-evidence/build_live'
ExecStart=/home/dev/.nvm/versions/node/v22.22.2/bin/node /home/dev/hc-evidence/docker/server.cjs
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Note: server.cjs needs two tweaks for bare-metal:
const PORT = 3000 to const PORT = process.env.EVIDENCE_PORT || 3001path.join(__dirname, '..', 'build_live') resolves to/home/dev/hc-evidence/build_live when run from docker/ — this is correct.# As dev user
crontab -e
# Add:
*/30 * * * * /home/dev/hc-evidence/scripts/rebuild-prod.sh >> /home/dev/hc-evidence/logs/rebuild.log 2>&1
Also create the log directory:
mkdir -p /home/dev/hc-evidence/logs
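Because the cron entry appends everything to rebuild.log, the outcome of the most recent run can be read from the last status line the script prints. A small sketch, assuming the log messages stay exactly as written in rebuild-prod.sh; `last_rebuild_status` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: report the outcome of the most recent rebuild recorded in the log.
# Relies on the literal messages echoed by rebuild-prod.sh.
last_rebuild_status() {
    local logfile="$1"
    local line
    # Keep only the final "complete" or "failed" line; a missing file yields "".
    line=$(grep -E 'Rebuild complete\.|WARNING: Build failed' "$logfile" 2>/dev/null | tail -n 1) || true
    case "$line" in
        *"Rebuild complete."*)     echo "ok" ;;
        *"WARNING: Build failed"*) echo "failed" ;;
        *)                         echo "unknown" ;;
    esac
}

# Usage:
#   last_rebuild_status /home/dev/hc-evidence/logs/rebuild.log
```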
File: /data/coolify/proxy/dynamic/evidence.yaml
Change from Docker container to host:
http:
routers:
evidence-https:
entryPoints:
- https
rule: Host(`evidence.sayhellocollege.com`)
service: evidence-prod
tls:
certResolver: letsencrypt
priority: 100
evidence-http:
entryPoints:
- http
rule: Host(`evidence.sayhellocollege.com`)
middlewares:
- redirect-to-https
service: evidence-prod
priority: 100
services:
evidence-prod:
loadBalancer:
servers:
- url: http://host.docker.internal:3001
This is the same pattern used by dev-evidence (port 4000) and dev-portal (port 3000).
Production Evidence will use port 3001. Dev evidence stays on port 4000.
Port conflict check: dev-portal already uses port 3000 via host.docker.internal:3000,
so Evidence must not bind to 3000 on the same host. This plan uses port 3001 throughout.
Keeping Evidence on 3000 would only be safe after confirming that dev-portal is not
running on hc-central (it may be on hc-vps).
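One quick way to run the port conflict check is to look for an existing listener before starting the service. The sketch below parses `ss -ltn` output (from iproute2) supplied on stdin, so the matching logic can be checked against captured text; `port_is_listening` is a hypothetical helper, and the usage line assumes `ss` is available on hc-central:

```shell
#!/bin/bash
# Sketch: succeed if the given TCP port appears as a local listener
# in `ss -ltn` output read from stdin.
port_is_listening() {
    local port="$1"
    # Column 4 of `ss -ltn` is "Local Address:Port"; compare its tail to ":PORT"
    # so that 3001 does not match 13001.
    awk -v suffix=":${port}" '
        $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage on the server:
#   ss -ltn | port_is_listening 3001 && echo "port 3001 is already taken"
```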
Before switching traffic, run the first bare-metal build:
# As dev user
cd /home/dev/hc-evidence
date -u +"%Y-%m-%dT%H:%M:%SZ" > deployed_at.txt
./scripts/rebuild-prod.sh
Verify build_live/ is populated and index.html exists.
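This manual check can be scripted so the cutover only proceeds when the output looks sane. A minimal sketch; `verify_build_dir` is a hypothetical helper, and treating a non-empty index.html as the success criterion is an assumption:

```shell
#!/bin/bash
# Sketch: confirm a build directory exists and has a non-empty index.html.
verify_build_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    # -s is true only when the file exists and is larger than zero bytes.
    if [ ! -s "$dir/index.html" ]; then
        echo "missing or empty: $dir/index.html" >&2
        return 1
    fi
    echo "ok: $dir"
}

# Usage:
#   verify_build_dir /home/dev/hc-evidence/build_live
```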
# Start production Evidence
sudo systemctl enable evidence
sudo systemctl start evidence
# Verify it's serving
curl -s http://localhost:3001/health
# Update Traefik config (step 4 above)
# Traefik watches the dynamic config directory — changes take effect immediately
# Verify production
curl -sk https://evidence.sayhellocollege.com/build-info.json
# Stop the Docker containers
# docker stop does not expand wildcards, so look the containers up by name first
docker ps -q --filter "name=mbrtarzc8vpkcejj6yx7stea" | xargs -r docker stop
# Disable in Coolify (via UI or API) to prevent auto-restart
curl -k -X PATCH "https://coolify-central.sayhellocollege.com/api/v1/applications/mbrtarzc8vpkcejj6yx7stea" \
-H "Authorization: Bearer $COOLIFY_TOKEN" \
-H "Content-Type: application/json" \
-d '{"status": "stopped"}'
Do NOT delete the Coolify app yet — keep it as a rollback option.
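Since the Coolify app is kept for rollback, the Traefik change should be just as easy to undo. One way to do that, sketched below, is to save a copy of evidence.yaml before editing it and copy it back if the cutover fails. The helper names, the `.pre-baremetal` suffix, and the backup directory in the usage lines are all illustrative assumptions. Keeping the backup outside the watched dynamic directory avoids any chance of Traefik picking it up.

```shell
#!/bin/bash
# Sketch: keep a restorable copy of a config file before editing it.
# backup_config and restore_config are hypothetical helpers.
backup_config() {
    local file="$1" backup_dir="$2"
    mkdir -p "$backup_dir"
    # -p preserves mode and timestamps on the saved copy.
    cp -p "$file" "$backup_dir/$(basename "$file").pre-baremetal"
}

restore_config() {
    local file="$1" backup_dir="$2"
    local saved
    saved="$backup_dir/$(basename "$file").pre-baremetal"
    [ -f "$saved" ] || { echo "no backup found: $saved" >&2; return 1; }
    cp -p "$saved" "$file"
}

# Usage (backup directory is an assumed path):
#   backup_config  /data/coolify/proxy/dynamic/evidence.yaml /home/dev/traefik-backups
#   restore_config /data/coolify/proxy/dynamic/evidence.yaml /home/dev/traefik-backups
```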
- curl http://localhost:3001/health returns 200
- https://evidence.sayhellocollege.com/build-info.json returns timestamps
- /home/dev/hc-evidence/logs/rebuild.log shows success
If anything goes wrong:
| File | Action |
|---|---|
| /home/dev/hc-evidence/scripts/rebuild-prod.sh | Create |
| /etc/systemd/system/evidence.service | Create |
| /data/coolify/proxy/dynamic/evidence.yaml | Modify (point to host:3001) |
| /home/dev/hc-evidence/deployed_at.txt | Create (initial timestamp) |
| /home/dev/hc-evidence/logs/ | Create directory |
| dev user crontab | Add rebuild cron entry |
Unchanged:
- server.cjs: same auth, same CSP headers, same static file serving
- rebuild.sh logic: same git pull → sources → build → rsync pattern
- .env: same environment variables
- build_live/ directory
No longer used:
- docker-compose.yml, Dockerfile, docker/nginx.conf, docker/nginx.Dockerfile
- docker/entrypoint.sh (replaced by systemd)

| Metric | Docker (current) | Bare metal (target) |
|---|---|---|
| Build time | 75+ min (incomplete) | ~10 min |
| Data refresh cycle | Broken (build never finishes) | Every 30 min |
| Memory overhead | Container + overlay + cgroups | Direct process |
| Operational complexity | Coolify + Docker + nginx + server.cjs | systemd + server.cjs |
| Failure modes | Container health, volume mount, overlay fs | Process crash (auto-restart) |
| Rollback | Restart containers | Revert Traefik config |