Ticket: TBD
Date: 2026-04-12
We have two Evidence instances on the same server. Instead of one being "dev" (Vite HMR)
and one being "prod" (static build), make them identical — both run production builds
from the same codebase, same config, same infra. Then use Traefik port-swapping for
zero-downtime deploys with pre-verified builds.
Two instances: blue (port 3001) and green (port 3002). They share the same git
repo but have separate build output directories. At any time, one is "live" (serving
production traffic) and the other is "standby" (receiving the next build).
                          ┌─ evidence-blue:3001  ─── /srv/evidence/blue/build_live/
Browser → Traefik (443) ──┤   (swap via config)
                          └─ evidence-green:3002 ─── /srv/evidence/green/build_live/
Both instances:
- run server.cjs via systemd, each serving its own build_live/ directory
- share the same .env, the same auth token, and the same CSP headers

Deploy cycle:
- rebuild-prod.sh runs on the standby instance only: git pull → npm run sources → npm run build:strict → rsync to its build_live/
- preview the result at the standby URL (standby-evidence.sayhellocollege.com)
- swap-prod.sh rewrites the Traefik config to point prod at the standby port
- to roll back, run swap-prod.sh again: it just flips back to the other instance
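The blue/green bookkeeping used throughout this plan reduces to two tiny pure functions, sketched here for reference (the helper names `other_instance` and `port_for` are illustrative, not in the repo):

```shell
#!/bin/bash
# Illustrative helpers mirroring the flip logic described above.
other_instance() {
  if [ "$1" = "blue" ]; then echo "green"; else echo "blue"; fi
}
port_for() {
  if [ "$1" = "blue" ]; then echo 3001; else echo 3002; fi
}

other_instance blue                  # prints: green
port_for "$(other_instance blue)"    # prints: 3002
```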
Both instances share the same git repo at /home/dev/hc-evidence (one copy of source
code, node_modules, etc). But each has its own build output:
/srv/evidence/blue/build_live/ # blue instance serves from here
/srv/evidence/green/build_live/ # green instance serves from here
/srv/evidence/active # file containing "blue" or "green"
Using /srv/evidence/ rather than /home/dev/ keeps the deployment artifacts separate
from the source code.
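The directory layout above needs one-time seeding. A sketch (in production you would run this with `EVIDENCE_ROOT=/srv/evidence` as root and then `chown -R dev:dev` it; the temp-dir default just keeps the snippet runnable anywhere):

```shell
# One-time layout seeding (sketch, not the final script).
ROOT="${EVIDENCE_ROOT:-$(mktemp -d)}"
mkdir -p "$ROOT/blue/build_live" "$ROOT/green/build_live"
echo "blue" > "$ROOT/active"
# Seed both sides from the current build so each instance has something to serve:
#   rsync -a /home/dev/hc-evidence/build_live/ "$ROOT/blue/build_live/"
#   rsync -a /home/dev/hc-evidence/build_live/ "$ROOT/green/build_live/"
cat "$ROOT/active"   # prints: blue
```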
Currently: const BUILD_DIR = path.join(__dirname, '..', 'build_live');
Change to: const BUILD_DIR = process.env.EVIDENCE_BUILD_DIR || path.join(__dirname, '..', 'build_live');
This lets each systemd service point to its own build directory.
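The same env-var-with-fallback behavior can be exercised from the shell; `${VAR:-default}` is the shell analogue of the `||` fallback in server.cjs:

```shell
# Shell analogue of the server.cjs fallback: env var wins, else the old default.
default_dir="/home/dev/hc-evidence/build_live"
BUILD_DIR="${EVIDENCE_BUILD_DIR:-$default_dir}"
echo "$BUILD_DIR"

# With the variable set (as each systemd unit does), the override wins:
EVIDENCE_BUILD_DIR="/srv/evidence/blue/build_live"
BUILD_DIR="${EVIDENCE_BUILD_DIR:-$default_dir}"
echo "$BUILD_DIR"   # prints: /srv/evidence/blue/build_live
```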
evidence-blue.service:
[Unit]
Description=Evidence BI - Blue Instance
After=network.target
[Service]
Type=simple
User=dev
WorkingDirectory=/home/dev/hc-evidence
Environment=PATH=/home/dev/.nvm/versions/node/v22.22.2/bin:/usr/local/bin:/usr/bin:/bin
Environment=NODE_OPTIONS=--max-old-space-size=8192
Environment=EVIDENCE_PORT=3001
Environment=EVIDENCE_BUILD_DIR=/srv/evidence/blue/build_live
EnvironmentFile=/home/dev/hc-evidence/.env
ExecStart=/home/dev/.nvm/versions/node/v22.22.2/bin/node /home/dev/hc-evidence/docker/server.cjs
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
evidence-green.service (differs only in the port and the green build path):
[Unit]
Description=Evidence BI - Green Instance
After=network.target
[Service]
Type=simple
User=dev
WorkingDirectory=/home/dev/hc-evidence
Environment=PATH=/home/dev/.nvm/versions/node/v22.22.2/bin:/usr/local/bin:/usr/bin:/bin
Environment=NODE_OPTIONS=--max-old-space-size=8192
Environment=EVIDENCE_PORT=3002
Environment=EVIDENCE_BUILD_DIR=/srv/evidence/green/build_live
EnvironmentFile=/home/dev/hc-evidence/.env
ExecStart=/home/dev/.nvm/versions/node/v22.22.2/bin/node /home/dev/hc-evidence/docker/server.cjs
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Remove the old evidence.service and dev-evidence.service.
scripts/rebuild-prod.sh changes:
#!/bin/bash
# Rebuild the STANDBY instance only. Never rebuild the live instance.
LOCKFILE="/tmp/evidence-rebuild.lock"
ACTIVE=$(cat /srv/evidence/active 2>/dev/null || echo "blue")
# Determine standby
if [ "$ACTIVE" = "blue" ]; then
  STANDBY="green"
else
  STANDBY="blue"
fi
STANDBY_DIR="/srv/evidence/$STANDBY/build_live"
if [ -f "$LOCKFILE" ]; then
  echo "[$(date)] Rebuild already running, skipping."
  exit 0
fi
trap "rm -f $LOCKFILE" EXIT
touch "$LOCKFILE"
echo "[$(date)] Rebuilding standby ($STANDBY)..."
export PATH="/home/dev/.nvm/versions/node/v22.22.2/bin:$PATH"
export NODE_OPTIONS="--max-old-space-size=16384"
cd /home/dev/hc-evidence
set -a
source /home/dev/hc-evidence/.env
set +a
git pull origin main --ff-only -q \
  && echo "[$(date)] git pull: done." \
  || echo "[$(date)] git pull: failed (non-fast-forward or offline); building current checkout."
npm run sources
# Keep the standby's existing deploy timestamp (swap-prod.sh rewrites it at swap
# time); fall back to the current time on first run.
DEPLOYED_AT=$(cat /srv/evidence/$STANDBY/deployed_at.txt 2>/dev/null || date -u +"%Y-%m-%dT%H:%M:%SZ")
DATA_REFRESHED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
cat > /home/dev/hc-evidence/static/build-info.json << ENDJSON
{
  "deployed_at": "$DEPLOYED_AT",
  "data_refreshed_at": "$DATA_REFRESHED_AT"
}
ENDJSON
if npm run build:strict; then
  mkdir -p "$STANDBY_DIR"
  rsync -a --delete /home/dev/hc-evidence/build/ "$STANDBY_DIR"/
  echo "[$(date)] Standby ($STANDBY) rebuild complete. Ready to swap."
else
  echo "[$(date)] WARNING: Build failed, standby not updated."
fi
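The check-then-touch lockfile in the script has a small race window: two cron runs starting at the same moment could both pass the `-f` test before either touches the file. A hedged alternative using flock(1), if util-linux is available on the host:

```shell
#!/bin/bash
# flock takes the lock atomically and holds it for the lifetime of the
# subshell, so overlapping cron runs cannot both proceed.
LOCKFILE="${LOCKFILE:-/tmp/evidence-rebuild.lock}"
(
  flock -n 9 || { echo "Rebuild already running, skipping."; exit 0; }
  echo "lock acquired, rebuilding..."
  # ... the rebuild steps above would go here ...
) 9>"$LOCKFILE"
```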
scripts/swap-prod.sh:
#!/bin/bash
# Swap which instance serves production traffic.
# Rewrites Traefik config — takes effect instantly.
ACTIVE_FILE="/srv/evidence/active"
ACTIVE=$(cat "$ACTIVE_FILE" 2>/dev/null || echo "blue")
if [ "$ACTIVE" = "blue" ]; then
  NEW_ACTIVE="green"
  NEW_PORT=3002
else
  NEW_ACTIVE="blue"
  NEW_PORT=3001
fi
# Verify the target instance is healthy before swapping
if ! curl -sf "http://localhost:$NEW_PORT/health" > /dev/null 2>&1; then
  echo "ERROR: $NEW_ACTIVE instance (port $NEW_PORT) is not healthy. Aborting swap."
  exit 1
fi
# Write Traefik config
cat > /data/coolify/proxy/dynamic/evidence.yaml << EOF
http:
  routers:
    evidence-https:
      entryPoints:
        - https
      rule: Host(\`evidence.sayhellocollege.com\`)
      service: evidence-prod
      tls:
        certResolver: letsencrypt
      priority: 100
    evidence-http:
      entryPoints:
        - http
      rule: Host(\`evidence.sayhellocollege.com\`)
      middlewares:
        - redirect-to-https
      service: evidence-prod
      priority: 100
    evidence-standby-https:
      entryPoints:
        - https
      rule: Host(\`standby-evidence.sayhellocollege.com\`)
      service: evidence-standby
      tls:
        certResolver: letsencrypt
      priority: 100
  services:
    evidence-prod:
      loadBalancer:
        servers:
          - url: http://host.docker.internal:$NEW_PORT
    evidence-standby:
      loadBalancer:
        servers:
          - url: http://host.docker.internal:$([ "$NEW_PORT" = "3001" ] && echo "3002" || echo "3001")
EOF
echo "$NEW_ACTIVE" > "$ACTIVE_FILE"
date -u +"%Y-%m-%dT%H:%M:%SZ" > "/srv/evidence/$NEW_ACTIVE/deployed_at.txt"
echo "Swapped: $ACTIVE → $NEW_ACTIVE"
echo " Production: evidence.sayhellocollege.com → port $NEW_PORT ($NEW_ACTIVE)"
echo " Standby: standby-evidence.sayhellocollege.com → port $([ "$NEW_PORT" = "3001" ] && echo "3002" || echo "3001") ($ACTIVE)"
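The inline `$([ "$NEW_PORT" = "3001" ] && ...)` substitution appears twice in swap-prod.sh; the same logic factored out for clarity (the helper name is illustrative, not part of the script):

```shell
# Given one instance's port, return the other instance's port.
standby_port() {
  if [ "$1" = "3001" ]; then echo "3002"; else echo "3001"; fi
}
standby_port 3001   # prints: 3002
standby_port 3002   # prints: 3001
```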
Add standby-evidence.sayhellocollege.com DNS A record pointing to 62.171.177.227.
Traefik handles the cert automatically. This gives a permanent URL to preview the
standby instance before swapping.
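This whole scheme assumes Traefik's static configuration watches /data/coolify/proxy/dynamic via the file provider, which Coolify's bundled proxy does by default. For reference, the relevant fragment looks like the following; if it is missing, the swap file is never read:

```yaml
providers:
  file:
    directory: /data/coolify/proxy/dynamic
    watch: true
```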
*/15 * * * * /home/dev/hc-evidence/scripts/rebuild-prod.sh >> /home/dev/hc-evidence/logs/rebuild.log 2>&1
Same as current. Cron always rebuilds the standby. After a swap, the old-prod (now
standby) gets rebuilt on the next cycle automatically.
Removed as part of this change:
- /etc/systemd/system/evidence.service (replaced by blue/green)
- /etc/systemd/system/dev-evidence.service (replaced by blue/green)
- /home/dev/hc-evidence/dev-server.sh (no more Vite dev server in prod)
- dev-evidence entry from Traefik dev-environments.yaml

| File | Action |
|---|---|
| docker/server.cjs | Add EVIDENCE_BUILD_DIR env var support |
| scripts/rebuild-prod.sh | Rewrite for blue-green (target standby only) |
| scripts/swap-prod.sh | Create (Traefik config swap + health check) |
| /etc/systemd/system/evidence-blue.service | Create |
| /etc/systemd/system/evidence-green.service | Create |
| /srv/evidence/blue/build_live/ | Create, seed from current build |
| /srv/evidence/green/build_live/ | Create, seed from current build |
| /srv/evidence/active | Create, initial value "blue" |
| /data/coolify/proxy/dynamic/evidence.yaml | Rewrite with blue/green + standby URL |
| DNS | Add standby-evidence.sayhellocollege.com A record |
| File | Action |
|---|---|
| /etc/systemd/system/evidence.service | Delete |
| /etc/systemd/system/dev-evidence.service | Delete |
| /home/dev/hc-evidence/dev-server.sh | Delete |
| /home/dev/hc-evidence/build_live/ | Delete (moved to /srv/) |
| dev-evidence in dev-environments.yaml | Remove |
| Scenario | Before | After |
|---|---|---|
| Code deploy | 11 min build, brief gap | Zero downtime swap |
| Data refresh | Every 15 min (standby rebuilds) | Every 15 min (same) |
| Verify before deploy | Not possible | Preview at standby-evidence.sayhellocollege.com |
| Rollback | Re-run rebuild (~11 min) | Run swap-prod.sh again (instant) |
| Dev testing | Vite dev server (different behavior) | Identical production build |