Ticket: HEL-702
Date: 2026-04-10
Revert the 6-container Evidence architecture back to a single container. Fix all broken pages. Restore the simplicity and speed the system had before the April 1-2 infrastructure overhaul.
The git history tells the story. Before March 31, Evidence was a single container with a simple build. Then in a 48-hour burst (April 1-2), ~40 commits reshaped the entire infrastructure:
- `build:strict` introduced, breaking several pages

Every subsequent ticket (HEL-684 through HEL-700) has been fixing problems created by this burst.
I watched a full Coolify deploy cycle. All 6 containers started rebuilding at 04:20 UTC:
| Container | Pages | Build Time | Status |
|---|---|---|---|
| growth | 4 | 11 min | ✓ Complete |
| operations | 3 | 11 min | ✓ Complete |
| outcomes | 3 | 13 min | ✓ Complete (SQL warning — gpa_distribution GROUP BY error) |
| service | 4 | 14 min | ✓ Complete |
| planning | 6 | 15+ min | ⚠️ Stuck on DuckDB Struct/Array errors |
| finance | 8 | 15+ min | ⚠️ Still building after 3 skipped crons |
11-15+ minutes to build 3-8 pages per container. And that's just wall clock time — while those builds run, four problems compound:
**Sources are duplicated 6×.** Every container runs `npm run sources`, which fetches ALL data (Postgres, HubSpot, Stripe) — not just the data its own pages need. 6 containers × 51 source queries = 306 database queries per rebuild cycle. With cron running every 5 minutes, that's thousands of unnecessary queries per hour.
**Vite overhead is per-container, not per-page.** Each Evidence build loads all plugins, processes all source data into DuckDB, and initializes Vite — regardless of how many pages it compiles. Building 4 pages takes 11 min; building 8 takes 15+ min. The per-page cost is small; the fixed overhead dominates.
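The fixed overhead can be read straight off those two data points. A rough linear fit (only two observations, so treat it as an estimate):

```shell
# Linear fit over the two observed builds (4 pages/11 min, 8 pages/15 min):
# marginal cost  = (15 - 11) / (8 - 4) minutes per page
# fixed overhead = 11 - marginal * 4  minutes per container
echo $(( (15 - 11) / (8 - 4) ))  # marginal minutes per page
echo $(( 11 - 1 * 4 ))           # fixed overhead minutes per container
```

By this estimate, roughly 7 of each build's 11-15 minutes is per-container overhead — overhead the 6-container split pays six times and a single container pays once.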
**CPU contention.** 6 concurrent Vite builds on 18 cores. Each build wants the full CPU for its Vite compile step; they get ~3 cores each instead of 18.
**Complexity cost.** The multi-container split introduced:

- nginx routing workarounds (%20 encoding, section-specific `location` blocks, Referer-based `_app/` routing)
- `EVIDENCE_SECTION` page deletion that conflicts with `git pull`

Server resources on hc-central (62.171.177.227):

- CPU: 18 cores
- RAM: 94 GB (22 GB used, 66 GB free)
- Disk: 679 GB (65 GB used)
There is no resource pressure. A single container with 8GB heap on 18 cores will build faster than 6 containers fighting over those same cores.
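One way to give that single build its 8 GB heap — a sketch, assuming the heap is set via Node's standard `--max-old-space-size` flag in the Dockerfile or build step (where exactly it lands is an implementation choice, not specified here):

```shell
# Hypothetical single-container build invocation: 8 GB V8 heap via
# Node's standard flag, then the full source fetch + strict build.
export NODE_OPTIONS="--max-old-space-size=8192"
npm run sources && npm run build:strict
```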
Replace the 6 `evidence-*` services + nginx with:
```yaml
services:
  nginx:
    build:
      context: .
      dockerfile: docker/nginx.Dockerfile
    restart: unless-stopped
    depends_on:
      - evidence
    networks:
      - default

  evidence:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        EVIDENCE_SOURCE__postgres_heroku__connectionString: ${EVIDENCE_SOURCE__postgres_heroku__connectionString}
        EVIDENCE_VAR__grad_year: ${EVIDENCE_VAR__grad_year}
        EVIDENCE_TOKEN: ${EVIDENCE_TOKEN}
        VITE_EVIDENCE_HIDE_SIDEBAR: "true"
        EVIDENCE_HUBSPOT_TOKEN: ${EVIDENCE_HUBSPOT_TOKEN:-}
        EVIDENCE_STRIPE_SECRET_KEY: ${EVIDENCE_STRIPE_SECRET_KEY:-}
        SUPABASE_CONNECTION_STRING: ${SUPABASE_CONNECTION_STRING}
    restart: unless-stopped
    volumes:
      - evidence-data:/data
    networks:
      - default

volumes:
  evidence-data:

networks:
  default:
    driver: bridge
```
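The collapsed file can be sanity-checked before pointing Coolify at it — a sketch assuming Docker Compose v2 is available locally:

```shell
# Validate the collapsed compose file, then build and start both services.
docker compose config --quiet && echo "compose file OK"
docker compose up -d --build
```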
No section routing needed. Everything goes to the single container:
```nginx
server {
    listen 80;

    proxy_read_timeout 120s;
    proxy_connect_timeout 10s;

    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $http_x_forwarded_proto;
    proxy_set_header Cookie $http_cookie;

    location / {
        proxy_pass http://evidence:3000;
    }
}
```
No `%20` encoding issues. No Referer-based routing. No section-specific `location` blocks.
Remove the `EVIDENCE_SECTION` ARG/ENV and the `find` command that deletes pages. All pages build in one pass.
Change cron from `*/5` to `*/15` (every 15 min). A single build will take ~15-20 min; running cron every 5 min just piles up "already running, skipping" messages.
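The "already running, skipping" guard can be kept with a file lock. A minimal demonstration of the pattern (the lock path is illustrative, not the project's actual path):

```shell
# Simulate two overlapping cron invocations guarded by flock -n:
# the second exits immediately instead of queuing behind the first.
lock=/tmp/evidence-rebuild.lock
flock -n "$lock" sleep 2 &       # first "build" holds the lock
sleep 0.5
if flock -n "$lock" true; then   # second invocation finds the lock busy
    echo "ran"
else
    echo "skipped"
fi
wait
```

With this guard, even a build that overruns the 15-minute interval just causes the next cron tick to print one skip line rather than stacking up processes.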
`pages/College Outcomes/college_decisions.md` — `gpa_distribution` query: `Accepted_Colleges` is missing from the GROUP BY. Either add it to the GROUP BY or wrap it in `ANY_VALUE()`.
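A sketch of the two options — every identifier except `Accepted_Colleges` is a placeholder, since the ticket doesn't quote the actual query:

```sql
-- Option 1: add the column to the GROUP BY (gpa_bucket and the
-- table name are placeholders, not the real query).
SELECT gpa_bucket, Accepted_Colleges, COUNT(*) AS students
FROM college_decisions
GROUP BY gpa_bucket, Accepted_Colleges;

-- Option 2: keep one arbitrary value per group with ANY_VALUE(),
-- preserving the original grouping.
SELECT gpa_bucket, ANY_VALUE(Accepted_Colleges) AS accepted_colleges, COUNT(*) AS students
FROM college_decisions
GROUP BY gpa_bucket;
```

Option 1 changes the grain of the result (one row per GPA bucket per college); Option 2 keeps one row per bucket, so pick based on what the chart expects.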
Planning pages — DuckDB Struct/Array errors: convert struct columns to JSON in the source queries.
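A sketch of the struct-to-JSON conversion, using DuckDB's `to_json()` (the column and table names here are placeholders):

```sql
-- Hypothetical source-query shape: serialize nested DuckDB types to
-- JSON strings so the Evidence build no longer trips on Struct/Array
-- columns coming out of the source.
SELECT id, to_json(plan_details) AS plan_details
FROM planning_milestones;
```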
These were genuinely good changes that should be preserved:
- Explicit column selection in source queries (`SELECT id, note_type` instead of `SELECT *`)

| Metric | Current (6 containers) | After (1 container) |
|---|---|---|
| Source queries per cycle | 306 (51 × 6) | 51 (1×) |
| Concurrent Vite builds | 6 | 1 |
| CPU per build | ~3 cores | 18 cores |
| Database load | 6× | 1× |
| Build time (est.) | 11-20+ min | 10-15 min |
| nginx routing complexity | High (regex, %20, Referer) | None (single proxy_pass) |
| server.cjs reload issues | Yes | No (single container) |
| Failure visibility | Hidden in 6 separate logs | One log |
| Service Delivery 404 | Yes (routing bug) | No (no routing needed) |
Files to change:

- `docker-compose.yml` — collapse 6 services to 1
- `docker/nginx.conf` — simplify to a single `proxy_pass`
- `Dockerfile` — remove `EVIDENCE_SECTION` logic
- `docker/rebuild.sh` — change cron interval to 15 min (in entrypoint.sh)
- `docker/entrypoint.sh` — update cron schedule
- `pages/College Outcomes/college_decisions.md` — fix `gpa_distribution` GROUP BY

Acceptance:

- `npm run sources && npm run build:strict` passes locally with all 32 pages
- `gpa_distribution` query fixed
- `npm run build:strict` passes with all 32 pages