$ man context-wiki/multi-model-optimization
Infrastructure · advanced
Multi-Model AI Optimization
How to run 4 AI models for under $1/day with build-time static data
The Problem with One Model
Most people pick one AI model and use it for everything. That is like using a sledgehammer to hang a picture frame. It works, but you are overpaying and overbuilding. I was running Claude Opus for 104 daily cron API calls at $75/M output tokens. A WhatsApp self-chat loop was generating cascade replies. The bill was $50/day for work that a local 14B model could handle. The fix was not to stop using AI. It was to route each task to the right model based on one question: does a human read the output?
PATTERN
The 4-Model Squad
1. Ollama / Qwen 2.5 14B (FREE, local)
Runs on Mac Mini M4 Pro. Handles all repetitive cron work: commit tracking, RSS monitoring, dashboard data generation, status reports. These tasks run 4+ times per day. At Opus pricing, that was 96 API calls burning real money for structured data extraction that a 14B model handles fine. M4 Pro with 24GB runs it at ~9GB VRAM.
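A cron worker's call to a local Ollama instance can be sketched like this. The endpoint and request shape are Ollama's standard generate API; the model name, prompt, and function names are illustrative, not the exact cron job code.

```javascript
// Sketch: structured-data extraction against a local Ollama instance
// (default port 11434). Model and prompt are illustrative.
function buildOllamaRequest(prompt) {
  return {
    model: "qwen2.5:14b", // local 14B model, ~9GB VRAM on an M4 Pro
    prompt,
    stream: false,        // get one complete response instead of a token stream
    format: "json",       // ask Ollama to constrain output to valid JSON
  };
}

// Actual call (assumes Ollama is running locally):
async function extractStructured(prompt) {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    body: JSON.stringify(buildOllamaRequest(prompt)),
  });
  const { response } = await res.json();
  return JSON.parse(response);
}
```

Zero per-token cost, so it does not matter that the job runs four times a day.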
2. Claude Sonnet 4 ($15/M output)
All conversations, orchestration, and agent coordination. This is the main agent. Chat, WhatsApp, Discord, memory management. Sonnet is 5x cheaper than Opus but handles conversation just as well. Quality difference only matters for long-form content.
3. Claude Opus 4 ($75/M output)
Reserved for content creation only. Blog posts, Substack essays, LinkedIn drafts, deep analysis. Content creation is where model quality directly maps to output quality. You can feel the difference in a 2000-word essay. For a commit summary? Zero difference.
4. Claude Code / Opus 4.6 (FREE, Max subscription)
The infrastructure layer. Debugging, deployments, git operations, architecture decisions, quality review on what the other models ship. Unlimited usage via subscription. No per-token cost. This is where you do the heavy lifting.
FORMULA
The Decision Framework
One question decides the model: does the output quality matter to a human reader?
Cron jobs (tracking, monitoring, updates) -> Ollama local. Runs frequently, output is structured data, quality does not matter.
Conversations, routing, memory -> Sonnet. Good enough for real-time interaction, 5x cheaper than Opus.
Blog posts, essays, content -> Opus. Quality is the product, humans read this.
Infrastructure, debugging, deploys -> Claude Code. Free via subscription, needs full codebase context.
If no human reads the output, use the cheapest model that works. If a human reads the output, use the best model you can afford. If it touches infrastructure, use Claude Code.
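The framework above fits in one function. This is a sketch of the routing question as code; the task shape and model labels are illustrative, not a real dispatcher.

```javascript
// Sketch: the decision framework as a routing function.
// Field names and return labels are illustrative.
function pickModel(task) {
  if (task.touchesInfra) return "claude-code";             // free via subscription
  if (!task.humanReadsOutput) return "ollama/qwen2.5:14b"; // cheapest model that works
  if (task.isLongFormContent) return "opus";               // quality is the product
  return "sonnet";                                         // conversations, routing, memory
}
```

The order matters: infrastructure wins first because it is free, then the human-reader test decides between cheap and expensive.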
PATTERN
Build-Time Static JSON Pattern
This is the key architectural pattern that makes dashboards work on Vercel. The problem: API routes that read local files like ~/.openclaw/workspace/HEARTBEAT.md or run execSync("git log") work perfectly on localhost but completely break on Vercel. The build server cannot access your laptop filesystem.
The solution: generate static JSON at build time. Commit the JSON. Vercel serves it.
A generator script reads all local data sources (markdown files, cron job configs, git log) and writes structured JSON to public/data/*.json. The API routes read from those JSON files instead of absolute paths. The generator runs in prebuild so every deploy gets fresh data. A cron job regenerates the data, commits, and pushes. The push triggers a Vercel rebuild with updated data.
Five JSON files cover the entire dashboard: tasks.json from HEARTBEAT.md checkboxes, calendar.json from git commits and cron schedules, memories.json from workspace memory files, team.json from cron job model stats, status.json from status update markdown.
CODE
How the API Routes Changed
Before: every API route had hardcoded absolute paths and shell commands.
    const heartbeatPath = "/Users/shawnos.ai/.openclaw/workspace/HEARTBEAT.md"
    execSync("git log --since=...", { cwd: "/Users/shawnos.ai/shawn-gtme-os" })
    const jobsPath = "/Users/shawnos.ai/.openclaw/cron/jobs.json"
After: every API route reads from a relative static JSON file.
    const dataPath = path.join(process.cwd(), "public/data/tasks.json")
    const data = JSON.parse(fs.readFileSync(dataPath, "utf8"))
    return data.tasks || []
No execSync. No absolute paths. No filesystem dependencies. Works anywhere. The generator script handles all the local filesystem access at build time so the production API never needs it.
PATTERN
Keeping Data Fresh
The generator runs in two places. First, prebuild in package.json runs it before next build, so every deploy gets fresh data. Second, a cron job on the local machine runs the generator, commits the new JSONs, and pushes to GitHub. The push triggers a Vercel deploy.
Data freshness depends on cron frequency. Every 30 minutes means the dashboard is never more than 30 minutes stale. For a status dashboard, that is plenty. You do not need WebSockets or real-time subscriptions. Cron plus git push plus rebuild is simple, reliable, and free.
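The two hooks look roughly like this. The script path (scripts/generate-data.mjs) and repo directory are hypothetical placeholders, not the article's actual file names.

```json
{
  "scripts": {
    "prebuild": "node scripts/generate-data.mjs",
    "build": "next build"
  }
}
```

And the local refresh loop as a crontab entry:

```
# Every 30 minutes: regenerate JSON, commit, push; the push triggers a Vercel rebuild.
# When nothing changed, git commit exits nonzero and the push is skipped, which is fine.
*/30 * * * * cd ~/dashboard && node scripts/generate-data.mjs && git commit -am "chore: refresh dashboard data" && git push
```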
PRO TIP
What This Actually Saved
Before: all Opus, all the time. 104 daily cron API calls at $75/M output. WhatsApp self-chat loop generating cascade replies. ~$50/day in API costs.
After: 96 cron calls moved to free local Ollama. 8 remaining API calls on Sonnet at $15/M. Content creation on Opus at 1-2 calls per day. Self-chat loop killed with 3s debounce. ~$0.50/day in API costs.
That is a 99% cost reduction. Same output quality where it matters. The dashboard works on Vercel. The team roster shows actual model stats from real cron job data. The tasks board reads real checkboxes from HEARTBEAT.md. No mock data. No localhost dependencies. Ship it.
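The debounce that killed the self-chat loop reduces to one gate. This is a sketch of the idea, not the actual WhatsApp handler; the message shape and names are illustrative.

```javascript
// Sketch: 3-second debounce against self-chat cascades. Ignore our own
// messages, and ignore anything arriving within 3s of our last outbound
// message (likely an echo of it). Names are illustrative.
const DEBOUNCE_MS = 3000;

function shouldReply(msg, lastOutboundAt, now = Date.now()) {
  if (msg.fromSelf) return false;                       // never reply to ourselves
  if (now - lastOutboundAt < DEBOUNCE_MS) return false; // likely an echo/cascade
  return true;
}
```

A one-function fix that took the cascade replies out of the bill entirely.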
ANTI-PATTERN
Common Mistakes
Using Opus for everything. The biggest money pit. Most tasks do not need frontier-level reasoning. A 14B local model handles structured data extraction just fine.
Reading local files from serverless functions. Your production server is not your laptop. Absolute paths will always fail on Vercel, AWS Lambda, or any cloud platform. Generate the data before deploy.
No fallback when data is missing. Always return empty arrays instead of crashing. If tasks.json does not exist yet, return []. The dashboard should render empty, not error out.
Running git commands in API routes. execSync("git log") works locally but fails in production where there is no git repo. Move it to build time.
Over-engineering the refresh. You do not need WebSockets for a status dashboard. Cron plus git push plus rebuild is simple and reliable.