$ man context-wiki/multi-model-optimization

Infrastructure · advanced

Multi-Model AI Optimization

How to run 4 AI models for under $1/day with build-time static data


The Problem with One Model

Most people pick one AI model and use it for everything. That is like using a sledgehammer to hang a picture frame: it works, but you are overpaying and overbuilding. I was running Claude Opus for 104 daily cron API calls at $75/M output tokens, and a WhatsApp self-chat loop was generating cascade replies on top of that. The bill was ~$50/day for work that a local 14B model could handle. The fix was not to stop using AI. It was to route each task to the right model based on one question: does a human read the output?
PATTERN

The 4-Model Squad

1. Ollama / Qwen 2.5 14B (FREE, local)
Runs on a Mac Mini M4 Pro. Handles all repetitive cron work: commit tracking, RSS monitoring, dashboard data generation, status reports. These tasks run 4+ times per day. At Opus pricing, that was 96 API calls a day burning real money on structured data extraction that a 14B model handles fine. The M4 Pro with 24GB of unified memory runs it in ~9GB.

2. Claude Sonnet 4 ($15/M output)
All conversations, orchestration, and agent coordination. This is the main agent. Chat, WhatsApp, Discord, memory management. Sonnet is 5x cheaper than Opus but handles conversation just as well. Quality difference only matters for long-form content.

3. Claude Opus 4 ($75/M output)
Reserved for content creation only. Blog posts, Substack essays, LinkedIn drafts, deep analysis. Content creation is where model quality directly maps to output quality. You can feel the difference in a 2000-word essay. For a commit summary? Zero difference.

4. Claude Code / Opus 4.6 (FREE, Max subscription)
The infrastructure layer. Debugging, deployments, git operations, architecture decisions, quality review on what the other models ship. Unlimited usage via subscription. No per-token cost. This is where you do the heavy lifting.
FORMULA

The Decision Framework

One question decides the model: does the output quality matter to a human reader?

Cron jobs (tracking, monitoring, updates) -> Ollama local. Runs frequently, output is structured data, quality does not matter.
Conversations, routing, memory -> Sonnet. Good enough for real-time interaction, 5x cheaper than Opus.
Blog posts, essays, content -> Opus. Quality is the product, humans read this.
Infrastructure, debugging, deploys -> Claude Code. Free via subscription, needs full codebase context.

If no human reads the output, use the cheapest model that works. If a human reads the output, use the best model you can afford. If it touches infrastructure, use Claude Code.
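The framework above can be sketched as a lookup table. A minimal sketch; the task categories and model identifiers here are illustrative, not production names:

```javascript
// Routing rule: does a human read the output? Category names and
// model IDs are illustrative placeholders, not exact production values.
const ROUTES = {
  cron: "ollama/qwen2.5:14b",      // no human reads the output -> free local
  conversation: "claude-sonnet-4", // real-time interaction, good enough
  content: "claude-opus-4",        // quality is the product, humans read this
  infrastructure: "claude-code",   // free via subscription, full codebase context
};

function pickModel(task) {
  // Unknown category falls back to the cheapest model that works.
  return ROUTES[task.category] || ROUTES.cron;
}

console.log(pickModel({ category: "content" })); // claude-opus-4
```

The point is that the decision is a one-line lookup, not a judgment call made per request.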
PATTERN

Build-Time Static JSON Pattern

This is the key architectural pattern that makes the dashboards work on Vercel. The problem: API routes that read local files like ~/.openclaw/workspace/HEARTBEAT.md or run execSync("git log") work perfectly on localhost but break completely on Vercel. Neither the build server nor the serverless runtime can access your laptop's filesystem.

The solution: generate static JSON at build time. Commit the JSON. Vercel serves it.

A generator script reads all local data sources (markdown files, cron job configs, git log) and writes structured JSON to public/data/*.json. The API routes read from those JSON files instead of absolute paths. The generator runs in prebuild so every deploy gets fresh data. A cron job regenerates the data, commits, and pushes. The push triggers a Vercel rebuild with updated data.

Five JSON files cover the entire dashboard: tasks.json from HEARTBEAT.md checkboxes, calendar.json from git commits and cron schedules, memories.json from workspace memory files, team.json from cron job model stats, status.json from status update markdown.
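For example, the tasks.json builder could parse markdown checkboxes like this. The exact checkbox format in HEARTBEAT.md is an assumption:

```javascript
// Parse markdown checkboxes ("- [ ] task" / "- [x] task") into task
// objects. The checkbox format is assumed; adjust the regex to the
// real HEARTBEAT.md conventions.
function parseCheckboxes(markdown) {
  const tasks = [];
  for (const line of markdown.split("\n")) {
    const m = line.match(/^\s*[-*]\s*\[([ xX])\]\s+(.*)$/);
    if (m) tasks.push({ done: m[1].toLowerCase() === "x", title: m[2].trim() });
  }
  return tasks;
}
```

The other four builders follow the same pattern: read a local source, emit a flat JSON structure the dashboard can render directly.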
CODE

How the API Routes Changed

Before: every API route had hardcoded absolute paths and shell commands.

const heartbeatPath = "/Users/shawnos.ai/.openclaw/workspace/HEARTBEAT.md" // laptop-only path
execSync("git log --since=...", { cwd: "/Users/shawnos.ai/shawn-gtme-os" }) // no git repo in production
const jobsPath = "/Users/shawnos.ai/.openclaw/cron/jobs.json" // laptop-only path

After: every API route reads from a relative static JSON file.

// Inside the route handler, with fs and path imported at the top of the file.
const dataPath = path.join(process.cwd(), "public/data/tasks.json")
const data = JSON.parse(fs.readFileSync(dataPath, "utf8"))
return data.tasks || []

No execSync. No absolute paths. No filesystem dependencies. Works anywhere. The generator script handles all the local filesystem access at build time so the production API never needs it.
PATTERN

Keeping Data Fresh

The generator runs in two places. First, prebuild in package.json runs it before next build, so every deploy gets fresh data. Second, a cron job on the local machine runs the generator, commits the new JSONs, and pushes to GitHub. The push triggers a Vercel deploy.
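On the Vercel side, a "prebuild": "node scripts/generate-data.js" entry in package.json scripts covers the deploy path. The local side is a small cron-driven script, sketched here as a config fragment with illustrative paths, schedule, and commit message:

```sh
#!/usr/bin/env sh
# refresh-data.sh — run from cron, e.g.:
#   */30 * * * * $HOME/dashboard/scripts/refresh-data.sh
# Paths, file names, and the commit message are illustrative.
cd "$HOME/dashboard" || exit 1
node scripts/generate-data.js          # rebuild public/data/*.json
git add public/data
if ! git diff --cached --quiet; then   # commit only when data changed
  git commit -m "chore: refresh dashboard data"
  git push                             # push triggers the Vercel rebuild
fi
```

The diff check matters: committing unchanged data every 30 minutes would trigger pointless rebuilds and pollute git history.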

Data freshness depends on cron frequency. Every 30 minutes means the dashboard is never more than 30 minutes stale. For a status dashboard, that is plenty. You do not need WebSockets or real-time subscriptions. Cron plus git push plus rebuild is simple, reliable, and free.
PRO TIP

What This Actually Saved

Before: all Opus, all the time. 104 daily cron API calls at $75/M output. WhatsApp self-chat loop generating cascade replies. ~$50/day in API costs.

After: 96 cron calls moved to free local Ollama. 8 remaining API calls on Sonnet at $15/M. Content creation on Opus at 1-2 calls per day. Self-chat loop killed with 3s debounce. ~$0.50/day in API costs.

That is a 99% cost reduction. Same output quality where it matters. The dashboard works on Vercel. The team roster shows actual model stats from real cron job data. The tasks board reads real checkboxes from HEARTBEAT.md. No mock data. No localhost dependencies. Ship it.
ANTI-PATTERN

Common Mistakes

Using Opus for everything. The biggest money pit. Most tasks do not need frontier-level reasoning. A 14B local model handles structured data extraction just fine.

Reading local files from serverless functions. Your production server is not your laptop. Absolute paths will always fail on Vercel, AWS Lambda, or any cloud platform. Generate the data before deploy.

No fallback when data is missing. Always return empty arrays instead of crashing. If tasks.json does not exist yet, return []. The dashboard should render empty, not error out.

Running git commands in API routes. execSync("git log") works locally but fails in production where there is no git repo. Move it to build time.

Over-engineering the refresh. You do not need WebSockets for a status dashboard. Cron plus git push plus rebuild is simple and reliable.

knowledge guide: see "Agent" and "Vercel" in Knowledge

related entries: Model Selection · Cron Jobs · Deployments and Vercel · Skills