Why I Stopped Paying for API Calls and Built My Own AI Chat

Q: how do you build an AI chat without API costs?

spawn `claude -p` as a child process from a Next.js API route. pipe the JSON stream back to the browser as server-sent events. point a Cloudflare Tunnel at your local machine. that's the entire backend. one afternoon to build, zero ongoing API costs.

Shawn Tenam·Feb 22, 2026·5 min read·context-engineering

tl;dr: I was spending $50-100/day on API calls for a separate AI chat platform. Then I realized Claude Code Max already gives unlimited CLI access. One afternoon later, I had a PWA chat interface powered by claude -p with zero marginal cost. The repo is the system.

why did I stop paying for API calls?

two weeks ago I started building ShawnOS. the repo. a monorepo with three Next.js sites, agent skills, content pipelines, RPG progression. the operating system for running a one-person GTM engine.

a week ago I stood up Nio on OpenClaw. OpenClaw is GPT-based. the idea was a separate chat system for my AI agent. Nio would run cron jobs, write blog posts, manage content pipelines, update dashboards. real infrastructure, not a toy.

and it worked. but the API costs added up fast. Sonnet for daily ops. Opus for anything that needed real thinking. $50 to $100 a day depending on how much I was building. and I was always building. it was absolute madness.

I routed high-frequency crons to a local Ollama model (Qwen 2.5 14B). free, fast, good enough for commit tracking and status updates. but for anything that required actual intelligence... you need Claude.

then the real insight hit.

how does the repo-as-magic pattern work?

I was running two separate systems. OpenClaw for chat. the repo for everything else. but the models I actually depend on. Opus and Sonnet. they already speak with my repo. they amplify my voice and DNA in a way GPT can't. they read my soul files, my commit history, my content pipeline. they don't just respond. they compound.

in another world, OpenClaw would have been built on Anthropic instead of GPT. but it wasn't. and that gap made the answer obvious: you don't need two separate systems. the repo IS the system.

Claude Code Max. $200/month flat. unlimited CLI access via claude -p. I was already paying for it. using it every day to build the repo. never occurred to me that the same CLI that powers my coding workflow could power a chat interface.

claude -p lets you send a prompt, get a response, stream JSON output, and resume sessions. all from the terminal. all included in the subscription. no API key. no per-token billing. no usage caps.

the recursive nature of it is what makes it work. Claude builds the system that Claude powers. the model that writes the code is the model that runs inside it. that's not a cost optimization. that's a flywheel.

what did the build look like?

Next.js app. one API route. spawn claude -p as a child process. pipe the JSON stream back to the browser as server-sent events.

that's it. that's the entire backend.

the client is an iMessage-style PWA. dark theme, monospace font, typing indicators. send a message, get a streaming response. session IDs persist across conversations so Nio remembers context.

Cloudflare Tunnel pointed at my Mac Mini. now I text my AI from my phone. anywhere. zero API costs. zero infrastructure costs beyond the tunnel (free tier).

total time from idea to working PWA... one afternoon.

what is the soul file pattern?

here's where it gets interesting from an architecture perspective. Claude CLI has a flag called --append-system-prompt-file. point it at a markdown file and that file becomes part of the system prompt.

I wrote nio-soul.md. defines Nio's personality, capabilities, anti-slop rules, decision-making framework. everything that makes Nio... Nio. not a chatbot. infrastructure with opinions.

which means adding a new agent is just writing a new markdown file.

how does multi-agent expansion work?

one CLI. different soul files. different personalities. separate sessions.

that's what ShawnOS Chat is building towards. a multi-agent platform where each agent has its own personality file, accent color, bubble colors, and isolated session state. switch between agents in the UI. each one picks up where they left off.

Nio handles ops and infrastructure. an Architect agent handles system design. a Writer agent handles content in my voice. same Claude CLI underneath. same zero marginal cost.

the per-agent state lives in localStorage on the client and file-based memory on the server. each agent gets their own MEMORY.md, their own heartbeat file, their own daily snapshots. lightweight, portable, no database needed.

why isn't the code the IP?

anyone can spawn a CLI process. the pattern is what matters.

CLI-as-backend for personal AI infrastructure. session isolation per agent. soul files for personality injection. file-based memory systems. zero-marginal-cost architecture that scales with your subscription, not your usage.

this is the kind of thing that used to require a custom API integration, a database, authentication middleware, and a monthly cloud bill. now it requires a markdown file and a Next.js route.

what does this mean for builders?

if you're paying per-token for personal AI tooling and you have a Claude Code Max subscription... you're leaving money on the table.

the CLI is the API. your subscription is the infrastructure budget. everything else is just plumbing.

but more than that. if you're running your AI on a platform that doesn't speak with your codebase, you're building two systems when you only need one. the model that builds your infrastructure should be the model that powers it. that's not a shortcut. that's the architecture.

this is part of ShawnOS. the operating system I'm building in public for running a one-person GTM engine. started building the repo two weeks ago. already compounding.

frequently asked questions

how much do AI API calls cost? it depends on the model and usage volume. Sonnet and Opus API calls can run $50-200/day if you're building seriously. that's $1,500-6,000/month. the per-token model makes costs unpredictable and hard to cap.

can you use Claude without paying for API calls? yes. Claude Code Max at $200/month gives you unlimited CLI access via claude -p. no per-token billing. no usage caps. if you're already on the subscription, you can build a chat interface on top of the CLI with zero marginal cost.

what is the soul file pattern? a markdown file that defines an AI agent's personality, capabilities, and decision rules. you pass it to Claude Code with the --append-system-prompt-file flag. different soul files create different agents. same CLI, different personality, different job.

how do you build an AI chat without API costs? spawn claude -p as a child process from a Next.js API route. pipe the JSON stream back to the browser as server-sent events. point a Cloudflare Tunnel at your local machine. that's the entire backend. one afternoon to build, zero ongoing API costs.

keep reading

build yours.