$ man how-to/credit-management
Cost Efficiency · beginner
Credit and Token Management
Understand what you are spending and where the tokens go
What Costs Money
Every AI interaction costs tokens. A token is roughly four characters of text. Every file Claude reads costs tokens (input). Every response Claude generates costs tokens (output). Every tool call Claude makes costs tokens (both input and output). The context window is the total token budget for a single interaction, and everything competes for it: the files you load, the chat history, rules, skills, and system instructions.
Understanding this changes how you interact with AI. A long chat history burns tokens on context that may no longer be relevant. Loading 10 files when you need 2 wastes tokens on irrelevant context. A 500-line CLAUDE.md consumes tokens every single session.
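The four-characters-per-token rule of thumb makes it easy to estimate costs before loading anything. A minimal sketch (real tokenizers vary, so treat the numbers as rough estimates, not exact counts):

```python
def estimate_tokens(text: str) -> int:
    """Back-of-the-envelope token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

# A 500-line CLAUDE.md at ~60 characters per line:
claude_md_chars = 500 * 60
print(estimate_tokens("x" * claude_md_chars))  # ~7,500 tokens, paid every session
```

Run the same estimate on a large data file before asking Claude to read it, and the cost of "just load everything" becomes concrete.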
PATTERN
Where Tokens Go
System instructions and configuration: CLAUDE.md, rules, skill files that load automatically. This is your baseline cost per session.
File reads: every file the agent reads to understand your codebase. Larger files cost more. Reading a 2,000-line data file costs more than reading a 50-line config.
Chat history: every previous message in the conversation. Long conversations accumulate context. Eventually the context window fills and older messages get truncated.
Agent output: code generation, explanations, tool calls. Longer outputs cost more tokens.
The biggest token sinks are usually file reads (loading large files) and chat history (long conversations). Keep files focused and start new sessions for new tasks rather than continuing a single session for hours.
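To see why file reads and chat history dominate, it helps to put illustrative numbers on the four sinks above. The figures below are assumptions for the sketch, not measurements:

```python
# Hypothetical per-session breakdown (illustrative numbers only):
session_tokens = {
    "file reads": 40_000,
    "chat history": 25_000,
    "agent output": 12_000,
    "system context (CLAUDE.md, rules)": 3_000,
}
total = sum(session_tokens.values())

# Print each sink with its share of the session budget, largest first.
for sink, tokens in sorted(session_tokens.items(), key=lambda kv: -kv[1]):
    print(f"{sink:<35} {tokens:>7,}  {tokens / total:5.1%}")
```

Even with different absolute numbers, the ranking tends to hold: reads and history swamp the always-loaded configuration.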
PRO TIP
Practical Strategies
Start new sessions for new tasks. A session about deploying your website does not need the chat history from your earlier session about writing a blog post. Fresh context means fewer wasted tokens.
Keep CLAUDE.md lean. Every line in CLAUDE.md costs tokens in every session. Move workflow instructions to skills (loaded on demand) and file patterns to rules (loaded conditionally).
Reference specific files instead of asking Claude to search. Saying "read website/packages/shared/data/clay-wiki.ts" costs less than saying "find the clay wiki data file" because the search requires reading multiple files.
Use fast models for simple tasks. Fast models cost roughly 3-5x less per token than capable models. If the task is mechanical, the cheaper model typically produces equally good results.
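The savings from model choice compound over a month of mechanical work. A sketch using assumed prices (the $15-per-million figure and the 4x ratio are illustrative placeholders within the rough 3-5x range above, not real rate-card numbers):

```python
# Assumed, illustrative prices per million output tokens:
capable_per_mtok = 15.00
fast_per_mtok = capable_per_mtok / 4  # middle of the rough 3-5x range

def cost(tokens: int, per_mtok: float) -> float:
    """Dollar cost of generating `tokens` output tokens at a given rate."""
    return tokens / 1_000_000 * per_mtok

output_tokens = 200_000  # say, a month of mechanical refactors
print(f"capable: ${cost(output_tokens, capable_per_mtok):.2f}")
print(f"fast:    ${cost(output_tokens, fast_per_mtok):.2f}")
```

The arithmetic is trivial, which is the point: routing mechanical tasks to the cheaper model is a one-line decision with a 4x payoff.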
FORMULA
The 80/20 of Token Budget
Eighty percent of your token budget goes to three things: file reads, chat history, and system context. Optimizing those three is the highest-leverage move.
File reads: be specific about which files to load. Do not say "read the entire data folder." Say "read how-to-wiki.ts."
Chat history: start fresh sessions for new topics. One focused session beats one marathon session.
System context: keep always-loaded context (CLAUDE.md, auto-rules) minimal. Move everything else to on-demand loading (skills, manual file references).
The remaining 20% is agent output. You cannot control how many tokens Claude uses to generate a response, but you can control how much context it has to process before generating. Less irrelevant context means faster, cheaper, and often better output.
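The three optimizations above can be sketched as a before/after comparison. All numbers are illustrative assumptions; only the output column stays fixed, because that is the part you do not control:

```python
# Illustrative session budgets (assumed numbers, not measurements).
before = {"file reads": 40_000, "chat history": 25_000,
          "system context": 8_000, "agent output": 15_000}
# After: specific file references, a fresh session, a lean CLAUDE.md.
after = {"file reads": 6_000, "chat history": 4_000,
         "system context": 2_000, "agent output": 15_000}

def total(budget: dict) -> int:
    return sum(budget.values())

saved = total(before) - total(after)
print(f"before: {total(before):,}  after: {total(after):,}  "
      f"saved: {saved / total(before):.0%}")
```

Trimming only the controllable sinks still cuts the bulk of the budget, which is the 80/20 claim in miniature.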