Full monthly cost breakdown
| Model | $/call | Monthly cost |
|---|---|---|
| ⭐ Gemini 2.5 Flash | $0.00 | $2 |
| GPT-4o-mini | $0.00 | $5 |
| DeepSeek V3 | $0.00 | $9 |
| Claude Haiku 4 | $0.00 | $30 |
| Gemini 2.5 Pro | $0.00 | $41 |
| GPT-4o | $0.01 | $82 |
| Claude Sonnet 4 | $0.01 | $113 |
| Claude Opus 4 | $0.06 | $566 |
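
A per-call figure of $0.00 is a rounding artifact: the cost is under a cent, but it still adds up over a month. The sketch below shows the arithmetic. The token counts, per-million-token prices, and call volume are hypothetical placeholders, not the inputs behind the table above.

```python
def cost_per_call(input_tokens: int, output_tokens: int,
                  in_price_per_mtok: float, out_price_per_mtok: float) -> float:
    """Dollar cost of one call, given prices per one million tokens."""
    total = input_tokens * in_price_per_mtok + output_tokens * out_price_per_mtok
    return total / 1_000_000


def monthly_cost(per_call: float, calls_per_month: int) -> float:
    """Monthly spend for a fixed per-call cost and call volume."""
    return per_call * calls_per_month


# Hypothetical workload and prices, for illustration only.
per_call = cost_per_call(input_tokens=1_000, output_tokens=500,
                         in_price_per_mtok=0.50, out_price_per_mtok=2.00)
month = monthly_cost(per_call, calls_per_month=10_000)

print(f"per call, rounded to cents: ${per_call:.2f}")
print(f"per call, unrounded:        ${per_call:.4f}")
print(f"per month at 10,000 calls:  ${month:.2f}")
```

With these made-up inputs, a call that displays as $0.00 still costs about $15 a month, which is why the monthly column is the one to compare.
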
How to actually lower your bill
- Prompt caching (Anthropic): cache hits are billed at 90% off the normal input price. Put the static system prompt first and the dynamic user input last, so the cacheable prefix stays identical across calls.
- Prompt caching (OpenAI): 50% off for cache hits. It is automatic: no code change is needed for prompts of 1,024 tokens or more.
- Batch API: 50% off at both OpenAI and Anthropic, with results returned within 24 hours. Well suited to daily report generation, async summarization, and evaluation runs.
- Model laddering: use a cheap model (Haiku/Flash/mini) for triage and an expensive one (Opus/4o/Pro) only when needed. Often cuts cost 70–90%.
- Output tokens are 3–5x more expensive than input tokens. Cap response length aggressively.
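
To see how these levers combine, here is a rough Python sketch of the arithmetic. The function names, the $3.00 base price, the 80% cache hit rate, the $0.001 cheap-model cost, and the 10% escalation rate are all assumptions chosen for illustration. Only the discount percentages come from the list above, and the $0.06 expensive-model cost from the Claude Opus 4 row of the table.

```python
# Sketch of the savings arithmetic. Prices and rates are hypothetical
# placeholders unless noted otherwise.

def blended_input_price(base_price: float, hit_rate: float, discount: float) -> float:
    """Effective input price when `hit_rate` of input tokens are cache hits.

    Ignores any surcharge a provider may apply when writing to the cache.
    """
    cached_price = base_price * (1 - discount)
    return hit_rate * cached_price + (1 - hit_rate) * base_price


def batch_cost(cost: float, discount: float = 0.5) -> float:
    """Cost of a job sent through a batch API at the given discount."""
    return cost * (1 - discount)


def laddered_cost(cheap: float, expensive: float, escalation_rate: float) -> float:
    """Average per-call cost when every call hits the cheap model first
    and `escalation_rate` of calls are retried on the expensive model."""
    return cheap + escalation_rate * expensive


# Hypothetical base input price of $3.00 per 1M tokens, 80% cache hit rate.
price_90_off = blended_input_price(3.00, hit_rate=0.8, discount=0.90)
price_50_off = blended_input_price(3.00, hit_rate=0.8, discount=0.50)

# Hypothetical $100 job moved to a batch API.
batched = batch_cost(100.0)

# $0.06 per call for the expensive model, hypothetical $0.001 for the cheap one.
ladder = laddered_cost(cheap=0.001, expensive=0.06, escalation_rate=0.10)
ladder_saving = 1 - ladder / 0.06

print(f"input price, 90% cache discount: ${price_90_off:.2f} per 1M tokens")
print(f"input price, 50% cache discount: ${price_50_off:.2f} per 1M tokens")
print(f"batched job: ${batched:.2f}")
print(f"laddered cost per call: ${ladder:.4f} ({ladder_saving:.0%} saved)")
```

Under these assumptions, escalating one call in ten saves roughly 88% compared with sending everything to the expensive model, in line with the 70–90% range quoted above. The saving shrinks quickly as the escalation rate rises, so it is worth measuring that rate on real traffic.
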