Question 1

Why is there sometimes a 100x price difference between models?

Accepted Answer

Frontier models (Opus, GPT-4o) are priced at the top of the capability range. Budget-tier models (Haiku, GPT-4o-mini, Flash, DeepSeek) run on similar infrastructure but generally have smaller parameter counts, so each token is cheaper to serve. For many tasks, such as classification, simple extraction, and routing, the cheap models can match frontier quality at 5–100x lower cost.
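
As a rough illustration of that idea, here is a minimal sketch that sends the task types named in this answer to a budget tier and everything else to a frontier tier. The function name, the tier labels, and the example task "legal_analysis" are made up for illustration; they are not part of any provider's API.

```python
# Task types this answer names as good fits for budget-tier models.
BUDGET_TASKS = {"classification", "simple_extraction", "routing"}


def pick_tier(task_type: str) -> str:
    """Return 'budget' for the task types listed above, 'frontier' otherwise."""
    return "budget" if task_type in BUDGET_TASKS else "frontier"


print(pick_tier("classification"))  # budget
print(pick_tier("legal_analysis"))  # frontier (hypothetical task type)
```

In practice you would check quality on your own data before moving a task to the cheaper tier.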

Question 2

Do these prices include the free tier?

Accepted Answer

No. Most providers offer $5–$100 in free credits for new accounts, but this calculator shows per-token pricing only. Subtract any free credits when estimating your real first-month cost.
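
The adjustment is simple subtraction. The sketch below assumes the credits all apply to the provider you are using and are spent within the first month; the function name is hypothetical, the $82 and $2 figures are monthly costs from the calculator's table, and $5 is the low end of the credit range above.

```python
def first_month_cost(monthly_cost: float, free_credits: float) -> float:
    """Monthly cost after free credits, never below zero."""
    return max(0.0, monthly_cost - free_credits)


print(first_month_cost(82.0, 5.0))  # 77.0
print(first_month_cost(2.0, 5.0))   # 0.0 (credits cover the whole bill)
```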

Question 3

What about self-hosted open-source models?

Accepted Answer

The model weights for Llama 3, Qwen 2.5, and Mistral are free, but you pay for the GPU compute to run them. A typical rule of thumb: self-hosting breaks even with API pricing at roughly 10M–50M tokens/day, depending on model size. Below that volume, hosted APIs are cheaper.
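
To see where a break-even point like that comes from, here is a back-of-the-envelope sketch. It assumes a GPU that is billed around the clock and can actually serve the required throughput, and it ignores engineering and operations overhead. The $2/hour GPU rate and the $1 per 1M tokens API price are illustrative placeholders, not quotes from any provider.

```python
def breakeven_tokens_per_day(gpu_cost_per_hour: float,
                             api_price_per_million: float) -> float:
    """Daily token volume at which a 24/7 GPU costs the same as the API."""
    daily_gpu_cost = gpu_cost_per_hour * 24
    return daily_gpu_cost / api_price_per_million * 1_000_000


# Hypothetical inputs: $2/hour GPU, $1 per 1M tokens on a hosted API.
tokens = breakeven_tokens_per_day(2.0, 1.0)
print(f"{tokens:,.0f} tokens/day")  # 48,000,000 tokens/day
```

With these placeholder inputs the result lands near the top of the 10M–50M range; a cheaper GPU or a pricier API model moves the break-even point lower.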

Question 4

Does this include embedding costs?

Accepted Answer

No, only chat completion (generation) is included. Embedding models (text-embedding-3-small, text-embedding-3-large, Voyage, Cohere Embed) are priced separately, at roughly $0.02–$0.13 per 1M tokens.
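
If you need a quick embedding estimate alongside the calculator's numbers, the arithmetic is tokens divided by one million, times the per-1M price. The sketch below uses the two ends of the price range above; the 50M-token corpus size and the function name are assumptions for illustration.

```python
def embedding_cost(total_tokens: int, price_per_million: float) -> float:
    """Cost in dollars to embed total_tokens at a per-1M-token price."""
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical corpus of 50M tokens at the low and high ends of the range.
for price in (0.02, 0.13):
    print(f"${embedding_cost(50_000_000, price):.2f}")
# $1.00
# $6.50
```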

Model	Cost per call	Monthly cost
⭐ Gemini 2.5 Flash	<$0.01	$2
GPT-4o-mini	<$0.01	$5
DeepSeek V3	<$0.01	$9
Claude Haiku 4	<$0.01	$30
Gemini 2.5 Pro	<$0.01	$41
GPT-4o	$0.01	$82
Claude Sonnet 4	$0.01	$113
Claude Opus 4	$0.06	$566
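
The per-call column rounds to the nearest cent, which is why several models with a nonzero monthly bill show almost nothing per call. A minimal sketch of how the two columns relate, assuming the usual per-token billing model: the token counts, the call volume, and the $0.15 / $0.60 per 1M prices below are illustrative placeholders, not the inputs behind the table.

```python
def cost_per_call(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one call, given per-1M-token input and output prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def monthly_cost(per_call: float, calls_per_month: int) -> float:
    """Monthly bill for a fixed per-call cost and call volume."""
    return per_call * calls_per_month


# Hypothetical workload: 500 input + 200 output tokens, 10,000 calls/month.
per_call = cost_per_call(500, 200, 0.15, 0.60)
print(f"${per_call:.2f} per call")   # $0.00 when rounded to cents
print(f"${per_call:.6f} per call")   # $0.000195 unrounded
print(f"${monthly_cost(per_call, 10_000):.2f} per month")  # $1.95 per month
```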

LLM API Cost Calculator (GPT, Claude, Gemini, DeepSeek)
