LLM Cost Ledger

Unit economics for CakeAI · CSA · Lumi
v1.1
April 2026
USD → VND: 25,400₫

Models
01 Prompt & knowledge base

Paste actual content to get real token counts. Load a preset below to seed with production CSA / Lumi content.

02 Conversation shape
Advanced: reasoning / output cap
1.0 = off. 3–5× is typical for thinking models (Claude xhigh, GPT-5 reasoning, Gemini thinking).
Hard ceiling on response tokens/turn (after reasoning multiplier).
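The two controls above combine as: scale base output by the reasoning multiplier, then clip to the hard ceiling. A minimal sketch (the function name and the sample numbers are illustrative, not part of the tool):

```python
def effective_output_tokens(base_output: int, reasoning_mult: float, hard_cap: int) -> int:
    """Billed output tokens per turn: base output scaled by the reasoning
    multiplier (1.0 = off), then clipped to the hard ceiling."""
    return min(round(base_output * reasoning_mult), hard_cap)

# A thinking model at 4x reasoning, capped at 2,048 tokens/turn:
effective_output_tokens(600, 4.0, 2048)  # 2400 before the cap, billed as 2048
```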
03 Caching strategy

Cached tokens cost ~10% of base input rate. Write cost (1.25× input) amortized across conversations.

Cache write amortization
Write cost (1.25× base input) is amortized over this many conversations. Anthropic's 5-min TTL typically reaches ~20–100 conversations per write; the 1-hour TTL reaches ~200+.
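Putting the caching assumptions together, the per-conversation cost of cached tokens is the ~10% read rate plus the one-time 1.25× write spread over the amortization window. A sketch under those assumptions (rates and token counts below are hypothetical):

```python
def cached_input_cost_usd(tokens: int, input_rate_per_m: float,
                          amortize_over: int) -> float:
    """Per-conversation USD cost of tokens served from cache:
    reads at ~10% of the base input rate, plus the one-time write
    (1.25x base input) spread over `amortize_over` conversations."""
    read = tokens * 0.10 * input_rate_per_m / 1e6
    write = tokens * 1.25 * input_rate_per_m / 1e6 / amortize_over
    return read + write

# 50K cached prompt tokens at $3/M base input, write amortized over 100 conversations:
cached_input_cost_usd(50_000, 3.0, 100)  # 0.015 read + 0.001875 write = 0.016875
```

At 100+ conversations per write, the write cost is nearly negligible next to the read cost, which is why the TTL choice matters mainly at low traffic.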
04 LLM settings
CSA is conversational — batch not applicable.
05 Tokenizer calibration

GPT uses the exact o200k_base tokenizer in-browser. Non-GPT families apply a characters-per-token ratio calibrated for Vietnamese. Override any value below.

Vietnamese calibration ratios — editable
Uses the real tokenizer. ~2.7 chars/token on Vietnamese (measured).
Anthropic docs: 1.0–1.35×. claudecodecamp measurement: 1.32–1.47×. Vietnamese sits at the top of the range.
SentencePiece 256K vocab. English ~4.2, Chinese ~2.4. Vietnamese ~3.3.
Official: ~1.6 chars/token on Chinese. Vietnamese estimated ~2.5.
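The ratio-based fallback reduces to a single division: character count over the calibrated chars-per-token figure. A minimal sketch using the table's Vietnamese ratios (the sample sentence and function name are illustrative):

```python
def estimate_tokens(text: str, chars_per_token: float) -> int:
    """Approximate token count from a calibrated chars-per-token ratio."""
    return max(1, round(len(text) / chars_per_token))

vi_text = "Xin chào, tôi cần hỗ trợ về đơn hàng của mình."
estimate_tokens(vi_text, 2.7)  # GPT-family ratio (measured)
estimate_tokens(vi_text, 3.3)  # Gemini SentencePiece estimate
estimate_tokens(vi_text, 2.5)  # estimated ratio from the last row
```

Lower chars-per-token means more tokens for the same text, so the same prompt is effectively more expensive per character on the 2.5 ratio than on 3.3.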
06 Models to compare

Monthly cost · CSA

— conv/day × — turns

Model comparison · monthly spend

Model | Provider | Input $/M | Cache $/M | Output $/M | Tokens/conv | USD/month | VND/month
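The monthly figures in the table follow from per-conversation token cost times conversation volume, converted to VND at the header rate. A sketch, assuming the column semantics above (all rates and volumes below are placeholder values, not real prices):

```python
def monthly_cost_usd(input_rate: float, cache_rate: float, output_rate: float,
                     input_tok: int, cached_tok: int, output_tok: int,
                     conv_per_day: int, days: int = 30) -> float:
    """Monthly USD spend for one model row. Rates are USD per million tokens;
    token counts are per conversation."""
    per_conv = (input_tok * input_rate
                + cached_tok * cache_rate
                + output_tok * output_rate) / 1e6
    return per_conv * conv_per_day * days

USD_TO_VND = 25_400  # header rate

# Hypothetical row: $3/M input, $0.30/M cached, $15/M output, 200 conv/day
usd = monthly_cost_usd(3.0, 0.30, 15.0, 5_000, 50_000, 2_000, 200)  # 360.0
vnd = usd * USD_TO_VND
```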

Cost breakdown · cheapest selected model

Monthly projection by volume · USD

Monthly cost projection by volume

Sources & formulas. Prices verified April 2026 against vendor docs (platform.claude.com, openai.com/api/pricing, ai.google.dev/pricing, minimax.io). Hosted open-source prices via DeepInfra / SiliconFlow / OpenRouter. Claude 4.7 ratio (1.35× over 4.6) from Anthropic migration guide + claudecodecamp real-world measurement. Gemini SentencePiece Vietnamese extrapolated from CJK (~2.4 cpt) and English (~4.2 cpt) baselines. All ratios editable above — paste real usage.input_tokens values to calibrate precisely.