LLM Cost Ledger

Unit economics for CakeAI · CSA · Lumi
v1.1
April 2026
USD → VND: 25,400₫

Models
01 Prompt & knowledge base

Paste actual content to get real token counts. Load a preset below to seed with production CSA / Lumi content.

02 Conversation shape
Advanced: reasoning / output cap
1.0 = off. 3–5× is typical for thinking models (Claude xhigh, GPT-5 reasoning, Gemini thinking).
Hard ceiling on response tokens/turn (after reasoning multiplier).
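The two controls above combine as: scale base output by the reasoning multiplier, then clip to the hard ceiling. A minimal sketch (the function name and the sample numbers are illustrative, not part of the tool):

```python
def effective_output_tokens(base_output: int, reasoning_mult: float, hard_cap: int) -> int:
    """Billed output tokens per turn: base output scaled by the reasoning
    multiplier (1.0 = off), then clipped to the hard ceiling."""
    return min(round(base_output * reasoning_mult), hard_cap)

# A thinking model at 4x reasoning, capped at 2,048 tokens/turn:
effective_output_tokens(600, 4.0, 2048)  # 2400 before the cap, billed as 2048
```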
03 Caching strategy

Cached tokens cost ~10% of base input rate. Write cost (1.25× input) amortized across conversations.

Cache write amortization
Write cost (1.25× base input) is amortized over this many conversations. Anthropic's 5-min TTL typically reaches ~20–100 conversations per write; the 1-hour TTL reaches ~200+.
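Putting the caching assumptions together, the per-conversation cost of cached tokens is the ~10% read rate plus the one-time 1.25× write spread over the amortization window. A sketch under those assumptions (rates and token counts below are hypothetical):

```python
def cached_input_cost_usd(tokens: int, input_rate_per_m: float,
                          amortize_over: int) -> float:
    """Per-conversation USD cost of tokens served from cache:
    reads at ~10% of the base input rate, plus the one-time write
    (1.25x base input) spread over `amortize_over` conversations."""
    read = tokens * 0.10 * input_rate_per_m / 1e6
    write = tokens * 1.25 * input_rate_per_m / 1e6 / amortize_over
    return read + write

# 50K cached prompt tokens at $3/M base input, write amortized over 100 conversations:
cached_input_cost_usd(50_000, 3.0, 100)  # 0.015 read + 0.001875 write = 0.016875
```

At 100+ conversations per write, the write cost is nearly negligible next to the read cost, which is why the TTL choice matters mainly at low traffic.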
04 LLM settings
CSA is conversational — batch not applicable.
05 Tokenizer calibration

GPT uses the exact o200k_base tokenizer in-browser. Non-GPT families apply a characters-per-token ratio calibrated for Vietnamese. Override any value below.

Vietnamese calibration ratios — editable
Uses the real tokenizer. ~2.7 chars/token on Vietnamese (measured).
Anthropic docs: 1.0–1.35×. claudecodecamp measurement: 1.32–1.47×. Vietnamese sits at the top of the range.
SentencePiece 256K vocab. English ~4.2, Chinese ~2.4. Vietnamese ~3.3.
Official: ~1.6 chars/token on Chinese. Vietnamese estimated ~2.5.
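The ratio-based fallback reduces to a single division: character count over the calibrated chars-per-token figure. A minimal sketch using the table's Vietnamese ratios (the sample sentence and function name are illustrative):

```python
def estimate_tokens(text: str, chars_per_token: float) -> int:
    """Approximate token count from a calibrated chars-per-token ratio."""
    return max(1, round(len(text) / chars_per_token))

vi_text = "Xin chào, tôi cần hỗ trợ về đơn hàng của mình."
estimate_tokens(vi_text, 2.7)  # GPT-family ratio (measured)
estimate_tokens(vi_text, 3.3)  # Gemini SentencePiece estimate
estimate_tokens(vi_text, 2.5)  # estimated ratio from the last row
```

Lower chars-per-token means more tokens for the same text, so the same prompt is effectively more expensive per character on the 2.5 ratio than on 3.3.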
06 Models to compare

Monthly cost · CSA

— conv/day × — turns

Model comparison · monthly spend

Model | Provider | Input $/M | Cache $/M | Output $/M | Tokens/conv | USD/month | VND/month
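The monthly figures in the table follow from per-conversation token cost times conversation volume, converted to VND at the header rate. A sketch, assuming the column semantics above (all rates and volumes below are placeholder values, not real prices):

```python
def monthly_cost_usd(input_rate: float, cache_rate: float, output_rate: float,
                     input_tok: int, cached_tok: int, output_tok: int,
                     conv_per_day: int, days: int = 30) -> float:
    """Monthly USD spend for one model row. Rates are USD per million tokens;
    token counts are per conversation."""
    per_conv = (input_tok * input_rate
                + cached_tok * cache_rate
                + output_tok * output_rate) / 1e6
    return per_conv * conv_per_day * days

USD_TO_VND = 25_400  # header rate

# Hypothetical row: $3/M input, $0.30/M cached, $15/M output, 200 conv/day
usd = monthly_cost_usd(3.0, 0.30, 15.0, 5_000, 50_000, 2_000, 200)  # 360.0
vnd = usd * USD_TO_VND
```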

Cost breakdown · cheapest selected model

Monthly projection by volume · USD

Monthly cost projection by volume

Sources & formulas. Prices verified April 2026 against vendor docs (platform.claude.com, openai.com/api/pricing, ai.google.dev/pricing, minimax.io). Hosted open-source prices via DeepInfra / SiliconFlow / OpenRouter. Claude 4.7 ratio (1.35× over 4.6) from Anthropic migration guide + claudecodecamp real-world measurement. Gemini SentencePiece Vietnamese extrapolated from CJK (~2.4 cpt) and English (~4.2 cpt) baselines. All ratios editable above — paste real usage.input_tokens values to calibrate precisely.