| Model | Input | Output | Cache Read | Quant | Context |
|---|---|---|---|---|---|
| openai/gpt-oss-20b | $0.07 | $0.25 | — | mxfp4 | 131,072 |
| openai/gpt-oss-120b | $0.09 | $0.36 | — | mxfp4 | 131,072 |
| google/gemma-4-26B-A4B-it | $0.15 | $0.60 | $0.015 | managed | 262,144 |
| Qwen/Qwen3-Next-80B-A3B-Instruct | $0.15 | $1.20 | — | managed | 262,144 |
| Qwen/Qwen3-Next-80B-A3B-Thinking | $0.15 | $1.20 | — | managed | 262,144 |
| deepseek-ai/DeepSeek-V4-Flash | $0.19 | $0.51 | — | managed | 1,048,576 |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | $0.22 | $0.88 | — | managed | 262,144 |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | $0.22 | $1.80 | $0.022 | managed | 262,144 |
| MiniMaxAI/MiniMax-M2 | $0.30 | $1.20 | $0.03 | managed-fp8 | 196,608 |
| deepseek-ai/DeepSeek-V3.2 | $0.56 | $1.68 | $0.056 | managed | 163,840 |
| deepseek-ai/DeepSeek-V3.1 | $0.60 | $1.70 | $0.06 | managed | 163,840 |
| moonshotai/Kimi-K2.5 | $0.60 | $3.00 | — | managed | 262,144 |
| moonshotai/kimi-k2-thinking | $0.60 | $2.50 | $0.06 | managed-int4 | 262,144 |
| zai-org/GLM-4.7 | $0.60 | $2.20 | — | managed | 200,000 |
| meta-llama/Llama-3.3-70B-Instruct | $0.72 | $0.72 | — | managed | 128,000 |
| moonshotai/Kimi-K2.6 | $0.95 | $4.00 | — | managed | 262,144 |
| zai-org/GLM-5 | $1.00 | $3.20 | $0.10 | managed | 200,000 |
| deepseek-ai/DeepSeek-R1-0528 | $1.35 | $5.40 | — | managed | 163,840 |
| zai-org/GLM-5.2 | $1.49 | $4.62 | $0.27 | managed | 262,144 |
| deepseek-ai/DeepSeek-V4-Pro | $1.74 | $3.48 | — | managed | 1,048,576 |

## How billing works

- Input tokens are counted from the rendered prompt after applying the model’s chat template.
- Output tokens include generated text. For reasoning models, `reasoning_content` tokens count toward output.
- Cache read tokens are cached input tokens reported by the backend. They appear in pricing only for models with non-zero cache-read rates.
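
The billing rules above can be turned into a rough per-request cost estimate. The sketch below is illustrative only and rests on several assumptions that the table does not state: that rates are USD per 1M tokens, that cached tokens are a subset of the input tokens, and that cached tokens are billed at the cache-read rate in place of the input rate when a model has one. The `Rates` class, the `estimate_cost` function, and all parameter names are hypothetical, not part of any published API.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: table rates are USD per 1M tokens (the unit is not stated in the table).
TOKENS_PER_UNIT = 1_000_000


@dataclass(frozen=True)
class Rates:
    input: float
    output: float
    cache_read: Optional[float] = None  # None where the table shows a dash


# Two rows copied from the pricing table.
RATES = {
    "openai/gpt-oss-20b": Rates(input=0.07, output=0.25),
    "MiniMaxAI/MiniMax-M2": Rates(input=0.30, output=1.20, cache_read=0.03),
}


def estimate_cost(
    model: str,
    input_tokens: int,
    generated_tokens: int,
    reasoning_tokens: int = 0,
    cached_tokens: int = 0,
) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    rates = RATES[model]

    # Reasoning tokens count toward output, alongside the generated text.
    output_tokens = generated_tokens + reasoning_tokens

    if rates.cache_read is None:
        # ASSUMPTION: without a cache-read rate, all input is billed at the input rate.
        input_cost = input_tokens * rates.input
    else:
        # ASSUMPTION: cached input is billed at the cache-read rate instead of the input rate.
        uncached = input_tokens - cached_tokens
        input_cost = uncached * rates.input + cached_tokens * rates.cache_read

    return (input_cost + output_tokens * rates.output) / TOKENS_PER_UNIT


if __name__ == "__main__":
    cost = estimate_cost(
        "MiniMaxAI/MiniMax-M2",
        input_tokens=10_000,
        generated_tokens=500,
        reasoning_tokens=1_500,
        cached_tokens=4_000,
    )
    print(f"{cost:.5f}")  # prints 0.00432
```

Under these assumptions, the same request against `openai/gpt-oss-20b` ignores `cached_tokens` for pricing, since that model has no cache-read rate listed.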