- Model: GPT-4o
- Behavior: Most advanced multimodal model with strong vision capabilities and a 128K-token context window.
- Usage: Faster and cheaper than GPT-4 Turbo; suited to text, vision, and other advanced use cases.
- Latency: Optimized for low-latency performance.
- Pricing (a worked cost example follows this model's entry):
- Input Tokens: $2.50 / 1M ($1.25 cached or via Batch API)
- Output Tokens: $10.00 / 1M ($5.00 via Batch API; prompt caching applies to input tokens only)
- Special Variants:
- Audio Preview: audio tokens billed at $100 / 1M input and $200 / 1M output
- Older versions maintain the same pricing.
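With the rates above, per-request cost is a simple linear calculation. Below is a minimal sketch of that arithmetic; the token counts and the `estimate_cost` helper are illustrative, not part of any SDK:

```python
# Illustrative cost estimate for a single GPT-4o request at the rates listed above.
# In practice, read the token counts from the API response's `usage` field.

GPT_4O_INPUT_PER_1M = 2.50         # USD per 1M uncached input tokens
GPT_4O_CACHED_INPUT_PER_1M = 1.25  # USD per 1M cached input tokens
GPT_4O_OUTPUT_PER_1M = 10.00       # USD per 1M output tokens

def estimate_cost(input_tokens: int, cached_input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    uncached = input_tokens - cached_input_tokens
    return (
        uncached * GPT_4O_INPUT_PER_1M / 1_000_000
        + cached_input_tokens * GPT_4O_CACHED_INPUT_PER_1M / 1_000_000
        + output_tokens * GPT_4O_OUTPUT_PER_1M / 1_000_000
    )

# Example: 10,000 input tokens (2,000 of them served from the prompt cache)
# and 1,200 output tokens.
print(f"≈ ${estimate_cost(10_000, 2_000, 1_200):.4f}")  # ≈ $0.0345
```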
- Model: GPT-4o mini
- Behavior: Cost-efficient, smaller model; vision-enabled with a 128K-token context window.
- Usage: Designed for budget-sensitive workloads; smarter and cheaper than GPT-3.5 Turbo.
- Latency: Low latency and light resource requirements; optimized for quick responses.
- Pricing:
- Input Tokens: $0.15 / 1M ($0.075 cached or via Batch API)
- Output Tokens: $0.60 / 1M ($0.30 via Batch API)
- Model: o1-preview
- Behavior: Advanced reasoning model for complex tasks, 128K-token context window.
- Usage: Designed for complex reasoning and multi-step decision-making tasks.
- Latency: Higher, because the model generates internal reasoning tokens before responding.
- Pricing:
- Input Tokens: $15.00 / 1M ($7.50 cached)
- Output Tokens: $60.00 / 1M
- Note: Billed output tokens include internal reasoning tokens that are not visible in API responses (see the sketch below).
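A minimal sketch of how those hidden reasoning tokens show up in billing, assuming the `openai` Python SDK and that the response's usage object exposes `completion_tokens_details.reasoning_tokens` (as recent SDK versions do):

```python
# Minimal sketch: reasoning tokens are billed as output even though the text
# they produce is never returned. Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)

usage = response.usage
details = usage.completion_tokens_details
print("visible output tokens:  ", usage.completion_tokens - details.reasoning_tokens)
print("hidden reasoning tokens:", details.reasoning_tokens)
print("billed output tokens:   ", usage.completion_tokens)  # billed at $60.00 / 1M
```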
- Model: o1-mini
- Behavior: Fast, cost-efficient reasoning model optimized for coding, math, and science tasks, 128K-token context window.
- Usage: Tailored for technical use cases that require advanced reasoning.
- Latency: Faster than o1-preview, as it is a smaller model.
- Pricing (see the cross-model comparison after this block):
- Input Tokens: $3.00 / 1M ($1.50 cached)
- Output Tokens: $12.00 / 1M
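To compare the four tiers on the same workload, here is a small sketch using the uncached rates listed above; the model identifiers and the 50K-input / 5K-output workload are illustrative assumptions:

```python
# Per-request cost of an identical workload across the models in this section,
# using the uncached per-1M-token rates listed above.
RATES = {                      # (input USD / 1M, output USD / 1M)
    "gpt-4o":      (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "o1-preview":  (15.00, 60.00),
    "o1-mini":     (3.00, 12.00),
}

INPUT_TOKENS, OUTPUT_TOKENS = 50_000, 5_000  # made-up workload for the example

for model, (in_rate, out_rate) in RATES.items():
    cost = INPUT_TOKENS * in_rate / 1_000_000 + OUTPUT_TOKENS * out_rate / 1_000_000
    print(f"{model:12s} ${cost:.4f}")
```

Note that for the o1-family models the billed output count also includes the hidden reasoning tokens, so real costs for a given visible answer run higher than this estimate.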
- Batch API: 50% discount on batched requests; responses are returned within 24 hours (see the submission sketch after these notes).
- Prompt Caching: Cached prompts cost 50% less than uncached.
- Audio Costs: audio tokens cost $100 / 1M input and $200 / 1M output for audio tasks.
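A minimal Batch API submission sketch, assuming the `openai` Python SDK and a pre-built `requests.jsonl` file of chat completion requests (the file name and contents are illustrative):

```python
# Submit a batch of requests at the 50% Batch API discount; results are
# returned within the 24-hour completion window.
from openai import OpenAI

client = OpenAI()

# 1. Upload the JSONL file of requests (one request object per line).
batch_file = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

# 2. Create the batch job pointing at the uploaded file.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# 3. Poll later; once batch.status is "completed", download batch.output_file_id.
print(batch.id, batch.status)
```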