Free Tool

Compare LLM costs & performance

Select up to 3 models. Enter your usage. See real-time cost, latency, and quality comparisons.

1 — Select Models

2 — Your Usage

How many LLM calls per month

Average tokens per input prompt

Average tokens per output response
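The page does not state how it turns these three inputs into a cost figure. A minimal sketch of one common approach follows, assuming pricing is quoted in USD per million tokens and billed separately for input and output. The `ModelPrice` type, the `monthly_cost_usd` helper, the model names, and all prices are hypothetical placeholders, not values from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price record; rates are USD per one million tokens."""
    name: str
    input_usd_per_mtok: float
    output_usd_per_mtok: float


def monthly_cost_usd(price: ModelPrice,
                     calls_per_month: int,
                     avg_input_tokens: int,
                     avg_output_tokens: int) -> float:
    """Estimate monthly spend from the three usage inputs above."""
    # Total tokens sent and received over the month.
    input_tokens = calls_per_month * avg_input_tokens
    output_tokens = calls_per_month * avg_output_tokens
    # Input and output tokens are assumed to be billed at different rates.
    total = (input_tokens * price.input_usd_per_mtok
             + output_tokens * price.output_usd_per_mtok)
    return total / 1_000_000


# Placeholder models and prices, for illustration only.
models = [
    ModelPrice("model-a", input_usd_per_mtok=1.00, output_usd_per_mtok=4.00),
    ModelPrice("model-b", input_usd_per_mtok=0.25, output_usd_per_mtok=1.00),
]

for m in models:
    cost = monthly_cost_usd(m, calls_per_month=100_000,
                            avg_input_tokens=500, avg_output_tokens=200)
    print(f"{m.name}: ${cost:,.2f}/month")
```

With these placeholder rates, 100,000 calls at 500 input and 200 output tokens each come to $130.00 per month for `model-a` and $32.50 for `model-b`. Real estimates would also need to account for anything the sketch ignores, such as cached-input discounts or batch pricing.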

Results — update automatically

Data sources

OpenAI API pricing (2025)
Anthropic API pricing (2025)
Google AI pricing (2025)
Berkeley Function-Calling Leaderboard
LM Arena benchmarks
SWE-bench scores

Want continuous monitoring?

Benchwright runs your evaluation suite daily, tracks regressions, and alerts you before bad deploys reach production.
