All Tasks (15)
Summarization (5)
Classification (5)
Code Generation (5)
Your API key is sent directly to the provider and is never stored.

Evaluating model...

Running tasks, scoring outputs, and calculating metrics.

This takes 15 to 60 seconds, depending on the model.

Evaluation Results

Overall Score

Category Breakdown

Individual Tasks

Markdown Report

Recent Evaluations