ChirperBench Scores

Overall Leaderboard

Click any column header to sort. Highlighted rows mark the highest score, the lowest latency, and the best score per second.
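
As a rough sketch of how the three highlights could be derived: this assumes each row carries a score and a latency in seconds, and that "score per second" means score divided by latency. The field names and sample values are hypothetical and are not taken from the leaderboard.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical row shape; the real leaderboard columns may differ.
    model: str
    score: float
    latency_s: float  # assumed to be strictly positive


def highlights(rows):
    """Return the model name that wins each highlighted category."""
    return {
        "highest_score": max(rows, key=lambda r: r.score).model,
        "lowest_latency": min(rows, key=lambda r: r.latency_s).model,
        # Assumption: score per second = score / latency in seconds.
        "best_score_per_second": max(
            rows, key=lambda r: r.score / r.latency_s
        ).model,
    }


# Made-up sample data, for illustration only.
rows = [
    Row("model-a", 90.0, 10.0),
    Row("model-b", 80.0, 4.0),
    Row("model-c", 60.0, 2.0),
]
result = highlights(rows)
```

With these sample rows, one model can win more than one category, so the same row may be highlighted for several reasons.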

Model Metrics

Judge issues are model-output mistakes found by the judge. One result can have multiple issues; run failures are shown separately as statuses.
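
The distinction above can be pictured with a small data model. This is only a sketch under assumed names: the `Result` class, the status values, and the issue labels are hypothetical and do not come from ChirperBench itself.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Result:
    # Hypothetical record shape for one model run on one case.
    case_id: str
    status: str  # assumed values: "ok" or "run_failed"
    issues: list = field(default_factory=list)  # judge issues, zero or more


def summarize(results):
    """Count judge issues and run statuses separately."""
    # Judge issues are tallied only for runs that produced output.
    issue_counts = Counter(
        issue for r in results if r.status == "ok" for issue in r.issues
    )
    # Run failures are reported as statuses, not as issues.
    status_counts = Counter(r.status for r in results)
    return issue_counts, status_counts


# Made-up sample data, for illustration only.
results = [
    Result("case-1", "ok", ["missed_word", "wrong_speaker"]),
    Result("case-2", "ok"),
    Result("case-3", "run_failed"),
]
issue_counts, status_counts = summarize(results)
```

Keeping the two tallies apart means a failed run never inflates the issue count, and a single result with several issues contributes once per issue.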

Telemetry Graphs

Outcome Graphs

Pass / Fail by Transcript Category
Judge Issues by Type
Issue Type by Severity

Case Matrix

Compare Outputs

Detailed Results