Compare Models
A side-by-side comparison across the emotional intelligence dimensions we measured.
| Metric | Claude Haiku 4.5 | Claude Opus 4.6 |
|---|---|---|
| Composite Score (overall weighted score) | 52.8 | 54.3 |
| Emotion F1 | 13.6% | 13.8% |
| VA Score | 27.6% | 25.0% |
| Binary Acc | 83.2% | 84.4% |
| Pairwise Acc | 56.0% | 63.7% |
| Four-Branch | 79.8% | 76.3% |
| PANAS | 88.5% | 90.8% |
| Q1 Goals | 70.0% | 61.0% |
| Q3 Fit | 37.0% | 45.5% |
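The side-by-side comparison rescales each metric within its own range so dissimilar scales (composite points vs. percentages) can share an axis. A minimal min-max sketch of that idea, using the values from the table above; the benchmark's exact scaling is an assumption, and with only two models each metric simply maps one model to 0 and the other to 1:

```python
# Scores from the table above (percent signs dropped; units don't matter
# because each metric is normalised within its own range).
scores = {
    "Claude Haiku 4.5": {"Composite": 52.8, "Emotion F1": 13.6, "VA Score": 27.6,
                         "Binary Acc": 83.2, "Pairwise Acc": 56.0,
                         "Four-Branch": 79.8, "PANAS": 88.5,
                         "Q1 Goals": 70.0, "Q3 Fit": 37.0},
    "Claude Opus 4.6":  {"Composite": 54.3, "Emotion F1": 13.8, "VA Score": 25.0,
                         "Binary Acc": 84.4, "Pairwise Acc": 63.7,
                         "Four-Branch": 76.3, "PANAS": 90.8,
                         "Q1 Goals": 61.0, "Q3 Fit": 45.5},
}

def normalise(scores):
    """Min-max normalise each metric across models; a tie maps to 0.5."""
    metrics = next(iter(scores.values())).keys()
    out = {model: {} for model in scores}
    for metric in metrics:
        vals = [scores[model][metric] for model in scores]
        lo, hi = min(vals), max(vals)
        for model in scores:
            v = scores[model][metric]
            out[model][metric] = 0.5 if hi == lo else (v - lo) / (hi - lo)
    return out

norm = normalise(scores)
```

Lines that "cross" in such a plot are simply metrics where the normalised ordering of the two models flips, e.g. Opus leads on the composite while Haiku leads on VA Score.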
Across Evaluation Modes
How does performance change when we give the model extra context (omniscient) or ask it to reason out loud (verbose)?
| Mode | Claude Haiku 4.5 | Claude Opus 4.6 |
|---|---|---|
| default | 52.8 | 54.3 |
| omniscient | 53.4 | 55.0 |
| verbose | 51.5 | 53.5 |
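The mode-to-mode shifts are easier to read as deltas against each model's default composite score. A small sketch computing those deltas from the numbers reported above:

```python
# Composite scores per evaluation mode, from the section above.
modes = {
    "Claude Haiku 4.5": {"default": 52.8, "omniscient": 53.4, "verbose": 51.5},
    "Claude Opus 4.6":  {"default": 54.3, "omniscient": 55.0, "verbose": 53.5},
}

# Change relative to the default mode, rounded to one decimal place.
deltas = {
    model: {mode: round(score - per_mode["default"], 1)
            for mode, score in per_mode.items() if mode != "default"}
    for model, per_mode in modes.items()
}
```

For both models the pattern is the same: omniscient context helps slightly (+0.6 and +0.7) while verbose reasoning costs a little (−1.3 and −0.8).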