Compare Models
A side-by-side comparison across the emotional intelligence dimensions we measured.
| Metric | Claude Haiku 4.5 | Claude Opus 4.6 |
|---|---|---|
| Composite Score (overall weighted score) | 52.8 | 54.3 |
| Emotion F1 | 13.6% | 13.8% |
| VA Score | 27.6% | 25.0% |
| Binary Acc | 83.2% | 84.4% |
| Pairwise Acc | 56.0% | 63.7% |
| Four-Branch | 79.8% | 76.3% |
| PANAS | 88.5% | 90.8% |
| Q1 Goals | 70.0% | 61.0% |
| Q3 Fit | 37.0% | 45.5% |
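The side-by-side comparison rescales each metric within its own range so dissimilar scales (composite points vs. percentages) can share an axis. A minimal min-max sketch of that idea, using the values from the table above; the benchmark's exact scaling is an assumption, and with only two models each metric simply maps one model to 0 and the other to 1:

```python
# Scores from the table above (percent signs dropped; units don't matter
# because each metric is normalised within its own range).
scores = {
    "Claude Haiku 4.5": {"Composite": 52.8, "Emotion F1": 13.6, "VA Score": 27.6,
                         "Binary Acc": 83.2, "Pairwise Acc": 56.0,
                         "Four-Branch": 79.8, "PANAS": 88.5,
                         "Q1 Goals": 70.0, "Q3 Fit": 37.0},
    "Claude Opus 4.6":  {"Composite": 54.3, "Emotion F1": 13.8, "VA Score": 25.0,
                         "Binary Acc": 84.4, "Pairwise Acc": 63.7,
                         "Four-Branch": 76.3, "PANAS": 90.8,
                         "Q1 Goals": 61.0, "Q3 Fit": 45.5},
}

def normalise(scores):
    """Min-max normalise each metric across models; a tie maps to 0.5."""
    metrics = next(iter(scores.values())).keys()
    out = {model: {} for model in scores}
    for metric in metrics:
        vals = [scores[model][metric] for model in scores]
        lo, hi = min(vals), max(vals)
        for model in scores:
            v = scores[model][metric]
            out[model][metric] = 0.5 if hi == lo else (v - lo) / (hi - lo)
    return out

norm = normalise(scores)
```

Lines that "cross" in such a plot are simply metrics where the normalised ordering of the two models flips, e.g. Opus leads on the composite while Haiku leads on VA Score.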
Across Evaluation Modes
How does performance change when we give the model extra context (omniscient) or ask it to reason out loud (verbose)?
| Mode | Claude Haiku 4.5 | Claude Opus 4.6 |
|---|---|---|
| default | 52.8 | 54.3 |
| omniscient | 53.4 | 55.0 |
| verbose | 51.5 | 53.5 |
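The mode-to-mode shifts are easier to read as deltas against each model's default composite score. A small sketch computing those deltas from the numbers reported above:

```python
# Composite scores per evaluation mode, from the section above.
modes = {
    "Claude Haiku 4.5": {"default": 52.8, "omniscient": 53.4, "verbose": 51.5},
    "Claude Opus 4.6":  {"default": 54.3, "omniscient": 55.0, "verbose": 53.5},
}

# Change relative to the default mode, rounded to one decimal place.
deltas = {
    model: {mode: round(score - per_mode["default"], 1)
            for mode, score in per_mode.items() if mode != "default"}
    for model, per_mode in modes.items()
}
```

For both models the pattern is the same: omniscient context helps slightly (+0.6 and +0.7) while verbose reasoning costs a little (−1.3 and −0.8).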