Compare Models

A side-by-side comparison across the emotional intelligence dimensions we measured.

[Parallel-coordinates chart: each vertical axis is one metric, normalised within its range, so lines that cross indicate the models trading position. Axes, with observed high and low endpoints: Emotion F1 (0.143 to 0.104), VA Score (0.281 to 0.225), Binary OM (86.4% to 83.1%), Binary HP (1.0% to -1.0%), Pairwise (65.6% to 44.0%), 4-Branch EQ (0.822 to 0.692), PANAS (0.921 to 0.860), Draft (0.01 to -0.01).]
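The per-axis normalisation mentioned above can be sketched as a plain min-max rescale. This is a hypothetical helper, not AttuneBench's actual code:

```python
def normalise(values):
    """Min-max rescale a list of metric values to [0, 1] for one chart axis."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # degenerate axis: all models tie
    return [(v - lo) / (hi - lo) for v in values]

# Example: the Pairwise axis endpoints from the chart (65.6% vs 44.0%).
# The higher score maps to 1.0, the lower to 0.0.
print(normalise([65.6, 44.0]))
```

Because each axis is rescaled independently, a model can sit at the top of one axis and the bottom of the next even when the raw scores are close.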
| Metric | Claude Haiku 4.5 | Claude Opus 4.6 |
| --- | --- | --- |
| Composite Score (overall weighted score) | 52.8 | 54.3 |
| Emotion F1 | 13.6% | 13.8% |
| VA Score | 27.6% | 25.0% |
| Binary Acc | 83.2% | 84.4% |
| Pairwise Acc | 56.0% | 63.7% |
| Four-Branch | 79.8% | 76.3% |
| PANAS | 88.5% | 90.8% |
| Q1 Goals | 70.0% | 61.0% |
| Q3 Fit | 37.0% | 45.5% |

Across Evaluation Modes

How does performance change when we give the model extra context (omniscient) or ask it to reason out loud (verbose)?

| Model | default | omniscient | verbose |
| --- | --- | --- | --- |
| Claude Haiku 4.5 | 52.8 | 53.4 | 51.5 |
| Claude Opus 4.6 | 54.3 | 55.0 | 53.5 |
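The mode comparison above amounts to a per-model delta against the default run. A small sketch, with the figures copied from the scores listed above:

```python
scores = {
    "Claude Haiku 4.5": {"default": 52.8, "omniscient": 53.4, "verbose": 51.5},
    "Claude Opus 4.6": {"default": 54.3, "omniscient": 55.0, "verbose": 53.5},
}

def mode_deltas(model_scores):
    """Change of each non-default mode relative to the default run, in score points."""
    base = model_scores["default"]
    return {mode: round(s - base, 1) for mode, s in model_scores.items() if mode != "default"}

for model, modes in scores.items():
    print(model, mode_deltas(modes))
```

For both models the omniscient (extra context) mode gains a fraction of a point over default, while the verbose (reason out loud) mode loses roughly a point.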
AttuneBench · Evaluating Emotional Intelligence in LLMs