Writing Ranking

Model                    Writing Score
GPT-4                    9.65
Llama-2-70b-chat         9.3
Vicuna-13B               9.25
Llama-2-7b-chat          8.9
Llama-2-13b-chat         8.85
Guanaco-33b              8.6
mpt-7b-chat              8.35
Nous-Hermes-13b          7.75
Alpaca-13B               6.7
falcon-40b-instruct      6.05
rwkv-4-raven-14b         5.725
oasst-sft-4-pythia-12b   5.2
stablelm-tuned-alpha-7b  3.425

Roleplay Ranking

Model                    Roleplay Score
GPT-4                    8.9
Guanaco-33b              8.6
Llama-2-7b-chat          7.7
Llama-2-70b-chat         7.5
Llama-2-13b-chat         7.5
Vicuna-13B               7.175
mpt-7b-chat              6.45
Nous-Hermes-13b          6.375
oasst-sft-4-pythia-12b   6.2
rwkv-4-raven-14b         6
falcon-40b-instruct      5.5
Alpaca-13B               5.45
stablelm-tuned-alpha-7b  4.75

Reasoning Ranking

Model                    Reasoning Score
GPT-4                    9
Vicuna-13B               5.85
Llama-2-70b-chat         5.8
Llama-2-13b-chat         5.1
Guanaco-33b              4.7
Llama-2-7b-chat          4.25
falcon-40b-instruct      4.05
mpt-7b-chat              3.85
Nous-Hermes-13b          3.8
Alpaca-13B               3.5
rwkv-4-raven-14b         3.45
oasst-sft-4-pythia-12b   3.3
stablelm-tuned-alpha-7b  1.6

Math Ranking

Model                    Math Score
GPT-4                    6.8
Llama-2-13b-chat         3.45
Llama-2-70b-chat         3.3
Nous-Hermes-13b          2.65
Vicuna-13B               2.6
Guanaco-33b              2.45
Llama-2-7b-chat          2.4
mpt-7b-chat              1.8
rwkv-4-raven-14b         1.8
falcon-40b-instruct      1.7
oasst-sft-4-pythia-12b   1.65
stablelm-tuned-alpha-7b  1.4
Alpaca-13B               1.05

Coding Ranking

Model                    Coding Score
GPT-4                    8.55
falcon-40b-instruct      3.4
Vicuna-13B               3.25
Guanaco-33b              3.25
Llama-2-70b-chat         3.15
Llama-2-13b-chat         3
Llama-2-7b-chat          3
mpt-7b-chat              2.94736842
rwkv-4-raven-14b         2.75
Nous-Hermes-13b          2.45
Alpaca-13B               2.35
oasst-sft-4-pythia-12b   2.25
stablelm-tuned-alpha-7b  1.2

Extraction Ranking

Model                    Extraction Score
GPT-4                    9.375
Llama-2-70b-chat         7.25
Llama-2-13b-chat         6.925
Llama-2-7b-chat          6.5
Guanaco-33b              5.95
falcon-40b-instruct      5.85
Vicuna-13B               5.55
mpt-7b-chat              5.15
Nous-Hermes-13b          5.05
Alpaca-13B               4.15
oasst-sft-4-pythia-12b   3.4
rwkv-4-raven-14b         1.65
stablelm-tuned-alpha-7b  1.15

STEM Ranking

Model                    STEM Score
GPT-4                    9.7
Guanaco-33b              9.175
Llama-2-70b-chat         8.925
Llama-2-7b-chat          8.65
Llama-2-13b-chat         8.625
Vicuna-13B               7.975
Nous-Hermes-13b          7.44736842
falcon-40b-instruct      6.85
mpt-7b-chat              6.6
oasst-sft-4-pythia-12b   6.1
rwkv-4-raven-14b         5.425
Alpaca-13B               5.2
stablelm-tuned-alpha-7b  3.9

Humanities Ranking

Model                    Humanities Score
GPT-4                    9.95
Llama-2-13b-chat         9.75
Llama-2-70b-chat         9.625
Guanaco-33b              9.5
Vicuna-13B               9.45
Nous-Hermes-13b          9
Llama-2-7b-chat          8.75
mpt-7b-chat              8.4
falcon-40b-instruct      7.95
Alpaca-13B               7.85
oasst-sft-4-pythia-12b   6.45
rwkv-4-raven-14b         5.075
stablelm-tuned-alpha-7b  4.6