Writing Ranking

| Model | Writing Score |
|---|---|
| GPT-4 | 9.65 |
| Llama-2-70b-chat | 9.3 |
| Vicuna-13B | 9.25 |
| Llama-2-7b-chat | 8.9 |
| Llama-2-13b-chat | 8.85 |
| Guanaco-33b | 8.6 |
| mpt-7b-chat | 8.35 |
| Nous-Hermes-13b | 7.75 |
| Alpaca-13B | 6.7 |
| falcon-40b-instruct | 6.05 |
| rwkv-4-raven-14b | 5.725 |
| oasst-sft-4-pythia-12b | 5.2 |
| stablelm-tuned-alpha-7b | 3.425 |
Roleplay Ranking

| Model | Roleplay Score |
|---|---|
| GPT-4 | 8.9 |
| Guanaco-33b | 8.6 |
| Llama-2-7b-chat | 7.7 |
| Llama-2-70b-chat | 7.5 |
| Llama-2-13b-chat | 7.5 |
| Vicuna-13B | 7.175 |
| mpt-7b-chat | 6.45 |
| Nous-Hermes-13b | 6.375 |
| oasst-sft-4-pythia-12b | 6.2 |
| rwkv-4-raven-14b | 6 |
| falcon-40b-instruct | 5.5 |
| Alpaca-13B | 5.45 |
| stablelm-tuned-alpha-7b | 4.75 |
Reasoning Ranking

| Model | Reasoning Score |
|---|---|
| GPT-4 | 9 |
| Vicuna-13B | 5.85 |
| Llama-2-70b-chat | 5.8 |
| Llama-2-13b-chat | 5.1 |
| Guanaco-33b | 4.7 |
| Llama-2-7b-chat | 4.25 |
| falcon-40b-instruct | 4.05 |
| mpt-7b-chat | 3.85 |
| Nous-Hermes-13b | 3.8 |
| Alpaca-13B | 3.5 |
| rwkv-4-raven-14b | 3.45 |
| oasst-sft-4-pythia-12b | 3.3 |
| stablelm-tuned-alpha-7b | 1.6 |
Math Ranking

| Model | Math Score |
|---|---|
| GPT-4 | 6.8 |
| Llama-2-13b-chat | 3.45 |
| Llama-2-70b-chat | 3.3 |
| Nous-Hermes-13b | 2.65 |
| Vicuna-13B | 2.6 |
| Guanaco-33b | 2.45 |
| Llama-2-7b-chat | 2.4 |
| mpt-7b-chat | 1.8 |
| rwkv-4-raven-14b | 1.8 |
| falcon-40b-instruct | 1.7 |
| oasst-sft-4-pythia-12b | 1.65 |
| stablelm-tuned-alpha-7b | 1.4 |
| Alpaca-13B | 1.05 |
Coding Ranking

| Model | Coding Score |
|---|---|
| GPT-4 | 8.55 |
| falcon-40b-instruct | 3.4 |
| Vicuna-13B | 3.25 |
| Guanaco-33b | 3.25 |
| Llama-2-70b-chat | 3.15 |
| Llama-2-13b-chat | 3 |
| Llama-2-7b-chat | 3 |
| mpt-7b-chat | 2.947 |
| rwkv-4-raven-14b | 2.75 |
| Nous-Hermes-13b | 2.45 |
| Alpaca-13B | 2.35 |
| oasst-sft-4-pythia-12b | 2.25 |
| stablelm-tuned-alpha-7b | 1.2 |
Extraction Ranking

| Model | Extraction Score |
|---|---|
| GPT-4 | 9.375 |
| Llama-2-70b-chat | 7.25 |
| Llama-2-13b-chat | 6.925 |
| Llama-2-7b-chat | 6.5 |
| Guanaco-33b | 5.95 |
| falcon-40b-instruct | 5.85 |
| Vicuna-13B | 5.55 |
| mpt-7b-chat | 5.15 |
| Nous-Hermes-13b | 5.05 |
| Alpaca-13B | 4.15 |
| oasst-sft-4-pythia-12b | 3.4 |
| rwkv-4-raven-14b | 1.65 |
| stablelm-tuned-alpha-7b | 1.15 |
STEM Ranking

| Model | STEM Score |
|---|---|
| GPT-4 | 9.7 |
| Guanaco-33b | 9.175 |
| Llama-2-70b-chat | 8.925 |
| Llama-2-7b-chat | 8.65 |
| Llama-2-13b-chat | 8.625 |
| Vicuna-13B | 7.975 |
| Nous-Hermes-13b | 7.447 |
| falcon-40b-instruct | 6.85 |
| mpt-7b-chat | 6.6 |
| oasst-sft-4-pythia-12b | 6.1 |
| rwkv-4-raven-14b | 5.425 |
| Alpaca-13B | 5.2 |
| stablelm-tuned-alpha-7b | 3.9 |
Humanities Ranking

| Model | Humanities Score |
|---|---|
| GPT-4 | 9.95 |
| Llama-2-13b-chat | 9.75 |
| Llama-2-70b-chat | 9.625 |
| Guanaco-33b | 9.5 |
| Vicuna-13B | 9.45 |
| Nous-Hermes-13b | 9 |
| Llama-2-7b-chat | 8.75 |
| mpt-7b-chat | 8.4 |
| falcon-40b-instruct | 7.95 |
| Alpaca-13B | 7.85 |
| oasst-sft-4-pythia-12b | 6.45 |
| rwkv-4-raven-14b | 5.075 |
| stablelm-tuned-alpha-7b | 4.6 |