What does the OpenLLM Leaderboard measure?
Date : 2023-11-07
Introduction
In this report, the author used Zeno to dive into the data and explore what the benchmark actually measures. What tasks does it test? What does the data look like? They find that it is indeed hard to gauge the real-world usability of LLMs from the results of the leaderboard, as the tasks it includes are disconnected from how LLMs are used in practice. Furthermore, they find clear ways the leaderboard can be gamed, such as by exploiting the common structure of ground truth labels. In sum, they hope that this report demonstrates the importance of testing your model in a disaggregated way on on data that is representative of the downstream use-cases you care about.
Read article here
Recently on :
Artificial Intelligence
PITTI - 2024-09-19
A bubble in AI?
Bubble or true technological revolution? While the path forward isn't without obstacles, the value being created by AI extends ...
PITTI - 2024-09-08
Artificial Intelligence : what everyone can agree on
Artificial Intelligence is a divisive subject that sparks numerous debates about both its potential and its limitations. Howeve...
WEB - 2024-03-04
Nvidia bans using translation layers for CUDA software | Tom's Hardware
Tom's Hardware - Nvidia has banned running CUDA-based software on other hardware platforms using translation layers in its lice...
WEB - 2024-02-21
Retell AI : conversational speech engine
Retell tackle the challenge of real time conversations with voice AI.
WEB - 2024-02-21
Groq Inference Tokenomics: Speed, But At What Cost? | Semianalysis
Semianalysis - Groq, an AI hardware startup, has been making waves with their impressive demos showcasing Mistral Mixtral 8x7b ...