My benchmark for large language models
Date: 2024-02-19


This summary was drafted with mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf

This collection of tests is derived from real-life conversations Nicholas Carlini had with different LLMs. The benchmark includes tasks such as converting Python functions into equivalent but faster C functions, explaining what minified JavaScript does, identifying data encoding formats, writing parsers from BNF-like grammars, translating English sentences into SQL queries, and writing bash one-liners. Carlini emphasizes his use of a simple dataflow domain-specific language (DSL), which makes it easy to add new tests and to evaluate model capabilities realistically.
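The summary does not show the DSL itself. As a rough illustration of the dataflow idea, where a prompt flows through a model, then an executor, then a checker, here is a minimal Python sketch. `Stage`, `fake_llm`, `run_python`, and `contains` are hypothetical names invented for this example, not Carlini's actual API, and the model call is replaced by a stub that always returns the same program.

```python
import subprocess
import sys
from typing import Callable


class Stage:
    """A hypothetical pipeline stage; stages are chained left to right with >>."""

    def __init__(self, fn: Callable[[object], object]):
        self.fn = fn

    def __rshift__(self, other: "Stage") -> "Stage":
        # Stage >> Stage builds a new stage that runs self, then other.
        return Stage(lambda value: other.fn(self.fn(value)))

    def __rrshift__(self, value: object) -> object:
        # value >> Stage (e.g. a prompt string) runs this stage on the value.
        return self.fn(value)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call: ignores the prompt
    # and always answers with the same one-line program.
    return "print('hello world')"


def run_python(code: str) -> str:
    # Runs the generated code in a separate interpreter and returns its stdout.
    # This is NOT a sandbox; a real harness should isolate untrusted code.
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=10
    )
    return result.stdout


def contains(expected: str) -> Stage:
    # Evaluator stage: passes if the expected text appears in the output.
    return Stage(lambda output: expected in str(output))


llm = Stage(fake_llm)
python_runner = Stage(run_python)

# A reusable test is just a composed pipeline of stages.
pipeline = llm >> python_runner >> contains("hello world")
passed = "Write a Python program that prints hello world" >> pipeline
print(passed)  # True
```

Because each test is a composition of small stages, adding a new test mostly means writing a new prompt and picking an evaluator, which is the property the summary attributes to the DSL.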

Read article here