Let's build the GPT Tokenizer

Description

In this lecture, Andrej Karpathy builds from scratch the Tokenizer used in OpenAI's GPT series. Along the way, he shows that many of the weird behaviors and problems of LLMs actually trace back to tokenization, discusses why tokenization is at fault, and explains why, ideally, someone will find a way to delete this stage entirely.
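The tokenizer built in the lecture is based on byte-pair encoding (BPE): repeatedly find the most frequent adjacent pair of token ids and merge it into a new token. The sketch below shows one such merge step on raw UTF-8 bytes; it is a minimal illustration, not Karpathy's exact code, and the helper names (`most_common_pair`, `merge`) are made up here.

```python
from collections import Counter

def most_common_pair(ids):
    # Count adjacent pairs of token ids and return the most frequent one
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    # Replace every occurrence of `pair` with the newly minted `new_id`
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))  # start from raw bytes (ids 0..255)
pair = most_common_pair(ids)               # (97, 97), i.e. "aa"
ids = merge(ids, pair, 256)                # 256 is the first non-byte token id
print(ids)                                 # [256, 97, 98, 100, 256, 97, 98, 97, 99]
```

A full tokenizer repeats this loop until the vocabulary reaches the desired size, recording each merge so that encoding and decoding can replay it.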



Watch on YouTube