PITTI

Let's build the GPT Tokenizer

Artificial Intelligence, Information Processing | Computing

Date: 2024-02-20

Description

In this lecture, Andrej Karpathy builds from scratch the tokenizer used in OpenAI's GPT series. Along the way, he shows that many of the odd behaviors and problems of LLMs trace back to tokenization, explains why tokenization is at fault, and argues that someone should ideally find a way to remove this stage entirely.
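For readers who want a feel for what "building the tokenizer" involves, here is a minimal sketch of byte-level byte pair encoding (BPE), the general technique behind the GPT tokenizers. It is not the code from the lecture: the function names (`train_bpe`, `encode`, `decode`, and the helpers) are illustrative assumptions, and the GPT tokenizers' regex pre-splitting and special tokens are left out.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from raw UTF-8 bytes (ids 0..255)."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (id, id) -> new id, kept in the order the merges were learned
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break  # fewer than two tokens left, nothing to merge
        pair = counts.most_common(1)[0][0]
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Turn text into token ids by replaying the merges in learned order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Turn token ids back into text by expanding each id to its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

Calling `train_bpe` on a text returns the merge table, after which `decode(encode(s, merges), merges)` should return `s` for any string. Since the base vocabulary is the 256 byte values, text the tokenizer never saw during training still encodes, only less compactly.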

Watch and like on YouTube
