ChatGPP is a demo app built as a proof of concept for a RAG app running entirely in the browser.
RAG, short for Retrieval-Augmented Generation, is a two-step process: (1) retrieve the context relevant to a given query, then (2) feed this context to a Large Language Model to guide its answer.
AI System
RAG apps therefore include three main building blocks:
- a reference corpus: it can be the entire internet or a very narrow, highly curated knowledge base
- a search algorithm: it can be Google Search, a plain text search algorithm, or a model performing semantic search
- an LLM
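As a rough sketch, the two-step flow over these three blocks might look like the following. All names here (`searchCorpus`, `buildPrompt`, the toy word-overlap ranking) are hypothetical stand-ins, not ChatGPP's actual code:

```typescript
// The reference corpus: a list of text chunks.
type Chunk = { id: number; text: string };

// Step 1 (toy retrieval): rank chunks by how many query words they contain.
function searchCorpus(corpus: Chunk[], query: string, topK = 2): Chunk[] {
  const words = query.toLowerCase().split(/\s+/);
  return corpus
    .map((c) => ({
      chunk: c,
      score: words.filter((w) => c.text.toLowerCase().includes(w)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.chunk);
}

// Step 2: feed the retrieved context to the LLM alongside the query.
function buildPrompt(query: string, context: Chunk[]): string {
  return `Context:\n${context.map((c) => c.text).join("\n")}\n\nQuestion: ${query}`;
}

const corpus: Chunk[] = [
  { id: 1, text: "Paris is the capital of France." },
  { id: 2, text: "The Loire is the longest river in France." },
];
const prompt = buildPrompt(
  "capital of France",
  searchCorpus(corpus, "capital of France"),
);
```

In a real RAG app the retrieval step would be far more capable than word overlap, but the shape of the pipeline stays the same: search, then prompt.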
Data privacy
Since serving an LLM locally isn't entirely trivial yet, the last step of the RAG process is often where data has to be sent to a third-party server. People typically retrieve relevant, likely valuable, data from their internal database and send it to an LLM provider.
The objective of this project was to demonstrate the feasibility and the limits of local RAG, i.e. integrating the three building blocks in a modular way: the tools come to your machine rather than your data leaving it, no app needs to be installed (everything runs in the browser), and no proprietary data is shared with third parties.
Building blocks
- Dataset: for this demo, we used data from the microu dataset, chunked into pieces of ~300 words or less for semantic search. There are about 5,000 chunks, all in French, which explains why the UI is in French. When the app starts, the dataset is downloaded in JSON format. You can hook in your own data instead.
- Search: ChatGPP uses a hybrid approach mixing semantic search and text search. This is one of many options, not necessarily the most advanced, but a good opportunity to revisit the basics of search. Since we use semantic search, we needed embeddings for all chunks in the database. While it's possible to compute them in the browser, embedding 5,000 chunks could take a very long time. For the demo app, we pre-embedded the dataset, which means the downloaded data includes the embeddings. Note that adding the embeddings increases the size of the dataset from 10 MB to 100 MB. If you want to pre-embed your own data, make sure to use the same model for pre-embedding and for processing your query in the app (see src/sources.ts in the GitHub repo).
- LLM: only small LLMs can run in the browser, which comes with a severe performance hit. Some small LLMs are specifically trained for RAG, so it was a good opportunity to put them to the test. For multilingual RAG, Pleias Pico is an ideal candidate. Pleias are committed to open research, which is another reason to showcase their work, even if it's experimental (it's the worst it will ever be). In any case, other models can be selected in src/sources.ts.
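The dataset and search bullets can be sketched together: a pre-embedded chunk carries its embedding vector, and a hybrid score blends semantic similarity with keyword overlap. Everything below is illustrative, field names, the 0.5 blending weight, and the toy keyword score are assumptions, not what ChatGPP actually uses:

```typescript
// Illustrative shape of one dataset chunk: pre-embedded datasets ship the
// embedding alongside the text (actual field names in ChatGPP may differ).
interface CorpusChunk {
  id: string;
  text: string;         // ~300 words or less
  embedding: number[];  // produced by the same model used at query time
}

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Fraction of query words found in the chunk text (toy text search).
function keywordScore(query: string, text: string): number {
  const words = query.toLowerCase().split(/\s+/);
  return words.filter((w) => text.toLowerCase().includes(w)).length / words.length;
}

// Hybrid score: blend semantic and text search. The 0.5 weight is an
// arbitrary illustration, not the value used by the app.
function hybridScore(
  qEmb: number[], qText: string, chunk: CorpusChunk, alpha = 0.5,
): number {
  return alpha * cosine(qEmb, chunk.embedding)
       + (1 - alpha) * keywordScore(qText, chunk.text);
}

const chunk: CorpusChunk = {
  id: "chunk-0001",
  text: "Paris est la capitale de la France.",
  embedding: [1, 0],
};
```

This also makes the constraint in the Search bullet concrete: the query-time embedding must come from the same model that produced `chunk.embedding`, or the cosine term is meaningless.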
When you launch the processing, the dataset, the embedding model, and the LLM are downloaded. Note: you can safely start the app; the downloads only begin after clicking a button on the home screen.
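The deferred-download behaviour can be sketched as a simple lazy-initialization pattern: nothing heavy is fetched on page load, the first call (the button click) triggers the work, and later calls reuse the cached result. `lazy`, `getModel`, and the synchronous `downloads` counter are hypothetical stand-ins; the real downloads are asynchronous:

```typescript
// Wrap an expensive loader so it runs at most once, on first use.
function lazy<T>(loader: () => T): () => T {
  let cached: T | undefined;
  return () => (cached ??= loader());
}

let downloads = 0;
const getModel = lazy(() => {
  downloads += 1; // in the real app: fetch dataset + embedding model + LLM
  return { name: "llm" };
});

// Simulate the button click: only now does the "download" happen...
getModel();
// ...and a second call reuses the cached instance instead of re-fetching.
getModel();
```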
In the browser
Just like our sector classification web app from last year (the UI will look familiar), ChatGPP leverages transformers.js to run AI models in web browsers.
Transformers.js v3 theoretically supports WebGPU, but we never managed to make it work, so all videos shown here run without GPU acceleration. As a result, LLM processing can be slow, notably prompt processing, which can take a minute since the prompt includes all the relevant context. Once generation has started, each token is streamed to the screen.
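The per-token streaming can be sketched as a generator feeding the display as tokens arrive, rather than waiting for the full answer. The whitespace "tokenizer" below is a toy stand-in for the model's real generation loop:

```typescript
// Toy token stream: yield the answer one whitespace-delimited piece at a
// time, preserving the trailing spaces so concatenation is lossless.
function* generateTokens(answer: string): Generator<string> {
  for (const token of answer.split(/(?<=\s)/)) yield token;
}

let display = "";
for (const token of generateTokens("Bonjour tout le monde")) {
  display += token; // in the app: append each token to the DOM as it arrives
}
```

This is why the UI feels responsive even when generation itself is slow: the first tokens appear long before the answer is complete.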
The sources are shown on the right side, which gives you a sense of the search performance. The output on the left gives a sense of the LLM's capabilities.
AI models running on transformers.js are ONNX models. If you want to change the model, make sure you select an ONNX model. The onnx-community is a good place to start looking for those. Otherwise, browse a model's Hugging Face page and look for ONNX quantizations in the right panel.
Proof of concept
This project cannot be regarded as a finished product: without GPU acceleration, and given that the performance of very small LLMs isn't particularly great, it did not make sense to invest a lot of time in UI/UX details.
The same goes for the GitHub repository, which will remain work in progress until the next iteration at least.
For information on how to run ChatGPP, refer to the README.