For most of last year, I ran a blog covering AI developments, analyzing the consequences of real breakthroughs from both business and societal perspectives while trying to avoid the typical traps of a hype cycle. I never planned to stop it, but I couldn't find the motivation to write about anything in 2024. Many aspects of my personal and professional lives may have contributed to this lack of motivation, but it was mostly because nothing in AI excited me enough to want to read and write about it (I started another blog instead). The motivation came back this weekend: I felt like writing about things that have impressed me recently, about things I've been concerned about, and about correcting or updating observations I made last year. Not sure why it happened, but here we are; I'm reviving Not_Too_Fast.
Entropix
Soon after the release of LLaMa, the whole model was leaked, which kick-started a race to the best hack. It may be too early to declare a winner, but Georgi Gerganov stands out as a serious contender for the title.
Not_Too_Fast March 10th 2023
This blog wouldn't have existed without the quantization of the first Llama model and the release of llama.cpp by Georgi Gerganov. To me, it was more significant than ChatGPT itself, and it's the first thing I covered in my initial post on March 10th, 2023. I've been amazed by many releases and research papers since then, but I've never been quite as excited. Until last week, when a novel idea (at least in the open-source world) started to emerge: using measures of entropy to make sampling context-aware.
The approach is still experimental, but it's fascinating. For the less technical audience: it involves assessing attention entropy before selecting the next token and using that measurement to decide what to do next. In most cases, you'd just go with the suggested token, but if the entropy and/or varentropy values aren't satisfactory, you can choose another route. I understand it was initially tested to insert a "wait" token and stimulate "reasoning," but it opens up countless opportunities, such as branching and sampling to find the best option... or even function calling or RAG if the uncertainty is too high. More information in this discussion on X. It might not be as significant as Georgi Gerganov's ggml library, but it feels very similar in terms of the traction the project is getting from developers. If it works out, it will help drastically reduce the size of the models.
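To make that more concrete, here's a minimal sketch of the general idea in Python. It deliberately simplifies things: it computes entropy and varentropy over the next-token logits only (Entropix also looks at attention statistics), and the thresholds and the "wait" token id are made-up placeholders, not values from the repository.

```python
import numpy as np

# Hypothetical values: real thresholds and token ids would come from the
# model's tokenizer and from tuning, not from this sketch.
WAIT_TOKEN_ID = 151          # placeholder id for a "wait"-style token
LOW, HIGH = 0.5, 2.5         # placeholder entropy thresholds, in nats

def entropy_and_varentropy(logits):
    """Entropy of the next-token distribution, and the variance of surprisal."""
    logits = logits - logits.max()                # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    surprisal = -np.log(probs + 1e-12)            # -log p(token)
    entropy = float((probs * surprisal).sum())
    varentropy = float((probs * (surprisal - entropy) ** 2).sum())
    return entropy, varentropy

def pick_next_token(logits, rng):
    entropy, varentropy = entropy_and_varentropy(logits)
    if entropy < LOW and varentropy < LOW:
        # Confident and unambiguous: just take the most likely token.
        return int(np.argmax(logits))
    if entropy > HIGH and varentropy > HIGH:
        # Confused: instead of sampling, inject a "wait" token to nudge the
        # model into re-examining the context (or branch, call a tool, etc.).
        return WAIT_TOKEN_ID
    # In between: fall back to ordinary sampling from the distribution.
    probs = np.exp(logits - logits.max())
    probs = probs / probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Usage: token = pick_next_token(model_logits, np.random.default_rng(0))
```

The interesting part is the branching policy, not the math: everything outside the two special cases is just standard sampling.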
GitHub repository by anonymous developer xjdr: https://github.com/xjdr-alt/entropix
Small Language Models
Entropix was initially designed using the weights of the smallest model of the Llama 3.2 family. It's a reasonably capable model with "only" 1B parameters, which means it can run on recent consumer hardware with little or no quantization. Regardless of Entropix, this is a great development, and there are now multiple options for running these models in a hardware-agnostic way, including on phones. One is ggml, mentioned above, but models can also run directly in the browser using ONNX weights and the transformers.js library. Or, for Apple hardware, you can run models with MLX, Apple's tensor framework.
Apple may be behind on AI, but the emergence of MLX, a framework tailored to their chips (Apple Silicon), is worth noting. It's still not as versatile as PyTorch or JAX, but the pace at which features are being added is astonishing. There's a strong community behind the project that ports new models to MLX almost as quickly as ggml quantizations appear.
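As a taste of how simple this has become, here's roughly what running a small Llama locally looks like with the mlx-lm Python package. The model identifier below is one of the community conversions on Hugging Face; treat it as an assumed example and check the exact repo name before running.

```python
# Sketch: generating text with a small Llama on Apple Silicon via mlx-lm
# (pip install mlx-lm). The repo name is an assumed mlx-community conversion;
# verify it on Hugging Face before running.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Llama-3.2-1B-Instruct-4bit")
print(generate(model, tokenizer, prompt="Why is the sky blue?", max_tokens=100))
```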
Our projects involving these frameworks this year:
- transformers.js: sector classification assistant
- ggml: semi-agentic knowledge base and nard.ai
- MLX: sudokube
Ray-Ban by Meta
This year, Meta has published amazing open-weight models. Not just small ones, but also massive models that compete with the state of the art (Llama 3.1 405B). And the latest models cover all modalities. What really sets Meta apart is that they're simultaneously accelerating on the hardware side with a flagship product: the Ray-Ban by Meta glasses. Mark Zuckerberg himself has led the marketing effort in the past few months, and it would be surprising if it didn't turn into a commercial success.
But like any innovation, this one has a flip side: combining an always-on camera and vision models is the perfect recipe for privacy breaches. There have been warning signs for a while, and the tech itself isn't new: this Wired article from 2021 described how a man was found, 15 years after people started looking for him, based on facial recognition software and pictures available on the internet. The key difference now is that glasses remove the friction of having to hold a camera in front of your victims; you can just walk past them, as Harvard students recently showed.
Google and Microsoft
Meta has set a number of standards for open-source LLMs, but they're not big players in model inference themselves, at least not outside their own organization. OpenAI is the clear leader, and Anthropic the challenger. If we're honest, OpenAI's lead stems from being first to market and from excellent PR/lobbying rather than from the products. Their models are good, no question, but not far ahead of the competition (despite what they claim based on highly questionable benchmarks), and their UX isn't as good as Anthropic's Claude. What looked like a masterclass 18 months ago is slowly turning into a race between the tortoise and the hare.
This doesn't only concern OpenAI but also multiple companies within the Microsoft perimeter:
- Mistral is sadly losing momentum: their large model is really good, but they struggle to match the competition across weight categories and modalities (again, despite what they say)
- WizardLM seems to have been crushed by the bureaucratic machine
It's very sad for these two companies, which played a major role in last year's acceleration. As for Microsoft's internal AI teams: the Phi models struggle to convince, the Copilot implementation is underwhelming, and hiring Suleyman from Inflection was a really odd move. Florence-2, the vision model, was well received, though. So it's way too soon to start shorting MSFT stock, but it's fair to say that our praise from last year didn't age particularly well.
So who is OpenAI losing ground to? There are still a couple of underrated Chinese players, namely DeepSeek (a real challenger to GPT-4 for coding) and Qwen (a real challenger for vision). No one can match the performance of OpenAI's latest model, o1, but there aren't that many use cases that need such a heavyweight model. I probably run two queries a week through o1, but I use Claude several times a day, including to help me improve this post.
Google [...] did not make it to the highlights, despite opening up their PaLM model and rolling out AI throughout their Google Workspace suite. Whether this is down to communication strategy or it is indicative of a fundamental issue with their AI capabilities, Google was overshadowed by, amongst others, Midjourney v5, Anthropic Claude (Google-backed), Stanford Alpaca and of course OpenAI & Microsoft with GPT-4 and Microsoft 365 copilot.
Not_Too_Fast March 17th 2023
One player that almost everyone, including me, had written off last year but has really turned its ship around is Google. They'd been playing catch-up for years and made many mistakes along the way. They've had very good models since the end of last year, but somehow managed to fumble every release; the woke image-generation controversy was the culmination of that fumbling, and a low point in credibility. But the Gemini suite (the API models) is a family of really capable models: all multimodal, with long context windows, and cheap. In parallel, Google has shipped really good open-source models, the Gemma suite, including a surprisingly good vision model (PaliGemma). They recently created buzz with Deep Dive, a feature of their NotebookLM platform that summarizes documents into a podcast format where two virtual hosts discuss them. Deep Dive is probably just a feature that will attract users to NotebookLM, and users will likely stick around, as the platform is underpinned by strong models and has a good UX. Read the candid feedback from the Product Lead (below) after users "jailbroke" NotebookLM; it's genuinely promising.
Google now has good products and is extremely aggressive on prices. The AI market has become a battleground of attrition, with Chinese players and VC-backed LLM-as-a-service providers initiating aggressive pricing strategies. However, competing against the tech giants (including Alibaba) seems untenable, even in the medium term. It's well known that OpenAI is burning billions of dollars annually, with most queries likely served at a loss. Google, too, is undoubtedly bleeding cash as a result of the battle. Crucially, this isn't a fixed-cost breakeven problem; increased AI adoption is poised to exacerbate financial pressures, all the more so if test-time compute becomes the norm. The tech giants' robust balance sheets and deep cash reserves indicate they're well equipped for a protracted campaign. They will continue to heavily subsidize tokens simply because they can, not because they know they will win.

Against this backdrop, VC-backed businesses stand no chance. A prudent strategy for investment firms may be to pivot towards tangential investments in AI infrastructure, such as energy provision and data-center technologies, to capture some of the value that Big Tech pours into these ecosystems. While startups leveraging state-of-the-art models will undoubtedly create value downstream, the paradigm shift brought about by AI makes predicting the exact nature and extent of that downstream effect very challenging.
PS: re NotebookLM
There's an interesting fact about the Deep Dive feature of NotebookLM: at least one other team at Google is working on a competing product, Illuminate. From an overall product perspective, Illuminate isn't as good as NotebookLM, but another key difference is that a member of the NotebookLM team has been actively engaging with users on social media, taking feedback and sharing insights on their roadmap, on what's missing before the next iteration can ship, and on what they're trying to achieve. The product side of things is clearly overlooked by many AI labs (or teams within them), but it could be instrumental in determining the winners and losers of the AI race.