The entire tech world was expected to shift to VR after Apple's announcement of a mixed reality headset for 2024. It's happening so let’s get this out of the way: it is beautiful, it is unaffordable, yet everyone will want one… and the word “metaverse” may even start trending again.
Before the tide completely turns, we wanted to clean-up the bookmarks of the past few weeks. Here are our key takeaways on Creativity, Gaming, Business, Open-Source, Security and Regulations.
Creativity
Over the past six months, tools like Midjourney changed the way one would think about editing pictures : you would now just generate a new, hyper-realistic picture from scratch using text to describe what you want to achieve. This represented a major threat for incumbent players like Adobe (Photoshop). Adobe had already responded with Firefly but it was somewhat underwhelming compared to tools from AI pure-players. Adobe’s most convincing response came at the end of May with Generative Fill, a tool that finally got traction from users. If you try to use the tool to generate characters, it appears to have the same flaws as Firefly : missing limbs or fingers, would not generate anything close to an image subject to licensing. But the sentiment around the tool to edit backgrounds and landscapes seems positive. Dust has not settled yet but, based on what the community has showcased so far, Midjourney has moved from « threat to Photoshop » to « complement to Photoshop ». Let’s see how things play out as Adobe roll out AI features.
@gregoire_r Famous Portraits & Generative Fill with Adobe Photoshop x Firefly AI 🎨 Dites-moi ce que vous en pensez ! #adobefirefly #photoshopai #ai #paintings #generativefill #photoshop #editing #joconde #photoshopediting #firefly #adobe
♬ Aquarium - Kevin MacLeod
In the meantime, AI-generated art will continue to permeate popular culture, sometimes overwriting it: Futurism reported recently that searches for “Johannes Vermeer” or “Edward Hopper” would return an AI image first instead of the original masterpieces.
source: futurism.com
It is impossible to test all new tools/tech for creatives but a number of promising announcements have occupied the newsfeed in the past few weeks. Designers got excited by Google Research’s StyleDrop (logo generator) and ControlNet for QR Code (working QR codes!); we were amazed by the Drag Your Gan demo (clearly on the Photoshop segment). Blockade Labs' Sketch-to-Skybox was another cool one, giving users ability to create 3D illusions.
The future of Gaming
The emergence of so many tools empowering creators to build 3D content (or illusion of 3D) could well be another short-term hype. But if the expected leap in VR hardware materializes, it could also be a catalyst for a Cambrian explosion in immersive digital experiences. Gaming is an obvious field where AI can push the boundaries, not only for the environment, but also for characters. Have a look at this demo from Nvidia, where a player interacts freely with a non-playing character (NPC) that shares information to help progress in the game.
In May, the Nvidia teams also shared a proof-of-concept for playing characters powered by AI (and released the code). In that specific case, an agent powered by an LLM can interact with its environment - Minecraft provides an API - and even acquires skills. Here, acquiring skills means autonomously generating code to achieve a goal, tweaking the code based on the API feedback until it works as intended, and saving it in a skill library for later. Whether or not you are into gaming, it is a fascinating space to watch. Harris Rothaermel builds in public if you are interested.
Economic agents
We are looking forward to seeing several agents interact with each other and learn from each other, in particular if these agents are powered by different models. The field of research is emerging as mentioned in our previous post on von Neumann, Machiavelli and Axelrod. However Axelrod’s work has rarely been referenced – to our knowledge – until this paper from the University of Tubingen and the Max Planck Institute for Biological Cybernetics last week. Axelrod’s work in the early 1980’s was kind of revolutionary as the iterated prisoner’s dilemma (favouring semi-cooperative strategies) contradicts the prisoner’s dilemma (favouring non-cooperative strategies) and the conclusion could only be drawn for progress made in computing. With AI today, it looks like there is an opportunity to revisit many aspects that have been taken for granted for decades in social sciences based on pure mathematical abstraction. It would not be surprising if major breakthroughs came out of the process.
Brain-as-an-Interface
You may think that these AI-agent stories are just science-fiction conveniently pushed by Nvidia, the global leader in chips for both gaming and AI applications, to hype their products. But AI agents do not sit very high on the sci-fi scale. For a real sci-fi vibe, have a look at recent brain-to-image or brain-to-video papers. They mostly rely on fMRI technology (evaluating bloodflow in the brain) except for CEBRA implying a direct connection with the neurons.
Using the brain as an interface seems more tangible than ever, and the upcoming Apple Vision device offers a glimpse at what the future may look like.
One of the coolest results involved predicting a user was going to click on something before they actually did. That was a ton of work and something I’m proud of. Your pupil reacts before you click in part because you expect something will happen after you click. So you can create biofeedback with a user's brain by monitoring their eye behavior, and redesigning the UI in real time to create more of this anticipatory pupil response. It’s a crude brain computer interface via the eyes, but very cool. And I’d take that over invasive brain surgery any day.
Other tricks to infer cognitive state involved quickly flashing visuals or sounds to a user in ways they may not perceive, and then measuring their reaction to it.
Sterling Crispin who claims to have been part of the Apple VR team until 2021
For those who primarily worry about privacy, these new technologies will be a major source of concern. For those who primarily focus on productivity, the new technologies represent incredible opportunities. There is no doubt that regulators in the US and in Europe will take very different stances regarding brain-as-an-interface. The FDA in the US shot first by approving human trials of Neuralink’s brain implant.
AI disruption case study : Stack Overflow
In Finance circles, everyone talks about AI-disruption. Many, including us, thought that it could affect many industries very quickly, potentially in the first half of 2023. In reality, adoption is relatively slow. It is not due to corporate inertia or legitimate caution about security, but rather because many potential users remain unconvinced. As per the Pew Research Center, 42% of US-adults have never heard of ChatGPT and, among the 14% who have used it, only about 1/3 find it very or extremely useful. That’s 5% of US-adults.
Too many people still view LLMs as knowledge bases - which there are absolutely not - instead of reasoning engines. This hinders adoption and leads to controversies as model hallucinations are not identified by users. This is basically what happens when lawyers use ChatGPT for court citations, making-up previous cases.
Using LLMs as reasoning engine, however, is the primary use-case in programming, and checking the output is reasonably easy for developers as they debug the code that AI-assistants spit out. It’s not bullet-proof - in particular for security reasons - but for that use-case the feedback is overwhelmingly positive. And a major disruption is ongoing.
Stack Overflow is the go-to platform for any code-related issue. Stack Overflow will likely be the top-ranking page for any code problem you google, with a solution provided by the developer community. At any level of skills, most developers used to have a Stack Overflow tab open somewhere in their browsers.
But code assistants, and most recently ChatGPT, change everything: Similarweb reported a 14% year-on-year decline in traffic for Stack Overflow, whilst Github and ChatGPT strive. To put the disruption into perspective : exactly 2 years ago, Prosus announced the acquisition of the company for $1.8bn and people are now questioning if it even has a reason to exist.
Who's next?
For information that is public but generally difficult to collect or presented in a complex way (e.g software documentation), users often rely on a company, or contributors for web2.0 platforms, to summarize and repackage the information in a user-friendly way. In that case, the intermediary is highly exposed to AI disruption. Industry-specific data providers, including providers of private financial information, often have this profile and seem under threat in the short term unless they integrate AI themselves.
Historically, these data providers have been very good candidates for LBOs as they had visible revenues (SaaS) and a very sticky customer-base with limited price sensitivity. It is not surprising that so many of these firms have been acquired by Private Equity funds in the past few years, financed with significant debt stacks when base rates were close to zero. Base rates and economic slowdown alone would affect equity value materially. But a Stack Overflow-type disruption could wipe it out entirely, and even eat into the debt. If you are interested in the intersection between business/finance and AI, keep an eye on this space.
Self deep-faking
Intuitively, training a model to build a digital version of yourself and making it available to the public seems a very bad idea. The issue is that you can’t prevent users from making the AI-you do or say anything as it is easy to force generative AI models to return information you specifically designed them not to. Grimes has been one of the first artists to allow anyone to use her voice to create AI music, but she subsequently backtracked slightly after realizing horrendous words that could be put in her mouth. She did nonetheless push through with the idea and launched a platform to help users mimic her voice.
Not everyone is afraid of deepfakes that could go horribly wrong, and some see it as a business opportunity. Caryn Marjorie, a Snapchat personality, recently launched a pay-as-you-go AI-girlfriend service for $1/minute. She claimed to have trained the model on hours of videos of herself. We will not try the product (sorry) so we will never be able to tell if all this was just a PR stunt (which is likely). But she claims to have made over $70k in a week with, reportedly, 1000 beta testers.
This means that each beta tester spent on average 70 minutes interacting with the bot. Noone knows if these 70 minutes were spread over 7 sessions of 10 minutes or if it was typically just one hour-long session. It would make a big difference because, if it turns out that users can engage extensively with an AI girlfriend, it would be a good enough proof of concept for another industry that has been so instrumental to tech innovation historically: Porn.
To know which Tech goes mainstream next: watch Porn, the industry not the content
The Porn industry has always been the first to address critical friction points with new technologies, ultimately unlocking mass adoption : cable television, VCR, internet content, online payments, streaming, live chats, digital rights management to name few innovations where the porn industry has paved the way. If you want to know what tech will soon go mainstream, check out regularly the recent trends in the porn industry.
Today, the industry represents between 10% and 20% of global web searches (vs 40% in the 1990’s), and users spend 10 minutes on average browsing pornsites according to sexualalpha (some surprising stats there). Hour-long engagements would be a revolution. And it is fair to assume that Onlyfans creators could face fierce competition from AI-powered bots. Would this kind of disruption be a good thing or bad thing? That would be an interesting question for the Effective Altruism community.
Towards DIY Ai and local inference
Wheverer you are and whatever you want to build – including for commercial purposes – there are now plenty of opensource tools to experiment with. In earlier posts, we talked about two promising approaches: quantization, allowing to “compress” models and run them on consumer hardware, and Low Rank Adaptation (LoRA), to fine-tune models quickly and at low cost. Early May, Tim Dettmers presented Qlora a finetuning approach combining quantization and LoRA. An associated model - Guanaco - is also available.
But a real game changer could well be the open-sourcing of a State-Of-The-Art large foundation model developed by the UAE. Falcon 40B was initially released with a somewhat restrictive licence (royalty fee above $1m revenues and mandatory revenue declaration) before switching to a permissive Apache 2.0 licence at the end of May. The fact that such a robust model is now open-source is a very big deal. The fact that it comes for Abu Dhabi may be too… As Americans will start looking for a bias to align with UAE’s cultural specificities, it will hopefully increase awareness about the issue of cultural bias in training data.
To date, this has largely been ignored as the whole field is very much centered on the US, using data produced by US sources. As far as training data is concerned, a bias is a political concept (we can hear statisticians choking, apologies to them) in the sense that it indicates a deviation from an "ideal" state, which may differ significantly from a statistician's view. To illustrate this, type "woman" in Midjourney and then ask a statistician to draw a "woman" assuming they can do hyper-realistic drawings. The outputs will be very different because the model was trained on an "ideal" representation of a woman, not on what people would see if they analysed a random sample. And the same thing happens with LLMs without anyone noticing (in the US at least).
Good news is that, unlike most foundation models, the UAE models are open-source so they can in theory be finetuned so cultural bias, if any, can be mitigated.
At the edge : Apple and that's it
An important caveat to the claim of "running Ai on consumer hardware” is that it’s not on any consumer hardware. It is mainly Apple hardware due to Apple’s unique chip design (dubbed "Apple Silicon"). Compared to other businesses, Apple do not receive a lot of coverage for their chip capabilities, and they are certainly not praised enough. They are ideally positioned to benefit from an AI-revolution if it materializes, and they know exactly what they are doing as evidenced by the chip announcement during Apple’s latest keynote.
Immediate AI-risk : Humans
As AI tools grow more complex and get more accessible, it is important to appreciate – or remind people – that the immediate risks have nothing to do with AI-takeover. In fact, tangible risks today always involve a human in the loop.
Given how many fake pictures go viral these days, “consuming news” may soon become the kind of skill you add on your Linkedin profile. Interacting with people online too. Impersonation, deepfakes, pig butchering or catfishing are threats that now have the attention of mainstream media.
Indirect prompt injection on the other hand remains under everyone’s radar. Indirect prompt injection consists in instructing a model to leak user information. Given LLMs can’t keep anything secret, it has been hypothesized that an AI-agent reading a document could pick-up a hidden instruction to send private information to a third-party. The first proof of concept was published in May as Johann Rehberger successfully accessed information allowing to steal email information from ChatGPT plugins users. Rez0 provides a comprehensive explanation of how it works here and points to Daniel Miessler’s breakdown of AI Surface Attack. In summary, users have no moat.
What Miessler does not cover in his AI Attack Surface Map is that hardware can be a source of data leakage too. It has been known for a while that the sound of mechanical keyboards could be analyzed by AI to extract data. Opensource tools allow to achieve this relatively easily if you can get a user to type something specific on the keyboard you are listening to. Even without any help from the targeted user, statistical patterns should get you there eventually.
Now imagine a similar form of data leakage but not through key sounds, instead through adjustments to power, frequency or temperature of CPU/GPU. This is essentially what a Georgia Tech team have demonstrated in this paper, which is worth a read for two reasons : (1) it explains many unexpected sources of data leakage and (2) “AI” is not even mentioned once, which is kind of refreshing these days.
AI regulation: EU and the US may converge despite tackling the problem from very different angles
Early May, EU lawmakers voted in favour of a draft restrictive legislation to regulate AI (so-called “AI Act”), a process initiated two years ago and that is expected to take several years to complete. The process was largely misunderstood on the other side of the Atlantic – many observers thought it was now enforceable – and many interpreted it has a reaction to extinction risk (x-risk, or Ai Doom) – which was never theme in EU’s preliminary discussions for the AI Act. So a lot of criticism followed the European announcement, and ironically, US lawmakers started hearings immediately after, this time with much more focus on x-risk. The AI-takeover discourse continues to gain traction in the US as evidenced by a widely commented blog post from Yoshua Bengio – another Turing award – and later a letter released by the Center for AI Safety, signed by many scientists and CEOs of the field.
Ultimately, US and EU regulation can be expected to converge ; AI regulation is on the agenda of Tech & Trade Council. But there are major headwinds, in particular that the US want private companies to opt-in individually, whilst the EU wants governments to opt-in on their behalf.
During the hearing at the American Senate, Sam Altman, OpenAi CEO, called for strong regulation of the industry, albeit above a certain threshold. If it sounds surprising to you, make no mistake : regulations are beneficial to incumbents as they constitute entry barriers. When incumbents themselves push for regulation, the strategy is called regulatory capture. Noone can tell if this is what Altman has in mind or if he is just genuinely concerned about AI risks. In any case, few weeks later, he met with European Commission President Ursula von der Leyer, another sign of the company’s shift from “open”-source to “open” to regulation. Sam Altman did, at one point, threaten to leave the EU over AI-law but quickly made a U-turn.
Other providers of State-Of-The-Art foundation models seem on a similar path, as demonstrated by the meeting between Dario Amodei, Anthropic’s CEO and Thierry Breton, European Commissioner for Internal Market. Anthropic had released a couple of weeks before a new version of its model, Claude, that amazed the entire community given its 100k-token context length. However Anthropic did not release much information about the model (it is not clear as of today how many people have actually tested it) and the question is still very much open as to how Claude achieves such large context length.
Despite their ties with Anthropic, Google are one notable absentee of the discussions with regulators and they clearly have taken an opposite stance towards regulations. They were invited to the White House early May, but not to the Senate hearing (IBM was) and no engagement with the EU was shared publicly. As explained two months ago, data privacy laws (GDPR) are likely more of a concern than the AI Act, and this explains why Google Bard is not available to EU citizens (despite being available in 180 countries).
Very Big Tech
Nvidia, mentioned above for their capabilities in gaming and AI, reached a valuation of $1trillion at the end of May after a surge in stock price. The stock price subsequently dropped slightly so valuation is now just below this threshold but it is still worth a comment as very few companies are worth that much. According to Forbes, the $1trillion-club only included 6 players at the end of May : Apple, Saudi Aramco, Microsoft, Alphabet, Amazon and Nvidia then. In the past, three other companies had crossed the $1trillion-mark : Tesla, Meta and Chinese oil giant PetroChina.
The club is very heavy on Tech, which makes sense given products are highly scalable so growth potential is virtually uncapped. That said, Nvidia is different and it is noteworthy: Nvidia design chips and produce software to operate them. That IP is scalable, but manufacturing the chips is not. Even though Nvidia volumes exceeded (by far) what everyone thought possible, their value chain is a key bottleneck for volume growth. Optimizations may be possible through software or design, and demand is so high that prices can probably increase two-or-threefold, but growth prospects are otherwise constrained by production capacity upstream. And that’s why Nvidia’s valuation is so extraordinary.
Catching-up with OpenAi
OpenAi are the undisputed leader in the AI space, not only in the eye of the public but also for their peers and the research community who benchmark everything against GPT4 and GPT3.5. The company's executives communicated how the lack of Nvidia’s products slowed them down and impacted user experience. Freeing capacity would be an argument for inference at the edge (i.e running the models on users devices) but it does not seem compatible with closed models like OpenAi's. And as far as training is concerned, there is no alternative to massive Nvidia GPU clusters. Anyone looking to truly challenge OpenAi face similar constraints impairing their ability to train large foundation models.
And GPUs may not be the only resource they desperately need more of : the current approach relying on training on more content (tokens) to achieve better performance implies that, once you have exhausted the entire internet, which includes transcribing Youtube videos, it is impossible to improve. Although there are other avenues to progress further (fine-tuning on “better” datasets or just using content generated by LLMs - which is still heavily debated1), the “token-crisis” scenario is an area of research. For the time-being, companies that hope to build GPT4-like foundation models have to rely on the internet archive. Literally the Internet Archive.
In May, Jason Scott from the non-profit digital library opened up about how difficult their life was in the era of LLM-training as they can’t cope with the volume of requests they receive. The entire Twitter thread is a must-read to understand how things work behind the scene. There is very little doubt that, if similar DDOS attacks originated from states instead of companies, there would be a global outrage.