SOME RACES are over before they really get going. So it can seem in the contest to make the best large language models (LLMs). These algorithms power generative artificial intelligence that can produce humanlike text and other output. OpenAI, the American creator of ChatGPT, appears leagues ahead. It has made the world’s most powerful LLM, called GPT-4. The company is gobbling up talent, data and computing power to build cleverer models. As a result, it attracts more users, and with them more capital to pour into even more sophisticated models.
But a French startup called Mistral is trying to throw a spanner in this AI flywheel. On February 26th it released a new LLM. The model, called Mistral-Large, is smaller than GPT-4, measured by the number of parameters it uses (a common gauge of model power). Even so, it nearly rivals GPT-4 in important aspects of performance, such as reasoning. Mistral also unveiled a Mistral-Large-powered ChatGPT competitor, Le Chat (pronounced le shah, like the French word for cat rather than the English homograph). And it announced a deal with Microsoft, an AI juggernaut which already has a deep partnership with OpenAI. The tech giant will take a small stake in Mistral and make the French firm’s models available via its Azure cloud.
Mistral is proof that the industry is already becoming more open—and less American. If it does mount a serious challenge to OpenAI, this would also confirm the suspicion of some in the industry that in generative AI, size is not everything. “It’s no longer about being bigger—it’s about being creative and being fast,” says Arthur Mensch, Mistral’s chief executive.
The French firm’s rise has been as brisk as the northwesterly winter wind after which it is named. It was founded less than a year ago and still has just 25 employees. Despite this, its LLMs are leading the growing pack of open-source models, the statistical innards of which are, in contrast to proprietary black boxes like GPT-4, publicly available and can be modified by anyone. That has allowed Mistral to tap an impressive €490m ($531m) in funding, valuing the company at more than $2bn. Big investors include leading Silicon Valley venture capitalists such as Andreessen Horowitz and General Catalyst, as well as tech luminaries such as Eric Schmidt, a former chief executive of Google.
Mistral owes its early success to cleverly mixing the main technical ingredients of AI—talent, data and computing power—with politics, which is growing in importance for the AI industry as the world’s governments ponder the technology’s potential.
Start with talent. Here, Mistral is a “match made in heaven” between French engineering education and American big-tech firms, says Stanislas Polu, a co-founder of Dust, another of a clutch of AI firms popping up in Paris. Three of Mistral’s six founders, and its technical brains—Mr Mensch, Timothée Lacroix and Guillaume Lample—are products of France’s elite technical schools. Like many other top AI scientists they have worked at the research labs of Google and Meta, another American tech giant—though in the trio’s case they were building LLMs at those labs’ offshoots in Paris rather than in London or Silicon Valley. This places them among the 100 or so people worldwide who really know how to train cutting-edge models.
They appear to have been particularly adept at marshalling data, the second ingredient of AI success, to train their models. Mr Mensch will not be drawn on how exactly Mistral curates its training sets; it is the source of his firm’s competitive advantage, he says. But industry insiders confirm that Mistral is, in the words of one, “really clever” at curation, for instance filtering out information that is repetitive or does not make sense. Careful curation has allowed Mistral’s models to be much smaller: their statistical weights, or “parameters”, count in the billions, compared with an estimated 1.8trn for OpenAI’s GPT-4 (both firms are mum on the exact sizes). Their small size lets customers run them on their own computers rather than in the vast data centres that many proprietary models require.
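A rough calculation suggests why parameter counts matter for where a model can run. The sketch below is illustrative only. It assumes each parameter is stored as a 16-bit number (two bytes), and it uses a hypothetical 7bn-parameter model to stand in for one that counts "in the billions"; neither figure comes from Mistral or OpenAI. The 1.8trn figure is the estimate for GPT-4 quoted above. The function name is invented for this example, and the result covers only the memory needed to hold the weights, not the extra working memory a model needs while running.

```python
def weight_memory_gb(num_parameters: float, bytes_per_parameter: int = 2) -> float:
    """Approximate memory needed just to hold a model's weights, in gigabytes.

    Assumes every parameter takes the same number of bytes (2 by default,
    i.e. 16-bit storage) and uses 1 GB = 10**9 bytes.
    """
    return num_parameters * bytes_per_parameter / 1e9


# Hypothetical small model: 7bn parameters (an assumption for illustration).
small_model_gb = weight_memory_gb(7e9)

# Large model: the 1.8trn-parameter estimate for GPT-4 quoted in the text.
large_model_gb = weight_memory_gb(1.8e12)

print(f"7bn parameters:    about {small_model_gb:,.0f} GB of weights")
print(f"1.8trn parameters: about {large_model_gb:,.0f} GB of weights")
print(f"Ratio: roughly {large_model_gb / small_model_gb:,.0f} to 1")
```

Under these assumptions the smaller model's weights would fit on a single well-equipped machine, whereas the larger one's would have to be spread across many, which is consistent with the contrast drawn above between customers' own computers and a data centre.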
According to Mr Mensch, Mistral’s focus on data curation lets the firm use computing power, AI’s third crucial component, more efficiently than its competitors. Training Mistral’s latest model cost much less than the $100m that OpenAI apparently spent to develop GPT-4. Mistral’s approach also makes it cheaper for customers both to fine-tune its models with their own data and then to run them.
In technical terms, startups like Mistral enjoy a “second-mover advantage”, benefiting from all the work OpenAI and others have done, argues Jeannette zu Fürstenberg of General Catalyst. Critically, in Mistral’s case these technical chops are complemented by political nous, which is helpful given that many governments think that home-grown LLMs will confer economic and strategic advantages.
So it helps that another of Mistral’s co-founders is Cédric O, a former French digital minister. Mr O retains a direct line to the country’s president, Emmanuel Macron, who has taken a keen interest in all things AI. When a draft of the European Union’s AI Act last year threatened to force Mistral to divulge its data recipe, Mr O co-ordinated, with Mr Macron’s backing, a successful Franco-German effort to oppose such provisions. These were duly excised from the bill.
The question now is whether Mistral, which has yet to generate meaningful revenues, can transform this enticing techno-political mix into profits. The firm’s bet is that many businesses, especially European ones, want more control over the LLMs they use than OpenAI is willing to give them, and do not want to be locked into another American tech platform. Such customers, the thinking goes, would be willing to pay Mistral to maintain and run their models.
One question potential customers may ask themselves is how the world will regulate open-source models. A heated debate about whether they will enable terrorists and other bad actors to build bio- and cyber-weapons has died down. Rather than talking up the risks, policymakers are turning their attention to the potential rewards: greater transparency, more innovation and less reliance on a handful of powerful companies that control the technology. Regulators on both sides of the Atlantic have so far tolerated open-source LLMs. But Mr O may again have his hands full if these models keep getting more powerful or are found to be misused, for instance to help spread disinformation during this year’s welter of elections around the world.
Avoiding a political backlash is, obviously, in Mistral’s interest—but lobbying success has a flipside. Regulatory forbearance would almost certainly lead to more open-source competition. On February 20th Silo AI, a Finnish firm, unveiled a new LLM that is even more open than Mistral’s, furnishing information about the data on which it is trained and the software that did the job. A new version, due out in a few months, will be as good in most European languages as it is now in Finnish and English.
Most important, it is still unclear if size matters for generative AI. A test will come when OpenAI at last releases its next model, GPT-5. If it leaves Mistral-Large and other smaller open-source models in the dust, then Mr Mensch’s talk of creativity and speed may ring hollow. Until then, however, Mistral’s story will continue to resonate. ■
Source: Business - economist.com