Ways To Think About AI: Six Years On
Six years is a long time in technology, especially AI. But a think piece by Benedict Evans in 2018 contains a lot of wisdom that still rings true. Even the tagline feels apposite in 2024, if you replace “AI” with “Chatbots”:
Everyone has heard of machine learning now, and every big company is working on projects around ‘AI’. We know this is a Next Big Thing. But we don’t yet have a settled sense of quite what machine learning means - what it will mean for tech companies or for companies in the broader economy, how to think structurally about what new things it could enable, and what important problems it might actually be able to solve.
There was one particular insight that stuck in my memory and is worth revisiting today. Ben suggests a surprising but very helpful way to think about AI: relational databases:
Why relational databases? They were a new fundamental enabling layer that changed what computing could do. Before relational databases appeared in the late 1970s, if you wanted your database to show you, say, 'all customers who bought this product and live in this city', that would generally need a custom engineering project.
This analogy is more true than ever today. Language models are an enabling layer that changes what, and how, we use computers. Like databases, most if not all software products will be using a language model within the next five years.
Is there anything we can learn about language models from the way the database market has evolved and matured?
One very active debate at the moment is whether most companies will use a free and transparent model (a Mixtral or Llama), or a closed source model (a ChatGPT or Gemini).
Here the parallels with databases seem striking. Databases, like LLMs, deal with a company’s most valuable data. This privileged level of access brings additional security and compliance hurdles, potentially leading to an advantage for free and transparent models, whose weights and serving code can be externally analysed and verified. A quote from Mistral’s Chief Business Officer seems to support this hypothesis:
Open-source models were particularly attractive to state-owned or highly regulated entities, such as defence companies or banks, who wanted to experiment with generative AI but could not do it with proprietary software because of compliance reasons, Bressand said.
Given these parallels you might expect open source databases to have a larger market share than closed source alternatives. It’s striking, however, that the combined market cap of two of the largest open source database providers (MongoDB and Elastic) is around $50bn, while Snowflake, a closed source company of a similar vintage, has a market cap of $70bn on its own. And this comparison leaves out the major closed source database providers: Oracle, Google, AWS, and Microsoft.
How much of this analysis transfers to LLMs is debatable, but I think the conclusion is directionally correct: there is a place for the unique selling points of open source, but ultimately quality and cost are the primary determinants of market share. It may be that the company that can offer the best service ultimately turns out to be an open source company, but to suggest a causal link is, I think, wrong.
At the moment, only the closed source providers have the financial firepower to sustain billions of dollars in losses while training and serving the best LLMs. This observation leads me to a slightly strange conclusion: no matter how crazy some of the venture rounds for LLM foundation startups seem to be, perhaps they are still too small.
If you enjoyed this post, please consider following me on Twitter.