Among all the Big Tech companies planting flags in the red-hot generative AI arms race, I think the most trustworthy one may actually be Meta. Yep, Zuck’s Meta.
I say this because its recently released large language model (LLM), LLaMA (Large Language Model Meta AI), is open sourced and has the clearest set of disclosures around the model’s biases along gender, religion, race, and six other dimensions, compared to other similar LLMs.
Longtime readers of Interconnected know that I have deep convictions in open source, both as a superior technology development model and as the best methodology for building trust between users, products, and the regulators who have oversight over them. That’s a long-winded way of saying: sunlight is the best disinfectant. (See my extensive archive of previous writings on this topic.)
Surprisingly or unsurprisingly, Meta, the one company that may have the most “trust deficit” with the American public, is turning to open source to bolster its AI’s trustworthiness. Meanwhile, Meta’s peers are holding their AI models close to the vest as their “secret sauce.” This divergence in approach will play out in a much bigger way as all forms of generative AI applications become more pervasive.
Model Card
What makes LLaMA stand out is its “model card.”
A model card is an emerging standard in the machine learning research space, where every newly-trained model publicly shares a set of performance benchmarks, intended use cases, and bias metrics along multiple cultural and demographic dimensions. A model card is usually a high-level summary of this information, similar to an open source project’s readme page or introductory sections of its documentation.
This approach was first articulated in a 2018 academic paper, “Model Cards for Model Reporting”, by Margaret Mitchell, an AI ethics researcher who used to work at Google and now works at Hugging Face (a developer platform specifically for machine learning). Since then, writing a model card as part of the release process of a new machine learning model has become more common. However, the quality of these model cards is all over the place.
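Model cards are also partially machine-readable. As a minimal sketch, assuming a community-hosted copy of the weights on the Hugging Face Hub (the repo id below is illustrative, not an official Meta release), you can pull a card and inspect it programmatically:

```python
# Minimal sketch: fetch and inspect a model card from the Hugging Face Hub.
# The repo id is an assumption -- swap in whichever hosted checkpoint you use.
from huggingface_hub import ModelCard

card = ModelCard.load("huggyllama/llama-7b")

print(card.data)        # structured metadata: license, tags, datasets, etc.
print(card.text[:500])  # human-readable sections: intended use, limitations, biases
```

The structured half is what makes apples-to-apples comparisons across models possible; the free-form half is where disclosures like LLaMA’s bias table live.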
LLaMA’s model card is one of the clearest and most transparent ones I’ve seen yet. Among many things, it lays out four model sizes (7B, 13B, 33B, and 65B parameters), an “Intended Use” section, and detailed information on the type of training data that went into building this model. Most importantly, it discloses a set of bias scores along gender, religion, race/color, sexual orientation, age, nationality, disability, physical appearance, and socioeconomic status, where the lower the score, the less “biased” the model. Here is how LLaMA scored:
My point in highlighting the biases section of LLaMA’s model card is not to judge which groups this model is or isn’t biased against; you can’t take a simplistic reading of these scores to say LLaMA is somehow less “racist” and more “ageist.” The bigger picture is that every machine learning model, especially the many LLMs powering a Cambrian explosion of chatbots, should have a clear bias disclosure like LLaMA’s, but very few do.
OpenAI’s GPT-3 model card does not disclose these bias scores. To find out how GPT-3 is biased along the same nine dimensions, you would have to dig into the LLaMA academic paper, which presents a side-by-side comparison:
The Meta AI team, of course, did this at least in part to make LLaMA look good, though its average bias score is less than one point lower than GPT-3’s. In fact, GPT-3 actually scores better than LLaMA in five of the nine categories, so OpenAI has nothing to be ashamed of.
This raises the question: how come no other LLMs – from GPT-3.5 and LaMDA (Google), to ERNIE (Baidu) and Chinchilla (DeepMind, so also Google) – lay out their bias scores as clearly as Meta’s LLaMA does? What is there to hide?
If these other models also made similar disclosures, the AI research community, as well as the general public, would have a starting point to investigate and operate in good faith, rather than dragging AI into a human culture war and wasting our breath arguing over whether ChatGPT is “woke.”
Model Is Not Moat
Implicit in Meta’s decision to open source LLaMA is a view that the model itself is not that valuable to its business. It is a rather enlightened view and one I happen to share.
All machine learning based AI use cases – LLMs for text generation, Stable Diffusion for image creation, prediction engines for content recommendation – are subject to the “garbage in, garbage out” problem. If you use the most vitriolic threads on Reddit as the training data for an LLM, a chatbot built on that LLM will sound toxic and hateful, no matter how advanced the model is. If your training data only contains content written in American English, that chatbot won’t perform well in German, let alone Japanese.
That’s why the results a chatbot like ChatGPT generates – the “answers” you get when you chat with it – are technically called “inferences.” They are outputs inferred from the data the model was exposed to during training.
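As a minimal sketch of what inference looks like in practice, here is a text-generation call using Hugging Face’s transformers library, with GPT-2 standing in as a small, openly available model (the prompt is invented for illustration):

```python
# Minimal sketch of inference: the model produces text purely by extrapolating
# patterns it absorbed from its training data. GPT-2 stands in for a larger LLM.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

output = generator(
    "Sunlight is the best disinfectant because",
    max_new_tokens=40,
    do_sample=True,
)
print(output[0]["generated_text"])
```

Nothing in that call looks up facts anywhere; whatever comes back is a reflection of the training corpus, which is exactly why the quality of that corpus matters so much.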
The business value lies in the cleanliness and quality of the training data, and in how closely those data match the task the AI application is supposed to solve. If you are building a self-driving system, having the clearest and most up-to-date driving and road condition data is the most important thing. If you are building a health insurance chatbot, as I have foolishly tried to do before, having accurate insurance data and medical procedure pricing is the most important thing. Other, unrelated data are just noise and can have adverse effects on the model’s performance.
This is not to say that the models are not valuable. They are. Being able to build models with hundreds of billions of parameters that generate more accurate inferences faster, all while consuming less computation when deployed, is no easy feat. But this “model value” will become more of a “tax” and less of a product differentiator over time. This progression is already unfolding. The speed with which Salesforce launched ChatGPT-like functions in Slack and its vast suite of sales and marketing SaaS products, so soon after Microsoft did the same in its own vast suite of enterprise applications, is a case in point. Smaller SaaS companies like ZoomInfo and Notion are doing the same thing. They are all using the same set of OpenAI models under the hood.
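To see why the model layer behaves like a tax, consider what “adding ChatGPT-like functions” mostly amounts to in practice: a call to a shared, hosted model. Here is a minimal sketch using OpenAI’s Python client; the sales-notes use case and prompts are invented for illustration:

```python
# Minimal sketch: a ChatGPT-like product feature is, at its core, an API call
# to a shared hosted model. Requires an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the same class of model many SaaS products call
    messages=[
        {"role": "system", "content": "You summarize sales call notes for reps."},
        {"role": "user", "content": "Summarize: customer is weighing a Q3 renewal..."},
    ],
)
print(response.choices[0].message.content)
```

Everyone pays roughly the same toll for that call; what differs is the proprietary data and workflow wrapped around it.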
The model is not the moat. An advanced model can transform quality data into business value. But the reverse is not true: without quality data, a model is useless no matter how advanced it is – like a brand new highway without cars.
On the other hand, keeping models as secrets in a black box breeds confusion and distrust. That sense of secrecy may create intrigue and attract more customers in the short term. But over time, more users will want to know the “how” and “why” behind the model; if those questions are not answered openly and transparently, usage may decline, which means less quality user data coming back as feedback, which ultimately hurts the business. This problem is less of an issue for enterprise software companies like Salesforce (B2B companies don’t get dragged to Congress to testify). But for social media companies like Meta, being open and transparent with their AI models is both good for building trust and good for business. In the same vein, I have shared similar views on open sourcing Twitter’s algorithm and open sourcing TikTok’s internal access system.
In tech, where data is plentiful but trust is scarce, open sourcing your AI models and algorithms is a rare scenario where you can have your cake and eat it too. Meta seems to have figured that out and is going above and beyond its competitors with the way it released and open sourced LLaMA.
It is an example that others should follow. It is an example that OpenAI should have set, given its original mission and commitment to open source. With harsh critiques coming from the likes of Elon Musk, one of the organization’s original co-founders, OpenAI, Google, and others may eventually be forced to follow Meta’s example before trust begins to erode.