Why Enterprises Should Treat AI Models Like Critical IP (Part 1)

Five years ago, The Economist proclaimed that data was the new oil. Since then, the power of amassed data to impact the world has become even more undeniable. That’s why companies should treat AI models as some of their most important intellectual property, rather than setting them aside as something with the potential for future impact. Today’s large, state-of-the-art AI models can be viewed as a powerful tool to activate an organization’s data and maximize its value.

AI models are simply an organized version of a company’s data in the form of a flexible, intuitively queryable database. As such, trained models represent the unique value a company brings to the world via the data it has collected by selling products and services to its customers. In order to take full advantage of their most valuable asset, companies must be able to build these models internally.

Understanding AI Models: A Quick History

To gain a better understanding of what AI models represent, it’s helpful to look at their inspiration: the human brain. Our brains encode information in the strength of the connections between neurons, known as synapses. Information is typically organized into a learned conceptual hierarchy; raw data is transformed into concepts and relationships between concepts. Long-term memories form when these concepts and connections are encoded into persistent synaptic strengths. We’re able to retrieve those encoded memories through language, sight, touch, or any other trigger that calls for the relevant information.

In the early days of artificial neural networks, scientists attempted to replicate some of these behaviors by modeling synaptic strength as a numeric weight and each connection as a multiplication of that weight by a neuron’s output. Biological neural networks (aka brains) use chemical processes to “update their weights” and modify the influence of one neuron upon another during learning. Artificial neural networks update their weights during learning using an algorithm called backpropagation, an iterative method that reduces the error of the output. Backpropagation generally requires global error information to update each weight after each sample has been pushed through. While artificial neural networks differ from biological ones in how these weights are learned, both produce the same result: input data is encoded in the weights of the neural network.
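To make the idea of "data encoded in weights" concrete, here is a minimal sketch in plain Python. It is not a full backpropagation implementation: it trains a single artificial neuron with no hidden layers, which is the simplest case of the error-driven weight update that backpropagation generalizes to deeper networks. The function names, the toy samples, and the learning rate are all assumptions made for illustration.

```python
# Minimal sketch (illustrative only): one artificial neuron trained by
# gradient descent. All names and numbers here are hypothetical.

def predict(weights, inputs):
    # Each connection is modeled as a multiplication of a weight by an input.
    return sum(w * x for w, x in zip(weights, inputs))

def train(samples, n_inputs, learning_rate=0.1, epochs=200):
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in samples:
            # Error of the output for this sample.
            error = predict(weights, inputs) - target
            # The gradient of 0.5 * error**2 with respect to weight i is
            # error * inputs[i]; step each weight against its gradient.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
    return weights

# Toy data generated by the rule: target = 2 * x1 - 1 * x2.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
learned_weights = train(samples, n_inputs=2)
print(learned_weights)
```

After training, the learned weights sit very close to 2 and -1: the rule that generated the samples is now stored in the weights rather than in the samples themselves.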

Up until about five years ago, AI model innovation was focused on the topology of the neural network: literally, how the model is organized. The idea was that the capabilities of the trained model largely stemmed from the topology and its ability to provide a better understanding of the inherent structure of the data on which it is trained. Every year, new topologies were introduced that increased benchmark performance on industry-standard datasets such as MNIST, CIFAR, and ImageNet. AlexNet, VGG, Inception Networks, and ResNets are all examples of networks able to achieve new levels of learning from the same dataset.

More recently, we’ve seen the advent of new types of models like large language models (LLMs) and diffusion models that learn about our world by exposure to data. LLMs like BERT learn about sentence structure and knowledge that’s embedded in the text they are fed. Diffusion models (most famously, DALL-E) learn about relationships between textual descriptions and images, and are able to produce images based on novel descriptions. In contrast to our previous focus on model topology, we’re beginning to understand that the data itself and how it is presented during learning dictate the performance achieved. In fact, there is a standardization of neural network topology occurring, similar to an evolutionary phenomenon observed in biology.

AI Models Are Organized Data

When an AI model is being trained, it encodes relationships observed in the data on which it is trained. The resultant model is like a fingerprint of its training data. It’s almost as if each AI model has its own identity, as each dataset and training method creates a unique set of representations. In fact, generative AI models can even memorize portions of the training dataset that can be retrieved when prompted. Neural networks are capable of representing vast data sets. Suddenly, the entire space represented by the data set can be used for decision making by expressing it through a generative model.

Each of these models can be thought of as a form of database that is unique and specific to the data on which it is trained. But they aren’t databases in any standard sense. Traditional databases rely on rigidly structured input data, like columns and rows, a predefined organizational schema, and highly structured queries. SQL and other languages were designed to precisely express these queries within these organizational schemas.
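As a small illustration of that rigidity, the sketch below uses Python’s built-in sqlite3 module. The `orders` table, its columns, and the rows are hypothetical and exist only for this example; the point is that the schema has to be declared up front and the query has to name tables and columns exactly.

```python
import sqlite3

# Illustrative sketch: the table, columns, and rows are made up.
conn = sqlite3.connect(":memory:")

# The schema is fixed before any data is stored.
conn.execute(
    "CREATE TABLE orders ("
    "customer TEXT NOT NULL, product TEXT NOT NULL, quantity INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("acme", "widget", 3), ("acme", "gear", 1), ("globex", "widget", 5)],
)

# The query must reference the exact table and column names.
rows = conn.execute(
    "SELECT customer, SUM(quantity) FROM orders "
    "WHERE product = ? GROUP BY customer ORDER BY customer",
    ("widget",),
).fetchall()
conn.close()

print(rows)
```

A loosely worded question such as "who buys the most widgets?" cannot be run against this table until someone translates it into that precise form.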

Large models handle data very differently. They can find inherent relationships within unstructured, noisy input data, and discover an organizational schema to represent this data. The data can be queried in unstructured and imprecise ways. Large language models like GPT-3 allow these queries to come in the form of natural human language with all of its imprecision. For example, users can query OpenAI’s GPT-3 model by asking it questions about history, politics, or science. Diffusion models like Stable Diffusion allow human language queries that produce novel combinations of pixels (generated images) that represent associations between those words and images in the training data set. These models are able to extract meaningful relationships embedded in the data set, coming closer to the capability of knowledge extraction.

However, despite the magical-seeming qualities of these popular models, the power to interact with data in this way is only just now being realized. The precision of knowledge recall is tough to guarantee at the moment, making these models unreliable for mission-critical applications. For instance, a neural network that powers a healthcare chatbot would need to be incredibly accurate in order to prevent potential harm to patients. This is an active area of research and development.

Why Model Building Must Be Accessible

As we learn how to harness the power of data and build and train domain-specific AI models, we have the potential to transform nearly every industry. Training an AI model differs from running inference on it: inference is much less computationally demanding and handles only the data points presented at the time of use. Model training, however, requires a large amount of three things that are in short supply: compute, data, and talent. And as model sizes increase, so does the need for these three elements.

Although many of the techniques commonly used in model training are publicly available, there is limited access to the knowledge and technology required to make them work. Simply put, the capabilities are obscured by complexity. What’s more, an enormous amount of compute capacity is required to build an AI model from scratch using a given dataset, and only a handful of companies such as OpenAI, Meta, and Google have access to the hundreds of millions of dollars of processing power needed. This situation has created a world of haves and have-nots; AI model building and training has become inaccessible to most organizations since only the most advanced, most well-resourced groups can do it.

The result is a world where a handful of teams are building new models on datasets they have curated, then selling access to these models via API. This might seem like progress, since sophisticated teams are building solutions and making them broadly available, but, as previously mentioned, models are a representation of their data. Companies that specialize in a given field tend to have the best understanding of the data and how it should be leveraged, but they lack the algorithmic and systems expertise required to build AI models. This siloing of capabilities is a fundamental obstacle to the broad and open application of AI to new fields.

In part two of this blog, I’ll go into more detail about why this state of affairs can be problematic, and how MosaicML is working to break down these silos and make ML models trained on domain-specific data widely available.
