Why Enterprises Should Treat AI Models Like Critical IP (Part 2)

In 2022, the potential of Large Language Models (LLMs) and Generative AI entered the mainstream, and organizations began to recognize the value of state-of-the-art AI models for activating their data. In part 2 of this blog, I explore why enterprises should treat AI models as some of their most important intellectual property in 2023 and beyond.

In part 1 of this blog, I gave a brief history of the development of AI models - and why they should be considered an essential part of a company’s IP, one that leverages its most valuable asset: data. At the same time, access to AI model building and training has been limited by the cost and complexity of model development. So - where do we go from here?

First and foremost: A central party serving a single, generic model provides no unique differential value to its users. Every user of that model has access to the same capabilities. When two competitors are using the same model, there’s little competitive advantage gained from the model itself. By definition, any competitive advantage comes from what each company is doing differently or better than the other.

However, there are scenarios where "genericized" data baked into a model enables innovation. For example, when the cost of data aggregation is very high, building a central model based on generic data can be useful if differentiation is added in other ways. For instance, in the medical field, a model built on a single data source like cell pathology can be used to help automate cancer detection, search for genetic abnormalities, or identify tissue abnormalities. Applying a model to adjacent, non-competitive applications within the field of medical diagnosis can create unique value for each one.

All Data is Different

It is very unlikely that an AI model trained on generic data will derive insight from domain-specific inputs. For instance, models like GPT-3 are trained largely on broad web text, such as filtered Common Crawl and pages linked from Reddit. If a company wants to apply them to domain-specific data such as legal or medical text, the relationships learned by the generic model won’t have much predictive power. This should come as no surprise: doctors, lawyers, and scientists all communicate in specialized language that is precise to their field, requires years of study, and often deviates significantly from everyday vernacular.

If you’re well-versed in the AI and ML space, here’s the part of the blog where you ask: but what about fine-tuning? For everyone else: fine-tuning is the process of starting with a model trained on generic data (a pre-trained model) and continuing to train it on a more specific data set. Fine-tuning can improve a generic model and certainly has its place. When the domain-specific data points are far fewer than the generic data points, for example, fine-tuning can augment a generic model to give it some domain specificity. However, this approach has limits. A pre-trained model has established representations of the data on which it was initially trained; if the statistics of the fine-tuning data are vastly different, the model simply won’t be able to capture those differences well. There’s a growing body of literature on this topic that is beyond the scope of this article.
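To make the fine-tuning idea concrete, here is a minimal PyTorch sketch. The tiny `backbone` network is a toy stand-in for a real pre-trained foundation model (in practice it would be loaded from a checkpoint), and the synthetic tensors stand in for domain-specific data; the pattern shown - freeze the pre-trained representations, train only a small task-specific head - is one common fine-tuning recipe:

```python
import torch
import torch.nn as nn

# Toy stand-in for a pre-trained backbone; in practice this would be
# loaded from a checkpoint of a real foundation model.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))

# Freeze the pre-trained representations...
for p in backbone.parameters():
    p.requires_grad = False

# ...and attach a small task-specific head to be trained on domain data.
head = nn.Linear(32, 2)
model = nn.Sequential(backbone, head)

# Only the head's parameters are passed to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

# One illustrative training step on synthetic "domain-specific" data.
x = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(trainable, total)  # 66 trainable of 1666 total parameters
```

The parameter counts illustrate the limitation discussed above: only a small fraction of the model adapts to the new domain, while the frozen backbone keeps the representations it learned from generic data.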

We’ve observed that enterprises often have vast datasets, as big as or bigger than the generic datasets used to build these models in the first place. If that’s the case, why start with a generic model at all? It’s much more powerful to build a domain-specific model that’s unique to the data on which it’s trained. The provenance of that data is known, and companies can be certain that the ML model doesn’t contain any data from apocryphal (or worse, biased) sources.

You’ve Got the Data - Let’s Put It to Use

MosaicML was founded with the belief that it is crucial for enterprises to have the power to build, train, maintain, and update their own ML models. However, we recognize that resources for training large-scale neural network models are still not widely available to enterprises. 

Building large models requires large, difficult-to-manage computing resources: a GPT-3-scale model, for example, requires at least 256 GPUs to complete training in a tractable amount of time (approximately one month). Currently, this requires 1) an infrastructure operations team to set up and maintain compute at this scale, 2) software to coordinate and orchestrate these resources, 3) machine learning systems expertise to optimize the compute, and 4) general machine learning knowledge to apply neural network techniques to the enterprise’s data sets. That equates to a team of ~100 people and tens of millions of dollars of infrastructure - a high barrier to entry, and the reason these capabilities are available only to the largest and most advanced tech companies. But enterprises that can overcome these challenges will gain a competitive advantage by creating differentiated value within their industries.
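The scale figures above imply a substantial compute budget on their own. A quick back-of-envelope calculation makes this tangible (the $2-per-GPU-hour cloud rate is an assumption for illustration, not a quoted price):

```python
# Back-of-envelope GPU-hours for the scenario above:
# 256 GPUs running for roughly one month (~30 days).
gpus = 256
days = 30
gpu_hours = gpus * days * 24
print(gpu_hours)  # 184320 GPU-hours

# At an assumed $2 per GPU-hour cloud rate (illustrative only),
# the raw compute cost of a single training run:
compute_cost = gpu_hours * 2.0
print(f"${compute_cost:,.0f}")  # $368,640
```

And that figure covers only the hardware for one training run; it excludes the staffing, orchestration software, and repeated experiments described above.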

Ideally, any reasonably sized organization should be able to derive ML insights, given access to tools that remove barriers to entry and make training neural network models faster, cheaper, and easier. Infrastructure can be accessed through stable APIs that abstract away the details of scaling, orchestration, optimization, and management of the underlying hardware. Technologies that use hardware resources more efficiently can greatly reduce and stabilize the cost of training models, so that more enterprises have access to state-of-the-art capabilities.

These tools don’t have to be hard to use or require specialized talent. If we think of ML models as an activated form of data, the ability to train models becomes a basic need for organizations that store and generate lots of data. Data provenance is becoming increasingly important, as we’ve seen LLMs either impute data points that don’t exist (ahem, hallucinate) or regurgitate data seen during training. Building smaller, domain-specific models can not only improve in-domain performance, but also reduce inference costs and latency. The impact of machine learning on our daily lives will be fully realized once it is accessible to a wide variety of enterprises.

Interested in learning more about our approach to making state-of-the-art AI & ML widely available? Check out a demo of our purpose-built platform for accessing the best algorithms, systems, and architectures - or sign up for early access to the MosaicML cloud today.
