✨ We’ve just launched our Inference service. Learn more in our blog post.

MosaicML Inference

Add AI to your apps at up to 15x lower cost than OpenAI. Choose any model, of any size, and run it in your secure environment.

Generative AI Inference Cost Comparison

Deploy Your ML Models Faster, with Full Control of Your Data and Models

Starter Edition

Prototype quickly

Start with our endpoints for popular open-source models, curated from the best of the ML community.

  • MPT-7B-Instruct by MosaicML
  • Instructor-Large by HKU
  • GPT2-XL by OpenAI
  • Instructor-XL by HKU
  • GPT-NeoX 20B by EleutherAI
  • Dolly 12B by Databricks

Enterprise Edition

Maximum privacy & control
  • Deploy any model — open source or your own
  • Seamless integration with models trained on the MosaicML Platform
  • Deploy on any cloud provider (GCP, AWS, OCI) or on-prem
  • Easy integration with the Hugging Face Hub

Fully Featured API Endpoints

Hard-to-build features such as word-by-word output streaming and dynamic request batching come already set up for you.
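Dynamic request batching generally means holding incoming requests for a brief window so that several of them can share one model call. The Python sketch below is a generic illustration of that idea, not MosaicML's implementation: `collect_batch`, `serve_one_batch`, `max_batch_size`, and `max_wait_s` are hypothetical names, a plain `queue.Queue` stands in for a real request queue, and `fake_model` is a placeholder for an actual model.

```python
import queue
import time
from typing import Callable, List


def collect_batch(requests: "queue.Queue[str]", max_batch_size: int, max_wait_s: float) -> List[str]:
    """Wait for one request, then keep adding requests until the batch is
    full or max_wait_s seconds have passed since the first one arrived."""
    batch = [requests.get()]  # blocks until at least one request is queued
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # waiting window closed; ship a partial batch
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived within the window
    return batch


def serve_one_batch(
    requests: "queue.Queue[str]",
    model_fn: Callable[[List[str]], List[str]],
    max_batch_size: int = 8,
    max_wait_s: float = 0.01,
) -> List[str]:
    """Run a single model call over a dynamically sized batch of prompts."""
    batch = collect_batch(requests, max_batch_size, max_wait_s)
    return model_fn(batch)  # one call serves every prompt in the batch


if __name__ == "__main__":
    # Hypothetical stand-in for a model: it just upper-cases each prompt.
    fake_model = lambda prompts: [p.upper() for p in prompts]
    q: "queue.Queue[str]" = queue.Queue()
    for i in range(5):
        q.put(f"prompt-{i}")
    print(serve_one_batch(q, fake_model, max_batch_size=4))  # first 4 prompts
    print(serve_one_batch(q, fake_model, max_batch_size=4))  # remaining 1 prompt
```

In this sketch, a larger `max_wait_s` tends to produce fuller batches (better hardware utilization) at the cost of extra latency for the first request in each batch.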

Customize authorization settings in your own VPC to support compliance with regulations and standards such as HIPAA and SOC 2.

Feature Breakdown

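Infrastructure
  • Starter Edition: MosaicML-optimized inference cluster
  • Enterprise Edition: Any cloud provider, hardware, region, or availability zone

Models
  • Starter Edition: Top open-source AI models, curated by MosaicML
  • Enterprise Edition: Any AI model, fully customizable

Pricing
  • Starter Edition: API usage, billed in $ per 1K tokens
  • Enterprise Edition: Pay per GPU-minute of usage; contact us for details

Cost
  • Starter Edition: Up to 4x lower than OpenAI
  • Enterprise Edition: Up to 15x lower than OpenAI

Support
  • Starter Edition: Business hours, shared Slack
  • Enterprise Edition: 24/7/365, dedicated support

Data Privacy
  • Starter Edition: We never store or use your data, period.
  • Enterprise Edition: Your data never leaves your secure environment.

Response Time SLA
  • Starter Edition: 4 hours for SEV-1*, 1 business day for SEV-2**
  • Enterprise Edition: 1 hour for SEV-1*, 4 hours for SEV-2**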
*SEV-1: Production system is down or is severely impacted such that routine operation is impossible
**SEV-2: System is functional but offers service in a degraded or restricted capacity
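The two editions bill differently: Starter Edition API usage is priced per 1K tokens, while Enterprise Edition is priced per GPU-minute. The Python sketch below shows that arithmetic under stated assumptions. The function names and the rates in the example are hypothetical placeholders, not MosaicML prices, and because this page does not say whether prompt tokens, generated tokens, or both are counted, `num_tokens` is left to the caller.

```python
def token_cost(num_tokens: int, price_per_1k_tokens: float) -> float:
    """Per-token billing: the rate is quoted per 1,000 tokens."""
    return num_tokens / 1000 * price_per_1k_tokens


def gpu_minute_cost(gpu_minutes: float, price_per_gpu_minute: float) -> float:
    """Per-GPU-minute billing: the rate is quoted per minute of GPU time."""
    return gpu_minutes * price_per_gpu_minute


# Hypothetical placeholder rates, chosen only to make the arithmetic easy to follow.
print(token_cost(250_000, price_per_1k_tokens=0.5))  # 125.0 (250 blocks of 1K tokens)
print(gpu_minute_cost(90, price_per_gpu_minute=2.0))  # 180.0
```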

The Best Open Source LLMs, Curated for You

Each of these models is an easy starting point for adding generative AI to your applications. Start coding right away and see your new functionality in action.

Text Embedding

Turn text into vectors that can be consumed by machine learning models and algorithms.
Model Name          Number of Parameters    Price Per 1K Tokens
Instructor-Large    335 million
Instructor-XL       1.2 billion
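Embedding vectors are commonly compared with cosine similarity, for example to rank documents against a search query. The sketch below assumes you already have the vectors in hand. The three-dimensional vectors and document names are toy values invented for illustration (real embedding models such as Instructor-Large return much longer vectors), and `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, not part of any MosaicML API.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: Sequence[float], docs: Dict[str, Sequence[float]]) -> List[str]:
    """Return document names ordered from most to least similar to the query."""
    return sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)


# Toy vectors standing in for embedding-model output (hypothetical values).
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "shipping times": [0.0, 1.0, 0.1],
    "refund policy": [0.9, 0.1, 0.8],
}
print(rank_by_similarity(query_vec, doc_vecs))  # ['refund policy', 'shipping times']
```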

Text Completion

Return predicted text based on a prompt.
Model Name          Number of Parameters    Price Per 1K Tokens
                    1.3 billion
MPT-7B-Instruct     6.7 billion
Dolly 12B           12 billion
GPT-NeoX 20B        20 billion