Llama2-70B-Chat is now available on MosaicML Inference

MosaicML is now part of Databricks

Introducing MPT-30B, the latest addition to the MosaicML Foundation Series of Models.

MosaicML Inference

Add the power of LLMs to your app with optimized performance.
Now with Llama2-70B-Chat.

Generative AI Inference Cost Comparison

Free Trial! Click the Get Started link above for 30 days of complimentary access to MosaicML Inference.

Enterprise-grade, secure APIs

Use our reliable endpoints for popular open source models curated from the ML community.

Text Generation
  • MPT-7B-Instruct
  • MPT-30B-Instruct
  • Llama2-70B-Chat
Text Embeddings
  • Instructor-XL by HKU
  • Instructor-Large by HKU
MPT-7B-Instruct and MPT-30B-Instruct are open source models from MosaicML; see https://mosaicml.com/blog/mpt-30b
Instructor-Large and Instructor-XL are open source models from HKUNLP; see https://instructor-embedding.github.io
Llama2-70B-Chat is an open model from Meta. Llama 2 is licensed under the Llama 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

Fully Featured API Endpoints

Difficult features like word-by-word output streaming and dynamic request batching are already set up for you.

Customize authorization settings for maximum compliance with regulations and standards like HIPAA and SOC 2.

Feature Breakdown

Models: Top open AI models, curated by MosaicML
Pricing: API usage, billed in $ per 1K tokens
Cost: Up to 4x lower than OpenAI
Support: Business hours, shared Slack
Data Privacy: We never store or use your data, period
Response Time SLA: 4 hours for SEV-1*; 1 business day for SEV-2**

*SEV-1: Production system is down or is severely impacted such that routine operation is impossible.
**SEV-2: System is functional but offers service in a degraded or restricted capacity.

The Best Open LLMs, Curated for You

Every one of these models provides an easy starting point when adding generative AI to applications. Start coding right away and see your application functionality in action.

Text Embedding

Turn text into vectors that can be consumed by machine learning models and algorithms.
Model Name         Number of Parameters   Price Per 1K Input Tokens
Instructor-Large   335 million            $0.0001
Instructor-XL      1.2 billion            $0.0002
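
As a rough illustration of how embedding vectors are typically consumed downstream, the sketch below ranks two documents by cosine similarity to a query. The vectors are made-up 3-dimensional stand-ins, not output from Instructor-Large or Instructor-XL (real embeddings have many more dimensions), and the helper name cosine_similarity is only illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (invented for illustration only).
query = [0.9, 0.1, 0.0]
documents = {
    "doc_a": [0.8, 0.2, 0.1],
    "doc_b": [0.0, 0.1, 0.9],
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query, documents[name]),
    reverse=True,
)
print(ranked)
```

In a real application the toy vectors would be replaced by the vectors returned from the embedding endpoint; the ranking step stays the same.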

Text Completion

Return predicted text based on a prompt.
Model Name                       Number of Parameters   Price Per 1K Tokens (Input + Output)
Llama2-70B-Chat (best quality)   70 billion             $0.002
MPT-30B-Instruct                 30 billion             $0.001
MPT-7B-Instruct (fastest)        6.7 billion            $0.0005
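
To make the per-token pricing concrete, here is a minimal sketch that estimates the cost of a single text completion request from the prices listed above. It assumes input and output tokens are billed at the same per-1K rate, as the "Input + Output" column suggests; the function name estimate_cost and the token counts are hypothetical, and the listed prices may change.

```python
# Prices in USD per 1K tokens (input + output), copied from the Text Completion list above.
PRICE_PER_1K_TOKENS = {
    "Llama2-70B-Chat": 0.002,
    "MPT-30B-Instruct": 0.001,
    "MPT-7B-Instruct": 0.0005,
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, billing input and output at the same rate."""
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS[model]

# Hypothetical workload: 1,500 prompt tokens and 500 generated tokens per request.
for model in PRICE_PER_1K_TOKENS:
    print(f"{model}: ${estimate_cost(model, 1500, 500):.4f} per request")
```

Under these assumptions a 2,000-token request costs twice the listed per-1K price, so multiplying by expected request volume gives a rough monthly budget for each model.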