Llama2-70B-Chat is now available on MosaicML Inference

MosaicML is now part of Databricks

Introducing MPT-30B, the latest addition to the MosaicML Foundation Series of Models.

MosaicML Training is where the magic happens. Build models like MPT-30B, the latest addition to the MosaicML Foundation Series.


Complete ownership of your AI. 
  • Train multi-billion-parameter models in days, not weeks
  • Maintain complete data privacy and full model ownership
  • Ensure compliance with regulatory requirements


Customize AI with your data.
  • Use our MPT series or any other pretrained model
  • Our optimization techniques give you the most efficient training runs
  • Experiment with RLHF and IFT to get the best output for your use case

"Using the MosaicML platform, we were able to train and deploy our LLM with our own data within a week and achieve leading results."


Amjad Masad, CEO

"MosaicML was able to abstract away all the complexity of distributed model training."


Tony Francis, CEO

"MosaicML has helped us make the training of our large models so much faster."

Twelve Labs

Aiden Lee, Co-Founder & CTO


Train multi-billion-parameter models in hours, not days. Efficient scaling for large (>70B parameter) models.


Train 2x-7x faster, without changing your code. Our software automatically applies the latest optimizations.


No vendor lock-in. Orchestrate across multiple clouds. Escape data gravity with our StreamingDataset.

Complete Control

Train advanced LLMs and generative AI models in any environment with complete data privacy and full model ownership.

Effortless Scale

Train Large Language Models (LLMs) at scale with a single command. Just point to your S3 bucket and we take care of the rest: launching, monitoring, auto-recovery.

Automated Performance

Stay on the bleeding edge of efficiency. Our performance gurus continually add the latest optimizations into our platform.

Designed for Large Models

Organizations like Replit and Stanford's Center for Research on Foundation Models (CRFM) are training GPT models on specialized datasets with MosaicML. We built features to address the pain points for training LLMs and other generative models.

Training large models is expensive. Extracting performance requires tuning everything from the network interconnects to GPU parallelism strategies to software frameworks. With our optimized platform, you can skip the setup and get training right the first time.

⏯ AutoRecovery

Automatic resumption from node failures and loss spikes. No need to babysit LLM training. We monitor and restart from previous checkpoints.
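
To make the idea concrete, here is a minimal sketch of checkpoint-based resumption in plain Python. It is not the MosaicML implementation: the function `train_with_auto_resume`, the JSON checkpoint file, and the simulated failure are all hypothetical stand-ins, and a real run would also save model and optimizer state.

```python
import json
import os
import tempfile


def train_with_auto_resume(total_steps, checkpoint_path, step_fn,
                           checkpoint_every=10, max_restarts=5):
    """Run step_fn for total_steps, restarting from the last checkpoint on failure.

    Hypothetical helper for illustration; only the step counter is checkpointed.
    """
    restarts = 0
    while True:
        # Resume from the most recent checkpoint if one exists.
        step = 0
        if os.path.exists(checkpoint_path):
            with open(checkpoint_path) as f:
                step = json.load(f)["step"]
        try:
            while step < total_steps:
                step_fn(step)
                step += 1
                if step % checkpoint_every == 0:
                    with open(checkpoint_path, "w") as f:
                        json.dump({"step": step}, f)
            return step, restarts
        except RuntimeError:
            # Treat RuntimeError as a recoverable failure (an assumption).
            restarts += 1
            if restarts > max_restarts:
                raise


# Demo: a step function that fails exactly once, at step 25.
pending_failures = {25}
completed_steps = []


def flaky_step(step):
    if step in pending_failures:
        pending_failures.remove(step)
        raise RuntimeError(f"simulated node failure at step {step}")
    completed_steps.append(step)


with tempfile.TemporaryDirectory() as tmp:
    final_step, restarts = train_with_auto_resume(
        total_steps=30,
        checkpoint_path=os.path.join(tmp, "checkpoint.json"),
        step_fn=flaky_step,
    )
```

In this demo the run restarts once and repeats steps 20 through 24, because the last checkpoint before the failure was written at step 20. Checkpointing more often shrinks that repeated work at the cost of more writes.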


Train any size model on any hardware, without tedious trial and error on settings. We dynamically adjust memory usage on the fly to prevent out-of-memory (OOM) errors.
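
One common way to avoid out-of-memory failures is to split each batch into smaller microbatches and shrink the microbatch size whenever memory runs out. The sketch below illustrates that general technique only; it is an assumption about the approach, not a description of the platform's internals. The names `find_microbatch_size` and `run_microbatch` are hypothetical, and a Python `MemoryError` stands in for a GPU out-of-memory error.

```python
def find_microbatch_size(batch_size, run_microbatch):
    """Return the largest power-of-two split of batch_size that fits in memory.

    run_microbatch(n) is a hypothetical callback that processes n samples
    and raises MemoryError when n samples do not fit on the device.
    """
    microbatch = batch_size
    while microbatch >= 1:
        try:
            # Process the full batch as a sequence of microbatches.
            for start in range(0, batch_size, microbatch):
                run_microbatch(min(microbatch, batch_size - start))
            return microbatch
        except MemoryError:
            # Halve the microbatch size and retry the whole batch.
            microbatch //= 2
    raise MemoryError("a single sample does not fit in memory")


# Demo: a simulated device that fits at most 12 samples at once.
DEVICE_CAPACITY = 12
processed_sizes = []


def simulated_microbatch(num_samples):
    if num_samples > DEVICE_CAPACITY:
        raise MemoryError("simulated OOM")
    processed_sizes.append(num_samples)


chosen = find_microbatch_size(batch_size=64, run_microbatch=simulated_microbatch)
```

With a batch of 64 and a capacity of 12, the sketch tries 64, 32, and 16 before settling on 8, then processes the batch as eight microbatches. In real training the gradients from those microbatches would be accumulated so the effective batch size stays at 64.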

🚀 Efficient

40%+ utilization out of the box with our tuned parallelism settings across model and compute scales.

🔀 Stream

Stream datasets from anywhere quickly and accurately. Resume from checkpoints instantly, with no hour-long wait for the dataloader to spin up.
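
Fast resumption generally depends on the sample order being reproducible: if the shuffle for an epoch can be recomputed from a seed, a restarted job can jump straight to where it stopped instead of replaying the dataloader. The sketch below shows that idea with the standard library only. It is an illustrative assumption, not how StreamingDataset is implemented, and `epoch_order` and `iter_samples` are hypothetical names.

```python
import random


def epoch_order(num_samples, seed, epoch):
    """Return a shuffled sample order that depends only on seed and epoch."""
    order = list(range(num_samples))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed * 1_000_003 + epoch).shuffle(order)
    return order


def iter_samples(num_samples, seed, epoch, samples_seen=0):
    """Yield sample indices for an epoch, skipping those already consumed."""
    order = epoch_order(num_samples, seed, epoch)
    yield from order[samples_seen:]


# Demo: consume 4 samples, "crash", then resume from the saved offset.
full_epoch = list(iter_samples(num_samples=10, seed=17, epoch=0))
before_crash = full_epoch[:4]
after_resume = list(iter_samples(num_samples=10, seed=17, epoch=0, samples_seen=4))
```

The only state a checkpoint needs to carry here is the seed, the epoch, and the number of samples seen, so resuming costs one shuffle rather than a replay of every batch.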

Deploy across multiple clouds

Avoid vendor lock-in with our multi-cloud orchestration. With our fast dataset streaming, escape data gravity and route workloads across clusters.


Deploy on MosaicML managed infrastructure, which has been optimized down to the hardware for ML efficiency.

☁️ Public Clouds

Deploy on any public cloud provider (AWS, Azure, GCP, or OCI). Your training data never leaves your network.

Rich Python SDK

Build custom workflows and tooling on top of the MosaicML platform with our comprehensive Python SDK. We support integrations with your favorite MLOps tools. Automatically package and submit local files with a few lines of code.