MosaicBERT: Pretraining BERT from Scratch for $20
With the MosaicBERT architecture + training recipe, you can now pretrain a competitive BERT-Base model from scratch on the MosaicML platform for $20. We’ve released the pretraining and finetuning code, as well as the pretrained weights.
Train Custom GPT & Diffusion Models with MosaicML
The MosaicML platform is designed to tackle the challenges of training large models such as ChatGPT, LaMDA, and Stable Diffusion. Our blog post breaks down the difficulties of training such models, and shows how our platform makes training large AI models easier.
MosaicML StreamingDataset: Fast, Accurate Streaming of Training Data from Cloud Storage
Loading your training data becomes an escalating challenge as datasets grow bigger in size and the number of nodes scales. We built StreamingDataset to make training on large datasets from cloud storage as fast, cheap, and scalable as possible. Specially designed for multi-node, distributed training, StreamingDataset maximizes correctness guarantees, performance, and ease of use.
Blazingly Fast LLM Evaluation for In-Context Learning
With MosaicML you can now evaluate LLMs on in-context learning tasks (LAMBADA, HellaSwag, PIQA, and more) hundreds of times faster than other evaluation harnesses. For 70B parameter models, LAMBADA takes only 100 seconds to evaluate on 64 A100 GPUs, and evaluation of a 1.2 trillion parameter model takes less than 12 minutes when using 256 NVIDIA A100 GPUs.
Training Stable Diffusion from Scratch Costs <$160k
We wanted to know how much time and money it would take to train a Stable Diffusion model from scratch using our Streaming datasets, Composer, and MosaicML platform. Our results: it would take us 79,000 A100-hours in 13 days, for a total training cost of less than $160,000. Our tooling not only reduces time and cost by 2.5x, but it is also extensible and simple to use.
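As a rough sanity check on those figures, the short Python sketch below derives the average number of GPUs in use and the implied price ceiling per A100-hour. It uses only the numbers quoted in the summary. The variable names are illustrative, and because the cost is stated as "less than $160,000", the per-hour figure is an upper bound rather than an actual price.

```python
# Back-of-the-envelope check using only the figures quoted in the summary.
gpu_hours = 79_000           # A100-hours
wall_clock_days = 13         # total training time
cost_ceiling_usd = 160_000   # "less than $160,000", so an upper bound

# Average number of A100s that must run concurrently to finish in 13 days.
avg_gpus = gpu_hours / (wall_clock_days * 24)

# Upper bound on the effective price per A100-hour.
max_usd_per_gpu_hour = cost_ceiling_usd / gpu_hours

print(f"average GPUs in use: {avg_gpus:.0f}")                         # 253
print(f"price ceiling: ${max_usd_per_gpu_hour:.2f} per A100-hour")    # $2.03
```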
Why Enterprises Should Treat AI Models Like Critical IP (Part 2)
In 2022, the potential of Large Language Models (LLMs) and Generative AI entered the mainstream, and organizations began to recognize the value of state-of-the-art AI models to activate their data. In part 2 of this blog, I explore why enterprises should treat AI models as some of their most important intellectual property in 2023 and beyond.
New in Composer 0.12: Mid-Epoch Resumption with MosaicML Streaming, CometML ImageVisualizer, HuggingFace Model and Tokenizer Loading, and more!
We’re announcing the 0.12 release of Composer, MosaicML’s open-source library that makes scalable, efficient neural network training easy. Composer 0.12 is available as a Python package via pip, and the source code is on GitHub.
MosaicML + Comet
We’ve integrated MosaicML Cloud and Composer with Comet's experiment tracking platform, so ML practitioners can easily log relevant metrics and metadata. Improve your speed and efficiency with an end-to-end solution that helps you visualize and track your training runs to get the best model for your needs in the shortest time. In this blog post, we will show how easy it is to monitor and log your training workloads on the MosaicML Cloud with Comet.
BioMedLM: a Domain-Specific Large Language Model for Biomedical Text
The Stanford Center for Research on Foundation Models (CRFM) and MosaicML announce the release of BioMedLM, a purpose-built AI model trained to interpret biomedical language. Editorial update: this blog post was revised on 1/30/2023 to reflect name change from PubMed GPT.
Supercharge Your Model Training with MosaicML Composer
MosaicML was founded in 2020 to address the challenges of growing AI complexity and cost. We want advanced AI to be accessible to a broad set of enterprises and organizations - so we built Composer, an open-source library that speeds up neural network training.
MosaicML Platform Demo
MosaicML makes it easy to train any size model on any number of GPUs. Achieve more accurate results faster and seamlessly scale your workloads with our distributed training methods. In this video, we show how to easily run and monitor ML training jobs, scale training across multiple GPUs and multiple nodes, and lastly speed up training with algorithmic and system efficiency methods.
Why Enterprises Should Treat AI Models Like Critical IP (Part 1)
Five years ago, The Economist proclaimed that data was the new oil. Since then, the power of amassed data to impact the world has become even more undeniable. That’s why companies should treat AI models as some of their most important intellectual property, rather than setting them aside as something with the potential for future impact. Today’s large, state-of-the-art AI models can be viewed as a powerful tool to activate an organization's data - and maximize its value.
5x Faster Image Segmentation Training with MosaicML Recipes
Can’t stop, won’t stop. Earlier this year, we shared a new baseline for semantic segmentation (basically, classifying an image at the pixel level) using the DeepLabv3+ model architecture on the ADE20k dataset. Now, we’re introducing recipes for training semantic segmentation models that either reduce time-to-train by up to 5.4x or improve quality by up to +4.6 mIoU. If you want to train your segmentation models on the best ML training platform available, learn more at mosaicml.com/cloud.
MosaicML Delivers Leading NLP Performance in MLPerf v2.1
MosaicML leads the MLPerf NLP results, delivering a score of 7.9 minutes on 8x NVIDIA A100 GPUs in the Open Division, thanks to algorithmic and systems optimizations delivered through our platform.
New in Composer 0.11: FSDP Support, Streaming v0.1 Release, Simplified Checkpointing and Distributed Experience
We’re announcing the 0.11 release of Composer, MosaicML’s open-source library for training PyTorch neural networks faster, cheaper, and to higher accuracy. With Composer, we stack and combine speed-up methods into recipes that optimize your training. Composer 0.11 is available as a Python package via pip, and the source code is on GitHub.
Train Faster & Cheaper on AWS with MosaicML Composer
Use Composer, our open-source training library, to reduce deep learning training time and cost on AWS. Composer makes it easy to use the latest and greatest training algorithms, composing them together to speed up training and improve model quality.
MosaicML: Your Path to State of the Art AI
MosaicML gives you unprecedented access to state-of-the-art AI training. With our purpose-built, full-stack managed platform, we handle the systems and hardware complexities so you can build high-performing, domain-specific AI models - and transform your business.
Mosaic LLMs (Part 2): GPT-3 quality for <$500k
Training large language models (LLMs) costs less than you think. Using MosaicML Cloud, we show how fast, cheap, and easy it is to train these models at scale (1B -> 70B parameters). With new training recipes and infrastructure designed for large workloads, we enable you to train LLMs while maintaining total customizability over your model and dataset.
New in Composer 0.10: CometML integration, Auto evaluation batch size selection, Streaming dataset preview, and API improvements!
New in Composer 0.9: Export for inference APIs, ALiBi for efficient BERT training, TPU beta support, and more!
We’re announcing the 0.9 release of Composer, MosaicML’s open-source library for training PyTorch neural networks faster, at lower cost, and to higher accuracy. Composer 0.9 is available as a Python package via Conda or pip, and the source code is on GitHub.
Mosaic LLMs (Part 1): Billion-Parameter GPT Training Made Easy
In Part 1 of this LLM blog post series, we use the MosaicML platform to train vanilla GPT-3 models up to 1.3B params, and show how to cut training times down to hours with strong multi-node scaling. We also discover that larger models can train more efficiently than smaller models on modern hardware, and that a 10x in parameter count may only result in ~5x the training time.
Behind the Scenes: Setting a Baseline for Image Segmentation Speedups
We establish a new semantic segmentation baseline of 45.56 mIoU on the ADE20k segmentation benchmark in 3.5 hours on a system with 8x NVIDIA A100 GPUs. For this baseline, we train the DeepLabv3+ model architecture on the ADE20k dataset and evaluate the model’s performance using its scene parsing benchmark. In particular, we update the previous DeepLabv3+ baseline by using the improved PyTorch pre-trained weights and increasing the batch size for additional computational efficiency. By the end, we demonstrate a DeepLabv3+ baseline on ADE20k with +1.4 mean intersection-over-union and a 1.8x faster training time than previously published baselines.
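For readers unfamiliar with the metric, here is a minimal Python sketch of mean intersection-over-union on flattened label arrays. It is a simplified illustration, not the ADE20k evaluation code: the function name `mean_iou` and the toy labels are hypothetical, and benchmark implementations typically accumulate intersections and unions over the whole dataset and report the result as a percentage.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both arrays
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six "pixels", three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(score)  # per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.5
```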
Mosaic ResNet Deep Dive
TL;DR: We recently released a set of recipes which can accelerate training of a ResNet-50 on ImageNet by up to 7x over standard baselines. In this report we take a deep dive into the technical details of our work and share the insights we gained about optimizing the efficiency of model training over a broad range of compute budgets.
MosaicML Satisfies the Need for Speed with MLPerf Results
MosaicML’s Open Division submission to the MLPerf Image Classification benchmark delivers a score of 23.8 minutes (4.5x speed-up relative to our baseline) on 8x NVIDIA A100 GPUs. Our results show how algorithmic speedups written in PyTorch deliver ML innovation that truly can benefit everyone, from academic researchers to enterprise practitioners.
Farewell, CUDA OOM: Automatic Gradient Accumulation
With automatic gradient accumulation, Composer lets users seamlessly change GPU types and number of GPUs without having to worry about batch size. CUDA out of memory errors are a thing of the past!
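The general idea can be sketched in a few lines of Python: split each batch into microbatches, and if a microbatch does not fit in memory, double the number of accumulation steps and retry. This is an illustrative sketch under stated assumptions, not Composer's implementation. The names `run_with_auto_grad_accum` and `fake_step` are hypothetical, and `MemoryError` stands in for the CUDA out-of-memory error a real training loop would catch.

```python
def run_with_auto_grad_accum(batch, step_fn, grad_accum=1):
    """Run one optimizer step, doubling grad_accum until the microbatches fit.

    step_fn(microbatches) is a hypothetical callback that runs forward and
    backward on each microbatch and raises MemoryError if one is too large.
    Returns the grad_accum value that succeeded.
    """
    while True:
        size = -(-len(batch) // grad_accum)  # ceiling division
        microbatches = [batch[i:i + size] for i in range(0, len(batch), size)]
        try:
            step_fn(microbatches)
            return grad_accum
        except MemoryError:
            if size == 1:
                raise  # even a single sample does not fit
            grad_accum *= 2  # halve the microbatch size and retry


def fake_step(microbatches, max_fit=3):
    """Simulated device that can hold at most max_fit samples at once."""
    if any(len(mb) > max_fit for mb in microbatches):
        raise MemoryError("simulated out-of-memory")


batch = list(range(16))
chosen = run_with_auto_grad_accum(batch, fake_step)
print(chosen)  # 8: microbatches of 16, 8 and 4 samples fail, 2 samples fit
```

Because the global batch stays the same and only the microbatch size changes, the optimization behavior is intended to be unaffected by the retry.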
Composer + FFCV: Faster Together
Composer is pushing the envelope on speed and efficiency in model training. Integrating Composer with FFCV, a fast dataloading library from Aleks Madry’s lab at MIT, unlocks new speedup methods by eliminating the dataloader bottleneck often experienced when using CPU-intensive operations in the training loop. The FFCV dataloader is one of the ingredients of our Mosaic ResNet recipe, which demonstrates how algorithmic efficiency can dramatically speed up model training.
Blazingly Fast Computer Vision Training with the Mosaic ResNet and Composer
Match benchmark accuracy on ImageNet (He et al., 2015) in 27 minutes, a 7x speedup (ResNet-50 on 8xA100s). Reach higher levels of accuracy up to 3.8x faster than existing state of the art (Wightman et al., 2021). Try it out in Composer, our open-source library for efficient neural network training. It’s written in standard, easy-to-use PyTorch, so modify it to suit your needs and build on it!
Efficiently Estimating Pareto Frontiers with Cyclic Learning Rate Schedules
Benchmarking the tradeoff between model accuracy and training time is computationally expensive. Cyclic learning rate schedules can construct a tradeoff curve in a single training run. These cyclic tradeoff curves can be used to evaluate the effects of algorithmic choices on network training efficiency.
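As a rough illustration, the Python sketch below implements a learning rate schedule that decays within each cycle and then restarts, and lists the steps where a cycle ends. Evaluating the model at those steps yields one (training time, accuracy) point per cycle from a single run. The cosine shape, the function names, and the parameter values are assumptions made for this example; the summary does not specify the exact schedule.

```python
import math


def cyclic_lr(step, cycle_len, lr_max, lr_min=0.0):
    """Cosine decay from lr_max toward lr_min, restarting every cycle_len steps."""
    pos = (step % cycle_len) / cycle_len  # position within the cycle, in [0, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * pos))


def cycle_end_steps(total_steps, cycle_len):
    """Steps at which a cycle finishes; evaluate here for one tradeoff point each."""
    return list(range(cycle_len, total_steps + 1, cycle_len))


# Hypothetical settings: 300 steps split into three 100-step cycles.
eval_steps = cycle_end_steps(total_steps=300, cycle_len=100)
print(eval_steps)                                      # [100, 200, 300]
print(cyclic_lr(0, cycle_len=100, lr_max=0.1))         # 0.1 at the start of a cycle
print(cyclic_lr(100, cycle_len=100, lr_max=0.1))       # 0.1 again after the restart
```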
At MosaicML, we're working on making neural network training more efficient algorithmically. In this post, we describe this research problem and how we're solving it.
5 Best Practices for Efficient Model Training
In the course of our research and product development we’ve codified a number of best practices for efficient CNN training, and we’d like to share some of them with you here.
Connect With The Community
Let’s make ML better, one method at a time.
We want our community to be a safe and inclusive space for all current and future ML practitioners. Learn more in our Community Guidelines and Code of Conduct.