MosaicML Cloud Demo
The MosaicML Cloud makes it easy to train any size model on any number of GPUs. Achieve more accurate results faster and seamlessly scale your workloads with our distributed training methods. In this video, we show how to easily run and monitor ML training jobs, scale training across multiple GPUs and nodes, and speed up training with algorithmic and system efficiency methods.
Why Enterprises Should Treat AI Models Like Critical IP (Part 1)
Five years ago, The Economist proclaimed that data was the new oil. Since then, the power of amassed data to impact the world has become even more undeniable. That’s why companies should treat AI models as some of their most important intellectual property, rather than setting them aside as something with the potential for future impact. Today’s large, state-of-the-art AI models can be viewed as a powerful tool to activate an organization's data - and maximize its value.
5x Faster Image Segmentation Training with MosaicML Recipes
Can’t stop, won’t stop. Earlier this year, we shared a new baseline for semantic segmentation (basically, classifying an image at the pixel level) using the DeepLabv3+ model architecture on the ADE20k dataset. Now, we’re introducing recipes for training semantic segmentation models that either reduce time-to-train by up to 5.4x or improve quality by up to +4.6 mIoU. If you want to train your segmentation models on the best ML training platform available, learn more at mosaicml.com/cloud.
MosaicML Cloud Delivers Leading NLP Performance in MLPerf v2.1
MosaicML leads the MLPerf NLP results, delivering a score of 7.9 minutes on 8x NVIDIA A100 GPUs in the Open Division, thanks to algorithmic and systems optimizations delivered through the MosaicML Cloud.
New in Composer 0.11: FSDP Support, Streaming v0.1 Release, Simplified Checkpointing and Distributed Experience
We’re announcing the 0.11 release of Composer, MosaicML’s open-source library for training PyTorch neural networks faster, cheaper, and to higher accuracy. With Composer, we stack and combine speed-up methods into recipes that optimize your training. Composer 0.11 is available as a Python package via pip, and the source code is on GitHub.
Train Faster & Cheaper on AWS with MosaicML Composer
Use Composer, our open-source training library, to reduce deep learning training time and cost on AWS. Composer makes it easy to use the latest and greatest training algorithms, composing them together to speed up training and improve model quality.
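To give a flavor of that composition, here is a minimal sketch; the model, dataset, and algorithm choices below are illustrative placeholders, not a tuned recipe from this post.

```python
# pip install mosaicml  (installs the composer package)
import torchvision
from torch.utils.data import DataLoader
from composer import Trainer
from composer.algorithms import BlurPool, ChannelsLast, LabelSmoothing
from composer.models import ComposerClassifier

# Wrap a standard torchvision model so Composer can train it.
model = ComposerClassifier(torchvision.models.resnet50(num_classes=10))

# FakeData stands in for a real dataset such as ImageNet.
dataset = torchvision.datasets.FakeData(
    size=512, image_size=(3, 224, 224), num_classes=10,
    transform=torchvision.transforms.ToTensor(),
)

trainer = Trainer(
    model=model,
    train_dataloader=DataLoader(dataset, batch_size=64),
    max_duration="2ep",  # train for two epochs
    # Speed-up methods compose: pass any combination as a list.
    algorithms=[BlurPool(), ChannelsLast(), LabelSmoothing(smoothing=0.1)],
)
trainer.fit()
```

Because each speed-up method is a self-contained object, trying a different combination is a one-line change to the algorithms list.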
Introducing MosaicML Cloud: Your Path to State of the Art AI
Today, we’re launching the MosaicML Cloud to give you unprecedented access to state-of-the-art AI training. With our purpose-built, full-stack managed platform, we handle the systems and hardware complexities so you can build high-performing, domain-specific AI models - and transform your business.
Mosaic LLMs (Part 2): GPT-3 quality for <$500k
Training large language models (LLMs) costs less than you think. Using MosaicML Cloud, we show how fast, cheap, and easy it is to train these models at scale (1B -> 70B parameters). With new training recipes and infrastructure designed for large workloads, we enable you to train LLMs while maintaining total customizability over your model and dataset.
New in Composer 0.10: CometML integration, Auto evaluation batch size selection, Streaming dataset preview, and API improvements!
New in Composer 0.9: Export for inference APIs, ALiBi for efficient BERT training, TPU beta support, and more!
We’re announcing the 0.9 release of Composer, MosaicML’s open-source library for training PyTorch neural networks faster, at lower cost, and to higher accuracy. Composer 0.9 is available as a Python package via Conda or pip, and the source code is on GitHub.
Mosaic LLMs (Part 1): Billion-Parameter GPT Training Made Easy
In Part 1 of this LLM blog post series, we use the MosaicML platform to train vanilla GPT-3 models up to 1.3B params, and show how to cut training times down to hours with strong multi-node scaling. We also discover that larger models can train more efficiently than smaller models on modern hardware: a 10x increase in parameter count may only result in ~5x the training time.
Behind the Scenes: Setting a Baseline for Image Segmentation Speedups
We establish a new semantic segmentation baseline of 45.56 mIoU on the ADE20k segmentation benchmark in 3.5 hours on a system with 8x NVIDIA A100 GPUs. For this baseline, we train the DeepLabv3+ model architecture on the ADE20k dataset and evaluate the model’s performance using its scene parsing benchmark. In particular, we update the previous DeepLabv3+ baseline by using the improved PyTorch pre-trained weights and increasing the batch size for additional computational efficiency. The result is a DeepLabv3+ baseline on ADE20k with +1.4 mIoU and a 1.8x faster training time than previously published baselines.
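The improved pre-trained weights referred to above are torchvision’s "V2" ImageNet weights. As a rough sketch of how such a backbone is selected (note: torchvision ships DeepLabV3, not the DeepLabv3+ variant used in this post, so this is an approximation):

```python
from torchvision.models import ResNet101_Weights
from torchvision.models.segmentation import deeplabv3_resnet101

# Approximation only: torchvision provides DeepLabV3 rather than the
# DeepLabv3+ architecture used in the post, but the improved "V2"
# ImageNet backbone weights are selected the same way.
model = deeplabv3_resnet101(
    weights_backbone=ResNet101_Weights.IMAGENET1K_V2,  # improved pre-trained weights
    num_classes=150,  # ADE20k scene parsing has 150 semantic classes
)
```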
Mosaic ResNet Deep Dive
TL;DR: We recently released a set of recipes which can accelerate training of a ResNet-50 on ImageNet by up to 7x over standard baselines. In this report we take a deep dive into the technical details of our work and share the insights we gained about optimizing the efficiency of model training over a broad range of compute budgets.
MosaicML Satisfies the Need for Speed with MLPerf Results
MosaicML’s Open Division submission to the MLPerf Image Classification benchmark delivers a score of 23.8 minutes (4.5x speed-up relative to our baseline) on 8x NVIDIA A100 GPUs. Our results show how algorithmic speedups written in PyTorch deliver ML innovation that truly can benefit everyone, from academic researchers to enterprise practitioners.
Farewell, CUDA OOM: Automatic Gradient Accumulation
With automatic gradient accumulation, Composer lets users seamlessly change GPU types and number of GPUs without having to worry about batch size. CUDA out of memory errors are a thing of the past!
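In practice this is a single Trainer argument. A minimal sketch, where the model and dataset are placeholders:

```python
import torchvision
from torch.utils.data import DataLoader
from composer import Trainer
from composer.models import ComposerClassifier

model = ComposerClassifier(torchvision.models.resnet50(num_classes=10))
dataset = torchvision.datasets.FakeData(
    size=256, image_size=(3, 224, 224), num_classes=10,
    transform=torchvision.transforms.ToTensor(),
)

trainer = Trainer(
    model=model,
    train_dataloader=DataLoader(dataset, batch_size=256),  # large optimization batch
    max_duration="1ep",
    device="gpu",       # automatic gradient accumulation requires a CUDA device
    grad_accum="auto",  # on CUDA OOM, retry the batch with more microbatches
)
trainer.fit()
```

If a batch no longer fits in memory, Composer catches the OOM and splits the batch into more microbatches instead of crashing, so the optimization batch size (and the training math) stays the same across GPU types and counts.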
Composer + FFCV: Faster Together
Composer is pushing the envelope on speed and efficiency in model training. Integrating Composer with FFCV, a fast dataloading library from Aleksander Madry’s lab at MIT, unlocks new speedup methods by eliminating the dataloader bottleneck often experienced when using CPU-intensive operations in the training loop. The FFCV dataloader is one of the ingredients of our Mosaic ResNet recipe, which demonstrates how algorithmic efficiency can dramatically speed up model training.
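Here is a hedged sketch of the FFCV side of the integration; it assumes the dataset has already been converted to FFCV’s .beton format (e.g. with ffcv.writer.DatasetWriter), and "train.beton" is a placeholder path.

```python
from ffcv.loader import Loader, OrderOption
from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from ffcv.transforms import Squeeze, ToTensor, ToTorchImage

# "train.beton" is a placeholder; FFCV reads its own pre-encoded format.
train_dataloader = Loader(
    "train.beton",
    batch_size=256,
    num_workers=8,
    order=OrderOption.RANDOM,  # shuffled reads
    pipelines={
        # Per-field decode/transform pipelines run in compiled, parallel code,
        # removing the CPU dataloading bottleneck from the training loop.
        "image": [SimpleRGBImageDecoder(), ToTensor(), ToTorchImage()],
        "label": [IntDecoder(), ToTensor(), Squeeze()],
    },
)
```

The resulting loader iterates like a regular PyTorch DataLoader, so it can be handed to Composer’s Trainer as train_dataloader.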
Blazingly Fast Computer Vision Training with the Mosaic ResNet and Composer
Match benchmark accuracy on ImageNet (He et al., 2015) in 27 minutes, a 7x speedup (ResNet-50 on 8xA100s). Reach higher levels of accuracy up to 3.8x faster than existing state of the art (Wightman et al., 2021). Try it out in Composer, our open-source library for efficient neural network training. It’s written in standard, easy-to-use PyTorch, so modify it to suit your needs and build on it!
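The exact recipe and tuned hyperparameters are covered in the deep-dive post above; as an illustrative sketch, a representative subset of its ingredients composes in Composer like this (library defaults shown, not the recipe’s tuned settings):

```python
from composer.algorithms import (
    BlurPool, ChannelsLast, ColOut, EMA, LabelSmoothing, ProgressiveResizing,
)

# Representative subset of the Mosaic ResNet ingredients, using library
# defaults rather than the recipe's tuned hyperparameters. Pass this list
# to Trainer(algorithms=...) as in the Composer sketch above.
resnet_recipe = [
    BlurPool(),                     # anti-aliased downsampling
    ChannelsLast(),                 # NHWC memory format for faster convolutions
    ColOut(),                       # randomly drop rows/columns of pixels
    EMA(),                          # exponential moving average of weights
    LabelSmoothing(smoothing=0.1),  # soften one-hot targets
    ProgressiveResizing(),          # smaller images early in training
]
```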
Efficiently Estimating Pareto Frontiers with Cyclic Learning Rate Schedules
Benchmarking the tradeoff between model accuracy and training time is computationally expensive. Cyclic learning rate schedules can construct a tradeoff curve in a single training run. These cyclic tradeoff curves can be used to evaluate the effects of algorithmic choices on network training efficiency.
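One way to realize this idea with stock PyTorch (the post’s exact schedule may differ) is a cosine schedule with warm restarts, checkpointing at each cycle boundary so that each fully-annealed checkpoint contributes one point on the tradeoff curve:

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model and data
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# Each 10-epoch cycle anneals the LR toward zero, then restarts it.
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=10)

for epoch in range(30):  # three cycles in a single training run
    # ... run one epoch of training here ...
    sched.step()
    if (epoch + 1) % 10 == 0:
        # The cycle that just finished annealed the LR to its minimum;
        # checkpoint/evaluate here for one point on the accuracy-vs-time curve.
        torch.save(model.state_dict(), f"cycle_{(epoch + 1) // 10}.pt")
```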
At MosaicML, we're working on making neural network training more efficient algorithmically. In this post, we describe this research problem and how we're solving it.
5 Best Practices for Efficient Model Training
In the course of our research and product development, we’ve codified a number of best practices for efficient CNN training, and we’d like to share some of them with you here.