✨ We just announced Composer to speed up training your models. Check us out on GitHub! ✨

MosaicML Satisfies the Need for Speed with MLPerf Results

MosaicML Satisfies the Need for Speed with MLPerf Results

MosaicML’s Open Division submission to the MLPerf Image Classification benchmark delivers a score of 23.8 minutes (4.5x speed-up relative to our baseline) on 8x NVIDIA A100 GPUs. Our results show how algorithmic speedups written in PyTorch deliver ML innovation that truly can benefit everyone, from academic researchers to enterprise practitioners.

MLPerf is an AI performance benchmark that is home to some of the fastest code on the planet, fine-tuned by each hardware vendor. However, for enterprise machine learning practitioners, MLPerf results have largely been inaccessible. Jaw-dropping results (ResNet-50 on ImageNet in just a few minutes!) often employ rarely used frameworks (like MXNet) and bespoke implementations that look completely different from the solutions produced by working data scientists. Each benchmark relies on hardware-targeted custom code loaded with tricks to squeeze out every last ounce of performance, tailored to the exact models or datasets under test. While impressive, it's difficult for enterprise data scientists to enjoy these extraordinary speedups in their everyday work, because the techniques employed rarely generalize to any other use case.

MosaicML’s mission is to make AI model training more efficient for everyone. In service of that goal, our submission to MLPerf Training 2.0 brings MLPerf-level speed to data scientists and researchers. We use general purpose training code built on PyTorch, and include algorithmic efficiency methods, all wrapped into our open-source library Composer. These optimizations can be applied to your own model architectures and datasets with just a few lines of code.


Our open result of 23.8 minutes is 17% faster than NVIDIA’s submission of 28.7 minutes in the closed division on the same hardware.

We submitted two results to the MLPerf Training Open division’s Image Classification benchmark. The first result is our baseline ResNet-50, which uses standard hyperparameters from research. The second result adds our algorithmic efficiency methods with just a few lines of code, and achieves a 4.5x speed-up (23.8 minutes) compared to our baseline (110.5 minutes), on 8x NVIDIA A100-80GB.1

Our open result of 23.8 minutes is 17% faster than NVIDIA’s submission of 28.7 minutes in the closed division on the same hardware.2 Our submission uses PyTorch instead of MXNet, includes more common research hyperparameters, and adds algorithmic techniques for more efficient training.3

Submission Summary
Submission MLPerf Division Time to Train4
Framework Hyperparameters
MosaicML ResNet-50 Baseline1 Open 110.5 PyTorch MosaicML Research Baseline Hyperparameters (Derived from NVIDIA Deep Learning Examples)
MosaicML ResNet-50 Optimized1 with Composer Speedup Methods Open 23.8 PyTorch MosaicML Research Baseline Hyperparameters (Derived from NVIDIA Deep Learning Examples)
NVIDIA ResNet-50 Optimized2 Closed 28.7 MXNet Optimized for MLPerf

Our techniques preserve the model architecture of ResNet-50, but change the way it is trained. For example, we apply Progressive Image Resizing, which slowly increases the image size throughout training. These results demonstrate why improvements in training procedure can matter as much, if not more, than specialized silicon or custom kernels and compiler optimizations. Deliverable purely through software, we argue that these efficient algorithmic techniques are more valuable and also accessible to enterprises.

For more details on our recipes, see our Mosaic ResNet blog post, and our submission code README.

Put our MLPerf Submission to Work for You!

Since adding algorithms requires just a few lines of code, test our recipes on your own datasets, or experiment with algorithm combinations, by using the code below with your own models and datasets.

from composer import Trainer, algorithms

trainer = Trainer(
    # speed-up algorithms below


Looking for faster training on other workloads? Stay tuned as we release fast recipes for NLP models, semantic segmentation, object detection, and more! Questions? Contact us at community@mosaicml.com!

1 Official MLPerf 2.0 Training results. Baseline: 2.0-2126; Methods: 2.0-2127

2 Official MLPerf 2.0 Training results. NVIDIA: 2.0-2090

3 Our baseline of 110.5 minutes is slower than NVIDIA’s submission of 28.7 minutes because we use PyTorch instead of MXnet, and more common research hyperparameters instead of benchmark-optimized hyperparameters. As our methods demonstrate, our algorithmic efficiencies more than compensate, leading to a 17% improvement. Future work can combine these approaches.

4 The MLPerf Image Classification score measures time to train to 75.9% validation accuracy on ImageNet.

What’s a Rich Text element?

The rich text element allows you to create and format headings, paragraphs, blockquotes, images, and video all in one place instead of having to add and format them individually. Just double-click and easily create content.

Static and dynamic content editing

A rich text element can be used with static or dynamic content. For static content, just drop it into any page and begin editing. For dynamic content, add a rich text field to any collection and then connect a rich text element to that field in the settings panel. Voila!

How to customize formatting for each rich text

Headings, paragraphs, blockquotes, figures, images, and figure captions can all be styled after a class is added to the rich text element using the "When inside of" nested selector system.