New in Composer 0.12: Mid-Epoch Resumption with MosaicML Streaming, CometML ImageVisualizer, HuggingFace Model and Tokenizer Loading, and more!

We’re announcing the 0.12 release of Composer, MosaicML’s open-source library that makes scalable, efficient neural network training easy. Composer 0.12 is available as a Python package via pip, and the source code is on GitHub.

We are excited to announce the release of Composer 0.12 (release notes)! This release includes several new features, plus improvements to existing capabilities - kudos to the Composer community for your engagement, feedback, and contributions to this release.

Install via pip:
pip install mosaicml==0.12.0

For those who want to join the Composer community: learn more about contributing here, and message us on Slack if you have any questions or suggestions!

There are many features included in this release, so let’s dive in and see what’s new.

What's in this release

  • CometML ImageVisualizer to visualize images and segmentation masks while training your computer vision models
  • Oracle Cloud Infrastructure ObjectStore and Google Cloud Storage support for saving and loading checkpoints
  • Mid-Epoch Resumption with MosaicML Streaming support
  • Support for loading HuggingFace models and tokenizers for easier pre-training to fine-tuning workflow
  • OptimizerMonitor updates to track optimizer-specific metrics
  • New PyTorch 1.13 + CUDA 11.7 Docker images
  • New community-contributed speedup algorithm: GyroDropout

About Composer

The Composer library helps developers train PyTorch neural networks faster, at lower cost, and to higher accuracy. Composer includes:

  • 20+ methods for speeding up training networks for computer vision and language modeling.
  • An easy-to-use trainer that integrates best practices for efficient training.
  • Functional forms of all speedup methods that are easy to integrate into your existing training loop.
  • Strong and reproducible training baselines to get you started as quickly as possible.

Visualize your images and segmentation masks while training with CometML ImageVisualizer callback

Visualize your images and segmentation masks as you train by using our ImageVisualizer callback together with CometML. Check out our latest MosaicML + Comet blog to learn more about monitoring and logging your training workloads on the MosaicML Cloud with Comet.
from composer.callbacks import ImageVisualizer
from composer.loggers import CometMLLogger
from composer.trainer import Trainer

# Log sample images and segmentation masks to Comet during training.
trainer = Trainer(...,
    callbacks=[ImageVisualizer()],
    loggers=[CometMLLogger()]
)

Train the MosaicML DeepLabV3 model, which yields a 5x faster time-to-train compared to a strong baseline. See our latest segmentation blog for more details and recipes, and start training today with our example starter code for segmentation.

Save and load your checkpoints using Oracle Cloud Infrastructure (OCI) or Google Cloud Storage (GCS)

With this latest Composer release, you can save or load your checkpoints using OCI or GCS. We added direct support for Oracle Cloud Infrastructure (OCI) as an ObjectStore and support for Google Cloud Storage (GCS) via URI.

from composer.trainer import Trainer

# Checkpoint saving to Google Cloud Storage.
trainer = Trainer(
    model=model,
    save_folder="gs://my-bucket/{run_name}/checkpoints",
    run_name='my-run',
    save_interval="1ep",
    save_filename="ep{epoch}.pt",
    save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
    ...
)

trainer.fit()
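Loading a checkpoint back works the same way: point the Trainer's load_path at a checkpoint URI. A minimal sketch, where the bucket and checkpoint filename are placeholders for your own run:

from composer.trainer import Trainer

# Resume training from a checkpoint stored in Google Cloud Storage.
# The bucket and checkpoint filename below are placeholders.
trainer = Trainer(...,
    model=model,
    load_path="gs://my-bucket/my-run/checkpoints/ep1.pt",
)

trainer.fit()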

Track your training metrics with MLflow logging

We've added support for using MLflow to log experiment metrics and hyperparameters. Please see our Logging documentation for additional details.

from composer.loggers import MLFlowLogger
from composer.trainer import Trainer

# Placeholders: point these at your own MLflow experiment and tracking server.
mlflow_exp_name = 'my-experiment'
mlflow_run_name = 'my-run'
mlflow_uri = 'http://localhost:5000'

mlflow_logger = MLFlowLogger(experiment_name=mlflow_exp_name,
                             run_name=mlflow_run_name,
                             tracking_uri=mlflow_uri)
trainer = Trainer(..., loggers=[mlflow_logger])

Simplified console and progress bar logging

To turn off the progress bar, set composer.Trainer(progress_bar=False). To turn on logging directly to the console, set composer.Trainer(log_to_console=True). To control the frequency of logging to console, set the Trainer argument console_log_interval (e.g. to 1ep or 1ba).
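Putting these settings together, a minimal sketch:

from composer.trainer import Trainer

# Turn off the progress bar and log directly to the console once per batch.
trainer = Trainer(...,
    progress_bar=False,
    log_to_console=True,
    console_log_interval='1ba',
)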

Support for Mid-Epoch Resumption and more with the latest MosaicML Streaming Library v0.2 release

We've added support in Composer for the latest v0.2 release of our Streaming library. This includes awesome new features like instant mid-epoch resumption and deterministic shuffling, regardless of the number of nodes.

MosaicML Streaming is a PyTorch-compatible dataset implementation that enables users to stream training data from cloud-based object stores; it can also read files from local disk.
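As a minimal sketch, assuming the StreamingDataset class and its remote/local/shuffle arguments from the Streaming library (the bucket path below is a placeholder), a streamed dataset plugs straight into a standard PyTorch DataLoader:

from torch.utils.data import DataLoader
from streaming import StreamingDataset

# Stream shards from a remote bucket, caching them on local disk.
# The remote path below is a placeholder for your own dataset.
dataset = StreamingDataset(remote='s3://my-bucket/my-dataset',
                           local='/tmp/my-dataset',
                           shuffle=True)
dataloader = DataLoader(dataset, batch_size=32)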

Key Benefits of the MosaicML Streaming library include:

  • High performance, accurate streaming of training data from cloud storage
  • Efficiently train anywhere, independent of training data location
  • Cloud-native, no persistent storage required
  • Enhanced data security—data exists ephemerally on training cluster

See the MosaicML Streaming release notes for more about the latest updates, and stay tuned for a blog coming up detailing more about the MosaicML Streaming Library!

Load a 🤗 HuggingFace Model and Tokenizer out of a Composer checkpoint

The model pre-training to fine-tuning workflow in Composer with HuggingFace models is now easier than ever! We've added a new utility to load a HuggingFace model and tokenizer out of a Composer checkpoint.

Getting started is easy: check out our example notebook for a full tutorial on pre-training and fine-tuning a HuggingFace transformer using the Composer library, and read the docs for more details.
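A minimal sketch, assuming the hf_from_composer_checkpoint helper on HuggingFaceModel and a placeholder checkpoint path:

from composer.models import HuggingFaceModel

# Recover the underlying HuggingFace model and tokenizer from a Composer
# checkpoint; the checkpoint path below is a placeholder.
hf_model, hf_tokenizer = HuggingFaceModel.hf_from_composer_checkpoint('my-composer-checkpoint.pt')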

Track Optimizer Specific Metrics with OptimizerMonitor (previously GradMonitor)

We’ve renamed our GradMonitor callback to OptimizerMonitor, and added the ability to track optimizer-specific metrics. OptimizerMonitor computes and logs the L2 norm of gradients after the reduction of gradients across GPUs.

Check out the docs for more details, and add to your code just like any other callback!
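A minimal sketch, assuming the log_optimizer_metrics flag controls the new optimizer-specific metrics:

from composer.callbacks import OptimizerMonitor
from composer.trainer import Trainer

# Log gradient norms and optimizer-specific metrics during training.
trainer = Trainer(...,
    callbacks=[OptimizerMonitor(log_optimizer_metrics=True)],
)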

New PyTorch and CUDA versions

We've expanded our library of Docker images with support for PyTorch 1.13 + CUDA 11.7:
  • mosaicml/pytorch:1.13.0_cu117-python3.10-ubuntu20.04
  • mosaicml/pytorch:1.13.0_cpu-python3.10-ubuntu20.04
The mosaicml/pytorch:latest, mosaicml/pytorch:cpu_latest, and mosaicml/composer:0.12.0 tags are now built from PyTorch 1.13 based images. Please see our DockerHub repository for additional details.

New community speedup algorithm: Gyro Dropout

A special thank you to Junyeol Lee and Gihyun Park from BDSL at Hanyang University for contributing the Composer implementation of Gyro Dropout and the accompanying documentation.

Gyro Dropout is a variant of dropout that improves the efficiency of training neural networks. Instead of randomly dropping out neurons in every training iteration, gyro dropout pre-selects and trains a fixed number of subnetworks. Gyro Dropout replaces implementations of torch.nn.Dropout. You can read more about Gyro Dropout in the Composer method card.
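A minimal sketch of enabling it through the Trainer, assuming GyroDropout can be constructed with its default arguments:

from composer.algorithms import GyroDropout
from composer.trainer import Trainer

# Replace torch.nn.Dropout layers with Gyro Dropout during training.
trainer = Trainer(...,
    algorithms=[GyroDropout()],
)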

We’re always looking for community contributions for speedup methods for Composer! Whether you’re looking to contribute your own speed-up method from your published research or need to do a final class project on efficient machine learning and want ideas, reach out to community@mosaicml.com to get started.

Learn more!

Thanks for reading! If you'd like to learn more about Composer, download it and try it out for your training tasks. As you do, come be a part of our community by engaging with us on Twitter, joining our Slack channel, or just giving us a star on GitHub.
