Researchers Publish Algorithmically Efficient Deep Learning Survey

Researchers from Lawrence Livermore National Laboratory and MosaicML have published a survey of over 200 papers on algorithmically efficient deep learning. The survey includes a taxonomy of methods to accelerate training as well as a practitioner’s guide to alleviating training bottlenecks.

The team began by observing that efficiency measures for deep learning all have confounding factors, such as hardware, that make it difficult to compare results from different research papers. With this in mind, they developed a definition of algorithmic acceleration: changing the training recipe to reduce total training time while maintaining comparable model quality. Given this definition, they categorized acceleration strategies along three axes: components, or where to make changes; actions, or what changes to make; and mechanisms, or when and how to make the changes. After categorizing the existing literature on acceleration, the team produced their Practitioner’s Guide to Training Acceleration. According to the researchers:

Our main contributions are an organization of the literature on algorithmic efficiency…and a technical characterization of the practical issues affecting the rate and realization of speedups…With these contributions, we hope to improve the research and application of algorithmic efficiency, an essential element of the computationally efficient deep learning needed to overcome the economic, environmental, and inclusion barriers facing existing research.

Deep learning models have achieved impressive results and are often capable of superhuman performance across many benchmarks. However, this has come at the cost of increased model size as well as increased training time and cost, with models such as GPT-3 costing nearly $2 million to train. Beyond the financial cost, many are concerned about the energy used to train and deploy these models. The most direct way to reduce both is to reduce the time spent training a model. Most of the paper is devoted to summarizing techniques that reduce training time while maintaining high model quality, i.e., accuracy on a benchmark or test dataset.

Taxonomy of Deep Learning Speedup (source: https://arxiv.org/abs/2210.06640)

These techniques are first categorized by component: the function (e.g., model parameters), the data (e.g., the training dataset), and the optimization (e.g., the training objective). They are then categorized by the actions that can be taken on these components. The actions are the “5 Rs,” each targeting a reduction in the time per training iteration, the number of iterations, or both (a brief sketch applying two of these actions follows the list):

  • Remove: remove component elements to reduce iteration time
  • Restrict: reduce the space of possible values to reduce iteration time
  • Reorder: shift when elements are introduced to reduce both iteration time and number of iterations
  • Replace: swap one element for another to reduce both iteration time and number of iterations
  • Retrofit: add component elements, the opposite of Remove, to reduce the number of iterations
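
As a rough illustration, not taken from the survey itself, the PyTorch sketch below applies two of these actions to an ordinary training step: “Remove”, by training on only a fraction of each batch, and “Restrict”, by running the forward and backward passes in reduced precision. The toy model, input shapes, and hyperparameters are placeholder assumptions.

```python
# Illustrative sketch only: two of the "5 R" actions in a standard PyTorch step.
# The toy model, 3x64x64 input shape, and hyperparameters are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256),
                      nn.ReLU(), nn.Linear(256, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()   # loss scaling needed for float16 training
loss_fn = nn.CrossEntropyLoss()

def train_step(inputs, targets, keep_fraction=0.8):
    # "Remove": drop part of each batch so every iteration does less work.
    keep = max(1, int(inputs.size(0) * keep_fraction))
    inputs = inputs[:keep].cuda(non_blocking=True)
    targets = targets[:keep].cuda(non_blocking=True)

    optimizer.zero_grad(set_to_none=True)
    # "Restrict": compute in float16 where safe, shrinking the space of values
    # (and the memory and compute) used by the forward and backward passes.
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

Whether such changes preserve model quality is exactly the trade-off the survey examines; the sketch only shows where these actions plug into a training loop.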

The paper ends with a set of practical guidelines for reducing training time. The authors identify a set of hardware-level bottlenecks, such as GPU memory or storage capacity, along with tips for mitigating them. For example, one way to alleviate a GPU compute bottleneck is to reduce the size of the tensors being operated on. They point out that data loading is a frequent bottleneck, but “by no means the only one”.
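
As a concrete, if hypothetical, example of the data-loading advice, the sketch below uses standard PyTorch DataLoader options (parallel worker processes, pinned memory, and prefetching) to keep the GPU fed. The synthetic dataset, batch size, and worker count are illustrative assumptions, not recommendations from the paper.

```python
# Sketch of common data-loading mitigations in PyTorch; the numbers are arbitrary.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic 3x64x64 "images" stand in for a real on-disk dataset, where decoding
# and augmentation make the loader far more expensive than it is here.
dataset = TensorDataset(torch.randn(10_000, 3, 64, 64),
                        torch.randint(0, 10, (10_000,)))

loader = DataLoader(
    dataset,
    batch_size=256,
    shuffle=True,
    num_workers=8,            # prepare batches in parallel CPU worker processes
    pin_memory=True,          # page-locked buffers allow asynchronous GPU copies
    persistent_workers=True,  # keep workers alive between epochs
    prefetch_factor=4,        # queue batches ahead of the GPU
)

for images, labels in loader:
    # non_blocking=True overlaps the host-to-device copy with GPU compute.
    images = images.cuda(non_blocking=True)
    labels = labels.cuda(non_blocking=True)
    # ... forward/backward pass goes here ...
```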

Co-author Davis Blalock, a research scientist at MosaicML, posted a summary of the work on Twitter, where he noted that “just training for less time” is a very powerful strategy. He also recommended:

Pay attention to data loader bottlenecks. If you’re training an image classifier and you’re not sure if your training speed is limited by the dataloader, it is. This not only wastes computation, but also artificially penalizes fast models. For example, your method might not seem faster than a baseline, but that’s just because your data loader is masking the speedup.
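
A simple way to check this on your own pipeline, sketched here under the assumption that the loader and train_step from the earlier examples are available, is to time how long each step waits for the next batch versus how long the compute itself takes:

```python
# Rough bottleneck check: compare time spent waiting on the data loader with
# time spent in compute. Assumes `loader` and `train_step` from the sketches above.
import time
import torch

wait_time, compute_time = 0.0, 0.0
batches = iter(loader)
for _ in range(30):
    t0 = time.perf_counter()
    inputs, targets = next(batches)     # blocks here if the workers cannot keep up
    t1 = time.perf_counter()
    train_step(inputs, targets)
    torch.cuda.synchronize()            # wait for GPU work so the timer captures it
    t2 = time.perf_counter()
    wait_time += t1 - t0
    compute_time += t2 - t1

print(f"waiting on data: {wait_time:.2f}s, compute: {compute_time:.2f}s")
# If the wait time dominates, the data loader, not the model, limits training speed.
```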

MosaicML recently entered the MLPerf competition, where they “achieved industry-leading NLP performance” in the Open Division with a 2.7x speedup when training a BERT model compared to the base recipe. In early 2022, InfoQ covered the previous set of MLPerf results from December 2021.

