Latest Machine Learning Research Proposes FP8 Binary Interchange Format: A Natural Progression to Accelerate Deep Learning Inference

To meet the growing computational needs of neural networks, AI processing requires comprehensive innovation across hardware and software platforms. Using lower-precision numerical formats to increase computational throughput, reduce memory usage, and optimize interconnect bandwidth is one of the key areas for improving efficiency.

The researchers believe that a standard interchange format will promote rapid development and interoperability of software and hardware platforms and thus advance AI computing. Accordingly, the industry has moved from 32-bit precision to 16-bit, and now even 8-bit, formats to reap these benefits. 8-bit floating-point precision is particularly advantageous for transformer networks, one of the most important developments in AI. In this context, NVIDIA, Intel, and Arm jointly published an article defining an 8-bit floating-point (FP8) specification. It introduces a standard format that aims to push AI development forward by optimizing memory usage, and it covers both the training and inference stages. The FP8 specification comes in two variants, E5M2 and E4M3.

The names of the two encodings, E4M3 and E5M2, specify the number of exponent (E) and mantissa (M) bits, following IEEE 754 naming conventions. When using FP8 encodings, E4M3 is recommended for weight and activation tensors and E5M2 for gradient tensors. Although some networks can be trained with only the E4M3 or only the E5M2 type, some networks require both (or need to keep far fewer tensors in FP8). Inference and the forward pass of training use a variant of E4M3, while gradients in the backward pass use a variant of E5M2. The FP8 formats were designed with the guiding principle of adhering to IEEE 754 conventions and deviating only where doing so significantly improves the accuracy of DL applications. E5M2 is therefore IEEE half precision with fewer mantissa bits and follows the IEEE 754 rules for exponents and special values, which makes conversion between IEEE FP16 and E5M2 straightforward. The dynamic range of E4M3 is instead extended by reclaiming most of the bit patterns used for special values: it does not represent infinities and keeps only a single mantissa bit pattern for NaN, rather than allowing multiple encodings of special values.
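As a concrete illustration of the two encodings, the Python sketch below decodes an 8-bit pattern into its floating-point value. The exponent biases (7 for E4M3, 15 for E5M2), the IEEE-style infinity/NaN handling in E5M2, and the single NaN mantissa pattern in E4M3 follow the description above; the function itself is illustrative and is not code from the paper.

```python
def decode_fp8(byte: int, fmt: str = "E4M3") -> float:
    """Decode an 8-bit integer (0..255) as an FP8 value in the given encoding."""
    assert 0 <= byte <= 255
    sign = -1.0 if (byte >> 7) & 1 else 1.0

    if fmt == "E4M3":
        exp_bits, man_bits, bias = 4, 3, 7
    elif fmt == "E5M2":
        exp_bits, man_bits, bias = 5, 2, 15
    else:
        raise ValueError("fmt must be 'E4M3' or 'E5M2'")

    exponent = (byte >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = byte & ((1 << man_bits) - 1)

    if exponent == (1 << exp_bits) - 1:           # all-ones exponent field
        if fmt == "E5M2":                          # IEEE-like: Inf / NaN
            return sign * float("inf") if mantissa == 0 else float("nan")
        if mantissa == (1 << man_bits) - 1:        # E4M3: only S.1111.111 is NaN
            return float("nan")
        # all other all-ones-exponent patterns are ordinary normals in E4M3

    if exponent == 0:                              # subnormal values
        return sign * (mantissa / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1 + mantissa / (1 << man_bits)) * 2.0 ** (exponent - bias)


# Largest finite magnitudes implied by the two encodings:
print(decode_fp8(0b0_1111_110, "E4M3"))   # 448.0
print(decode_fp8(0b0_11110_11, "E5M2"))   # 57344.0
```

Decoding the largest finite bit patterns this way gives maxima of 448 for E4M3 and 57,344 for E5M2, which illustrates the trade-off between E5M2's wider dynamic range and E4M3's extra mantissa bit of precision.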

To verify the effectiveness of the proposal, the authors conducted an experimental study covering both the training and inference phases, comparing the results against baselines trained in either FP16 or bfloat16. On vision and language translation models, FP8 training matches the results of the 16-bit training runs.
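Experiments of this kind are typically carried out by emulating FP8 in higher precision: tensors are rounded to the nearest FP8-representable value and then processed with ordinary floating-point arithmetic. Below is a minimal NumPy sketch of such round-to-nearest emulation; the function name, the saturation at the largest finite value, and the subnormal handling are illustrative assumptions rather than the paper's reference implementation.

```python
import numpy as np

def quantize_to_fp8(x: np.ndarray, fmt: str = "E4M3") -> np.ndarray:
    """Round float32 values to the nearest finite value representable in the
    chosen FP8 encoding, keeping the result stored in float32 (emulation)."""
    if fmt == "E4M3":
        man_bits, bias, max_norm = 3, 7, 448.0
    elif fmt == "E5M2":
        man_bits, bias, max_norm = 2, 15, 57344.0
    else:
        raise ValueError("fmt must be 'E4M3' or 'E5M2'")

    x = np.asarray(x, dtype=np.float32)
    sign = np.sign(x)
    mag = np.minimum(np.abs(x), max_norm)      # saturate at the largest normal

    # Exponent of each value, floored at the subnormal exponent (1 - bias),
    # so that tiny values round on the subnormal grid instead of vanishing.
    exp = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
    exp = np.maximum(exp, 1 - bias)

    # Spacing between representable values at this exponent, then round.
    step = 2.0 ** (exp - man_bits)
    return (sign * np.round(mag / step) * step).astype(np.float32)
```

Applying such a helper to weights and activations with fmt="E4M3" and to gradients with fmt="E5M2" mirrors the usage recommendation described earlier.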

This paper introduced a new FP8 binary interchange format with two encodings, E4M3 and E5M2. By deviating only minimally from the IEEE 754 conventions for the binary encoding of floating-point values, the authors ensure that software implementations can continue to rely on IEEE floating-point features such as the ability to compare and sort values using integer operations. The experimental study shows that, using the same model, optimizer, and training hyperparameters, a wide range of neural-network models for image and language tasks can be trained in FP8 with model accuracy equal to that achieved with 16-bit training runs. By using the same data types for training and inference, FP8 not only accelerates training and minimizes the resources it requires, but also simplifies the deployment of 8-bit inference.

This article is written as a research summary by Marktechpost Staff based on the research paper 'FP8 Formats for Deep Learning'. All credit for this research goes to the researchers on this project. Check out the paper.

Mahmoud is a PhD researcher in machine learning. He also holds a bachelor's degree in physical sciences and a master's degree in telecommunications systems and networks. His current research focuses on computer vision, stock market prediction, and deep learning. He has produced several scientific articles on person re-identification and on the robustness and stability of deep networks.
