Meta AI Open Sources AITemplate (AIT), a Python framework that transforms deep neural networks into C++ code to accelerate inference services


GPUs provide the computing power needed to deploy large-scale pre-trained AI models across areas of machine learning such as computer vision, natural language processing, and multimodal learning. Currently, AI practitioners have minimal choice when it comes to high-performance GPU inference solutions because those solutions are platform-specific: a machine learning system built for one vendor's GPUs must be completely reimplemented to run on hardware from a different vendor. Because of hardware dependencies in complex runtime environments, the code behind these solutions is also difficult to maintain. In addition, AI production pipelines often demand rapid development. Although proprietary toolkits such as TensorRT provide customization options, they often fall short of this demand, and closed-source code makes quick debugging harder, further reducing development agility.

To address these industry challenges, Meta AI created AITemplate (AIT), a unified open-source inference solution with separate acceleration back-ends for NVIDIA and AMD GPU technology. On a range of popular AI models, including convolutional neural networks, transformers, and diffusion models, it delivers performance close to the native capabilities of Tensor Core (NVIDIA GPU) and Matrix Core (AMD GPU) hardware. Compared with PyTorch's eager mode, the team measured speedups of up to 12x on NVIDIA GPUs and 4x on AMD GPUs when using AIT. Currently, AITemplate is enabled on NVIDIA's A100 and AMD's MI200 GPU systems, both of which are widely used in technology company data centers, research facilities, and among cloud service providers.

AITemplate is a Python system that converts AI models into high-performance C++ GPU code for faster inference. It consists of a front-end layer that performs various graph transformations to optimize the computation graph, and a back-end layer that emits C++ kernel templates for the target GPU. The vision behind the framework is to deliver high speed while maintaining simplicity. The project includes several performance innovations, such as advanced kernel fusion, an optimization technique that merges multiple kernels into a single kernel so they run more efficiently, as well as advanced optimizations for transformer blocks. These enhancements dramatically increase the utilization of NVIDIA's Tensor Cores and AMD's Matrix Cores, resulting in near-peak performance. Additionally, AIT minimizes its reliance on external libraries.
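To make the idea of kernel fusion concrete, the following is a minimal, purely illustrative Python sketch (not AITemplate's actual implementation, which generates GPU kernels). The unfused version runs one pass per elementwise operation and materializes an intermediate buffer each time; the fused version applies the whole chain of operations in a single pass, which is the memory-traffic saving that fusion buys on a GPU:

```python
# Illustrative sketch of kernel fusion; function names are hypothetical.

def unfused(xs):
    # Three separate "kernels", each a full pass with an intermediate buffer.
    tmp1 = [x * 2.0 for x in xs]          # kernel 1: scale
    tmp2 = [x + 1.0 for x in tmp1]        # kernel 2: shift
    return [max(x, 0.0) for x in tmp2]    # kernel 3: ReLU

def fused(xs):
    # One "kernel": all three ops applied per element, no intermediates.
    return [max(x * 2.0 + 1.0, 0.0) for x in xs]

data = [-1.5, 0.0, 2.0]
assert unfused(data) == fused(data)  # same result, fewer passes over memory
print(fused(data))  # [0.0, 1.0, 5.0]
```

The fused version reads and writes each element once instead of three times; on real hardware this reduces kernel-launch overhead and memory bandwidth, which is where most of the speedup comes from.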

With support for three advanced classes of optimization (vertical, horizontal, and memory fusions), AITemplate has one of the most sophisticated kernel fusion systems in the industry. Its ease of deployment also makes AITemplate a practical solution: compilation produces an independent, self-contained binary that embeds the AI model. This binary has good backward compatibility, as it can run in any environment with the same hardware and newer versions of CUDA 11 / ROCm 5. Additionally, AITemplate offers predefined templates for commonly used models (e.g., VisionTransformer, BERT, Stable Diffusion, ResNet, and MaskRCNN), which streamlines the deployment procedure and allows practitioners to easily deploy pre-trained PyTorch models. AITemplate's template system has two layers: Python Jinja2 templates and C++ GPU Tensor Core/Matrix Core templates. After profiling in Python to determine the optimal kernel configuration, the system renders the Jinja2 templates into C++ code. The generated source code is then compiled with the GPU C++ compiler into the final model binary. Because its front end is designed to resemble PyTorch, users can convert their models from various frameworks, including PyTorch, to AITemplate.
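The template-to-C++ flow described above can be sketched in a few lines. This is a simplified illustration using Python's built-in string formatting rather than AITemplate's real Jinja2 templates; the kernel shape, parameter names, and `render_kernel` helper are all hypothetical. A kernel template is rendered with parameters chosen by profiling (such as a block size), yielding CUDA-style C++ source that would then be handed to the GPU compiler:

```python
# Illustrative template-based code generation; not AITemplate's actual templates.
KERNEL_TEMPLATE = """\
__global__ void {name}(const float* in, float* out, int n) {{
  int i = blockIdx.x * {block_size} + threadIdx.x;
  if (i < n) out[i] = in[i] * {scale}f;
}}
"""

def render_kernel(name, block_size, scale):
    """Fill in the template with profiled parameters, producing C++ source."""
    return KERNEL_TEMPLATE.format(name=name, block_size=block_size, scale=scale)

# A profiler would pick block_size; here we hard-code a plausible value.
src = render_kernel("scale_kernel", block_size=256, scale=2.0)
print(src)
```

Generating source text from templates like this is what lets the system specialize each kernel to the profiled configuration before compilation, instead of shipping one generic kernel for all shapes.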

Beyond broadening the range of platforms available for AI, Meta AI hopes these techniques can also help address environmental concerns by reducing carbon emissions. Studies show that GPU usage contributes to carbon emissions; by speeding up GPU execution, AITemplate can help reduce them. In short, AITemplate provides cutting-edge performance on current and next-generation AMD and NVIDIA GPUs with minimal system complexity. Nevertheless, the researchers say they are only at the beginning of building a high-performance AI inference engine. They are actively improving AITemplate with new optimizations and full support for dynamic shapes, and their long-term goals include extending AITemplate to more hardware platforms from different vendors. Meta aims to create a greener, more efficient ecosystem for AI inference, with better performance, flexibility, and back-end options, and the development of AITemplate is a stepping stone in that direction.

This Article is written as a research summary article by Marktechpost Staff based on the research article 'Faster, more flexible inference on GPUs using AITemplate, a revolutionary new inference engine'. All Credit For This Research Goes To Researchers on This Project. Check out the code and reference article.


Khushboo Gupta is an intern consultant at MarktechPost. She is currently pursuing her B.Tech from Indian Institute of Technology (IIT), Goa. She is passionate about the fields of machine learning, natural language processing and web development. She likes to learn more about the technical field by participating in several challenges.

