GPUs deliver the computational power needed to deploy large-scale pretrained models across machine learning domains such as computer vision, natural language processing, and multimodal learning. Yet AI practitioners currently have little choice among high-performance GPU inference solutions, because those solutions are platform-specific. A machine learning system built for one company’s GPU must be entirely reimplemented to run on hardware from a different vendor. Hardware dependencies in complicated runtime environments also make the code behind these solutions hard to maintain.
Additionally, AI production pipelines frequently need rapid development. Although proprietary software toolkits like TensorRT offer customization options, they frequently fail to meet this demand. Proprietary solutions can also make it harder to debug code quickly, which further reduces development agility.
To address these industry difficulties, Meta AI has created AITemplate (AIT), a unified open-source inference solution with separate acceleration back ends for AMD and NVIDIA GPUs. On a range of popular AI models, including convolutional neural networks, transformers, and diffusers, it delivers performance close to hardware-native Tensor Core (NVIDIA GPU) and Matrix Core (AMD GPU) performance. With AIT, the team improved performance by up to 12x on NVIDIA GPUs and 4x on AMD GPUs compared with PyTorch’s eager mode. Currently, AITemplate is enabled on NVIDIA’s A100 and AMD’s MI200 GPU systems, both of which are widely used in the data centers of technology companies, research facilities, and cloud computing service providers.
AITemplate is a Python system that converts AI models into high-performance C++ GPU template code to speed up inference. The system consists of a front-end layer, which performs various graph transformations to optimize the graph, and a back-end layer, which produces C++ kernel templates for the GPU target. The vision behind the framework is to support high speed while maintaining simplicity.
The project includes several performance advances, such as enhanced kernel fusion, an optimization technique that merges several kernels into a single kernel so that they run more efficiently, and advanced transformer block optimizations. These improvements dramatically increase the utilization of AMD’s Matrix Cores and NVIDIA’s Tensor Cores, resulting in state-of-the-art performance. Additionally, AIT keeps its reliance on external libraries to a minimum.
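The idea behind kernel fusion can be sketched in a few lines of plain Python. This is only an illustration, not AITemplate code: lists stand in for GPU buffers, each function stands in for one kernel launch, and the names `scale_kernel`, `add_bias_kernel`, `relu_kernel`, `unfused`, and `fused` are hypothetical.

```python
# Illustrative only: plain Python lists stand in for GPU buffers and each
# function stands in for one kernel launch. None of this is AITemplate's API.

def scale_kernel(xs, alpha):
    # "Kernel" 1: reads the input, writes a full intermediate buffer.
    return [alpha * x for x in xs]

def add_bias_kernel(xs, bias):
    # "Kernel" 2: reads the first intermediate, writes a second one.
    return [x + bias for x in xs]

def relu_kernel(xs):
    # "Kernel" 3: reads the second intermediate, writes the output.
    return [max(x, 0.0) for x in xs]

def unfused(xs, alpha, bias):
    # Three launches and two intermediate buffers.
    return relu_kernel(add_bias_kernel(scale_kernel(xs, alpha), bias))

def fused(xs, alpha, bias):
    # One launch: each element is read once, transformed, and written once.
    return [max(alpha * x + bias, 0.0) for x in xs]

data = [-2.0, -0.5, 0.0, 1.5, 3.0]
# Both versions compute the same result; only the number of passes differs.
assert unfused(data, 2.0, 1.0) == fused(data, 2.0, 1.0)
```

On a real GPU the benefit comes from fewer kernel launches and fewer round trips to memory for intermediate results, which this sketch can only hint at.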
Thanks to its support for three advanced optimizations (vertical, horizontal, and memory fusions), AITemplate has one of the industry’s most sophisticated kernel fusion systems. AITemplate is also easy to deploy: it produces an independent, self-contained binary containing the AI model. This binary has good backward compatibility because it can run in any environment with the same hardware and more recent CUDA 11 / ROCm 5 versions. Additionally, AITemplate offers commonly used pre-built models (e.g., VisionTransformer, BERT, Stable Diffusion, ResNet, and MaskRCNN). This streamlines the deployment procedure and makes it simple for practitioners to deploy PyTorch pretrained models.
AITemplate is built on two layers of template systems: a Python Jinja2 template layer and a GPU Tensor Core/Matrix Core C++ template layer. The system first runs profiling in Python to find the best kernel configuration, then renders the Jinja2 template into C++ code. The model’s final binary is created by compiling the generated source code with the GPU C++ compiler. Because the front-end design is similar to PyTorch, users can convert models from a variety of frameworks, including PyTorch, to AITemplate.
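The profile-then-render flow can be sketched roughly as follows. This is a minimal illustration under several assumptions rather than AITemplate's actual implementation: the standard library's `string.Template` stands in for Jinja2, `fake_cost` stands in for a real on-GPU timing measurement, and the kernel name `gemm_kernel`, the tile parameters, and the candidate list are all hypothetical.

```python
from string import Template  # stdlib stand-in for Jinja2 in this sketch

# Hypothetical C++ kernel template; names are illustrative, not AITemplate's.
KERNEL_TEMPLATE = Template(
    "template <int TILE_M, int TILE_N>\n"
    "void gemm_kernel(const float* a, const float* b, float* c);\n"
    "// Instantiate the configuration chosen by profiling.\n"
    "template void gemm_kernel<$tile_m, $tile_n>"
    "(const float*, const float*, float*);\n"
)

def fake_cost(tile_m, tile_n, m, n):
    # Stand-in for a measured runtime: the number of output elements touched
    # once the m x n problem is padded up to whole tiles.
    padded_m = -(-m // tile_m) * tile_m  # ceiling division, then scale back
    padded_n = -(-n // tile_n) * tile_n
    return padded_m * padded_n

def pick_best(candidates, m, n):
    # "Profiling": evaluate every candidate and keep the cheapest one.
    return min(candidates, key=lambda c: fake_cost(c[0], c[1], m, n))

def render(tile_m, tile_n):
    # "Code generation": fill the chosen configuration into the template.
    return KERNEL_TEMPLATE.substitute(tile_m=tile_m, tile_n=tile_n)

candidates = [(64, 64), (128, 64), (128, 128)]
best = pick_best(candidates, m=192, n=64)
source = render(*best)
print(source)
```

A real system would time compiled kernels on the target GPU rather than use a cost formula, and would then hand the rendered source to the GPU C++ compiler.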
In addition to increasing the number of platforms available for AI, Meta AI hopes to develop techniques that can also help address environmental concerns by lowering carbon emissions. According to studies, the use of GPUs can influence carbon emissions. AITemplate speeds up GPU execution, which can reduce emissions further. To summarize, AITemplate provides state-of-the-art performance for current and upcoming AMD and NVIDIA GPUs with minimal system complexity. Nevertheless, according to the researchers, they are merely at the start of developing a high-performance AI inference engine.
They are actively working to improve AITemplate with new optimizations and complete support for dynamic shapes. Their long-term goals include expanding AITemplate to more hardware platforms from different technology vendors. Meta aims to create an ecosystem for AI inference that is greener and more efficient, with better performance, more flexibility, and more back-end options. Developing AITemplate is a stepping stone in that direction.