NVIDIA Announces TensorRT 8 for Smarter and More Responsive Conversational AI

Author: James Gane
NVIDIA recently launched TensorRT 8, the eighth generation of the company’s AI inference software, which halves inference time for language queries, making search engines, ad recommendations, and chatbots smarter and more responsive, from the cloud to the edge.
TensorRT 8 brings substantial compiler optimisations for the transformer-based networks used in natural language applications, delivering twice the performance of TensorRT 7 on BERT-Large with just 1.2 ms of inference latency. This allows companies to increase their model size up to three-fold, improving accuracy without sacrificing processing speed.

In addition to the transformer optimisations, TensorRT 8’s breakthroughs in AI inference come from two other important features. Sparsity is a new performance technique for NVIDIA's Ampere-architecture GPUs that increases efficiency, allowing developers to accelerate their neural networks by reducing the number of computational operations. Quantization Aware Training (QAT) lets developers run inference on trained models in INT8 precision without losing accuracy, significantly reducing the compute and storage overhead of inference on Tensor Cores.
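The sparsity feature relies on the 2:4 structured-sparsity pattern supported by Ampere Tensor Cores: in every group of four consecutive weights, at least two must be zero, so the hardware can skip half of the multiplications. As a rough conceptual illustration (not NVIDIA's actual pruning tooling), a minimal magnitude-based 2:4 pruning pass might look like this sketch in NumPy:

```python
import numpy as np

def prune_2_4(weights):
    """Illustrative 2:4 structured sparsity: in every group of 4
    consecutive weights, zero out the 2 with the smallest magnitude.
    Assumes the weight count is a multiple of 4."""
    w = weights.reshape(-1, 4).copy()
    # indices of the 2 smallest-magnitude entries in each group of 4
    idx = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, idx, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.01, 0.8])
pruned = prune_2_4(w)
print(pruned)  # → [ 0.9  0.   0.  -0.7  0.   0.3  0.   0.8]
```

In practice, models are pruned like this during or after training and then fine-tuned to recover accuracy; TensorRT 8 then exploits the resulting structured zeros at inference time.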
 

About TensorRT

TensorRT is an SDK for high-performance deep learning inference, which aims to help developers focus on creating novel AI-powered applications instead of performance tuning for deployment. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications in data centres, as well as in embedded and automotive environments. TensorRT-based applications can perform inference up to 40x faster than CPU-only platforms, providing INT8 and FP16 optimisations for production deployments of applications such as video streaming, speech recognition, recommendation, fraud detection, and natural language processing. Reduced-precision inference significantly reduces application latency, which is a priority for real-time services, as well as for autonomous and embedded applications.
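The INT8 optimisations mentioned above rest on quantization: mapping 32-bit floating-point activations and weights onto 8-bit integers with a scale factor, cutting storage and memory bandwidth by 4x. The following is a minimal sketch of symmetric per-tensor INT8 quantization in NumPy, purely to illustrate the idea; TensorRT's own calibration and QAT machinery is far more sophisticated:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: map the float range
    [-max|x|, +max|x|] onto [-127, 127] with a single scale factor."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, scale = quantize_int8(x)
print(x.nbytes, q.nbytes)  # → 16 4 (INT8 storage is 4x smaller)
```

Each quantized value is at most half a quantization step away from the original, which is why carefully calibrated (or quantization-aware trained) INT8 models can match FP32 accuracy while running much faster on Tensor Cores.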

TensorRT 8 is now generally available and free of charge to members of the NVIDIA Developer program.

Our Edge AI capabilities

Developing an industrial AI computing solution can be a difficult, costly and time-consuming experience. Impulse can help you design and create a reliable, repeatable and robust system, reducing your costs and development time. With our team of in-house engineers and specialists, all with decades of experience, we can offer you fully deployable embedded Edge AI computing solutions straight out of the box.

For more information about Edge AI and our capabilities please click here.
Get in touch
Our technical sales team are ready to answer your questions.
T: +44 (0)1782 337 800 • E: sales@impulse-embedded.co.uk