Jetson AGX Orin for the Next Generation of AI Systems

News | 15 Nov 2021 | Edge AI
Nvidia just announced the Jetson AGX Orin, the successor to their popular Jetson AGX Xavier. Set to be released in Q1 2022, this new module is described by Nvidia as an "energy-efficient AI supercomputer", which they say is six times more powerful than its predecessor and is intended to support large, complex AI models for natural language understanding, 3D perception, and multi-sensor fusion.
Built on Nvidia’s Ampere architecture, the follow-up to the Turing architecture, the Jetson AGX Orin is geared towards industries like manufacturing, logistics, retail, and healthcare, providing a suite of new tools and software designed specifically for both vertical and horizontal markets within these sectors. For instance, Nvidia Isaac Sim is a photorealistic virtual environment where developers of AI-based robots can test, manipulate, and train their products effectively. Nvidia Clara is an application framework for healthcare, containing full-stack libraries and SDKs for AI-powered imaging, genomics, and the development and deployment of smart sensors. Nvidia Drive, as the name suggests, is designed for the autonomous driving industry, providing a family of software products and SDKs for “robust self-driving and intelligent cockpit capabilities”.

The Jetson AGX Orin brings a claimed 200 TOPS (trillion operations per second) to the table, a figure made plausible by the doubling of CUDA cores over the Xavier: 2048 in total, spread across 16 streaming multiprocessors (SMs). Combine this increase in raw performance with the next-generation deep learning and vision accelerators of the Ampere architecture (and the new Arm Cortex-A78AE CPU), high-speed interfaces, faster memory bandwidth, and multimodal sensor support, and the Orin can even run multiple AI application pipelines concurrently.
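For a quick look at those resources from software, the standard CUDA runtime API can report the SM count and compute capability of the on-board GPU. The short sketch below is our own illustration rather than Nvidia sample code; it derives the CUDA-core total from the Ampere figure of 128 cores per SM.

    // sm_query.cu - build with: nvcc sm_query.cu -o sm_query
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
            fprintf(stderr, "no CUDA device found\n");
            return 1;
        }
        // Ampere parts report compute capability 8.x and carry 128 CUDA
        // cores per SM, so 16 SMs corresponds to the 2048-core figure.
        printf("device : %s\n", prop.name);
        printf("compute: %d.%d\n", prop.major, prop.minor);
        printf("SMs    : %d (~%d CUDA cores at 128 per Ampere SM)\n",
               prop.multiProcessorCount, prop.multiProcessorCount * 128);
        return 0;
    }

On an Orin module this should report 16 SMs, matching the 2048-core total above.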

New architecture, new CPU

Straight off the bat, the new Arm Cortex-A78AE CPU used in the Jetson AGX Orin is 70% faster than the Carmel CPU found in the Xavier and, even with twelve cores to the Xavier’s eight, the Orin consumes just 15 to 50 watts and still fits in the palm of your hand. Delving into the technical details, each of the twelve cores has 64KB of L1 instruction cache, 64KB of L1 data cache, and its own 256KB of L2 cache, whereas the Carmel CPU in the Xavier shares its 2MB banks of L2 cache between pairs of cores. L2 cache speeds up data access by keeping recently used data close to the processor, so information that has already been fetched doesn’t have to be loaded from main memory again, meaning the Orin gets another performance boost where it counts.
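To get a feel for why per-core L2 matters, the rough host-side sketch below (our own illustration, with arbitrary sizes and hop counts) chases pointers through working sets of growing size; the average access time typically steps up once the working set no longer fits in a 256KB L2.

    // cache_walk.cu - host-only sketch; build with nvcc or any C++14 compiler
    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    int main() {
        std::mt19937 rng(42);
        for (size_t kb = 64; kb <= 8192; kb *= 2) {
            size_t n = kb * 1024 / sizeof(size_t);

            // Build one random cycle visiting every slot, so each hop is a
            // dependent load the hardware prefetcher cannot predict.
            std::vector<size_t> order(n), next(n);
            std::iota(order.begin(), order.end(), 0);
            std::shuffle(order.begin(), order.end(), rng);
            for (size_t i = 0; i < n; ++i)
                next[order[i]] = order[(i + 1) % n];

            const size_t hops = 20'000'000;
            size_t idx = 0;
            auto t0 = std::chrono::steady_clock::now();
            for (size_t i = 0; i < hops; ++i) idx = next[idx];
            auto t1 = std::chrono::steady_clock::now();

            double ns = std::chrono::duration<double, std::nano>(t1 - t0).count() / hops;
            printf("%5zu KB working set: %6.2f ns/access (sink %zu)\n", kb, ns, idx);
        }
        return 0;
    }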

Addressing the problem of sparsity

The term “sparsity” in the context of AI inference and machine learning describes the proportion of zero values in the matrices of numbers that make up an AI model. Matrices containing mostly zero values are labelled sparse, and those containing mostly non-zero values are called dense. Treating a sparse matrix in the same way as a dense matrix is computationally expensive, as most of the values being processed don’t affect the result at all: multiplying by zero contributes nothing.
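As a concrete illustration (ours, not Nvidia’s), the fraction of zeros is easy to measure, and a dense kernel still pays for every one of them:

    // sparsity.cu - host-only sketch of measuring how sparse a matrix is
    #include <cstdio>
    #include <vector>

    // Fraction of entries that are exactly zero.
    double sparsity(const std::vector<float>& m) {
        size_t zeros = 0;
        for (float v : m)
            if (v == 0.0f) ++zeros;
        return static_cast<double>(zeros) / m.size();
    }

    int main() {
        std::vector<float> weights = {0, 0.7f, 0, 0, 1.2f, 0, 0, 0.3f};
        printf("%.0f%% of entries are zero\n", 100.0 * sparsity(weights));
        return 0;
    }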

A long-standing focus has been to try to remove these unneeded values in order to lessen overhead and speed up calculations, but the more values are pruned away, the more the accuracy of the machine learning model can be compromised. The time saved by removing values has, in the past, often been cancelled out by the time then spent restoring the accuracy of the degraded model. What we’ve ended up with is a compromise: up to 95% of the weights (the learnable parameters of machine learning models) removed, leaving a reasonably stable computation procedure.
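Magnitude pruning is one common way those weights are removed; the article doesn’t name a specific method, so the sketch below is a hedged illustration only: zero out the given fraction of weights with the smallest absolute values.

    // prune.cu - host-only sketch of magnitude pruning
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Zero the `fraction` of weights with the smallest magnitudes.
    void prune(std::vector<float>& w, double fraction) {
        std::vector<float> mags(w.size());
        for (size_t i = 0; i < w.size(); ++i) mags[i] = std::fabs(w[i]);

        // Find the magnitude at the pruning percentile...
        size_t k = static_cast<size_t>(fraction * mags.size());
        std::nth_element(mags.begin(), mags.begin() + k, mags.end());
        float threshold = mags[k];

        // ...and zero everything below it.
        for (float& v : w)
            if (std::fabs(v) < threshold) v = 0.0f;
    }

    int main() {
        std::vector<float> w = {0.9f, -0.02f, 0.4f, 0.01f, -0.7f, 0.03f, 0.2f, -0.05f};
        prune(w, 0.75);  // drop the smallest 75% by magnitude
        for (float v : w) printf("% .2f ", v);  // 0.90 and -0.70 survive
        printf("\n");
        return 0;
    }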

Turning to the Ampere architecture, Nvidia appears to have solved this problem by adding support for fine-grained structured sparsity to its third-generation Tensor Cores. Fine-grained structured sparsity enforces a 2:4 pattern, with two values in every group of four being zero. Using a small amount of metadata stored alongside the compressed matrix, the Ampere GPU can skip the multiplications involving those zero values and achieve up to twice the throughput of a regular Tensor Core operation.
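To make the pattern concrete, here is a minimal host-side sketch (our illustration; a production flow would use Nvidia’s pruning tools and the sparse Tensor Core kernels rather than anything hand-rolled) that enforces 2:4 by keeping the two largest-magnitude values in each group of four:

    // prune_2_4.cu - host-only sketch of enforcing the 2:4 sparsity pattern
    #include <cmath>
    #include <cstdio>

    void prune_2_4(float* w, int n) {
        for (int g = 0; g + 4 <= n; g += 4) {
            // Track the indices of the two largest magnitudes in the group.
            int a = g, b = g + 1;
            if (std::fabs(w[b]) > std::fabs(w[a])) { int t = a; a = b; b = t; }
            for (int i = g + 2; i < g + 4; ++i) {
                if (std::fabs(w[i]) > std::fabs(w[a]))      { b = a; a = i; }
                else if (std::fabs(w[i]) > std::fabs(w[b])) { b = i; }
            }
            // Zero the other two, leaving exactly two non-zeros per four.
            for (int i = g; i < g + 4; ++i)
                if (i != a && i != b) w[i] = 0.0f;
        }
    }

    int main() {
        float w[8] = {0.9f, -0.1f, 0.05f, 0.4f, -0.8f, 0.2f, 0.7f, -0.3f};
        prune_2_4(w, 8);
        for (float v : w) printf("% .2f ", v);
        printf("\n");  // 0.90 0.00 0.00 0.40 -0.80 0.00 0.70 0.00
        return 0;
    }

In hardware, only the two surviving values of each group are stored, along with two-bit indices recording their positions; those indices are the metadata that lets the Tensor Core skip the zeros.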

The upshot of using fine-grained structured sparsity is even, efficient load balancing, regular memory accesses, and twice the maths throughput of regular Tensor Cores with no loss in network accuracy, but it has taken until now for GPUs to be able to exploit the structure effectively.

Jetson AGX Orin technical specifications

Memory: 32GB of 256-bit LPDDR5 @ 3200MHz, plus 64GB eMMC
CSI Camera: up to 6 cameras (16 via virtual channels*); 16 lanes MIPI CSI-2; D-PHY 1.2 (up to 40Gbps) | C-PHY 1.1 (up to 164Gbps)
Video Encode: 2x 4K60 | 4x 4K30 | 8x 1080p60 | 16x 1080p30 (H.265)
Video Decode: 1x 8K30 | 3x 4K60 | 6x 4K30 | 12x 1080p60 | 24x 1080p30 (H.265)
UPHY: 2 x8 (or 1 x8 + 2 x4), 1 x4, 2 x1 (PCIe Gen4, Root Port & Endpoint); 3x USB 3.2; single-lane UFS
Networking: 1x GbE, 4x 10GbE
Display: 1x 8K60 multi-mode DP 1.4a (+MST) / eDP 1.4a / HDMI 2.1
Other I/O: 4x USB 2.0, 4x UART, 3x SPI, 4x I2S, 8x I2C, 2x CAN, DMIC & DSPK, GPIO
Get in touch
Our technical sales team are ready to answer your questions.
T: +44 (0)1782 337 800 • E: sales@impulse-embedded.co.uk