# Blackwell Architecture



# The Challenge: Increasing Visual Quality at the End of Moore's Law





## **The Solution:** Neural Rendering



*Chart: relative scale from 0x to 16x, with the time axis starting in 2016.*





## **RTX Blackwell Design Goals**

- Optimize for new neural workloads
- Reduce memory footprint
- New quality of service capabilities
- Energy efficiency



# **NVIDIA GeForce Blackwell Neural Rendering Architecture**



- DisplayPort 2.1 UHBR20 | PCIe Gen 5 | 4X NVDEC, 4X NVENC with 4:2:2
- **5<sup>th</sup> Gen Tensor Cores:** 4,000 AI TOPS | High-speed FP4
- **4<sup>th</sup> Gen RT Cores:** 360 RT TFLOPS | Built for Mega Geometry
- **AI Management Processor:** Simultaneous AI models + graphics
- **Blackwell SM:** 125 TFLOPS | Built for neural shaders
- **Blackwell Max-Q:** 2X power efficiency
- **GDDR7 Memory:** 30 Gbps | World's fastest
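The 30 Gbps GDDR7 figure is a per-pin data rate. Peak memory bandwidth follows from multiplying it by the memory bus width and dividing by 8 bits per byte. A minimal sketch of that arithmetic; the 256-bit and 512-bit bus widths are hypothetical examples, not figures from this slide.

```python
def peak_bandwidth_gb_s(data_rate_gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak DRAM bandwidth in GB/s: per-pin rate (Gbit/s) x number of pins / 8 bits per byte."""
    return data_rate_gbps_per_pin * bus_width_bits / 8


# 30 Gbps is the GDDR7 per-pin rate quoted above; the bus widths are hypothetical.
for bus in (256, 512):
    print(f"{bus}-bit bus at 30 Gbps: {peak_bandwidth_gb_s(30, bus):.0f} GB/s")
# 256-bit bus at 30 Gbps: 960 GB/s
# 512-bit bus at 30 Gbps: 1920 GB/s
```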



## **Blackwell SM: Built for Neural Shaders**



*Figure: Ada SM (Shaders) compared with Blackwell SM (Neural Shaders).*
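A neural shader replaces part of a hand-written shading function with a small neural network that is evaluated per pixel or per sample. The sketch below assumes a tiny 2-input, 3-hidden, 1-output MLP with hand-picked placeholder weights (`W1`, `B1`, `W2`, `B2` and `neural_shade` are hypothetical); a real neural shader would use trained weights, and on the GPU the weight multiplications are the part Tensor Cores are meant to accelerate.

```python
import math

# Hypothetical placeholder weights for a 2 -> 3 -> 1 MLP (a trained shader would load real weights).
W1 = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
B1 = [0.0, 0.1, -0.1]
W2 = [0.6, -0.4, 0.9]
B2 = 0.05


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def neural_shade(u: float, v: float) -> float:
    """Evaluate the tiny MLP at one shading point (u, v); the output lies in (0, 1)."""
    hidden = [relu(w[0] * u + w[1] * v + b) for w, b in zip(W1, B1)]
    return sigmoid(sum(w * h for w, h in zip(W2, hidden)) + B2)


# Shade a 4x4 "image" the way a pixel shader would: one network evaluation per pixel.
image = [[neural_shade(x / 3, y / 3) for x in range(4)] for y in range(4)]
for row in image:
    print(" ".join(f"{p:.3f}" for p in row))
```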



# Blackwell SM Improves SER by 2X



Blackwell Shader Execution Reordering (SER)
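Shader Execution Reordering regroups threads so that threads about to run the same shader execute together in a warp. The sketch below is a simplified CPU model of that idea, not the hardware mechanism or its API. It assumes the material ID of each ray hit is the reordering key, uses a randomly generated hypothetical set of hits, and measures how many distinct materials each 32-thread warp must handle before and after reordering.

```python
import random

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def distinct_keys_per_warp(keys, warp_size=WARP_SIZE):
    """Average number of distinct shader keys per warp (1.0 means fully coherent)."""
    warps = [keys[i:i + warp_size] for i in range(0, len(keys), warp_size)]
    return sum(len(set(w)) for w in warps) / len(warps)


# Hypothetical workload: 1024 ray hits, each landing on one of 8 materials at random.
rng = random.Random(0)
hit_materials = [rng.randrange(8) for _ in range(1024)]

# SER-style step: reorder threads by a coherence key (here, material ID) before shading.
reordered = sorted(hit_materials)

print("before reordering:", distinct_keys_per_warp(hit_materials))
print("after reordering: ", distinct_keys_per_warp(reordered))
```

After sorting, each warp spans at most a couple of materials, so far fewer divergent shader paths have to be serialized within a warp.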









# **GDDR7: The New Graphics DRAM Standard**

- PAM3 signaling: higher frequency at lower voltage
- 2X the data rate of GDDR6
- 2X energy efficiency

Energy efficiency reflects the average graphics application with 30% DRAM utilization.
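PAM3 transmits one of three voltage levels per symbol instead of the two levels used by NRZ signaling. Two PAM3 symbols therefore have 9 combinations, enough to carry 3 bits (8 values), so each symbol carries 1.5 bits instead of 1. This sketch shows that 3-bits-per-2-symbols packing with a hypothetical mapping table. It is not the encoding table defined by the JEDEC GDDR7 standard, and `encode_bits`/`decode_symbols` are illustrative names.

```python
from itertools import product

LEVELS = (-1, 0, 1)  # the three PAM3 signal levels

# 3 * 3 = 9 symbol pairs; 3 bits need only 8. Hypothetical mapping (NOT the JEDEC table):
# take the first 8 pairs in lexicographic order and leave (1, 1) unused.
PAIRS = list(product(LEVELS, repeat=2))[:8]
ENCODE = {value: pair for value, pair in enumerate(PAIRS)}
DECODE = {pair: value for value, pair in ENCODE.items()}


def encode_bits(bits: str) -> list:
    """Encode a bit string (length a multiple of 3) as PAM3 symbols, 3 bits per 2 symbols."""
    if len(bits) % 3:
        raise ValueError("bit string length must be a multiple of 3")
    symbols = []
    for i in range(0, len(bits), 3):
        symbols.extend(ENCODE[int(bits[i:i + 3], 2)])
    return symbols


def decode_symbols(symbols: list) -> str:
    """Invert encode_bits: map each pair of symbols back to 3 bits."""
    return "".join(
        format(DECODE[(symbols[i], symbols[i + 1])], "03b")
        for i in range(0, len(symbols), 2)
    )


data = "101001111000"                    # 12 bits
symbols = encode_bits(data)              # 8 PAM3 symbols; NRZ would need 12
print(symbols)                           # [0, 1, -1, 0, 1, 0, -1, -1]
print(decode_symbols(symbols) == data)   # True
```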



**©** NVIDIA.

# Blackwell 4<sup>th</sup> Generation RT Core—Built for Mega Geometry



Ada 3<sup>rd</sup> Generation RT Core compared with the Blackwell 4<sup>th</sup> Generation RT Core, which adds:

- Triangle Cluster Intersection Engine
- Triangle Cluster Decompression Engine
- Linear Swept Spheres
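The Triangle Cluster Decompression Engine implies geometry that is stored compressed per cluster and expanded on the fly. The hardware's actual format is not described here. As a stand-in, this sketch uses a common generic technique: quantize each cluster's vertex positions to 16-bit integers inside the cluster's bounding box. The grid cluster and the helper names are hypothetical.

```python
import struct


def compress_cluster(vertices, bits=16):
    """Quantize float3 positions to `bits`-bit integers inside the cluster's bounding box."""
    levels = (1 << bits) - 1
    lo = [min(v[i] for v in vertices) for i in range(3)]
    hi = [max(v[i] for v in vertices) for i in range(3)]
    # Step size per axis; fall back to 1.0 on a flat axis so we never divide by zero.
    scale = [((h - l) / levels) or 1.0 for l, h in zip(lo, hi)]
    quantized = [tuple(round((v[i] - lo[i]) / scale[i]) for i in range(3)) for v in vertices]
    return lo, scale, quantized


def decompress_cluster(lo, scale, quantized):
    """Rebuild approximate float3 positions: lo + q * scale on each axis."""
    return [tuple(lo[i] + q[i] * scale[i] for i in range(3)) for q in quantized]


# Hypothetical cluster: an 8x8 grid of vertices on a gently curved surface.
verts = [(x * 0.1, y * 0.1, 0.01 * x * y) for x in range(8) for y in range(8)]
lo, scale, q = compress_cluster(verts)

flat = [c for v in verts for c in v]
raw = struct.pack(f"<{len(flat)}f", *flat)                             # 32-bit floats
packed_q = [c for t in q for c in t]
packed = struct.pack(f"<6f{len(packed_q)}H", *lo, *scale, *packed_q)  # header + 16-bit ints
print(len(raw), "bytes raw,", len(packed), "bytes compressed")        # 768 bytes raw, 408 bytes compressed

restored = decompress_cluster(lo, scale, q)
```

The reconstruction error per axis is at most half a quantization step, which is the trade-off such schemes make for the smaller footprint.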




- 8x ray-triangle intersection rate
- 75% memory footprint
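The intersection-rate figure counts ray-triangle tests. The algorithm inside the RT core is not public, so this sketch uses the classic Möller-Trumbore ray-triangle test to show the per-ray, per-triangle work involved. The vector helpers and the example triangle are illustrative.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def ray_triangle(orig, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test: return hit distance t along the ray, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:                 # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv_det            # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det    # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None      # count only hits in front of the origin


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(ray_triangle((0.2, 0.2, 1.0), (0.0, 0.0, -1.0), *tri))  # 1.0
print(ray_triangle((0.8, 0.8, 1.0), (0.0, 0.0, -1.0), *tri))  # None
```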




# Blackwell 5<sup>th</sup> Generation Tensor Cores with FP4



Tensor Core precision by generation:

- Pascal
- Turing Tensor Core: FP16
- Ada Tensor Core: FP8
- Blackwell Tensor Core: FP4
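FP4 halves the bits per value relative to FP8, which only works with a shared scale factor per block of values. This sketch assumes the E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) used by the OCP Microscaling (MX) formats, with one scale per block chosen so that the largest magnitude maps to 6.0. The weights and function names are hypothetical.

```python
# Non-negative magnitudes representable in FP4 E2M1.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_fp4_block(values):
    """Quantize a block of floats to E2M1 codes that share one scale factor.

    Returns (scale, codes); each dequantized value is code * scale.
    """
    amax = max(abs(x) for x in values)
    scale = amax / 6.0 if amax > 0 else 1.0   # map the largest magnitude to 6.0
    codes = []
    for x in values:
        # Round to the nearest representable magnitude, then restore the sign.
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(x) / scale - m))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def dequantize(scale, codes):
    return [c * scale for c in codes]


weights = [0.12, -0.5, 0.33, 0.9, -1.2, 0.05, 0.7, -0.25]   # hypothetical block
scale, codes = quantize_fp4_block(weights)
print(scale, codes)
print(dequantize(scale, codes))
```

Because the widest gap between E2M1 magnitudes is 2 (from 4 to 6), the error per value is at most one scale step.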









# Simultaneous AI and Graphics Workloads

*Figure: frame timelines with rendered frames, generated frames, and an AI "Response" workload, shown without and with the AI Management Processor.*

## **AI Management Processor: Evenly Paced Frames**



## Blackwell is Designed for Max-Q





## **Advanced Power Gating**

- Clock Gating
- Power Gating
- Rail Gating
- Sleep





## **Power Savings Advancements**

*Chart: power over time, Ada vs. Blackwell.*

**50% Power Savings**
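With several gating mechanisms available, the controller has to decide how deep to go for each idle period. A common policy, used for example by operating-system idle governors, picks the deepest state whose break-even residency fits the predicted idle time and whose wake-up latency is acceptable. This sketch assumes that policy and assumes a shallow-to-deep ordering of clock gating, power gating, rail gating, and sleep. All latencies are hypothetical placeholders, not Blackwell specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatingState:
    name: str
    entry_exit_us: float     # hypothetical latency to enter and leave the state
    min_residency_us: float  # hypothetical idle time needed before the state saves energy


# Assumed ordering from shallowest to deepest; numbers are illustrative only.
STATES = [
    GatingState("clock gating", 0.1, 1.0),
    GatingState("power gating", 5.0, 50.0),
    GatingState("rail gating", 50.0, 500.0),
    GatingState("sleep", 1000.0, 10000.0),
]


def pick_state(predicted_idle_us: float, latency_limit_us: float) -> str:
    """Deepest state that pays off for the predicted idle time and meets the latency limit."""
    choice = "active"
    for state in STATES:
        if state.min_residency_us <= predicted_idle_us and state.entry_exit_us <= latency_limit_us:
            choice = state.name
    return choice


print(pick_state(20.0, 10.0))       # clock gating
print(pick_state(800.0, 100.0))     # rail gating
print(pick_state(20000.0, 5000.0))  # sleep
```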





# **Accelerated Frequency Switching**

- 1000x faster clock responsiveness
- Higher SM efficiency through rapid clock adjustments in dynamic workloads

*Chart: GPU clock.*
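One way to see why faster clock switching helps in dynamic workloads is a simplified model. Assume each workload phase requests a new clock, and the old clock stays in effect until the switch completes. The fraction of each phase spent at the wrong clock is then roughly the switch latency divided by the phase length. Only the 1000x ratio comes from the slide; the 1000 us, 1 us, and 500 us values are hypothetical.

```python
def mismatched_fraction(switch_latency_us: float, phase_length_us: float) -> float:
    """Fraction of a workload phase spent at the previous clock (simplified model, an assumption)."""
    return min(1.0, switch_latency_us / phase_length_us)


# Hypothetical: 500 us phases; a 1000x faster switch goes from 1000 us to 1 us.
slow = mismatched_fraction(1000.0, 500.0)
fast = mismatched_fraction(1.0, 500.0)
print(f"slow switching: {slow:.1%} of each phase at the wrong clock")  # 100.0%
print(f"fast switching: {fast:.1%} of each phase at the wrong clock")  # 0.2%
```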



## Blackwell Display and Video



**Ada Display Engine:** DisplayPort 1.4a HBR3, 8.1 Gbps per lane

**Ada Video:** 8<sup>th</sup> Generation Encoder, 5<sup>th</sup> Generation Decoder; AV1, HEVC, H.264 decode; 4:2:0 encode/decode
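Chroma subsampling sets how much color data each frame carries. 4:2:0 stores one Cb and one Cr sample per 2x2 block of pixels, 4:2:2 keeps full vertical chroma resolution (two of each per 2x2 block), and 4:4:4 stores chroma for every pixel. This sketch computes raw frame sizes under the assumption that 10-bit samples are padded to 16 bits, as many GPU surface formats do; `frame_bytes` is an illustrative name.

```python
# Cb + Cr samples per 2x2 pixel block (each block also has 4 luma samples).
CHROMA_SAMPLES_PER_2X2 = {"4:2:0": 2, "4:2:2": 4, "4:4:4": 8}


def frame_bytes(width: int, height: int, bit_depth: int, subsampling: str) -> int:
    """Raw size of one Y'CbCr frame, assuming each sample is padded to whole bytes."""
    blocks = (width // 2) * (height // 2)
    samples = blocks * (4 + CHROMA_SAMPLES_PER_2X2[subsampling])
    bytes_per_sample = (bit_depth + 7) // 8
    return samples * bytes_per_sample


for mode in ("4:2:0", "4:2:2", "4:4:4"):
    print(mode, frame_bytes(3840, 2160, 10, mode), "bytes per 4K 10-bit frame")
```

A 4:2:2 frame therefore carries one third more data than a 4:2:0 frame at the same resolution and bit depth.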

**Blackwell Display Engine:** DisplayPort 2.1 UHBR20, 20 Gbps per lane
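Comparing the two display engines comes down to link arithmetic. HBR3 runs 8.1 Gbps per lane with 8b/10b line coding, and UHBR20 runs 20 Gbps per lane with 128b/132b coding, both over up to 4 lanes. This sketch checks whether an uncompressed 4K, 240 Hz, 10-bit-per-channel RGB signal fits on each link. It ignores blanking intervals, which is a simplification, and the chosen display mode is only an example.

```python
# (per-lane rate in Gbit/s, line-coding efficiency) for each DisplayPort link mode.
LINKS = {
    "DP 1.4a HBR3": (8.1, 8 / 10),
    "DP 2.1 UHBR20": (20.0, 128 / 132),
}


def link_payload_gbps(rate_per_lane: float, efficiency: float, lanes: int = 4) -> float:
    """Usable link bandwidth after line coding."""
    return rate_per_lane * lanes * efficiency


def video_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: int) -> float:
    """Uncompressed active-pixel bandwidth, ignoring blanking (a simplification)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


need = video_gbps(3840, 2160, 240, 30)   # 4K at 240 Hz, 10 bits per RGB channel
for name, (rate, eff) in LINKS.items():
    have = link_payload_gbps(rate, eff)
    verdict = "fits uncompressed" if have >= need else "needs compression (e.g. DSC)"
    print(f"{name}: {have:.2f} Gbps available, {need:.2f} Gbps needed -> {verdict}")
```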









## Blackwell Optimized for Multi Frame Gen

- Enhanced Tensor Core throughput
- Enhanced flip metering
- AI Management Processor
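Flip metering is about presenting every displayed frame at an even interval. This simplified model assumes the generated frames are spaced uniformly between consecutive rendered frames. It ignores latency and the time needed to generate each frame. `metered_flip_times`, the 16 ms render interval, and 3 generated frames per rendered frame are hypothetical values chosen for illustration.

```python
def metered_flip_times(rendered_ms, generated_per_rendered):
    """Evenly spaced flip timestamps (ms) for rendered frames plus the generated frames between them."""
    flips = []
    for start, end in zip(rendered_ms, rendered_ms[1:]):
        step = (end - start) / (generated_per_rendered + 1)
        flips.extend(start + k * step for k in range(generated_per_rendered + 1))
    flips.append(rendered_ms[-1])
    return flips


flips = metered_flip_times([0.0, 16.0, 32.0], 3)
print(flips)  # [0.0, 4.0, 8.0, 12.0, 16.0, 20.0, 24.0, 28.0, 32.0]
```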

