Nvidia reveals Blackwell B200 GPU, the ‘world’s most powerful chip’ for AI

The Blackwell B200 GPU. | Image: Nvidia

Nvidia’s must-have H100 AI chip made it a multitrillion-dollar company, one that may be worth more than Alphabet and Amazon, and competitors have been fighting to catch up. But perhaps Nvidia is about to extend its lead, with the new Blackwell B200 GPU and GB200 “superchip.”

 Image: Nvidia Nvidia CEO Jensen Huang holds up his new GPU on the left, next to an H100 on the right, from the GTC livestream.

Nvidia says the new B200 GPU offers up to 20 petaflops of FP4 horsepower from its 208 billion transistors, and that a GB200 that combines two of those GPUs with a single Grace CPU can offer 30 times the performance for LLM inference workloads while also potentially being substantially more efficient. It “reduces cost and energy consumption by up to 25x” over an H100, says Nvidia.

On a GPT-3 LLM benchmark with 175 billion parameters, Nvidia says the GB200 delivers a somewhat more modest 7 times the performance of an H100, and Nvidia says it offers 4x the training speed.

 Image: Nvidia Here’s what one GB200 looks like: two GPUs, one CPU, one board.

Nvidia told journalists one of the key differences is a second-gen transformer engine that doubles the compute, bandwidth, and model size by using four bits for each neuron instead of eight (thus the 20 petaflops of FP4 I mentioned earlier). A second key difference only comes when you link up huge numbers of these GPUs in a server: a next-gen NVLink networking solution that lets 576 GPUs talk to each other, with 1.8 terabytes per second of bidirectional bandwidth.
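The intuition behind that first difference is simple arithmetic: at a fixed amount of silicon and memory bandwidth, halving the bits per value doubles how many values you can store, move, and operate on. A minimal back-of-the-envelope sketch (the bandwidth figure below is a made-up placeholder, not an Nvidia spec):

```python
def values_per_second(bandwidth_bytes_per_s: float, bits_per_value: int) -> float:
    """How many values a given memory bandwidth can stream per second."""
    return bandwidth_bytes_per_s / (bits_per_value / 8)

bw = 8e12  # hypothetical 8 TB/s of bandwidth, for illustration only
fp8 = values_per_second(bw, 8)
fp4 = values_per_second(bw, 4)
print(fp4 / fp8)  # -> 2.0: FP4 gives double the effective throughput of FP8
```

The same 2x factor shows up in the headline number: 20 petaflops at FP4 implies roughly 10 petaflops at FP8.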

Previously, Nvidia says, a cluster of just 16 GPUs would spend 60 percent of its time communicating with one another and only 40 percent actually computing.

 Image: Nvidia The GB200 NVL72.

Nvidia is counting on companies buying large quantities of these GPUs, of course, and is packaging them in larger supercomputer-ready designs, like the GB200 NVL72, which plugs 36 CPUs and 72 GPUs into a single liquid-cooled rack for a total of 720 petaflops of AI training performance or 1,440 petaflops (aka 1.4 exaflops) of inference. Each tray in the rack contains either two GB200 chips or two NVLink switches, with 18 of the former and nine of the latter per rack. In total, Nvidia says one of these racks can support a 27-trillion-parameter model. GPT-4 is rumored to be around a 1.7-trillion-parameter model.
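Those rack numbers all follow from the per-GB200 figures quoted earlier; a quick sanity-check sketch:

```python
# Sanity-checking the NVL72 rack arithmetic from the figures above.
GB200_TRAYS = 18            # trays holding GB200 superchips
GB200_PER_TRAY = 2          # two superchips per tray
CPUS_PER_GB200 = 1          # one Grace CPU per GB200
GPUS_PER_GB200 = 2          # two Blackwell GPUs per GB200
FP4_PETAFLOPS_PER_GPU = 20  # B200 FP4 inference figure cited above

superchips = GB200_TRAYS * GB200_PER_TRAY      # 36 GB200s per rack
cpus = superchips * CPUS_PER_GB200             # 36 CPUs
gpus = superchips * GPUS_PER_GB200             # 72 GPUs
inference_pf = gpus * FP4_PETAFLOPS_PER_GPU    # 1,440 petaflops of FP4 inference
print(cpus, gpus, inference_pf)  # -> 36 72 1440
```

The 720-petaflop training figure is exactly half the inference figure, consistent with training running at twice the precision (FP8 rather than FP4).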

The company says Amazon, Google, Microsoft, and Oracle are all already planning to offer the NVL72 racks in their cloud service offerings, though it’s not clear how many they’re buying.

And of course, Nvidia is happy to offer companies the rest of the solution, too. Here’s the DGX Superpod for DGX GB200, which combines eight systems in one for a total of 288 CPUs, 576 GPUs, 240TB of memory, and 11.5 exaflops of FP4 computing.
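The Superpod totals are just eight NVL72 racks added up, which is easy to verify:

```python
# Eight NVL72 racks combined into one DGX Superpod.
SYSTEMS = 8
CPUS_PER_RACK = 36
GPUS_PER_RACK = 72
INFERENCE_EXAFLOPS_PER_RACK = 1.44  # 1,440 petaflops, from the rack figures above

print(SYSTEMS * CPUS_PER_RACK)                # -> 288 CPUs
print(SYSTEMS * GPUS_PER_RACK)                # -> 576 GPUs
print(SYSTEMS * INFERENCE_EXAFLOPS_PER_RACK)  # ~11.52 exaflops, rounded to 11.5 in the spec
```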

 Image: Nvidia

Nvidia says its systems can scale to tens of thousands of the GB200 superchips, connected together with 800Gbps networking via its new Quantum-X800 InfiniBand (for up to 144 connections) or Spectrum-X800 Ethernet (for up to 64 connections).
