Intel's new Gaudi 3 accelerators massively undercut Nvidia GPUs as AI race heats up

David Taylor
Jun 09, 2024

Can you detail the performance benchmarks provided by Intel for the Gaudi 3 in comparison to Nvidia's H100?

Socrates

Intel provided several benchmarks comparing the performance of its Gaudi 3 AI accelerator against Nvidia's H100 GPU. According to Intel, the Gaudi 3 delivers up to 40% faster training times than the H100 in large 8,192-chip clusters, and in a smaller 64-chip setup it offers 15% higher throughput than the H100 on the popular LLaMA 2 language model.

For AI inference tasks, Intel claims the Gaudi 3 is twice as fast as the H100 on models like LLaMA and Mistral. These benchmarks suggest the Gaudi 3 can offer competitive performance to the H100 across a range of AI workloads while potentially undercutting Nvidia's pricing significantly. However, the Gaudi chips don't support Nvidia's proprietary CUDA platform, which most AI software is built on today, and that could make it difficult for Intel to convince enterprises to port their existing code to Gaudi.
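As a rough illustration of what that porting effort looks like in practice, below is a minimal sketch of a PyTorch training step moved from an Nvidia GPU to a Gaudi device using Intel's Habana PyTorch bridge. The model, data, and loop are placeholders, and the exact calls depend on the installed Gaudi software stack, so treat this as an assumption-laden sketch rather than a verified recipe.

```python
# Minimal sketch (assumes the Intel Gaudi / Habana PyTorch bridge is installed).
# On an Nvidia GPU this loop would target "cuda"; on Gaudi the device string
# becomes "hpu" and mark_step() flushes queued work to the accelerator.
import torch
import torch.nn as nn
import habana_frameworks.torch.core as htcore  # Gaudi PyTorch bridge

device = torch.device("hpu")                 # was: torch.device("cuda")

model = nn.Linear(1024, 1024).to(device)     # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(10):                          # placeholder training loop
    x = torch.randn(32, 1024, device=device)
    y = torch.randn(32, 1024, device=device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    htcore.mark_step()                       # Gaudi-specific: trigger graph execution
    optimizer.step()
    htcore.mark_step()
```

Aside from the device string and the mark_step() calls, the loop is unchanged, which is roughly the kind of change enterprises would be weighing against their existing CUDA-tuned code paths.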

Betty😄⭐
Jun 09, 2024

How does the cost of Intel's Gaudi 3 accelerator compare to Nvidia's H100 data center GPU?

Socrates

Intel's Gaudi 3 accelerator costs significantly less than Nvidia's H100 data center GPU: the Gaudi 3 is priced at around $15,000 per unit, while the H100 costs approximately $30,000. That makes the Gaudi 3 roughly 50% cheaper, offering a more cost-effective option for AI acceleration.
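To put that gap in concrete terms, here is a back-of-the-envelope sketch that multiplies the per-unit prices quoted above out to a hypothetical cluster; the 64-accelerator size is an arbitrary example, not a configuration from the article.

```python
# Back-of-the-envelope accelerator cost comparison using the prices quoted above.
GAUDI3_UNIT_PRICE = 15_000   # USD, reported Gaudi 3 price per unit
H100_UNIT_PRICE = 30_000     # USD, approximate H100 price per unit
CLUSTER_SIZE = 64            # hypothetical cluster size for illustration

gaudi_cost = GAUDI3_UNIT_PRICE * CLUSTER_SIZE
h100_cost = H100_UNIT_PRICE * CLUSTER_SIZE
savings = h100_cost - gaudi_cost

print(f"Gaudi 3 cluster: ${gaudi_cost:,}")   # $960,000
print(f"H100 cluster:    ${h100_cost:,}")    # $1,920,000
print(f"Savings:         ${savings:,} ({savings / h100_cost:.0%})")  # $960,000 (50%)
```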

Mary009
Jun 09, 2024

What are the key features of Intel's Gaudi 3 AI accelerator chips that were unveiled at Computex?

Socrates

Intel's Gaudi 3 AI accelerator chips, unveiled at Computex, boast several key features:

  1. Performance: The Gaudi 3 is designed to deliver similar performance to Nvidia's H100 GPU but at a significantly lower cost. Intel claims that the Gaudi 3 either keeps pace with or outperforms the H100 across a variety of important AI training and inference workloads.

  2. Pricing: The flagship Gaudi 3 accelerator will cost around $15,000 per unit when purchased individually, which is 50% cheaper than Nvidia's competing H100 data center GPU. For the Gaudi 3, an 8-accelerator kit configuration costs $125,000. Intel insists it's two-thirds cheaper than alternative solutions at that high-end performance tier.

  3. Open Standards: The Gaudi chips leverage open standards like Ethernet for easier deployment. This could potentially make them more appealing to enterprises looking for flexibility and freedom from proprietary lock-in.

  4. Partnerships: To drive adoption, Intel says it has lined up at least 10 major server vendors – including new Gaudi 3 partners like Asus, Foxconn, Gigabyte, Inventec, Quanta, and Wistron. Familiar names like Dell, HPE, Lenovo, and Supermicro are also on board.

  5. Benchmarks: Intel's benchmarks show the Gaudi 3 delivering up to 40 percent faster training times than the H100 in large 8,192-chip clusters. Even a smaller 64-chip Gaudi 3 setup offers 15 percent higher throughput than the H100 on the popular LLaMA 2 language model. For AI inference, Intel claims a 2x speed advantage over the H100 on models like LLaMA and Mistral.

  6. Architecture: The Gaudi 3 is architected for efficient large-scale AI compute and is manufactured on a 5-nanometer (nm) process. It is designed to allow activation of all engines in parallel, including the Matrix Multiplication Engine (MME), Tensor Processor Cores (TPCs), and Networking Interface Cards (NICs).

  7. Memory: The Gaudi 3 accelerator features 128 gigabytes (GB) of HBM2e memory, providing ample memory for processing large GenAI datasets.

  8. Networking: Twenty-four 200 gigabit (Gb) Ethernet ports are integrated into every Intel Gaudi 3 accelerator, providing flexible and open-standard networking. They enable efficient scaling to support large compute clusters and eliminate vendor lock-in from proprietary networking fabrics.

  9. Software: Intel Gaudi software integrates the PyTorch framework and provides optimized Hugging Face community-based models. This lets GenAI developers work at a high level of abstraction for ease of use and productivity, and makes it easier to port models across hardware types (see the sketch after this list).

  10. Availability: The Intel Gaudi 3 accelerators are expected to be available in Q3 2024.
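On the software point (item 9), the sketch below shows what loading a Hugging Face model onto a Gaudi device with plain PyTorch might look like. The model name is an arbitrary placeholder, and the code assumes the Gaudi PyTorch bridge and the transformers library are installed; Intel's Gaudi-optimized Hugging Face integrations would typically handle more of this for you.

```python
# Hypothetical sketch: running a Hugging Face text-generation model on Gaudi.
# Assumes the Gaudi PyTorch bridge and the transformers library are installed;
# the model name is an arbitrary placeholder.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("hpu")
model_name = "gpt2"  # placeholder; swap in the model you actually want to serve

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model.eval()

inputs = tokenizer("Intel Gaudi 3 is", return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```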
