AI Networking & Interconnect Specifications
Technical reference for GPU-to-GPU, node-to-node, and cluster-scale networking in AI infrastructure. Covers NVLink 1.0–5.0, NVSwitch generations, InfiniBand SDR through XDR, RoCE, PCIe 3.0–6.0, cluster topologies, and DGX SuperPOD configurations.
What Is NVLink Bandwidth? Generation-by-Generation Comparison
| Generation | BW per Link (Bidir) | Links/GPU | Total GPU BW | First Available | GPUs Supported |
|---|---|---|---|---|---|
| NVLink 1.0 | 40 GB/s | 4 | 160 GB/s | 2016 | P100 |
| NVLink 2.0 | 50 GB/s | 6 | 300 GB/s | 2017 | V100 |
| NVLink 3.0 | 50 GB/s | 12 | 600 GB/s | 2020 | A100 |
| NVLink 4.0 | 50 GB/s | 18 | 900 GB/s | 2022 | H100, H200 |
| NVLink 5.0 | 100 GB/s | 18 | 1,800 GB/s | 2024 | B100, B200, GB200 |
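The per-GPU totals in the table are simply per-link bidirectional bandwidth times link count. A minimal sanity check in Python, using only the figures from the table above (the dictionary and function names are illustrative):

```python
# Total NVLink bandwidth per GPU = bidirectional bandwidth per link x links per GPU.
# All figures (GB/s) taken from the generation table above.
NVLINK_GENERATIONS = {
    "1.0": {"per_link_gbs": 40, "links": 4},    # P100
    "2.0": {"per_link_gbs": 50, "links": 6},    # V100
    "3.0": {"per_link_gbs": 50, "links": 12},   # A100
    "4.0": {"per_link_gbs": 50, "links": 18},   # H100/H200
    "5.0": {"per_link_gbs": 100, "links": 18},  # B100/B200/GB200
}

def total_gpu_bandwidth(gen: str) -> int:
    """Total bidirectional NVLink bandwidth per GPU, in GB/s."""
    g = NVLINK_GENERATIONS[gen]
    return g["per_link_gbs"] * g["links"]

for gen in NVLINK_GENERATIONS:
    print(f"NVLink {gen}: {total_gpu_bandwidth(gen)} GB/s per GPU")
```

Note that the generational jumps came from adding links (1.0 → 4.0) until NVLink 5.0, which instead doubled per-link bandwidth.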
NVSwitch Generations
| Generation | Ports | Per-Port BW | Total Switch BW | GPU Topology |
|---|---|---|---|---|
| NVSwitch 1.0 | 18 | 50 GB/s | 900 GB/s | DGX-2 (16× V100) |
| NVSwitch 2.0 | 36 | 50 GB/s | 1.8 TB/s | DGX A100 (8× A100) |
| NVSwitch 3.0 | 64 | 50 GB/s | 3.2 TB/s | DGX H100 (8× H100) |
| NVSwitch 4.0 | 72 | 100 GB/s | 7.2 TB/s | DGX B200, GB200 NVL72 |
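The total-switch column follows the same arithmetic: port count times per-port bandwidth. A quick sketch (function name is illustrative; figures from the table above):

```python
# Aggregate NVSwitch bandwidth = port count x per-port bandwidth.
def switch_bandwidth_tbs(ports: int, per_port_gbs: int) -> float:
    """Total switch bandwidth in TB/s, given port count and per-port GB/s."""
    return ports * per_port_gbs / 1000

print(switch_bandwidth_tbs(18, 50))  # NVSwitch 1.0: 0.9 TB/s
print(switch_bandwidth_tbs(36, 50))  # NVSwitch 2.0: 1.8 TB/s
print(switch_bandwidth_tbs(64, 50))  # NVSwitch 3.0: 3.2 TB/s
```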
InfiniBand: What Speed Is Used for H100 and B200 Clusters?
H100/H200 clusters use NDR 400G InfiniBand (96 GB/s effective bandwidth, NVIDIA ConnectX-7). B200/GB200 clusters use XDR 800G InfiniBand (192 GB/s effective bandwidth, ConnectX-8). Older A100 clusters use HDR 200G.
| Standard | Per-Port Rate | 4× Typical | Effective BW (Bidir) | Year | Used For |
|---|---|---|---|---|---|
| HDR | 50 Gb/s | 200 Gb/s | 48 GB/s | 2018 | A100 clusters |
| NDR | 100 Gb/s | 400 Gb/s | 96 GB/s | 2022 | H100/H200 clusters |
| XDR | 200 Gb/s | 800 Gb/s | 192 GB/s | 2025 | B200/GB200 clusters |
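The effective-bandwidth column can be approximated from the 4× line rate: double it for full-duplex operation, convert bits to bytes, and subtract protocol/encoding overhead. The 96% efficiency factor below is an illustrative assumption chosen to reproduce the table, not an official InfiniBand figure:

```python
def effective_bidir_gbs(rate_4x_gbps: float, efficiency: float = 0.96) -> float:
    """Approximate effective bidirectional bandwidth (GB/s) of a 4x IB port.

    rate_4x_gbps: aggregate 4x line rate in Gb/s (one direction).
    efficiency: assumed fraction remaining after protocol/encoding overhead.
    """
    return rate_4x_gbps * 2 / 8 * efficiency  # x2 bidirectional, /8 bits -> bytes

print(effective_bidir_gbs(200))  # HDR: ~48 GB/s
print(effective_bidir_gbs(400))  # NDR: ~96 GB/s
print(effective_bidir_gbs(800))  # XDR: ~192 GB/s
```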
NVIDIA InfiniBand NICs and Switches
| Product | Generation | Bandwidth | Use Case |
|---|---|---|---|
| ConnectX-6 | HDR 200G | 200–400 Gb/s | A100 clusters |
| ConnectX-7 | NDR 400G | 400–800 Gb/s | H100/H200 clusters |
| ConnectX-8 | XDR 800G | 800–1600 Gb/s | B200/GB200 clusters |
| Quantum-2 (QM9700) | NDR | 51.2 Tb/s switch | Spine/leaf switches |
| Quantum-X800 (Q3400) | XDR | 115.2 Tb/s switch | Next-gen fabric |
PCIe Bandwidth by Generation
| Generation | x16 Bandwidth | Per Lane | Available | GPU Examples |
|---|---|---|---|---|
| PCIe 3.0 | 16 GB/s | 1 GB/s | 2010 | P100, V100 |
| PCIe 4.0 | 32 GB/s | 2 GB/s | 2017 | A100 |
| PCIe 5.0 | 64 GB/s | 4 GB/s | 2021 | H100 PCIe, L40S |
| PCIe 6.0 | 128 GB/s | 8 GB/s | 2024 | B200 |
PCIe bandwidth is far lower than NVLink for multi-GPU communication: PCIe 5.0 x16 provides 64 GB/s versus NVLink 4.0's 900 GB/s, a 14× difference. Prefer NVLink-connected topologies for multi-GPU workloads whenever they are available.
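Since PCIe x16 bandwidth doubles every generation, the table can be reproduced from the Gen 3 baseline; the same arithmetic yields the 14× NVLink gap (function name is illustrative):

```python
# PCIe x16 bandwidth doubles each generation (Gen 3 baseline: 16 GB/s).
def pcie_x16_gbs(gen: int) -> int:
    """Approximate PCIe x16 bandwidth in GB/s for generation gen >= 3."""
    return 16 * 2 ** (gen - 3)

print(pcie_x16_gbs(4))         # Gen 4: 32 GB/s
print(pcie_x16_gbs(6))         # Gen 6: 128 GB/s
print(900 // pcie_x16_gbs(5))  # NVLink 4.0 (900 GB/s) vs PCIe 5.0: 14x
```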
AI Cluster Networking Topologies
| Cluster Size | Intra-Node | Inter-Node Fabric | Topology | Total Bisection BW |
|---|---|---|---|---|
| 8 GPUs (1 node) | NVLink + NVSwitch | N/A | Fully connected | 7.2 TB/s (H100) |
| 64 GPUs (8 nodes) | NVLink + NVSwitch | 400G IB NDR | Fat-tree | ~25.6 TB/s |
| 256 GPUs (32 nodes) | NVLink + NVSwitch | 400G IB NDR | 2-tier fat-tree | ~51.2 TB/s |
| 1,024 GPUs (128 nodes) | NVLink + NVSwitch | 400G IB NDR | 3-tier fat-tree | ~204.8 TB/s |
| 4,096 GPUs (512 nodes) | NVLink + NVSwitch | 800G IB XDR | 3-tier fat-tree | ~819 TB/s |
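The single-node bisection figure falls directly out of the NVLink numbers: 8 H100 GPUs at 900 GB/s each. A minimal check (a back-of-envelope sketch, not a topology model):

```python
# Aggregate NVLink bandwidth inside one 8-GPU H100 node
# (matches the single-node row in the table above).
GPUS_PER_NODE = 8
NVLINK_4_GBS = 900  # bidirectional GB/s per GPU (NVLink 4.0)

aggregate_tbs = GPUS_PER_NODE * NVLINK_4_GBS / 1000
print(aggregate_tbs)  # 7.2 TB/s
```

The multi-node rows depend on NIC count per node and fat-tree oversubscription, so they are approximations rather than a simple product.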
DGX SuperPOD Configurations
| Configuration | GPUs | Nodes | Network Fabric | Compute |
|---|---|---|---|---|
| DGX H100 SuperPOD | 256 | 32 | NDR 400G IB | ~1 EFLOPS FP8 |
| DGX GB200 SuperPOD | 576 | 8× NVL72 racks | NVLink 5.0 + XDR 800G IB | ~11.5 EFLOPS FP4 |
| GB200 NVL72 (1 rack) | 72 Blackwell + 36 Grace | 18 compute trays | NVLink 5.0 + XDR IB | ~1.4 EFLOPS FP4 |
Frequently Asked Questions
What is NVLink 5.0 bandwidth compared to NVLink 4.0?
NVLink 5.0 (B200/GB200) delivers 1,800 GB/s total bidirectional bandwidth per GPU — double NVLink 4.0's 900 GB/s (H100/H200). This is achieved by doubling per-link bandwidth from 50 to 100 GB/s while maintaining 18 links per GPU. NVSwitch 4.0 provides 7.2 TB/s total switch bandwidth vs NVSwitch 3.0's 3.2 TB/s.
What InfiniBand should I use for an H100 cluster?
H100/H200 clusters should use NDR InfiniBand 400G (NVIDIA ConnectX-7 NICs, Quantum-2 switches). NDR provides 96 GB/s effective bidirectional bandwidth per node and is the standard for H100-era clusters. A 256-GPU DGX H100 SuperPOD with NDR provides ~51.2 TB/s total bisection bandwidth.
When should I use InfiniBand vs RoCE for AI clusters?
Use InfiniBand for large-scale training clusters (64+ GPUs) where performance is critical — it provides native RDMA, deterministic latency, and better congestion control. Use RoCE (RDMA over Converged Ethernet) for cost-sensitive deployments or when integrating with existing Ethernet infrastructure. RoCE at 400 GbE is now competitive for H100-era clusters. Intel Gaudi 3 uses RoCE natively via 24× 200GbE ports.
What is the bandwidth hierarchy in an AI cluster?
AI cluster bandwidth hierarchy (fastest to slowest): HBM3e memory (H200): 4.8 TB/s → NVLink 4.0: 900 GB/s (19% of HBM) → PCIe 5.0 x16: 64 GB/s (1.3% of HBM) → NDR InfiniBand: ~50 GB/s (1% of HBM) → 100 GbE: 12.5 GB/s (0.26% of HBM). This hierarchy explains why model parallelism strategies must match communication patterns to available bandwidth.
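The hierarchy above can be tabulated as fractions of HBM bandwidth, which makes the orders-of-magnitude drop at each tier explicit (all figures taken from the text; the dictionary is illustrative):

```python
# Each tier expressed as a fraction of H200 HBM3e bandwidth.
HBM3E_GBS = 4800  # H200 HBM3e, GB/s
TIERS = {
    "NVLink 4.0": 900,
    "PCIe 5.0 x16": 64,
    "NDR InfiniBand (effective, one direction)": 50,
    "100 GbE": 12.5,
}
for name, gbs in TIERS.items():
    print(f"{name}: {gbs} GB/s ({gbs / HBM3E_GBS:.2%} of HBM)")
```

Each step down the hierarchy costs roughly an order of magnitude, which is why tensor parallelism stays inside the NVLink domain while data parallelism tolerates the slower inter-node fabric.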