How Much Did It Cost to Train Major AI Models?

DeepSeek V3 (~$5.5M) cost roughly a tenth as much as GPT-4 (~$63M), a reduction driven by its mixture-of-experts architecture and an optimized training pipeline. LLaMA 3 70B cost ~$7.7M on 16,000 H100s. Gemini Ultra (~$200M) remains the most expensive training run reported to date.
| Model | Org | Year | GPUs | GPU-Hours | Compute Cost (Est.) |
|---|---|---|---|---|---|
| GPT-3 175B | OpenAI | 2020 | 10K V100s | ~3.1M | ~$4.6M |
| LLaMA 2 70B | Meta | 2023 | 2K A100s | ~1.7M | ~$2.1M |
| Mistral 7B | Mistral | 2023 | 512 H100s | ~200K | ~$240K |
| Mixtral 8×7B | Mistral | 2023 | 512 H100s | ~600K | ~$720K |
| Falcon 180B | TII | 2023 | 4K A100s | ~7.5M | ~$9M |
| GPT-4 (est.) | OpenAI | 2023 | ~25K A100s | ~50M | ~$63M |
| LLaMA 3 70B | Meta | 2024 | 16K H100s | ~6.4M | ~$7.7M |
| LLaMA 3 405B | Meta | 2024 | 16K H100s | ~30.8M | ~$37M |
| Claude 3 Opus (est.) | Anthropic | 2024 | ~30K H100s | ~60M | ~$72M |
| DeepSeek V3 | DeepSeek | 2024 | 2K H800s | ~2.7M | ~$5.5M |
| DeepSeek R1 | DeepSeek | 2025 | 2K H800s | ~2.7M | ~$5.5M |
| Gemini Ultra (est.) | Google | 2023 | ~32K TPUv4 | ~80M | ~$200M |

Note: Costs are estimates based on public disclosures and compute estimates. Actual costs vary based on negotiated contracts, energy, and cluster efficiency.

How Do You Calculate AI Training Cost?

Use the standard 6 × parameters × tokens compute approximation (popularized by the Chinchilla scaling-law work) to estimate training compute, then convert to GPU-hours and cost:

Total Cost = GPU_Hours × Hourly_GPU_Rate

GPU_Hours = (6 × Parameters × Tokens) / (GPU_FLOPS × MFU × 3600)

Where:
  Parameters = model parameter count
  Tokens     = training tokens
  GPU_FLOPS  = GPU peak FLOPS (BF16)
  MFU        = Model FLOP Utilization (typically 0.35–0.55)
  3600       = seconds per hour

Example: 7B model on 1T tokens with H100s (989 TFLOPS, MFU=0.45):
  GPU_Hours = (6 × 7e9 × 1e12) / (989e12 × 0.45 × 3600) ≈ 26,214
  With 256 H100s: 26,214 / 256 ≈ 102 hours (~4.3 days)
  Cost at $98.32/hr per 8-GPU AWS P5 instance (256 H100s = 32 instances): 32 × 102 × $98.32 ≈ $322K
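The formula and the worked example above can be wrapped in a short Python helper. The function name is illustrative, and the per-GPU-hour rate is an assumption: $12.29, i.e. the per-GPU share of an 8-GPU AWS p5.48xlarge instance at $98.32/hr.

```python
def training_cost(params, tokens, gpu_flops, mfu, num_gpus, rate_per_gpu_hour):
    """Estimate pretraining GPU-hours, wall-clock time, and compute cost
    using the 6 * N * D FLOPs approximation from the formula above."""
    total_flops = 6 * params * tokens                    # forward + backward
    gpu_hours = total_flops / (gpu_flops * mfu * 3600)   # seconds -> hours
    wall_clock_hours = gpu_hours / num_gpus
    cost = gpu_hours * rate_per_gpu_hour
    return gpu_hours, wall_clock_hours, cost

# Worked example: 7B model, 1T tokens, H100s at 989 TFLOPS BF16 peak,
# MFU 0.45, 256 GPUs, $98.32 per 8-GPU instance-hour (assumed rate).
gpu_hours, hours, cost = training_cost(
    params=7e9, tokens=1e12,
    gpu_flops=989e12, mfu=0.45,
    num_gpus=256, rate_per_gpu_hour=98.32 / 8,
)
print(f"{gpu_hours:,.0f} GPU-hours, {hours:.0f} h wall clock, ${cost:,.0f}")
# -> 26,214 GPU-hours, 102 h wall clock, $322,175
```

Swapping in a different GPU is just a matter of changing `gpu_flops` and `mfu` using the reference table below.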

Reference GPU FLOPS (BF16) and Typical MFU

| GPU | BF16 TFLOPS | Typical MFU |
|---|---|---|
| NVIDIA H100 SXM | 989 | 0.45–0.55 |
| NVIDIA H200 SXM | 989 | 0.45–0.55 |
| NVIDIA B200 SXM | 2,250 | 0.50–0.60 |
| NVIDIA A100 80GB SXM | 312 | 0.35–0.45 |
| AMD MI300X | 1,307 | 0.40–0.50 |
| Intel Gaudi 3 | 1,835 | 0.35–0.45 |
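The reference table can be used to compare GPU-hour requirements across accelerators for the same run. A minimal sketch, taking peak FLOPS from the table and assuming a mid-range MFU for each device:

```python
# BF16 peak FLOPS and an assumed mid-range MFU, per the table above.
GPUS = {
    "H100 SXM":      (989e12, 0.50),
    "B200 SXM":      (2250e12, 0.55),
    "A100 80GB SXM": (312e12, 0.40),
    "MI300X":        (1307e12, 0.45),
    "Gaudi 3":       (1835e12, 0.40),
}

def gpu_hours(params, tokens, peak_flops, mfu):
    """GPU-hours for a 6 * N * D training run at the given peak and MFU."""
    return 6 * params * tokens / (peak_flops * mfu * 3600)

# The same 7B-model, 1T-token run as the worked example, per accelerator.
for name, (peak, mfu) in GPUS.items():
    print(f"{name:>14}: {gpu_hours(7e9, 1e12, peak, mfu):>9,.0f} GPU-hours")
```

As expected, the A100 needs roughly 4× the GPU-hours of an H100 for the same run, and the B200 cuts the H100 figure by more than half.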

How Much Does Fine-Tuning an LLM Cost?

Fine-tuning costs vary enormously by method. QLoRA on a 70B model can cost as little as $196 for a small dataset, while full PPO RLHF training costs $990K+.

Instruction Fine-Tuning Costs (AWS P5 H100 rates)

| Base Model | Dataset | Method | GPUs | Duration | Cost |
|---|---|---|---|---|---|
| LLaMA 3 8B | 50K examples | Full FT | 8× H100 | ~3 hrs | ~$300 |
| LLaMA 3 8B | 1M examples | Full FT | 8× H100 | ~2 days | ~$4.7K |
| LLaMA 3 70B | 50K examples | Full FT | 64× H100 | ~4 hrs | ~$3.1K |
| LLaMA 3 70B | 1M examples | Full FT | 64× H100 | ~3 days | ~$57K |
| LLaMA 3 405B | 50K examples | Full FT | 256× H100 | ~8 hrs | ~$25K |

(Costs use the per-GPU share of the AWS P5 rate, ~$12.29/GPU-hour, i.e. $98.32/hr per 8-GPU p5.48xlarge instance.)

QLoRA (Parameter-Efficient) Fine-Tuning Costs

| Base Model | Dataset | Method | GPUs | Duration | Cost |
|---|---|---|---|---|---|
| LLaMA 3 8B | 50K examples | QLoRA | 1× A100 40GB | ~6 hrs | ~$9.70 |
| LLaMA 3 8B | 1M examples | QLoRA | 2× A100 80GB | ~18 hrs | ~$58 |
| LLaMA 3 70B | 50K examples | QLoRA | 2× H100 80GB | ~8 hrs | ~$196 |
| LLaMA 3 70B | 1M examples | QLoRA | 4× H100 80GB | ~3 days | ~$3.5K |
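Unlike pretraining, fine-tuning cost is usually quoted directly as GPU count × wall-clock hours × hourly rate. A sketch reproducing the 70B QLoRA rows, assuming the ~$12.29/GPU-hour H100 rate (the per-GPU share of an 8-GPU AWS p5.48xlarge at $98.32/hr):

```python
def finetune_cost(num_gpus, hours, rate_per_gpu_hour):
    """Fine-tuning cost: GPUs x wall-clock hours x hourly per-GPU rate."""
    return num_gpus * hours * rate_per_gpu_hour

# 70B QLoRA rows from the table above at the assumed $12.29/GPU-hour rate.
print(f"70B / 50K examples: ${finetune_cost(2, 8, 12.29):,.0f}")   # -> $197
print(f"70B / 1M examples:  ${finetune_cost(4, 72, 12.29):,.0f}")  # -> $3,540
```

The same function covers full fine-tuning; only the GPU count and duration change.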

Frequently Asked Questions

How much did it cost to train LLaMA 3 70B?

Meta trained LLaMA 3 70B using approximately 16,000 H100 GPUs for ~6.4 million GPU-hours, with an estimated compute cost of ~$7.7 million. This was trained on a 15T token dataset. A fine-tuning run using QLoRA (4× H100, 3 days, 1M examples) costs approximately $3,500.

How much did DeepSeek V3 cost to train compared to GPT-4?

DeepSeek V3 training is estimated at ~$5.5 million (2,000 H800 GPUs, ~2.7M GPU-hours). GPT-4 is estimated at ~$63 million (~25,000 A100 GPUs, ~50M GPU-hours). DeepSeek achieved ~10x cost reduction through mixture-of-experts architecture (671B total params, ~37B active), load-balanced routing, and FP8 mixed-precision training.

What percentage of AI training cost is GPU compute?

GPU compute is the largest cost driver at 60–75% of total training cost. Other costs: data collection and cleaning 10–20% (often underestimated), engineering labor 5–15%, storage and networking 2–5%, evaluation and iteration 5–10%, and energy (on-prem) 5–10% (~$0.06–$0.12/kWh).
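The share breakdown above can be used to gross a compute-only estimate up to an all-in project budget. A minimal sketch, where the function name and the default 67.5% share (midpoint of the 60–75% range) are assumptions:

```python
def total_project_cost(compute_cost, compute_share=0.675):
    """Gross up GPU compute cost to an all-in estimate, assuming compute
    is a given share (default: midpoint of the 60-75% range) of total."""
    return compute_cost / compute_share

# If GPU compute came to $5.5M and is ~67.5% of the budget:
print(f"${total_project_cost(5.5e6):,.0f}")  # -> roughly $8.1M all-in
```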

How have AI training costs changed over time?

H100 training cost per token has roughly halved from Q1 2023 to Q1 2026: from ~$5,200 per billion tokens at peak scarcity pricing (~$40+/hr per GPU) to ~$2,700 per billion tokens at Q1 2026 rates (~$22–25/hr effective). B200/GB200 deployments are projected to cut training cost per token by a further 3–5× thanks to higher FLOPS density and larger memory capacity.