DeepSeek R1 VRAM Calculator

Calculate VRAM for DeepSeek R1 (671B MoE) and its distilled variants (1.5B–70B). The full R1 requires a large multi-GPU setup; the distilled versions up to 32B run on consumer hardware.

DeepSeek R1: Full Model vs Distilled Variants

DeepSeek R1 is a reasoning-first LLM from DeepSeek AI, trained using reinforcement learning to excel at math, code, and logical reasoning. The full 671B MoE model rivals GPT-4o, but the distilled variants are what most engineers will actually run.

VRAM by Model Size at INT4

Model	Params	VRAM (INT4)	Minimum GPU
R1 full	671B MoE	~335 GB	5× A100 80GB (8× for headroom)
R1-Distill-70B	70B dense	~38 GB	A100 40GB (tight)
R1-Distill-32B	32B dense	~18 GB	RTX 4090 24GB (tight)
R1-Distill-14B	14B dense	~8 GB	RTX 3070 8GB (very tight)
R1-Distill-7B	7B dense	~4.5 GB	Any 6GB+ GPU
R1-Distill-1.5B	1.5B dense	~1 GB	CPU-only feasible
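
The figures in the table are consistent with a simple rule of thumb: weight memory is the parameter count times bits per weight divided by 8, plus some headroom for runtime buffers. The sketch below illustrates that arithmetic. The function name and the flat `overhead` fraction are assumptions made for illustration, not the calculator's actual formula, so results may differ from the table by a few GB.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.0,
                     overhead: float = 0.10) -> float:
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes).

    params_billion:  parameter count in billions (70 for a 70B model).
    bits_per_weight: 16 for FP16, 8 for INT8, 4 for INT4.
    overhead:        assumed extra fraction for runtime buffers (a guess,
                     not a measured value).
    """
    # Billions of parameters times bytes per parameter gives GB of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * (1 + overhead)


models = [("R1 full", 671), ("R1-Distill-70B", 70), ("R1-Distill-32B", 32),
          ("R1-Distill-14B", 14), ("R1-Distill-7B", 7), ("R1-Distill-1.5B", 1.5)]
for name, size in models:
    print(f"{name}: ~{estimate_vram_gb(size):.1f} GB at INT4")
```

Setting `overhead=0` gives the weights-only figure, which is what the ~335 GB entry for the full model corresponds to.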

Why the Distilled Models Are Remarkable

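The distilled variants inherit R1's chain-of-thought reasoning style through knowledge distillation: each is a Qwen or Llama base model fine-tuned on reasoning traces generated by R1. In DeepSeek's reported results, R1-Distill-Qwen-7B outperforms GPT-4o on math benchmarks such as AIME 2024 and MATH-500, and at INT4 it fits on a consumer GPU such as an RTX 3070.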

MoE Memory Note

The full R1 671B is a Mixture of Experts model: only ~37B parameters are active per token, but all 671B parameters must be resident in VRAM because any expert may be selected for the next token. INT4 brings the weight footprint from ~1.3 TB (FP16) to ~335 GB, which still requires serious multi-GPU hardware.
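
The distinction between resident and active parameters can be made concrete with a little arithmetic. This is a minimal sketch using the 671B and ~37B figures above; the constant and function names are illustrative. It shows that MoE reduces how much weight data is touched per token, not how much must be stored.

```python
TOTAL_PARAMS = 671e9    # all experts; must be resident in memory
ACTIVE_PARAMS = 37e9    # approximate parameters used for any one token

BYTES_PER_PARAM = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def footprint_gb(num_params: float, precision: str) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes) at the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    resident = footprint_gb(TOTAL_PARAMS, precision)
    touched = footprint_gb(ACTIVE_PARAMS, precision)
    print(f"{precision}: {resident:.1f} GB resident, "
          f"~{touched:.1f} GB of weights used per token")
```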

Key Terms

VRAM (Video RAM)

Memory on a GPU used to store model weights, activations, and KV cache during LLM inference. VRAM is the primary constraint when running large language models locally.

Quantization

A technique to reduce model memory usage by representing weights in lower precision (for example INT8 or INT4, or 4-bit GGUF formats such as Q4_K_M). Quantization trades a small accuracy loss for a significant VRAM reduction.
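
To show where the accuracy loss comes from, here is a deliberately simplified sketch of symmetric 4-bit quantization with a single scale for the whole tensor. Production formats (GGUF, GPTQ, AWQ) use per-group scales and other refinements, so treat this as an illustration of the idea rather than any library's algorithm. The function names and sample weights are made up.

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to signed 4-bit integers in [-8, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the 4-bit range.
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the 4-bit integers."""
    return [v * scale for v in q]


weights = [0.12, -0.53, 0.91, -0.07, 0.33]   # made-up sample values
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", q)
print("max round-trip error:", max_error)
```

Each weight now needs 4 bits instead of 16, and the round-trip error is bounded by half the scale step.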

Frequently Asked Questions

How much VRAM does DeepSeek R1 671B need?

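The full DeepSeek R1 is a 671B Mixture of Experts model. At INT4 it needs ~335 GB of VRAM for the weights alone, which means at least 5× 80 GB GPUs (A100 or H100). An 8-GPU node is the more practical choice because it leaves headroom for KV cache and activations. In practice, most users run the distilled variants (7B–70B), which offer strong reasoning on far more modest hardware.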
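
The GPU count comes from dividing the footprint by the usable memory per GPU and rounding up. The sketch below assumes that 90% of each GPU is usable for weights, and that a tensor-parallel deployment rounds the count up to a power of two. Both are illustrative assumptions rather than hard rules, and the function names are hypothetical.

```python
import math


def min_gpus(model_gb: float, gpu_gb: float, usable_fraction: float = 0.9) -> int:
    """Smallest GPU count whose usable memory covers model_gb.

    usable_fraction is an assumed share of each GPU left for weights after
    runtime overhead; adjust it for your own stack.
    """
    return math.ceil(model_gb / (gpu_gb * usable_fraction))


def round_up_to_power_of_two(n: int) -> int:
    """Round a positive integer up to the next power of two (assumed TP-friendly size)."""
    return 1 << (n - 1).bit_length()


needed = min_gpus(335, 80)
print("minimum 80 GB GPUs for ~335 GB:", needed)
print("rounded up for tensor parallelism:", round_up_to_power_of_two(needed))
```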

What are the DeepSeek R1 distilled models?

DeepSeek released six smaller dense models fine-tuned on R1-generated reasoning data: R1-Distill-Qwen-1.5B, 7B, 14B, and 32B, plus R1-Distill-Llama-8B and 70B. The 7B distill runs as a GGUF Q4 build on a 6–8 GB GPU. The 70B distill runs at INT4 on an A100 40GB, with little headroom to spare, and its reasoning quality approaches that of the full 671B model on math benchmarks.

Is DeepSeek R1 better than GPT-4 for reasoning?

DeepSeek reports that R1 matches or exceeds GPT-4o on AIME 2024, Codeforces, and MATH-500, at a fraction of the reported training cost. For local deployment of open weights, R1-Distill-70B at INT4 is among the strongest reasoning models that fit on a single GPU renting for under about $2/hr.

How do I run DeepSeek R1 locally?

Use Ollama: `ollama run deepseek-r1:7b` (for the 7B distill, ~4.5 GB VRAM) or `ollama run deepseek-r1:70b` (for the 70B distill at Q4, ~38–40 GB VRAM). The full 671B model needs a multi-GPU cluster; serve it with vLLM using tensor parallelism.
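
Once a model has been pulled, Ollama also serves it over a local HTTP API, which is convenient for scripting. The sketch below assumes Ollama is running on its default port (11434) and that `deepseek-r1:7b` is already available locally. The helper names `build_request` and `ask` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def build_request(prompt: str, model: str = "deepseek-r1:7b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "deepseek-r1:7b") -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(ask("What is 17 * 23? Show your reasoning."))` returns the model's reply, including its reasoning trace.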
