K8sCalc

Qwen 2.5 72B VRAM Calculator

Estimate the GPU VRAM needed to run Qwen 2.5 72B. At FP16 the weights require ~144 GB; INT4 brings that down to ~36 GB, which fits on a single A100 80 GB.

Qwen 2.5 72B: GPU Requirements

Qwen 2.5 72B is Alibaba's flagship open-weight model with strong multilingual and code capabilities. Its 72B parameters place it in the same VRAM tier as Llama 3 70B.

VRAM by Quantization

| Quantization | VRAM needed | Minimum GPU |
|---|---|---|
| FP16 | ~144 GB | 2× A100 80 GB |
| INT8 | ~72 GB | A100 80 GB (tight) |
| INT4 / GGUF Q4 | ~36 GB | A100 80 GB |
| GGUF Q4_K_M | ~39 GB | A100 80 GB |
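
The FP16, INT8, and INT4 rows follow from parameter count times bytes per parameter. The sketch below reproduces that arithmetic, assuming a nominal 72 billion parameters and decimal gigabytes (1 GB = 10^9 bytes). `weight_vram_gb` is a hypothetical helper name, and the result covers weights only, not KV cache or runtime overhead.

```python
def weight_vram_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate VRAM for model weights alone, in decimal GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: nominal parameter count taken from the model name (72B).
QWEN_72B_PARAMS = 72e9

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: ~{weight_vram_gb(QWEN_72B_PARAMS, bits):.0f} GB")
```

GGUF Q4_K_M comes out slightly above the plain INT4 figure because that format keeps some tensors at higher precision and stores per-block scaling data.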

Context Length Impact

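Qwen 2.5 72B supports up to 128K tokens. KV cache memory grows linearly with sequence length and batch size, so it becomes significant at long contexts:

KV cache ≈ 2 × num_layers × num_kv_heads × head_dim × seq_len × batch × 2 bytes

The leading 2 covers keys and values; the trailing 2 bytes is the FP16 element size. Qwen 2.5 72B uses grouped-query attention, so the head count here is the number of KV heads, not the number of query heads. At 32K context with an FP16 cache and batch size 1, add ~8–12 GB to the base VRAM requirement.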
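
To make the KV cache estimate concrete, here is a minimal sketch of the formula in Python. The architecture values (80 layers, 8 KV heads under grouped-query attention, head dimension 128) are assumptions taken from the published Qwen 2.5 72B configuration rather than from this page, and `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """KV cache size in bytes; the leading 2 accounts for keys and values."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Assumed Qwen 2.5 72B architecture values (not stated in the text above).
NUM_LAYERS = 80
NUM_KV_HEADS = 8   # KV heads under grouped-query attention, not query heads
HEAD_DIM = 128

for seq_len in (8_192, 32_768, 131_072):
    gib = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, seq_len) / 2**30
    print(f"{seq_len:>7} tokens: {gib:.1f} GiB per sequence")
```

Under these assumptions a single 32K-token sequence needs about 10 GiB, in line with the ~8–12 GB figure, and a full 128K-token sequence needs about 40 GiB.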

Running on Kubernetes

```yaml
resources:
  limits:
    nvidia.com/gpu: 2    # 2× A100 80 GB for FP16
  requests:
    nvidia.com/gpu: 2
```

Use vLLM with tensor parallelism for multi-GPU:

```bash
vllm serve Qwen/Qwen2.5-72B-Instruct --tensor-parallel-size 2
```
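
One way to choose the GPU count and the matching `--tensor-parallel-size` is sketched below. It assumes the model has 64 attention heads and that the tensor-parallel size must divide that head count evenly; both are assumptions not stated on this page. `min_tensor_parallel_size` is a hypothetical helper, and the estimate ignores activation memory and framework overhead.

```python
import math


def min_tensor_parallel_size(required_gb: float, gpu_gb: float,
                             num_attention_heads: int = 64) -> int:
    """Smallest GPU count whose combined memory covers required_gb and
    which divides the attention head count evenly."""
    lower_bound = max(1, math.ceil(required_gb / gpu_gb))
    for candidate in range(lower_bound, num_attention_heads + 1):
        if num_attention_heads % candidate == 0:
            return candidate
    raise ValueError("no valid tensor-parallel size for this GPU memory")


# FP16 weights (~144 GB) plus ~10 GB of KV cache at 32K context.
fp16_total_gb = 144 + 10
for gpu_gb in (80, 40, 24):
    tp = min_tensor_parallel_size(fp16_total_gb, gpu_gb)
    print(f"{gpu_gb} GB GPUs: tensor-parallel size {tp}")
```

With 24 GB cards the memory requirement alone suggests 7 GPUs, but 7 does not divide 64, so the sketch rounds up to 8.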

Frequently Asked Questions

What GPU can run Qwen 2.5 72B at full precision (FP16)?

FP16 requires ~144 GB VRAM for the weights alone. You need 2× A100 80 GB (NVLink), 4× A100 40 GB, or 2× H100 80 GB. For cloud inference, 2× A100 80 GB on RunPod costs ~$5–6/hr.

Can I run Qwen 2.5 72B on a single GPU?

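Yes, with INT4 quantization. At INT4 (GGUF Q4_K_M), Qwen 2.5 72B requires ~36–40 GB VRAM, so it fits on a single A100 80 GB or H100 SXM. A single A100 40 GB is too small at INT4 because the weights alone fill the card and leave no room for the KV cache; use 2× A100 40 GB instead.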

How does Qwen 2.5 72B compare to Llama 3 70B in VRAM?

Nearly identical. Qwen 2.5 72B has ~72B parameters and Llama 3 70B has ~70B, so their weight memory differs by only a few percent at any given quantization. Qwen 2.5 72B supports up to 128K context length, so its KV cache can be significantly larger if you use long contexts; factor in additional VRAM for context lengths above 8K.

What inference framework works best with Qwen 2.5?

vLLM has native Qwen 2.5 support and is recommended for production inference. Ollama supports GGUF-quantized Qwen 2.5 for local use. Both support the 72B model with INT4 on a single A100 80GB.
