K8sCalc

GPU Hosting Cost Calculator

Compare GPU cloud rental costs across RunPod, Lambda Labs, and Vast.ai. Calculate monthly spend for LLM inference, fine-tuning, and ML training workloads.

Choosing a GPU Cloud for LLM Inference

GPU cloud pricing varies significantly between providers, GPU models, and pricing tiers. Picking the right combination for your workload can cut inference costs by 40–60%.

Provider Overview

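| Provider    | Strength                            | Weakness               |
|-------------|-------------------------------------|------------------------|
| RunPod      | Always available, large marketplace | Slightly higher prices |
| Lambda Labs | Cheapest A100/H100 on-demand        | Limited availability   |
| Vast.ai     | Cheapest spot prices                | Variable reliability   |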

On-Demand vs Reserved vs Spot

  • On-demand: Full price, always available — best for inference serving
  • Reserved (Lambda): 1-month commitment, 20–40% discount — best for 24/7 workloads
  • Spot/Interruptible: Lowest price, can be interrupted — best for training with checkpoints
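
As a rough illustration of how these tiers compare, the sketch below applies the discount ranges quoted on this page (20–40% for reserved, and the 30–60% spot range from the FAQ) to an on-demand rate. The $2.00/hr rate and the function names are assumptions made up for the example, not a provider's actual price or API.

```python
def discounted_rate(on_demand_per_hr: float, discount: float) -> float:
    """Hourly rate after a fractional discount (0.25 means 25% off)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return on_demand_per_hr * (1.0 - discount)


def rate_range(on_demand_per_hr: float, low: float, high: float) -> tuple[float, float]:
    """Return (cheapest, most expensive) hourly rate for a discount range."""
    return (discounted_rate(on_demand_per_hr, high),
            discounted_rate(on_demand_per_hr, low))


ON_DEMAND = 2.00  # hypothetical on-demand $/hr, for illustration only

reserved = rate_range(ON_DEMAND, 0.20, 0.40)  # reserved: 20-40% off
spot = rate_range(ON_DEMAND, 0.30, 0.60)      # spot: 30-60% off

print(f"on-demand: ${ON_DEMAND:.2f}/hr")
print(f"reserved:  ${reserved[0]:.2f} to ${reserved[1]:.2f}/hr")
print(f"spot:      ${spot[0]:.2f} to ${spot[1]:.2f}/hr")
```

The discount is only part of the picture: spot savings have to be weighed against the cost of interrupted work, which is why checkpointing matters for that tier.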

Right-Sizing for LLM Inference

Match VRAM to your model + quantization:

  • 7B FP16 → 14 GB → RTX 4090 (24 GB)
  • 13B INT4 → 6.5 GB → RTX 4090 (24 GB)
  • 70B INT4 → 35 GB → A100 40 GB
  • 70B FP16 → 140 GB → 2× A100 80 GB
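
The figures above follow from parameter count times bytes per parameter. The sketch below reproduces them; it counts model weights only, so KV cache and activation memory come on top, and the helper names are hypothetical.

```python
import math

# Bits used per parameter at each precision listed above.
BITS_PER_PARAM = {"FP16": 16, "INT4": 4}


def weight_vram_gb(params_billions: float, precision: str) -> float:
    """VRAM needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * BITS_PER_PARAM[precision] / 8


def gpus_needed(required_gb: float, gpu_vram_gb: float) -> int:
    """Number of GPUs of a given size needed to hold the weights."""
    return math.ceil(required_gb / gpu_vram_gb)


for params, precision in [(7, "FP16"), (13, "INT4"), (70, "INT4"), (70, "FP16")]:
    print(f"{params}B {precision}: {weight_vram_gb(params, precision):g} GB")

# 70B at FP16 does not fit on one 80 GB card.
print(gpus_needed(weight_vram_gb(70, "FP16"), 80), "x A100 80 GB")
```

Because the estimate excludes runtime memory, a model whose weights nearly fill a card (such as 35 GB on a 40 GB GPU) leaves little headroom for long contexts or large batches.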

Cost Optimization Tips

  • Use INT4 quantization (for example, 4-bit GGUF models) for inference: minimal accuracy loss and roughly 4× less VRAM than FP16
  • Batch requests to keep GPU utilization >70%
  • Use spot instances for batch inference jobs with a queue + retry logic
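
The queue-plus-retry idea for spot batch jobs can be sketched as follows. This is a minimal in-process illustration: the `Interrupted` exception, `run_with_retries`, and the `flaky_job` stand-in are all hypothetical names, and a real setup would more likely use a durable queue that survives the instance being reclaimed.

```python
from collections import deque


class Interrupted(Exception):
    """Hypothetical signal that the spot instance was reclaimed mid-job."""


def run_with_retries(jobs, run_job, max_attempts=3):
    """Process jobs from a queue, re-queueing any that get interrupted."""
    queue = deque((job, 1) for job in jobs)
    done, failed = [], []
    while queue:
        job, attempt = queue.popleft()
        try:
            done.append(run_job(job))
        except Interrupted:
            if attempt < max_attempts:
                queue.append((job, attempt + 1))  # retry later
            else:
                failed.append(job)  # give up after max_attempts
    return done, failed


# Stand-in for real inference work: "batch-2" is interrupted once.
_interrupted_once = set()


def flaky_job(job):
    if job == "batch-2" and job not in _interrupted_once:
        _interrupted_once.add(job)
        raise Interrupted(job)
    return f"{job}: ok"


done, failed = run_with_retries(["batch-1", "batch-2", "batch-3"], flaky_job)
print(done, failed)
```

Interrupted jobs go to the back of the queue, so one reclaimed instance does not block the remaining work.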

Frequently Asked Questions

Which GPU cloud is cheapest for running Llama 3 70B?

For Llama 3 70B at INT4, an A100 40 GB is sufficient. Lambda Labs typically offers the lowest on-demand A100 rate (~$1.29/hr) when available. Vast.ai spot instances can be 30–50% cheaper but are interruptible.

Is RunPod or Lambda Labs better for LLM inference?

Lambda Labs is often cheaper for A100s but has limited availability. RunPod has a larger GPU marketplace and on-demand availability. For production inference, Lambda Labs Reserved Instances offer the best $/hr. For dev and experimentation, RunPod is more flexible.

What is the difference between on-demand and spot GPU instances?

On-demand instances are always available and never interrupted. Spot instances (called 'interruptible' on RunPod, 'spot' on Vast.ai) can be 30–60% cheaper but may be reclaimed by the provider on short notice. Use spot for training jobs with checkpointing and on-demand for inference serving.

How much does it cost to run a 70B LLM 24/7 on A100?

An A100 80 GB running 24/7 at RunPod costs ~$1,793/month (720 hours at $2.49/hr). Two A100 40 GB instances for 70B INT4 would cost ~$1,361/month at Lambda Labs pricing. A dedicated bare-metal A100 server is cheaper at scale, typically $800–1,200/month.
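
The RunPod figure above can be reproduced with a one-line calculation, assuming a 720-hour month (30 days × 24 hours). The same sketch estimates how many hours of use per month it takes before a flat-rate dedicated server becomes cheaper than hourly rental. The function names are hypothetical.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours


def monthly_cost(rate_per_hr: float, gpus: int = 1, hours: float = HOURS_PER_MONTH) -> float:
    """Monthly spend for `gpus` instances billed hourly."""
    return rate_per_hr * gpus * hours


def break_even_hours(flat_monthly_price: float, rate_per_hr: float) -> float:
    """Hours per month above which a flat-rate server beats hourly billing."""
    return flat_monthly_price / rate_per_hr


runpod = monthly_cost(2.49)
print(f"A100 80 GB 24/7 at $2.49/hr: ${runpod:,.0f}/month")

# Dedicated server price range quoted above, compared with the hourly rate.
for flat in (800, 1200):
    hours = break_even_hours(flat, 2.49)
    print(f"${flat}/month dedicated breaks even at about {hours:.0f} hours")
```

At 24/7 usage (720 hours) both ends of the dedicated price range come out cheaper than the hourly rate, which matches the claim that dedicated hardware wins at scale.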
