LLM Memory Bottleneck Simulator

Understand why massive context windows crash local AI models.

Configuration

Mac Unified Memory (RAM) 48 GB

16 GB 64 GB 128 GB

Model Size (Parameters) 35 Billion

Simulating 4-bit quantization (e.g., Q4_K_M).

Context Window (Tokens) 32,000 Tokens

4k 128k 256k

Model Weights (Static)

The model's learned parameters. They must be loaded into memory in full before inference can run. At 4-bit quantization, this requires roughly 0.62 GB per 1 billion parameters.
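As a rough illustration of the arithmetic, the sketch below multiplies the parameter count by the 0.62 GB per billion figure quoted above. The function name and constant name are made up for this example and are not taken from the simulator itself.

```python
# Figure quoted in the text for 4-bit quantization (e.g., Q4_K_M).
GB_PER_BILLION_PARAMS_Q4 = 0.62

def weights_gb(params_billion, gb_per_billion=GB_PER_BILLION_PARAMS_Q4):
    """Approximate memory needed for the quantized weights, in GB."""
    return params_billion * gb_per_billion

# The 35-billion-parameter setting shown in the configuration.
print(f"{weights_gb(35):.1f} GB")  # 21.7 GB
```

This cost is fixed for a given model and quantization level: it does not change with the context window.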

KV-Cache (Dynamic)

The model's active working memory. Every input token and every generated output token adds an entry to this cache, so it grows linearly with the length of the conversation.
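The simulator's own formula is not shown here, so the sketch below uses the commonly cited estimate for a transformer KV-cache: two vectors (key and value) per layer, per KV head, per token. The architecture values (64 layers, 8 KV heads, head dimension 128, 16-bit cache entries) are hypothetical and chosen only for illustration; real models differ.

```python
def kv_cache_gb(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size in GB (1 GB = 1e9 bytes)."""
    # Factor 2: one key vector and one value vector per layer, per KV head.
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return n_tokens * bytes_per_token / 1e9

# Hypothetical architecture, used only as an example.
size = kv_cache_gb(n_tokens=32_000, n_layers=64, n_kv_heads=8, head_dim=128)
print(f"{size:.2f} GB")  # 8.39 GB
```

Because the per-token cost is constant, doubling the context window doubles the cache, which is why long contexts can outgrow the memory left over after the weights are loaded.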

Memory Distribution

macOS VRAM Limit (~75%)

OS & Apps

Weights

KV-Cache

Swap Memory (Extremely Slow)
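One way to read the chart legend is as a simple budget check: weights plus KV-cache either fit under the roughly 75% of unified memory that macOS makes available to the GPU, or the excess spills into swap. The sketch below assumes exactly that rule; the function name and the returned field names are hypothetical, and the example figures for weights and cache are illustrative inputs rather than simulator output.

```python
def memory_budget(ram_gb, weights_gb, kv_cache_gb, gpu_fraction=0.75):
    """Split model memory into the part that fits in VRAM and the part that swaps."""
    vram_limit = ram_gb * gpu_fraction      # approximate macOS GPU memory cap
    used = weights_gb + kv_cache_gb         # static weights plus dynamic cache
    swap = max(0.0, used - vram_limit)      # anything over the cap goes to swap
    return {
        "vram_limit_gb": vram_limit,
        "used_gb": used,
        "swap_gb": swap,
        "fits": swap == 0.0,
    }

# Same model footprint on a 48 GB machine and on a 16 GB machine.
for ram in (48, 16):
    result = memory_budget(ram_gb=ram, weights_gb=21.7, kv_cache_gb=8.39)
    print(f"{ram} GB RAM: limit {result['vram_limit_gb']:.1f} GB, "
          f"swap {result['swap_gb']:.2f} GB, fits={result['fits']}")
```

With these inputs the 48 GB configuration stays under its limit, while the 16 GB configuration pushes most of the model into swap.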

VRAM Available

-- GB

Model Weights

-- GB

KV-Cache

-- GB

Performance

Optimal