History & Benchmarks

The evolution from high-RAM Xeon setups to high-bandwidth GPU clusters.

Performance Comparison

The move from CPU-based inference to GPU acceleration changed the usability of the lab entirely.

Figure: LLM token throughput, comparing tokens per second across the hardware configurations below.

LLM Performance Evolution

| Year | System          | RAM/VRAM     | Bandwidth | Speed   |
|------|-----------------|--------------|-----------|---------|
| 2024 | Xeon E5-2696v3  | 128 GB ECC   | 8.27 GB/s | 0.2 t/s |
| 2026 | Multi-GPU P104  | 22 GB GDDR5X | 320 GB/s  | 23 t/s  |
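The bandwidth column explains most of the speed column: single-user token generation is typically memory-bound, so the decode-speed ceiling is roughly bandwidth divided by the bytes read per token. A minimal sketch of that estimate, with the weight sizes as illustrative assumptions (not measured values from this lab):

```python
def est_tokens_per_sec(bandwidth_gb_s: float, bytes_per_token_gb: float) -> float:
    """Rough ceiling for memory-bound decoding: each generated token
    streams the active weight set from memory once."""
    return bandwidth_gb_s / bytes_per_token_gb

# Bandwidths from the table above; per-token weight reads are assumptions.
xeon_est = est_tokens_per_sec(8.27, 40.0)  # e.g. a ~40 GB quantized dense model
gpu_est = est_tokens_per_sec(320.0, 14.0)  # e.g. a ~14 GB active weight set
```

Real throughput lands below this ceiling (compute, cache misses, and multi-GPU transfer overhead all subtract from it), but the ratio between the two systems tracks the measured numbers.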

Efficiency Gain

By switching to a modern mixture-of-experts (MoE) model like GLM-4.7-Flash, the system now generates tokens roughly 3x faster than Llama 3.1 70B while fitting within the 22 GB of VRAM.
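The MoE speedup follows from the same bandwidth argument: only the active experts are read per token, so the gain over a dense model scales with the ratio of parameters touched. A sketch, where the parameter counts are hypothetical figures chosen to illustrate a ~3x ratio, not published specs for either model:

```python
def moe_speedup(dense_params_b: float, active_params_b: float) -> float:
    """Bandwidth-bound speedup of a MoE model over a dense model:
    proportional to the ratio of parameters read per generated token."""
    return dense_params_b / active_params_b

# Hypothetical: 70B dense vs. ~24B active parameters per token.
print(moe_speedup(70, 24))  # roughly 2.9x
```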