History & Benchmarks
The evolution from high-RAM Xeon setups to high-bandwidth GPU clusters.
Performance Comparison
The move from CPU-based inference to GPU acceleration changed the usability of the lab entirely: decoding went from 0.2 tokens per second, too slow for interactive use, to 23 tokens per second.
Comparison of tokens per second across different hardware configurations.
LLM Performance Evolution
| Year | System | RAM/VRAM | Memory bandwidth | Decode speed |
|---|---|---|---|---|
| 2024 | Xeon E5-2696v3 | 128 GB ECC | 8.27 GB/s | 0.2 tok/s |
| 2026 | Multi-GPU P104 | 22 GB GDDR5X | 320 GB/s | 23 tok/s |
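The table's figures imply a throughput gain of over two orders of magnitude, well beyond the raw bandwidth ratio. A quick sketch, using only the values copied from the table above, makes the ratios explicit:

```python
# Figures copied from the comparison table above.
cpu = {"name": "Xeon E5-2696v3", "bandwidth_gbs": 8.27, "tokens_per_s": 0.2}
gpu = {"name": "Multi-GPU P104", "bandwidth_gbs": 320.0, "tokens_per_s": 23.0}

# End-to-end throughput gain vs. raw memory-bandwidth gain.
speedup = gpu["tokens_per_s"] / cpu["tokens_per_s"]
bw_ratio = gpu["bandwidth_gbs"] / cpu["bandwidth_gbs"]

print(f"Throughput speedup: {speedup:.0f}x")   # 115x
print(f"Bandwidth ratio:    {bw_ratio:.1f}x")  # 38.7x
```

The gap between the two ratios suggests the CPU setup was not purely bandwidth-bound; the GPU cluster also benefits from parallel compute and a smaller active working set.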
Efficiency Gain
By switching to a modern MoE model such as GLM-4.7-Flash, the system now decodes roughly 3x faster than dense Llama 3.1 70B while still fitting within the 22 GB of VRAM.
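The MoE advantage follows from a common back-of-envelope model: single-stream decoding is roughly memory-bandwidth-bound, so tokens per second scale with bandwidth divided by the bytes of weights read per token, and an MoE model only reads its active experts. The sketch below uses the 320 GB/s figure from the table; the active-parameter count and 4-bit quantization are illustrative assumptions, not published specs for any named model:

```python
# Back-of-envelope: single-stream decoding is roughly memory-bandwidth-bound,
# so tokens/s ~ bandwidth / bytes of weights read per token.
# Active-parameter counts below are ILLUSTRATIVE ASSUMPTIONS, not real specs.

BANDWIDTH_GBS = 320.0  # aggregate GDDR5X bandwidth from the table above

def est_tokens_per_s(active_params_b: float, bytes_per_param: float = 0.5) -> float:
    """Estimate decode speed for a model reading `active_params_b` billion
    parameters per token at ~4-bit quantization (0.5 bytes/param)."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return BANDWIDTH_GBS * 1e9 / bytes_per_token

dense_70b = est_tokens_per_s(70.0)  # dense: every weight is read each token
moe_active = est_tokens_per_s(12.0)  # MoE: only active experts (assumed ~12B)

print(f"dense 70B estimate:   {dense_70b:.1f} tok/s")
print(f"MoE ~12B active est.: {moe_active:.1f} tok/s")
```

Under this model the speedup is simply the ratio of active parameters read per token, which is why a sparse MoE can be several times faster than a dense 70B model on the same hardware.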