History & Benchmarks
The evolution from high-RAM Xeon setups to high-bandwidth GPU clusters.
Performance Comparison
The move from CPU-based inference to GPU acceleration changed the usability of the lab entirely: decoding went from 0.2 tokens per second, too slow for interactive use, to 23 tokens per second.
Comparison of tokens per second across different hardware configurations.
LLM Performance Evolution
| Year | System | RAM/VRAM | Memory bandwidth | Decode speed |
|---|---|---|---|---|
| 2024 | Xeon E5-2696v3 | 128 GB ECC | 8.27 GB/s | 0.2 tok/s |
| 2026 | Multi-GPU P104 | 22 GB GDDR5X | 320 GB/s | 23 tok/s |
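The table's figures imply a throughput gain of over two orders of magnitude, well beyond the raw bandwidth ratio. A quick sketch, using only the values copied from the table above, makes the ratios explicit:

```python
# Figures copied from the comparison table above.
cpu = {"name": "Xeon E5-2696v3", "bandwidth_gbs": 8.27, "tokens_per_s": 0.2}
gpu = {"name": "Multi-GPU P104", "bandwidth_gbs": 320.0, "tokens_per_s": 23.0}

# End-to-end throughput gain vs. raw memory-bandwidth gain.
speedup = gpu["tokens_per_s"] / cpu["tokens_per_s"]
bw_ratio = gpu["bandwidth_gbs"] / cpu["bandwidth_gbs"]

print(f"Throughput speedup: {speedup:.0f}x")   # 115x
print(f"Bandwidth ratio:    {bw_ratio:.1f}x")  # 38.7x
```

The gap between the two ratios suggests the CPU setup was not purely bandwidth-bound; the GPU cluster also benefits from parallel compute and a smaller active working set.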
Efficiency Gain
By switching to a modern MoE model such as GLM-4.7-Flash, the system now decodes roughly 3x faster than dense Llama 3.1 70B while still fitting within the 22 GB of VRAM.
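The MoE advantage follows from a common back-of-envelope model: single-stream decoding is roughly memory-bandwidth-bound, so tokens per second scale with bandwidth divided by the bytes of weights read per token, and an MoE model only reads its active experts. The sketch below uses the 320 GB/s figure from the table; the active-parameter count and 4-bit quantization are illustrative assumptions, not published specs for any named model:

```python
# Back-of-envelope: single-stream decoding is roughly memory-bandwidth-bound,
# so tokens/s ~ bandwidth / bytes of weights read per token.
# Active-parameter counts below are ILLUSTRATIVE ASSUMPTIONS, not real specs.

BANDWIDTH_GBS = 320.0  # aggregate GDDR5X bandwidth from the table above

def est_tokens_per_s(active_params_b: float, bytes_per_param: float = 0.5) -> float:
    """Estimate decode speed for a model reading `active_params_b` billion
    parameters per token at ~4-bit quantization (0.5 bytes/param)."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return BANDWIDTH_GBS * 1e9 / bytes_per_token

dense_70b = est_tokens_per_s(70.0)  # dense: every weight is read each token
moe_active = est_tokens_per_s(12.0)  # MoE: only active experts (assumed ~12B)

print(f"dense 70B estimate:   {dense_70b:.1f} tok/s")
print(f"MoE ~12B active est.: {moe_active:.1f} tok/s")
```

Under this model the speedup is simply the ratio of active parameters read per token, which is why a sparse MoE can be several times faster than a dense 70B model on the same hardware.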