Skip to the content.

Linpack benchmark in DP (double precision, 64bit)

When the initial linpack was released in 1979 minicomputers like the PDP-11 (from 1970) had just moved from 16bit to 32bit with superminicomputers like the VAX-11 (1978). The supercomputer Cray-1 uses 64bit for data since 1975. By the time the TOP500 of fastest supercomputers was created in 1993 most supercomputers were using 64bit. For comparison this value is been used ever since.

64bit for normal citizens took a little longer. In 2003 AMD starts shipping the Athlon 64 processor lines with the first x86-based 64-bit processor architecture. For smartphones to move to 64bit it took until 2013 with the iPhone 5S.

Test with the Phoronix Test Suite

It can be installed in WSL or Ubuntu with:

sudo apt install php php-cli php-xml
git clone https://github.com/phoronix-test-suite/phoronix-test-suite/
cd phoronix-test-suite
sudo ./install-sh
phoronix-test-suite benchmark hpl

Speed comparison results

CPU MHz FLOPS
ATmega328P 16 94,300
i7-6820HQ 3,600 99,997,800,000
i3-10100 4,038 132,278,500,000

Downloaded from https://www.techpowerup.com/download/linpack-xtreme/.

List from 2017

Platform CPU/MCU Architecture MFlops DMIPS MHz RAM kB
Arduino Uno R3 ATmega328P AVR 8bit RISC 0.0943 10 16 2
Embedded Pi STM32F103RB ARM Cortex-M3 (ARMv7-M) 32bit 0.552 92 72 20
Node MCU 1.0 ESP8266 Tensilica Xtensa LX106 32bit 1.207 113 80 64
Node MCU32 ESP32s Tensilica Xtensa LX106 32bit 2.805 176 160 520
NUCLEO F746ZG STM32F746Z ARM Cortex-M7 (ARMv7E-M) 32bit 3.588 763 216 320
Raspberry Pi 1B BCM2835 ARM1176 (v6) 32bit 42 875 700 512000
Raspberry Pi 2 BCM2836 ARM Cortex-A7 (v7-A) 32bit 170.92 2019 900 1024000
Raspberry Pi 3 BCM2837 ARM Cortex-A53 (v8-A) 32bit 180.14 3039 1200 1024000
      LinpackDP Dhrystone    

Read more in this article - paper ICIST 2017.

HPL by Intel

The multithread version to measure the performance of supercomputers there is the High Performance Linpack:

But you can’t just download it and make/compile the benchmark and run it. You need MPI, BLAS and VSIPL. A simple solution is to download the compiled binaries from Intel

For my i7-6820HQ it reached a maximum of 99.9978 GFlops for a size of 27000. That’s $10^{11}$.