==========
Benchmarks
==========
System benchmarks.


Top500
======
In what year would this be the world's fastest computer?

`<https://top500.org>`_

Of the 500 fastest computers in the world, 500 of them run the Linux kernel.
Top500 publishes a list of the distributions used by these clusters, but it is
not really summarized. For instance, at the URL below, the "Operating System"
category for November 2023 lists RHEL, Ubuntu, etc. in many different
versions. Just plain "Linux" is listed for 45% of the clusters.

`<https://top500.org/statistics/list/>`_

By my rough summary, of the 500 machines on the list in November 2023,
272 of them have a known distro. Broken down into major distro categories,
it is roughly:

* RHEL: 56%, 152 systems.
* SLES: 29%, 79 systems.
* Ubuntu: 15%, 41 systems.
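
The percentages above are each count divided by the 272 systems with a known
distro. A minimal sketch of that arithmetic with ``awk``, using only the counts
from the list above (the function name ``distro_share`` is made up for this
example):

```sh
# Print each category's share of the systems with a known distro.
# Input: one "name count" pair per line (counts copied from the list above).
distro_share() {
    awk '{ name[NR] = $1; count[NR] = $2; total += $2 }
         END {
             for (i = 1; i <= NR; i++)
                 printf "%s: %.0f%%, %d systems.\n", name[i], 100 * count[i] / total, count[i]
         }'
}

printf '%s\n' 'RHEL 152' 'SLES 79' 'Ubuntu 41' | distro_share
```

Feeding it updated counts from a later list should give the new breakdown
without redoing the division by hand.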

Although Ubuntu is a Debian derivative, none of the systems listed Debian.
No Arch Linux, Gentoo, or similar distros were listed.
Of the RHEL clones, Rocky Linux appears to be ascendant.


Linpack
-------
The Linpack TPP benchmark "measures the floating point rate of execution for solving a linear
system of equations."

`<https://www.netlib.org/benchmark/hpl>`_

rocHPL
^^^^^^
There is a ROCm-optimized version of HPL.

`<https://github.com/ROCm/rocHPL>`_

* It looks like it hasn't been updated for ROCm release 6.0.2 though. The ``gfx1100`` target isn't listed.
* Depends on ``roctracer`` and ``roctx``.
* May need MPI recompiled with GPU support.
* OpenMP may be needed too (if not here, elsewhere).

DGEMM
-----
DGEMM "measures the floating point rate of execution of double precision real matrix-matrix multiplication."

STREAM
------
STREAM is "a simple synthetic benchmark program that measures sustainable memory bandwidth (in GB/s)
and the corresponding computation rate for simple vector kernel."

`<https://www.cs.virginia.edu/stream>`_

PTRANS
------
PTRANS (parallel matrix transpose) "exercises the communications where pairs of processors
communicate with each other simultaneously. It is a useful test of the total communications
capacity of the network."

`<https://www.netlib.org/parkbench/html/matrix-kernels.html>`_

RandomAccess
------------
"RandomAccess measures the rate of integer random updates of memory (GUPS)."

`<https://hpcchallenge.org/projectsfiles/hpcc/RandomAccess.html>`_

FFT
---
"FFT measures the floating point rate of execution of double precision complex
one-dimensional Discrete Fourier Transform (DFT)."

`<http://www.ffte.jp>`_

Communication Bandwidth and Latency
-----------------------------------
Communication bandwidth and latency is "a set of tests to measure latency and bandwidth of a
number of simultaneous communication patterns; based on b_eff (effective bandwidth benchmark)."

`<https://fs.hlrs.de/projects/par/mpi/b_eff>`_

``hpcc``
--------
HPC Challenge benchmarks.

`<https://hpcchallenge.org/hpcc>`_

The HPC Challenge benchmarks are in the Debian ``hpcc`` package.

.. code-block:: sh

    # Start from the example input file shipped with the package.
    cp -p /usr/share/doc/hpcc/examples/_hpccinf.txt hpccinf.txt
    # Run the benchmarks; hpcc reads hpccinf.txt from the current directory.
    hpcc
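
The run above should leave its results in ``hpccoutf.txt`` in the working
directory, ending with a summary section of ``key=value`` lines. A minimal
sketch for pulling out just that summary, assuming that file name and the
``Begin of Summary section.`` / ``End of Summary section.`` marker lines (check
them against your own output; the function name ``hpcc_summary`` is made up
here):

```sh
# Print the key=value lines from the summary section of an hpcc output file.
# Assumes the section is delimited by the two marker lines matched below.
hpcc_summary() {
    sed -n '/^Begin of Summary section/,/^End of Summary section/p' "$1" |
    grep '='
}

# Only try to parse the file if a run has actually produced it.
if [ -f hpccoutf.txt ]; then
    hpcc_summary hpccoutf.txt
fi
```

Piping the result through ``grep`` narrows it to one benchmark, for example
``hpcc_summary hpccoutf.txt | grep -i stream``.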

See the Output section of this documentation for benchmark results.


tinygrad
========
Benchmarks in tinygrad.

mlnotcommons
------------
Proprietary with a few libre datasets and benchmarks available.

Don't let "Commons" in the name lead you to think this is available to the mere public.
Lots of proprietary bits involved, closed lists, corporate signups and signatures, etc.
Their use of "Commons" in their name perhaps causes confusion in the marketplace
with Wikimedia Commons (and other groups that serve the public).
This isn't like Wikimedia Commons at all.

The upstream tinycorp is working on implementing some of their benchmarks using
``tinygrad`` and AMD GPUs.

`<https://mlcommons.org/datasets>`_

`<https://mlcommons.org/benchmarks>`_

`<https://github.com/mlcommons>`_


Phoronix Test Suite
===================
Phoronix test suite:

`<https://github.com/phoronix-test-suite/phoronix-test-suite/>`_

`<https://www.phoronix-test-suite.com/>`_

.. code-block:: sh

    git clone https://github.com/phoronix-test-suite/phoronix-test-suite/
    cd phoronix-test-suite/
    apt install php-cli php-xml
    ./phoronix-test-suite list-missing-dependencies
    ./phoronix-test-suite list-tests

Meh, this automatically installs dependencies and builds, but doesn't use ROCm.


ROCm
====
Benchmarks optimized for ROCm.

HPL
---
HPL for ROCm from AMD.

.. code-block:: sh

    git clone https://github.com/ROCm/rocHPL
    cd rocHPL/
    # git checkout v6.0.0 # build fails in Ubuntu
    ./install.sh
    # ./build/bin/rochpl --input ./build/rocHPL/HPL.dat
    # 1 GPU (works then fails subsequent runs)
    ./mpirun_rochpl -P 1 -Q 1 -N 45056 --NB 384
    # Output excerpt from the run above:
    #   Node Binding: Process 0 [(p,q)=(0,0)] CPU Cores: 64 - {0-63}
    #   GPU Binding: Process 0 [(p,q)=(0,0)] GPU: 0, pciBusID c3
    #   Local matrix size = 15.1361 GBs
    # Larger P x Q process grids
    ./mpirun_rochpl -P 1 -Q 2 -N 64000 --NB 384
    ./mpirun_rochpl -P 2 -Q 2 -N 90112 --NB 384
    ./mpirun_rochpl -P 2 -Q 4 -N 128000 --NB 384
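
The ``-N`` values in the runs above grow roughly with the square root of the
process count, which presumably keeps the matrix memory per GPU about constant.
An N x N double-precision matrix takes 8 * N^2 bytes (about 15.1 GiB for
N = 45056, close to the "Local matrix size" reported above), so doubling the
GPUs allows N to grow by about a factor of sqrt(2). A rough sketch of that
arithmetic, assuming the single-GPU size of 45056 as the baseline (the function
name ``scale_n`` is made up, and the hand-picked values above are rounded
differently):

```sh
# Estimate an HPL problem size N for a given GPU count, keeping
# 8 * N^2 / gpus (matrix bytes per GPU) equal to the 1-GPU baseline.
scale_n() {
    awk -v base="$1" -v gpus="$2" 'BEGIN { printf "%d\n", base * sqrt(gpus) }'
}

for gpus in 1 2 4 8; do
    echo "$gpus GPU(s): N ~ $(scale_n 45056 "$gpus")"
done
```

The estimates land near the 64000, 90112, and 128000 used above; treat them as
starting points and adjust for the memory actually free on each GPU.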


HPCG
----
HPCG for ROCm.

.. code-block:: sh

    git clone https://github.com/ROCm/rocHPCG
    cd rocHPCG/
    ./install.sh