To use Daniel de Kok's suggested workaround on Perlmutter, module load fast-mkl-amd. For more information about why this matters, please see the results of our benchmarking study below.
We performed a small benchmarking study on AMD Rome hardware (AMD EPYC 7702 64-Core Processor) to try to estimate Python performance on Perlmutter. You will find information about our benchmarking study here, including all the materials you need should you wish to reproduce it or run it elsewhere.
We tested 4 NumPy functions (including numpy.dot) at 3 library configurations (mkl, mkl-workaround, OpenBLAS) for 2 different configurations (high arithmetic intensity, highAI, and high memory bandwidth, highMB) using 1 AMD EPYC 7702 (Rome) with 64 physical cores and 2 hyperthreads per core.
We determined the high arithmetic intensity configuration using the standard DGEMM matrix multiplication benchmark; you can find more information about this benchmark here. We adjusted the thread and process binding on the AMD Rome to optimize DGEMM GFLOPS/s using the following configuration:
OMP_NUM_THREADS=64 OMP_PLACES=cores OMP_PROC_BIND=spread srun -u -n 1 -c 128 --cpu_bind=sockets
This is not a perfect test, nor is it exhaustive.
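As a rough illustration of what a DGEMM GFLOPS/s measurement looks like from Python, the sketch below times numpy.dot on two square double-precision matrices and converts the best time to GFLOP/s using the conventional 2n^3 operation count. It is not the benchmark used in the study; the function name dgemm_gflops, the matrix size, and the repeat count are all assumptions chosen for illustration.

```python
import time

import numpy as np


def dgemm_gflops(n=2048, repeats=3, seed=0):
    """Estimate GFLOP/s for an n x n double-precision matrix product.

    Hypothetical helper for illustration only; not the study's benchmark.
    """
    rng = np.random.default_rng(seed)
    a = rng.random((n, n))  # float64 by default, so np.dot dispatches to DGEMM
    b = rng.random((n, n))

    np.dot(a, b)  # warm-up call so library initialization is not timed

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.dot(a, b)
        best = min(best, time.perf_counter() - start)

    # A dense n x n matrix product is conventionally counted as 2 * n**3 flops.
    return 2.0 * n**3 / best / 1e9


if __name__ == "__main__":
    print(f"numpy.dot: {dgemm_gflops():.1f} GFLOP/s")
```

Running the script under different library configurations and thread settings (for example, with and without the workaround module loaded) gives a quick, informal comparison for this one function.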
Many computationally expensive functions (like those in numpy.linalg) use optimized libraries like Intel's Math Kernel Library (MKL) or OpenBLAS under the hood. In the past, our advice to NERSC users was generally to use MKL, as it was well-adapted for our Intel hardware. On Perlmutter, however, the CPUs are AMD, so does this recommendation still hold? The answer is yes, but with some important caveats. Spoiler: the best way to know for sure is to benchmark your code.
Intel MKL will check the CPU manufacturer and choose a code path accordingly. Without any intervention, MKL may use a less-optimized code path on AMD hardware. This is discussed in detail in a blog post by Daniel de Kok and in another blog post by Donald Kinghorn.
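Before benchmarking, it can help to confirm which library a given NumPy installation was actually built against. A minimal sketch, assuming only that NumPy is installed: numpy.show_config() prints the build configuration, and the hypothetical helper numpy_build_config below simply captures that printout as a string so it can be logged alongside timing results. The layout of the printout varies between NumPy versions, so the sketch does not attempt to parse it.

```python
import contextlib
import io

import numpy as np


def numpy_build_config():
    """Return the text printed by numpy.show_config().

    Hypothetical helper for illustration; the output format differs
    between NumPy versions, so it is returned unparsed.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


if __name__ == "__main__":
    print("NumPy version:", np.__version__)
    # Look for MKL or OpenBLAS entries in the BLAS/LAPACK sections.
    print(numpy_build_config())
```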