## Introduction to HPCG
HPCG (High Performance Conjugate Gradients) is a benchmark for evaluating the real-application performance of high-performance computing (HPC) systems. It was launched in 2013 by Jack Dongarra (winner of the 2021 Turing Award), Michael Heroux, and others.
It is designed as an important complement to the traditional LINPACK benchmark (HPL, which drives the Top500 list), and in some scenarios it is even closer to real scientific-computing workloads.
## Download and Build
Download: https://github.com/hpcg-benchmark/hpcg
```shell
wget https://github.com/hpcg-benchmark/hpcg/archive/refs/heads/master.zip
unzip master.zip
```
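Alternatively, the repository can be cloned with git (note that this produces an `hpcg` directory instead of `hpcg-master`):
```shell
# Clone the HPCG repository instead of downloading the zip archive
git clone https://github.com/hpcg-benchmark/hpcg.git
cd hpcg
```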
In the extracted directory, first decide which Makefile template to build with; all templates are under `./setup`:
```shell
[root@localhost hpcg-master]# ll setup/
total 92
-rw-r--r--. 1 root root 5123 Sep 24 2023 Make.GCC_OMP
-rw-r--r--. 1 root root 5063 Sep 24 2023 Make.ICPC_OMP
-rw-r--r--. 1 root root 5189 Sep 24 2023 Make.Linux_MPI
-rw-r--r--. 1 root root 5129 Sep 24 2023 Make.Linux_Serial
-rw-r--r--. 1 root root 5083 Sep 24 2023 Make.Mac_MPI
-rw-r--r--. 1 root root 5133 Sep 24 2023 Make.Mac_Serial
-rw-r--r--. 1 root root 5081 Sep 24 2023 Make.Mac_Serial_debug
-rw-r--r--. 1 root root 5103 Sep 24 2023 Make.MPI_GCC_OMP
-rw-r--r--. 1 root root 5060 Sep 24 2023 Make.MPI_ICPC
-rw-r--r--. 1 root root 5050 Sep 24 2023 Make.MPI_ICPC_OMP
-rw-r--r--. 1 root root 5064 Sep 24 2023 Make.MPIICPC_OMP
-rw-r--r--. 1 root root 786 Sep 24 2023 Make.UNKNOWN
```
This article uses `Make.Linux_MPI`, so the `arch` variable in the top-level `Makefile` must be changed to `arch = Linux_MPI`. Running `make` without this change fails with:
```shell
[root@localhost hpcg-master]# make
Makefile:7: setup/Make.UNKNOW: No such file or directory
make: *** No rule to make target 'setup/Make.UNKNOW'. Stop.
```
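A minimal way to apply the change from the shell, assuming the top-level `Makefile` sets `arch` on a single line (as the error above suggests) and that an MPI C++ wrapper such as `mpicxx` from OpenMPI or MPICH is already installed:
```shell
# Point the build at the Linux_MPI template
# (assumes a single "arch = ..." assignment in the top-level Makefile)
sed -i 's/^arch *=.*/arch = Linux_MPI/' Makefile

# Sanity-check that the MPI wrappers referenced by Make.Linux_MPI are on the PATH
which mpicxx mpirun
mpicxx --version
```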
After making the change, run `make clean && make` to complete the build:
```shell
[root@localhost hpcg-master]# make clean
rm -f src/CG.o src/CG_ref.o src/TestCG.o src/ComputeResidual.o src/ExchangeHalo.o src/GenerateGeometry.o src/GenerateProblem.o src/GenerateProblem_ref.o src/CheckProblem.o src/OptimizeProblem.o src/ReadHpcgDat.o src/ReportResults.o src/SetupHalo.o src/SetupHalo_ref.o src/TestSymmetry.o src/TestNorms.o src/WriteProblem.o src/YAML_Doc.o src/YAML_Element.o src/ComputeDotProduct.o src/ComputeDotProduct_ref.o src/finalize.o src/init.o src/mytimer.o src/ComputeSPMV.o src/ComputeSPMV_ref.o src/ComputeSYMGS.o src/ComputeSYMGS_ref.o src/ComputeWAXPBY.o src/ComputeWAXPBY_ref.o src/ComputeMG_ref.o src/ComputeMG.o src/ComputeProlongation_ref.o src/ComputeRestriction_ref.o src/GenerateCoarseProblem.o src/ComputeOptimalShapeXYZ.o src/MixedBaseCounter.o src/CheckAspectRatio.o src/OutputFile.o bin/xhpcg src/main.o
[root@localhost hpcg-master]# make
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/main.o src/main.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/CG.o src/CG.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/CG_ref.o src/CG_ref.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/TestCG.o src/TestCG.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeResidual.o src/ComputeResidual.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ExchangeHalo.o src/ExchangeHalo.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/GenerateGeometry.o src/GenerateGeometry.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/GenerateProblem.o src/GenerateProblem.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/GenerateProblem_ref.o src/GenerateProblem_ref.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/CheckProblem.o src/CheckProblem.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/OptimizeProblem.o src/OptimizeProblem.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ReadHpcgDat.o src/ReadHpcgDat.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ReportResults.o src/ReportResults.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/SetupHalo.o src/SetupHalo.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/SetupHalo_ref.o src/SetupHalo_ref.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/TestSymmetry.o src/TestSymmetry.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/TestNorms.o src/TestNorms.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/WriteProblem.o src/WriteProblem.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/YAML_Doc.o src/YAML_Doc.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/YAML_Element.o src/YAML_Element.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeDotProduct.o src/ComputeDotProduct.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeDotProduct_ref.o src/ComputeDotProduct_ref.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/finalize.o src/finalize.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/init.o src/init.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/mytimer.o src/mytimer.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeSPMV.o src/ComputeSPMV.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeSPMV_ref.o src/ComputeSPMV_ref.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeSYMGS.o src/ComputeSYMGS.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeSYMGS_ref.o src/ComputeSYMGS_ref.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeWAXPBY.o src/ComputeWAXPBY.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeWAXPBY_ref.o src/ComputeWAXPBY_ref.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeMG_ref.o src/ComputeMG_ref.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeMG.o src/ComputeMG.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeProlongation_ref.o src/ComputeProlongation_ref.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeRestriction_ref.o src/ComputeRestriction_ref.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/GenerateCoarseProblem.o src/GenerateCoarseProblem.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/ComputeOptimalShapeXYZ.o src/ComputeOptimalShapeXYZ.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/MixedBaseCounter.o src/MixedBaseCounter.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/CheckAspectRatio.o src/CheckAspectRatio.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 -c -o src/OutputFile.o src/OutputFile.cpp
mpicxx -DHPCG_NO_OPENMP -I./src -I./src/Linux_MPI -O3 -ffast-math -ftree-vectorize -ftree-vectorizer-verbose=0 src/main.o src/CG.o src/CG_ref.o src/TestCG.o src/ComputeResidual.o src/ExchangeHalo.o src/GenerateGeometry.o src/GenerateProblem.o src/GenerateProblem_ref.o src/CheckProblem.o src/OptimizeProblem.o src/ReadHpcgDat.o src/ReportResults.o src/SetupHalo.o src/SetupHalo_ref.o src/TestSymmetry.o src/TestNorms.o src/WriteProblem.o src/YAML_Doc.o src/YAML_Element.o src/ComputeDotProduct.o src/ComputeDotProduct_ref.o src/finalize.o src/init.o src/mytimer.o src/ComputeSPMV.o src/ComputeSPMV_ref.o src/ComputeSYMGS.o src/ComputeSYMGS_ref.o src/ComputeWAXPBY.o src/ComputeWAXPBY_ref.o src/ComputeMG_ref.o src/ComputeMG.o src/ComputeProlongation_ref.o src/ComputeRestriction_ref.o src/GenerateCoarseProblem.o src/ComputeOptimalShapeXYZ.o src/MixedBaseCounter.o src/CheckAspectRatio.o src/OutputFile.o -o bin/xhpcg
```
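A quick check that the final link step above actually produced the benchmark binary:
```shell
# The executable is written to bin/ by the link step at the end of the build
ls -l bin/xhpcg
```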
### Running the Test
After a successful build, the executable `xhpcg` is generated under the `bin` directory; run it to start the performance test:
```shell
[root@localhost bin]# mpirun -np 8 ./xhpcg 64 64 64
```
Here `-np` is the number of MPI processes, which is best kept equal to the number of physical cores on the node. The three `64` arguments are the local problem dimensions (nx, ny, nz) handled by each process; with 8 processes arranged as a 2 × 2 × 2 grid, the global problem in this run is 128 × 128 × 128 (as shown in the report below).
When the run finishes, a file named `HPCG-Benchmark_3.1_***.txt` is written to the current directory; it contains all of the performance results:
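The problem size and target run time can also be set through an `hpcg.dat` file in the directory where `xhpcg` is launched. The sketch below follows the HPCG input-file format (two header lines, then nx ny nz, then the run time in seconds); the 60-second value is only an example for a quick trial run:
```shell
# Create an hpcg.dat input file: local dimensions 64x64x64, target run time 60 s
cat > hpcg.dat <<'EOF'
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
64 64 64
60
EOF
# With hpcg.dat present, xhpcg can be started without positional arguments
mpirun -np 8 ./xhpcg
```
Note that for an officially reportable result HPCG requires a run time of at least 1800 seconds; the short run shown below falls under the QuickPath option mentioned in the Final Summary.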
```shell
HPCG-Benchmark
version=3.1
Release date=March 28, 2019
Machine Summary=
Machine Summary::Distributed Processes=8
Machine Summary::Threads per processes=1
Global Problem Dimensions=
Global Problem Dimensions::Global nx=128
Global Problem Dimensions::Global ny=128
Global Problem Dimensions::Global nz=128
Processor Dimensions=
Processor Dimensions::npx=2
Processor Dimensions::npy=2
Processor Dimensions::npz=2
Local Domain Dimensions=
Local Domain Dimensions::nx=64
Local Domain Dimensions::ny=64
Local Domain Dimensions::Lower ipz=0
Local Domain Dimensions::Upper ipz=1
Local Domain Dimensions::nz=64
########## Problem Summary ##########=
Setup Information=
Setup Information::Setup Time=2.90589
Linear System Information=
Linear System Information::Number of Equations=2097152
Linear System Information::Number of Nonzero Terms=55742968
Multigrid Information=
Multigrid Information::Number of coarse grid levels=3
Multigrid Information::Coarse Grids=
Multigrid Information::Coarse Grids::Grid Level=1
Multigrid Information::Coarse Grids::Number of Equations=262144
Multigrid Information::Coarse Grids::Number of Nonzero Terms=6859000
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
Multigrid Information::Coarse Grids::Grid Level=2
Multigrid Information::Coarse Grids::Number of Equations=32768
Multigrid Information::Coarse Grids::Number of Nonzero Terms=830584
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
Multigrid Information::Coarse Grids::Grid Level=3
Multigrid Information::Coarse Grids::Number of Equations=4096
Multigrid Information::Coarse Grids::Number of Nonzero Terms=97336
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
########## Memory Use Summary ##########=
Memory Use Information=
Memory Use Information::Total memory used for data (Gbytes)=1.501
Memory Use Information::Memory used for OptimizeProblem data (Gbytes)=0
Memory Use Information::Bytes per equation (Total memory / Number of Equations)=715.733
Memory Use Information::Memory used for linear system and CG (Gbytes)=1.32071
Memory Use Information::Coarse Grids=
Memory Use Information::Coarse Grids::Grid Level=1
Memory Use Information::Coarse Grids::Memory used=0.157993
Memory Use Information::Coarse Grids::Grid Level=2
Memory Use Information::Coarse Grids::Memory used=0.0198085
Memory Use Information::Coarse Grids::Grid Level=3
Memory Use Information::Coarse Grids::Memory used=0.00249264
########## V&V Testing Summary ##########=
Spectral Convergence Tests=
Spectral Convergence Tests::Result=PASSED
Spectral Convergence Tests::Unpreconditioned=
Spectral Convergence Tests::Unpreconditioned::Maximum iteration count=11
Spectral Convergence Tests::Unpreconditioned::Expected iteration count=12
Spectral Convergence Tests::Preconditioned=
Spectral Convergence Tests::Preconditioned::Maximum iteration count=2
Spectral Convergence Tests::Preconditioned::Expected iteration count=2
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon=
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Result=PASSED
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for SpMV=6.40138e-08
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for MG=4.63258e-09
########## Iterations Summary ##########=
Iteration Count Information=
Iteration Count Information::Result=PASSED
Iteration Count Information::Reference CG iterations per set=50
Iteration Count Information::Optimized CG iterations per set=50
Iteration Count Information::Total number of reference iterations=50
Iteration Count Information::Total number of optimized iterations=50
########## Reproducibility Summary ##########=
Reproducibility Information=
Reproducibility Information::Result=PASSED
Reproducibility Information::Scaled residual mean=9.86588e-07
Reproducibility Information::Scaled residual variance=0
########## Performance Summary (times in sec) ##########=
Benchmark Time Summary=
Benchmark Time Summary::Optimization phase=1.3113e-05
Benchmark Time Summary::DDOT=0.962174
Benchmark Time Summary::WAXPBY=0.454792
Benchmark Time Summary::SpMV=2.92182
Benchmark Time Summary::MG=17.1292
Benchmark Time Summary::Total=21.4814
Floating Point Operations Summary=
Floating Point Operations Summary::Raw DDOT=6.3334e+08
Floating Point Operations Summary::Raw WAXPBY=6.3334e+08
Floating Point Operations Summary::Raw SpMV=5.68578e+09
Floating Point Operations Summary::Raw MG=3.17357e+10
Floating Point Operations Summary::Total=3.86882e+10
Floating Point Operations Summary::Total with convergence overhead=3.86882e+10
GB/s Summary=
GB/s Summary::Raw Read B/W=11.0975
GB/s Summary::Raw Write B/W=2.56471
GB/s Summary::Raw Total B/W=13.6622
GB/s Summary::Total with convergence and optimization phase overhead=13.4799
GFLOP/s Summary=
GFLOP/s Summary::Raw DDOT=0.658238
GFLOP/s Summary::Raw WAXPBY=1.39259
GFLOP/s Summary::Raw SpMV=1.94597
GFLOP/s Summary::Raw MG=1.85273
GFLOP/s Summary::Raw Total=1.80101
GFLOP/s Summary::Total with convergence overhead=1.80101
GFLOP/s Summary::Total with convergence and optimization phase overhead=1.77697
User Optimization Overheads=
User Optimization Overheads::Optimization phase time (sec)=1.3113e-05
User Optimization Overheads::Optimization phase time vs reference SpMV+MG time=3.72871e-05
DDOT Timing Variations=
DDOT Timing Variations::Min DDOT MPI_Allreduce time=0.797678
DDOT Timing Variations::Max DDOT MPI_Allreduce time=2.3847
DDOT Timing Variations::Avg DDOT MPI_Allreduce time=1.35692
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=1.77697
Final Summary::HPCG 2.4 rating for historical reasons is=1.80101
Final Summary::Reference version of ComputeDotProduct used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeSPMV used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeMG used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeWAXPBY used=Performance results are most likely suboptimal
Final Summary::Results are valid but execution time (sec) is=21.4814
Final Summary::You have selected the QuickPath option=Results are official for legacy installed systems with confirmation from the HPCG Benchmark leaders.
Final Summary::After confirmation please upload results from the YAML file contents to=http://hpcg-benchmark.org
```
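The single number usually quoted from this report is the final GFLOP/s rating, which can be pulled out directly:
```shell
# Extract the official rating line from the result file
grep "GFLOP/s rating" HPCG-Benchmark_3.1_*.txt
```
The `Reference version of ... used` lines in the Final Summary indicate that only the unoptimized reference kernels were run, which is why the report flags the results as "most likely suboptimal".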
This completes a single-node HPCG performance test.