Difference between revisions of "Hardware Benchmarking Topic HPC on CPU vs GPU"


Revision as of 15:47, 18 July 2018

Page Under Construction

Introduction

TUFLOW HPC can run on both CPUs and NVIDIA CUDA-compatible GPU devices. Both CPUs and GPUs typically have multiple cores; however, GPU devices can have a far larger number of cores available to accelerate the TUFLOW HPC computations. Individual CPU cores are typically faster than individual GPU cores.
For example, an Intel i7-8700K CPU has 6 CPU cores (running at up to 4.7 GHz). By contrast, a GeForce GTX 1080 Ti has a total of 3,584 CUDA cores (running at up to 1.58 GHz). At the time of writing, both the i7-8700K and the GTX 1080 Ti are high-end desktop components.

Computation Speed

The speed at which TUFLOW HPC can solve depends on more than just the number of cores and the processor clock speed. Other factors include the instruction set architecture, the microarchitecture, and the precision of the computations. Hardware benchmarks specific to TUFLOW therefore provide the best indication of the relative performance of systems.
See also Floating Point Operations per Second (FLOPS) on Wikipedia.

Results

The simulations were conducted on a computer with an Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz processor. Hyper-threading was disabled during the test, leaving 8 physical CPU cores available. An NVIDIA GeForce GTX 980 GPU card (2048 CUDA cores) was used for the HPC GPU test. The table below presents runtimes for the same TUFLOW HPC model on both CPU and GPU hardware, while the chart below shows the speed-up relative to the same model run with the TUFLOW Classic solver.

Solver (model)    Hardware    Runtime (min)
Classic 20m       1 CPU       97.9
HPC 20m           1 CPU       339.8
HPC 20m           2 CPU       226.4
HPC 20m           4 CPU       188.0
HPC 20m           8 CPU       106.68
HPC 20m           1 GPU       15.4
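
The speed-up relative to TUFLOW Classic can be recomputed directly from the runtimes in the table. The following is a minimal sketch in Python, assuming the speed-up is defined as the Classic runtime divided by the HPC runtime; the variable and function names are illustrative only and are not part of TUFLOW.

```python
# Runtimes in minutes, copied from the table above.
CLASSIC_1CPU_MIN = 97.9
hpc_runtimes_min = {
    "1 CPU": 339.8,
    "2 CPU": 226.4,
    "4 CPU": 188.0,
    "8 CPU": 106.68,
    "1 GPU": 15.4,
}


def speedup_vs_classic(runtime_min, baseline_min=CLASSIC_1CPU_MIN):
    """Speed-up factor relative to the Classic run.

    Assumed definition: baseline runtime / runtime.
    A value above 1 means the run was faster than Classic.
    """
    return baseline_min / runtime_min


speedups = {hw: speedup_vs_classic(t) for hw, t in hpc_runtimes_min.items()}

for hw, factor in speedups.items():
    print(f"HPC {hw}: {factor:.2f}x relative to Classic")
```

Under this definition the HPC run on 1 GPU comes out at roughly 6.4 times the speed of Classic, while every HPC CPU configuration stays below 1, with 8 CPU cores at about 0.92.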

[Chart: speed-up of each TUFLOW HPC run relative to the TUFLOW Classic solver]

Discussions

TUFLOW HPC on CPU vs GPU

Even though one CPU core is typically faster than one GPU core, the HPC solver on the i7-5960X using 8 CPU cores (106.68 min) is much slower than on the GTX 980 using 2048 CUDA cores (15.4 min). This indicates that GPU hardware has a clear advantage over CPU hardware for the parallel computation of the TUFLOW HPC model. Both the i7-5960X and the GTX 980 are mid-level desktop components at the time of writing. Please see the latest Hardware Benchmarking Results for the ranking of CPUs and GPUs for running the Benchmark Model.
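
The size of the GPU advantage can be expressed as a ratio of runtimes. The sketch below is illustrative only: it uses the runtimes reported in the Results section, and the names are assumptions rather than anything defined by TUFLOW.

```python
# HPC runtimes in minutes from the Results section, keyed by CPU core count.
hpc_cpu_runtimes_min = {1: 339.8, 2: 226.4, 4: 188.0, 8: 106.68}
HPC_GPU_RUNTIME_MIN = 15.4

# How many times faster the single GTX 980 run is than each CPU configuration.
gpu_advantage = {
    cores: runtime / HPC_GPU_RUNTIME_MIN
    for cores, runtime in hpc_cpu_runtimes_min.items()
}

for cores, factor in gpu_advantage.items():
    print(f"GPU vs {cores} CPU core(s): {factor:.1f}x faster")
```

For this benchmark the GPU run is about 6.9 times faster than the 8-core CPU run and about 22 times faster than the single-core CPU run.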

TUFLOW Classic vs TUFLOW HPC on CPU

The results also show that the TUFLOW Classic solver (97.9 min) is much faster than the HPC solver using 1 CPU core (339.8 min). TUFLOW Classic runs on 1 CPU core by default because it employs an implicit scheme, which is not suited to parallel computing. However, the implicit TUFLOW Classic solver can use a much larger timestep than the explicit TUFLOW HPC solver. The timestep used for the TUFLOW Classic model was 6 seconds for this benchmark test, while the adaptive timestep used by the TUFLOW HPC solver was in the range of 1.7 to 2.3 seconds. This is why the TUFLOW Classic model runs faster than the TUFLOW HPC model when using just one CPU core. The TUFLOW HPC model becomes faster as the number of CPU cores used increases. For this benchmark test, the runtime of the TUFLOW HPC model (106.68 min) became comparable with that of the TUFLOW Classic model when 8 CPU cores were used.
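
The timestep argument above can be checked with simple arithmetic. The sketch below assumes the number of timesteps is inversely proportional to the timestep size, and it ignores any difference in cost per step between the implicit and explicit schemes, so it is only a rough consistency check. It also computes the CPU scaling of the HPC solver from the reported runtimes. All names are illustrative.

```python
CLASSIC_TIMESTEP_S = 6.0
HPC_TIMESTEP_RANGE_S = (1.7, 2.3)  # adaptive timestep range reported above

# Ratio of HPC steps to Classic steps over the same simulated period.
step_ratio_low = CLASSIC_TIMESTEP_S / HPC_TIMESTEP_RANGE_S[1]
step_ratio_high = CLASSIC_TIMESTEP_S / HPC_TIMESTEP_RANGE_S[0]

# Observed single-core runtime ratio (HPC 1 CPU / Classic 1 CPU), in minutes.
observed_runtime_ratio = 339.8 / 97.9

print(f"Step-count ratio: {step_ratio_low:.2f} to {step_ratio_high:.2f}")
print(f"Observed runtime ratio: {observed_runtime_ratio:.2f}")

# CPU scaling of the HPC solver: speed-up over 1 core, and efficiency per core.
hpc_cpu_runtimes_min = {1: 339.8, 2: 226.4, 4: 188.0, 8: 106.68}
base = hpc_cpu_runtimes_min[1]
scaling = {
    cores: (base / runtime, base / runtime / cores)
    for cores, runtime in hpc_cpu_runtimes_min.items()
}

for cores, (speedup, efficiency) in scaling.items():
    print(f"{cores} core(s): speed-up {speedup:.2f}, efficiency {efficiency:.0%}")
```

The observed single-core runtime ratio of about 3.5 falls inside the step-count ratio range of about 2.6 to 3.5, which is consistent with the timestep explanation. The 8-core HPC run is about 3.2 times faster than the 1-core HPC run.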