Hardware Benchmarking Topic HPC on CPU vs GPU

From Tuflow
Revision as of 11:23, 19 July 2018 by Chris Huxley

Page Under Construction

Introduction

TUFLOW HPC (Heavily Parallelised Compute) can run on both CPUs and Nvidia CUDA-compatible GPU devices. This page compares simulation speed on the two types of hardware. As its name suggests, TUFLOW HPC has been parallelised so that a simulation can execute across multiple cores, which increases simulation speed.

Both CPUs and GPUs typically have multiple cores; however, GPU devices typically have a significantly larger number. For example, an Intel i7-8700K CPU has 6 CPU cores (running at up to 4.7 GHz). By contrast, a GeForce GTX 1080 Ti has a total of 3,584 CUDA cores (running at up to 1.58 GHz). At the time of writing both the i7-8700K and GTX 1080 Ti are high-end desktop hardware components. In a one-for-one comparison CPU cores are typically faster than GPU cores. The sheer number of CUDA cores, however, typically means a simulation on GPU hardware will be faster than on CPU.
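As a rough illustration of the core counts and clock speeds quoted above, the short Python sketch below multiplies cores by peak clock for each device. The `aggregate_core_ghz` helper and the "core-GHz" unit are assumptions made for this illustration only; they are not a TUFLOW metric and not a measured benchmark.

```python
# Core counts and peak clock speeds quoted in the paragraph above.
cpu_cores, cpu_ghz = 6, 4.7        # Intel i7-8700K
gpu_cores, gpu_ghz = 3584, 1.58    # GeForce GTX 1080 Ti


def aggregate_core_ghz(cores: int, ghz: float) -> float:
    """Crude throughput proxy: cores x peak clock.

    Hypothetical helper for illustration. It ignores architecture,
    memory bandwidth, precision and the model itself.
    """
    return cores * ghz


cpu_total = aggregate_core_ghz(cpu_cores, cpu_ghz)
gpu_total = aggregate_core_ghz(gpu_cores, gpu_ghz)

print(f"CPU: {cpu_total:.1f} core-GHz")
print(f"GPU: {gpu_total:.1f} core-GHz")
print(f"GPU/CPU ratio: {gpu_total / cpu_total:.1f}")
```

The ratio comes out at roughly 200, which should be read only as a sign that the GPU has far more raw parallel capacity. Real simulation speed-ups are much smaller because a single CUDA core does less work per clock cycle than a CPU core.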

Computation Speed

The speed at which TUFLOW HPC can solve depends on more than just the number of cores and the processor speed. Other factors include the instruction set architecture, the microarchitecture, the precision of computations, and the TUFLOW model design and size (number of cells). For this reason, rather than discussing hardware components in general terms, hardware benchmarks specific to TUFLOW provide the best indication of the relative performance of systems.
See also:

Test Case

Results

The simulations were conducted on a computer with an Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz processor. Hyper-threading was disabled during the test, giving access to 8 physical CPU cores. An NVIDIA GeForce GTX 980 GPU card (2048 CUDA cores) was used for the HPC GPU test. The table below presents runtimes for the same TUFLOW HPC model on both CPU and GPU hardware, while the chart below shows the speed-up relative to the same model run with the TUFLOW Classic solver.

Solver        Hardware    Runtime (min)
Classic 20m   1 CPU       97.9
HPC 20m       1 CPU       339.8
HPC 20m       2 CPU       226.4
HPC 20m       4 CPU       188.0
HPC 20m       8 CPU       106.68
HPC 20m       1 GPU       15.4
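The speed-up relative to TUFLOW Classic can be recomputed directly from the runtimes in the table. The following Python sketch does that arithmetic; the dictionary keys and variable names are chosen for this example and are not part of any TUFLOW tool.

```python
# Runtimes in minutes, copied from the table above.
runtimes_min = {
    "Classic, 1 CPU": 97.9,
    "HPC, 1 CPU": 339.8,
    "HPC, 2 CPU": 226.4,
    "HPC, 4 CPU": 188.0,
    "HPC, 8 CPU": 106.68,
    "HPC, 1 GPU": 15.4,
}

# Speed-up relative to the Classic run: values above 1 are faster
# than Classic, values below 1 are slower.
baseline = runtimes_min["Classic, 1 CPU"]
speedup_vs_classic = {
    run: baseline / minutes for run, minutes in runtimes_min.items()
}

for run, speedup in speedup_vs_classic.items():
    print(f"{run:<15} {speedup:5.2f}x")
```

On these figures only the GPU run exceeds a speed-up of 1, at a little over 6 times faster than Classic, while the 8-core CPU run sits just below 1.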

[Chart: speed-up of each TUFLOW HPC run relative to the TUFLOW Classic run]

Discussions

TUFLOW HPC on CPU vs GPU

Even though one CPU core is typically faster than one GPU core, the HPC solver on the i7-5960X using 8 CPU cores took roughly seven times longer (106.68 min) than on the GTX 980 using 2048 CUDA cores (15.4 min). This indicates that GPU hardware has a clear advantage over CPU for the parallel computation of a TUFLOW HPC model. Both the i7-5960X and GTX 980 are mid-level desktop components at the time of writing. Please see the latest Hardware Benchmarking Results for the rankings of CPUs and GPUs when running the Benchmark Model.
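To put numbers on the CPU versus GPU comparison, the sketch below derives how the HPC runtime scales with CPU core count and how the GPU run compares with the 8-core run. Parallel efficiency, taken here as speed-up divided by core count, is a standard measure introduced for this illustration rather than something reported by TUFLOW.

```python
# HPC runtimes in minutes from the results table, keyed by CPU core count.
hpc_cpu_min = {1: 339.8, 2: 226.4, 4: 188.0, 8: 106.68}
hpc_gpu_min = 15.4  # GTX 980, 2048 CUDA cores

single_core = hpc_cpu_min[1]

# Speed-up over the 1-core HPC run, and efficiency = speed-up / cores.
scaling = {}
for cores, minutes in hpc_cpu_min.items():
    speedup = single_core / minutes
    scaling[cores] = (speedup, speedup / cores)
    print(f"{cores} core(s): speed-up {speedup:.2f}, "
          f"efficiency {speedup / cores:.0%}")

# How many times faster the GPU run is than the 8-core CPU run.
gpu_vs_8_cores = hpc_cpu_min[8] / hpc_gpu_min
print(f"GPU vs 8 CPU cores: {gpu_vs_8_cores:.1f}x faster")
```

For this benchmark, going from 1 to 8 CPU cores gives a speed-up of a little over 3, so the extra cores are far from fully used, while the single GPU is roughly 7 times faster than all 8 CPU cores together.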

TUFLOW Classic vs TUFLOW HPC on CPU

The results also show that the TUFLOW Classic solver (97.9 min) is much faster than the HPC solver using just 1 CPU core (339.8 min). TUFLOW Classic runs on 1 CPU core by default because it employs an implicit scheme and is not suitable for parallel computing. However, the implicit TUFLOW Classic solver can apply a much larger timestep than the explicit TUFLOW HPC solver. The timestep used for the TUFLOW Classic model was 6 seconds for this benchmark test, while the adaptive timestep used in the TUFLOW HPC solver was in the range of 1.7 to 2.3 seconds. This is why the TUFLOW Classic model ran faster than the TUFLOW HPC model when both were using just 1 CPU core. The TUFLOW HPC model becomes faster as the number of CPU cores used increases. For this benchmark test the runtime of the TUFLOW HPC model became comparable with that of the TUFLOW Classic model when 8 CPU cores were used for the HPC model.
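The timestep argument can be checked with simple arithmetic. The sketch below compares the ratio of timesteps needed by the two solvers with the measured single-core runtime ratio. It assumes, purely for illustration, that both solvers cover the same simulated period and that the cost per timestep is similar, which the source does not state.

```python
# Timesteps quoted in the text, in seconds.
classic_dt_s = 6.0
hpc_dt_range_s = (1.7, 2.3)  # adaptive timestep range for HPC

# For the same simulated period, HPC needs classic_dt / hpc_dt times
# as many steps as Classic.
step_ratio_low = classic_dt_s / max(hpc_dt_range_s)
step_ratio_high = classic_dt_s / min(hpc_dt_range_s)

# Measured single-core runtimes in minutes from the results table.
runtime_ratio = 339.8 / 97.9

print(f"HPC/Classic step count ratio: "
      f"{step_ratio_low:.2f} to {step_ratio_high:.2f}")
print(f"HPC/Classic runtime ratio (1 CPU core): {runtime_ratio:.2f}")
```

The measured runtime ratio of about 3.5 falls inside the estimated step count range of about 2.6 to 3.5. Under the stated assumptions, that is consistent with the explanation that the smaller timestep accounts for most of the single-core difference.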