Hardware Benchmarking Topic HPC on CPU vs GPU
Page Under Construction
Introduction
TUFLOW HPC (Heavily Parallelised Compute) has the ability to run on both CPU and Nvidia CUDA compatible GPU devices. This page discusses and compares simulation speed using both sets of computer hardware. As it's name suggests, TUFLOW HPC has been parallelised to enable simulation execution using multiple cores. This code architecture has been implemented to increase simulation speed.
Both CPU and GPU typically have multiple cores, however, GPU devices typically have a significantly larger number. For example a i7-8700 Intel CPU has 6 CPU cores (running at up to 4.7GHz). By contrast a GeForce GTX 1080ti has a total of 3,584 CUDA cores (running at up to 1.58 GHz). At the time of writing both the i7-8700k and GTX 1080ti are high end desktop hardware components. In a one-for-one comparison CPU cores are typically faster than GPU cores. The shear number of CUDA cores however typically mean simulation using GPU hardware will be faster than CPU.
Computation Speed
The speed at which TUFLOW HPC can solve depends on more than just the number of cores and processor speed. It includes things such as the instruction set architecture, microarchitecture, precision of computations, the TUFLOW model design and size (number of cells). For this reason, rather than discussing hardware components generally hardware benchmarks specific to TUFLOW provide the best indication of the relative performance of systems.
See also:
Test Case
Results
The simulations were conducted on a computer with the following hardware:
- CPU: Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz processor. This provides access to 8 physical CPU cores. Hyper-threading was disabled.
- GPU: NVIDIA GeForce GTX 1080 Ti GPU card (3584 CUDA cores).
The table below presents runtimes for the same model run using TUFLOW Classic and TUFLOW HPC on CPU. These TUFLOW HPC simulations were tested for a range of CPU core multiples (from one through to eight). The TUFLOW HPC simulation was also run using GPU hardware.
Simulation Runtime (min) | ||||||
---|---|---|---|---|---|---|
Solver | TUFLOW Classic | TUFLOW HPC | ||||
Hardware | 1 CPU | 1 CPU | 2 CPUs | 4 CPUs | 8 CPUs | 1 GPU |
Simulation Runtime | 97.9 | 339.8 | 226.4 | 188.0 | 106.68 | 6.5 |
Simulation Speed-up (relative to TUFLOW Classic) | N/A | 0.29 | 0.43 | 0.52 | 0.91 | 15.06 |
Discussions
TUFLOW Classic vs TUFLOW HPC on CPU
The results also show the runtime of TUFLOW Classic is much faster than TUFLOW HPC when both are run using a single (1) CPU core.
TUFLOW Classic uses an implicit solution scheme. Implicit schemes are inefficient to parallelise. As such, TUFLOW Classic can on run on a single CPU. TUFLOW HPC uses an explicit solver. Explicit solver are well suited to code parallelisation. Computationally, implicit schemes can achieve a stable solution at a larger time step comparable to explicit schemes. For this benchmark test the model ran using a 6 second timestep in TUFLOW Classic, while the adaptive timestep used by TUFLOW HPC ranged from 1.7 to 2.3 seconds. the larger timestep used by TUFLOW Classic helps it run faster than TUFLOW HPC if both are limited to using a single CPU.
As expected, TUFLOW HPC simulations run faster as the number of CPU cores increase. For this benchmark test 8 CPU cores were necessary for TUFLOW HPC to achieve the same speed as a single CPU using TUFLOW Classic.
This section does not consider TUFLOW HPC simulation speed with GPU acceleration. Please refer to the paragraph below.
TUFLOW HPC on CPU vs GPU
Even through one CPU core is typically faster than one GPU core, the runtime of the HPC solver on i7-5960X using 8 CPU cores is much slower than that on the NVIDIA GeForce GTX 1080 Ti GPU card using 3584 CUDA cores. This highlights how GPU hardware has clear advantage over CPU for the parallel computing using TUFLOW HPC.