Difference between revisions of "Hardware Benchmarking Topic HPC on CPU vs GPU"

From Tuflow
Line 34:
 |<b>Simulation Runtime</b>||97.9||339.8||226.4||188.0||106.68||6.5
 |-
-|<b>Simulation Speed-up (relative to TUFLOW Classic)</b>||N/A||0.29||0.43||0.52||0.91||15
+|<b>Simulation Speed-up (relative to TUFLOW Classic)</b>||N/A||0.29||0.43||0.52||0.91||15.06
 |-
 |}

Revision as of 11:44, 19 July 2018

Page Under Construction

Introduction

TUFLOW HPC (Heavily Parallelised Compute) can run on both CPUs and NVIDIA CUDA-compatible GPUs. This page compares simulation speed on both types of hardware. As its name suggests, TUFLOW HPC has been parallelised so that a simulation can execute on multiple cores, which increases simulation speed.

Both CPUs and GPUs typically have multiple cores; however, GPUs typically have significantly more. For example, an Intel i7-8700K CPU has 6 CPU cores (running at up to 4.7 GHz). By contrast, a GeForce GTX 1080 Ti has 3,584 CUDA cores (running at up to 1.58 GHz). At the time of writing, both the i7-8700K and the GTX 1080 Ti are high-end desktop hardware components. In a one-for-one comparison, a CPU core is typically faster than a GPU core. The sheer number of CUDA cores, however, typically means that a simulation on GPU hardware will be faster than on CPU.

Computation Speed

The speed at which TUFLOW HPC can solve depends on more than just the number of cores and the processor speed. Other factors include the instruction set architecture, the microarchitecture, the precision of the computations, and the TUFLOW model design and size (number of cells). For this reason, rather than discussing hardware components in general, hardware benchmarks specific to TUFLOW provide the best indication of the relative performance of systems.
See also:

Test Case

Results

The simulations were conducted on a computer with the following hardware:

  • CPU: Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz processor. This provides access to 8 physical CPU cores. Hyper-threading was disabled.
  • GPU: NVIDIA GeForce GTX 1080 Ti GPU card (3584 CUDA cores).

The table below presents runtimes for the same model run using TUFLOW Classic and TUFLOW HPC on CPU. The TUFLOW HPC simulations were run on one, two, four and eight CPU cores. The TUFLOW HPC simulation was also run using GPU hardware.

Simulation Runtime (min)

Solver                                            TUFLOW Classic  TUFLOW HPC  TUFLOW HPC  TUFLOW HPC  TUFLOW HPC  TUFLOW HPC
Hardware                                          1 CPU           1 CPU       2 CPUs      4 CPUs      8 CPUs      1 GPU
Simulation Runtime (min)                          97.9            339.8       226.4       188.0       106.68      6.5
Simulation Speed-up (relative to TUFLOW Classic)  N/A             0.29        0.43        0.52        0.91        15.06
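
The speed-up row can be reproduced from the runtime row. The sketch below assumes that speed-up is defined as the TUFLOW Classic runtime divided by the TUFLOW HPC runtime, which is consistent with the tabulated values; the variable and function names are illustrative only and are not part of TUFLOW. Values computed this way may differ from the table in the last digit depending on rounding.

```python
# Runtimes in minutes, copied from the table above.
CLASSIC_RUNTIME_MIN = 97.9
HPC_RUNTIMES_MIN = {
    "1 CPU": 339.8,
    "2 CPUs": 226.4,
    "4 CPUs": 188.0,
    "8 CPUs": 106.68,
    "1 GPU": 6.5,
}


def speed_up(reference_min, runtime_min):
    """Return how many times faster a run is than the reference run.

    Assumed definition: reference runtime divided by the run's runtime,
    so values below 1 mean the run is slower than the reference.
    """
    return reference_min / runtime_min


# Speed-up of each TUFLOW HPC run relative to TUFLOW Classic.
speed_ups = {
    hardware: speed_up(CLASSIC_RUNTIME_MIN, runtime)
    for hardware, runtime in HPC_RUNTIMES_MIN.items()
}

for hardware, ratio in speed_ups.items():
    print(f"TUFLOW HPC on {hardware}: {ratio:.2f}x TUFLOW Classic")
```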


Discussions

TUFLOW HPC on CPU vs GPU

Even though one CPU core is typically faster than one GPU core, the runtime of the HPC solver on the i7-5960X using 8 CPU cores is much longer than that on the GTX 980 using 2048 CUDA cores. This indicates that GPU hardware has a clear advantage over CPU hardware for the parallel computing of a TUFLOW HPC model. Both the i7-5960X and the GTX 980 are mid-level desktop components at the time of writing. Please see the latest Hardware Benchmarking Results for the rankings of CPUs and GPUs for running the Benchmark Model.

TUFLOW Classic vs TUFLOW HPC on CPU

The results also show that the TUFLOW Classic solver runs much faster than the HPC solver when the HPC solver uses just 1 CPU core. TUFLOW Classic runs on 1 CPU core by default because it employs an implicit scheme and is not suitable for parallel computing. However, the implicit TUFLOW Classic solver can apply a much larger timestep than the explicit TUFLOW HPC solver. The timestep used for the TUFLOW Classic model was 6 seconds for this benchmark test, while the adaptive timestep used in the TUFLOW HPC solver was in the range of 1.7 to 2.3 seconds. This is why the TUFLOW Classic model ran faster than the TUFLOW HPC model when both were using just 1 CPU core. The TUFLOW HPC model becomes faster as the number of CPU cores used increases. For this benchmark test, the runtime of the TUFLOW HPC model became comparable with that of the TUFLOW Classic model when 8 CPU cores were used for the HPC model.
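
The timestep argument can be checked with some rough arithmetic. The sketch below is illustrative only and makes two assumptions that are not stated on this page: that both solvers cover the same simulated period, and that the cost of a single timestep is similar for both solvers (in practice an implicit step and an explicit step are unlikely to cost the same). It also estimates how well the HPC solver scales across CPU cores, using the runtimes from the Results table. All names are hypothetical.

```python
# Timesteps quoted in the text above (seconds).
CLASSIC_TIMESTEP_S = 6.0
HPC_TIMESTEP_RANGE_S = (1.7, 2.3)  # adaptive timestep, min and max

# TUFLOW HPC runtimes on CPU from the Results table (minutes), keyed by core count.
HPC_CPU_RUNTIMES_MIN = {1: 339.8, 2: 226.4, 4: 188.0, 8: 106.68}
CLASSIC_RUNTIME_MIN = 97.9


def step_count_ratio(large_dt_s, small_dt_s):
    """Ratio of timestep counts needed to cover the same simulated period.

    A solver using small_dt_s needs this many times more steps than a
    solver using large_dt_s; the simulated period itself cancels out.
    """
    return large_dt_s / small_dt_s


def parallel_efficiency(runtimes_by_cores):
    """Speed-up relative to the 1-core run, divided by the core count."""
    base = runtimes_by_cores[1]
    return {
        cores: (base / runtime) / cores
        for cores, runtime in runtimes_by_cores.items()
    }


hpc_dt_min, hpc_dt_max = HPC_TIMESTEP_RANGE_S
fewest_extra_steps = step_count_ratio(CLASSIC_TIMESTEP_S, hpc_dt_max)
most_extra_steps = step_count_ratio(CLASSIC_TIMESTEP_S, hpc_dt_min)
measured_ratio = HPC_CPU_RUNTIMES_MIN[1] / CLASSIC_RUNTIME_MIN
efficiency = parallel_efficiency(HPC_CPU_RUNTIMES_MIN)

print(f"HPC needs {fewest_extra_steps:.2f}x to {most_extra_steps:.2f}x more steps than Classic")
print(f"Measured 1-core runtime ratio (HPC / Classic): {measured_ratio:.2f}")
for cores, value in efficiency.items():
    print(f"{cores} CPU core(s): parallel efficiency {value:.2f}")
```

Under these assumptions, the measured 1-core runtime ratio falls inside the range of step-count ratios, which supports the explanation that the smaller explicit timestep accounts for most of the difference.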