Hardware Benchmarking Topic Classic vs HPC

Page Under Construction

Introduction

This page discusses the differences in runtime and hardware requirements between TUFLOW Classic and TUFLOW HPC simulations.

Differences In Solution

TUFLOW Classic and TUFLOW HPC use hardware quite differently. TUFLOW Classic can only use a single CPU core / thread. This is largely due to the implicit nature of the solution scheme (for more information see the TUFLOW forum post on parallelisation of TUFLOW Classic). TUFLOW Classic runs only on CPU hardware and cannot run on Graphics Processing Unit (GPU) devices.
TUFLOW HPC (which uses an explicit solution scheme) does not have the same single core limitation and can make use of multiple CPU or GPU cores (and indeed multiple GPU cards if available).
In general, implicit schemes can run at a larger timestep and are therefore typically more efficient when running on a single core. However, as the number of cores increases, TUFLOW HPC will generally become the faster of the two.

Influences on runtime

A large number of factors influence the speed at which a simulation runs. Some key factors are described below.

Model Size

When parallelising the computations across multiple computational cores (whether they are CPU or GPU cores), information has to pass between the cores at each timestep. This transfer of information adds a computational overhead. Therefore, a model never scales perfectly linearly, i.e. running on two cores won't be exactly twice as fast as running on a single core. This is discussed further on the pages HPC Scaling across CPU hardware and HPC Scaling on GPU hardware.
The smaller the model, the more the overhead of communicating between cores is likely to influence the model runtime. For a very small model that takes, say, 30 seconds to run, it may be faster to run a TUFLOW Classic model on a single CPU core than to run the same model using TUFLOW HPC on GPU hardware! However, as the model gets larger (more 2D cells), the influence of the overhead decreases and the benefit of using multiple cores increases.
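The effect of model size on scaling can be illustrated with a toy cost model. This is only a sketch and not how TUFLOW measures or predicts runtime: the function names, the per-cell compute cost and the per-timestep synchronisation cost are all hypothetical values chosen for illustration.

```python
def step_time(cells, cores, seconds_per_cell=1e-7, sync_seconds=1e-4):
    """Toy estimate of the wall-clock time for one timestep.

    seconds_per_cell and sync_seconds are made-up illustrative values.
    """
    compute = cells * seconds_per_cell / cores      # work divides across cores
    sync = sync_seconds if cores > 1 else 0.0       # communication does not
    return compute + sync


def speedup(cells, cores):
    """Speed-up relative to a single core under the toy model."""
    return step_time(cells, 1) / step_time(cells, cores)


small_model = speedup(1_000, 8)        # overhead dominates
large_model = speedup(1_000_000, 8)    # compute dominates
print(f"1,000 cells on 8 cores:     x{small_model:.2f}")
print(f"1,000,000 cells on 8 cores: x{large_model:.2f}")
```

Under these assumed costs the small model runs slower on eight cores than on one, while the large model approaches (but does not reach) an eight-fold speed-up. The sketch only covers the communication overhead; it does not account for the timestep differences between the two solvers.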

???? Chart of Classic runtime vs HPC.

Timestepping

The implicit nature of the TUFLOW Classic solver means that it can generally run with a Courant–Friedrichs–Lewy (CFL) number greater than 1.0. The explicit nature of TUFLOW HPC, however, generally requires a CFL number of less than 1.0. Therefore, in general TUFLOW HPC will require a smaller timestep than the same TUFLOW Classic model.
TUFLOW HPC uses an adaptive timestep: at each timestep the depths and velocities at each cell within the model are assessed and a 2D timestep is set to meet the stability criteria. TUFLOW Classic normally uses a fixed 2D timestep for the whole simulation.
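As a rough illustration of how a timestep can be chosen from cell depths and velocities, the sketch below uses the textbook 1D shallow water Courant number, C = (|u| + sqrt(g h)) dt / dx. This is an assumption made for illustration: the stability criteria actually applied by TUFLOW HPC are not reproduced here, and the function name and the target value of 0.9 are hypothetical.

```python
import math

GRAVITY = 9.81  # m/s^2


def cfl_limited_timestep(cell_size, depths, speeds, target_cfl=0.9):
    """Largest timestep (s) keeping every wet cell at or below target_cfl.

    Uses the textbook 1D shallow water Courant number; not TUFLOW's own
    stability checks.
    """
    # Fastest signal speed over all wet cells: flow speed plus wave celerity.
    fastest = max(
        (abs(u) + math.sqrt(GRAVITY * h) for h, u in zip(depths, speeds) if h > 0.0),
        default=0.0,
    )
    if fastest == 0.0:
        return math.inf  # every cell is dry, so nothing limits the timestep
    return target_cfl * cell_size / fastest


# Three cells of a 5 m grid: depths in metres, speeds in m/s.
dt = cfl_limited_timestep(5.0, depths=[0.5, 2.0, 4.0], speeds=[0.2, 1.0, 3.0])
print(f"timestep limited to {dt:.3f} s")
```

The deepest, fastest cell controls the result, which is why a single region of high depth or velocity can hold down the timestep for the whole model.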
Note: It is possible to run a TUFLOW HPC simulation using a fixed timestep. It is also possible to run a 2D only TUFLOW Classic simulation with an adaptive timestep. However, the majority of Classic simulations use a fixed timestep and the majority of HPC simulations use an adaptive timestep.
For simulations with large changes in hydraulic behaviour throughout the simulation, an adaptive timestep may be more efficient. For example, in a dambreak simulation the velocities and/or depths may be very high at the time of the breach (requiring a small timestep), whereas well after the breach the velocities and depths may be quite low (and could therefore tolerate a larger timestep). In this circumstance an adaptive timestep may be more efficient. For a tidal model, by contrast, the timestep may stay relatively constant throughout the simulation, so the benefit is likely to be smaller.
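The dambreak and tidal cases can be compared by counting timesteps. The sketch below is illustrative only: the wave speed histories are invented, the same Courant target is used for both approaches to isolate the effect of adapting the timestep, and the function names are hypothetical.

```python
import math


def fixed_step_count(wave_speeds, interval_s, cell_size_m, target_cfl=1.0):
    """Steps needed when one timestep must suit the fastest period."""
    dt = target_cfl * cell_size_m / max(wave_speeds)
    return math.ceil(interval_s * len(wave_speeds) / dt)


def adaptive_step_count(wave_speeds, interval_s, cell_size_m, target_cfl=1.0):
    """Steps needed when the timestep is re-chosen for each period."""
    return sum(
        math.ceil(interval_s / (target_cfl * cell_size_m / speed))
        for speed in wave_speeds
    )


# Invented fastest wave speed (m/s) in each 10 minute period of a 1 hour run.
DAMBREAK = [20.0, 10.0, 4.0, 2.0, 2.0, 2.0]   # violent at first, then calm
TIDAL = [5.0] * 6                             # roughly constant

for name, history in (("dambreak", DAMBREAK), ("tidal", TIDAL)):
    fixed = fixed_step_count(history, 600.0, 10.0)
    adaptive = adaptive_step_count(history, 600.0, 10.0)
    print(f"{name}: fixed={fixed} steps, adaptive={adaptive} steps")
```

With these made-up numbers the adaptive approach needs one third of the steps for the dambreak case and the same number of steps for the tidal case. The count ignores any per-step cost of choosing the timestep.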

Output Frequency

At each map output interval, data is processed and written to the specified output drive. For TUFLOW HPC simulations running on a GPU device, an additional step is required to "pull" the results from the GPU device. This creates an additional overhead for HPC simulations. If map output is written very frequently, this transfer of data from the GPU device may add a significant amount of time (and the output is also likely to take a large amount of space on the output drive).
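A back-of-envelope estimate shows how quickly the output interval drives up both the number of transfers and the volume of data. This is a sketch under stated assumptions, not a description of TUFLOW's output formats: it assumes every cell is written for every variable as an uncompressed 4 byte value, that an output is also written at time zero, and that each output costs a hypothetical fixed 0.5 seconds to pull from the GPU and write.

```python
def map_output_estimate(duration_s, interval_s, cells, variables,
                        bytes_per_value=4, seconds_per_output=0.5):
    """Return (number of outputs, raw bytes written, extra seconds).

    bytes_per_value and seconds_per_output are illustrative assumptions.
    """
    outputs = duration_s // interval_s + 1          # includes time zero
    raw_bytes = outputs * cells * variables * bytes_per_value
    extra_seconds = outputs * seconds_per_output
    return outputs, raw_bytes, extra_seconds


# A 24 hour simulation of 1,000,000 cells writing three map variables.
hourly = map_output_estimate(86_400, 3_600, 1_000_000, 3)
every_minute = map_output_estimate(86_400, 60, 1_000_000, 3)

for label, (outputs, raw_bytes, extra) in (("hourly", hourly),
                                           ("every minute", every_minute)):
    print(f"{label}: {outputs} outputs, {raw_bytes / 1e9:.1f} GB, +{extra:.1f} s")
```

Real output files may be smaller (for example if only wet cells are stored or the format is compressed), and the true cost per output depends on the hardware, so the figures are indicative of the trend only.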

???? chart or table of output frequency effects on runtime.