HPC FAQ

Revision as of 13:47, 15 March 2018 by Stephen.kime


Q. Does the HPC 2D solver produce the same results as the Classic 2D solver?
No, TUFLOW Classic uses a 2nd order ADI (Alternating Direction Implicit) finite difference solution of the 2D SWE, while the HPC solver uses a 2nd order explicit finite volume TVD (Total Variation Diminishing) solution (a 1st order HPC solution is also available). As there is no exact solution of the equations (hence all the different solvers!), the two schemes produce different results.

However, in 2nd order mode the two schemes are generally consistent with testing thus far indicating Classic and HPC 2nd order produce peak level differences usually within a few percentage points of the depth in the primary conveyance flow paths. Greater differences can occur in areas adjoining the main flow paths and around the edge of the inundation extent where floodwaters are still rising or are sensitive to a minor rise in main flow path levels, or where upstream controlled weir flow across thick or wide embankments occurs due to the different numerical approaches.

For deep, fast flowing waterways, 1st order HPC tends to produce higher water levels and steeper gradients compared with the Classic and HPC 2nd order solutions. These differences can exceed 10% of the primary flow path depth. Typically, lower Manning’s n values are required for HPC 1st order (or the original TUFLOW GPU), to achieve a similar result to TUFLOW Classic or HPC 2nd order.

Significant differences may occur at 2D HQ boundaries. Classic treats the 2D HQ boundary as one HQ boundary across the whole HQ line, setting a water level based on the total flow across the line. Due to model splitting to parallelise the 2D domain across CPU or GPU cores, HPC applies the HQ boundary slope to each individual cell along the boundary. As with all HQ boundaries, the effect of the boundary should be well away from the area of interest, and sensitivity testing carried out to demonstrate this.

Q. Do I need to change anything to run a Classic model in HPC?
For a single 2D domain model, no, other than inserting the .tcf commands:
Solution Scheme == HPC and, if running on a GPU device, Hardware == GPU.
HPC does not yet support multiple 2D domain models. Note that some of the more specialised or rarely used features are not yet incorporated into the HPC solver.
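As a minimal sketch, converting an existing single 2D domain .tcf could involve adding the lines below (command names as documented for TUFLOW control files; the comments are illustrative):

```
Solution Scheme == HPC  ! switch from the Classic to the HPC solver
Hardware == GPU         ! optional - only if running on a GPU device
```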

Q. Do I need to review or recalibrate if I switch from Classic to HPC?
Yes, if transitioning from Classic to HPC (or any other solver), it is best practice to compare the results, and if there are unacceptable differences, or the model calibration has deteriorated, to fine-tune the model performance through adjustment of key parameters.

Typically, between TUFLOW Classic and HPC 2nd order this would only require a slight adjustment to Manning’s n values, any additional form losses at bends/obstructions, or eddy viscosity values. Regardless, only industry standard Manning’s n values and other key parameters should be used or needed.

Use of non-standard values is a strong indicator there are other issues such as inflows, poor boundary representation or missing/erroneous topography. A greater adjustment of parameters would be expected if transitioning between HPC 1st order (or the original TUFLOW GPU) and Classic or HPC 2nd order.

Q. Why does my HPC simulation take longer, or is not much faster, than Classic?
If running on the same CPU hardware, a well-constructed Classic model on a good timestep is nearly always faster than HPC running on a single CPU thread. Running a single HPC simulation across multiple CPU threads may produce a faster simulation than Classic. The primary reasons why HPC may produce a slower run are discussed below.

Over utilisation of CPU threads/cores
This occurs when multiple HPC simulations are run across the same CPU threads. If, for example, you have 4 CPU threads on your computer and you run two simulations that both request 4 threads, then you are effectively overloading the CPU hardware by requesting 8 threads in total. This will slow down the simulations by a factor of approximately 2.

For this particular example, we'd suggest running both simulations using 2 threads each, noting that if you are performing other CPU intensive tasks on your machine, this also needs to be considered. You can control the number of threads requested either by using the -nt run time option, e.g. -nt2, or by using the .tcf command CPU Threads ==. The -nt run time option overrides CPU Threads ==. By default, the number of CPU threads taken is two (2).
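As an illustration of the example above, each of the two simulations could be limited to 2 threads from the command line (the executable and model file names below are assumptions and will vary with the TUFLOW version installed):

```
:: Windows batch sketch - each simulation requests 2 threads via -nt2
start "" TUFLOW_iSP_w64.exe -b -nt2 Model_A.tcf
start "" TUFLOW_iSP_w64.exe -b -nt2 Model_B.tcf
```

Equivalently, CPU Threads == 2 in each .tcf achieves the same, noting that -nt overrides it.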

Note: If Windows hyperthreading is active, there typically will be two threads for each physical core. For computationally intensive processes such as TUFLOW, it is recommended that hyperthreading is deactivated so there is one thread for each core, especially on dedicated modelling machines, where a performance improvement is likely.

Note: If Windows hyperthreading is active, there is little benefit in setting CPU Threads == to greater than the number of physical cores, and it also uses additional TUFLOW Thread licences for little or no gain in performance. Therefore, set CPU Threads == to no greater than the number of physical cores (which can be viewed in Task Manager).
Note: To request the maximum number of threads on a machine use CPU Threads == MAX if hyperthreading is deactivated and CPU Threads == MAX/2 if hyperthreading is active. The latter requests half the number of threads, which for most Windows machines is the same as the number of physical cores.

Low end GPU
If running a simulation using a low end or old GPU device (these are usually the graphics cards that come standard with computers), simulations can be only marginally faster, or even slower, than running Classic or HPC on CPU hardware. If running on a GPU device, high end NVidia graphics cards are strongly recommended. The performance of different NVidia cards varies by orders of magnitude – for benchmark tests using the original TUFLOW GPU solver, review the Hardware Benchmarking Wiki page.

Q. The HPC adaptive timestepping is selecting very small timesteps
Common reasons why HPC adopts a very small timestep are provided below. To review and isolate the location of the minimum timestep, the timesteps are output to:

  • Console window and .hpc.tlf file
  • .hpc.dt.csv file (this file contains every timestep)
  • Minimum dt map output (excellent for identifying the location of the minimum timestep adopted – add “dt” to Map Output Data Types ==)
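For example, to add the minimum timestep map output alongside typical outputs in the .tcf (the other data types shown are illustrative only):

```
Map Output Data Types == h v d dt  ! "dt" adds the minimum timestep map output
```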


Common reasons for selecting very small timesteps are:
The model has one or more erroneous deep cells. The Celerity Control Number described further above reduces the timestep proportionally to the square root of the depth, so any unintended deep cells can cause a reduction in the timestep.
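To illustrate the depth dependence, below is a simplified sketch of a celerity-based timestep limit (the actual HPC control number formulation is more involved; the function name, cell size, depths and control number value are illustrative only):

```python
import math

def celerity_dt(dx, depth, g=9.81, control_number=1.0):
    """Timestep limit from the shallow water wave celerity sqrt(g*h).

    Simplified sketch only: dt scales with dx and with 1/sqrt(depth).
    """
    return control_number * dx / math.sqrt(g * depth)

# A 10 m cell at 2 m depth versus an erroneous 100 m deep cell:
dt_normal = celerity_dt(dx=10.0, depth=2.0)    # ~2.26 s
dt_deep = celerity_dt(dx=10.0, depth=100.0)    # ~0.32 s
# 50x the depth cuts the controlling timestep by sqrt(50), roughly 7x
```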

Poorly configured or schematised 2D boundary or 1D/2D link causing uncontrolled or inaccurate flow patterns. The high velocities may cause the Courant Number to control the timestep or the high velocity differentials can cause the Diffusion Number to force the timestep downwards. In these situations Classic would often become unstable alerting the modeller to an issue. However, HPC will remain stable relying on the modeller to perform more thorough reviews of flow patterns at boundaries and 1D/2D links.

If using the SRF (Storage Reduction Factor), this proportionally reduces the Δx and Δy length values in the control number formulae. This may further reduce the minimum timestep if a cell with an SRF value greater than 0.0 is the controlling cell. For example, applying an SRF of 0.8 to reduce the storage of a cell by 80% (a factor of 5) also reduces the controlling timestep for that cell by a factor of 5.
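A small sketch of the arithmetic behind the example above (the function is illustrative only, not a TUFLOW API):

```python
def srf_timestep_factor(srf):
    """Factor by which a cell's controlling timestep is reduced by the SRF.

    Sketch only: the SRF proportionally reduces the cell's dx and dy in
    the control number formulae, so the timestep scales with (1 - SRF).
    """
    if not 0.0 <= srf < 1.0:
        raise ValueError("SRF must be in the range [0.0, 1.0)")
    return 1.0 / (1.0 - srf)

# An SRF of 0.8 (an 80% storage reduction) cuts the controlling
# timestep for that cell by a factor of approximately 5:
factor = srf_timestep_factor(0.8)
```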

Q. I’m familiar with Classic, so do I need to be aware of anything different with HPC?
Yes! TUFLOW Classic tells you where your model has deficient or erroneous data, or where the model is poorly set up, by going unstable or producing a high mass error (a sign of poor numerical convergence of the matrix solution). The best approach when developing a Classic model is to keep the timestep high (typically a half to a quarter of the cell size in metres), and if the simulation becomes unstable, to investigate why. In most cases there will be erroneous data or a poor set up, such as a badly orientated boundary, connecting a large 1D culvert to a single SX cell, etc.

HPC, however, remains stable and does not alert the modeller to these issues. Therefore, the following tips are highly recommended, otherwise there is a strong likelihood that any deficient aspects of the modelling won’t be found until much further down the track, potentially causing costly reworking. So, it’s very much modeller beware!

  • Use of excessively small timesteps is a strong indicator – see discussion further above.
  • If the timestepping is erratic (i.e. not changing smoothly), or there is a high occurrence of repeated timesteps, these are indicators of an issue in the model data or set up.
  • Be more thorough in reviewing the model results. Although this is best practice for any modelling, it is paramount for unconditionally stable solvers like HPC that thorough checks of the model’s flow patterns and performance at boundaries and links are carried out.
  • The CME%, which is an excellent indicator that the Classic 2D solver is numerically converging, is not generally of use for HPC, which is volume conserving and effectively 0% subject to numerical precision. Non-zero whole of model CME% for HPC 1D/2D linked models is usually an indication of either the 1D and 2D adaptive timesteps being significantly different, or a poorly configured 1D/2D link.