HPC FAQ: Difference between revisions

HPC still does a small amount of work on the CPU, such as model initialisation and the final step of data reduction for model volume, control numbers, and stability checks. Frequent map outputs, particularly for large datasets, can also lower GPU utilisation because outputs are written on the CPU. Even for an ideal 2D-only model, 100% GPU utilisation is not achievable. If the model contains any 1D features, GPU utilisation will be lower still, as 1D is processed on the CPU only. A model with a 1D ESTRY connection can potentially do a lot of its work on the CPU, perhaps as much as 90% CPU and 10% GPU. If the CPU hardware is not well matched to the GPU card, it can become a bottleneck for HPC-GPU runs, even with only a few 1D elements. We are investigating the possibility of parallelising 1D in future releases so that it can run on the GPU.<br>
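As a rough illustration of why work left on the CPU limits what a faster GPU can deliver, the sketch below applies Amdahl's law. It is not TUFLOW code: the function and parameter names (<code>overall_speedup</code>, <code>cpu_fraction</code>, <code>gpu_speedup</code>) are hypothetical, and it assumes the CPU-only part and the GPU part run one after the other with no overlap.

```python
def overall_speedup(cpu_fraction, gpu_speedup):
    """Amdahl's law for a run split into CPU-only work and GPU-accelerated work.

    cpu_fraction: share of the original run time that stays on the CPU (0..1).
    gpu_speedup:  how many times faster the remaining work runs on the GPU.
    Both values are illustrative assumptions, not measured TUFLOW figures.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if gpu_speedup <= 0.0:
        raise ValueError("gpu_speedup must be positive")
    gpu_fraction = 1.0 - cpu_fraction
    # The CPU-only share is unchanged; only the GPU share shrinks.
    return 1.0 / (cpu_fraction + gpu_fraction / gpu_speedup)


if __name__ == "__main__":
    # Compare a mostly-2D model with one dominated by CPU-only work.
    for cpu_fraction in (0.05, 0.5, 0.9):
        print(cpu_fraction, round(overall_speedup(cpu_fraction, 100.0), 2))
```

Under these assumptions, the overall speed-up can never exceed <code>1 / cpu_fraction</code>, however fast the GPU is, which is why a CPU-heavy 1D component dominates the run time.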
<br>
 
=Why can't TUFLOW Classic be parallelised like TUFLOW HPC?=
This is due to its implicit solution using matrices: some steps in the calculations have dependencies within the numerical loops, so they cannot be parallelised, or are difficult to parallelise, with any worthwhile benefit. We have started work on parallelising sections of the code, but the reduction in run times would not be as significant as with an explicit scheme. Explicit schemes (like TUFLOW GPU or FV) have no dependencies in their numerical loops: none of the variables on the right-hand side of the equation appear on the left (i.e. everything on the right-hand side is from the previous timestep, except for values at the model’s boundaries).<br>
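The loop dependency described above can be sketched with a generic 1D diffusion equation. This is a minimal illustration, not the TUFLOW Classic or HPC solver: the function names, the coefficient <code>r</code>, and the choice of fixed boundary values are all assumptions made for the example. The explicit step reads only previous-timestep values, so cells can be updated in any order (and therefore in parallel). The implicit step solves a tridiagonal system with the Thomas algorithm, where each loop iteration needs the result of the one before it.

```python
def explicit_step(h_old, r, order=None):
    """Forward-Euler diffusion step: the right-hand side uses only h_old."""
    n = len(h_old)
    h_new = list(h_old)  # boundary cells are held fixed
    indices = order if order is not None else range(1, n - 1)
    for i in indices:
        # No h_new value is read here, so the visiting order does not matter.
        h_new[i] = h_old[i] + r * (h_old[i - 1] - 2.0 * h_old[i] + h_old[i + 1])
    return h_new


def implicit_step(h_old, r):
    """Backward-Euler diffusion step solved with the Thomas algorithm.

    Interior unknowns satisfy
        -r*h[i-1] + (1 + 2r)*h[i] - r*h[i+1] = h_old[i]
    with the two boundary cells held fixed.
    """
    n = len(h_old)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    d_prime[0] = h_old[0]  # boundary row: h[0] = h_old[0]
    for i in range(1, n - 1):
        # Forward sweep: iteration i needs the results of iteration i - 1.
        denom = (1.0 + 2.0 * r) + r * c_prime[i - 1]
        c_prime[i] = -r / denom
        d_prime[i] = (h_old[i] + r * d_prime[i - 1]) / denom
    h_new = list(h_old)
    for i in range(n - 2, 0, -1):
        # Back substitution: iteration i needs h_new[i + 1].
        h_new[i] = d_prime[i] - c_prime[i] * h_new[i + 1]
    return h_new


if __name__ == "__main__":
    h = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(explicit_step(h, 0.25))
    print(explicit_step(h, 0.25, order=[3, 1, 2]))  # same result, any order
    print(implicit_step(h, 0.25))
```

Reordering the explicit loop leaves the result unchanged, whereas the two sweeps in <code>implicit_step</code> must run in sequence. That sequential chain is the kind of dependency that makes an implicit scheme hard to spread across cores.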
It is important to understand that different schemes can have vastly different run times, and that being parallelised does not necessarily make one scheme faster than another:
*Implicit schemes like TUFLOW Classic use much bigger timesteps than explicit schemes, which is why, in a like-for-like comparison on a single core, they are faster, and often a lot faster, than explicit schemes.
*An explicit scheme that is parallelised will run a single simulation faster by around a factor of 5 on an 8 core machine. You will never get a speed-up of 8 on an 8 core machine, as there is an overhead in managing the computations across the cores.
*Users are often running two or more simulations at the same time, for example different events (100, 20 year…, different durations, etc). In these situations, even if a scheme is parallelised, it is usually better, and sometimes much better, to run each simulation unparallelised on its own core. For example, if you have four simulations and four cores, definitely don’t run them parallelised; run all four at once unparallelised. If a fifth simulation is then started, it will slow down the other simulations.<br>
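The batch argument in the points above comes down to simple arithmetic, sketched here. The function <code>batch_wall_time</code> and its parameters are hypothetical, and the model is deliberately simplified: it assumes every simulation takes the same single-core time and that simulations running side by side do not slow each other down through shared memory or disk access.

```python
import math


def batch_wall_time(n_sims, n_cores, single_core_time, parallel_speedup, parallelised):
    """Estimate the wall-clock time to finish a batch of identical simulations.

    parallelised=True : simulations run one after another, each using all cores
                        and finishing parallel_speedup times faster.
    parallelised=False: each simulation runs on its own core, in as many
                        waves as are needed to get through the batch.
    """
    if parallelised:
        return n_sims * single_core_time / parallel_speedup
    waves = math.ceil(n_sims / n_cores)
    return waves * single_core_time


if __name__ == "__main__":
    # Eight simulations on an 8 core machine, using the factor of 5 quoted above;
    # the 10 hour single-core run time is an arbitrary illustrative value.
    one_at_a_time = batch_wall_time(8, 8, 10.0, 5.0, parallelised=True)
    all_at_once = batch_wall_time(8, 8, 10.0, 5.0, parallelised=False)
    print(one_at_a_time, all_at_once)
```

With these assumed numbers the unparallelised batch finishes first, because no time is lost to the overhead of coordinating cores. Starting one more simulation than there are cores adds a further wave in this model, which corresponds to the slowdown noted for a fifth simulation on four cores.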
<br>
 
{{Tips Navigation
|uplink=[[ HPC_Modelling_Guidance | Back to HPC Modelling Guidance]]