Hardware Benchmarking Topic Single Precision VS Double Precision

=Introduction=
The TUFLOW Classic/HPC program download includes both single and double precision versions of the executable:
* TUFLOW_iSP_w64.exe = Single Precision
* TUFLOW_iDP_w64.exe = Double Precision
This page of the Wiki discusses the functional difference between the two, when one should be used instead of the other, and the speed and memory performance differences. <br><br>
Both TUFLOW Classic and TUFLOW HPC can run using either single precision (SP) or double precision (DP). When storing floating point values on a computer, a certain number of bytes per value is needed. Single precision numbers use 4 bytes and double precision numbers use 8 bytes. This yields from 6 to 9 significant digits of precision for single precision and 15 to 17 digits for double. This page discusses the relative difference in performance of the SP and DP versions of TUFLOW, including comparisons for TUFLOW Classic, TUFLOW HPC on CPU hardware and TUFLOW HPC on GPU hardware.<br>
<br>
'''Note''' Single precision calculations are also referred to as FP32 (32 bit floating point) and double precision as FP64 (64 bit floating point) calculations. This terminology is more common in GPU benchmarks.<br>
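The storage size and precision difference can be demonstrated outside of TUFLOW. The short Python sketch below (illustrative only, using numpy's float32/float64 types; not TUFLOW code) shows the 4 byte versus 8 byte storage and the approximate number of reliable significant digits for each.
<syntaxhighlight lang="python">
# Illustrative only (not TUFLOW code): storage size and precision of
# single precision (FP32) versus double precision (FP64) values.
import numpy as np

sp = np.float32(1.0) / np.float32(3.0)   # single precision, 4 bytes per value
dp = np.float64(1.0) / np.float64(3.0)   # double precision, 8 bytes per value

print(np.float32(0).nbytes, np.float64(0).nbytes)  # 4 8
print(f"{sp:.20f}")  # 0.33333334326744079590 -> ~7 significant digits are reliable
print(f"{dp:.20f}")  # 0.33333333333333331483 -> ~16 significant digits are reliable
</syntaxhighlight>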
<br>
=Benchmark Model=
The benchmark model used for this testing is based on a “challenge” issued prior to the 2012 Flood Managers Association (FMA) Conference in Sacramento, USA. There is more information on the model setup and purpose in the <u>[[FMA_Challenge_Models_Introduction | FMA challenge model introduction]]</u>. This hardware benchmark is based on the second challenge, which involves a coastal river in flood with two ocean outlets. The model has been modified slightly (mainly in terms of the outputs). It is set up to run with both TUFLOW Classic (CPU) and TUFLOW HPC (on both CPU and GPU hardware), using a 20m cell size and 181,981 2D cells. The model runs for three days of simulation time (72 hours) and outputs XMDF data every two hours. <br>
<br>
=Benchmark Results=
==TUFLOW Classic==
The table below has runtimes for the benchmark model at 20m cell size. The same model has been run with both the SP and DP versions of TUFLOW using the Classic solution scheme on CPU hardware. This same test has been performed on a number of CPU chips.
{| class="wikitable"
! style="background-color:#005581; font-weight:bold; color:white;" | CPU
! style="background-color:#005581; font-weight:bold; color:white;" width=15% | SP Runtime (mins)
! style="background-color:#005581; font-weight:bold; color:white;" width=15% | DP Runtime (mins)
! style="background-color:#005581; font-weight:bold; color:white;" width=15% | % Change
|-
|AMD Ryzen Threadripper 2990WX 32-Core Processor || 65.8 || 80.3 || 22.0
|-
|Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz || 71.7 || 87.4 || 21.9
|-
|Intel(R) Core(TM) i7-6800K CPU @ 3.40 GHz || 90.0 || 119.1 || 32.3
|-
|Intel(R) Core(TM) i7-6900K CPU @ 3.20 GHz || 90.5 || 109.3 || 20.7
|-
|Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz || 91.3 || 115.2 || 26.2
|-
|Intel(R) Core(TM) i7-5960X CPU @ 3.00 GHz || 101.9 || 128.8 || 26.4
|-
|Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz || 121.4 || 158.1 || 30.2
|-
|Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40 GHz || 127.2 || 158.0 || 24.2
|-
|Intel(R) Xeon(R) CPU X5680 @ 3.33 GHz || 162.1 || 207.6 || 28.1
|}
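The % Change column appears to be the percentage increase in runtime when moving from SP to DP, i.e. (DP - SP) / SP * 100; this is an inference from the tabulated values rather than a documented definition. A minimal check in Python:
<syntaxhighlight lang="python">
# Assumed definition of the "% Change" column: percentage increase in
# runtime of the DP build relative to the SP build.
def pct_change(sp_runtime_mins: float, dp_runtime_mins: float) -> float:
    return (dp_runtime_mins - sp_runtime_mins) / sp_runtime_mins * 100.0

# Check against the i7-7700K Classic result above (71.7 min SP, 87.4 min DP).
print(round(pct_change(71.7, 87.4), 1))  # 21.9
</syntaxhighlight>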
 
==TUFLOW HPC on CPU Hardware==
The table below has runtimes for the benchmark model at 20m cell size. The same model has been run with both the SP and DP versions of TUFLOW using the HPC solution scheme on CPU hardware. This same test has been performed on a number of CPU chips.<br>
'''Note''' The GPU code has been compiled for CPU execution so users can trial the solver without access to an NVIDIA GPU if necessary, but the solver has been designed first and foremost for Highly Parallel Compute on GPU hardware.
{| class="wikitable"
! style="background-color:#005581; font-weight:bold; color:white;" | CPU
! style="background-color:#005581; font-weight:bold; color:white;" width=15% | SP Runtime (mins)
! style="background-color:#005581; font-weight:bold; color:white;" width=15% | DP Runtime (mins)
! style="background-color:#005581; font-weight:bold; color:white;" width=15% | % Change
|-
|Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz || 216.8 || 230.9 || 6.5
|-
|Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz || 221.8 || 254.3 || 14.7
|-
|Intel(R) Core(TM) i7-5960X CPU @ 3.00 GHz || 236.9 || 260.3 || 9.9
|-
|Intel(R) Core(TM) i7-6800K CPU @ 3.40 GHz || 278.3 || 291.4 || 4.7
|-
|AMD Ryzen Threadripper 2990WX 32-Core Processor || 278.8 || 347.7 || 24.7
|-
|Intel(R) Core(TM) i7-6900K CPU @ 3.20 GHz || 286.0 || 328.9 || 15.0
|-
|Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz || 298.2 || 322.6 || 8.2
|-
|Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40 GHz || 307.2 || 350.6 || 12.4
|-
|Intel(R) Xeon(R) CPU X5680 @ 3.33 GHz || 404.0 || 466.4 || 15.5
|}
 
==TUFLOW HPC on GPU Hardware==
For GPU devices, the quoted performance can be very different for single and double precision calculations. The table below has runtimes for the benchmark model at 20m cell size. The same model has been run with both the SP and DP versions of TUFLOW using the HPC solution scheme on GPU hardware. This same test has been performed on a number of different GPU cards.<br>
'''Note''' In some cases, output drive writing speed can noticeably affect runtimes, especially when writing intermediate results. For example, the runs on MS Azure Cloud were three to seven times slower than shown in the table below when a network storage drive was used instead of a local disk.
{| class="wikitable"
! style="background-color:#005581; font-weight:bold; color:white;" | GPU Card
! style="background-color:#005581; font-weight:bold; color:white;" width=15% | SP Runtime (mins)
! style="background-color:#005581; font-weight:bold; color:white;" width=15% | DP Runtime (mins)
! style="background-color:#005581; font-weight:bold; color:white;" width=15% | % Change
|-
|NVIDIA Tesla V100 (MS Azure Cloud) || 3.2 || 4.3 || 31.4
|-
|NVIDIA GeForce RTX 2080 Ti || 5.1 || 11.3 || 123.1
|-
|NVIDIA TITAN Xp || 5.7 || 10.6 || 87.6
|-
|NVIDIA GeForce RTX 2080 SUPER || 7.0 || 14.0 || 100.5
|-
|NVIDIA GeForce RTX 2080 || 7.6 || 16.1 || 111.4
|-
|NVIDIA GeForce RTX 2070 || 8.9 || 18.4 || 107.3
|-
|NVIDIA GeForce GTX 1080 Ti || 9.4 || 14.6 || 55.4
|-
|NVIDIA Tesla K80 (MS Azure Cloud) || 10.8 || 15.4 || 42.3
|-
|NVIDIA GeForce GTX 1080 || 11.3 || 18.3 || 61.8
|-
|NVIDIA Quadro RTX 4000 || 17.6 || 18.2 || 3.4
|-
|NVIDIA GeForce GTX 980 || 17.7 || 29.8 || 68.0
|-
|NVIDIA GeForce GTX 750 Ti || 28.6 || 72.8 || 60.8
|-
|NVIDIA GeForce 940MX (Laptop) || 71.3 || 156.2 || 118.9
|-
|NVIDIA GeForce 840M (Laptop) || 89.2 || 180.2 || 101.9
|}
 
<br>
=Conclusion=
Simulation speed differences between single and double precision compute vary depending on both the computational scheme and the hardware being used for the simulation. Nevertheless, in general terms double precision calculations take slightly longer and require more memory for the field data; the memory requirement of DP is almost twice that of SP. There are a number of specific situations that require DP compute for TUFLOW Classic (discussed in the following sections). With the exception of those particular cases, if the results of a model run in both the SP and DP versions of TUFLOW prove to be similar (as is generally the case), the SP version of TUFLOW is recommended as it will be slightly faster and will enable larger models to be run within the available CPU/GPU memory.<br>
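As an illustration of the memory difference only, the sketch below estimates the storage needed for the floating point field arrays of a model the size of the benchmark model. The number of arrays held per cell is an assumed figure for illustration, not a TUFLOW value.
<syntaxhighlight lang="python">
# Rough illustration only: why the DP memory requirement is roughly twice SP.
# "arrays_per_cell" is an assumption for this sketch, not a TUFLOW figure.
cells = 181_981          # 2D cell count of the benchmark model
arrays_per_cell = 10     # assumed number of floating point arrays stored per cell

sp_megabytes = cells * arrays_per_cell * 4 / 1e6   # 4 bytes per SP value
dp_megabytes = cells * arrays_per_cell * 8 / 1e6   # 8 bytes per DP value

print(round(sp_megabytes, 1), round(dp_megabytes, 1))  # 7.3 14.6 -> roughly double
</syntaxhighlight>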
==TUFLOW Classic==
Under some situations TUFLOW Classic will require double precision compute to achieve an accurate solution. These situations include:
* Models with ground elevations greater than 100 m or ft (depending on the length unit used by your model); and
* Direct rainfall modelling.
TUFLOW Classic uses water level as the conserved variable in its implicit solution scheme. Because of this, some numerical precision can be lost in the above situations if the single precision version is used. The loss of solution precision will be apparent as high mass balance errors in the simulation log and result files. The single precision version can be used in all other situations without loss of accuracy or mass balance issues. For TUFLOW Classic (CPU hardware only), double precision increases simulation run times by approximately 20% on average compared to single precision.
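The underlying floating point behaviour can be illustrated with a short sketch (not TUFLOW code): adding a very small rainfall depth to a large water level in single precision can lose the increment entirely, whereas double precision retains it.
<syntaxhighlight lang="python">
# Illustrative only (not TUFLOW code): a small rainfall depth added to a large
# water level is lost in single precision (FP32) but retained in double (FP64).
import numpy as np

water_level_m = 250.0        # a water level well above 100 m
rainfall_step_m = 1.0e-6     # e.g. 1 mm/hr of rain applied over a 3.6 s timestep

sp = np.float32(water_level_m) + np.float32(rainfall_step_m)
dp = np.float64(water_level_m) + np.float64(rainfall_step_m)

print(sp - np.float32(water_level_m))  # 0.0    -> rainfall volume lost (shows up as mass error)
print(dp - np.float64(water_level_m))  # ~1e-06 -> rainfall volume retained
</syntaxhighlight>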
==TUFLOW HPC==
Unlike TUFLOW Classic, single precision compute will be suitable for the majority of applications when using TUFLOW HPC, with no loss of accuracy. TUFLOW HPC uses depth as the conserved variable in its explicit solution scheme. As a result, the precision issues associated with applying a very small rainfall volume in a single timestep, or with modelling at high elevations, do not apply to HPC (see the sketch below). Note that a reduced wet/dry depth of 0.0002m (0.0007ft) is still recommended for direct rainfall models. When TUFLOW HPC is used on CPU hardware the differences in simulation speed between single and double precision range from 5% to 25% depending on the processor specifications. Running TUFLOW HPC on GPU hardware shows even greater simulation speed differences between single and double precision. The precision required for a simulation will therefore determine the type of GPU card that is best suited for the compute. For any given generation/architecture of cards, the “gaming” cards such as the GeForce GTX and RTX series provide excellent single precision performance, typically comparable to that of the “scientific” cards such as the Tesla series. If double precision is required, the scientific cards are substantially faster, though it is worth noting that they are also significantly more expensive. The Quadro series of GPU cards currently tends to represent a middle ground between the “gaming” and “scientific” cards, both in terms of double precision performance and cost.<br>
<br>
'''Note''' Predicting how a particular machine will perform, or estimating runtimes based purely on the hardware specification, is not possible as the TUFLOW code is very complex.
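As a companion to the sketch in the TUFLOW Classic section above, the following (again illustrative only, not TUFLOW code) shows that with depth as the working variable the same small rainfall increment is retained even in single precision.
<syntaxhighlight lang="python">
# Illustrative only: with depth (a small magnitude value) as the conserved
# variable, the small rainfall increment survives in single precision (FP32).
import numpy as np

depth_m = np.float32(0.0002)          # of the order of the reduced wet/dry depth
rainfall_step_m = np.float32(1.0e-6)  # same small rainfall depth as before

print(depth_m + rainfall_step_m - depth_m)  # ~1e-06 -> increment retained in FP32
</syntaxhighlight>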
