Hardware Benchmarking Topic: Single Precision vs Double Precision
Introduction
Both TUFLOW Classic and TUFLOW HPC can be run in either a single precision (SP) or a double precision (DP) version. Storing a floating point value on a computer requires a fixed number of bytes per value: single precision numbers use 4 bytes and double precision numbers use 8 bytes. This gives about 7 significant decimal digits of precision for single precision and about 16 for double precision.
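The byte counts and digit counts above can be checked with Python's standard `struct` module, which can round a value to single precision. This is a minimal sketch assuming the standard IEEE 754 single and double formats (24-bit and 53-bit significands); the helper `to_float32` is hypothetical and not part of TUFLOW.

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Storage size per value, in bytes
sp_bytes = struct.calcsize("<f")   # 4
dp_bytes = struct.calcsize("<d")   # 8

# Approximate decimal digits = significand bits * log10(2)
sp_digits = 24 * math.log10(2)     # about 7.2
dp_digits = 53 * math.log10(2)     # about 16

value = 1.0 / 3.0
print(f"SP ({sp_bytes} bytes, ~{sp_digits:.1f} digits): {to_float32(value):.17f}")
print(f"DP ({dp_bytes} bytes, ~{dp_digits:.1f} digits): {value:.17f}")
```

The SP line agrees with 1/3 to about 7 significant digits, while the DP line agrees to about 16.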
This page discusses the relative performance of the SP and DP versions of TUFLOW, with comparisons for TUFLOW Classic, TUFLOW HPC on CPU hardware and TUFLOW HPC on GPU hardware.
In addition to longer runtimes, the double precision version of TUFLOW requires significantly more memory to run a simulation: the memory requirement of DP is almost twice that of SP. Therefore, if a model produces similar results in both the SP and DP versions of TUFLOW, the SP version is recommended to take advantage of its faster simulation times.
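A rough estimate shows why DP memory is "almost" rather than exactly twice SP memory: only the floating point arrays double in size. This is a sketch under hypothetical assumptions; the cell count, the number of floating point and integer arrays, and the assumption that integer arrays stay at 4 bytes in both versions are illustrative and do not describe TUFLOW's actual internal data.

```python
# Hypothetical model size: these counts are illustrative, not TUFLOW's internals.
N_CELLS = 2_000_000     # number of 2D grid cells
N_FLOAT_ARRAYS = 20     # per-cell floating point arrays (levels, velocities, ...)
N_INT_ARRAYS = 4        # per-cell integer arrays (codes, flags), assumed 4 bytes in both versions

def model_memory_mib(float_bytes: int, int_bytes: int = 4) -> float:
    """Rough memory estimate in MiB for the hypothetical model above."""
    bytes_per_cell = N_FLOAT_ARRAYS * float_bytes + N_INT_ARRAYS * int_bytes
    return N_CELLS * bytes_per_cell / 1024**2

sp_mib = model_memory_mib(float_bytes=4)   # single precision
dp_mib = model_memory_mib(float_bytes=8)   # double precision
ratio = dp_mib / sp_mib

print(f"SP: {sp_mib:.0f} MiB, DP: {dp_mib:.0f} MiB, DP/SP = {ratio:.2f}")
# prints: SP: 183 MiB, DP: 336 MiB, DP/SP = 1.83
```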
Note: Single precision calculations are also referred to as FP32 (32-bit floating point) and double precision calculations as FP64 (64-bit floating point). This terminology is more common in GPU benchmarks.
TUFLOW Classic
The table below lists runtimes for the benchmark model at a 20 m cell size. The same model was run with both the SP and DP versions of TUFLOW using the Classic solution scheme on CPU hardware, on a number of different CPUs. The % Change column is the increase in DP runtime relative to the SP runtime.
CPU | SP Runtime (mins) | DP Runtime (mins) | % Change |
---|---|---|---|
Intel(R) Core(TM) i7-6900K CPU @ 3.20 GHz | 90.5 | 109.3 | 20.7 |
Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz | 71.7 | 87.4 | 21.9 |
AMD Ryzen Threadripper 2990WX 32-Core Processor | 65.8 | 80.3 | 22.0 |
Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40 GHz | 127.2 | 158.0 | 24.2 |
Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz | 91.3 | 115.2 | 26.2 |
Intel(R) Core(TM) i7-5960X CPU @ 3.00 GHz | 101.9 | 128.8 | 26.4 |
Intel(R) Xeon(R) CPU X5680 @ 3.33 GHz | 162.1 | 207.6 | 28.1 |
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz | 121.4 | 158.1 | 30.2 |
Intel(R) Core(TM) i7-6800K CPU @ 3.40 GHz | 90.0 | 119.1 | 32.3 |
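The % Change values appear to be computed as (DP - SP) / SP × 100. That formula reproduces the rows of the table, but it is inferred from the numbers rather than stated explicitly on this page. The sketch below applies it to three rows copied from the TUFLOW Classic table; the function name `pct_increase` is hypothetical.

```python
def pct_increase(sp_minutes: float, dp_minutes: float) -> float:
    """Percentage by which the DP runtime exceeds the SP runtime."""
    return (dp_minutes - sp_minutes) / sp_minutes * 100.0

# (SP, DP) runtimes in minutes, copied from the TUFLOW Classic table
classic_runtimes = {
    "Intel Core i7-7700K": (71.7, 87.4),
    "AMD Ryzen Threadripper 2990WX": (65.8, 80.3),
    "Intel Core i7-6800K": (90.0, 119.1),
}

for cpu, (sp, dp) in classic_runtimes.items():
    change = pct_increase(sp, dp)
    # A change of 100% means the DP run takes twice as long as the SP run
    print(f"{cpu}: {change:.1f}% (DP/SP = {dp / sp:.2f})")
```

This reproduces the 21.9%, 22.0% and 32.3% values in the table. On the same scale, the GPU results above 100% mean the DP run took more than twice as long as the SP run.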
TUFLOW HPC on CPU hardware
The table below lists runtimes for the benchmark model at a 20 m cell size. The same model was run with both the SP and DP versions of TUFLOW using the HPC solution scheme on CPU hardware, on the same set of CPUs.
CPU | SP Runtime (mins) | DP Runtime (mins) | % Change |
---|---|---|---|
Intel(R) Core(TM) i7-6800K CPU @ 3.40 GHz | 278.3 | 291.4 | 4.7 |
Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz | 216.8 | 230.9 | 6.5 |
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz | 298.2 | 322.6 | 8.2 |
Intel(R) Core(TM) i7-5960X CPU @ 3.00 GHz | 236.9 | 260.3 | 9.9 |
Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40 GHz | 307.2 | 350.6 | 12.4 |
Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz | 221.8 | 254.3 | 14.7 |
Intel(R) Core(TM) i7-6900K CPU @ 3.20 GHz | 286.0 | 328.9 | 15.0 |
Intel(R) Xeon(R) CPU X5680 @ 3.33 GHz | 404.0 | 466.4 | 15.5 |
AMD Ryzen Threadripper 2990WX 32-Core Processor | 278.8 | 347.7 | 24.7 |
TUFLOW HPC on GPU hardware
The quoted performance of GPU devices can differ greatly between single and double precision calculations. The table below lists runtimes for the benchmark model at a 20 m cell size. The same model was run with both the SP and DP versions of TUFLOW using the HPC solution scheme on GPU hardware, on a number of different GPU cards.
GPU Card | SP Runtime (mins) | DP Runtime (mins) | % Change |
---|---|---|---|
NVIDIA GeForce GTX 1080 Ti | 9.4 | 14.6 | 55.4 |
NVIDIA GeForce GTX 750 Ti | 28.6 | 72.8 | 60.8 |
NVIDIA GeForce GTX 1080 | 11.3 | 18.3 | 61.8 |
NVIDIA GeForce GTX 980 | 17.7 | 29.8 | 68.0 |
NVIDIA TITAN Xp | 5.7 | 10.6 | 87.6 |
NVIDIA GeForce 840M | 89.2 | 180.2 | 101.9 |
NVIDIA GeForce RTX 2070 | 8.9 | 18.4 | 107.3 |
NVIDIA GeForce RTX 2080 | 7.6 | 16.1 | 111.4 |
NVIDIA GeForce 940MX | 71.3 | 156.2 | 118.9 |
Conclusion
Running TUFLOW Classic (CPU hardware only) consistently gives DP runtimes at least 20% longer than SP runtimes. It is recommended to use double precision for TUFLOW Classic for all rain on grid models and for models with elevations over 100 mAHD. The need for double precision may become apparent if high mass balance errors are reported when the model is run in single precision.
Because of its explicit scheme, TUFLOW HPC performs its calculations using water depth, whereas TUFLOW Classic's implicit scheme uses water level. This means that the precision issues associated with applying very small rainfall depths at high elevations do not apply to HPC. Unless testing shows otherwise, the single precision version of TUFLOW should be used for all HPC simulations. When TUFLOW HPC is run on CPU hardware, DP runtimes are about 5% to 25% longer than SP runtimes, depending on the processor.
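The following sketch illustrates the round-off mechanism behind this difference, using Python's `struct` module to emulate single precision arithmetic. It is a simplified illustration, not TUFLOW's actual numerical scheme: the 150 m ground elevation, the 1e-6 m of rainfall per timestep and the 1000 timesteps are hypothetical values. Near 150 the spacing between adjacent single precision values is about 1.5e-5, so adding 1e-6 to a water level rounds straight back to the same level, while adding the same amount to a depth that starts at zero accumulates normally.

```python
import struct

def f32(x: float) -> float:
    """Round to the nearest IEEE 754 single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

ground_elevation = 150.0   # m, hypothetical cell above 100 mAHD
rain_per_step = 1.0e-6     # m of rainfall added each timestep (hypothetical)
n_steps = 1000

# Level-based update in SP: add the rainfall to the water surface level
level = f32(ground_elevation)
for _ in range(n_steps):
    level = f32(level + f32(rain_per_step))
depth_from_level = level - ground_elevation

# Depth-based update in SP: add the rainfall to the water depth
depth = f32(0.0)
for _ in range(n_steps):
    depth = f32(depth + f32(rain_per_step))

print(f"exact rainfall depth:  {n_steps * rain_per_step:.6f} m")
print(f"SP level-based result: {depth_from_level:.6f} m")
print(f"SP depth-based result: {depth:.6f} m")
```

In the level-based update all 1 mm of rainfall is lost to round-off, which is the kind of error that shows up as a mass balance problem; the depth-based update retains it.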
Running TUFLOW HPC on GPU hardware shows even larger differences: DP runtimes are about 55% to 119% longer than SP runtimes, up to roughly 2.2 times the SP runtime.
The precision required for running TUFLOW on GPU hardware will determine the type of GPU card that is best suited to the compute. For any given generation/architecture of cards, the “gaming” cards such as the GeForce GTX and RTX series provide excellent single precision performance, typically comparable to that of the “scientific” cards such as the Tesla series. If double precision is required, the scientific cards are substantially faster, but they are also significantly more expensive. The Quadro series cards sit in between for both double precision performance and cost.