Page Under Construction

Introduction

Both TUFLOW Classic and TUFLOW HPC can run in either single precision or double precision. Please refer to the manual for a description of the differences between the single precision (SP) and double precision (DP) versions of TUFLOW, and for a discussion of which might be appropriate for a model.
This page discusses the relative performance of the SP and DP versions of TUFLOW. It includes comparisons for TUFLOW Classic, TUFLOW HPC on CPU hardware and TUFLOW HPC on GPU hardware.
The double precision version of TUFLOW requires significantly more memory to run a simulation: the memory requirement of DP is almost twice that of SP.
Note: Single precision calculations are also referred to as FP32 (32-bit floating point) and double precision calculations as FP64 (64-bit floating point). This terminology is more common in GPU benchmarks.
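The storage and precision difference between FP32 and FP64 can be illustrated with a short Python sketch. It is not TUFLOW code: the cell count, the number of arrays per cell and the water level value are hypothetical figures chosen only for illustration, and a real simulation also holds data that is not floating point, which is consistent with DP needing almost, rather than exactly, twice the memory of SP.

```python
import struct

# Bytes used to store one value in each precision.
SP_BYTES = struct.calcsize("f")  # FP32
DP_BYTES = struct.calcsize("d")  # FP64


def estimate_array_bytes(n_cells: int, n_arrays: int, bytes_per_value: int) -> int:
    """Rough size of the floating point arrays only (hypothetical model layout)."""
    return n_cells * n_arrays * bytes_per_value


def round_trip_fp32(value: float) -> float:
    """Return the value as it would be after being stored as a 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Hypothetical model: 5 million cells, 30 floating point arrays per cell.
n_cells, n_arrays = 5_000_000, 30
sp_mb = estimate_array_bytes(n_cells, n_arrays, SP_BYTES) / 1e6
dp_mb = estimate_array_bytes(n_cells, n_arrays, DP_BYTES) / 1e6
print(f"SP arrays: {sp_mb:.0f} MB, DP arrays: {dp_mb:.0f} MB")

# Hypothetical water level in metres; FP32 cannot hold all of its digits.
level = 1234.567891
fp32_error = abs(round_trip_fp32(level) - level)
print(f"FP32 storage error at {level} m: {fp32_error:.2e} m")
```

The rounding error grows with the magnitude of the stored value, so it is larger for models at high elevations than for models near zero.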

TUFLOW Classic

The table below lists runtimes for the benchmark model at a 20 m cell size. The same model was run with both the SP and DP versions of TUFLOW using the Classic solution scheme on CPU hardware, and the test was repeated on a number of CPUs. The % Change column is the increase in DP runtime relative to SP runtime.

CPU	SP Runtime (mins)	DP Runtime (mins)	% Change
Intel(R) Core(TM) i7-6900K CPU @ 3.20 GHz	90.5	109.3	20.7
Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz	71.7	87.4	21.9
AMD Ryzen Threadripper 2990WX 32-Core Processor	65.8	80.3	22.0
Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40 GHz	127.2	158.0	24.2
Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz	91.3	115.2	26.2
Intel(R) Core(TM) i7-5960X CPU @ 3.00 GHz	101.9	128.8	26.4
Intel(R) Xeon(R) CPU X5680 @ 3.33 GHz	162.1	207.6	28.1
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz	121.4	158.1	30.2
Intel(R) Core(TM) i7-6800K CPU @ 3.40 GHz	90.0	119.1	32.3
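The % Change column appears to be the increase in DP runtime relative to SP runtime. That formula is inferred from the tabulated values rather than stated on the page, so the sketch below should be read as an assumption; the function name `pct_change` is illustrative.

```python
def pct_change(sp_minutes: float, dp_minutes: float) -> float:
    """Percentage increase in runtime of DP relative to SP (assumed formula)."""
    return (dp_minutes - sp_minutes) / sp_minutes * 100.0


# Two rows copied from the table above: (CPU, SP runtime, DP runtime) in minutes.
rows = [
    ("Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz", 71.7, 87.4),
    ("Intel(R) Xeon(R) CPU X5680 @ 3.33 GHz", 162.1, 207.6),
]

for cpu, sp, dp in rows:
    print(f"{cpu}: {pct_change(sp, dp):.1f}%")
```

Because the tabulated runtimes are rounded to one decimal place, recomputed values for other rows can differ from the table by about 0.1.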

TUFLOW HPC on CPU hardware

The table below lists runtimes for the benchmark model at a 20 m cell size. The same model was run with both the SP and DP versions of TUFLOW using the HPC solution scheme on CPU hardware, and the test was repeated on a number of CPUs.

CPU	SP Runtime (mins)	DP Runtime (mins)	% Change
Intel(R) Core(TM) i7-6800K CPU @ 3.40 GHz	278.3	291.4	4.7
Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz	216.8	230.9	6.5
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz	298.2	322.6	8.2
Intel(R) Core(TM) i7-5960X CPU @ 3.00 GHz	236.9	260.3	9.9
Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40 GHz	307.2	350.6	12.4
Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz	221.8	254.3	14.7
Intel(R) Core(TM) i7-6900K CPU @ 3.20 GHz	286.0	328.9	15.0
Intel(R) Xeon(R) CPU X5680 @ 3.33 GHz	404.0	466.4	15.5
AMD Ryzen Threadripper 2990WX 32-Core Processor	278.8	347.7	24.7

TUFLOW HPC on GPU hardware

The quoted performance of a GPU device can be very different for single and double precision calculations. The table below lists runtimes for the benchmark model at a 20 m cell size. The same model was run with both the SP and DP versions of TUFLOW using the HPC solution scheme on GPU hardware, and the test was repeated on a number of different GPU cards.

GPU Card	SP Runtime (mins)	DP Runtime (mins)	% Change
NVIDIA GeForce GTX 1080 Ti	9.4	14.6	55.4
NVIDIA GeForce GTX 750 Ti	28.6	72.8	60.8
NVIDIA GeForce GTX 1080	11.3	18.3	61.8
NVIDIA GeForce GTX 980	17.7	29.8	68.0
NVIDIA TITAN Xp	5.7	10.6	87.6
NVIDIA GeForce 840M	89.2	180.2	101.9
NVIDIA GeForce RTX 2070	8.9	18.4	107.3
NVIDIA GeForce RTX 2080	7.6	16.1	111.4
NVIDIA GeForce 940MX	71.3	156.2	118.9
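One consequence of the table above is that the card ranking can change with precision. The sketch below uses three rows copied from the table and ranks them by SP runtime and by DP runtime; the helper names are illustrative and not part of TUFLOW.

```python
# (SP runtime, DP runtime) in minutes, copied from the table above.
gpu_runs = {
    "NVIDIA GeForce GTX 1080 Ti": (9.4, 14.6),
    "NVIDIA TITAN Xp": (5.7, 10.6),
    "NVIDIA GeForce RTX 2080": (7.6, 16.1),
}


def dp_slowdown(sp_minutes: float, dp_minutes: float) -> float:
    """Ratio of DP runtime to SP runtime (1.0 means no penalty)."""
    return dp_minutes / sp_minutes


# Fastest card first, for each precision.
sp_rank = sorted(gpu_runs, key=lambda name: gpu_runs[name][0])
dp_rank = sorted(gpu_runs, key=lambda name: gpu_runs[name][1])

for name, (sp, dp) in gpu_runs.items():
    print(f"{name}: DP takes {dp_slowdown(sp, dp):.2f}x the SP runtime")
print("Fastest to slowest in SP:", sp_rank)
print("Fastest to slowest in DP:", dp_rank)
```

In this subset the RTX 2080 is faster than the GTX 1080 Ti in single precision but slower in double precision, which is why the required precision matters when selecting a card.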

Conclusion

With TUFLOW Classic (CPU hardware only), double precision runs consistently took at least 20% longer than single precision runs, with increases of 20% to 32% across the CPUs tested.

With TUFLOW HPC on CPU hardware, the increase in runtime ranged from 5% to 25%, depending on the processor.
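The ranges quoted for the two CPU cases can be checked directly against the % Change columns of the tables above. The sketch below copies those columns verbatim; the `summarise` helper is illustrative.

```python
# % Change columns copied from the TUFLOW Classic and TUFLOW HPC (CPU) tables.
classic_pct = [20.7, 21.9, 22.0, 24.2, 26.2, 26.4, 28.1, 30.2, 32.3]
hpc_cpu_pct = [4.7, 6.5, 8.2, 9.9, 12.4, 14.7, 15.0, 15.5, 24.7]


def summarise(values: list[float]) -> tuple[float, float, float]:
    """Return (minimum, maximum, mean) of a list of percentage changes."""
    return min(values), max(values), sum(values) / len(values)


for label, values in (("Classic", classic_pct), ("HPC on CPU", hpc_cpu_pct)):
    low, high, mean = summarise(values)
    print(f"{label}: {low:.1f}% to {high:.1f}% (mean {mean:.1f}%)")
```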

The precision required for running TUFLOW on GPU hardware determines the type of GPU card that is best suited to the task. On the cards tested above, double precision runs took between 55% and 119% longer than single precision runs. For any given generation/architecture of cards, the “gaming” cards such as the GeForce GTX and RTX series provide excellent single precision performance, typically comparable to that of the “scientific” cards such as the Tesla series. If double precision is required then the scientific cards are substantially faster, but they are also significantly more expensive. The Quadro series cards sit in between for both double precision performance and cost.

Hardware Benchmarking Topic Single Precision VS Double Precision