Hardware Benchmarking
Benchmark Model
The benchmark model is based on a “challenge” issued prior to the 2012 Flood Managers Association (FMA) Conference in Sacramento, USA. There is more information on the model setup and purpose in the FMA challenge model introduction.
This hardware benchmark is based on the second challenge, which involves a coastal river in flood with two ocean outlets. The model has been modified slightly (mainly in terms of the outputs). It is set up to run using both the TUFLOW "classic" (CPU) and TUFLOW GPU (graphics card) solvers for a range of cell sizes.
Cell sizes
Cell Size (m) | Number of cells |
---|---|
30 | 80,887 |
15 | 323,536 |
10 (GPU only) | 727,865 |
The model runs for three days of simulation time (72 hours). On the CPU, the 30m model is likely to take approximately 20 minutes and the 15m version approximately 4 hours. Because the CPU runtime at 10m resolution is likely to exceed 12 hours, the 10m CPU run is skipped in the benchmark (this can also be run with a licence).
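The relationship between cell size and cell count in the table above can be checked with a short calculation. This is a minimal sketch that assumes the model extent is the same at every resolution, so the cell count should scale with the inverse square of the cell size; the variable names are illustrative only.

```python
# Cell counts copied from the "Cell sizes" table, keyed by cell size in metres.
cell_counts = {30: 80_887, 15: 323_536, 10: 727_865}

base_size = 30
base_count = cell_counts[base_size]

for size, count in cell_counts.items():
    # For a fixed 2D extent, cell count scales with (base size / size) squared.
    ideal_ratio = (base_size / size) ** 2
    actual_ratio = count / base_count
    print(f"{size}m: {count:,} cells, {actual_ratio:.2f}x the 30m grid "
          f"(ideal {ideal_ratio:.0f}x)")
```

The approximate CPU runtimes quoted above grow faster than the cell count (roughly 12x from 30m to 15m against 4x more cells). A smaller time step at finer resolution would be one plausible contributor, although the source does not state this.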
To participate in the benchmark, please follow the steps below:
- Download the model from http://www.tuflow.com/Download/TUFLOW/Benchmark_Models/FMA2_GPU_CPU_Benchmark.zip
- Extract the model on a local drive of the computer you would like to benchmark.
- Navigate to the TUFLOW\runs\ folder and run the "Run_Benchmark.bat" file. This checks whether you are running a 32-bit or 64-bit system and then runs the benchmark. It also generates some output files that contain more information on the processor, memory and GPU that you are using.
- Email the _ TUFLOW Simulations.log, cpu.txt, ram.txt and GPU.txt files to support@tuflow.com and we will include these in the results tables below.
To run the GPU model, a CUDA-compatible NVIDIA graphics card is required. For more information on this, please see the release notes.
The computer information is determined in the batch file using the wmic and dxdiag commands.
CPU Results
The following table summarises the runtimes for a range of computers. More will be added when additional results are obtained. The table is ordered based on the combined 30m and 15m runtimes, with the fastest computers at the top of the table.
Runtimes for CPU benchmarks
Processor Name | Processor Frequency (GHz)** | RAM size (GB) | RAM frequency (MHz) | Runtime 30m (mins) | Runtime 15m (mins) | Runtime 10m (mins) | Runtime Combined (mins) | System Name |
---|---|---|---|---|---|---|---|---|
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz | 4 | 32 | 1333 | 20.5 | 220.4 | N/A | 240.9 | BRA |
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz | 4 | 16 | 1600 | 22.68 | 244.25 | N/A | 266.93 | RH1 |
Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz | 3 | 64 | 2133 | 21.23 | 247.55 | N/A | 268.78 | MON |
Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz | 3.4 | 8 | 1600 | 23.9 | 256.7 | N/A | 280.6 | PAR |
Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz | 3.5 | 32 | 2133 | 23.6 | 269.25 | N/A | 292.85 | RH2 |
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz | 2.8 | 8 | 1600 | 26.9 | 284.1 | N/A | 311.05 | EUK |
Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz | 2.9 | 8 | 1600 | 27.65 | 283.71 | N/A | 311.36 | LM2 |
Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz | 3.5 | 32 | 1600 | 28.5 | 285.9 | N/A | 314.4 | RH3 |
Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz | 3.2 | 16 | 1600 | 31.1 | 297.43 | N/A | 328.53 | RH3 |
Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz | 2.7 | 16 | 1600 | 31.7 | 301.5 | N/A | 333.2 | MJS |
Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz | 2.7 | 32 | 1600 | 29.1 | 308.12 | N/A | 337.22 | JT1 |
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz | 3.3 | 64 | 2133 | 29.2 | 317.1 | N/A | 346.3 | EOG |
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz | 3.3 | 64 | 2133 | 33.08 | 317.86 | N/A | 350.94 | JAC |
Intel(R) Xeon(R) CPU E5-2670 V3 @ 2.30GHz | 2.3 | 96 | 2133 | 28.4 | 333.35 | N/A | 361.75 | RK2 |
Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz | 3.4 | 32 | 1600 | 39.0 | 334.4 | N/A | 373.4 | XEO |
Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz | 3.6 | 32 | 1600 | 44.18 | 335.82 | N/A | 380.00 | DCO |
Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.80GHz | 2.8 | 8 | 800 | 47.43 | 343.23 | N/A | 390.66 | AJI |
Intel(R) Xeon(R) W3565 CPU @ 3.20GHz | 3.2 | 12 | 1333 | 37.88 | 356.1 | N/A | 393.98 | LP2 |
Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz | 1.9 | 8 | 1600 | 35.63 | 365.81 | N/A | 401.44 | LP1 |
2 x Intel(R) Xeon(R) X5680 CPU @ 3.33GHz | 3.3 | 64 | 1333 | 40.5 | 368.9 | N/A | 409.35 | WMD |
Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz | 2.2 | 16 | 1333 | 40.3 | 375.33 | N/A | 415.63 | FFN |
2 x Intel(R) Xeon(R) CPU E5-2643 V3 @ 3.40GHz | 3.4 | 128 | 2133 | 40.5 | 377.1 | N/A | 418.14 | XYG |
Intel(R) Xeon(R) E5-2630 CPU @ 2.30GHz | 2.3 | 64 | 1333 | 40.1 | 393.92 | N/A | 434.02 | HUH |
Intel(R) Xeon(R) E5-1603 0 CPU @ 2.80GHz | 2.8 | 16 | 1600 | 40.85 | 395.81 | N/A | 436.66 | LMD |
2 x Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz | 2.3 | 38 | 1333 | 41.3 | 401.12 | N/A | 444.42 | RH5 |
Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz | 2.7 | 8 | 1600 | 39.5 | 420.7 | N/A | 460.2 | HUK |
Intel(R) Core(TM) i7-920 CPU @ 2.67GHz | 2.67 | 12 | 1066 | 45.05 | 420.7 | N/A | 465.75 | REJ |
Intel(R) Xeon(R) CPU W3505 @ 2.53GHz | 2.53 | 4 | 1333 | 49.12 | 453.5 | N/A | 502.62 | JT2 |
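Because the table is ordered by the combined runtime, each new submission can be sanity-checked before it is added. The following is a minimal sketch, not part of the benchmark package: the helper names and the 0.1 minute rounding tolerance are assumptions, and only a handful of rows from the table are included.

```python
# A few rows copied from the CPU table:
# (system name, 30m runtime, 15m runtime, listed combined runtime), in minutes.
rows = [
    ("BRA", 20.5, 220.4, 240.9),
    ("RH1", 22.68, 244.25, 266.93),
    ("MON", 21.23, 247.55, 268.78),
    ("PAR", 23.9, 256.7, 280.6),
    ("RH2", 23.6, 269.25, 292.85),
    ("JT2", 49.12, 453.5, 502.62),
]

TOLERANCE = 0.1  # minutes; assumed allowance for rounding in submitted figures


def mismatched_rows(rows, tol=TOLERANCE):
    """Return the systems whose listed combined runtime is not 30m + 15m."""
    return [name for name, t30, t15, listed in rows
            if abs(t30 + t15 - listed) > tol]


def is_sorted_by_combined(rows):
    """Return True if rows are ordered fastest first by combined runtime."""
    combined = [listed for _, _, _, listed in rows]
    return combined == sorted(combined)


print("Mismatched rows:", mismatched_rows(rows))
print("Sorted fastest first:", is_sorted_by_combined(rows))
```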
GPU Results
The following table summarises the runtimes for a range of computers. More will be added when additional results are obtained. The table is ordered based on the combined 30m, 15m and 10m runtimes with the fastest computers at the top of the table.
The GPU benchmark uses only a single GPU card. TUFLOW GPU can be run across multiple NVIDIA GPU devices; however, the benefits are typically more noticeable for larger models with more than 1 million cells. A number of additional benchmarking tests have been completed on a 2m model and multiple GPU cards.
Runtimes for GPU benchmarks
Processor Name | Graphic Card | GPU RAM (GB) | Number of CUDA Cores* | Runtime 30m (mins) | Runtime 15m (mins) | Runtime 10m (mins) | Combined Runtime (mins) | System Name |
---|---|---|---|---|---|---|---|---|
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz | NVIDIA GeForce GTX 980 | 4 | 2,048 | 1.4 | 7.8 | 24.4 | 33.5 | BRA |
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz | NVIDIA GeForce GTX 980 | 4 | 2,048 | 1.8 | 8.7 | 25.2 | 35.7 | EOG |
Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz | NVIDIA GeForce GTX 980 | 4 | 2,048 | 1.73 | 9.05 | 24.95 | 35.73 | JAC |
Intel(R) Xeon(R) CPU E5-2670 V3 @ 2.30GHz | NVIDIA GeForce GTX 980 | 4 | 2,048 | 1.95 | 8.76 | 25.16 | 35.84 | RK2 |
Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz | NVIDIA GeForce GTX TITAN Black | 4 | 2,880 | 2.05 | 10.56 | 30.78 | 43.39 | DCO |
2 x Intel(R) Xeon(R) CPU E5-2643 V3 @ 3.40GHz | NVIDIA Quadro K6000 | 4 | 2,880 | 2.63 | 11.45 | 32.23 | 46.31 | XYG |
Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz | NVIDIA GeForce GTX 770 | 2 | 1,536 | 1.9 | 11.5 | 36.8 | 50.2 | PAR |
Intel(R) Xeon(R) E5-2630 CPU @ 2.30GHz | NVIDIA GeForce GTX 680 | 2 | 1,536 | 2.35 | 12.95 | 41.5 | 56.8 | HUH |
Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz | NVIDIA GeForce GTX 690 | 2 | 1,536 | 2.3 | 13.7 | 43.6 | 59.6 | XEO |
2 x Intel(R) Xeon(R) CPU X5680 @ 3.33GHz | NVIDIA Tesla C2075 | 4 | 448 | 3.4 | 19.1 | 58.4 | 80.85 | WMD |
Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz | NVIDIA GeForce GTX 750 Ti | 2 | 640 | 2.93 | 18.9 | 61.48 | 83.31 | MON |
Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz | NVIDIA GeForce GTX 750 Ti | 2 | 640 | 4.78 | 18.555 | 60.4 | 83.76 | RH1 |
Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.80GHz | NVIDIA Quadro 4000 | 4 | 768 | 5.2 | 32.23 | 103.99 | 141.24 | AJI |
Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz | NVIDIA Quadro K3100M | 4 | 768 | 5.2 | 37.42 | 107.33 | 149.95 | JT1 |
Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz | NVIDIA GeForce GTX 560M | 2 | 192 | 6.78 | 46.8 | 154.72 | 208.3 | FFN |
Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz | NVIDIA NVS 5200M | 1 | 96 | 12.7 | 89.3 | 303.2 | 405.2 | MJS |
* The number of CUDA cores is not provided as an output from the dxdiag command; this information has been sourced from the NVIDIA website.
** The output cpu.txt only provides the 'out of the box' processor speed. If you have overclocked your CPU, please send these details through to TUFLOW Support so we can add the correct clock speed.
Discussion
The preliminary results below are based on the data submitted so far.
More will be added as the tables above are populated.
Average reduction in Runtime from CPU to GPU
- 12.6x reduction in runtime for the 30m model
- 23.8x reduction in runtime for the 15m model
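A per-system reduction factor can be computed by dividing the CPU runtime by the GPU runtime for machines that appear in both tables. This is a minimal sketch using four such systems. The source does not say how the averages above were calculated, so the simple mean of per-system ratios used here is an assumption, and a subset will not necessarily reproduce the quoted figures.

```python
# Runtimes in minutes copied from the CPU and GPU tables for a few systems
# that appear in both: system name -> (30m runtime, 15m runtime).
cpu = {"BRA": (20.5, 220.4), "EOG": (29.2, 317.1),
       "PAR": (23.9, 256.7), "MJS": (31.7, 301.5)}
gpu = {"BRA": (1.4, 7.8), "EOG": (1.8, 8.7),
       "PAR": (1.9, 11.5), "MJS": (12.7, 89.3)}


def speedups(cpu, gpu):
    """Return {system: (30m factor, 15m factor)} for systems in both tables."""
    return {
        name: tuple(c / g for c, g in zip(cpu[name], gpu[name]))
        for name in cpu if name in gpu
    }


def mean_speedup(per_system, index):
    """Mean of the per-system factors (assumed averaging method)."""
    values = [factors[index] for factors in per_system.values()]
    return sum(values) / len(values)


per_system = speedups(cpu, gpu)
for name, (s30, s15) in per_system.items():
    print(f"{name}: {s30:.1f}x (30m), {s15:.1f}x (15m)")
print(f"Subset mean: {mean_speedup(per_system, 0):.1f}x (30m), "
      f"{mean_speedup(per_system, 1):.1f}x (15m)")
```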
Preliminary CPU Results
The comparison of the CPU results below raises a few points for discussion:
- The runtimes for both models display similar variance as a percentage of the total time across hardware capabilities (26% and 21% relative standard deviation for the 30m and 15m models respectively).
- The runtimes for both the 15m and 30m models vary largely, but not entirely, with CPU frequency. The results are dispersed, perhaps reflecting chip variability, chipset or other system factors.
- The difference in runtime between the fastest and slowest hardware (~300%) is much less than the difference in average runtime for the 30m and 15m models (970%). Thus, nothing can improve your model runtime like efficient model design!
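The relative standard deviation and the fastest-to-slowest spread discussed above can be computed as follows. This is a minimal sketch: the source does not say whether a sample or population standard deviation was used, so the sample form is an assumption here, and only a subset of the 30m runtimes is included, so the output will not necessarily match the percentages quoted above.

```python
from statistics import mean, stdev


def relative_std_dev(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


def spread_ratio(values):
    """Ratio of the slowest runtime to the fastest runtime."""
    return max(values) / min(values)


# A subset of the 30m CPU runtimes (minutes) copied from the table above.
runtimes_30m_subset = [20.5, 22.68, 21.23, 23.9, 23.6, 45.05, 49.12]

print(f"RSD of subset: {relative_std_dev(runtimes_30m_subset):.0f}%")
print(f"Slowest / fastest: {spread_ratio(runtimes_30m_subset):.2f}")
```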
[Figure: Benchmarking GPU Chart2.jpg]
Preliminary GPU Results
- Similar to the CPU results, decreasing the model cell size increases the variability in the runtime achieved per CUDA core.
- Unlike the CPU results, the variability in runtimes between cards is greater than the change in runtime with model cell size. Thus, it could be argued that the runtime of a GPU model is more dependent on the type of card than the runtime of a CPU model is on the processor frequency.
- From the preliminary results, the NVIDIA GTX 980 seems a crowd favorite and performs well, returning the smallest runtimes. It is likely that, as model size increases, the Titan Black and K6000 with 2,880 cores will produce faster runtimes.