|
|
Line 40: |
Line 40: |
| *Discussion HPC vs. previous GPU solver (GPU Hardware) | | *Discussion HPC vs. previous GPU solver (GPU Hardware) |
| *Discussion HPC vs. QPC (GPU Hardware) (future page - hidden) | | *Discussion HPC vs. QPC (GPU Hardware) (future page - hidden) |
− |
| |
− | =CPU Results=
| |
− | The following table summarises the runtimes for a range of computers. More will be added when additional results are obtained. The table is ordered based on the combined 30m and 15m runtimes, with the fastest computers at the top of the table.
| |
− | <br>
| |
− | '''Runtimes for CPU benchmarks'''
| |
− | {| align="center" class="wikitable"
| |
− |
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;"| Processor Name
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=12.5% | Processor Frequency (GHz)**
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=8% | RAM size (GB)
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=8% | RAM frequency (MHz)
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | Runtime 30m (mins)
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | Runtime 15m (mins)
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | Runtime 10m (mins)
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | Runtime Combined (mins)
| |
− | ! style="background-color:#C5C5C5; font-weight:bold; color:white;" width=8% | System Name
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.80GHz||4.8||64||3200||14.7||133.7|| N/A ||148.4||MRU
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-7700K CPU @ 4.70GHz||4.7||64||3200||14.7||134.8|| N/A ||149.5||RLO4
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.80GHz||4.8||64||2500||14.8||136.0|| N/A ||150.8||RLO3
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.80GHz||4.8||64||2667||14.9||136.1|| N/A ||151.0||RLO2
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.71GHz||4.71||32||2133||15.0||137.4|| N/A ||152.4||DST2
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.70GHz ||4.7||64||2500||14.9||138.7|| N/A ||153.6||ZDO
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.70GHz ||4.7||64||2500||15.2||138.9|| N/A ||154.1||RLO
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.50GHz ||4.5||64||2800||15.4||142.3|| N/A ||157.7||CRY1
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz||4.2||64||2133||15.9||144.9|| N/A ||160.8||CCO
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz||4.0||64||2133||16.7||153.52|| N/A ||170.2||NCO
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.0GHz||4.0||32||2133||17.7||160.0|| N/A ||177.7||DST1
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz||2.9||16||2400||19.5||167.1|| N/A ||186.6||AWR
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1660 v4 @ 3.20GHz ||3.2||32||2400||19.3||183.72|| N/A ||203.0||AGR
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz ||2.7||32||2133||20.7||190.6|| N/A ||211.3||DDU
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz ||2.7||32||2133||21.2||193.2|| N/A ||214.4||MG1
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6560U CPU @ 2.20GHz||2.2||8||1867||24.1||203.5|| N/A ||227.6||RCO
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz ||2.5||16||1600||23.4||212.0|| N/A ||235.4||GHA
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz ||4.0||32||1333||20.5||220.4|| N/A ||240.9||BRA
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz ||2.4||16||1600||25.5||221.9|| N/A ||247.4||NBO
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz ||3.3||64||2133||21.1||239.7|| N/A ||260.8||DAN
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz ||4.0||16||1600||22.7||244.3|| N/A ||266.9||RH1
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz ||3.0||64||2133||21.2||247.6|| N/A||268.8||MON
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5820K CPU @ 4.30GHz ||4.3||64||2800||23.8||251.2|| N/A||275.0||CRY2
| |
− | |-
| |
− | |AMD FX(tm)-6350 Six-Core Processor @ 4.50GHz ||4.5||32||1600||35.00||240.6|| N/A||275.6||FYU
| |
− | |-
| |
− | |Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz || 3.4 || 8 || 1600 || 23.9 || 256.7 || N/A||280.6||PAR
| |
− | |-
| |
− | |AMD FX(tm)-9590 Eight-Core Processor @ 4.70GHz || 4.7 || 16 || 1866|| 32.4 || 249.1 || N/A||281.5||DDH3
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz||2.2||128||2400||27.0||259.8|| N/A ||286.8||LM3
| |
− | |-
| |
− | |AMD FX(tm)-9590 Eight-Core Processor @ 4.70GHz || 4.7 || 16 || 1333 || 33.6 || 258.9 || N/A||292.5||DDH1
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz || 3.5 || 32 || 2133 || 23.6 || 269.3 || N/A|| 292.9 ||RH2
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1680 v3 @ 3.20GHz ||3.2||16||2133||24.2||276.1|| N/A ||300.3||RFR2
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz ||3.5||128||2133||24.6||277.1|| N/A ||301.6||PTR
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz ||2.8||16||1867||28.2||276.7|| N/A ||304.9||RFR3
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz || 3.6 || 32 || 1600|| 25.8 || 268.3 || N/A|| 294.1 ||CCA
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz ||2.8||8||1600||26.9||284.1||N/A||311.1||EUK
| |
− | |-
| |
− | |Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz || 2.9 || 8 || 1600 || 27.7 || 283.7 || N/A||311.4||LM2
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz || 3.5 || 32 || 2133 || 26.1 || 285.3 || N/A||311.4||MG2
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz || 3.1 || 16 || 2133 || 24.9 || 291.7 || N/A ||316.6||MBA
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz || 3.5 || 32 || 1600 || 28.5 || 285.9 || N/A|| 314.4 ||RH3
| |
− | |-
| |
− | |Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz ||3.4||8||1333||29.1||290.0|| N/A ||319.0||RFR4
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-3770 CPU v3 @ 3.40GHz || 3.4 || 16 || 1600 || 29.7 || 296.2 || N/A ||325.9||DDH2
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1650 0 @ 3.20GHz || 3.2 || 16 || 1600 || 31.1 || 297.4 || N/A|| 328.5 ||RH3
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz ||2.7 || 16 || 1600 || 31.7 || 301.5|| N/A ||333.2||MJS
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz || 2.7 || 32 || 1600 || 29.1 || 308.1 || N/A ||337.2||JT1
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz ||3.3||64||2133||29.2||317.1||N/A ||346.3||EOG
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1620 v2 CPU @ 3.70GHz || 3.7 || 16 || 1866 || 31.2|| 319.7 || N/A ||350.9||TBE
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz || 3.3 || 64 || 2133 || 33.1 || 317.9 || N/A ||350.9||JAC
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz ||3.4||16||3401||35.6||320.2||N/A||355.8||UOV
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz || 3.4 || 16 || 1333 || 35.9 || 320.2 || N/A ||356.1||MMO
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1630 v3 @ 3.70GHz || 3.7 || 32 || 2133 || 29.2 || 327.3 || N/A ||356.5||NCH
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2670 V3 @ 2.30GHz ||2.3||96||2133||28.4||333.4||N/A||361.8||RK2
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz ||3.5||16||2133||30.2||338.4|| N/A ||368.6||RFR1
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz ||3.4||32||1600||39.0||334.4||N/A ||373.4||XEO
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz || 3.6 || 32 || 1600 || 44.2 || 335.8 || N/A|| 380.0 ||DCO
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz || 3.0 || 24 || 1866 || 34.4 || 353.9 || N/A|| 388.4 ||RSU
| |
− | |-
| |
− | |Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.80GHz || 2.8|| 8|| 800 || 47.4 || 343.2 || N/A ||390.7||AJI
| |
− | |-
| |
− | |Intel(R) Core(TM) i5-4300U CPU @ 3.30GHz || 1.9 || 8 || 1600 || 35.6 || 365.8 || N/A ||394.0||LP1
| |
− | |-
| |
− | |Intel(R) Xeon(R) W3565 CPU @ 3.20GHz || 3.2 || 12 || 1333 || 37.9 || 356.1 || N/A ||401.4||LP2
| |
− | |-
| |
− | |2 x Intel(R) Xeon(R) X5680 CPU @ 3.33GHz ||3.3||64||1333||40.5||368.9||N/A ||409.4||WMD
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz || 2.2 || 16 || 1333 || 40.3 || 375.3 || N/A ||415.6||FFN
| |
− | |-
| |
− | |2 x Intel(R) Xeon(R) CPU E5-2643 V3 @ 3.40GHz ||3.4||128||2133||40.5||377.1||N/A||418.1||XYG
| |
− | |-
| |
− | |Intel(R) Xeon(R) E5-2630 CPU @ 2.30GHz || 2.3 || 64 || 1333 || 40.1 || 393.92 || N/A ||434.0||HUH
| |
− | |-
| |
− | |Intel(R) Xeon(R) E5-1603 0 CPU @ 2.80GHz ||2.8||16||1600||40.9||395.8||N/A ||436.7||LMD
| |
− | |-
| |
− | |2 x Intel(R) Xeon(R) CPU E5-2630 0 @ 2.80GHz ||2.3||38||1333||41.3||401.1||N/A||444.4||RH5
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz || 2.7|| 8|| 1600 || 39.5 || 420.7 || N/A ||460.2||HUK
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-920 CPU @ 2.67GHz || 2.67|| 12|| 1066 || 45.1 || 420.7 || N/A ||465.8||REJ
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz || 2.60|| 9.33|| 2597|| 34.0|| 453.9|| N/A ||487.9 || Microsoft Azure NC6 (Cloud)
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU W3505 @ 2.53GHz ||2.53||4||1333||49.1||453.5||N/A||502.6||JT2
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz ||2.4||8|| ||37.6||492.9||N/A||530.5||DAG
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz ||2.6||60||NA||49.5||543.3||N/A||592.7||Amazon Web Services g2.8xlarge (Cloud)
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz||2.6||15||NA||51.4||563.8||N/A||615.2||Amazon Web Services g2.2xlarge (Cloud)
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz ||3.6||16||1600||31.9||669.2||N/A||701.1||RFR5
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU X5650 @ 2.67GHz ||2.67||4|| ||64.3||669.2||N/A||733.5||ADU
| |
− | |-
| |
− | |}
| |
− |
| |
− | =GPU Results=
| |
− | The following table summarises the runtimes for a range of computers. More will be added when additional results are obtained. The table is ordered based on the combined 30m, 15m and 10m runtimes with the fastest computers at the top of the table.
| |
− | <br>
| |
− | The GPU benchmark uses only a single GPU card. TUFLOW GPU can be run across multiple NVIDIA GPU devices; however, the benefit is typically more noticeable for larger models with more than 1 million cells. A number of additional benchmarking tests have been completed on a 2m model using multiple GPU cards.
| |
− | <br>
| |
− | '''Runtimes for GPU benchmarks'''
| |
− | {| align="center" class="wikitable"
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=23% | Processor Name
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=15% | Graphics Card
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=5% | GPU RAM (GB)
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=8% | Number of CUDA Cores*
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | Runtime 30m (mins)
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | Runtime 15m (mins)
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | Runtime 10m (mins)
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | Combined Runtime (mins)
| |
− | ! style="background-color:#C5C5C5; font-weight:bold; color:white;" width=10% | System Name
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz ||NVIDIA GeForce GTX 1080||8||2,560||1.0||5.2||16.1||22.2||CCO
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz ||NVIDIA GeForce GTX 1080||8||2,560||1.0||5.2||16.1||22.3||NCO
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.71GHz ||NVIDIA GeForce GTX 1080||8||2,560||1.2||5.5||16.4||23.1||DST2
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.70GHz ||NVIDIA GeForce GTX 1080||8||2,560||1.2||5.6||16.5||23.3||RLO
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz ||NVIDIA GeForce GTX 980 Ti||6||2,816||1.3||6.0||17.4||24.7||ARN
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz ||NVIDIA GeForce GTX 1080||8||2,560||1.3||6.0||17.5||24.8||DST1
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz ||NVIDIA GeForce GTX TITAN X||12||3,072||1.3||6.2||18.1||25.6||DAN
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1630 v3 @ 3.70GHz ||NVIDIA GeForce GTX TITAN X||12||3,072||1.7||7.0||19.6||28.4||NCH
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.80GHz ||NVIDIA GeForce GTX 1060||6||1,280||1.3||7.8||25.0||34.1||MRU
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz ||NVIDIA GeForce GTX 980||4||2,048||1.4||7.8||24.4||33.5||BRA
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz ||NVIDIA GeForce GTX 980||4||2,048||1.8||8.4||25.4||35.5||PTR
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz ||NVIDIA GeForce GTX 980||4||2,048||1.8||8.7||25.2||35.7||EOG
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz || NVIDIA GeForce GTX 980 || 4 || 2,048|| 1.7 || 9.1 || 25.0 ||35.7||JAC
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2670 V3 @ 2.30GHz ||NVIDIA GeForce GTX 980||4||2,048||2.0||8.8||25.2||35.8||RK2
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz ||NVIDIA Tesla K80||24||4,992||1.5||8.9||28.3||38.7||LM3
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz ||NVIDIA Tesla K80||24||4,992||1.5||9.1||28.9||39.4||Microsoft Azure NC6 (Cloud)
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz ||NVIDIA GeForce GTX TITAN Black||4||2,880||2.1||10.6||30.8||43.4||DCO
| |
− | |-
| |
− | |2 x Intel(R) Xeon(R) CPU E5-2643 V3 @ 3.40GHz ||NVIDIA Quadro K6000||4||2,880||2.6||11.5||32.2||46.3||XYG
| |
− | |-
| |
− | |Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz || NVIDIA GeForce GTX 770 || 2 || 1,536 || 1.9 || 11.5 || 36.8||50.2||PAR
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1680 v3 @ 3.20GHz||NVIDIA Quadro M4000||8||1,664||2.3||11.7||36.2||50.2||RFR2
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1660 v4 @ 3.20GHz ||NVIDIA Quadro M4000||8||1,664||2.4||12.0||36.7||51.1||AGR
| |
− | |-
| |
− | |Intel(R) Xeon(R) E5-2630 CPU @ 2.30GHz || NVIDIA GeForce GTX 680 || 2 || 1,536 || 2.4 || 13.0 || 41.5 ||56.8||HUH
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E3-1240 V2 @ 3.40GHz ||NVIDIA GeForce GTX 690||2||1,536||2.3||13.7||43.6||59.6||XEO
| |
− | |-
| |
− | |AMD FX(tm)-6350 Six-Core Processor @ 4.50GHz||NVIDIA GeForce GTX 960||2||1,024||2.6||14.0||43.3||59.8||FYU
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz ||NVIDIA Tesla K20c||5||2,496||2.1||13.8||44.5||60.4||MBA
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz||4xNVIDIA K520 GRID GPUs||32||6,144||3.2||17.0||52.7||72.9||Amazon Web Services g2.8xlarge (Cloud)
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz||1xNVIDIA K520 GRID GPUs||8||1,536||3.2||17.1||53.0||73.3||Amazon Web Services g2.2xlarge (Cloud)
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz ||NVIDIA Quadro K4200||4||1,344||2.7||16.4||54.9||74.0||MG2
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz ||NVIDIA Quadro K4200||4||1,344||2.5||16.8||55.1||74.3||CCA
| |
− | |-
| |
− | |2 x Intel(R) Xeon(R) CPU X5680 @ 3.33GHz ||NVIDIA Tesla C2075 ||4||448||3.4||19.1||58.4||80.9||WMD
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz ||NVIDIA GeForce GTX 750 Ti||2||640||2.9||18.9||60.6||82.4||DDH2
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz ||NVIDIA GeForce GTX 750 Ti||2||640||2.9||18.9||61.5||83.3||MON
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz ||NVIDIA GeForce GTX 750 Ti||2||640||4.8||18.6||60.4||83.8||RH1
| |
− | |-
| |
− | |Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz ||NVIDIA GeForce GTX 660||2||960||3.2||26.4||58.5||88.1||RFR4
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz ||NVIDIA Quadro M1200||4||640||2.9||20.0||72.7||95.6||AWR
| |
− | |-
| |
− | |AMD FX(tm)-9590 Eight-Core Processor @ 4.70GHz ||NVIDIA GeForce GTX 750 ||1||512||3.8||22.7||72.2||98.7||DDH1
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz ||NVIDIA Quadro M1000M ||2||512||4.1||23.9||75.3||103.4||MG1
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz ||NVIDIA Quadro M1000M ||2||512||4.2||24.3||75.2||103.7||DDU
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.30GHz ||NVIDIA Quadro K4000 ||3||768||4.4||27.6||88.2||120.2||RSU
| |
− | |-
| |
− | |Intel(R) Core(TM) 2 Quad CPU Q9550 @ 2.80GHz ||NVIDIA Quadro 4000||4||256||5.2||32.2||104.0||141.2||AJI
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz ||NVIDIA Quadro K3100M||4||768||5.2||37.4||107.3||150.0||JT1
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz || NVIDIA Quadro K2000 || 2 || 384 || 6.8 || 46.7 || 151.8 ||205.3||UOV
| |
− | |-
| |
− | | Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz || NVIDIA Quadro K2000 || 2 || 384 || 6.8 || 46.1 || 151.8 ||204.7||MMO
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-2670QM CPU @ 2.20GHz || NVIDIA GeForce GTX 560M || 2 || 192 || 6.8 || 46.8 || 154.7 ||208.3||FFN
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz || NVIDIA GeForce GT 730 || 2 || 384 || 12.4 || 87.8 || 293.6||393.7||CRY2
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz || NVIDIA NVS 5200M || 1 || 96 || 12.7 || 89.3 || 303.2||405.2||MJS
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.70GHz || NVIDIA GeForce GT 730 || 2 || 384 || 12.6 || 93.7 || 316.1 ||422.4||ZDO
| |
− | |-
| |
− | |Intel(R) Xeon(R) CPU E5-1620 v2 CPU @ 3.70GHz || NVIDIA Quadro K600 || 1 || 192 || 14.2 || 101.7 || 338.4 ||454.3||TBE
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.80GHz|| NVIDIA GeForce GT 710 || 2 || 192 || 19.0 || 139.8 || 470.8 ||629.6||RLO3
| |
− | |-
| |
− | |Intel(R) Core(TM) i7-6700K CPU @ 4.80GHz || NVIDIA GeForce GT 710 || 2 || 192 || 19.0 || 139.7 || 471.1 ||629.8||RLO2
| |
− | |-
| |
− | |}
| |
− | <pre> * The number of CUDA cores is not reported by the dxdiag command; this information has been sourced from the NVIDIA website.
| |
− | ** The output cpu.txt only reports the 'out of the box' processor speed. If you have overclocked your CPU, please send the details to TUFLOW Support so we can add the correct clock speed. </pre>
| |
− |
| |
− | =Discussion=
| |
− | The preliminary results below are based on the benchmark data submitted so far.
| |
− |
| |
− | '''Preliminary CPU Results'''
| |
− |
| |
− | The comparison of the CPU results below raises a few points for discussion:
| |
− | *The runtimes for both models display similar variance as a percentage of the total time across hardware capabilities (36% and 39% relative standard deviation for the 30m and 15m models respectively).
| |
− | *The runtimes for both the 15m and 30m models show variance that is largely, but not entirely, linked to CPU frequency. The results are dispersed, perhaps reflecting chip variability, chipset or other system factors.
| |
− | *The difference in runtime between the fastest and slowest hardware (~440-500%) is much less than the difference in average runtime for the 30m and 15m models (1,110%). Thus, nothing can improve your model runtime like efficient model design!
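The spread statistics quoted in the points above can be computed along the following lines. This is a minimal Python sketch rather than the script used to produce the published figures: the function names are hypothetical, the five runtimes per model are only a small sample taken from the CPU table (systems MRU, AWR, NBO, UOV and ADU), and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def relative_std_dev(runtimes):
    """Relative standard deviation (std dev / mean) as a percentage.

    Assumption: population standard deviation; the original analysis
    may have used the sample standard deviation instead.
    """
    return 100.0 * pstdev(runtimes) / mean(runtimes)


def slowest_to_fastest_pct(runtimes):
    """Slowest runtime expressed as a percentage of the fastest runtime."""
    return 100.0 * max(runtimes) / min(runtimes)


# Sample of runtimes (minutes) from the CPU table: MRU, AWR, NBO, UOV, ADU.
runtime_30m = [14.7, 19.5, 25.5, 35.6, 64.3]
runtime_15m = [133.7, 167.1, 221.9, 320.2, 669.2]

for label, data in (("30m", runtime_30m), ("15m", runtime_15m)):
    print(
        f"{label}: RSD = {relative_std_dev(data):.0f}%, "
        f"slowest/fastest = {slowest_to_fastest_pct(data):.0f}%"
    )
```

Because only five systems are sampled, the relative standard deviations printed here will differ from the values quoted for the full data set. The slowest/fastest ratios use the first and last rows of the table, so they correspond to the range quoted above.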
| |
− |
| |
− | [[File:CPUfrq5.png | 800px ]]
| |
− |
| |
− | <br>
| |
− | '''Preliminary GPU Results'''
| |
− |
| |
− | *Similar to the CPU results, decreasing the model cell size increases the variability in runtime relative to the number of CUDA cores.
| |
− | *Unlike the CPU results, the variation in runtime between cards is greater than the variation due to the change in model cell size. Thus, it could be argued that the runtime of your GPU model is more dependent on the type of card you have than the runtime of your CPU model is on the processor frequency.
| |
− | *From the results received so far, the NVIDIA GTX 980 seems a crowd favorite and performs well. It is likely that, as model size increases, the Titan Black and K6000 with 2,880 cores will give faster runtimes. The NVIDIA GeForce GTX 1080 currently tops the table.
| |
− |
| |
− | [[File:GPUvsCUDA5.png | 800px ]]
| |
− |
| |
− | <br>
| |
− | '''Average reduction in Runtime from CPU to GPU'''
| |
− | Comparing the CPU and GPU runtimes for the 15m and 30m models, the following runtime improvements are achieved on average:
| |
− | *11.0x reduction in runtime for the 30m model (80,000 cells)
| |
− | *20.4x reduction in runtime for the 15m model (325,000 cells)
| |
− | These results highlight the relationship between the GPU/CPU runtime reduction and the number of cells in a model: the reduction ratio increases with the size of the model (number of cells). Reductions in runtime of up to 100x have been recorded using an 18,000,000 cell GPU model; refer to [[Hardware_Benchmarking#Large_Model_GPU_Benchmarking| Large Model GPU Benchmarking]].
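The reduction factor discussed here is simply the CPU runtime divided by the GPU runtime on the same system. The Python sketch below illustrates the calculation for three systems that appear in both tables (CCO, NCO and DAN). The function names are hypothetical, and averaging the per-system ratios is an assumption; the published averages may have been derived differently and from the full set of results.

```python
from statistics import mean


def runtime_reduction(cpu_minutes, gpu_minutes):
    """Factor by which the GPU run is faster than the CPU run."""
    return cpu_minutes / gpu_minutes


# (system, CPU 30m, GPU 30m, CPU 15m, GPU 15m), runtimes in minutes,
# copied from the CPU and GPU results tables.
SAMPLE = [
    ("CCO", 15.9, 1.0, 144.9, 5.2),
    ("NCO", 16.7, 1.0, 153.52, 5.2),
    ("DAN", 21.1, 1.3, 239.7, 6.2),
]


def average_reductions(rows):
    """Mean of the per-system reduction factors for the 30m and 15m models."""
    r30 = mean(runtime_reduction(c30, g30) for _, c30, g30, _, _ in rows)
    r15 = mean(runtime_reduction(c15, g15) for _, _, _, c15, g15 in rows)
    return r30, r15


r30, r15 = average_reductions(SAMPLE)
print(f"30m model: {r30:.1f}x, 15m model: {r15:.1f}x")
```

These three systems all have fast GPU cards, so their factors are higher than the averages quoted above, but the same trend holds: the reduction is larger for the 15m model than for the 30m model.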
| |
− | <br>
| |
− | [[File:CPUvsGPU5.png | 800px ]]
| |
− |
| |
− | <br>
| |
− |
| |
− | =Large Model GPU Benchmarking=
| |
− | In addition to the benchmarking completed on the 10m, 15m and 30m models, a number of tests were completed by running the FMA Demo Model 2 at a 2m resolution on up to four GPU cards. The 2m model has approximately 18.2 million cells and was simulated for the following test cases:
| |
− | <li> Run with 1 x NVIDIA GeForce GTX 680 GPU card
| |
− | <li> Run with 2 x NVIDIA GeForce GTX 680 GPU cards
| |
− | <li> Run with 3 x NVIDIA GeForce GTX 680 GPU cards
| |
− | <li> Run with 4 x NVIDIA GeForce GTX 680 GPU cards
| |
− | <li> Run with CPU Only</li>
| |
− | The five runs detailed above were also re-run on the 10m grid.
| |
− | <br>
| |
− | ===Explanation of Tabulated Results===
| |
− | The large model benchmarking results are summarised in the table below. The contents of each column are as follows:
| |
− | <li> 2m Runtime: Total time for the 2m model to complete.
| |
− | <li> 10m Runtime: Total time for the 10m model to complete.
| |
− | <li> 2m Runtime (realtime (mins) / simtime (hour)): Number of minutes in 'real time' to run 60mins of model time. For example, with 1 GPU card it takes 51.3 mins to run 60 mins of model time. With four GPU cards it takes 22.4 mins to run 60 mins of model time.
| |
− | <li> 10m Runtime (realtime (mins) / simtime (hour)): Number of minutes in 'real time' to run 60 mins of model time.
| |
− | <li> 2m CPU/GPU Speedup Factor: How much faster the GPU/Multi GPU runs are compared to the CPU only for the 2m model.
| |
− | <li> 10m CPU/GPU Speedup Factor: How much faster the GPU/Multi GPU runs are compared to the CPU only for the 10m model.
| |
− | <li> 2m MultiGPU Speedup Factor: How much faster the Multi GPU runs complete compared to when only a single GPU card is used.
| |
− | <li> 10m MultiGPU Speedup Factor: How much faster the Multi GPU runs complete compared to when only a single GPU card is used.
| |
− | <br>
| |
− | '''Runtimes for large model GPU benchmarks'''
| |
− | {| align="center" class="wikitable"
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=20% | Run ID
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=15% | 2m Runtime (min)
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=8% | 10m Runtime (min)
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=8% | 2m Runtime (realtime (mins) / simtime (hour))
| |
− |
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | 10m Runtime (realtime (mins) / simtime (hour))
| |
− |
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | 2m CPU/GPU Speedup Factor
| |
− |
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | 10m CPU/GPU Speedup Factor
| |
− |
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | 2m MultiGPU Speedup Factor
| |
− |
| |
− | ! style="background-color:#005581; font-weight:bold; color:white;" width=10% | 10m MultiGPU Speedup Factor
| |
− |
| |
− | |-
| |
− | |1 x NVIDIA GeForce GTX 680 GPU ||513.2||4.6||51.3||0.5||44||18||1||1
| |
− | |-
| |
− | |2 x NVIDIA GeForce GTX 680 GPU ||318.5||3.5||31.8||0.01||71||23||1.6||1.3
| |
− | |-
| |
− | |3 x NVIDIA GeForce GTX 680 GPU ||230.6||3.2||23.1||0.01||98||26||2.2||1.4
| |
− | |-
| |
− | |4 x NVIDIA GeForce GTX 680 GPU ||223.7||3.6||22.4||0.01||101||23||2.3||1.3
| |
− | |-
| |
− | |CPU Only ||23478.3||81.5||2347.8||0.14||NA||NA||NA||NA
| |
− | |}
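The MultiGPU Speedup Factor columns can be reproduced from the tabulated runtimes by dividing the single-card runtime by the multi-card runtime. The following Python sketch shows the arithmetic; the function and variable names are hypothetical, and rounding to one decimal place is an assumption made to match the precision of the table.

```python
# Total runtimes in minutes from the table above,
# keyed by number of GPU cards: (2m model, 10m model).
RUNTIMES_MIN = {
    1: (513.2, 4.6),
    2: (318.5, 3.5),
    3: (230.6, 3.2),
    4: (223.7, 3.6),
}


def multi_gpu_speedup(runtimes, column):
    """Speedup of each multi-card run relative to the single-card run.

    column 0 selects the 2m model, column 1 the 10m model.
    """
    base = runtimes[1][column]
    return {n: round(base / times[column], 1) for n, times in runtimes.items()}


print("2m :", multi_gpu_speedup(RUNTIMES_MIN, 0))
print("10m:", multi_gpu_speedup(RUNTIMES_MIN, 1))
```

For the 10m model, the fourth card gives no further gain over three cards, in line with the tabulated factors.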
| |
− |
| |
− | ===Discussion===
| |
− | The results of the large model GPU testing indicate:
| |
− |
| |
− | <li> The GPU runs are 44 to 101 times faster than the CPU run for the 2m grid, depending on the number of GPU cards.
| |
− | <li> The GPU runs are 18 to 26 times faster than the CPU run for the 10m grid, depending on the number of GPU cards.
| |
− | <li> Using multiple GPUs is 1.6 to 2.3 times faster than using 1 GPU for the 2m model.
| |
− | <li> Using multiple GPUs is 1.3 to 1.4 times faster than using 1 GPU for the 10m model.
| |
− | <li> The benefit of the GPU increases with model size, as does the benefit of using multiple GPUs, provided initialisation is not the major factor in run times.</li>
| |
− |
| |
− | If you have done any testing on much larger models, we would love to hear how you went. Please send the details to support@tuflow.com.
| |
− |
| |
− | ===General Comments on GPU Model Memory Requirements===
| |
− | Without infiltration, you can model about 15 million cells per GB of GPU RAM; with infiltration, it is about 12 million cells per GB. A card with 6 GB of RAM allows about 75 million cells. However, as the pre- and post-processing is handled by the TUFLOW engine, such a model would also require a significant amount of motherboard RAM. You can also run models across multiple GPU cards, allowing even larger models to be simulated. For example, it is possible to run 180 million cells with infiltration losses over four GTX 680 cards (4 GB each).
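The rule of thumb above can be turned into a rough capacity estimate. The Python sketch below is illustrative only: the function name and constants are hypothetical, it assumes capacity scales linearly with total GPU RAM across cards, and it ignores the motherboard RAM needed by the TUFLOW engine. Simple multiplication gives values in the same range as, but not identical to, the rounded figures quoted in the text.

```python
# Approximate cells per GB of GPU RAM, taken from the text above.
CELLS_PER_GB_NO_INFILTRATION = 15_000_000
CELLS_PER_GB_WITH_INFILTRATION = 12_000_000


def max_cells(gpu_ram_gb, n_cards=1, infiltration=False):
    """Rough estimate of the largest model (in cells) that fits in GPU RAM.

    Assumption: capacity scales linearly with the total RAM of all cards.
    """
    per_gb = (
        CELLS_PER_GB_WITH_INFILTRATION
        if infiltration
        else CELLS_PER_GB_NO_INFILTRATION
    )
    return int(gpu_ram_gb * n_cards * per_gb)


# One 6 GB card, with and without infiltration.
print(f"{max_cells(6, infiltration=True):,} cells with infiltration")
print(f"{max_cells(6):,} cells without infiltration")

# Four 4 GB cards with infiltration.
print(f"{max_cells(4, n_cards=4, infiltration=True):,} cells on four cards")
```

Treat the output as an upper-bound guide when sizing a card, and leave some headroom below it.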
| |
− |
| |
− |
| |
− | {{Tips Navigation
| |
− | |uplink=[[TUFLOW_Benchmarking | Back to TUFLOW Benchmarking]]
| |
− | }}
| |