Revision as of 10:13, 11 June 2025

This page provides general hardware advice for running TUFLOW models on GPU or CPU.

Introduction

We are often asked about the optimum computing setup for running TUFLOW models. While every model is different and will interact differently with your hardware, there is some general advice that we can offer. The sections below give more detailed advice on GPU and CPU, but in general:

The amount of RAM in the computer will be the limiter for the size of model you can run. This applies to CPU RAM (TUFLOW Classic, TUFLOW FV and TUFLOW HPC with Hardware == CPU) and also GPU RAM (TUFLOW HPC and TUFLOW FV with Hardware == GPU).
The CPU's processing speed, architecture, cache size and number of processors all play a role.
For GPU simulations, the number of CUDA cores, the core speed, GPU card architecture, memory speed and interfacing with the motherboard PCI lanes and CPU are all important.
The system must be well cooled to avoid throttling (meaning reduction of clock speeds to reduce heating).

For information on minimum and recommended system requirements, see System Requirements.

To discover your computer's NVIDIA GPU hardware, see NVIDIA GPU Hardware and Usage.
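As a minimal sketch of checking the hardware programmatically (separate from the wiki page referenced above), the `nvidia-smi` utility that ships with the NVIDIA driver can report each card's name and total memory. The helper names `parse_gpu_csv` and `list_nvidia_gpus` are illustrative only, and the example assumes `nvidia-smi` is on the system PATH.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory' CSV lines into (name, memory_mib) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        # Split on the last comma so card names containing commas survive.
        name, mem = [part.strip() for part in line.rsplit(",", 1)]
        gpus.append((name, int(mem)))
    return gpus


def list_nvidia_gpus():
    """Return the installed NVIDIA GPUs, or an empty list if nvidia-smi is absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for name, mem_mib in list_nvidia_gpus():
        print(f"{name}: {mem_mib} MiB")
```

The parsing is kept in its own function so it can be checked against sample text on a machine without a GPU.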

The TUFLOW Software Suite

The TUFLOW Software Suite has a range of solvers. Each interacts differently with your hardware, so pairing the correct solver (or the range of solvers you want to run) with your hardware is an important consideration. A brief summary of each solver's needs is provided as follows:

TUFLOW Classic: A single model run can only use the CPU and cannot be run across multiple CPU cores or GPU hardware. In general terms: The maximum model size is dependent on the available CPU RAM and the runtime is driven by the CPU speed, architecture and cache size.
TUFLOW HPC - Run on CPU Hardware: A single model run uses the CPU and is parallelised to run across multiple cores. In general terms: The maximum model size is dependent on the available CPU RAM and the runtime is driven by the CPU speed, the number of cores available to be run in parallel, architecture and cache size.
TUFLOW HPC - Run on GPU Hardware: A single model run uses the GPU(s) for computation. In general terms: The maximum model size is dependent on the available GPU and CPU RAM and the runtime is driven by the CUDA core speed, the number of CUDA cores available and the GPU architecture. GPU performance is complex and is not easily inferred from GPU clock speed and number of cores alone; it is also very dependent on the ‘generation’ or architecture of the GPU. As TUFLOW HPC requires some data exchange between GPU and CPU, the motherboard bus speeds and CPU speeds also play a role, but typically a much lesser one than the GPU CUDA compute.
TUFLOW FV - Run on CPU Hardware: A single model run uses CPU and is parallelised to run across multiple cores. In general terms: The maximum model size is dependent on the available CPU RAM and the runtime is determined by the CPU speed, the number of cores available to be run in parallel, chip architecture and cache size.
TUFLOW FV - Run on GPU Hardware: A single model run uses the GPU(s) for computation. In general terms: The maximum model size is dependent on the available GPU and CPU RAM and the runtime is driven by the CUDA core speed, the number of CUDA cores available and the GPU architecture. GPU performance is complex and is not easily inferred from GPU clock speed and number of cores alone; it is also very dependent on the ‘generation’ or architecture of the GPU. As TUFLOW FV requires some data exchange between GPU and CPU, the motherboard bus speeds and CPU speeds also play a role, but typically a much lesser one than the GPU CUDA compute.

On our Hardware Benchmarking page you can compare recently run combinations of GPU, CPU and RAM with the system you are planning to purchase. If you are building a computer, we recommend that you seek advice from an appropriate computer hardware vendor who can advise on the compatibility and optimisation of your setup.

GPU Advice

TUFLOW HPC on GPU Hardware is typically our fastest solver for 1D/2D pipe and floodplain simulations.

TUFLOW HPC supports CUDA enabled NVIDIA GPU cards. For a list of supported CUDA enabled graphics cards, please visit the NVIDIA website.
To discover your computer's NVIDIA GPU hardware, see NVIDIA GPU Hardware and Usage.
TUFLOW HPC on GPU Hardware can be run in either single or double precision. However, for the vast majority of flood applications single precision is sufficient. We typically run our models on single precision. If you are unsure we recommend running with both the single and double precision solvers and comparing your results.

The precision you require will determine the type of GPU card that is best suited to your compute. For any given generation/architecture of cards, the “gaming” cards such as the GeForce GTX and RTX series provide excellent single precision performance – typically comparable to that of the “scientific” cards such as the Tesla series. If double precision is required then the scientific cards are substantially faster, but they are also significantly more expensive. The Quadro series cards sit in between for both double precision performance and cost. The specifications of a card should provide a breakdown of its single and double precision throughput in FLOPs. Single precision compute is typically sufficient for TUFLOW HPC modelling.

GPU RAM

RAM is the computer memory required to store all of the model data used during the computation. A computer has CPU RAM which is located on the motherboard and accessed from the CPU, and it has GPU RAM which is located on the GPU device and accessed from the GPU. The two memory storage systems are physically separate. The amount of GPU RAM is one of two factors that will determine the size of the model that can be run (the other being CPU RAM). As a rule of thumb, approximately 5 million cells can be run per gigabyte (GB) of GPU RAM depending on the model features, e.g. a model with infiltration requires more memory due to the extra variables needed for the infiltration calculation.
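The rule of thumb above can be turned into a quick sizing estimate. The sketch below is illustrative only: the function names are hypothetical, and the 5 million cells per GB figure is the approximate value quoted above, which will vary with model features such as infiltration.

```python
CELLS_PER_GB = 5_000_000  # rule of thumb from the text; varies with model features


def estimate_gpu_ram_gb(n_cells, cells_per_gb=CELLS_PER_GB):
    """Approximate GPU RAM (GB) needed for a model with n_cells 2D cells."""
    return n_cells / cells_per_gb


def max_cells(gpu_ram_gb, cells_per_gb=CELLS_PER_GB):
    """Approximate largest cell count that fits in gpu_ram_gb of GPU RAM."""
    return int(gpu_ram_gb * cells_per_gb)


if __name__ == "__main__":
    print(estimate_gpu_ram_gb(50_000_000))  # 10.0
    print(max_cells(11))                    # 55000000
```

Treat the result as a first estimate and lower `cells_per_gb` for models that use memory-heavy features.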

CPU RAM

TUFLOW HPC on GPU hardware still uses the CPU to compute and store data (in CPU RAM) during model initialisation and for all 1D calculations. While we are working on improving our CPU RAM usage, we currently find that CPU RAM is often the limiter on the size of the model domain you can run, particularly when running over multiple GPU cards. During initialisation and simulation a model will typically require 4-6 times as much CPU RAM as GPU RAM. As an example, for a model that utilises 11 GB of GPU RAM (typical memory for a high-end gaming card, corresponding to about a 50 million cell model), the CPU RAM required during initialisation will typically be in the range 44 GB to 66 GB. A model that fully utilises two 11 GB GPUs (i.e. a 100 million cell model) may require as much as 128 GB of CPU RAM during initialisation.
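The 4-6 times multiplier can be sketched as a simple range calculation. The function name is hypothetical and the factors are the typical values quoted above, not guaranteed bounds.

```python
def cpu_ram_range_gb(gpu_ram_gb, low_factor=4, high_factor=6):
    """Return the (low, high) CPU RAM estimate in GB for a given GPU RAM usage."""
    return gpu_ram_gb * low_factor, gpu_ram_gb * high_factor


if __name__ == "__main__":
    low, high = cpu_ram_range_gb(11)
    print(f"11 GB of GPU RAM: roughly {low}-{high} GB of CPU RAM")
```

For a single 11 GB card this reproduces the 44 GB to 66 GB range in the example above; for two such cards (22 GB in total) it gives 88 GB to 132 GB.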

CUDA Cores, GPU Clock speed, and FLOPs

One way of reporting a GPU card's throughput is in Floating Point Operations per second (FLOPs). The more FLOPs, the more calculations that can get crunched per second and the faster the model should run. For any given generation of GPU, FLOPs are approximately proportional to number of CUDA cores times the GPU clock speed. However, there have been significant improvements in GPU architecture since the inception of CUDA, and this has contributed to increases in overall FLOPs performance beyond just the increases in cores and clock speed that have occurred over this time.
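As a rough illustration of the proportionality described above, theoretical peak single precision throughput is commonly quoted as two floating point operations per CUDA core per clock cycle (one fused multiply-add). The sketch below assumes that factor of two, and the card figures are hypothetical. Real model runtimes depend on architecture and memory behaviour as well, so use it only to compare cards of the same generation.

```python
def peak_tflops(cuda_cores, clock_ghz, ops_per_cycle=2):
    """Theoretical peak TFLOPS, assuming ops_per_cycle operations per core per cycle."""
    flops = cuda_cores * clock_ghz * 1e9 * ops_per_cycle
    return flops / 1e12


if __name__ == "__main__":
    # Hypothetical card: 10,000 CUDA cores at 1.5 GHz.
    print(f"{peak_tflops(10_000, 1.5):.1f} TFLOPS")
```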

Multiple GPUs

TUFLOW HPC can use multiple GPU cards on a machine to run a single model (TUFLOW FV can currently use a single GPU only). This is useful for models that are too large for a single GPU, or for running a model as quickly as possible. In general terms, the run time benefit of using multiple cards increases with model size.

TUFLOW HPC-GPU does not support SLI for inter-GPU communications.
It does (as of build 2020-01-AA) auto detect and utilise peer-to-peer access over NVLink or PCI bus on the motherboard. Note that not all GPUs support peer-to-peer access.
- PCI bus - this method requires cards that support TCC driver mode, and all cards must be in TCC driver mode. As TUFLOW primarily relies on GPU CUDA capabilities, the impact of using a higher or lower PCI slot option is minimal.
- NVLink - high-end compute cards can have up to 8 cards talking to each other through a high-spec NVLink, but many of the less expensive cards are limited to only having two connected together over a dual socket NVLink.
Models may still be run across multiple GPUs even if an NVLink is not present and the GPUs do not support peer-to-peer access. In this case HPC reverts to exchanging the domain boundary data between the GPUs via the CPU. The memory bandwidth between the GPU and the main system is not a critical bottleneck for TUFLOW.
When using multiple GPUs it is best to use cards of similar memory and performance. While it is possible (as of build 2020-01-AA) to re-balance a model over multiple GPUs, we do not recommend using cards with vastly disparate performance.
Sufficient cooling and power supply should be considered if multiple cards are used. When installed in adjacent PCI slots, the preference is to use rear vented cards rather than side vented to avoid blowing hot air onto the neighbouring cards (which could lead to overheating).

GPU Performance Comparison

Extensive GPU hardware speed comparison testing has been completed using TUFLOW's standardised hardware benchmarking dataset. Details for the benchmarking are available via the Hardware Benchmarking page of the Wiki. Review the GPU benchmarking runtime results table to compare the speed performance of different cards. If your GPU card is not listed in the result dataset please download and run the benchmarking dataset, and provide the result summary to support@tuflow.com. We will add the details to the runtime results table.

External video card benchmark websites can also be used to compare GPU cards. For example, PassMark Software - Video Card (GPU) Benchmarks is an excellent performance guide.

CPU Advice

In general terms a more recent architecture, higher clock speed CPU with a large cache will perform better than a slower clock speed chip. This section discusses CPU RAM, RAM speed, Processor frequency, Multi-core processing and hyper-threading.

CPU RAM

The amount of CPU RAM will determine the size of the model that can be run, or the number of models that can be run at one time. Faster RAM will result in quicker runtimes; however, this is usually a secondary consideration to chip speed, cache size and architecture.

CPU Cores

TUFLOW HPC - Run on GPU Hardware: The parallel processing is being done on the GPU card. However, TUFLOW HPC-GPU still uses the CPU for model initialisation and for 1D calculations. If multiple GPU cards are used, TUFLOW will use the equivalent number of CPU threads for controlling the GPUs and migrating data. So for a machine dedicated to HPC-GPU modelling, the number of CPU cores should be higher than the number of installed GPUs.
TUFLOW HPC - Run on CPU Hardware: An HPC model can also be run on multiple CPU cores. For a comparison of simulation speed, please refer to HPC on CPU vs GPU.
TUFLOW Classic: A TUFLOW Classic simulation can only use one CPU core due to the implicit nature of the numerical solution. More CPU cores allow more simulations to be run efficiently at the same time.
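The core-count guidance above can be sketched as two simple checks. Both function names are hypothetical, and the RAM per run must be measured for your own model; the second check assumes each TUFLOW Classic run occupies one core and a fixed amount of CPU RAM.

```python
def enough_cores_for_gpus(n_cpu_cores, n_gpus):
    """True if the CPU core count exceeds the number of installed GPUs."""
    return n_cpu_cores > n_gpus


def max_concurrent_classic_runs(n_cpu_cores, total_ram_gb, ram_per_run_gb):
    """Concurrent single-core runs, limited by whichever of cores or RAM runs out first."""
    runs_by_ram = int(total_ram_gb // ram_per_run_gb)
    return min(n_cpu_cores, runs_by_ram)


if __name__ == "__main__":
    print(enough_cores_for_gpus(8, 4))             # True
    print(max_concurrent_classic_runs(8, 64, 10))  # 6
```

In the second call the RAM is the limit: 64 GB holds six 10 GB runs even though eight cores are available.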

Hyperthreading

See TUFLOW FV Parallel Computing: https://fvwiki.tuflow.com/index.php?title=TUFLOW_FV_Parallel_Computing

Processor Frequency and RAM Frequency

Processor and RAM frequency directly affect run times. In general, the higher the frequency, the faster the model runs.

CPU Performance Comparison

Extensive CPU hardware speed comparison testing has been completed using TUFLOW's standardised hardware benchmarking dataset. Details for the benchmarking are available via the Hardware Benchmarking page of the Wiki. Review the CPU benchmarking runtime results table to compare the speed performance of different chips. If your chip is not listed in the result dataset please download and run the benchmarking dataset, and provide the result summary to support@tuflow.com. We will add the details to the runtime results table.

Storage Advice

Solid state hard drives are preferred for temporary storage as they are faster to write to than traditional hard drives. Large data files can then be transferred to a more permanent location.



Hardware Selection Advice: Difference between revisions
