Difference between revisions of "Configure CUDA device selection"

From Tuflow
m (fix syntax error)
(rewrite for a most consistent style)
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
The computer you use to run TUFLOW may have multiple GPUs. These can be multiple NVIDIA GPUs with CUDA-capabilities, which you may want to use to accelerate running your models. Or they can be additional GPUs for other purposes like rendering the interactive desktop for users of the computer, or other computational tasks. A common occurrence on modern motherboards is the availability of an integrated GPU.
+
Computers running TUFLOW may have multiple GPUs. These may be multiple NVIDIA GPUs with CUDA capabilities, used to accelerate simulation runs. Alternatively, they can be GPUs used for purposes such as rendering the interactive desktop or handling other computational tasks. A common occurrence on modern motherboards is the availability of an integrated GPU.
  
Generally, we recommend using a GPU you don't use for TUFLOW modelling as your primary GPU for rendering the desktop, if needed. If you don't have an additional GPU available, you can use one of the NVIDIA GPUs, be we would then recommend using the most capable card as the primary card for running your models, and the secondary card as the primary GPU for rendering the desktop.
+
Generally, it is recommended to use a GPU that is not used for TUFLOW modelling as the primary GPU for rendering the desktop, if needed. If there is no additional GPU available, one of the NVIDIA GPUs can be used, in which case it is recommended to use the most capable card for running your models and the less capable one for rendering the desktop.
  
TUFLOW allows you to select a specific GPU for its compute, using command line options like <code>-pu0</code> for the first GPU, <code>-pu1</code> for the second, etc. (see [[HPC Running and Converting Models]])   
+
TUFLOW allows selection of a specific GPU for computation using command-line options such as -pu0 for the first GPU, -pu1 for the second, and so on. (See [[HPC Running and Converting Models]].)   
  
However, you may find that what TUFLOW considers the first or second GPU does not match your expectations based on what you see in tools like the Windows Device Manager, Task Manager, or the output from <code>nvidia-smi</code> on the command line. Another common problem is that the GPUs you want to use are not actually #0 and #1 and you may have trouble selecting the cards you prefer, in the order you prefer them in.
+
However, what TUFLOW considers the first or second GPU may not match the order shown in tools such as Windows Device Manager, Task Manager, or the output of <code>nvidia-smi</code> on the command line. Another common problem is that the needed GPUs are not actually in the expected order and may cause difficulty selecting GPUs in the preferred order.
  
To this end, you can set an environment variable called <code>CUDA_VISIBLE_DEVICES</code>, which limits the devices that will be visible to CUDA-capable applications like TUFLOW, as well as specifying the order they will appear in. The rest of this article will explain how to go about that. As an example, we'll use a Windows computer that has 2 NVIDIA GPUs, and an on-board AMD GPU. In Windows, you can list all the available GPUs using a Powershell command like this:
+
To this end, an environment variable called <code>CUDA_VISIBLE_DEVICES</code> limits the devices that will be visible to CUDA-capable applications like TUFLOW, as well as specifying the order they will appear in. The remainder of this article outlines how to configure that setting. As an example, a Windows computer is used that has 2 NVIDIA GPUs, and an on-board AMD GPU. In Windows, all available GPUs can be listed using a PowerShell command like this:
 
<syntaxhighlight lang="powershell">
 
<syntaxhighlight lang="powershell">
 
Get-CimInstance -Namespace root\cimv2 -ClassName Win32_VideoController | Select-Object DeviceID, Name
 
Get-CimInstance -Namespace root\cimv2 -ClassName Win32_VideoController | Select-Object DeviceID, Name
 
</syntaxhighlight>
 
</syntaxhighlight>
(you can run PowerShell commands by opening PowerShell from the Windows Start Menu and pasting a command there)
+
(PowerShell commands can be run by opening PowerShell from the Windows Start Menu and pasting a command there)
  
The output for the example computer looks like this (note that even virtual adapters like a Remote Desktop adapter will show):
+
The output for the example computer is as follows (note that virtual adapters, such as a Remote Desktop adapter, will also appear):
 
<pre>
 
<pre>
 
DeviceID        Name
 
DeviceID        Name
Line 22: Line 22:
 
VideoController4 NVIDIA GeForce RTX 4090
 
VideoController4 NVIDIA GeForce RTX 4090
 
</pre>
 
</pre>
In this case, we only need 'VideoController3' and 'VideoController4' to be visible to CUDA-enabled applications like TUFLOW. We can get more details on those by running the following command (from either PowerShell, Command Prompt, or a Linux shell):
+
In this case, only 'VideoController3' and 'VideoController4' need to be visible to CUDA-enabled applications like TUFLOW. More details on those can be obtained by running the following command (from either PowerShell, Command Prompt, or a Linux shell):
<syntaxhighlight>
+
<syntaxhighlight lang="batch">
 
nvidia-smi --query-gpu=name,uuid --format=csv,noheader,nounits
 
nvidia-smi --query-gpu=name,uuid --format=csv,noheader,nounits
 
</syntaxhighlight>
 
</syntaxhighlight>
And the output looks like this:
+
And the output is as follows:
 
<pre>
 
<pre>
 
NVIDIA GeForce RTX 4090, GPU-5060f556-4eb4-7155-4020-abadcb2fd735
 
NVIDIA GeForce RTX 4090, GPU-5060f556-4eb4-7155-4020-abadcb2fd735
 
NVIDIA GeForce RTX 4090, GPU-f3825978-37f8-b933-5327-583196d560cd
 
NVIDIA GeForce RTX 4090, GPU-f3825978-37f8-b933-5327-583196d560cd
 
</pre>
 
</pre>
The tool won't list the AMD card, but up to and including version 2025.1 of TUFLOW, that card may still interfere with your GPU selection order. Also, from this readout, it is not at all clear which card is which and the order here may not match the order you expect from tools like Task Manager ('GPU 0', 'GPU 1', etc.).
+
The tool does not list the AMD card, but up to and including version 2025.1 of TUFLOW, that card may still interfere with the GPU selection order. Also, from this readout, it is not at all clear which card is which and the order here may not match the order you expect from tools like Task Manager ('GPU 0', 'GPU 1', etc.).
  
This is what we will solve by setting the environment variable <code>CUDA_VISIBLE_DEVICES</code>. There are two possible formats. It can either have a value like <code=>0,1</code> or a more explicit value like <code>GPU-5060f556-4eb4-7155-4020-abadcb2fd735,GPU-f3825978-37f8-b933-5327-583196d560cd</code> using the identifiers from the <code>nvidia-smi</code> output.  
+
This issue can be resolved by setting the environment variable <code>CUDA_VISIBLE_DEVICES</code>. There are two possible formats. It can either have a value like <code>0,1</code> or a more explicit value like <code>GPU-5060f556-4eb4-7155-4020-abadcb2fd735,GPU-f3825978-37f8-b933-5327-583196d560cd</code> using the identifiers from the <code>nvidia-smi</code> output.  
  
The short format just affects the default order. If you find using <code>-pu0</code> with TUFLOW selects the GPU you'd consider #1 and vice versa, you could set <code>CUDA_VISIBLE_DEVICES</code> to <code>1,0</code>, to reverse the default order. However, this order may change as you install new hardware or reinstall existing hardware, so the recommendation is to use the explicit values in the long format.
+
The short format just affects the default order. If using <code>-pu0</code> with TUFLOW selects the GPU considered #1 and vice versa, setting <code>CUDA_VISIBLE_DEVICES</code> to <code>1,0</code> reverses the default order. However, this order may change as new hardware is installed, or existing hardware reinstalled. The recommendation is to use the explicit values in the long format.
  
You can either set the value of the environment variable at the start of scripts you use to run your models, like batch files, PowerShell scripts, or Linux shell scripts, or you can set it globally so that it automatically applies to all running applications.
+
The value of the environment variable can either be set at the start of scripts used to run models, like batch files, PowerShell scripts, or Linux shell scripts, or globally so that it automatically applies to all running applications.
  
In a batch file or from the Command Prompt use this (note there are no quotes around the values, replace the values with the identifiers for your GPUs):
+
In a batch file or from the Command Prompt use this (note there are no quotes around the values, replace the values with the identifiers for detected GPUs):
 
<syntaxhighlight lang="dos">
 
<syntaxhighlight lang="dos">
 
SET CUDA_VISIBLE_DEVICES=GPU-5060f556-4eb4-7155-4020-abadcb2fd735,GPU-f3825978-37f8-b933-5327-583196d560cd
 
SET CUDA_VISIBLE_DEVICES=GPU-5060f556-4eb4-7155-4020-abadcb2fd735,GPU-f3825978-37f8-b933-5327-583196d560cd
 
</syntaxhighlight>
 
</syntaxhighlight>
In a PowerShell script or from the PowerShell prompt use this (note the quotes around the values, replace the values with the identifiers for your GPUs):
+
In a PowerShell script or from the PowerShell prompt use this (note the quotes around the values, replace the values with the identifiers for detected GPUs):
 
<syntaxhighlight lang="powershell">
 
<syntaxhighlight lang="powershell">
 
$env:CUDA_VISIBLE_DEVICES = "GPU-5060f556-4eb4-7155-4020-abadcb2fd735,GPU-f3825978-37f8-b933-5327-583196d560cd"
 
$env:CUDA_VISIBLE_DEVICES = "GPU-5060f556-4eb4-7155-4020-abadcb2fd735,GPU-f3825978-37f8-b933-5327-583196d560cd"
 
</syntaxhighlight>
 
</syntaxhighlight>
  
If you prefer to set the value globally, you can either set it for a single user account by finding "Edit environment variables ''for your account''" in the Windows Start menu and entering the values without quotes, or you can set it for all users on the machine by finding "Edit the ''system'' environment variables" in the Windows Start menu and doing the same in the 'System Variables' section. Note that you need to be an administrator to be able to do the latter.  
+
If a globally set value is preferred, it can either be set for a single user account by finding "Edit environment variables ''for your account''" in the Windows Start menu and entering the values without quotes, or it can be set for all users on the machine by finding "Edit the ''system'' environment variables" in the Windows Start menu and doing the same in the 'System Variables' section. Note that administrator rights are required (elevation) to be able to do the latter.  
  
'''Warning:''' setting the value globally affects all CUDA-capable applications, not just TUFLOW. Please ensure that no other applications need the CUDA-capabilities of the GPUs you're leaving out or use a local value in your scripts or batch files instead.
+
'''Warning:''' setting the value globally affects all CUDA-capable applications, not just TUFLOW. Please ensure that no other applications need the CUDA capabilities of the GPUs that are left out or use a local value in scripts or batch files instead.

Latest revision as of 13:51, 18 June 2025

Computers running TUFLOW may have multiple GPUs. These may be multiple NVIDIA GPUs with CUDA capabilities, used to accelerate simulation runs. Alternatively, they can be GPUs used for purposes such as rendering the interactive desktop or handling other computational tasks. A common occurrence on modern motherboards is the availability of an integrated GPU.

Where a desktop needs to be rendered, it is generally recommended to use a GPU that is not used for TUFLOW modelling as the primary display GPU. If no additional GPU is available, one of the NVIDIA GPUs can be used, in which case it is recommended to reserve the most capable card for running models and to use the less capable one for rendering the desktop.

TUFLOW allows selection of a specific GPU for computation using command-line options such as -pu0 for the first GPU, -pu1 for the second, and so on. (See HPC Running and Converting Models.)

However, what TUFLOW considers the first or second GPU may not match the order shown in tools such as Windows Device Manager, Task Manager, or the output of nvidia-smi on the command line. Another common problem is that the GPUs intended for modelling do not appear in the expected positions, which makes it difficult to select them in the preferred order.

To address this, the environment variable CUDA_VISIBLE_DEVICES can be set. It limits the devices that are visible to CUDA-capable applications like TUFLOW and specifies the order in which they appear. The remainder of this article outlines how to configure that setting, using as an example a Windows computer with two NVIDIA GPUs and an on-board AMD GPU. In Windows, all available GPUs can be listed using a PowerShell command like this:

Get-CimInstance -Namespace root\cimv2 -ClassName Win32_VideoController | Select-Object DeviceID, Name

(PowerShell commands can be run by opening PowerShell from the Windows Start menu and pasting the command there.)

The output for the example computer is as follows (note that virtual adapters, such as a Remote Desktop adapter, will also appear):

DeviceID         Name
--------         ----
VideoController1 AMD Radeon(TM) Graphics
VideoController2 Microsoft Remote Display Adapter
VideoController3 NVIDIA GeForce RTX 4090
VideoController4 NVIDIA GeForce RTX 4090

In this case, only 'VideoController3' and 'VideoController4' need to be visible to CUDA-enabled applications like TUFLOW. More details on those can be obtained by running the following command (from PowerShell, Command Prompt, or a Linux shell):

nvidia-smi --query-gpu=name,uuid --format=csv,noheader,nounits

The output is as follows:

NVIDIA GeForce RTX 4090, GPU-5060f556-4eb4-7155-4020-abadcb2fd735
NVIDIA GeForce RTX 4090, GPU-f3825978-37f8-b933-5327-583196d560cd

The tool does not list the AMD card, but up to and including version 2025.1 of TUFLOW, that card may still interfere with the GPU selection order. Also, because both cards report the same name, this readout does not make clear which card is which, and the order here may not match the order shown in tools like Task Manager ('GPU 0', 'GPU 1', etc.).

This issue can be resolved by setting the environment variable CUDA_VISIBLE_DEVICES. There are two possible formats: a short list of device indices such as 0,1, or a more explicit list of GPU identifiers such as GPU-5060f556-4eb4-7155-4020-abadcb2fd735,GPU-f3825978-37f8-b933-5327-583196d560cd, using the identifiers from the nvidia-smi output.

The short format refers to devices by their position in the default order. If using -pu0 with TUFLOW selects the GPU considered #1 and vice versa, setting CUDA_VISIBLE_DEVICES to 1,0 reverses the default order. However, the default order may change when new hardware is installed or existing hardware is reinstalled, so the recommendation is to use the explicit values in the long format.
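As a minimal sketch of assembling the long-format value, assuming a bash shell with awk and paste available: the helper below takes lines in the "name, uuid" layout produced by the nvidia-smi command shown earlier and joins the identifiers with commas. The function name uuids_from_csv is hypothetical and is not part of TUFLOW or nvidia-smi. The sample input is copied from the example output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "name, uuid" CSV lines on stdin and prints the
# UUIDs joined with commas, in the order they were listed.
uuids_from_csv() {
    # The UUID is the last comma-separated field on each line.
    awk -F', *' '{ print $NF }' | paste -sd, -
}

# Sample input copied from the example nvidia-smi output above.
sample='NVIDIA GeForce RTX 4090, GPU-5060f556-4eb4-7155-4020-abadcb2fd735
NVIDIA GeForce RTX 4090, GPU-f3825978-37f8-b933-5327-583196d560cd'

value=$(printf '%s\n' "$sample" | uuids_from_csv)
echo "$value"

# On a machine with the NVIDIA driver installed, the helper could be fed
# directly from the tool instead of the sample text:
#   nvidia-smi --query-gpu=name,uuid --format=csv,noheader | uuids_from_csv
```

The resulting value lists the GPUs in the order nvidia-smi reported them. If a different order is preferred, the identifiers can be rearranged by hand before the value is used.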

The value of the environment variable can either be set at the start of scripts used to run models, like batch files, PowerShell scripts, or Linux shell scripts, or globally so that it automatically applies to all running applications.

In a batch file or from the Command Prompt use this (note that there are no quotes around the values; replace the values with the identifiers of the GPUs detected on the machine):

SET CUDA_VISIBLE_DEVICES=GPU-5060f556-4eb4-7155-4020-abadcb2fd735,GPU-f3825978-37f8-b933-5327-583196d560cd

In a PowerShell script or from the PowerShell prompt use this (note the quotes around the values; replace the values with the identifiers of the GPUs detected on the machine):

$env:CUDA_VISIBLE_DEVICES = "GPU-5060f556-4eb4-7155-4020-abadcb2fd735,GPU-f3825978-37f8-b933-5327-583196d560cd"
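For Linux shell scripts, a comparable sketch is given below, assuming bash. It shows two standard shell mechanisms: exporting the variable at the start of a script, and prefixing a single command with the assignment so that only that process sees the value. The printenv command stands in for whatever command launches the model, which is not shown here. The UUIDs are the example identifiers from above and need to be replaced.

```shell
#!/usr/bin/env bash
# Replace these UUIDs with the identifiers reported by nvidia-smi.
gpus=GPU-5060f556-4eb4-7155-4020-abadcb2fd735,GPU-f3825978-37f8-b933-5327-583196d560cd

# Option 1: export at the start of the script; every command started
# afterwards inherits the value.
export CUDA_VISIBLE_DEVICES="$gpus"
seen_after_export=$(printenv CUDA_VISIBLE_DEVICES)

# Option 2: prefix a single command; only that process sees the value
# (here the short format, reversing the default order).
seen_by_one_command=$(CUDA_VISIBLE_DEVICES=1,0 printenv CUDA_VISIBLE_DEVICES)

echo "after export:    $seen_after_export"
echo "one command:     $seen_by_one_command"
echo "shell still has: $CUDA_VISIBLE_DEVICES"
```

The prefix form leaves the exported value in the calling shell unchanged, which makes it convenient for trying a different order for one run without editing the rest of the script.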

If a globally set value is preferred, it can either be set for a single user account by finding "Edit environment variables for your account" in the Windows Start menu and entering the values without quotes, or it can be set for all users on the machine by finding "Edit the system environment variables" in the Windows Start menu and doing the same in the 'System Variables' section. Note that the latter requires administrator rights (elevation).

Warning: setting the value globally affects all CUDA-capable applications, not just TUFLOW. Please ensure that no other applications need the CUDA capabilities of the GPUs that are left out, or use a local value in scripts or batch files instead.