How to Install and Set Up CUDA

Sources:

  1. CUDA Toolkit 12.4 Update 1 Downloads
  2. cuDNN Downloads

Introduction

CUDA is a programming model and computing toolkit developed by NVIDIA. It lets you run compute-intensive operations faster by parallelizing work across GPUs. CUDA is the dominant API used for deep learning, although other options exist, such as OpenCL. PyTorch exposes CUDA functionality through the torch.cuda module.

Note: While this section introduces several ways to install CUDA, you don't need to install the full CUDA Toolkit if you simply want to run CUDA code. PyTorch ships with a built-in CUDA runtime (see the next section), so installing the CUDA-enabled PyTorch build is enough to run CUDA code.

  • Note: PyTorch includes only the CUDA runtime; the toolchain, such as nvcc, is not included.

In this guide, CUDA and the cudatoolkit refer to the same thing.
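
For example, assuming you've installed a CUDA-enabled PyTorch build, this one-liner is a quick sanity check that the bundled runtime works without any system-wide CUDA installation:

# Prints the CUDA runtime version bundled with PyTorch and whether a GPU is usable
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"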

Commands

CUDA has both a driver API and a runtime API, and their API versions can be entirely different.

  • Get the driver API version:

    nvidia-smi

    • This command reports the driver's CUDA version (the highest runtime version the driver supports); this is the version that matters when installing PyTorch.
  • Get the runtime API version (requires the Environment Setup described later):

    nvcc --version
  • Monitor graphics card usage:

    watch -n 0.1 nvidia-smi
  • Locate your CUDA runtime installation:

    which nvcc

    or

    ldconfig -p | grep cuda
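
For example, the driver and runtime versions can be inspected side by side (the exact header layout of nvidia-smi may vary between driver releases):

# Driver API version: shown as "CUDA Version: ..." in the nvidia-smi header
nvidia-smi | head -n 4
# Runtime API version: reported by the toolkit's compiler
nvcc --version | grep release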

CUDA (or CUDA Toolkit) installation

NOTE:

  • Some GPUs, such as NVIDIA SXM GPUs, are not attached via PCIe, so they cannot be detected by sudo lspci either.
  • This section is for ordinary GPUs, i.e., NVIDIA PCIe GPUs. For CUDA installation on NVIDIA datacenter GPUs such as the Tesla V100 SXM, refer to the section "CUDA installation for datacenter GPUs" below.

Before installation, check your OS version

lsb_release -a

and GPU information:

sudo update-pciids
sudo lspci | grep -i vga

The update-pciids command updates the local copy of the PCI ID database.

The lspci command uses that local database to identify the PCI devices it detects.
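
Note that some NVIDIA GPUs (datacenter cards in particular) are reported as 3D controllers rather than VGA controllers, so a broader filter can help:

sudo lspci -nn | grep -i nvidia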

See CUDA Toolkit 12.5 Downloads for full instructions.

For instance, the installation instructions for x86_64 Ubuntu 22.04 are:

  1. Base Installer:

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt-get update
    sudo apt-get -y install cuda-toolkit-12-5
  2. Driver Installer (the legacy kernel module flavor):

    sudo apt-get install -y cuda-drivers

    If you prefer the open kernel module flavor rather than the legacy one, install it with:

    sudo apt-get install -y nvidia-kernel-open-545
    sudo apt-get install -y cuda-drivers-545
  3. Check that the CUDA driver is running (if this fails, see the note after this list):

    nvidia-smi
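
If nvidia-smi cannot communicate with the driver immediately after installation, the new kernel modules usually need a reboot to load:

sudo reboot
# after logging back in
nvidia-smi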

Old CUDA

CUDA Toolkit 12.1 Downloads

  1. Base Installer:

    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
    sudo dpkg -i cuda-keyring_1.0-1_all.deb
    sudo apt-get update
    sudo apt-get -y install cuda-12-1 # select the version you want
  2. Driver Installer (the legacy kernel module flavor):

    sudo apt-get install -y cuda-drivers
  3. Environment Setup: edit your shell config file, e.g., ~/.zshrc

    export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Install via conda

You can install CUDA and its toolchain (e.g., nvcc) via conda:

conda install cuda -c nvidia

CUDA installed by conda is local to the conda environment.
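
For example (myenv is a placeholder for your environment's name), nvcc should resolve to a path inside the environment after activation:

conda activate myenv
which nvcc        # expected to point inside the environment, e.g., .../envs/myenv/bin/nvcc
nvcc --version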

CUDA installation for datacenter GPUs

  1. Go to the NVIDIA Driver Downloads webpage and select the appropriate driver for your NVIDIA product. You'll download a XX.run file.
  2. Run that file with sudo bash XX.run and follow the on-screen prompts to install the CUDA Toolkit.

Post-installation actions

After installation, we can (optionally) do the Environment Setup: add the CUDA paths to the PATH and LD_LIBRARY_PATH variables:

export PATH=/usr/local/cuda-12.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

After that, you can run

nvcc --version

to confirm the toolkit is on your PATH.
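
As an optional sanity check (a sketch, assuming you edited ~/.zshrc as above), reload the shell config and confirm the CUDA paths are picked up:

source ~/.zshrc
nvcc --version
echo $PATH | tr ':' '\n' | grep cuda
echo $LD_LIBRARY_PATH | tr ':' '\n' | grep cuda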

Uninstall CUDA completely

Source: How to remove cuda completely from ubuntu?

sudo apt-get purge nvidia*

# The nvidia-* and libnvidia-* patterns below also remove the drivers. It is better to remove everything and reinstall. The libcudnn8* pattern removes cuDNN.
sudo apt-get --purge remove cuda-* nvidia-* gds-tools-* libcublas-* libcufft-* libcufile-* libcurand-* libcusolver-* libcusparse-* libnpp-* libnvidia-* libnvjitlink-* libnvjpeg-* nsight* nvidia-* libnvidia-* libcudnn8*

# Also run below which gets rid of CUDA 10 and prior stuff.
sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*"

# Cleanup uninstall
sudo apt-get autoremove
sudo apt-get autoclean

# remove cuda directories
sudo rm -rf /usr/local/cuda*

# remove from dpkg
sudo dpkg -r cuda
sudo dpkg -r $(dpkg -l | awk '/^ii/ && /cudnn/ {print $2}')
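
Afterwards, you can confirm that nothing was left behind (a quick check, not part of the original answer):

dpkg -l | grep -i -E "cuda|nvidia"  # should print nothing (or only unrelated packages)
ls /usr/local | grep cuda           # should print nothing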

cuDNN installation

See cuDNN Downloads for full instructions.

Base installer:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cudnn

To install for CUDA 11, perform the above configuration but install the CUDA 11 specific package:

sudo apt-get -y install cudnn-cuda-11

To install for CUDA 12, perform the above configuration but install the CUDA 12 specific package:

sudo apt-get -y install cudnn-cuda-12
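
To verify the installation (a hedged check; the exact package names depend on the cuDNN version you installed):

dpkg -l | grep cudnn
# If you use PyTorch, this prints the cuDNN version PyTorch itself is using
# (which may be its bundled copy rather than the system one)
python -c "import torch; print(torch.backends.cudnn.version())"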

Nvidia Container Toolkit installation

The NVIDIA Container Toolkit enables users to build and run GPU-accelerated containers. These containers have built-in CUDA support; you can verify this by running nvidia-smi inside them.

For images with CUDA support: see this page.

For images with CUDA and cuDNN support: see this page.

Taking Ubuntu as an example, install the NVIDIA Container Toolkit as follows (->Source):

  1. Configure the production repository:

    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

    Optionally, configure the repository to use experimental packages:

    sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
  2. Update the packages list from the repository:

    sudo apt-get update
  3. Install the NVIDIA Container Toolkit packages:

    sudo apt-get install -y nvidia-container-toolkit
  4. Configure the container runtime by using the nvidia-ctk command:

    sudo nvidia-ctk runtime configure --runtime=docker

    The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host. The file is updated so that Docker can use the NVIDIA Container Runtime.

  5. Restart the Docker daemon:

    sudo systemctl restart docker
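
After these steps, you can confirm the nvidia runtime was registered by inspecting the file nvidia-ctk modified (its exact contents depend on any pre-existing Docker configuration):

cat /etc/docker/daemon.json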

Now you can run containers with --gpus all to enable CUDA support inside the container, e.g.,

docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
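
You can also run one of the CUDA images mentioned above; the tag below is only illustrative, so check the linked pages for currently available tags:

docker run --rm --gpus all nvidia/cuda:12.4.1-runtime-ubuntu22.04 nvidia-smi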

Explanation

GPU-Util:

It is a sampled measurement over a time period: for a given sample window, it reports the percentage of time during which one or more GPU kernels were active (i.e., running).

It doesn't tell you anything about how many SMs were used, or how "busy" the code was, or what it was doing exactly, or in what way it may have been using memory.

The above claim(s) can be verified without too much difficulty using a microbenchmarking-type exercise (a sketch is given at the end of this section).

Based on the NVIDIA docs, the sample period may be between 1/6 second and 1 second, depending on the product. However, the period shouldn't make much difference in how you interpret the result.

Also, the word "Volatile" in the nvidia-smi header does not pertain to this data item; it belongs to the "Uncorr. ECC" column on the line above, so reading it as "Volatile GPU-Util" is a misreading of the output format.

-->Source
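
A minimal sketch of such a microbenchmark (my own illustration, not the code from the linked answer): keep launching kernels in one terminal and watch GPU-Util in another. The reported percentage reflects how often a kernel was active during the sample window; it rises while the loop runs and drops back to 0 afterwards.

# Terminal 1: launch matmul kernels in a loop for ~30 seconds
python - <<'EOF'
import time
import torch

x = torch.randn(4096, 4096, device="cuda")
end = time.time() + 30
while time.time() < end:
    y = x @ x          # each iteration launches a kernel
torch.cuda.synchronize()
EOF

# Terminal 2: GPU-Util climbs while kernels are in flight and returns to 0 when they stop
watch -n 0.1 nvidia-smi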