3. Installing the NVIDIA Driver on Ubuntu

1. Preliminary Preparation

1.1 Check GPU Information

lspci | grep -i nvidia
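
On a multi-GPU server it can help to record how many devices to expect before installing anything. The following is a minimal sketch; the helper name count_nvidia_gpus is hypothetical, and it assumes GPUs appear in the lspci output as "3D controller" or "VGA compatible controller" entries.

```shell
# Hypothetical helper: count NVIDIA GPU entries in `lspci` output read from stdin.
count_nvidia_gpus() {
    # Keep only display-class devices so NVIDIA audio or USB functions are not counted.
    grep -i 'nvidia' | grep -Eci '3D controller|VGA compatible controller'
}

# Usage on a real machine:
#   lspci | count_nvidia_gpus
```

If the count is lower than the number of physically installed cards, it is worth checking seating and BIOS settings before continuing.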

1.2 Configure the Kernel

# AlmaLinux/RockyLinux
dnf install -y gcc dkms kernel-devel-$(uname -r) kernel-headers-$(uname -r)

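The installed kernel development packages must match the running kernel version reported by uname -r; otherwise the driver's kernel module cannot be built. On Ubuntu, the matching package is linux-headers-$(uname -r), installed with apt-get together with gcc and dkms.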
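
One way to confirm that matching build files are present is to look for the build entry under /lib/modules. This is a sketch only; the helper name has_kernel_build_tree and its second parameter (an alternative modules root, useful for testing) are assumptions introduced here.

```shell
# Hypothetical helper: succeed if build files for the given kernel release exist.
# $1 = kernel release (default: running kernel), $2 = modules root (default: /lib/modules)
has_kernel_build_tree() {
    release="${1:-$(uname -r)}"
    modules_root="${2:-/lib/modules}"
    # The `build` entry is normally a symlink into the installed headers/devel tree;
    # `-d` follows the symlink, so a dangling link counts as missing.
    [ -d "$modules_root/$release/build" ]
}

# Usage:
#   has_kernel_build_tree || echo "build files for $(uname -r) are missing"
```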

1.3 Disable nouveau

# Check nouveau
lsmod | grep nouveau

# Disable nouveau (a dedicated file avoids overwriting the distribution's default blacklist.conf)
cat > /etc/modprobe.d/blacklist-nouveau.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF
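
To double-check that the blacklist entry was actually written, a small check along these lines may be useful. The helper name nouveau_is_blacklisted is hypothetical, and the directory argument exists only so the check can be pointed at a different location.

```shell
# Hypothetical helper: succeed if any *.conf file in the directory blacklists nouveau.
# $1 = modprobe configuration directory (default: /etc/modprobe.d)
nouveau_is_blacklisted() {
    conf_dir="${1:-/etc/modprobe.d}"
    # -q: no output, -s: stay silent about missing files, -E: extended regex.
    grep -qsE '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$conf_dir"/*.conf
}

# Usage:
#   nouveau_is_blacklisted && echo "nouveau is blacklisted"
```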

1.4 Update initramfs

# AlmaLinux/RockyLinux
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)

# Ubuntu
sudo update-initramfs -u

After completing this step, reboot the operating system before proceeding to the next step. After the reboot, lsmod | grep nouveau should print nothing.
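
The post-reboot check can also be scripted. The sketch below assumes a hypothetical helper named module_in_lsmod that reads lsmod output from stdin and matches the module name exactly in the first column, so that modules which merely depend on nouveau do not cause a false positive.

```shell
# Hypothetical helper: succeed if the named module appears in `lsmod` output on stdin.
# $1 = module name to look for
module_in_lsmod() {
    # lsmod prints the module name in the first column; compare it exactly.
    awk -v name="$1" '$1 == name { found = 1 } END { exit (found ? 0 : 1) }'
}

# Usage:
#   lsmod | module_in_lsmod nouveau && echo "nouveau is still loaded"
```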

2. Install the Driver

2.1 Download the Driver

Download the driver that matches your GPU model from the NVIDIA Driver Downloads page. The .run installer is recommended. If you plan to install the CUDA Toolkit from its runfile, which bundles a driver, you can skip this step.

2.2 Install the Driver

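bash NVIDIA-Linux-x86_64-470.256.02.run

# Or, if the installer cannot locate the kernel sources, pass them explicitly (path shown is for AlmaLinux/RockyLinux)
bash NVIDIA-Linux-x86_64-470.256.02.run --kernel-source-path=/usr/src/kernels/$(uname -r) -k $(uname -r)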

2.3 Verify Installation

nvidia-smi

If GPU-related information is returned, the installation was successful.
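
For a scripted check, nvidia-smi can print just the driver version per GPU. The sketch below assumes a hypothetical helper named all_gpus_report_version that reads one version per line from stdin and succeeds only if at least one line is present and every line equals the expected version.

```shell
# Hypothetical helper: succeed if every line on stdin equals the expected driver version.
# $1 = expected driver version
all_gpus_report_version() {
    awk -v want="$1" '
        { seen++; if ($0 != want) bad++ }
        END { exit ((seen > 0 && bad == 0) ? 0 : 1) }
    '
}

# Usage (470.256.02 is the version used in this guide):
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader | all_gpus_report_version 470.256.02
```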

3. Install the CUDA Toolkit

3.1 Download the CUDA Installer

Visit the CUDA download page and select the installer that matches your operating system and version. Because the CUDA Toolkit runfile bundles a driver, you can skip Step 2 and proceed directly with the CUDA Toolkit installation. To confirm compatibility, check NVIDIA's table of driver versions corresponding to each CUDA Toolkit release.
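
If a driver is already installed, comparing its version against the minimum listed for the chosen CUDA Toolkit can be automated. This is a sketch assuming GNU sort (for the -V option); the helper name version_ge is hypothetical, and the minimum version in the usage comment is a placeholder that should be replaced with the value from NVIDIA's table.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders dotted version strings numerically, field by field;
    # if the smaller of the two is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (the second argument is a placeholder, not a verified requirement):
#   installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$installed" "470.42.01" || echo "driver is older than required"
```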

3.2 Install CUDA

bash cuda_11.4.0_470.256.02_linux.run

If the driver is already installed, be sure to deselect the driver installation option; otherwise, the installation may fail.

3.3 Verify Installation

/usr/local/cuda/bin/nvcc -V

If CUDA version information is returned, the installation was successful.
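
To use the reported version in a script, the release number can be extracted from the nvcc output. The sketch below assumes the output contains a line of the form "Cuda compilation tools, release 11.4, V11.4.48"; the helper name cuda_release_from_nvcc is hypothetical.

```shell
# Hypothetical helper: print the CUDA release (for example 11.4) from `nvcc -V` output on stdin.
cuda_release_from_nvcc() {
    # Print only the captured major.minor number from the "release X.Y," line.
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Usage:
#   /usr/local/cuda/bin/nvcc -V | cuda_release_from_nvcc
```

Since /usr/local/cuda/bin is usually not on the PATH by default, adding it to the shell profile avoids typing the full path each time.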

4. Install nvidia-fabricmanager

4.1 Add Software Repository

# AlmaLinux/RockyLinux
# Add the repository corresponding to your system version
dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo

# Ubuntu
# Add the repository corresponding to your system version
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin

mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600

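# NVIDIA rotated the repository signing key in April 2022; 3bf863cc.pub replaces the older 7fa2af80.pub
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub

apt-key add 3bf863cc.pub

rm 3bf863cc.pub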

echo "deb http://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list

4.2 Install nvidia-fabric-manager

# AlmaLinux/RockyLinux
dnf module enable -y nvidia-driver:470
dnf install -y nvidia-fabric-manager-0:470.256.02 nvidia-fabric-manager-devel-0:470.256.02

# Ubuntu
apt-get update
apt-get -y install nvidia-fabricmanager-470=470.256.02-1
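
The commands above pin Fabric Manager to the same version as the driver (470.256.02), since the two generally need to match. A quick comparison can catch a mismatch early. This is a sketch; the helper name fabricmanager_matches_driver is hypothetical, and it assumes the package version is the driver version followed by a packaging revision such as "-1".

```shell
# Hypothetical helper: succeed if the Fabric Manager package version matches the driver version.
# $1 = package version (for example 470.256.02-1), $2 = driver version (for example 470.256.02)
fabricmanager_matches_driver() {
    package_version="$1"
    driver_version="$2"
    # Strip the packaging revision (everything from the first "-") before comparing.
    [ "${package_version%%-*}" = "$driver_version" ]
}

# Usage on Ubuntu:
#   pkg=$(dpkg-query -W -f='${Version}' nvidia-fabricmanager-470)
#   drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   fabricmanager_matches_driver "$pkg" "$drv" || echo "version mismatch"
```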

4.3 Start the Service

systemctl start nvidia-fabricmanager
systemctl status nvidia-fabricmanager
systemctl enable nvidia-fabricmanager

4.4 Verify

nvidia-smi topo -m

If the matrix contains entries of the form NV# (for example NV4, where # is the number of bonded NVLink links), NVLink connections exist between those GPUs. If every expected GPU pair shows an NV# entry and no error messages appear, NVLink should be functioning normally.
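
Reading a large topology matrix by eye is error-prone, so counting the NV# cells can help. The sketch below assumes matrix rows start with "GPU" followed by a digit, as in typical nvidia-smi topo -m output; the helper name count_nvlink_cells is hypothetical.

```shell
# Hypothetical helper: count matrix cells of the form NV<number> in `nvidia-smi topo -m` output on stdin.
count_nvlink_cells() {
    # Only rows beginning with GPU<digit> belong to the GPU matrix; header and legend lines are skipped.
    awk '/^GPU[0-9]+/ { for (i = 2; i <= NF; i++) if ($i ~ /^NV[0-9]+$/) n++ } END { print n + 0 }'
}

# Usage:
#   nvidia-smi topo -m | count_nvlink_cells
```

For N GPUs that are all connected to each other through NVLink, the expected count is N × (N − 1), because each pair appears twice in the matrix.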
