3. Installing the NVIDIA Driver on Ubuntu (Part 2)
1. Preliminary Preparation
1.1 Check GPU Information
lspci | grep -i nvidia
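On a multi-GPU server it can be easier to count the devices than to read the list. The following is a minimal sketch, assuming the usual lspci class names ("3D controller", "VGA compatible controller"); the function name count_nvidia_gpus is hypothetical.

```shell
# count_nvidia_gpus: read `lspci` output on stdin and print the number of
# NVIDIA display-class devices. Audio functions and bridges are ignored.
count_nvidia_gpus() {
  # grep -c still prints 0 when nothing matches; `|| true` hides its exit code.
  grep -i 'nvidia' | grep -Eic '(3d|vga compatible) controller' || true
}
```

Usage: lspci | count_nvidia_gpus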
1.2 Configure the Kernel
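# AlmaLinux/RockyLinux (dkms is provided by the EPEL repository)
dnf install -y gcc dkms kernel-devel-$(uname -r) kernel-headers-$(uname -r)
# Ubuntu
apt-get install -y gcc make dkms linux-headers-$(uname -r)
- The kernel development and header packages must match the running kernel version reported by uname -r; otherwise the driver module cannot be built.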
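The version match can be checked before running the driver installer. This is a sketch that assumes /lib/modules/<release>/build points at the installed headers, which is the case on both distribution families; the function name and the optional ROOT prefix are hypothetical and exist only for illustration and testing.

```shell
# has_kernel_build_tree RELEASE [ROOT]
# Succeeds when /lib/modules/RELEASE/build contains a Makefile, i.e. the
# headers for that exact kernel release are installed.
# ROOT is an optional path prefix, used only to make the check testable.
has_kernel_build_tree() {
  local release="$1"
  local root="${2:-}"
  [ -f "${root}/lib/modules/${release}/build/Makefile" ]
}
```

Usage: has_kernel_build_tree "$(uname -r)" || echo "kernel headers for $(uname -r) are missing"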
1.3 Disable nouveau
# Check nouveau
lsmod | grep nouveau
# Disable nouveau (a dedicated file avoids overwriting an existing blacklist.conf)
cat > /etc/modprobe.d/blacklist-nouveau.conf << EOF
blacklist nouveau
options nouveau modeset=0
EOF
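For scripted setups, the lsmod check is more convenient as an exit status than as grep output. A minimal sketch; module_loaded is a hypothetical helper that matches only the module-name column of lsmod.

```shell
# module_loaded NAME: read `lsmod` output on stdin and succeed only if NAME
# appears in the first column (the module itself, not a "Used by" entry).
module_loaded() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

Usage: lsmod | module_loaded nouveau && echo "nouveau is still loaded"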
1.4 Update initramfs
# AlmaLinux/RockyLinux
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
# Ubuntu
sudo update-initramfs -u
- After completing this step, reboot the operating system before proceeding to the next step. After the reboot, lsmod | grep nouveau should return nothing.
2. Install the Driver
2.1 Download the Driver
Download the driver that matches your GPU model from the NVIDIA Driver Downloads page. The .run installer is recommended. If you plan to install the CUDA Toolkit, which bundles a driver, you can skip this step.
2.2 Install the Driver
bash NVIDIA-Linux-x86_64-470.256.02.run
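or, if the installer cannot locate the kernel sources, pass them explicitly (the path below is the AlmaLinux/RockyLinux location):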
bash NVIDIA-Linux-x86_64-470.256.02.run --kernel-source-path=/usr/src/kernels/$(uname -r) -k $(uname -r)
2.3 Verify Installation
nvidia-smi
If GPU-related information is returned, the installation was successful.
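Beyond checking that nvidia-smi prints something, it can be useful to confirm that the driver initialised every GPU. The sketch below counts the devices listed by nvidia-smi -L; the function name smi_gpu_count is hypothetical.

```shell
# smi_gpu_count: read `nvidia-smi -L` output on stdin and print the number of
# GPUs the driver reports (lines of the form "GPU 0: <name> (UUID: ...)").
smi_gpu_count() {
  # grep -c still prints 0 when nothing matches; `|| true` hides its exit code.
  grep -Ec '^GPU [0-9]+:' || true
}
```

Usage: nvidia-smi -L | smi_gpu_count
The result should equal the number of GPUs shown by lspci | grep -i nvidia in Step 1.1.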
3. Install the CUDA Toolkit
3.1 Download the CUDA Installer
Visit the CUDA Toolkit download page and select the installer for your operating system and a CUDA version that your GPU supports. Because the CUDA Toolkit installer bundles a driver, you can skip Step 2 and install the CUDA Toolkit directly. For the driver version that corresponds to each CUDA Toolkit release, see the CUDA Toolkit release notes.
3.2 Install CUDA
# The CUDA 11.4.0 runfile bundles driver 470.42.01; use the name of the file you downloaded
bash cuda_11.4.0_470.42.01_linux.run
- If the driver is already installed, be sure to deselect the driver installation option; otherwise, the installation may fail.
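For unattended installs, the same choice can be made on the command line instead of in the interactive menu. This sketch relies on the runfile's --silent, --driver and --toolkit options; the helper name cuda_runfile_args and the $runfile variable are hypothetical.

```shell
# cuda_runfile_args DRIVER_INSTALLED
# Print the runfile options for an unattended install. When a driver is
# already present (DRIVER_INSTALLED=yes), only the toolkit is selected.
cuda_runfile_args() {
  if [ "$1" = "yes" ]; then
    echo "--silent --toolkit"
  else
    echo "--silent --driver --toolkit"
  fi
}
```

Usage, with runfile set to the path of the downloaded installer (the command substitution is left unquoted on purpose so that each option becomes a separate argument):
bash "$runfile" $(cuda_runfile_args yes)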
3.3 Verify Installation
/usr/local/cuda/bin/nvcc -V
If CUDA version information is returned, the installation was successful.
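To use the reported version in a script, the release number can be extracted from the nvcc output. A minimal sketch, assuming the usual "Cuda compilation tools, release X.Y, ..." line; nvcc_release is a hypothetical name.

```shell
# nvcc_release: read `nvcc -V` output on stdin and print the toolkit release
# (for example 11.4), taken from the "release X.Y," field.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

Usage: /usr/local/cuda/bin/nvcc -V | nvcc_release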
4. Install nvidia-fabricmanager
4.1 Add Software Repository
# AlmaLinux/RockyLinux
# Add the repository corresponding to your system version
dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
# Ubuntu
# Add the repository corresponding to your system version
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
# NVIDIA's current repository signing key (3bf863cc, in use since the April 2022 key rotation)
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
apt-key add 3bf863cc.pub
rm 3bf863cc.pub
echo "deb https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
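The repository path depends on the distribution and release (rhel8 and ubuntu2004 in the commands above). The sketch below derives that path component from /etc/os-release. The function name cuda_repo_distro is hypothetical, and the mapping covers only the distributions discussed here.

```shell
# cuda_repo_distro [OS_RELEASE_FILE]
# Print the directory name used under .../compute/cuda/repos/ for this system,
# e.g. "ubuntu2004" or "rhel8". Fails for distributions not handled here.
cuda_repo_distro() {
  local file="${1:-/etc/os-release}"
  (
    # os-release is a shell-compatible list of KEY=value assignments;
    # the subshell keeps those variables out of the caller's environment.
    . "$file" || exit 1
    case "$ID" in
      ubuntu) echo "ubuntu$(echo "$VERSION_ID" | tr -d '.')" ;;
      almalinux|rocky|rhel) echo "rhel${VERSION_ID%%.*}" ;;
      *) exit 1 ;;
    esac
  )
}
```

Usage: echo "https://developer.download.nvidia.com/compute/cuda/repos/$(cuda_repo_distro)/x86_64"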
4.2 Install nvidia-fabric-manager
# AlmaLinux/RockyLinux
dnf module enable -y nvidia-driver:470
dnf install -y nvidia-fabric-manager-0:470.256.02 nvidia-fabric-manager-devel-0:470.256.02
# Ubuntu
apt-get update
apt-get -y install nvidia-fabricmanager-470=470.256.02-1
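Fabric Manager must be the same version as the installed driver, so it can help to derive the package spec from the driver version instead of typing it by hand. A minimal sketch for the Ubuntu package naming used above; fabricmanager_apt_spec is a hypothetical helper, and the "-1" package revision is an assumption carried over from the command above.

```shell
# fabricmanager_apt_spec DRIVER_VERSION
# Print the apt package spec for the Fabric Manager build matching the driver,
# e.g. 470.256.02 -> nvidia-fabricmanager-470=470.256.02-1
# The "-1" package revision is an assumption; check `apt-cache madison`.
fabricmanager_apt_spec() {
  local version="$1"
  local branch="${version%%.*}"   # major driver branch, e.g. 470
  echo "nvidia-fabricmanager-${branch}=${version}-1"
}
```

Usage:
version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
apt-get -y install "$(fabricmanager_apt_spec "$version")"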
4.3 Start the Service
systemctl start nvidia-fabricmanager
systemctl status nvidia-fabricmanager
systemctl enable nvidia-fabricmanager
4.4 Verify
nvidia-smi topo -m
If the matrix contains NV# entries (NV followed by the number of bonded links, for example NV2), those GPU pairs are connected through NVLink. If every expected GPU pair shows an NV# entry and no errors are reported, NVLink is functioning normally.
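On systems with many GPUs the matrix is tedious to inspect by eye. The sketch below counts the NVLink cells in the topology output; nvlink_cell_count is a hypothetical name. Because the matrix is symmetric, each connected pair is counted twice, so N fully connected GPUs should give N*(N-1).

```shell
# nvlink_cell_count: read `nvidia-smi topo -m` output on stdin and print how
# many GPU-to-GPU cells report an NVLink connection (NV1, NV2, ...).
nvlink_cell_count() {
  awk '$1 ~ /^GPU[0-9]+$/ {
         for (i = 2; i <= NF; i++) if ($i ~ /^NV[0-9]+$/) n++
       }
       END { print n + 0 }'
}
```

Usage: nvidia-smi topo -m | nvlink_cell_count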