Hello everyone,

Today the nvidia driver on my server stopped working out of nowhere. It was working yesterday and today it’s not. I didn’t change anything yesterday or today.

Today my Plex container stopped working because there was a problem with the nvidia card I was using for transcoding. It’s a GTX 1650.

I tried running nvidia-smi and it said: Failed to initialize NVML: Driver/library version mismatch. Then I tried upgrading my system, since my last upgrade was months ago and I thought it might help. It didn’t. I also tried rebooting a few times because some sources said that solves the issue, but it persisted.

So it was driver reinstall time. I purged the driver with apt purge nvidia*, then installed it with ubuntu-drivers install --gpgpu nvidia:525-server. After a reboot, nvidia-smi gives this error: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

lsmod | grep nvidia shows nothing and /proc/driver/nvidia/version doesn’t exist. I tried starting nvidia-persistenced with systemctl, but it gives this error:

Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 113 has read and write permissions for those files.

/dev/nvidia* doesn’t exist.
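For anyone landing here with the same symptoms, the checks above can be collected into one small script. This is only a sketch of what I would look at, not an official procedure: check_nvidia_module is a made-up helper name, and its optional file argument exists only so it can be tried against a sample file. It reads /proc/modules, which is the same data lsmod prints.

```shell
#!/bin/sh
# Hypothetical helper: reports whether the nvidia kernel module is loaded.
# Reads /proc/modules by default; a different file can be passed for testing.
check_nvidia_module() {
    if grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null; then
        echo "loaded"
    else
        echo "not loaded"
    fi
}

# Running kernel, to compare against the installed module packages.
echo "kernel: $(uname -r)"

# Module state, equivalent to: lsmod | grep nvidia
echo "nvidia module: $(check_nvidia_module)"

# Installed nvidia packages, if dpkg is available on this system.
command -v dpkg >/dev/null 2>&1 && dpkg -l | grep -i nvidia || true

# Any nvidia module files built or shipped for the running kernel.
find "/lib/modules/$(uname -r)" -name 'nvidia*.ko*' 2>/dev/null || true
```

If the module shows as not loaded and the find command prints nothing, no nvidia module is installed for the kernel that is currently running. That would be consistent with /dev/nvidia* being missing.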

I’m very noobish when it comes to nvidia and Linux. It was a pain to set up initially, and I was hoping it wouldn’t break someday. But here I am, unfortunately. I don’t really know which logs I should show you or which commands I should run to troubleshoot, so every tip is appreciated, and I will provide logs and anything else if needed.

System info:

  • Ubuntu Server 22.04
  • kernel: 5.15.0-76-generic
  • theoretically installed nvidia driver: nvidia-driver-525-server

Solution

I was using the ubuntu-drivers utility to install the driver, but it turns out it’s not that reliable. After installing with the manual method from https://help.ubuntu.com/community/NvidiaDriversInstallation, using the command apt install linux-modules-nvidia-${DRIVER_BRANCH}${SERVER}-${LINUX_FLAVOUR}, it’s working again.
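To make the variables in that command concrete, here is a sketch of how they would expand for the system listed above. The values are assumptions read off the system info (driver 525-server, kernel 5.15.0-76-generic), so adjust them to your own setup. The script only prints the resulting command instead of running it.

```shell
#!/bin/sh
# Assumed values for the system described in this post; change as needed.
DRIVER_BRANCH=525       # driver series, from nvidia-driver-525-server
SERVER=-server          # leave empty for the non-server branch
LINUX_FLAVOUR=generic   # kernel flavour, from 5.15.0-76-generic

# Package name built from the pattern in the Ubuntu help page.
PKG="linux-modules-nvidia-${DRIVER_BRANCH}${SERVER}-${LINUX_FLAVOUR}"

# Print the install command rather than executing it.
echo "sudo apt install ${PKG}"
```

After a reboot, nvidia-smi is a quick check that the module loaded.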

  • @germanatlas
    11 year ago

    Had a similar issue. Downloading the GPU’s exact driver from nvidia, installing it, and restarting worked.