I’ve been into local LLMs lately, and on a whim, I decided to try “speaker diarization & subtitle creation with WhisperX.” However, when I tried to run it on my Ubuntu 24.04 environment with an RTX 4080, I got an error that libcudnn_cnn_infer.so.8 was missing. After some research, it seems that the combination of CUDA 11.8 + cuDNN 8 cannot be installed directly on Ubuntu 24.04.
After trying various things, I found that creating an environment based on Ubuntu 22.04 with Docker was the smoothest solution. I’m leaving this as a memo of the setup procedure.
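To confirm that the library really is missing on the host, a quick check (just a diagnostic, not part of the fix):

```bash
# Empty output means the cuDNN 8 library the error complains about isn't installed
ldconfig -p | grep libcudnn_cnn_infer
```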
Installing the NVIDIA Container Toolkit
To use the GPU from Docker, first set up the NVIDIA Container Toolkit. The procedure follows the official documentation. For reference, here are the commands:
```bash
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update

export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo apt-get install -y \
  nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
  libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```
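The official setup also has you register the NVIDIA runtime with Docker and restart the daemon; without this step, --gpus all may not work:

```bash
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```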
Getting the Base Image

Get the latest image of Ubuntu 22.04 (Jammy).
```bash
docker pull ubuntu:jammy-20250730
```
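Before writing the Dockerfile, it's worth checking that containers can actually see the GPU. The toolkit mounts the host driver (including nvidia-smi) into the container:

```bash
# If the toolkit is set up correctly, this prints the host's GPU table
docker run --rm --gpus all ubuntu:jammy-20250730 nvidia-smi
```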
Creating the Dockerfile

Create the following Dockerfile. It installs CUDA 11.8 + cuDNN 8, then WhisperX and its dependencies. CUDA 12 is installed as well, since its cuBLAS library is also required.
```dockerfile
FROM ubuntu:jammy-20250730

# Refresh package lists and install build tools and runtime dependencies
RUN apt update && apt install -y --no-install-recommends \
    build-essential \
    software-properties-common \
    wget \
    gnupg \
    git \
    ffmpeg \
    python3-pip \
    ca-certificates

# Add the NVIDIA CUDA repository & install CUDA 11.8 + cuDNN 8 + CUDA 12.3
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb && \
    dpkg -i cuda-keyring_1.0-1_all.deb && \
    apt update && \
    apt -y install cuda-11-8 libcudnn8 libcudnn8-dev cuda-12-3

# Put both CUDA toolkits on the search paths
ENV PATH="/usr/local/cuda-11.8/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/cuda-11.8/lib64:/usr/local/cuda-12.3/lib64:${LD_LIBRARY_PATH}"

# Install PyTorch (cu118 build), WhisperX & pyannote.audio
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 && \
    pip install whisperx && \
    pip install "pyannote.audio"

# Default command when the container starts
CMD ["/bin/bash"]
```
Building the Image

```bash
docker build -t ubuntu:whisperx .
```
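To verify the image before running WhisperX, check that nvcc is on the PATH and that PyTorch can see the GPU:

```bash
# nvcc comes from the CUDA 11.8 toolkit installed in the image
docker run --rm --gpus all ubuntu:whisperx nvcc --version

# Should print "True" if the cu118 build of PyTorch can use the GPU
docker run --rm --gpus all ubuntu:whisperx python3 -c "import torch; print(torch.cuda.is_available())"
```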
Running WhisperX

Prepare an audio file on the host and run it from the container.
Example: /home/me/Documents/makesrt/audio.wav
```bash
docker run \
  -v /home/me/Documents/makesrt:/root/makesrt \
  --gpus all \
  --rm \
  -w /root/makesrt \
  ubuntu:whisperx \
  whisperx "audio.wav" \
  --compute_type "float16" \
  --device "cuda" \
  --language "en" \
  --diarize \
  --hf_token <HUGGINGFACE_TOKEN> \
  --output_dir "./output" \
  --output_format "all"
```

Option Explanations
- -v: Share a directory between the host and the container
- -w: Set the working directory inside the container
- --gpus all: Use all GPUs
- --diarize: Enable speaker diarization (requires a Hugging Face token; the pyannote models it downloads are gated, so you also need to accept their user conditions on the Hugging Face Hub)
- --compute_type "float16": Use half precision for speed on GPUs like the RTX 4080
This runs WhisperX on the GPU, which is much faster than running on the CPU.
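Since --output_dir points inside the shared volume, the results land directly on the host. With --output_format "all", the output directory should contain subtitles and transcripts in several formats:

```bash
ls /home/me/Documents/makesrt/output
# Expect files like: audio.srt  audio.vtt  audio.txt  audio.tsv  audio.json
```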
Summary
When trying to run WhisperX on Ubuntu 24.04, you will run into version constraints with CUDA/cuDNN.
I think there are various ways to do this, but this time I solved it by creating a Docker image based on Ubuntu 22.04.
- Does not pollute the local environment
- Can fully utilize the GPU
- Highly reproducible in other environments
Conclusion: Docker is the answer.