Creating a Docker image for running CUDA and llama.cpp
- Installed the NVIDIA Container Toolkit
- Created a Dockerfile based on examples and edited it as needed
- Edited the Dockerfile so the CUDA and Ubuntu versions matched the local system
Install nvidia-container-toolkit
sudo wget -qO /etc/apt/keyrings/nvidia-container-toolkit.asc https://nvidia.github.io/libnvidia-container/gpgkey
echo "deb [signed-by=/etc/apt/keyrings/nvidia-container-toolkit.asc] https://nvidia.github.io/libnvidia-container/stable/deb/amd64 /" | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
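If Docker hasn't been set up to use the NVIDIA runtime before, register it before the restart; nvidia-ctk ships with the toolkit and updates /etc/docker/daemon.json:
sudo nvidia-ctk runtime configure --runtime=docker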
sudo service docker restart
Verify toolkit install
Run a sample image to check that containers can see the GPU. The "8.6" in the output is the card's compute capability, NVIDIA's identifier for the GPU architecture and the features it supports.
docker run -it --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Ampere" with compute capability 8.6
> Compute 8.6 CUDA device: [NVIDIA GeForce RTX 3060]
28672 bodies, total time for 10 iterations: 22.530 ms
= 364.883 billion interactions per second
= 7297.664 single-precision GFLOP/s at 20 flops per interaction
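A lighter-weight check, assuming the nvidia/cuda:12.9.0-base-ubuntu24.04 image can be pulled, is to run nvidia-smi inside a plain CUDA base image and confirm the GPU is listed:
docker run --rm --gpus all nvidia/cuda:12.9.0-base-ubuntu24.04 nvidia-smi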
Edit Dockerfile
Run "cat /etc/*release" to get Ubuntu version and nvidia-smi to see CUDA version. Then edit Dockerfile. Working version below but haven't cleaned up into final version.
# Use a base image with CUDA 12.9 and Ubuntu 24.04
FROM nvidia/cuda:12.9.0-devel-ubuntu24.04
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV PATH="/usr/local/cuda-12.9/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/cuda-12.9/lib64:${LD_LIBRARY_PATH}"
ENV CUDACXX="/usr/local/cuda-12.9/bin/nvcc"
# Update and install necessary packages
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    cmake \
    curl libcurl4-openssl-dev \
    libssl-dev \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*
# Clone llama.cpp repository
WORKDIR /app
RUN git clone --recursive https://github.com/ggerganov/llama.cpp.git
# Build llama.cpp with CUDA support
WORKDIR /app/llama.cpp
# ${CMAKE_ARGS} isn't defined anywhere in this Dockerfile, so it expands to nothing here;
# it's presumably kept from the llama.cpp example Dockerfiles, where it can pass extra flags
# such as -DCMAKE_CUDA_ARCHITECTURES for a specific GPU.
RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DGGML_BACKEND_DL=ON \
    -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_BUILD_TESTS=OFF ${CMAKE_ARGS} \
    -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined .
RUN cmake --build build --config Release -j$(nproc)
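To try the image out, a minimal sketch: the llama-cpp-cuda tag is arbitrary, and the binary path assumes the current llama.cpp layout where executables land in build/bin.
docker build -t llama-cpp-cuda .
docker run -it --rm --gpus all llama-cpp-cuda /app/llama.cpp/build/bin/llama-cli --version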