Creating a Docker image for running CUDA and llama.cpp
- Installed the NVIDIA Container Toolkit
- Created a Dockerfile based on examples and edited it as needed
- Edited the Dockerfile so the CUDA and Ubuntu versions matched the local system
Install nvidia-container-toolkit
sudo wget -qO /etc/apt/keyrings/nvidia-container-toolkit.asc https://nvidia.github.io/libnvidia-container/gpgkey
echo "deb [signed-by=/etc/apt/keyrings/nvidia-container-toolkit.asc] https://nvidia.github.io/libnvidia-container/stable/deb/amd64 /" | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
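If Docker hasn't been set up to use the NVIDIA runtime before, register it before the restart; nvidia-ctk ships with the toolkit and updates /etc/docker/daemon.json:
sudo nvidia-ctk runtime configure --runtime=docker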
sudo service docker restart
Verify toolkit install
Run a sample image to check that containers can see the GPU. The "8.6" in the output is the card's compute capability, NVIDIA's identifier for the GPU architecture and the features it supports.
docker run -it --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Ampere" with compute capability 8.6
> Compute 8.6 CUDA device: [NVIDIA GeForce RTX 3060]
28672 bodies, total time for 10 iterations: 22.530 ms
= 364.883 billion interactions per second
= 7297.664 single-precision GFLOP/s at 20 flops per interaction
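A lighter-weight check, assuming the nvidia/cuda:12.9.0-base-ubuntu24.04 image can be pulled, is to run nvidia-smi inside a plain CUDA base image and confirm the GPU is listed:
docker run --rm --gpus all nvidia/cuda:12.9.0-base-ubuntu24.04 nvidia-smi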
Edit Dockerfile
Run "cat /etc/*release" to get Ubuntu version and nvidia-smi to see CUDA version. Then edit Dockerfile. Working version below but haven't cleaned up into final version.
# Use a base image with CUDA 12.9 and Ubuntu 24.04
FROM nvidia/cuda:12.9.0-devel-ubuntu24.04
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV PATH="/usr/local/cuda-12.9/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/cuda-12.9/lib64:${LD_LIBRARY_PATH}"
ENV CUDACXX="/usr/local/cuda-12.9/bin/nvcc"
# Update and install necessary packages
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    cmake \
    curl libcurl4-openssl-dev \
    libssl-dev \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*
# Clone llama.cpp repository
WORKDIR /app
RUN git clone --recursive https://github.com/ggerganov/llama.cpp.git
# Build llama.cpp with CUDA support
WORKDIR /app/llama.cpp
# ${CMAKE_ARGS} isn't defined anywhere in this Dockerfile, so it expands to nothing here;
# it's presumably kept from the llama.cpp example Dockerfiles, where it can pass extra flags
# such as -DCMAKE_CUDA_ARCHITECTURES for a specific GPU.
RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_CUDA=ON -DGGML_BACKEND_DL=ON \
    -DGGML_CPU_ALL_VARIANTS=ON -DLLAMA_BUILD_TESTS=OFF ${CMAKE_ARGS} \
    -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined .
RUN cmake --build build --config Release -j$(nproc)
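To try the image out, a minimal sketch: the llama-cpp-cuda tag is arbitrary, and the binary path assumes the current llama.cpp layout where executables land in build/bin.
docker build -t llama-cpp-cuda .
docker run -it --rm --gpus all llama-cpp-cuda /app/llama.cpp/build/bin/llama-cli --version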