Getting started with the llama-cpp-python Docker image

Get started with llama-cpp-python in minutes using the official Docker image. Learn how to download models, start the container, and more.

Official Docker image

llama-cpp-python publishes an official image to the GitHub Container Registry at ghcr.io/abetlen/llama-cpp-python. It supports two architectures: amd64 and arm64.

Here are the image details:

  • Base image: Python 3 (Debian)
  • Working directory: /app
  • Environment variables: HOST=0.0.0.0, PORT=8000
  • Exposed port: 8000
  • Default command: /bin/sh /app/docker/simple/run.sh

The script /app/docker/simple/run.sh builds the application and runs a Uvicorn server listening on $HOST and $PORT.
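Since HOST and PORT are plain environment variables, they can be overridden when you start the container. A minimal sketch, assuming you want the server to listen on port 9000 instead (the MODEL variable is covered in the quick start below):

sh
# Override the default port; publish the same port on the host.
docker run --rm -it \
  -p 9000:9000 \
  -e PORT=9000 \
  -v /path/to/models:/models \
  -e MODEL=/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  ghcr.io/abetlen/llama-cpp-python:latest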

Quick start

Download models

To download a GGUF file from the Hugging Face Hub, follow these steps:

  1. Go to the Hugging Face Hub (https://huggingface.co/models).
  2. Search for the model you want to download (e.g. lmstudio-community/Llama-3.2-1B-Instruct-GGUF).
  3. Look for the GGUF file you want to download (e.g. Llama-3.2-1B-Instruct-Q4_K_M.gguf).
  4. Click the "Download" icon.
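If you prefer the command line, the huggingface-cli tool from the huggingface_hub package can fetch the same file; the target directory below is just an example:

sh
# Download a single GGUF file from the Hub into a local models directory.
pip install -U huggingface_hub
huggingface-cli download lmstudio-community/Llama-3.2-1B-Instruct-GGUF \
  Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  --local-dir /path/to/models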

Start the container

To start the container, run the following command:

sh
docker run --rm -it \
  -p 8000:8000 \
  -v /path/to/models:/models \
  -e MODEL=/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  ghcr.io/abetlen/llama-cpp-python:latest

This command starts the server, which becomes accessible at http://localhost:8000. Replace /path/to/models with the actual path to your models directory.
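Once the container is running, you can verify it by sending a request to the OpenAI-compatible chat completions endpoint the server exposes:

sh
# Ask the model a question via the OpenAI-compatible API.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'

Interactive API documentation is also available at http://localhost:8000/docs.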

Docker Compose

You can also use Docker Compose to define and run the container. Save the following configuration as compose.yaml:

yaml
services:
  llama-cpp-python:
    image: ghcr.io/abetlen/llama-cpp-python:latest
    ports:
      - 8000:8000
    volumes:
      - /path/to/models:/models
    environment:
      MODEL: /models/Llama-3.2-1B-Instruct-Q4_K_M.gguf
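With compose.yaml in place, start the service (add -d to run it in the background):

sh
# Start the service defined in compose.yaml; press Ctrl+C to stop it.
docker compose up

As before, replace /path/to/models with the actual path to your models directory.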