Getting started with the llama-cpp-python Docker image

Get started with llama-cpp-python in minutes using the official Docker image. Learn how to download models, start the container, and more.

Official Docker image

llama-cpp-python publishes an official image to the GitHub Container Registry at ghcr.io/abetlen/llama-cpp-python. It supports two architectures: amd64 and arm64.

Here are the image details:

  • Base image: Python 3 (Debian)
  • Working directory: /app
  • Environment variables: HOST=0.0.0.0, PORT=8000
  • Exposed port: 8000
  • Default command: /bin/sh /app/docker/simple/run.sh

The script /app/docker/simple/run.sh builds the application and runs a Uvicorn server listening on $HOST and $PORT.
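Since HOST and PORT are plain environment variables, they can be overridden when you start the container. A minimal sketch, assuming you want the server to listen on port 9000 instead (the MODEL variable is covered in the quick start below):

sh
# Override the default port; publish the same port on the host.
docker run --rm -it \
  -p 9000:9000 \
  -e PORT=9000 \
  -v /path/to/models:/models \
  -e MODEL=/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  ghcr.io/abetlen/llama-cpp-python:latest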

Quick start

Download models

To download a GGUF file from the Hugging Face Hub, follow these steps:

  1. Go to the Hugging Face Hub (https://huggingface.co/models).
  2. Search for the model you want to download (e.g. lmstudio-community/Llama-3.2-1B-Instruct-GGUF).
  3. Look for the GGUF file you want to download (e.g. Llama-3.2-1B-Instruct-Q4_K_M.gguf).
  4. Click the "Download" icon.
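If you prefer the command line, the huggingface-cli tool from the huggingface_hub package can fetch the same file; the target directory below is just an example:

sh
# Download a single GGUF file from the Hub into a local models directory.
pip install -U huggingface_hub
huggingface-cli download lmstudio-community/Llama-3.2-1B-Instruct-GGUF \
  Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  --local-dir /path/to/models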

Start the container

To start the container, run the following command:

sh
docker run --rm -it \
  -p 8000:8000 \
  -v /path/to/models:/models \
  -e MODEL=/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  ghcr.io/abetlen/llama-cpp-python:latest

This command starts the server, which becomes accessible at http://localhost:8000. Replace /path/to/models with the actual path to your models directory.
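Once the container is running, you can verify it by sending a request to the OpenAI-compatible chat completions endpoint the server exposes:

sh
# Ask the model a question via the OpenAI-compatible API.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'

Interactive API documentation is also available at http://localhost:8000/docs.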

Docker Compose

You can also use Docker Compose to define and run the container. Save the following configuration as compose.yaml:

yaml
services:
  llama-cpp-python:
    image: ghcr.io/abetlen/llama-cpp-python:latest
    ports:
      - 8000:8000
    volumes:
      - /path/to/models:/models
    environment:
      MODEL: /models/Llama-3.2-1B-Instruct-Q4_K_M.gguf
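With compose.yaml in place, start the service (add -d to run it in the background):

sh
# Start the service defined in compose.yaml; press Ctrl+C to stop it.
docker compose up

As before, replace /path/to/models with the actual path to your models directory.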