# Getting started with the llama-cpp-python Docker image
Get started with llama-cpp-python in minutes using the official Docker image. Learn how to download models, start the container, and more.
## Official Docker image
llama-cpp-python provides the official image repository `ghcr.io/abetlen/llama-cpp-python`, which supports two architectures: `amd64` and `arm64`.
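You can pull the latest image ahead of time (optional, since `docker run` pulls it automatically when it is not present locally):

```sh
docker pull ghcr.io/abetlen/llama-cpp-python:latest
```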
Here are the image details:

- Base image: Python 3 (Debian)
- Working directory: `/app`
- Environment variables: `HOST=0.0.0.0`, `PORT=8000`
- Exposed port: `8000`
- Default command: `/bin/sh /app/docker/simple/run.sh`
The script `/app/docker/simple/run.sh` builds the application and runs a Uvicorn server listening on `$HOST` and `$PORT`.
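Because the server reads `$HOST` and `$PORT` at startup, you can override them when launching the container. A minimal sketch that moves the server to port 8080, assuming you remap the published port to match and mount a model as shown in the quick start below:

```sh
# Override PORT and publish the matching host port
docker run --rm -it \
  -e PORT=8080 -p 8080:8080 \
  -v /path/to/models:/models \
  -e MODEL=/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  ghcr.io/abetlen/llama-cpp-python:latest
```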
## Quick start
### Download models
To download a GGUF file from the Hugging Face Hub, follow these steps:
1. Go to the Hugging Face Hub (https://huggingface.co/models).
2. Search for the model you want to download (e.g. `lmstudio-community/Llama-3.2-1B-Instruct-GGUF`).
3. Look for the GGUF file you want to download (e.g. `Llama-3.2-1B-Instruct-Q4_K_M.gguf`).
4. Click the "Download" icon.
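If you prefer the command line, the `huggingface-cli` tool from the `huggingface_hub` package can fetch the same file; a sketch, assuming the package is installed and `/path/to/models` is your models directory:

```sh
pip install -U huggingface_hub

# Download a single GGUF file from the repository into the models directory
huggingface-cli download lmstudio-community/Llama-3.2-1B-Instruct-GGUF \
  Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  --local-dir /path/to/models
```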
### Start the container
To start the container, run the following command:
```sh
docker run --rm -it \
  -p 8000:8000 \
  -v /path/to/models:/models \
  -e MODEL=/models/Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  ghcr.io/abetlen/llama-cpp-python:latest
```
The command starts the server, accessible at `http://localhost:8000`. Replace `/path/to/models` with the actual path to your models directory.
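Once the container is up, you can verify it with a couple of requests; the server exposes an OpenAI-compatible API, so something like the following should work (the model may take a moment to load first):

```sh
# List the loaded model(s)
curl http://localhost:8000/v1/models

# Request a chat completion
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```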
## Docker Compose
You can also use Docker Compose to define and run the container with a configuration file (`compose.yaml`):
```yaml
services:
  llama-cpp-python:
    image: ghcr.io/abetlen/llama-cpp-python:latest
    ports:
      - 8000:8000
    volumes:
      - /path/to/models:/models
    environment:
      MODEL: /models/Llama-3.2-1B-Instruct-Q4_K_M.gguf
```
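With the file in place, start the service from the same directory; Compose picks up `compose.yaml` automatically:

```sh
docker compose up
```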