# Install llama-cpp-python with ease
Discover the ease of installing llama-cpp-python, a versatile Python package, and leveraging its multiple backends for optimized performance.
## Introduction
llama-cpp-python is a Python package that provides bindings for the llama.cpp library, offering both low-level access to the C API and a high-level Python API for text completion. It also includes an OpenAI-compatible web server with support for local code completion, function calling, vision models, and serving multiple models.
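To get a feel for the high-level API, here is a minimal sketch of text completion; the model path is an assumption and should point to a GGUF file you have downloaded:

```python
from llama_cpp import Llama

# Load a local GGUF model (the path is an assumption; use your own model file)
llm = Llama(model_path="./models/llama-model.gguf")

# Run a simple text completion
output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```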
## Quick start
To quickly install the llama-cpp-python package, you can use the following command:

```sh
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```
This command installs a pre-built, CPU-only wheel of the package from the project's wheel index rather than compiling from source. For improved performance, consider one of the hardware-accelerated backends described below.
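To confirm the wheel installed correctly, a quick import check can be run; this assumes nothing beyond the package itself:

```python
# Sanity check: the package imports and reports its version
import llama_cpp

print(llama_cpp.__version__)
```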
## Integrating backends
llama-cpp-python supports several common backends that let the library run on different hardware platforms and accelerate computation using various acceleration technologies.
### CUDA
A GPU-based backend that uses NVIDIA's CUDA architecture for accelerated computations, ideal for systems with NVIDIA graphics cards.
Use a pre-built wheel:
```sh
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version> # cu121, cu122, cu123, cu124, cu125
```
Build locally:
```sh
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```
### Metal
A GPU-based backend that uses Apple's Metal API for macOS systems, providing accelerated computations on Apple devices.
Use a pre-built wheel:
```sh
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/metal
```
Build locally:
```sh
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
```
### hipBLAS (ROCm)
A GPU-based backend that uses the hipBLAS library for AMD graphics cards, allowing for accelerated computations on AMD systems.
```sh
CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install llama-cpp-python
```
### Vulkan
A GPU-based backend that uses the Vulkan API for cross-platform, vendor-agnostic accelerated computations on a wide range of graphics cards.
```sh
CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
```
### SYCL
A backend that uses the SYCL (pronounced "sickle") programming model for heterogeneous computing, allowing for accelerated computations on a variety of devices, including CPUs, FPGAs, and Intel integrated GPUs (iGPU).
```sh
source /opt/intel/oneapi/setvars.sh  # puts the icx/icpx compilers on PATH
CMAKE_ARGS="-DGGML_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
```
## Troubleshooting
If you encounter any issues during the installation process, here are some steps to help you resolve them:
- Check the installation logs: if the installation fails, add the `--verbose` flag to the `pip install` command to see the full CMake build log, which can help you identify the source of the issue.
- Verify dependencies: make sure you have all the required dependencies installed, including the necessary C++ compiler and libraries.
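When debugging a failed or misbehaving install, it can also help to collect basic environment details up front; a minimal sketch:

```python
import platform
import sys

import llama_cpp

# Basic environment details that are useful when reporting install issues
print("python:", sys.version)
print("platform:", platform.platform())
print("llama_cpp:", llama_cpp.__version__)
```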