# Introduction to Llama Stack
Get an introduction to Llama Stack, the standardized platform for generative AI applications. Learn about its features, APIs, and integrations.
## What is Llama Stack?
Llama Stack is a set of building blocks for creating generative AI applications. It provides a standardized API interface and developer experience, certified by Meta, so that applications built on it can run across a variety of environments: on-prem, in the cloud, on a single node, or on-device.
## APIs
Llama Stack provides APIs that can be split into two broad categories:
- APIs focused on application development:
  - Inference
  - Safety
  - Memory
  - Agentic System
  - Evaluation
- APIs focused on model development:
  - Evaluation
  - Post Training
  - Synthetic Data Generation
  - Reward Scoring
Each API is a collection of REST endpoints.
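Because each API is exposed over REST, any HTTP client can call it. Below is a minimal sketch using Python's requests library; the base URL, port, endpoint path, payload shape, and model name are illustrative assumptions and will depend on the distribution you run.

```python
import requests

# Hypothetical example: calling an inference endpoint on a locally
# running Llama Stack server. The URL, path, and payload shape are
# assumptions for illustration; consult your distribution's API docs.
BASE_URL = "http://localhost:5000"

response = requests.post(
    f"{BASE_URL}/inference/chat_completion",
    json={
        "model": "Llama3.1-8B-Instruct",  # assumed model identifier
        "messages": [
            {"role": "user", "content": "Write a haiku about llamas."}
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```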
## API Providers
An API Provider is the entity that supplies the actual implementation backing an API. It can be:
- A provider that implements the API using open-source libraries (e.g. torch, vLLM, TensorRT)
- A relay to a remote REST service (e.g. cloud providers or dedicated inference providers that serve these APIs)
In other words, an API Provider makes an API real, either by implementing it locally or by connecting to a remote service that provides the implementation; a conceptual sketch follows the provider list below.
Llama Stack supports the following API providers:
- Meta reference
- Fireworks
- AWS Bedrock
- Together
- Ollama
- TGI
- Chroma
- PG Vector
- PyTorch ExecuTorch
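To make the local-versus-remote distinction concrete, here is a simplified, hypothetical sketch of the pattern: one abstract API with a locally implemented provider and a provider that relays to a remote REST service. The class names, method signatures, and the remote endpoint path are illustrative assumptions, not Llama Stack's actual provider interfaces.

```python
from abc import ABC, abstractmethod

import requests


class InferenceProvider(ABC):
    """Hypothetical, simplified interface for the Inference API."""

    @abstractmethod
    def chat_completion(self, messages: list[dict]) -> str: ...


class LocalProvider(InferenceProvider):
    """Backs the API with a local open-source library (e.g. vLLM)."""

    def chat_completion(self, messages: list[dict]) -> str:
        # A real provider would invoke the local inference engine here.
        return f"(local) echo: {messages[-1]['content']}"


class RemoteProvider(InferenceProvider):
    """Relays requests to a remote REST service (e.g. a hosted provider)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def chat_completion(self, messages: list[dict]) -> str:
        resp = requests.post(
            f"{self.base_url}/chat/completions",  # assumed remote path
            json={"messages": messages},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["content"]
```

Application code depends only on the abstract API, so either provider can back it without changes upstream.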
## Distributions
A Distribution in Llama Stack is where APIs and Providers are assembled together to provide a complete solution to the end application developer. It lets you mix and match providers, some backed by local code and some by remote services, giving you flexibility and scalability when developing generative AI applications.
In other words, a Distribution is a packaged, customizable combination of APIs and Providers; a hypothetical configuration sketch follows the list below.
Llama Stack supports the following distributions:
- Meta reference: A distribution that provides a standard implementation of the Llama Stack APIs, using Meta's reference implementation.
- Meta reference (quantized): A variant of the Meta Reference distribution that uses quantized models for efficient inference.
- Ollama: A distribution that uses the Ollama API provider, which is designed for single-node environments.
- TGI: A distribution that uses the TGI API provider, which supports both hosted and single-node environments.
- Together: A distribution that uses the Together API provider, which provides a hosted solution for Llama Stack.
- Fireworks: A distribution that uses the Fireworks API provider, which provides a hosted solution for Llama Stack.
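To illustrate what a distribution assembles, here is a hypothetical sketch of a mapping from APIs to the providers that back them. The keys, provider names, and structure are illustrative only, not Llama Stack's actual configuration schema.

```python
# Hypothetical sketch: a distribution maps each API to a backing provider.
distribution = {
    "inference": {"provider": "ollama", "url": "http://localhost:11434"},
    "safety": {"provider": "meta-reference"},
    "memory": {"provider": "pgvector", "db_url": "postgresql://localhost/llama"},
    "agentic_system": {"provider": "meta-reference"},
}

# Swapping the inference provider for a hosted one (e.g. Fireworks)
# changes a single entry; application code against the APIs is unchanged.
distribution["inference"] = {"provider": "fireworks", "api_key": "<YOUR_KEY>"}
```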
## Client SDK
Llama Stack provides client SDKs for connecting to a Llama Stack server from several programming languages:
- Python: llama-stack-client-python
- Swift: llama-stack-client-swift
- Node: llama-stack-client-node
- Kotlin: llama-stack-client-kotlin
These client SDKs allow developers to easily integrate Llama Stack into their applications and use its APIs.
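For example, here is a minimal sketch using the Python SDK (llama-stack-client-python). The base URL and model identifier are assumptions, and method names can vary between SDK versions, so check the package's documentation for your installed release.

```python
from llama_stack_client import LlamaStackClient

# Connect to a locally running Llama Stack server; the port is an
# assumption and depends on how your distribution was started.
client = LlamaStackClient(base_url="http://localhost:5000")

# Request a chat completion via the Inference API. The model identifier
# is illustrative; query your server for the models it actually serves.
response = client.inference.chat_completion(
    model="Llama3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello, Llama Stack!"}],
)
print(response)
```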