# Introduction to Llama Stack
Get an introduction to Llama Stack, the standardized platform for generative AI applications. Learn about its features, APIs, and integrations.
## What is Llama Stack?
Llama Stack is a set of building blocks for creating generative AI applications. It provides a standardized API interface and developer experience, certified by Meta, so that applications built on it can run across a variety of environments: on-prem, in the cloud, on a single node, or on-device.
## APIs
Llama Stack provides APIs that can be split into two broad categories:
- APIs focused on application development:
  - Inference
  - Safety
  - Memory
  - Agentic System
  - Evaluation
- APIs focused on model development:
  - Evaluation
  - Post Training
  - Synthetic Data Generation
  - Reward Scoring
Each API is a collection of REST endpoints.
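Because each API is exposed over REST, any HTTP client can call it. Below is a minimal sketch using Python's requests library; the base URL, port, endpoint path, payload shape, and model name are illustrative assumptions and will depend on the distribution you run.

```python
import requests

# Hypothetical example: calling an inference endpoint on a locally
# running Llama Stack server. The URL, path, and payload shape are
# assumptions for illustration; consult your distribution's API docs.
BASE_URL = "http://localhost:5000"

response = requests.post(
    f"{BASE_URL}/inference/chat_completion",
    json={
        "model": "Llama3.1-8B-Instruct",  # assumed model identifier
        "messages": [
            {"role": "user", "content": "Write a haiku about llamas."}
        ],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```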
## API Providers
An API Provider is the entity that supplies the actual implementation backing an API. It can be:
- A provider that implements the API using open-source libraries (e.g. torch, vLLM, TensorRT)
- A relay to a remote REST service (e.g. cloud providers or dedicated inference providers that serve these APIs)
In other words, an API Provider makes an API real, either by implementing it locally or by connecting to a remote service that provides the implementation; a conceptual sketch follows the provider list below.
Llama Stack supports the following API providers:
- Meta reference
- Fireworks
- AWS Bedrock
- Together
- Ollama
- TGI
- Chroma
- PG Vector
- PyTorch ExecuTorch
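To make the local-versus-remote distinction concrete, here is a simplified, hypothetical sketch of the pattern: one abstract API with a locally implemented provider and a provider that relays to a remote REST service. The class names, method signatures, and the remote endpoint path are illustrative assumptions, not Llama Stack's actual provider interfaces.

```python
from abc import ABC, abstractmethod

import requests


class InferenceProvider(ABC):
    """Hypothetical, simplified interface for the Inference API."""

    @abstractmethod
    def chat_completion(self, messages: list[dict]) -> str: ...


class LocalProvider(InferenceProvider):
    """Backs the API with a local open-source library (e.g. vLLM)."""

    def chat_completion(self, messages: list[dict]) -> str:
        # A real provider would invoke the local inference engine here.
        return f"(local) echo: {messages[-1]['content']}"


class RemoteProvider(InferenceProvider):
    """Relays requests to a remote REST service (e.g. a hosted provider)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def chat_completion(self, messages: list[dict]) -> str:
        resp = requests.post(
            f"{self.base_url}/chat/completions",  # assumed remote path
            json={"messages": messages},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["content"]
```

Application code depends only on the abstract API, so either provider can back it without changes upstream.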
## Distributions
A Distribution in Llama Stack is where APIs and Providers are assembled together to provide a complete solution to the end application developer. It lets you mix and match providers, some backed by local code and some by remote services, giving you flexibility and scalability when developing generative AI applications.
In other words, a Distribution is a packaged, customizable combination of APIs and Providers; a hypothetical configuration sketch follows the list below.
Llama Stack supports the following distributions:
- Meta reference: A distribution that provides a standard implementation of the Llama Stack APIs, using Meta's reference implementation.
- Meta reference (quantized): A variant of the Meta Reference distribution that uses quantized models for efficient inference.
- Ollama: A distribution that uses the Ollama API provider, which is designed for single-node environments.
- TGI: A distribution that uses the TGI API provider, which supports both hosted and single-node environments.
- Together: A distribution that uses the Together API provider, which provides a hosted solution for Llama Stack.
- Fireworks: A distribution that uses the Fireworks API provider, which provides a hosted solution for Llama Stack.
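To illustrate what a distribution assembles, here is a hypothetical sketch of a mapping from APIs to the providers that back them. The keys, provider names, and structure are illustrative only, not Llama Stack's actual configuration schema.

```python
# Hypothetical sketch: a distribution maps each API to a backing provider.
distribution = {
    "inference": {"provider": "ollama", "url": "http://localhost:11434"},
    "safety": {"provider": "meta-reference"},
    "memory": {"provider": "pgvector", "db_url": "postgresql://localhost/llama"},
    "agentic_system": {"provider": "meta-reference"},
}

# Swapping the inference provider for a hosted one (e.g. Fireworks)
# changes a single entry; application code against the APIs is unchanged.
distribution["inference"] = {"provider": "fireworks", "api_key": "<YOUR_KEY>"}
```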
## Client SDK
Llama Stack provides client SDKs for connecting to a Llama Stack server from several programming languages:
- Python: llama-stack-client-python
- Swift: llama-stack-client-swift
- Node: llama-stack-client-node
- Kotlin: llama-stack-client-kotlin
These client SDKs allow developers to easily integrate Llama Stack into their applications and use its APIs.
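For example, here is a minimal sketch using the Python SDK (llama-stack-client-python). The base URL and model identifier are assumptions, and method names can vary between SDK versions, so check the package's documentation for your installed release.

```python
from llama_stack_client import LlamaStackClient

# Connect to a locally running Llama Stack server; the port is an
# assumption and depends on how your distribution was started.
client = LlamaStackClient(base_url="http://localhost:5000")

# Request a chat completion via the Inference API. The model identifier
# is illustrative; query your server for the models it actually serves.
response = client.inference.chat_completion(
    model="Llama3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello, Llama Stack!"}],
)
print(response)
```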