Deploying LocalAI Self-Hosted AI Model Management Platform on Ubuntu 24.04

Step-by-step guide to deploy LocalAI with Docker Compose and Traefik on Ubuntu 24.04, exposing an OpenAI-compatible API behind automatic HTTPS for self-hosted LLM serving.

A Developer Advocate with a focus on improving the developer experience through clear communication, technical enablement, and community engagement.
DevOps Engineer with experience in Kubernetes, automation, cloud infrastructure, and observability. I work in Developer Relations, contribute to technical documentation, and collaborate on engineering-focused projects.

LocalAI is an open-source platform for running large language models (LLMs) locally behind an OpenAI-compatible API, so you can swap it in for existing OpenAI client code without per-token charges or sending data off your server. This guide deploys LocalAI with Docker Compose, uses Traefik for automatic HTTPS, persists the model and cache directories, and finishes with a working chat-completion test. It follows the self-hosted AI model management practices documented in Vultr Docs.


Set Up the Directory Structure

1. Create the project directories:

mkdir -p ~/localai/{models,cache}
cd ~/localai

models/ holds downloaded model files and cache/ holds cached data. Both are bind-mounted into the LocalAI container, so their contents persist between restarts.

2. Create the environment file:

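nano .env

Add the following values. Replace localai.example.com with a domain whose DNS A record points to this server, and admin@example.com with the contact address for your Let's Encrypt account:

DOMAIN=localai.example.com
LETSENCRYPT_EMAIL=admin@example.com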
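
Docker Compose substitutes an empty string for a variable missing from .env, which would leave Traefik with an empty Host rule or ACME email. The sketch below is one way to catch that before starting the stack. The check_env function is a hypothetical helper, not part of Docker or LocalAI, and it only checks that both keys have a value, not that the domain resolves.

```shell
# check_env FILE: succeed only if FILE defines non-empty DOMAIN and
# LETSENCRYPT_EMAIL entries (hypothetical helper for this guide).
check_env() {
  env_file="$1"
  if [ ! -f "$env_file" ]; then
    echo "missing $env_file" >&2
    return 1
  fi
  for key in DOMAIN LETSENCRYPT_EMAIL; do
    # Require a line of the form KEY=value with at least one character after "=".
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "$key is not set in $env_file" >&2
      return 1
    fi
  done
  return 0
}

# Usage, from ~/localai:
# check_env .env && echo ".env looks complete"
```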

Deploy with Docker Compose

1. Add your user to the Docker group:

sudo usermod -aG docker $USER
newgrp docker
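
newgrp only changes the group set of the current shell, so a new terminal or SSH session opened before logging out again may still lack the docker group. As a quick check, the sketch below tests whether the group is active in the shell you are using. has_group is a hypothetical helper name introduced here for illustration.

```shell
# has_group NAME LIST: succeed if NAME appears in the space-separated LIST,
# for example the output of `id -nG` (hypothetical helper for this guide).
has_group() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

if has_group docker "$(id -nG)"; then
  echo "docker group is active in this shell"
else
  echo "docker group is not active here; run 'newgrp docker' or log in again"
fi
```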

2. Create the Compose manifest:

nano docker-compose.yaml

Add the following configuration to the file:

services:
  traefik:
    image: traefik:v3.6
    container_name: traefik
    restart: unless-stopped
    environment:
      DOCKER_API_VERSION: "1.44"
    command:
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.web.http.redirections.entrypoint.to=websecure"
      - "--entrypoints.web.http.redirections.entrypoint.scheme=https"
      - "--certificatesresolvers.le.acme.httpchallenge=true"
      - "--certificatesresolvers.le.acme.httpchallenge.entrypoint=web"
      - "--certificatesresolvers.le.acme.email=${LETSENCRYPT_EMAIL}"
      - "--certificatesresolvers.le.acme.storage=/letsencrypt/acme.json"
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./letsencrypt:/letsencrypt

  localai:
    image: localai/localai:latest-aio-cpu
    container_name: localai
    restart: unless-stopped
    volumes:
      - ./models:/models:cached
      - ./cache:/cache:cached
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20s
      retries: 5
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.localai.rule=Host(`${DOMAIN}`)"
      - "traefik.http.routers.localai.entrypoints=websecure"
      - "traefik.http.routers.localai.tls=true"
      - "traefik.http.routers.localai.tls.certresolver=le"
      - "traefik.http.services.localai.loadbalancer.server.port=8080"

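Swap localai/localai:latest-aio-cpu for a GPU variant such as localai/localai:latest-aio-gpu-nvidia-cuda-12 if the host has an NVIDIA GPU. The container also needs access to the GPU, which requires the NVIDIA Container Toolkit on the host and a GPU device reservation on the localai service.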

3. Set the models directory permissions and start the stack:

sudo chmod -R 755 ~/localai/models
docker compose up -d
docker compose ps
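
docker compose ps reports the localai container as starting until its health check passes, and with the 1m interval in the manifest that takes at least a minute. Rather than rerunning the command by hand, a small polling loop can wait for the healthy state. This is a minimal sketch: wait_for_healthy is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary assumptions you may need to raise if the image is still fetching models.

```shell
# wait_for_healthy ATTEMPTS DELAY CMD...: rerun CMD until it prints "healthy"
# or ATTEMPTS is exhausted (hypothetical helper for this guide).
wait_for_healthy() {
  wfh_attempts="$1"
  wfh_delay="$2"
  shift 2
  wfh_i=1
  while [ "$wfh_i" -le "$wfh_attempts" ]; do
    # A failing status command counts as "not healthy yet".
    wfh_status="$("$@" 2>/dev/null || true)"
    if [ "$wfh_status" = "healthy" ]; then
      return 0
    fi
    wfh_i=$((wfh_i + 1))
    sleep "$wfh_delay"
  done
  return 1
}

# Usage: poll the localai container every 10 seconds, up to 60 times.
# wait_for_healthy 60 10 docker inspect --format '{{.State.Health.Status}}' localai
```

The status command is passed as arguments so the same loop works with any command that prints a health state.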

Verify the API

1. Check readiness:

curl -i https://localai.example.com/readyz

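A 200 OK response confirms that Traefik has obtained a certificate and is routing requests to LocalAI. If the request fails right after the first start, LocalAI may still be loading, so retry after a few minutes.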

2. List the available models:

curl https://localai.example.com/v1/models

3. Run a chat completion:

curl -X POST https://localai.example.com/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-4",
      "messages": [
        {"role": "user", "content": "Explain what LocalAI does in one sentence."}
      ],
      "max_tokens": 60
    }'

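LocalAI returns a JSON response in the OpenAI chat completion format, with the generated text in choices[0].message.content. The gpt-4 name here refers to the model alias preconfigured in the AIO image, not the OpenAI-hosted model.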


Access the Dashboard

Open https://localai.example.com in a browser to browse the model gallery, install new models, and run inference from the UI.


Next Steps

LocalAI is running and served securely over HTTPS. From here you can:

  • Install additional models from the gallery for domain-specific tasks

  • Point any OpenAI SDK at LocalAI by setting its base URL to https://localai.example.com/v1 (OPENAI_BASE_URL in current SDKs, OPENAI_API_BASE in older releases)

  • Run a GPU variant for image generation, embeddings, and faster LLM inference
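
To illustrate the SDK step, the sketch below exports the environment variables that current OpenAI SDKs read when a client is created. The localai_base_url function is a hypothetical helper, and the key is a placeholder: the SDKs generally refuse to start without a non-empty key, while LocalAI, as deployed in this guide with no API key configured, is assumed to accept any value.

```shell
# localai_base_url DOMAIN: print the OpenAI-compatible base URL for that host
# (hypothetical helper for this guide).
localai_base_url() {
  printf 'https://%s/v1\n' "$1"
}

# Base URL picked up by current OpenAI SDKs in place of api.openai.com.
export OPENAI_BASE_URL="$(localai_base_url "localai.example.com")"
# Placeholder key (assumption): any non-empty string, since this deployment
# does not configure an API key on the LocalAI side.
export OPENAI_API_KEY="sk-local-placeholder"

echo "OpenAI clients in this shell will use $OPENAI_BASE_URL"
```

Existing client code started from the same shell should then talk to LocalAI without source changes, provided it requests a model name that LocalAI serves.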

For the full guide with additional tips, visit the original article on Vultr Docs.

The Self-Hosted Stack

Part 1 of 50

The Self-Hosted Stack is a developer-focused series exploring open-source tools you can deploy, run, and manage on your own infrastructure. From AI platforms and databases to developer tools, observability stacks, and authentication systems, each guide walks through deploying production-ready open-source software on Vultr cloud infrastructure.
