Milvus is an open-source vector database designed for high-performance similarity search in AI applications. When deployed with GPU acceleration, Milvus can significantly speed up index building and search, which makes it well suited to machine learning workloads.

In this guide, you’ll learn how to deploy Milvus with GPU support on Ubuntu 24.04 using Docker, configure it for GPU-accelerated FAISS indexing, and test it with a Python script.

Prerequisites

  • An Ubuntu 24.04 server with an NVIDIA GPU.
  • A root user or a user with sudo privileges.
  • NVIDIA drivers installed on the server.

Step 1: Install Required Dependencies

In this step, we install Python and Docker Compose, then set up a Python virtual environment.

apt install python3 python3-dev python3-venv docker-compose -y

Create and activate the virtual environment:

python3 -m venv venv
source venv/bin/activate

Upgrade pip to the latest version.

pip install --upgrade pip

Restart Docker to ensure all changes are applied:

systemctl restart docker

This sets up a clean Python environment and prepares your system for Dockerized deployment.
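Before installing Python packages later in this guide, it can help to confirm that the virtual environment is the one actually in use. The following is a minimal sketch; in_virtualenv is a hypothetical helper name, not part of any tool used here.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the system Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If this reports False, run source venv/bin/activate again in the current shell before continuing.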

Step 2: Verify Docker GPU Runtime

Check if your Docker installation is configured to support GPU access.

docker info | grep -i runtimes

Output.

Runtimes: nvidia runc io.containerd.runc.v2

If nvidia is missing from the list, install the NVIDIA Container Toolkit and restart Docker so that containers can access the GPU.
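The same check can be scripted, for example as part of a provisioning script. The sketch below parses the Runtimes line from docker info in the format shown above; list_runtimes and has_nvidia_runtime are hypothetical helper names, and the parsing assumes the line layout matches the sample output.

```python
import subprocess


def list_runtimes(docker_info_text: str) -> list[str]:
    """Extract runtime names from the 'Runtimes:' line of `docker info` output."""
    for line in docker_info_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("runtimes:"):
            # Everything after the colon is a space-separated list of names.
            return stripped.split(":", 1)[1].split()
    return []


def has_nvidia_runtime(docker_info_text: str) -> bool:
    """Return True when a runtime named 'nvidia' is registered."""
    return "nvidia" in list_runtimes(docker_info_text)


if __name__ == "__main__":
    try:
        info = subprocess.run(
            ["docker", "info"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"Could not query Docker: {exc}")
    else:
        print("nvidia runtime registered:", has_nvidia_runtime(info))
```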

Step 3: Clone the Milvus Repository

Milvus provides a Docker-based deployment setup. We’ll use their official repository.

Clone the Milvus repository.

git clone https://github.com/milvus-io/milvus.git

Navigate to the standalone directory.

cd milvus/deployments/docker/standalone

This directory contains Docker Compose files for deploying Milvus in standalone mode.

Step 4: Configure Docker Compose for GPU Acceleration

In this section, we will create a docker-compose.yml with a GPU-optimized configuration.

nano docker-compose.yml

Remove the default configuration and add the following configuration:

version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.18
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://etcd:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.3.9-gpu
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      MINIO_REGION: us-east-1
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
      CUDA_VISIBLE_DEVICES: "0"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

networks:
  default:
    name: milvus

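This configuration runs etcd and MinIO as the metadata and object storage services that Milvus depends on, and reserves the host's NVIDIA GPUs for the milvus-standalone container through the deploy.resources.reservations.devices block.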
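To make the GPU reservation structure concrete, the sketch below walks a Compose configuration that has already been parsed into a Python dictionary and reports which services request GPU devices. The dictionary is a hand-written, trimmed stand-in for the file above (loading the real YAML would need a third-party parser, which is not assumed here), and gpu_services is a hypothetical helper name.

```python
def gpu_services(compose: dict) -> dict:
    """Map each service name to the GPU device requests it declares."""
    found = {}
    for name, service in compose.get("services", {}).items():
        devices = (
            service.get("deploy", {})
            .get("resources", {})
            .get("reservations", {})
            .get("devices", [])
        )
        # Keep only device requests that ask for the "gpu" capability.
        gpu_requests = [d for d in devices if "gpu" in d.get("capabilities", [])]
        if gpu_requests:
            found[name] = gpu_requests
    return found


# Trimmed stand-in for the parsed docker-compose.yml (only the keys read above).
compose = {
    "services": {
        "etcd": {"image": "quay.io/coreos/etcd:v3.5.18"},
        "minio": {"image": "minio/minio:RELEASE.2023-03-20T20-16-18Z"},
        "standalone": {
            "image": "milvusdb/milvus:v2.3.9-gpu",
            "deploy": {"resources": {"reservations": {"devices": [
                {"driver": "nvidia", "count": "all", "capabilities": ["gpu"]},
            ]}}},
        },
    }
}

print(gpu_services(compose))
```

Only the standalone service should appear in the result, since etcd and MinIO do not use the GPU.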

Step 5: Launch Milvus with Docker Compose

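Start the Milvus standalone stack using Docker Compose. The command below uses the Compose v2 plugin syntax (docker compose); if only the standalone docker-compose binary from Step 1 is available on your system, run docker-compose up -d instead.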

docker compose up -d

Check the container status.

docker ps

Output.

CONTAINER ID   IMAGE                                      COMMAND                  CREATED         STATUS                   PORTS                                                                                          NAMES
0e15ab0d6d9a   milvusdb/milvus:v2.3.9-gpu                 "/tini -- milvus run…"   4 minutes ago   Up 4 minutes (healthy)   0.0.0.0:9091->9091/tcp, [::]:9091->9091/tcp, 0.0.0.0:19530->19530/tcp, [::]:19530->19530/tcp   milvus-standalone
d3c681c4ef63   minio/minio:RELEASE.2023-03-20T20-16-18Z   "/usr/bin/docker-ent…"   4 minutes ago   Up 4 minutes (healthy)   0.0.0.0:9000-9001->9000-9001/tcp, [::]:9000-9001->9000-9001/tcp                                milvus-minio
d59b8cd4bb7b   quay.io/coreos/etcd:v3.5.18                "etcd -advertise-cli…"   4 minutes ago   Up 4 minutes (healthy)   2379-2380/tcp                                                                                  milvus-etcd

You should see three containers: milvus-standalone, milvus-minio, and milvus-etcd, all marked healthy.
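Milvus can take a minute or two to become healthy after startup (the Compose file allows a 90-second start period). Rather than re-running docker ps by hand, a script can poll the same health endpoint that the container healthcheck uses. This is a minimal sketch: it assumes port 9091 is published on localhost as in the Compose file, and milvus_is_healthy and wait_until_healthy are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def milvus_is_healthy(url: str = "http://localhost:9091/healthz") -> bool:
    """Return True when the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP error codes all count as unhealthy.
        return False


def wait_until_healthy(probe=milvus_is_healthy, attempts: int = 30,
                       delay: float = 5.0) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False
```

Calling wait_until_healthy() right after docker compose up -d returns True once Milvus responds, or False if it never does within the allotted attempts.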

Step 6: Verify GPU Access Inside the Container

Now we’ll enter the Milvus container to confirm it can see the host GPU.

docker exec -it milvus-standalone bash

Once connected, you will see a shell prompt like the following.

root@0e15ab0d6d9a:/milvus# 

Run the following command to confirm that the host GPU is visible inside the container.

nvidia-smi

Output.

Wed Jun 25 14:32:10 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40-12Q                 On  |   00000000:06:00.0 Off |                    0 |
| N/A   N/A    P8             N/A /  N/A  |       1MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Exit the container once verified.

exit
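The same verification can be run from the host without opening an interactive shell, which is handy for automation. The sketch below calls nvidia-smi inside the container with its CSV query flags and parses the result; parse_gpu_csv and query_container_gpus are hypothetical helper names, and the container name is taken from the Compose file.

```python
import subprocess


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse 'name, memory.total' CSV rows as printed by nvidia-smi."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, memory = [part.strip() for part in line.split(",", 1)]
        gpus.append({"name": name, "memory_total": memory})
    return gpus


def query_container_gpus(container: str = "milvus-standalone") -> list[dict]:
    """Run nvidia-smi inside the container and return the GPUs it reports."""
    result = subprocess.run(
        ["docker", "exec", container, "nvidia-smi",
         "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    try:
        print(query_container_gpus())
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"Could not query GPUs in the container: {exc}")
```

An empty list, or an error from docker exec, suggests the container was started without GPU access.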

Step 7: Create a Python Script to Test GPU FAISS Indexing

Let’s run a quick demo to validate that GPU-based indexing works using the Milvus Python SDK.

Create a new Python script.

nano milvus_faiss_gpu_demo.py

Add the following code.

from pymilvus import connections, utility, Collection, FieldSchema, CollectionSchema, DataType
import numpy as np

# Step 1: Connect to Milvus
connections.connect(alias="default", host="localhost", port="19530")
print("✅ Connected to Milvus")

# Step 2: Create a collection with FLOAT_VECTOR field
collection_name = "demo_gpu_vectors"

# Delete old collection if exists
if utility.has_collection(collection_name):
    Collection(collection_name).drop()
    print(f"⚠️ Dropped existing collection: {collection_name}")

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
]

schema = CollectionSchema(fields, description="GPU FAISS test collection")
collection = Collection(name=collection_name, schema=schema)
print(f"✅ Created collection: {collection_name}")

# Step 3: Insert 1,000 random vectors
num_vectors = 1000
vectors = np.random.random((num_vectors, 128)).astype("float32")
ids = list(range(num_vectors))
data = [ids, vectors]  # column order must match the schema: id, embedding

collection.insert(data)
collection.flush()  # persist the inserted data before building the index
print(f"✅ Inserted {num_vectors} vectors")

# Step 4: Create a GPU index
index_params = {
    "index_type": "GPU_IVF_FLAT",  # GPU variant; plain "IVF_FLAT" is built on the CPU
    "metric_type": "L2",
    "params": {"nlist": 128}
}

collection.create_index(field_name="embedding", index_params=index_params)
print("✅ Index created with IVF_FLAT (GPU enabled)")

# Step 5: Load collection into memory
collection.load()
print("✅ Collection loaded into memory")

# Step 6: Perform a similarity search
search_vectors = [vectors[42]]  # Use a vector from dataset
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}

results = collection.search(
    search_vectors,
    "embedding",
    search_params,
    limit=5,
    output_fields=["id"]
)

print("\n🔍 Top 5 Similar Vectors:")
for hit in results[0]:
    print(f"ID: {hit.id}, Distance: {hit.distance:.4f}")

This script connects to Milvus, inserts 1,000 random vectors, builds an index on the embedding field, and runs a similarity search using one of the inserted vectors as the query.
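To see what the search step computes, the following sketch performs the same kind of top-5 lookup by brute force in plain Python, with no Milvus involved. It assumes the L2 metric is reported as squared Euclidean distance (no square root), which is how the Milvus documentation describes it; squared_l2 and brute_force_top_k are hypothetical helper names, and the random data here is not the data the script above inserts.

```python
import heapq
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_top_k(query, vectors, k=5):
    """Return the k nearest (index, distance) pairs by exhaustive comparison."""
    scored = ((squared_l2(query, v), i) for i, v in enumerate(vectors))
    return [(i, d) for d, i in heapq.nsmallest(k, scored)]


random.seed(0)
vectors = [[random.random() for _ in range(128)] for _ in range(1000)]

# Query with a vector from the dataset: it should match itself at distance 0.
for vector_id, distance in brute_force_top_k(vectors[42], vectors):
    print(f"ID: {vector_id}, Distance: {distance:.4f}")
```

The first hit is the query vector itself at distance 0. An IVF index only scans nprobe of the nlist clusters, so on real data its results can occasionally differ from this exhaustive ranking.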

Step 8: Install the Milvus Python SDK and Run the Script

Install the Milvus Python SDK (pymilvus) inside the virtual environment created in Step 1.

pip install pymilvus

Run the script.

python3 milvus_faiss_gpu_demo.py

If the script runs successfully, you will see output similar to the following. The IDs and distances will differ because the vectors are random.

✅ Connected to Milvus
✅ Created collection: demo_gpu_vectors
✅ Inserted 1000 vectors
✅ Index created with IVF_FLAT (GPU enabled)
✅ Collection loaded into memory

🔍 Top 5 Similar Vectors:
ID: 42, Distance: 0.0000
ID: 165, Distance: 15.2311
ID: 437, Distance: 15.4424
ID: 921, Distance: 15.5928
ID: 549, Distance: 15.6674

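This confirms that Milvus accepts the collection, builds the index, and answers similarity searches end to end. ID 42 appears first with a distance of 0.0000 because the query vector is itself part of the dataset. With only 1,000 vectors the GPU does little work; to observe GPU activity, insert a larger dataset and watch nvidia-smi on the host while the index builds.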

Conclusion

You have successfully deployed Milvus with GPU support on Ubuntu 24.04. With a GPU-backed IVF index, Milvus can offload index building and similarity search to the GPU, which matters most as collections grow from thousands to millions of vectors. This setup is a good starting point for real-time recommendation engines, semantic search, and other AI applications.