Table of Contents
- Prerequisites
- Step 1: Install Required Dependencies
- Step 2: Verify Docker GPU Runtime
- Step 3: Clone the Milvus Repository
- Step 4: Configure Docker Compose for GPU Acceleration
- Step 5: Launch Milvus with Docker Compose
- Step 6: Verify GPU Access Inside the Container
- Step 7: Create a Python Script to Test GPU FAISS Indexing
- Step 8: Install the Milvus Python SDK and Run the Script
- Conclusion
Milvus is an open-source vector database designed for high-performance similarity search in AI applications. When deployed with GPU acceleration, Milvus can significantly speed up indexing and search, making it well suited to machine learning workloads.
In this guide, you’ll learn how to deploy Milvus with GPU support on Ubuntu 24.04 using Docker, configure it for GPU-accelerated FAISS indexing, and test it with a Python script.
Prerequisites
- An Ubuntu 24.04 server with an NVIDIA GPU.
- A non-root user with sudo privileges.
- NVIDIA drivers installed on the server.
Step 1: Install Required Dependencies
In this step, we install the essential packages (Python, Docker, and Docker Compose) and set up a Python virtual environment.
apt update
apt install python3 python3-dev python3-venv docker.io docker-compose -y
Create and activate the virtual environment:
python3 -m venv venv
source venv/bin/activate
Upgrade pip to the latest version.
pip install --upgrade pip
Restart Docker to ensure all changes are applied:
systemctl restart docker
This sets up a clean Python environment and prepares your system for Dockerized deployment.
Step 2: Verify Docker GPU Runtime
Check if your Docker installation is configured to support GPU access.
docker info | grep -i runtimes
Output.
Runtimes: nvidia runc io.containerd.runc.v2
If the nvidia runtime is missing from the output, install the NVIDIA Container Toolkit to enable GPU support in Docker containers.
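The commands below (adapted from NVIDIA's official installation instructions at the time of writing; check their documentation for the current steps) add NVIDIA's package repository, install the toolkit, and register the nvidia runtime with Docker:

```shell
# Add NVIDIA's package repository and signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -sL https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit, register the nvidia runtime with Docker, and restart Docker
apt update
apt install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
```

Re-run docker info | grep -i runtimes afterwards to confirm the nvidia runtime now appears.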
Step 3: Clone the Milvus Repository
Milvus provides a Docker-based deployment setup. We’ll use their official repository.
Clone the Milvus repository.
git clone https://github.com/milvus-io/milvus.git
Navigate to the standalone directory.
cd milvus/deployments/docker/standalone
This directory contains Docker Compose files for deploying Milvus in standalone mode.
Step 4: Configure Docker Compose for GPU Acceleration
In this step, we will create a docker-compose.yml file with a GPU-optimized configuration.
nano docker-compose.yml
Remove the default contents and add the following configuration:
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.18
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://etcd:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.3.9-gpu
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      MINIO_REGION: us-east-1
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
      CUDA_VISIBLE_DEVICES: "0"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

networks:
  default:
    name: milvus
This configuration runs Milvus alongside its dependencies, etcd and MinIO, and reserves the host GPU for the Milvus container.
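The deploy.resources.reservations block is what exposes the GPU to the standalone container. With count: all, the container sees every GPU on the host; if you would rather pin a specific device, the Compose specification also accepts device_ids in place of count (a sketch of the alternative stanza):

```yaml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]   # pin GPU 0 instead of reserving all GPUs
              capabilities: [gpu]
```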
Step 5: Launch Milvus with Docker Compose
Start the Milvus standalone stack using Docker Compose.
docker compose up -d
If your system provides the standalone docker-compose binary rather than the Docker CLI plugin, run docker-compose up -d instead.
Check the container status.
docker ps
Output.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0e15ab0d6d9a milvusdb/milvus:v2.3.9-gpu "/tini -- milvus run…" 4 minutes ago Up 4 minutes (healthy) 0.0.0.0:9091->9091/tcp, [::]:9091->9091/tcp, 0.0.0.0:19530->19530/tcp, [::]:19530->19530/tcp milvus-standalone
d3c681c4ef63 minio/minio:RELEASE.2023-03-20T20-16-18Z "/usr/bin/docker-ent…" 4 minutes ago Up 4 minutes (healthy) 0.0.0.0:9000-9001->9000-9001/tcp, [::]:9000-9001->9000-9001/tcp milvus-minio
d59b8cd4bb7b quay.io/coreos/etcd:v3.5.18 "etcd -advertise-cli…" 4 minutes ago Up 4 minutes (healthy) 2379-2380/tcp milvus-etcd
You should see three containers: milvus-standalone, milvus-minio, and milvus-etcd, all marked healthy.
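Rather than re-running docker ps by hand, you can poll Milvus's health endpoint (exposed on port 9091 in the compose file) from Python. A stdlib-only helper, shown here as a sketch:

```python
import time
import urllib.request
import urllib.error

def wait_for_healthy(url, timeout=120, interval=2):
    """Poll a health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not up yet; retry after a short pause
        time.sleep(interval)
    return False

# Usage once the stack is started:
#   wait_for_healthy("http://localhost:9091/healthz")
```

This is handy in provisioning scripts: Milvus can take a minute or more to pass its first health check because of the 90-second start_period in the compose file.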
Step 6: Verify GPU Access Inside the Container
Now we’ll enter the Milvus container to confirm it can see the host GPU.
docker exec -it milvus-standalone bash
After you connect to the container, you will get the following shell.
root@0e15ab0d6d9a:/milvus#
Run the following command to confirm the container can see the host GPU.
nvidia-smi
Output.
Wed Jun 25 14:32:10 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A40-12Q On | 00000000:06:00.0 Off | 0 |
| N/A N/A P8 N/A / N/A | 1MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Exit the container once verified.
exit
Step 7: Create a Python Script to Test GPU FAISS Indexing
Let’s run a quick demo to validate that GPU-based indexing works using the Milvus Python SDK.
Create a new Python script.
nano milvus_faiss_gpu_demo.py
Add the following code:
from pymilvus import connections, utility, Collection, FieldSchema, CollectionSchema, DataType
import numpy as np

# Step 1: Connect to Milvus
connections.connect(alias="default", host="localhost", port="19530")
print("✅ Connected to Milvus")

# Step 2: Create a collection with a FLOAT_VECTOR field
collection_name = "demo_gpu_vectors"

# Drop the old collection if it exists
if utility.has_collection(collection_name):
    Collection(collection_name).drop()
    print(f"⚠️ Dropped existing collection: {collection_name}")

# Define the schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
]
schema = CollectionSchema(fields, description="GPU FAISS test collection")
collection = Collection(name=collection_name, schema=schema)
print(f"✅ Created collection: {collection_name}")

# Step 3: Insert 1,000 random vectors
num_vectors = 1000
vectors = np.random.random((num_vectors, 128)).astype("float32")
ids = list(range(num_vectors))
data = [ids, vectors]
collection.insert(data)
print(f"✅ Inserted {num_vectors} vectors")

# Step 4: Create a GPU index (GPU_IVF_FLAT builds on the GPU; plain IVF_FLAT is CPU-only)
index_params = {
    "index_type": "GPU_IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128}
}
collection.create_index(field_name="embedding", index_params=index_params)
print("✅ Index created with GPU_IVF_FLAT (GPU enabled)")

# Step 5: Load the collection into memory
collection.load()
print("✅ Collection loaded into memory")

# Step 6: Perform a similarity search
search_vectors = [vectors[42]]  # Use a vector from the dataset
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
results = collection.search(
    search_vectors,
    "embedding",
    search_params,
    limit=5,
    output_fields=["id"]
)

print("\n🔍 Top 5 Similar Vectors:")
for hit in results[0]:
    print(f"ID: {hit.id}, Distance: {hit.distance:.4f}")
This script connects to Milvus, inserts 1000 random vectors, creates a GPU-accelerated FAISS index, and performs a similarity search.
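The nlist and nprobe parameters control the IVF part of the index: vectors are partitioned into nlist clusters at build time, and only the nprobe clusters closest to the query are scanned at search time. A toy NumPy sketch of this idea (simplified and hypothetical; real IVF training runs k-means rather than picking random centroids):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((1000, 16)).astype("float32")
nlist, nprobe = 8, 2

# Toy "training": use nlist random rows as centroids (FAISS would run k-means).
centroids = data[rng.choice(len(data), nlist, replace=False)]
# Assign each vector to its nearest centroid.
assign = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)

def ivf_search(query, k=5):
    # Rank clusters by distance to the query and probe only the closest nprobe.
    probed = ((centroids - query) ** 2).sum(1).argsort()[:nprobe]
    candidates = np.where(np.isin(assign, probed))[0]
    # Exact L2 scan, restricted to members of the probed clusters.
    dists = ((data[candidates] - query) ** 2).sum(1)
    return candidates[dists.argsort()[:k]]

print(ivf_search(data[42]))  # vector 42 ranks first: its own cluster is always probed
```

Raising nprobe scans more clusters, trading speed for recall; setting nprobe equal to nlist degenerates into an exhaustive search.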
Step 8: Install the Milvus Python SDK and Run the Script
Now, install the Python SDK client.
pip install pymilvus
Run the script.
python3 milvus_faiss_gpu_demo.py
If the script runs successfully, you will see output similar to the following (the vectors are random, so your distances will differ).
✅ Connected to Milvus
✅ Created collection: demo_gpu_vectors
✅ Inserted 1000 vectors
✅ Index created with GPU_IVF_FLAT (GPU enabled)
✅ Collection loaded into memory
🔍 Top 5 Similar Vectors:
ID: 42, Distance: 0.0000
ID: 165, Distance: 15.2311
ID: 437, Distance: 15.4424
ID: 921, Distance: 15.5928
ID: 549, Distance: 15.6674
This confirms that GPU-based indexing and similarity search are working as expected.
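The 0.0000 distance for ID 42 is expected: the query was taken from the inserted data, so its nearest neighbor is itself. The ranking the index produces can be reproduced with a brute-force L2 scan in NumPy (a standalone sketch, no Milvus required; the vectors here are freshly generated, so the non-zero distances will differ from the output above):

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.random((1000, 128)).astype("float32")
query = vectors[42]

# Milvus's "L2" metric ranks hits by squared Euclidean distance.
dists = ((vectors - query) ** 2).sum(axis=1)
top5 = dists.argsort()[:5]

for i in top5:
    print(f"ID: {i}, Distance: {dists[i]:.4f}")
```

An approximate index like IVF_FLAT should return the same top hit as this exhaustive scan, and with nprobe high enough, the same top 5.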
Conclusion
You have successfully deployed Milvus with GPU support on Ubuntu 24.04. By building the IVF_FLAT index on the GPU, you can run high-speed similarity searches across thousands, or millions, of vectors. This setup is well suited to real-time recommendation engines, semantic search, and other AI applications.