Vector search has become a critical component of modern AI applications, enabling efficient similarity searches in high-dimensional spaces. Zilliz, built on the open-source Milvus engine, provides a powerful vector database with scalable, high-performance search capabilities, particularly when leveraging GPU acceleration.
In this guide, we’ll walk through setting up and using Zilliz on Ubuntu 24.04 GPU instances for optimal vector search performance.
Prerequisites
Before getting started, ensure you have:
- An Ubuntu 24.04 server with NVIDIA GPU(s).
- NVIDIA drivers installed.
- A non-root user with sudo privileges.
Step 1: Set Up Your GPU Environment
First, install the necessary dependencies.
sudo apt update -y
sudo apt install -y python3-pip python3-venv docker.io docker-compose
Step 2: Install Zilliz with GPU Support
Zilliz offers several deployment options. For GPU instances, we recommend self-hosting Milvus (the open-source vector database created by Zilliz) with GPU acceleration:
1. Create a directory for your Zilliz deployment.
mkdir zilliz-gpu && cd zilliz-gpu
2. Create a docker-compose.yaml file for GPU-enabled Milvus.
nano docker-compose.yaml
Add the following content:
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/minio:/minio_data
    command: minio server /minio_data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.3.3-gpu
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
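Note that the GPU reservation in the compose file only works if Docker can reach the GPU, which requires the NVIDIA Container Toolkit on the host. If it is not already installed, the setup typically looks like the following (this assumes your NVIDIA drivers are working and NVIDIA's apt repository for the toolkit is configured):

```shell
# Install the NVIDIA Container Toolkit (assumes NVIDIA's apt repo is configured),
# register the nvidia runtime with Docker, and restart the daemon
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```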
3. Start the containers.
docker-compose up -d
4. Verify the services are running.
docker ps
Output.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
66ac3ea90205 milvusdb/milvus:v2.3.3-gpu "/tini -- milvus run…" 11 minutes ago Up 11 minutes 0.0.0.0:9091->9091/tcp, [::]:9091->9091/tcp, 0.0.0.0:19530->19530/tcp, [::]:19530->19530/tcp milvus-standalone
cd2e62004369 quay.io/coreos/etcd:v3.5.5 "etcd -advertise-cli…" 11 minutes ago Up 11 minutes 2379-2380/tcp milvus-etcd
e09e5cb48212 minio/minio:RELEASE.2023-03-20T20-16-18Z "/usr/bin/docker-ent…" 11 minutes ago Up 11 minutes (healthy) 9000/tcp milvus-minio
Step 3: Install Python Client and Dependencies
1. Create a Python virtual environment.
python3 -m venv zilliz-env
source zilliz-env/bin/activate
2. Install the necessary Python packages.
pip install pymilvus torch torchvision sentence-transformers
Step 4: Connect to Zilliz and Create a Collection
1. First, create a Python script and import the necessary modules, including the Milvus client and SentenceTransformer for embedding generation.
nano zilliz_demo.py
Add the following code:
#!/usr/bin/env python3
from pymilvus import connections, utility, FieldSchema, CollectionSchema, DataType, Collection
from sentence_transformers import SentenceTransformer
import time
import torch
import pymilvus
Explanation:
- pymilvus is used to interact with the Milvus vector database.
- SentenceTransformer loads a pre-trained transformer model for sentence embeddings.
- torch detects if GPU is available for faster inference.
2. Add a function that verifies the connection to the Milvus server.
def verify_connection():
    """Verify connection to Milvus server"""
    try:
        # Test connection
        connections.connect("default", host="localhost", port="19530")
        print("✓ Successfully connected to Milvus server")

        # Check server status
        print(f"Server status: {utility.get_server_version()}")
        print(f"List of collections: {utility.list_collections()}")
        return True
    except Exception as e:
        print(f"Failed to connect to Milvus: {e}")
        return False
Explanation:
- Connects to a Milvus instance running on localhost:19530.
- Prints server version and any existing collections to confirm successful connection.
3. Next, add a function that creates a new Milvus collection if it doesn’t exist, generates embeddings for a set of documents, and inserts them into the collection.
def create_collection():
    """Create a collection with sample data"""
    # Configuration
    collection_name = "document_embeddings"
    dim = 384  # Dimension of SBERT embeddings

    # Verify connection first
    if not verify_connection():
        return

    # Initialize model (using GPU if available)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(f"Using device: {device}")
    model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

    # Create collection if it doesn't exist
    if not utility.has_collection(collection_name):
        print(f"Creating collection: {collection_name}")
        fields = [
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
            FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim)
        ]
        schema = CollectionSchema(fields, description="Document embeddings collection")
        collection = Collection(collection_name, schema)

        # Create index for efficient search
        index_params = {
            "index_type": "IVF_PQ",
            "metric_type": "L2",
            "params": {"nlist": 128, "m": 16, "nbits": 8}
        }
        collection.create_index("embedding", index_params)
        print("Collection created with IVF_PQ index")
    else:
        collection = Collection(collection_name)
        print(f"Collection {collection_name} already exists")

    # Sample data
    documents = [
        "Zilliz provides scalable vector search solutions",
        "Ubuntu 24.04 offers improved GPU support",
        "Vector databases are essential for AI applications",
        "GPU acceleration significantly improves vector search performance"
    ]

    # Generate embeddings
    print("Generating embeddings...")
    start_time = time.time()
    embeddings = model.encode(documents)
    print(f"Embeddings generated in {time.time() - start_time:.2f} seconds")

    # Prepare data for insertion: one list per field, in schema order
    # (the auto_id primary key is generated by Milvus)
    entities = [
        documents,           # text field
        embeddings.tolist()  # embedding field
    ]

    # Insert data and load the collection into memory for search
    print("Inserting data...")
    collection.insert(entities)
    collection.load()
    print(f"Inserted {len(documents)} documents")
    return collection
Explanation:
- Initializes the SentenceTransformer model (GPU-accelerated if available).
- Defines the collection schema with fields: auto-generated id, text, and embedding.
- Creates an IVF_PQ index for efficient vector search.
- Encodes 4 sample sentences into 384-dim embeddings using SBERT.
- Inserts the documents and loads the collection for querying.
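One constraint worth knowing about IVF_PQ: product quantization splits each vector into m sub-vectors, so the dimension must be divisible by m (here 384 / 16 = 24). A standalone sketch of that sanity check (not part of the demo script; valid_ivf_pq is a hypothetical helper):

```python
def valid_ivf_pq(dim: int, m: int, nbits: int) -> bool:
    """Check the IVF_PQ parameter constraints used above: the vector dimension
    must split evenly into m sub-vectors, and nbits is typically 1-16 (default 8)."""
    return dim % m == 0 and 1 <= nbits <= 16

print(valid_ivf_pq(384, 16, 8))  # True: 384-dim SBERT vectors, 24-dim sub-vectors
print(valid_ivf_pq(384, 10, 8))  # False: 384 is not divisible by 10
```

If you change the embedding model (and therefore dim), pick an m that divides the new dimension evenly.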
4. Add a function that encodes a search query into an embedding and performs an ANN (Approximate Nearest Neighbor) search in the Milvus collection.
def perform_search(collection):
    """Perform a sample vector search"""
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

    search_terms = ["scalable database solutions"]
    print("\nGenerating search embedding...")
    search_embeddings = model.encode(search_terms)

    search_params = {
        "metric_type": "L2",
        "params": {"nprobe": 10}
    }

    print("Executing search...")
    results = collection.search(
        data=[search_embeddings[0].tolist()],
        anns_field="embedding",
        param=search_params,
        limit=3,
        output_fields=["text"]
    )

    print("\nSearch results:")
    for i, hits in enumerate(results):
        print(f"Search term: '{search_terms[i]}'")
        for hit in hits:
            print(f"  • Score: {hit.distance:.4f} - Text: {hit.entity.get('text')}")
Explanation:
- Re-loads the same SBERT model to encode the search term.
- Uses collection.search() with vector data, targeting the “embedding” field, and retrieves the top 3 matches.
- Outputs a similarity score and matched text.
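The scores come from the L2 metric, which Milvus computes as the squared Euclidean distance (the square root is skipped, since it doesn't affect ranking), so lower scores mean closer matches. A minimal pure-Python illustration with toy vectors, not real embeddings:

```python
def squared_l2(a, b):
    """Squared Euclidean distance, as reported by Milvus for metric_type 'L2'.
    Lower values mean the vectors are more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

print(squared_l2([1.0, 2.0], [1.0, 2.0]))  # 0.0 -> identical vectors
print(squared_l2([0.0, 0.0], [3.0, 4.0]))  # 25.0 (not 5.0: the root is skipped)
```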
5. Create the script’s entry point. It verifies the connection, creates the collection and data, and performs a sample vector search.
if __name__ == "__main__":
    print("=== Zilliz/Milvus Vector Search Demo ===")
    print(f"PyMilvus version: {pymilvus.__version__}")

    # Verify connection first
    if verify_connection():
        collection = create_collection()
        if collection:
            perform_search(collection)

    print("\nDemo completed")
Explanation:
- Prints the PyMilvus version.
- If the connection is successful, it proceeds to create the collection and run the search query.
- Wraps the entire logic in a clean main block for clarity and modularity.
6. Run the full Python script.
python3 zilliz_demo.py
Output.
=== Zilliz/Milvus Vector Search Demo ===
PyMilvus version: 2.5.10
✓ Successfully connected to Milvus server
Server status: v2.3.3-gpu
List of collections: ['document_embeddings']
✓ Successfully connected to Milvus server
Server status: v2.3.3-gpu
List of collections: ['document_embeddings']
Using device: cuda
Collection document_embeddings already exists
Generating embeddings...
Embeddings generated in 0.30 seconds
Inserting data...
Inserted 4 documents
Generating search embedding...
Executing search...
Search results:
Search term: 'scalable database solutions'
• Score: 1.1304 - Text: Zilliz provides scalable vector search solutions
• Score: 1.1836 - Text: Vector databases are essential for AI applications
• Score: 1.6474 - Text: GPU acceleration significantly improves vector search performance
Demo completed
Conclusion
Setting up Zilliz (Milvus) on Ubuntu 24.04 GPU instances provides a powerful platform for scalable vector search applications. By leveraging GPU acceleration, you can achieve significant performance improvements for similarity search operations. This setup is particularly valuable for AI applications requiring real-time vector search capabilities with large datasets.