Vector search has become a critical component of modern AI applications, enabling efficient similarity searches in high-dimensional spaces. Zilliz, built on the open-source Milvus engine, provides scalable, high-performance vector search capabilities, particularly when leveraging GPU acceleration.

In this guide, we’ll walk through setting up and using Zilliz on Ubuntu 24.04 GPU instances for optimal vector search performance.

Prerequisites

Before getting started, ensure you have:

  • An Ubuntu 24.04 server with NVIDIA GPU(s).
  • NVIDIA drivers installed.
  • Docker and the NVIDIA Container Toolkit installed, so containers can access the GPU.
  • A non-root user with sudo privileges.

Step 1: Set Up Your GPU Environment

First, update the package index and install the necessary dependencies.

sudo apt update -y
sudo apt install -y python3-pip python3-venv docker-compose
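Before continuing, it is worth confirming that the driver can actually see the GPU (a quick sanity check; the exact table printed varies by driver version):

```shell
# Print driver and GPU details; warn if the driver isn't loaded.
nvidia-smi || echo "NVIDIA driver not detected - install it before proceeding"
```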

Step 2: Install Zilliz with GPU Support

Zilliz offers several deployment options. For GPU instances, we recommend running Milvus (the open-source vector database behind Zilliz) with GPU acceleration:

1. Create a directory for your Zilliz deployment.

mkdir zilliz-gpu && cd zilliz-gpu

2. Create a docker-compose.yaml file for GPU-enabled Milvus.

nano docker-compose.yaml

Add the following content:

version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/minio:/minio_data
    command: minio server /minio_data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.3.3-gpu
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
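The deploy block above reserves a single GPU. If your instance has several GPUs and you want Milvus to see all of them, the Compose spec also accepts count: all (a fragment of the same standalone service, shown for reference):

```yaml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all          # reserve every visible GPU instead of one
              capabilities: [gpu]
```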

3. Start the containers.

docker-compose up -d

4. Verify the services are running.

docker ps

Output.

CONTAINER ID   IMAGE                                      COMMAND                  CREATED          STATUS                    PORTS                                                                                          NAMES
66ac3ea90205   milvusdb/milvus:v2.3.3-gpu                 "/tini -- milvus run…"   11 minutes ago   Up 11 minutes             0.0.0.0:9091->9091/tcp, [::]:9091->9091/tcp, 0.0.0.0:19530->19530/tcp, [::]:19530->19530/tcp   milvus-standalone
cd2e62004369   quay.io/coreos/etcd:v3.5.5                 "etcd -advertise-cli…"   11 minutes ago   Up 11 minutes             2379-2380/tcp                                                                                  milvus-etcd
e09e5cb48212   minio/minio:RELEASE.2023-03-20T20-16-18Z   "/usr/bin/docker-ent…"   11 minutes ago   Up 11 minutes (healthy)   9000/tcp                                                                                       milvus-minio
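You can also ask Milvus itself whether it is ready; the standalone service exposes a liveness endpoint on the metrics port (9091, mapped above):

```shell
# Prints "OK" once the Milvus standalone service is healthy.
curl -sf http://localhost:9091/healthz || echo "Milvus is not ready yet"
```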

Step 3: Install Python Client and Dependencies

1. Create a Python virtual environment.

python3 -m venv zilliz-env
source zilliz-env/bin/activate

2. Install the necessary Python packages.

pip install pymilvus torch torchvision sentence-transformers
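Before moving on, a quick standard-library check confirms that the packages the demo script imports will actually resolve inside the virtual environment (a small sketch; the package names are the ones installed above):

```python
import importlib.util

def installed(packages):
    """Map each package name to whether Python can locate it."""
    return {p: importlib.util.find_spec(p) is not None for p in packages}

# The packages the demo script imports:
print(installed(["pymilvus", "torch", "sentence_transformers"]))
```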

Step 4: Connect to Zilliz and Create a Collection

1. First, create a Python script and import the necessary modules, including the Milvus client and SentenceTransformer for embedding generation.

nano zilliz_demo.py

Add the following code:

#!/usr/bin/env python3
from pymilvus import connections, utility, FieldSchema, CollectionSchema, DataType, Collection
from sentence_transformers import SentenceTransformer
import time
import torch
import pymilvus

Explanation:

  • pymilvus is used to interact with the Milvus vector database.
  • SentenceTransformer loads a pre-trained transformer model for sentence embeddings.
  • torch is used to detect whether a GPU is available for faster inference.

2. Add a function that verifies the connection to the Milvus server.

def verify_connection():
    """Verify connection to Milvus server"""
    try:
        # Test connection
        connections.connect("default", host="localhost", port="19530")
        print("✓ Successfully connected to Milvus server")
        
        # Check server status
        print(f"Server status: {utility.get_server_version()}")
        print(f"List of collections: {utility.list_collections()}")
        
        return True
    except Exception as e:
        print(f"Failed to connect to Milvus: {e}")
        return False

Explanation:

  • Connects to a Milvus instance running on localhost:19530.
  • Prints server version and any existing collections to confirm successful connection.

3. Add a function that creates a new Milvus collection if it doesn’t exist, generates embeddings for a set of documents, and inserts them into the collection.

def create_collection():
    """Create a collection with sample data"""
    # Configuration
    collection_name = "document_embeddings"
    dim = 384  # Dimension of SBERT embeddings
    
    # Verify connection first
    if not verify_connection():
        return

    # Initialize model (using GPU if available)
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(f"Using device: {device}")
    model = SentenceTransformer('all-MiniLM-L6-v2', device=device)

    # Create collection if it doesn't exist
    if not utility.has_collection(collection_name):
        print(f"Creating collection: {collection_name}")
        
        fields = [
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
            FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=500),
            FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim)
        ]
        
        schema = CollectionSchema(fields, description="Document embeddings collection")
        collection = Collection(collection_name, schema)
        
        # Create index for efficient search
        index_params = {
            "index_type": "IVF_PQ",
            "metric_type": "L2",
            "params": {"nlist": 128, "m": 16, "nbits": 8}
        }
        
        collection.create_index("embedding", index_params)
        print("Collection created with IVF_PQ index")
    else:
        collection = Collection(collection_name)
        print(f"Collection {collection_name} already exists")

    # Sample data
    documents = [
        "Zilliz provides scalable vector search solutions",
        "Ubuntu 24.04 offers improved GPU support",
        "Vector databases are essential for AI applications",
        "GPU acceleration significantly improves vector search performance"
    ]
    
    # Generate embeddings
    print("Generating embeddings...")
    start_time = time.time()
    embeddings = model.encode(documents)
    print(f"Embeddings generated in {time.time() - start_time:.2f} seconds")
    
    # Prepare data for insertion (column-based, in schema order;
    # the auto_id primary key is omitted)
    entities = [
        documents,           # text field
        embeddings.tolist()  # embedding field
    ]
    
    # Insert data
    print("Inserting data...")
    collection.insert(entities)
    collection.flush()  # persist the inserted rows
    
    # Wait for collection to load
    collection.load()
    print(f"Inserted {len(documents)} documents")
    
    return collection

Explanation:

  • Initializes the SentenceTransformer model (GPU-accelerated if available).
  • Defines the collection schema with fields: auto-generated id, text, and embedding.
  • Creates an IVF_PQ index for efficient vector search.
  • Encodes 4 sample sentences into 384-dim embeddings using SBERT.
  • Inserts the documents and loads the collection for querying.
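To see why these index parameters fit a 384-dim embedding: IVF_PQ requires m to divide dim, and nbits controls the codebook size of each sub-quantizer. The back-of-the-envelope arithmetic (not a Milvus API call, just the numbers implied by the config above):

```python
# Rough arithmetic behind the IVF_PQ parameters used above.
dim, m, nbits = 384, 16, 8

assert dim % m == 0                # IVF_PQ requires dim divisible by m
subvector_dim = dim // m           # each of the 16 sub-quantizers covers 24 dims
code_bytes = m * nbits // 8        # compressed code: 16 bytes per vector
raw_bytes = dim * 4                # original float32 vector: 1536 bytes

# compression ratio of the stored codes relative to raw float32 vectors
print(subvector_dim, code_bytes, raw_bytes // code_bytes)  # 24 16 96
```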

4. Add a function that encodes a search query into an embedding and performs an ANN (Approximate Nearest Neighbor) search in the Milvus collection.

def perform_search(collection):
    """Perform a sample vector search"""
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = SentenceTransformer('all-MiniLM-L6-v2', device=device)
    search_terms = ["scalable database solutions"]
    
    print("\nGenerating search embedding...")
    search_embeddings = model.encode(search_terms)
    
    search_params = {
        "metric_type": "L2",
        "params": {"nprobe": 10}
    }
    
    print("Executing search...")
    results = collection.search(
        data=[search_embeddings[0].tolist()], 
        anns_field="embedding", 
        param=search_params, 
        limit=3,
        output_fields=["text"]
    )
    
    print("\nSearch results:")
    for i, hits in enumerate(results):
        print(f"Search term: '{search_terms[i]}'")
        for hit in hits:
            print(f"  • Score: {hit.distance:.4f} - Text: {hit.entity.get('text')}")

Explanation:

  • Re-loads the same SBERT model to encode the search term.
  • Uses collection.search() with vector data, targeting the “embedding” field, and retrieves the top 3 matches.
  • Outputs a similarity score and matched text.
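Note that with the L2 metric, lower scores mean closer matches, so the best hit is the one with the smallest distance. A minimal standard-library sketch of the distance being computed:

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = [0.1, 0.2, 0.3]
close = [0.1, 0.2, 0.35]
far = [0.9, 0.8, 0.7]

# The semantically closer vector yields the smaller distance.
print(l2_distance(query, close) < l2_distance(query, far))  # True
```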

5. Create the script’s entry point. It verifies the connection, creates the collection and data, and performs a sample vector search.

if __name__ == "__main__":
    print("=== Zilliz/Milvus Vector Search Demo ===")
    print(f"PyMilvus version: {pymilvus.__version__}")
    
    # Verify connection first
    if verify_connection():
        collection = create_collection()
        if collection:
            perform_search(collection)
    
    print("\nDemo completed")

Explanation:

  • Prints the PyMilvus version for easier debugging.
  • If the connection is successful, it proceeds to create the collection and run the search query.
  • Wraps the entire logic in a clean main block for clarity and modularity.

6. Run the full Python script.

python3 zilliz_demo.py

Output.

=== Zilliz/Milvus Vector Search Demo ===
PyMilvus version: 2.5.10
✓ Successfully connected to Milvus server
Server status: v2.3.3-gpu
List of collections: ['document_embeddings']
✓ Successfully connected to Milvus server
Server status: v2.3.3-gpu
List of collections: ['document_embeddings']
Using device: cuda
Collection document_embeddings already exists
Generating embeddings...
Embeddings generated in 0.30 seconds
Inserting data...
Inserted 4 documents

Generating search embedding...
Executing search...

Search results:
Search term: 'scalable database solutions'
  • Score: 1.1304 - Text: Zilliz provides scalable vector search solutions
  • Score: 1.1836 - Text: Vector databases are essential for AI applications
  • Score: 1.6474 - Text: GPU acceleration significantly improves vector search performance

Demo completed

Conclusion

Setting up Zilliz (Milvus) on Ubuntu 24.04 GPU instances provides a powerful platform for scalable vector search applications. By leveraging GPU acceleration, you can achieve significant performance improvements for similarity search operations. This setup is particularly valuable for AI applications requiring real-time vector search capabilities with large datasets.