Vector Databases: Core Infrastructure of the AI Era

A vector database specializes in storing and retrieving high-dimensional vector data. It is a key building block for retrieval-augmented generation (RAG), semantic search, recommendation systems, and other AI applications. This article covers the core concepts, popular products, indexing algorithms, and hands-on best practices.

Vector Database Core Concepts

What is a Vector Database?

Vector databases are designed to store and retrieve vector embeddings generated by AI models to represent the semantic features of text, images, and other data types.

Key Features

  • High-dimensional storage: Supports hundreds to thousands of dimensions
  • Similarity search: Cosine similarity, Euclidean distance, etc. (see the sketch after this list)
  • Real-time indexing: Supports dynamic insert and update
  • Scalability: Handles billions of vectors
  • Hybrid search: Combine vector and scalar filters
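
To make the similarity metrics above concrete, here is a minimal NumPy sketch of the two most common choices (the vectors are random stand-ins for real embeddings):

import numpy as np

def cosine_similarity(a, b):
    # 1.0 = identical direction, 0 = orthogonal (unrelated)
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # L2 distance: smaller = more similar
    a, b = np.asarray(a), np.asarray(b)
    return float(np.linalg.norm(a - b))

v1 = np.random.rand(1536)  # stand-in for an ada-002-sized embedding
v2 = np.random.rand(1536)
print(cosine_similarity(v1, v2), euclidean_distance(v1, v2))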

Popular Vector Database Comparison

| Database | Type                | Performance | Ease of use | Cost   | Use cases                                |
|----------|---------------------|-------------|-------------|--------|------------------------------------------|
| Pinecone | SaaS                | ⭐⭐⭐⭐⭐  | ⭐⭐⭐⭐⭐  | 💰💰💰 | Rapid prototyping, small to medium scale |
| Milvus   | Open source         | ⭐⭐⭐⭐⭐  | ⭐⭐⭐      | 💰     | Large-scale production                   |
| Weaviate | Open source + Cloud | ⭐⭐⭐⭐    | ⭐⭐⭐⭐    | 💰💰   | Full-stack AI applications               |
| Qdrant   | Open source + Cloud | ⭐⭐⭐⭐⭐  | ⭐⭐⭐⭐    | 💰💰   | High-performance workloads               |
| Chroma   | Open source         | ⭐⭐⭐      | ⭐⭐⭐⭐⭐  | 💰     | Dev/test, small projects                 |

Vector Indexing Algorithms

Core Techniques for Efficient Retrieval

HNSW (Hierarchical Navigable Small World)

  • Multi-layer graph structure
  • Query time complexity O(log n)
  • High recall (> 95%)
  • Higher memory usage
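
As an illustration, here is a minimal HNSW sketch using the hnswlib library (one common implementation; the article does not prescribe a library, and all parameter values are illustrative):

import hnswlib
import numpy as np

dim, num_elements = 128, 10000
data = np.random.rand(num_elements, dim).astype(np.float32)

# Build the multi-layer graph; M controls graph connectivity,
# ef_construction trades build time for recall.
index = hnswlib.Index(space='cosine', dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))

# ef is the query-time search width: higher = better recall, slower.
index.set_ef(50)
labels, distances = index.knn_query(data[:1], k=5)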

IVF (Inverted File Index)

  • Vector space clustering
  • GPU acceleration support
  • Memory efficient
  • Suitable for large-scale data
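
A minimal IVF sketch using Faiss (a common implementation choice, not one prescribed by the article; nlist and nprobe values are illustrative):

import faiss
import numpy as np

dim, nlist = 128, 100
data = np.random.rand(10000, dim).astype('float32')

# IVF clusters the vector space with k-means; the quantizer assigns
# each vector to its nearest cluster (an inverted list).
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist)
index.train(data)   # learn the nlist cluster centroids
index.add(data)

# nprobe = how many clusters to scan per query (recall vs. speed).
index.nprobe = 10
distances, ids = index.search(data[:1], 5)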

LSH (Locality Sensitive Hashing)

  • Probabilistic algorithm
  • Extremely fast queries
  • Lower accuracy
  • Good for approximate search
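
For illustration, Faiss also ships a basic LSH index; a minimal sketch (nbits is illustrative):

import faiss
import numpy as np

dim, nbits = 128, 256
data = np.random.rand(10000, dim).astype('float32')

# LSH hashes vectors into nbits-bit codes; similar vectors are likely
# (but not guaranteed) to collide, hence the lower accuracy.
index = faiss.IndexLSH(dim, nbits)
index.add(data)   # no training step needed
distances, ids = index.search(data[:1], 5)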

Annoy (Developed by Spotify)

  • Random projection trees
  • Memory-mapped files
  • Static index
  • Best for read-only scenarios
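
A minimal sketch with the annoy package itself (parameter values are illustrative):

from annoy import AnnoyIndex
import numpy as np

dim = 128
index = AnnoyIndex(dim, 'angular')  # 'angular' ≈ cosine distance

for i in range(10000):
    index.add_item(i, np.random.rand(dim).tolist())

# build() makes the index immutable (static): no inserts afterwards.
index.build(10)             # 10 random projection trees; more = better recall
index.save('vectors.ann')   # saved file is memory-mapped on load

neighbor_ids = index.get_nns_by_item(0, 5)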

Hands-on: Using Different Vector Databases

Pinecone Example

from pinecone import Pinecone
from openai import OpenAI

# Initialize clients (pinecone.init was removed in v3+ of the SDK)
pc = Pinecone(api_key="your-pinecone-key")
index = pc.Index("my-index")
openai_client = OpenAI(api_key="your-openai-key")

# Create embedding
def create_embedding(text):
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response.data[0].embedding

# Upsert data
docs = [
    {"id": "doc1", "text": "Python is a programming language"},
    {"id": "doc2", "text": "JavaScript is used for web development"}
]

for doc in docs:
    embedding = create_embedding(doc["text"])
    index.upsert([(doc["id"], embedding, {"text": doc["text"]})])

# Query
query = "What is Python?"
query_embedding = create_embedding(query)
results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True
)

for match in results.matches:
    print(f"ID: {match.id}, Score: {match.score}")
    print(f"Text: {match.metadata['text']}")

Milvus Example

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
import numpy as np

# Connect to Milvus
connections.connect(host='localhost', port='19530')

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=5000)
]
schema = CollectionSchema(fields, description="Document embeddings")

# Create collection
collection = Collection(name="documents", schema=schema)

# Create index
index_params = {
    "metric_type": "L2",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024}
}
collection.create_index(field_name="embedding", index_params=index_params)

# Insert data
entities = [
    [1, 2, 3],  # IDs
    [np.random.rand(1536).tolist() for _ in range(3)],  # embeddings
    ["text1", "text2", "text3"]  # texts
]
collection.insert(entities)

# Load into memory
collection.load()

# Search (query_embedding is a stand-in here; in practice, embed the
# query with the same model used for the stored vectors)
query_embedding = np.random.rand(1536).tolist()
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param=search_params,
    limit=5,
    output_fields=["text"]
)

for hit in results[0]:
    print(f"ID: {hit.id}, Distance: {hit.distance}")
    print(f"Text: {hit.entity.get('text')}")

Chroma Example (Lightweight)

import chromadb
from chromadb.utils import embedding_functions

# Initialize
client = chromadb.PersistentClient(path="/path/to/db")
embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-ada-002"
)

# Create or get collection
collection = client.get_or_create_collection(
    name="my_collection",
    embedding_function=embedding_fn
)

# Add documents
collection.add(
    documents=["Python programming", "Java development", "Data analysis"],
    metadatas=[
        {"source": "doc1", "type": "programming"},
        {"source": "doc2", "type": "programming"},
        {"source": "doc3", "type": "data"}
    ],
    ids=["id1", "id2", "id3"]
)

# Query
results = collection.query(
    query_texts=["programming language"],
    n_results=2,
    where={"type": "programming"}  # Metadata filter
)

print(f"Documents: {results['documents']}")
print(f"Distances: {results['distances']}")

Vector Database Selection Guide

How to choose the right vector database?

Scenario 1: Rapid Prototyping

Needs: Easy to use, quick start

Recommendation: Chroma (local dev) or Pinecone (managed cloud)

Scenario 2: Production Deployment

Needs: High performance, scalability, stability

Recommendation: Milvus (self-hosted) or Pinecone Cloud (managed)

Scenario 3: Hybrid Search

Needs: Vector + full-text + filters

Recommendation: Weaviate or Elasticsearch with vector search

Scenario 4: Cost-sensitive

Needs: Open-source, low resource usage

Recommendation: Chroma, Qdrant, or pgvector (PostgreSQL extension)
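
For reference, a minimal pgvector sketch (assuming PostgreSQL with the pgvector extension installed; the connection string and table layout are placeholders):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder DSN
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        text text,
        embedding vector(1536)
    )
""")

embedding = [0.1] * 1536  # stand-in for a real embedding
cur.execute(
    "INSERT INTO documents (text, embedding) VALUES (%s, %s)",
    ("Python is a programming language", str(embedding)),
)

# <-> is pgvector's L2-distance operator (<=> is cosine distance)
cur.execute(
    "SELECT text, embedding <-> %s AS distance FROM documents "
    "ORDER BY distance LIMIT 5",
    (str(embedding),),
)
print(cur.fetchall())
conn.commit()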

Performance Optimization Tips

Improve Vector Search Performance

Index Optimization

  • ✅ Choose the right index type
  • ✅ Tune index parameters (nlist, nprobe)
  • ✅ Rebuild indexes periodically
  • ✅ Use quantization to reduce memory

Query Optimization

  • ✅ Batch queries to reduce overhead
  • ✅ Pre-filter to shrink search space
  • ✅ Cache hot query results
  • ✅ Async concurrent queries
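
A minimal sketch of two of these tips, batching embeddings and caching hot queries, reusing the openai_client, create_embedding, and index objects from the Pinecone example above:

from functools import lru_cache

# Batching: one embeddings API call for many inputs is far cheaper
# than one call per input (the OpenAI API accepts a list).
def batch_embed(texts):
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=texts
    )
    return [item.embedding for item in response.data]

# Caching: repeated identical queries skip both the embedding call
# and the vector search entirely.
@lru_cache(maxsize=1024)
def cached_search(query_text, top_k=5):
    embedding = create_embedding(query_text)
    return index.query(vector=embedding, top_k=top_k, include_metadata=True)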

Data Optimization

  • ✅ Dimensionality reduction (PCA)
  • ✅ Sharding and partitioning
  • ✅ Periodically clean invalid data
  • ✅ Use binary quantization
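
For the PCA tip, a minimal scikit-learn sketch (dimensions and data are illustrative; re-measure recall on a held-out query set after reducing, since reduction loses information):

from sklearn.decomposition import PCA
import numpy as np

embeddings = np.random.rand(10000, 1536)  # stand-in for real embeddings
pca = PCA(n_components=256)               # 1536 -> 256 dims
reduced = pca.fit_transform(embeddings)
print(reduced.shape, pca.explained_variance_ratio_.sum())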

System Optimization

  • ✅ GPU acceleration (where applicable)
  • ✅ Memory and cache tuning
  • ✅ Load balancing and replicas
  • ✅ Monitoring and alerting

Vector DB + LLM Integration

Build a Complete AI Application Stack

class VectorRAGSystem:
    """Vector DB + LLM full RAG system"""
    
    def __init__(self, vector_db, llm_client):
        self.vector_db = vector_db
        self.llm = llm_client
        
    def add_knowledge(self, documents):
        """Add knowledge to vector DB"""
        for doc in documents:
            # Generate embedding
            embedding = self.llm.create_embedding(doc.text)
            
            # Store in vector DB
            self.vector_db.insert({
                'id': doc.id,
                'vector': embedding,
                'metadata': {
                    'text': doc.text,
                    'source': doc.source,
                    'timestamp': doc.timestamp
                }
            })
    
    def answer_question(self, question):
        """Answer based on vector retrieval"""
        # 1) Embed question
        question_vector = self.llm.create_embedding(question)
        
        # 2) Vector search
        results = self.vector_db.search(
            vector=question_vector,
            limit=5
        )
        
        # 3) Build context
        context = "
".join([r.metadata['text'] for r in results])
        
        # 4) LLM generate answer
        prompt = f"""
        Answer the question based on the following information:
        
        Information: {context}
        
        Question: {question}
        """
        
        answer = self.llm.generate(prompt)
        
        return {
            'answer': answer,
            'sources': [r.metadata['source'] for r in results]
        }
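
A short usage sketch (my_vector_db and my_llm are hypothetical adapters implementing the insert/search and create_embedding/generate interfaces the class assumes):

# my_vector_db and my_llm are placeholders for real client adapters
rag = VectorRAGSystem(vector_db=my_vector_db, llm_client=my_llm)
rag.add_knowledge(documents)  # documents: objects with id/text/source/timestamp
result = rag.answer_question("What is Python?")
print(result['answer'])
print(result['sources'])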

Start Using Vector Databases

Vector databases are foundational infrastructure for intelligent AI applications. Combined with LLM APIs, you can build powerful semantic search, QA systems, recommendation engines, and more.