Vector Databases: Core Infrastructure of the AI Era
A vector database specializes in storing and retrieving high-dimensional vector data. It is a key component for building RAG, semantic search, recommendation systems, and other AI applications. This article covers principles and best practices in detail.
Vector Database Core Concepts
What is a Vector Database?
Vector databases are designed to store and retrieve vector embeddings generated by AI models to represent the semantic features of text, images, and other data types.
Key Features
- High-dimensional storage: supports hundreds to thousands of dimensions
- Similarity search: cosine similarity, Euclidean distance, etc. (see the sketch after this list)
- Real-time indexing: supports dynamic inserts and updates
- Scalability: handles billions of vectors
- Hybrid search: combines vector similarity with scalar filters
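To make the similarity metrics concrete, here is a minimal NumPy sketch comparing cosine similarity and Euclidean distance on two toy vectors (the values are made up for illustration):

```python
import numpy as np

# Two toy 4-dimensional "embeddings"; real embeddings have hundreds
# to thousands of dimensions, as noted above
a = np.array([0.1, 0.9, 0.3, 0.4])
b = np.array([0.2, 0.8, 0.1, 0.5])

# Cosine similarity: cosine of the angle between the vectors, in [-1, 1]
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean (L2) distance: straight-line distance between the points
euclidean = np.linalg.norm(a - b)

print(f"Cosine similarity:  {cosine:.4f}")
print(f"Euclidean distance: {euclidean:.4f}")
```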
Popular Vector Database Comparison
| Database | Type | Performance | Ease of use | Cost | Use cases |
|---|---|---|---|---|---|
| Pinecone | SaaS | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 💰💰💰 | Rapid prototyping, small to medium scale |
| Milvus | Open source | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | 💰 | Large-scale production |
| Weaviate | Open source + Cloud | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 💰💰 | Full-stack AI applications |
| Qdrant | Open source + Cloud | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | 💰💰 | High-performance workloads |
| Chroma | Open source | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 💰 | Dev/test, small projects |
Vector Indexing Algorithms
Core Techniques for Efficient Retrieval
HNSW (Hierarchical Navigable Small World)
- Multi-layer graph structure
- Query time complexity O(log n)
- High recall (> 95%)
- Higher memory usage
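As a hands-on illustration, here is a minimal HNSW sketch using the hnswlib library; the M and ef_construction values below are common starting points, not tuned settings:

```python
import hnswlib
import numpy as np

dim, num_elements = 128, 10_000
data = np.random.rand(num_elements, dim).astype(np.float32)

# Build an HNSW index: M controls graph connectivity,
# ef_construction trades build time for graph quality
index = hnswlib.Index(space='cosine', dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))

# ef controls the search-time recall/speed trade-off
index.set_ef(50)
labels, distances = index.knn_query(data[:1], k=5)
print(labels, distances)
```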
IVF (Inverted File Index)
- Vector space clustering
- GPU acceleration support
- Memory efficient
- Suitable for large-scale data
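A minimal IVF sketch using FAISS; nlist and nprobe are illustrative and should be tuned per dataset:

```python
import faiss
import numpy as np

dim, nlist = 128, 100
xb = np.random.rand(10_000, dim).astype(np.float32)

# IVF clusters the vector space; each query only scans nprobe clusters
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist)
index.train(xb)  # learn the cluster centroids
index.add(xb)

index.nprobe = 10  # clusters visited per query (recall vs. speed)
distances, ids = index.search(xb[:1], 5)
print(ids, distances)
```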
LSH (Locality Sensitive Hashing)
- Probabilistic algorithm
- Extremely fast queries
- Lower accuracy
- Good for approximate search
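The random-hyperplane variant of LSH for cosine similarity fits in a few lines of NumPy; this is a toy sketch of the idea, not a production implementation:

```python
import numpy as np

dim, num_planes = 128, 16
rng = np.random.default_rng(42)
planes = rng.normal(size=(num_planes, dim))  # random hyperplanes

def lsh_signature(v):
    # 16-bit signature: which side of each hyperplane v falls on
    return (planes @ v > 0).astype(int)

a = rng.normal(size=dim)
b = a + 0.1 * rng.normal(size=dim)  # near-duplicate of a
c = rng.normal(size=dim)            # unrelated vector

# Similar vectors agree on most bits; unrelated ones on roughly half
print("a vs b:", (lsh_signature(a) == lsh_signature(b)).mean())
print("a vs c:", (lsh_signature(a) == lsh_signature(c)).mean())
```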
Annoy (Developed by Spotify)
- Random projection trees
- Memory-mapped files
- Static index
- Best for read-only scenarios
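A minimal Annoy sketch; note the index is static, so all items must be added before build():

```python
from annoy import AnnoyIndex
import numpy as np

dim = 128
index = AnnoyIndex(dim, 'angular')  # 'angular' ~ cosine distance

# All items must be added before build(); the index is read-only afterwards
for i in range(1_000):
    index.add_item(i, np.random.rand(dim).tolist())
index.build(10)  # 10 trees: more trees improve recall but enlarge the index

# Memory-mapped persistence makes reloading across processes cheap
index.save('vectors.ann')
print(index.get_nns_by_vector(np.random.rand(dim).tolist(), 5))
```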
Hands-on: Using Different Vector Databases
Pinecone Example
```python
from pinecone import Pinecone
from openai import OpenAI

# Initialize clients (Pinecone SDK v3+; the older pinecone.init() is deprecated)
pc = Pinecone(api_key="your-pinecone-key")
index = pc.Index("my-index")
openai_client = OpenAI(api_key="your-openai-key")

# Create embedding
def create_embedding(text):
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response.data[0].embedding

# Upsert data
docs = [
    {"id": "doc1", "text": "Python is a programming language"},
    {"id": "doc2", "text": "JavaScript is used for web development"}
]
for doc in docs:
    embedding = create_embedding(doc["text"])
    index.upsert(vectors=[(doc["id"], embedding, {"text": doc["text"]})])

# Query
query = "What is Python?"
query_embedding = create_embedding(query)
results = index.query(
    vector=query_embedding,
    top_k=5,
    include_metadata=True
)
for match in results.matches:
    print(f"ID: {match.id}, Score: {match.score}")
    print(f"Text: {match.metadata['text']}")
```
Milvus Example
```python
from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType
import numpy as np

# Connect to Milvus
connections.connect(host='localhost', port='19530')

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1536),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=5000)
]
schema = CollectionSchema(fields, description="Document embeddings")

# Create collection
collection = Collection(name="documents", schema=schema)

# Create index
index_params = {
    "metric_type": "L2",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 1024}
}
collection.create_index(field_name="embedding", index_params=index_params)

# Insert data (column-oriented: one list per schema field)
entities = [
    [1, 2, 3],                                          # IDs
    [np.random.rand(1536).tolist() for _ in range(3)],  # embeddings
    ["text1", "text2", "text3"]                         # texts
]
collection.insert(entities)

# Load into memory
collection.load()

# Search (query_embedding stands in for a real 1536-dim model embedding)
query_embedding = np.random.rand(1536).tolist()
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
results = collection.search(
    data=[query_embedding],
    anns_field="embedding",
    param=search_params,
    limit=5,
    output_fields=["text"]
)
for hit in results[0]:
    print(f"ID: {hit.id}, Distance: {hit.distance}")
    print(f"Text: {hit.entity.get('text')}")
```
Chroma Example (Lightweight)
```python
import chromadb
from chromadb.utils import embedding_functions

# Initialize a persistent client and an OpenAI embedding function
client = chromadb.PersistentClient(path="/path/to/db")
embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-ada-002"
)

# Create or get collection
collection = client.get_or_create_collection(
    name="my_collection",
    embedding_function=embedding_fn
)

# Add documents (Chroma embeds them automatically)
collection.add(
    documents=["Python programming", "Java development", "Data analysis"],
    metadatas=[
        {"source": "doc1", "type": "programming"},
        {"source": "doc2", "type": "programming"},
        {"source": "doc3", "type": "data"}
    ],
    ids=["id1", "id2", "id3"]
)

# Query
results = collection.query(
    query_texts=["programming language"],
    n_results=2,
    where={"type": "programming"}  # Metadata filter
)
print(f"Documents: {results['documents']}")
print(f"Distances: {results['distances']}")
```
Vector Database Selection Guide
How to choose the right vector database?
Scenario 1: Rapid Prototyping
Needs: Easy to use, quick start
Recommendation: Chroma (local dev) or Pinecone (managed cloud)
Scenario 2: Production Deployment
Needs: High performance, scalability, stability
Recommendation: Milvus (self-hosted) or Pinecone Cloud (managed)
Scenario 3: Hybrid Search
Needs: Vector + full-text + filters
Recommendation: Weaviate or Elasticsearch with vector search
Scenario 4: Cost-sensitive
Needs: Open-source, low resource usage
Recommendation: Chroma, Qdrant, or pgvector (PostgreSQL extension)
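For the cost-sensitive path, pgvector keeps vectors inside PostgreSQL. Below is a minimal sketch using the psycopg driver; the connection string, table, and 3-dimensional vectors are made up for illustration:

```python
import psycopg

# Assumes a PostgreSQL server with the pgvector extension available
with psycopg.connect("postgresql://localhost/mydb") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS items (
            id bigserial PRIMARY KEY,
            text text,
            embedding vector(3)
        )
    """)
    conn.execute(
        "INSERT INTO items (text, embedding) VALUES (%s, %s)",
        ("hello", "[0.1, 0.2, 0.3]"),
    )
    # <-> is L2 distance in pgvector; <=> is cosine distance
    rows = conn.execute(
        "SELECT text, embedding <-> %s::vector AS dist "
        "FROM items ORDER BY dist LIMIT 5",
        ("[0.1, 0.2, 0.3]",),
    ).fetchall()
    print(rows)
```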
Performance Optimization Tips
Improve Vector Search Performance
Index Optimization
- ✅ Choose the right index type
- ✅ Tune index parameters (nlist, nprobe)
- ✅ Rebuild indexes periodically
- ✅ Use quantization to reduce memory
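As an example of quantization, FAISS's IVF+PQ index stores compressed codes instead of full vectors; the m and nbits values below are illustrative:

```python
import faiss
import numpy as np

dim, nlist, m, nbits = 128, 100, 16, 8  # m sub-vectors, 8 bits each
xb = np.random.rand(10_000, dim).astype(np.float32)

# IVF + product quantization: each vector is stored as m * nbits bits
# instead of dim * 32 bits, cutting memory at some recall cost
quantizer = faiss.IndexFlatL2(dim)
index = faiss.IndexIVFPQ(quantizer, dim, nlist, m, nbits)
index.train(xb)
index.add(xb)

index.nprobe = 10
distances, ids = index.search(xb[:1], 5)
print(ids)
```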
Query Optimization
- ✅ Batch queries to reduce overhead (batched example after this list)
- ✅ Pre-filter to shrink search space
- ✅ Cache hot query results
- ✅ Async concurrent queries
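Batching in practice: most clients accept several queries per call. Chroma's query(), for example, takes a list of query texts, amortizing one round trip over many searches (this reuses the collection from the Chroma example above):

```python
# One round trip for three searches instead of three separate calls
results = collection.query(
    query_texts=[
        "programming language",
        "data analysis",
        "web development",
    ],
    n_results=3,
)
# results['ids'][i] holds the matches for the i-th query text
for i, ids in enumerate(results["ids"]):
    print(f"query {i}: {ids}")
```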
Data Optimization
- ✅ Dimensionality reduction with PCA (sketch after this list)
- ✅ Sharding and partitioning
- ✅ Periodically clean invalid data
- ✅ Use binary quantization
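A dimensionality-reduction sketch with scikit-learn's PCA; the target dimension of 256 is illustrative and should be validated against retrieval recall:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for a corpus of 1536-dim embeddings
embeddings = np.random.rand(10_000, 1536).astype(np.float32)

# Fit PCA once on the corpus, then apply it to corpus and queries alike
pca = PCA(n_components=256)
reduced = pca.fit_transform(embeddings)

print(reduced.shape)  # (10000, 256): ~6x less memory per vector
print(f"Variance retained: {pca.explained_variance_ratio_.sum():.2%}")
```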
System Optimization
- ✅ GPU acceleration where applicable (sketch after this list)
- ✅ Memory and cache tuning
- ✅ Load balancing and replicas
- ✅ Monitoring and alerting
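A GPU-acceleration sketch with FAISS, assuming the faiss-gpu build and at least one CUDA device:

```python
import faiss
import numpy as np

dim = 128
xb = np.random.rand(100_000, dim).astype(np.float32)

cpu_index = faiss.IndexFlatL2(dim)

# Clone the index onto all available GPUs (requires the faiss-gpu build)
gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)
gpu_index.add(xb)

distances, ids = gpu_index.search(xb[:8], 5)  # batched search on GPU
print(ids.shape)
```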
Vector DB + LLM Integration
Build a Complete AI Application Stack
```python
class VectorRAGSystem:
    """Vector DB + LLM full RAG system.

    vector_db and llm_client are thin adapters over your chosen vector
    database and LLM SDK; their interfaces here are illustrative.
    """

    def __init__(self, vector_db, llm_client):
        self.vector_db = vector_db
        self.llm = llm_client

    def add_knowledge(self, documents):
        """Add knowledge to the vector DB"""
        for doc in documents:
            # Generate an embedding for each document
            embedding = self.llm.create_embedding(doc.text)
            # Store the vector plus metadata in the vector DB
            self.vector_db.insert({
                'id': doc.id,
                'vector': embedding,
                'metadata': {
                    'text': doc.text,
                    'source': doc.source,
                    'timestamp': doc.timestamp
                }
            })

    def answer_question(self, question):
        """Answer a question using vector retrieval"""
        # 1) Embed the question
        question_vector = self.llm.create_embedding(question)
        # 2) Vector search for the most relevant documents
        results = self.vector_db.search(
            vector=question_vector,
            limit=5
        )
        # 3) Build the context from the retrieved texts
        context = "\n".join([r.metadata['text'] for r in results])
        # 4) Let the LLM generate the answer
        prompt = f"""
Answer the question based on the following information:
Information: {context}
Question: {question}
"""
        answer = self.llm.generate(prompt)
        return {
            'answer': answer,
            'sources': [r.metadata['source'] for r in results]
        }
```
Start Using Vector Databases
Vector databases are foundational infrastructure for intelligent AI applications. Combined with LLM APIs, you can build powerful semantic search, QA systems, recommendation engines, and more.
Get Started Now