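Faiss Vector Search Library

Faiss is an open-source library from Meta (formerly Facebook) for efficient similarity search and clustering of dense vectors, designed for large-scale vector search. It can search collections of up to billions of vectors and is one of the most widely used vector search solutions in industry.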

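Overview

Why Choose Faiss

Feature	Description
Open-sourced by Meta	Developed by Facebook AI Research (FAIR) and proven at large scale
High performance	Implemented in C++ with optional GPU acceleration
Rich algorithms	Supports multiple ANN algorithms and quantization methods
Memory efficient	Several compression techniques reduce memory usage
Composable	Index components can be combined to fit different scenarios
Multi-language support	Official C++ and Python APIs; community-maintained bindings for languages such as Go and Rust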

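Use Cases

Large-scale search: efficient retrieval over billions of vectors
Recommender systems: real-time similarity computation
Image retrieval: search by image, content deduplication
Research and experimentation: quickly validating vector search algorithms
Embedded deployment: lightweight enough to ship inside an application or onto edge devices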

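Quick Start

1. Install Faiss

# CPU version
pip install faiss-cpu

# GPU version (requires CUDA)
pip install faiss-gpu
# Note: the Faiss project publishes its official builds through conda
# (e.g. conda install -c pytorch faiss-cpu); the PyPI wheels are
# community-maintained, and the faiss-gpu wheel may lag behind.

# Build from source (to get the latest features)
git clone https://github.com/facebookresearch/faiss.git
cd faiss
cmake -B build . -DFAISS_ENABLE_GPU=OFF
make -C build -j faiss
make -C build -j swigfaiss
# Install the Python bindings produced by the build
(cd build/faiss/python && python setup.py install)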

2. A First Example

import faiss
import numpy as np

# Generate sample data (10,000 vectors of 128 dimensions)
np.random.seed(42)
dimension = 128
db_size = 10000
query_size = 5

# Random database vectors, scaled to unit length
# (for unit vectors, ranking by L2 distance matches ranking by cosine similarity)
db_vectors = np.random.random((db_size, dimension)).astype('float32')
db_vectors = db_vectors / np.linalg.norm(db_vectors, axis=1, keepdims=True)

# Random query vectors, normalized the same way
query_vectors = np.random.random((query_size, dimension)).astype('float32')
query_vectors = query_vectors / np.linalg.norm(query_vectors, axis=1, keepdims=True)

# Create the index (exact search; IndexFlatL2 reports squared L2 distances)
index = faiss.IndexFlatL2(dimension)

# Add the vectors to the index
index.add(db_vectors)
print(f"Vectors in index: {index.ntotal}")

# Search for the top 5 neighbors
k = 5
distances, indices = index.search(query_vectors, k)

print("Search results:")
for i in range(query_size):
    print(f"Query {i}: nearest neighbor ids {indices[i]}, squared distances {distances[i]}")
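
Both the database and the query vectors in the example above are scaled to unit length, which is why an L2 index can stand in for cosine similarity: for unit vectors, the squared L2 distance equals 2 - 2 * cos(a, b), so both measures produce the same ranking. The following is a small pure-Python check of that identity; the helper names are illustrative and not part of Faiss.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a, b):
    """Inner product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 0.5, 1.0])

lhs = squared_l2(a, b)
rhs = 2.0 - 2.0 * dot(a, b)
print(f"squared L2 = {lhs:.6f}, 2 - 2*cos = {rhs:.6f}")
```

A smaller squared distance therefore always corresponds to a larger cosine similarity, as long as every vector is normalized before it is added or queried.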

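Core Concepts

Index Types

Faiss provides several index types for different data scales and performance requirements:

1. Exact Search Indexes

# IndexFlatL2 - exact search with (squared) Euclidean distance
index_l2 = faiss.IndexFlatL2(dimension)

# IndexFlatIP - exact search with inner product
# (normalize the vectors first to get cosine similarity)
index_ip = faiss.IndexFlatIP(dimension)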

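2. Approximate Search Indexes (IVF)

# IVF index (inverted file)
nlist = 100  # number of clusters (inverted lists)
quantizer = faiss.IndexFlatL2(dimension)
index_ivf = faiss.IndexIVFFlat(quantizer, dimension, nlist)

# IVF indexes must be trained before vectors are added
index_ivf.train(db_vectors)
index_ivf.add(db_vectors)

# nprobe (number of clusters scanned per query) can be tuned at search time
index_ivf.nprobe = 10  # default is 1; larger values raise recall but slow the search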
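
To make the roles of nlist and nprobe concrete, here is a toy pure-Python sketch of the inverted-file idea: every vector is filed under its nearest centroid, and a query scans only the nprobe lists whose centroids are closest to it. It assumes the centroids are already given (Faiss learns them with k-means during train()), and the function names are illustrative rather than Faiss APIs.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_inverted_lists(vectors, centroids):
    """File every vector id under the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vec_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: squared_l2(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)),
                         key=lambda c: squared_l2(query, centroids[c]))
    candidates = [vec_id for c in probe_order[:nprobe] for vec_id in lists[c]]
    candidates.sort(key=lambda vec_id: squared_l2(query, vectors[vec_id]))
    return candidates[:k]

# Two hand-picked centroids stand in for the result of k-means training.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.1, 0.2], [0.3, 0.1], [9.8, 10.1], [4.9, 4.9]]
lists = build_inverted_lists(vectors, centroids)

query = [5.2, 5.2]
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=1))  # [2]
print(ivf_search(query, vectors, centroids, lists, k=1, nprobe=2))  # [3]
```

The true nearest neighbor (id 3) sits just across the cluster boundary from the query, so it is missed with nprobe=1 and found with nprobe=2. This is the recall versus speed trade-off that nprobe controls.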

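3. Product Quantization Indexes (PQ)

# PQ index (compressed storage)
m = 16  # number of sub-vectors (dimension must be divisible by m)
nbits = 8  # bits used to encode each sub-vector
index_pq = faiss.IndexPQ(dimension, m, nbits)

index_pq.train(db_vectors)
index_pq.add(db_vectors)

# Raw 128-dim float32 vector = 512 bytes
# PQ code = 16 sub-vectors x 8 bits = 16 bytes (32:1 compression)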
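
The compression figures in the comments above follow from simple arithmetic: a PQ code stores m sub-quantizer indices of nbits each, while the raw vector stores one 4-byte float per dimension. A short sketch of that calculation, with illustrative helper names:

```python
def pq_code_bytes(m, nbits):
    """Bytes per PQ code: m sub-quantizer indices of nbits each, rounded up."""
    return (m * nbits + 7) // 8

def raw_vector_bytes(dimension, bytes_per_component=4):
    """Bytes per uncompressed vector (float32 by default)."""
    return dimension * bytes_per_component

dimension, m, nbits = 128, 16, 8
raw = raw_vector_bytes(dimension)
code = pq_code_bytes(m, nbits)
print(f"raw: {raw} bytes, PQ code: {code} bytes, ratio: {raw // code}:1")
# raw: 512 bytes, PQ code: 16 bytes, ratio: 32:1
```

The estimate covers only the per-vector codes; the codebooks learned during training add a fixed overhead that does not grow with the number of vectors.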

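4. HNSW Indexes

# HNSW index (graph-based, fast search)
M = 16  # number of neighbors per node in the graph
index_hnsw = faiss.IndexHNSWFlat(dimension, M)

# Set build and search parameters
# (efConstruction only affects vectors added after it is set)
index_hnsw.hnsw.efConstruction = 200
index_hnsw.hnsw.efSearch = 64

index_hnsw.add(db_vectors)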

5. Composite Indexes

# IVF + PQ (a common choice for large datasets)
# Reuses nlist, m, and nbits from the examples above
quantizer = faiss.IndexFlatL2(dimension)
index_ivfpq = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, nbits)

index_ivfpq.train(db_vectors)
index_ivfpq.add(db_vectors)

# Tune the search parameter
index_ivfpq.nprobe = 10

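Index Selection Guide

The figures below are rough rules of thumb; actual memory usage, speed, and recall depend on the data and on parameters such as nlist, nprobe, and m.

Dataset size	Recommended index	Memory usage	Search speed	Recall (approx.)
< 100K	IndexFlatL2	High	Fast	100% (exact)
100K - 1M	IndexIVFFlat	Medium	Fast	>95%
1M - 10M	IndexIVFPQ	Low	Fairly fast	>90%
> 10M	IndexIVFPQ + GPU	Low	Very fast	>85%
Memory constrained	IndexPQ	Very low	Moderate	>80%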
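
The guide above can be read as a simple decision rule. The sketch below encodes it as a hypothetical helper (suggest_index is not a Faiss function), using the same thresholds as the table; treat the result as a starting point for benchmarking rather than a definitive answer.

```python
def suggest_index(num_vectors, memory_constrained=False):
    """Return the index name the selection guide suggests for a dataset size."""
    if memory_constrained:
        return "IndexPQ"
    if num_vectors < 100_000:
        return "IndexFlatL2"
    if num_vectors < 1_000_000:
        return "IndexIVFFlat"
    if num_vectors <= 10_000_000:
        return "IndexIVFPQ"
    return "IndexIVFPQ + GPU"

for n in (50_000, 500_000, 5_000_000, 50_000_000):
    print(n, "->", suggest_index(n))
print("memory constrained ->", suggest_index(5_000_000, memory_constrained=True))
```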

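Data Operations

Adding and Removing Vectors

import faiss
import numpy as np

# Sample data for this section
dimension = 128
vectors = np.random.random((10000, dimension)).astype('float32')

# Create an index that supports incremental adds
index = faiss.IndexFlatL2(dimension)

# Add in batches
batch_size = 1000
for i in range(0, len(vectors), batch_size):
    batch = vectors[i:i+batch_size]
    index.add(batch)

# Index with explicit IDs (supports removal by ID)
index_with_ids = faiss.IndexFlatL2(dimension)
index_with_ids = faiss.IndexIDMap(index_with_ids)

# Specify the IDs when adding (must be int64)
ids = np.arange(len(vectors), dtype='int64')
index_with_ids.add_with_ids(vectors, ids)

# Remove vectors by ID (the Python wrapper accepts an int64 array directly)
ids_to_remove = np.array([1, 3, 5], dtype='int64')
index_with_ids.remove_ids(ids_to_remove)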
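
Faiss stores only vectors and their int64 IDs, so any payload (text, metadata) has to live in a structure the application maintains and keeps in sync with adds and removals. A minimal sketch of such a side table follows; DocumentStore is a hypothetical helper, not a Faiss class.

```python
class DocumentStore:
    """Hypothetical side table mapping Faiss int64 ids to application payloads."""

    def __init__(self):
        self._docs = {}

    def add(self, ids, documents):
        """Register payloads under the same ids passed to add_with_ids."""
        for doc_id, doc in zip(ids, documents):
            self._docs[int(doc_id)] = doc

    def remove(self, ids):
        """Drop payloads for the same ids passed to remove_ids."""
        for doc_id in ids:
            self._docs.pop(int(doc_id), None)

    def lookup(self, ids):
        """Return payloads for search hits, skipping -1 (no result) and removed ids."""
        return [self._docs[int(i)] for i in ids if int(i) in self._docs]

store = DocumentStore()
store.add([0, 1, 2], ["doc a", "doc b", "doc c"])
store.remove([1])
print(store.lookup([2, 1, -1, 0]))  # ['doc c', 'doc a']
```

Calling add and remove on the store with the same ID arrays used for the index keeps the two views consistent.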

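Search Operations

# Basic search
k = 10
distances, indices = index.search(query_vectors, k)

# Range search (returns every vector within the radius;
# for IndexFlatL2 the radius is compared against squared L2 distances)
radius = 0.5
lims, distances, indices = index.range_search(query_vectors, radius)
# Results for query i are distances[lims[i]:lims[i+1]] and indices[lims[i]:lims[i+1]]

# Batched search
# Faiss parallelizes batched queries automatically
batch_queries = np.random.random((100, dimension)).astype('float32')
distances, indices = index.search(batch_queries, k)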
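
Unlike search, range_search returns a variable number of hits per query, packed into three flat arrays: lims, distances, and ids, where the hits for query i occupy positions lims[i] to lims[i+1]. The sketch below unpacks that layout into per-query lists; it uses plain Python lists in place of NumPy arrays, and split_range_results is an illustrative name.

```python
def split_range_results(lims, distances, ids):
    """Turn the flat (lims, distances, ids) layout into per-query (id, distance) lists."""
    per_query = []
    for i in range(len(lims) - 1):
        start, end = lims[i], lims[i + 1]
        per_query.append(list(zip(ids[start:end], distances[start:end])))
    return per_query

# Hand-made example: query 0 has two hits, query 1 has none, query 2 has one.
lims = [0, 2, 2, 3]
distances = [0.12, 0.40, 0.07]
ids = [17, 4, 99]

for query_no, hits in enumerate(split_range_results(lims, distances, ids)):
    print(query_no, hits)
```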

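Saving and Loading Indexes

# Save the index to a file
faiss.write_index(index, "my_index.faiss")

# Load the index from a file
index = faiss.read_index("my_index.faiss")

# Serialize to memory (e.g. for network transfer)
# write_index expects a file name, so use serialize_index for in-memory buffers
index_data = faiss.serialize_index(index)  # NumPy uint8 array; call .tobytes() for raw bytes

# Load from memory
index = faiss.deserialize_index(index_data)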

GPU Acceleration

Using GPU Indexes

# Check for GPUs
print(f"Available GPUs: {faiss.get_num_gpus()}")

# Create GPU resources
res = faiss.StandardGpuResources()

# Convert a CPU index to a GPU index
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)  # 0 is the GPU ID

# Search with the GPU index
distances, indices = gpu_index.search(query_vectors, k)

# Multiple GPUs
ngpus = faiss.get_num_gpus()
res_list = [faiss.StandardGpuResources() for _ in range(ngpus)]
gpu_index = faiss.index_cpu_to_gpu_multiple_py(res_list, index)

GPU Index Best Practices

# GPU index configuration for large datasets
co = faiss.GpuClonerOptions()
co.useFloat16 = True  # use float16 to reduce GPU memory usage
co.indicesOptions = faiss.INDICES_64_BIT

res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, index, co)
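
To get a feel for what half precision can save, the sketch below estimates raw vector storage at 4 and 2 bytes per component. It is a back-of-the-envelope calculation that assumes the index stores the vectors themselves at the chosen precision (as flat storage does); PQ-compressed codes are not affected in the same way, and the helper name is illustrative.

```python
def vector_storage_gib(num_vectors, dimension, bytes_per_component):
    """Raw storage for the vectors alone, in GiB (ignores index overhead)."""
    return num_vectors * dimension * bytes_per_component / 2**30

n, d = 10_000_000, 128
fp32 = vector_storage_gib(n, d, 4)
fp16 = vector_storage_gib(n, d, 2)
print(f"float32: {fp32:.2f} GiB, float16: {fp16:.2f} GiB")
```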

Integrating with Embedding Models

OpenAI Embedding

from openai import OpenAI
import faiss
import numpy as np

client = OpenAI()

def get_embeddings(texts):
    response = client.embeddings.create(
        input=texts,
        model="text-embedding-3-small"
    )
    return [item.embedding for item in response.data]

# Prepare the documents
documents = [
    "Faiss is a vector search library open-sourced by Facebook",
    "Python is a popular programming language",
    "Machine learning is a subset of artificial intelligence"
]

# Generate the embeddings
embeddings = get_embeddings(documents)
embeddings = np.array(embeddings).astype('float32')

# Create the index
dimension = len(embeddings[0])
# Inner product; OpenAI embeddings are unit-length, so this equals cosine similarity
index = faiss.IndexFlatIP(dimension)
index.add(embeddings)

# Search
query = "What is Faiss?"
query_embedding = get_embeddings([query])
query_embedding = np.array(query_embedding).astype('float32')

distances, indices = index.search(query_embedding, 3)

for i, idx in enumerate(indices[0]):
    print(f"{i+1}. [{distances[0][i]:.4f}] {documents[idx]}")

Sentence Transformers

from sentence_transformers import SentenceTransformer
import faiss

# Load the model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode the documents
documents = ["Content of document 1", "Content of document 2", "Content of document 3"]
embeddings = model.encode(documents, convert_to_numpy=True)
embeddings = embeddings.astype('float32')

# Normalize in place (for cosine similarity)
faiss.normalize_L2(embeddings)

# Create the index
dimension = embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)
index.add(embeddings)

Complete RAG Example

import faiss
import numpy as np
from openai import OpenAI
import os

class FaissRAG:
    def __init__(self, dimension=1536):
        self.dimension = dimension
        self.index = faiss.IndexFlatIP(dimension)  # inner-product index
        self.documents = []
        self.openai = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

    def add_documents(self, documents):
        """Add documents to the knowledge base."""
        # Generate the embeddings
        embeddings = self._get_embeddings(documents)

        # Normalize in place so inner product equals cosine similarity
        faiss.normalize_L2(embeddings)

        # Add to the index
        self.index.add(embeddings)
        self.documents.extend(documents)

        print(f"Added {len(documents)} documents, {len(self.documents)} in total")

    def _get_embeddings(self, texts):
        """Get text embeddings as a float32 array."""
        response = self.openai.embeddings.create(
            input=texts,
            model="text-embedding-3-small"
        )
        embeddings = [item.embedding for item in response.data]
        return np.array(embeddings).astype('float32')

    def search(self, query, top_k=5):
        """Search for relevant documents."""
        # Generate the query embedding
        query_embedding = self._get_embeddings([query])
        faiss.normalize_L2(query_embedding)

        # Search
        distances, indices = self.index.search(query_embedding, top_k)

        results = []
        for i, idx in enumerate(indices[0]):
            # Faiss pads with -1 when fewer than top_k vectors are available
            if 0 <= idx < len(self.documents):
                results.append({
                    "content": self.documents[idx],
                    "score": float(distances[0][i]),
                    "index": int(idx)
                })

        return results
    
    def answer(self, question, top_k=3):
        """Generate an answer."""
        # Retrieve relevant documents
        docs = self.search(question, top_k)

        # Build the context
        context = "\n\n".join([
            f"[Document {i+1}] {doc['content']}"
            for i, doc in enumerate(docs)
        ])

        # Call the LLM
        prompt = f"""Answer the question based on the documents below. If they contain no relevant information, say so.

Documents:
{context}

Question: {question}

Answer:"""

        response = self.openai.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are an assistant that answers questions based on the provided documents."},
                {"role": "user", "content": prompt}
            ]
        )

        return {
            "answer": response.choices[0].message.content,
            "sources": docs
        }

    def save(self, path):
        """Save the index and the documents."""
        faiss.write_index(self.index, f"{path}.faiss")
        np.save(f"{path}_docs.npy", self.documents)

    def load(self, path):
        """Load the index and the documents."""
        self.index = faiss.read_index(f"{path}.faiss")
        self.documents = np.load(f"{path}_docs.npy", allow_pickle=True).tolist()

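# Usage example
rag = FaissRAG()

docs = [
    "Faiss is an efficient similarity search library developed by Facebook AI Research",
    "Vector databases store and retrieve high-dimensional vector data",
    "RAG improves the quality of LLM answers through retrieval-augmented generation"
]
rag.add_documents(docs)

result = rag.answer("What is Faiss?")
print(f"Answer: {result['answer']}")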

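Performance Optimization

Batch Processing

# Adding in batches is much faster than adding one vector at a time
batch_size = 10000
for i in range(0, len(vectors), batch_size):
    batch = vectors[i:i+batch_size]
    index.add(batch)

# Batched queries are parallelized automatically
# Faiss uses OpenMP internally to process the queries of a batch in parallel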

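Memory Mapping

# Memory-map a large index instead of loading it fully into RAM
# (this typically helps IVF indexes, whose inverted lists can stay on disk)
index = faiss.read_index("large_index.faiss", faiss.IO_FLAG_MMAP)

# Open the memory-mapped index read-only
index = faiss.read_index("large_index.faiss", faiss.IO_FLAG_MMAP | faiss.IO_FLAG_READ_ONLY)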

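Quantization Optimization

# OPQ preprocessing + PQ quantization (usually more accurate than plain PQ)
# Reuses nlist, m, and nbits from the index examples above
opq = faiss.OPQMatrix(dimension, m)

quantizer = faiss.IndexFlatL2(dimension)
index_ivfpq = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, nbits)

# IndexPreTransform applies the OPQ rotation to training data, added vectors,
# and queries alike, so nothing has to be transformed by hand
index = faiss.IndexPreTransform(opq, index_ivfpq)
index.train(db_vectors)
index.add(db_vectors)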

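FAQ

Q: How does Faiss differ from a dedicated vector database?

Aspect	Faiss	Dedicated vector database
Type	Algorithm library	Full database system
Persistence	Index files only; the rest is up to you	Built in
Distribution	Must be implemented yourself	Usually supported
Features	Core search	Full data management
Best suited for	Embedded use / research	Production services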

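Q: How do I handle billions of vectors?

# 1. Use an IVF + PQ index
nlist = 65536  # more clusters
m = 64  # more sub-vectors (dimension must be divisible by m)
quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, 8)

# 2. Use a GPU
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)

# 3. Shard the data
# Split the data across several indexes, search each one, then merge the results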
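
The merge step of the sharding approach can be sketched without Faiss: each shard returns its own top-k hits, and the global top-k is the k smallest distances across all of them. The example below assumes each shard reports (distance, global_id) pairs with globally unique IDs and a smaller-is-better metric such as L2; for inner product, heapq.nlargest would be used instead. The function name is illustrative.

```python
import heapq

def merge_shard_results(shard_results, k):
    """Merge per-shard lists of (distance, global_id) hits into a global top-k."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nsmallest(k, all_hits)

# Each shard has already been searched independently for the same query.
shard_a = [(0.10, 3), (0.42, 8)]
shard_b = [(0.05, 1004), (0.50, 1010)]

print(merge_shard_results([shard_a, shard_b], k=3))
# [(0.05, 1004), (0.1, 3), (0.42, 8)]
```

Every shard must be asked for at least k results, because in the worst case all of the global top-k hits live in a single shard.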

Q: How do I evaluate search quality?

# Compute recall
def compute_recall(index, query_vectors, ground_truth, k=10):
    """Compute recall@k against exact nearest neighbors."""
    distances, indices = index.search(query_vectors, k)

    recall = 0
    for i in range(len(query_vectors)):
        # Size of the intersection with the true neighbors
        correct = len(set(indices[i]) & set(ground_truth[i]))
        recall += correct / len(ground_truth[i])

    return recall / len(query_vectors)

# Generate the ground truth (using exact search)
k = 10
index_flat = faiss.IndexFlatL2(dimension)
index_flat.add(db_vectors)
_, ground_truth = index_flat.search(query_vectors, k)

# Evaluate the approximate index with the same k as the ground truth
recall = compute_recall(index_ivf, query_vectors, ground_truth, k)
print(f"Recall: {recall:.4f}")
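
As a worked example of the recall calculation, the sketch below applies the same intersection-based formula to hand-made ID lists, so the arithmetic can be followed without building an index. recall_at_k is an illustrative helper that takes the returned IDs directly instead of an index.

```python
def recall_at_k(found, truth):
    """Average fraction of true neighbors that appear in the returned ids."""
    total = 0.0
    for found_ids, true_ids in zip(found, truth):
        total += len(set(found_ids) & set(true_ids)) / len(true_ids)
    return total / len(truth)

# Query 0 recovers 3 of 4 true neighbors, query 1 recovers 2 of 4.
truth = [[1, 2, 3, 4], [5, 6, 7, 8]]
found = [[1, 2, 3, 9], [5, 6, 0, 10]]

print(recall_at_k(found, truth))  # (0.75 + 0.5) / 2 = 0.625
```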

Resources

Official documentation: https://faiss.ai/
GitHub: https://github.com/facebookresearch/faiss
Paper: https://arxiv.org/abs/1702.08734
