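Document Operations
A document is the smallest unit of data in Elasticsearch. This chapter covers how to create, retrieve, update, and delete documents.
Creating Documents
Creating a Document with a Specified ID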
PUT /articles/_doc/1
{
"title": "Elasticsearch 入门教程",
"content": "本文介绍 Elasticsearch 的基本概念和使用方法",
"author": "张三",
"category": "技术",
"views": 1000,
"tags": ["elasticsearch", "搜索", "教程"],
"status": "published",
"created_at": "2024-01-15 10:30:00"
}
Response:
{
"_index": "articles",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
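Explanation:
- _id: the document's unique identifier; it can be supplied by the client or generated automatically
- _version: the document version number, incremented automatically on every update
- result: the outcome of the operation (created/updated/deleted)
Creating a Document with an Auto-Generated ID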
POST /articles/_doc
{
"title": "Elasticsearch 进阶教程",
"content": "深入讲解 Elasticsearch 的高级特性",
"author": "李四"
}
The response contains the automatically generated _id:
{
"_id": "abc123def456",
"result": "created"
}
Creating Only If the Document Does Not Exist
# Use _create; the request fails with 409 Conflict if the document already exists
PUT /articles/_create/1
{
"title": "新文章"
}
# Or use the op_type parameter
PUT /articles/_doc/1?op_type=create
{
"title": "新文章"
}
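The difference between plain indexing and create-only can be modelled in a few lines of Python. This is a minimal sketch: `FakeIndex` and `ConflictError` are hypothetical in-memory stand-ins for the cluster and its 409 response, not part of any Elasticsearch client.

```python
class ConflictError(Exception):
    """Stands in for the HTTP 409 response returned on a create conflict."""


class FakeIndex:
    """Hypothetical in-memory stand-in for an index, keyed by document ID."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, body):
        # Plain indexing (PUT /<index>/_doc/<id>): create or overwrite.
        result = "updated" if doc_id in self.docs else "created"
        self.docs[doc_id] = dict(body)
        return result

    def create(self, doc_id, body):
        # Create-only (PUT /<index>/_create/<id>): refuse to overwrite.
        if doc_id in self.docs:
            raise ConflictError(f"document {doc_id!r} already exists")
        self.docs[doc_id] = dict(body)
        return "created"


articles = FakeIndex()
first = articles.create("1", {"title": "New article"})
try:
    articles.create("1", {"title": "Another article"})
    second = "created"
except ConflictError:
    second = "conflict"
overwrite = articles.index("1", {"title": "Replaced"})
print(first, second, overwrite)
```

A real client would catch the 409 error in the same place the sketch catches `ConflictError` and decide whether to skip, retry with a new ID, or report the duplicate.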
Retrieving Documents
Retrieving by ID
GET /articles/_doc/1
Response:
{
"_index": "articles",
"_id": "1",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"title": "Elasticsearch 入门教程",
"content": "本文介绍 Elasticsearch 的基本概念和使用方法",
"author": "张三"
}
}
Retrieving Only the Source
GET /articles/_source/1
Response:
{
"title": "Elasticsearch 入门教程",
"content": "本文介绍 Elasticsearch 的基本概念和使用方法",
"author": "张三"
}
Retrieving Specific Fields
# Return only the title and author fields
GET /articles/_doc/1?_source=title,author
# Exclude specific fields
GET /articles/_doc/1?_source_excludes=content
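The effect of the two parameters on the returned _source can be pictured with a small Python function. This is a simplified sketch that only handles exact top-level field names; the helper name `filter_source` is hypothetical, and the real parameters also accept wildcard patterns, which are not modelled here.

```python
def filter_source(source, includes=None, excludes=None):
    """Keep the listed top-level fields, then drop the excluded ones."""
    kept = {
        key: value
        for key, value in source.items()
        if includes is None or key in includes
    }
    return {
        key: value
        for key, value in kept.items()
        if not excludes or key not in excludes
    }


article = {
    "title": "Elasticsearch 入门教程",
    "content": "本文介绍 Elasticsearch 的基本概念和使用方法",
    "author": "张三",
}
only_title_author = filter_source(article, includes=["title", "author"])
without_content = filter_source(article, excludes=["content"])
print(sorted(only_title_author), sorted(without_content))
```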
Checking Whether a Document Exists
HEAD /articles/_doc/1
# Returns 200 OK if the document exists
# Returns 404 Not Found if it does not
Retrieving Multiple Documents
GET /_mget
{
"docs": [
{ "_index": "articles", "_id": "1" },
{ "_index": "articles", "_id": "2" },
{ "_index": "products", "_id": "1" }
]
}
# Or the shorthand form (all documents in the same index)
GET /articles/_mget
{
"ids": ["1", "2", "3"]
}
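When the list of documents comes from application code, the choice between the two request shapes can be made programmatically. The following sketch uses a hypothetical helper, `build_mget`, that only builds the path and body shown above; sending the request is left to whichever HTTP or Elasticsearch client is in use.

```python
def build_mget(refs):
    """Return (path, body) for a list of (index, id) pairs.

    Uses the `ids` shorthand when every pair targets the same index,
    and the explicit `docs` form otherwise.
    """
    indices = {index for index, _ in refs}
    if len(indices) == 1:
        (index,) = indices
        return f"/{index}/_mget", {"ids": [str(doc_id) for _, doc_id in refs]}
    docs = [{"_index": index, "_id": str(doc_id)} for index, doc_id in refs]
    return "/_mget", {"docs": docs}


same_index = build_mget([("articles", 1), ("articles", 2), ("articles", 3)])
mixed = build_mget([("articles", "1"), ("products", "1")])
print(same_index)
print(mixed)
```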
Updating Documents
Full Replacement
Using PUT with an existing ID replaces the entire document and increments its _version:
PUT /articles/_doc/1
{
"title": "Elasticsearch 完全指南",
"content": "更新后的内容",
"author": "张三"
}
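Note: this replaces the original document entirely; any field missing from the request body is removed.
Partial Update
Use _update to change only the specified fields: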
POST /articles/_update/1
{
"doc": {
"views": 1500,
"title": "Elasticsearch 完全指南"
}
}
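The contrast between full replacement and a partial update is easy to see on plain dictionaries. This is a sketch of the semantics only, assuming the documented behaviour that a partial doc is merged recursively into the existing source (nested objects merged key by key, scalars and arrays replaced); `merge_partial` is a hypothetical helper, not a client function.

```python
import copy


def merge_partial(source, partial):
    """Recursively merge `partial` into a copy of `source`.

    Nested objects are merged key by key; scalars and arrays are replaced.
    """
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


original = {
    "title": "Intro",
    "views": 1000,
    "tags": ["a", "b"],
    "meta": {"lang": "zh", "draft": False},
}

# Full replacement: the stored source becomes exactly the request body.
after_put = {"title": "Guide"}

# Partial update: only the listed fields change, everything else survives.
after_update = merge_partial(
    original, {"views": 1500, "tags": ["c"], "meta": {"draft": True}}
)
print(after_put)
print(after_update)
```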
Updating with a Script
POST /articles/_update/1
{
"script": {
"source": "ctx._source.views += params.increment",
"params": {
"increment": 100
}
}
}
# Conditional update
POST /articles/_update/1
{
"script": {
"source": "if (ctx._source.views > 1000) { ctx._source.status = 'hot' }"
}
}
Upsert
Update the document if it exists; create it otherwise:
POST /articles/_update/2
{
"doc": {
"title": "新文章",
"views": 0
},
"doc_as_upsert": true
}
# Or use the upsert field: its content is indexed as-is when the document is missing; otherwise the script runs
POST /articles/_update/2
{
"script": {
"source": "ctx._source.views += 1"
},
"upsert": {
"title": "新文章",
"views": 1
}
}
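The script-plus-upsert variant can be traced with an in-memory model. This is a sketch under the assumption that a missing document receives the upsert body unchanged and the script only runs against an existing document; `scripted_upsert` and the dict-based `store` are hypothetical stand-ins for the Update API and the index.

```python
def scripted_upsert(store, doc_id, script, upsert):
    """Model of `_update` with `script` + `upsert` on a dict-based store."""
    if doc_id not in store:
        # Missing document: the upsert body is stored as-is, the script is skipped.
        store[doc_id] = dict(upsert)
        return "created"
    script(store[doc_id])
    return "updated"


def increment_views(source):
    # Python equivalent of the Painless script `ctx._source.views += 1`.
    source["views"] += 1


store = {}
results = [
    scripted_upsert(store, "2", increment_views, {"title": "新文章", "views": 1})
    for _ in range(3)
]
print(results, store["2"]["views"])
```

Starting the counter at 1 in the upsert body is what keeps the first call consistent with later ones: the first view creates the document, and each subsequent call adds one.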
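Optimistic Concurrency Control
Use if_seq_no and if_primary_term to keep concurrent writers from silently overwriting each other:
# First read the current sequence number and primary term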
GET /articles/_doc/1
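# Returns _seq_no: 0, _primary_term: 1
# Pass those values with the update; it fails with 409 Conflict if the document has changed since the read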
POST /articles/_update/1?if_seq_no=0&if_primary_term=1
{
"doc": {
"views": 1600
}
}
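The read, compare, and write cycle behind this check can be sketched with a small in-memory model. `FakeShard`, `VersionConflict`, and `add_views` are hypothetical names used for illustration; the model keeps one sequence counter per shard and rejects a write whose expected values no longer match, which is the behaviour the parameters above rely on.

```python
class VersionConflict(Exception):
    """Stands in for the 409 response to a stale if_seq_no / if_primary_term."""


class FakeShard:
    """Hypothetical single-shard store tracking a sequence number per write."""

    def __init__(self):
        self.primary_term = 1
        self.next_seq_no = 0
        self.docs = {}  # doc_id -> (source, seq_no, primary_term)

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None:
            expected = (if_seq_no, if_primary_term)
            if current is None or (current[1], current[2]) != expected:
                raise VersionConflict(doc_id)
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (dict(source), seq_no, self.primary_term)
        return seq_no, self.primary_term

    def get(self, doc_id):
        return self.docs[doc_id]


def add_views(shard, doc_id, amount, max_retries=3):
    """Read-modify-write loop that re-reads and retries after a conflict."""
    for _ in range(max_retries):
        source, seq_no, term = shard.get(doc_id)
        updated = {**source, "views": source["views"] + amount}
        try:
            return shard.index(doc_id, updated, if_seq_no=seq_no, if_primary_term=term)
        except VersionConflict:
            continue
    raise VersionConflict(doc_id)


shard = FakeShard()
shard.index("1", {"views": 1500})
_, seq_no, term = shard.get("1")       # this reader remembers the values it saw
shard.index("1", {"views": 1550})      # another writer changes the document
try:
    shard.index("1", {"views": 1600}, if_seq_no=seq_no, if_primary_term=term)
    stale_write = "applied"
except VersionConflict:
    stale_write = "rejected"
final = add_views(shard, "1", 50)
print(stale_write, shard.get("1")[0], final)
```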
Deleting Documents
Deleting by ID
DELETE /articles/_doc/1
Response:
{
"_index": "articles",
"_id": "1",
"_version": 2,
"result": "deleted"
}
Deleting by Query
POST /articles/_delete_by_query
{
"query": {
"match": {
"status": "draft"
}
}
}
Deleting All Documents
POST /articles/_delete_by_query
{
"query": {
"match_all": {}
}
}
Bulk Operations
Bulk API
The Bulk API performs multiple operations in a single request:
POST /_bulk
{"index": {"_index": "articles", "_id": "1"}}
{"title": "文章1", "author": "张三"}
{"index": {"_index": "articles", "_id": "2"}}
{"title": "文章2", "author": "李四"}
{"update": {"_index": "articles", "_id": "1"}}
{"doc": {"views": 100}}
{"delete": {"_index": "articles", "_id": "3"}}
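Explanation:
- Each line must be a complete, valid JSON object on a single line, with no trailing commas; the request body must end with a newline
- Action types: index, create, update, delete
- index and create must be followed by a line containing the document body
- update must be followed by a line containing doc or script
- delete has no following line
Format of a Bulk Request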
{ "action": { "metadata" }}
{ "document body" }
{ "action": { "metadata" }}
{ "document body" }
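The line-oriented format is easy to get wrong by hand, so it helps to generate it. The following sketch serializes a list of operations into a bulk body using only the standard library; `to_bulk_body` and its tuple layout are hypothetical choices made for this example.

```python
import json


def to_bulk_body(operations):
    """Serialize (action, metadata, payload) tuples into an NDJSON bulk body.

    `payload` is the document for index/create, the {"doc": ...} or
    {"script": ...} object for update, and None for delete.
    """
    lines = []
    for action, metadata, payload in operations:
        if action not in {"index", "create", "update", "delete"}:
            raise ValueError(f"unknown bulk action: {action}")
        if (payload is None) != (action == "delete"):
            raise ValueError(f"{action} has the wrong number of lines")
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if payload is not None:
            lines.append(json.dumps(payload, ensure_ascii=False))
    # json.dumps escapes newlines inside strings, so each object stays on one line.
    return "\n".join(lines) + "\n"


body = to_bulk_body([
    ("index", {"_index": "articles", "_id": "1"}, {"title": "文章1", "author": "张三"}),
    ("update", {"_index": "articles", "_id": "1"}, {"doc": {"views": 100}}),
    ("delete", {"_index": "articles", "_id": "3"}, None),
])
print(body, end="")
```

The resulting string is what would be sent as the request body, with the Content-Type header set to application/x-ndjson.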
Bulk Operations in Python
from elasticsearch import Elasticsearch, helpers
es = Elasticsearch("http://localhost:9200")
# Prepare the bulk actions
actions = [
{
"_index": "articles",
"_id": i,
"_source": {
"title": f"文章{i}",
"author": "张三",
"views": i * 100
}
}
for i in range(1000)
]
# Execute the bulk request
helpers.bulk(es, actions)
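Document Routing
By default, a document is assigned to a shard based on a hash of its routing value, which defaults to the document ID.
Custom Routing
# Specify a routing value when indexing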
PUT /articles/_doc/1?routing=user_123
{
"title": "用户文章",
"user_id": "user_123"
}
# Specify the same routing value when retrieving
GET /articles/_doc/1?routing=user_123
# Specify routing when searching (only the matching shard is searched)
GET /articles/_search?routing=user_123
{
"query": {
"match_all": {}
}
}
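Explanation: custom routing ensures that related documents are stored on the same shard, so a search that supplies the routing value only has to query that shard. The same routing value must also be supplied when retrieving, updating, or deleting the document.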
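The effect of a shared routing value can be simulated in a few lines. This is a deliberately simplified sketch: `zlib.crc32` is only a deterministic stand-in for the hash function Elasticsearch actually uses, the shard count of 5 is an arbitrary assumption, and `shard_for` and `shard_for_doc` are hypothetical helpers.

```python
import zlib


def shard_for(routing, num_primary_shards):
    """Pick a shard number from a routing value.

    crc32 is a stand-in hash for illustration; the real hash function and
    routing settings differ, but the hash-then-modulo idea is the same.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def shard_for_doc(doc_id, num_primary_shards, routing=None):
    # Without an explicit routing value, the document ID is used.
    key = routing if routing is not None else doc_id
    return shard_for(key, num_primary_shards)


NUM_SHARDS = 5  # assumed number of primary shards
by_id = {shard_for_doc(str(i), NUM_SHARDS) for i in range(100)}
by_user = {shard_for_doc(str(i), NUM_SHARDS, routing="user_123") for i in range(100)}
print(len(by_id), len(by_user))
```

Documents routed by ID spread across shards, while the hundred documents that share the routing value user_123 all land on a single shard, which is why a routed search can skip the others.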
Required Routing
# Configure when creating the index
PUT /articles
{
"mappings": {
"_routing": {
"required": true
}
}
}
Document Refresh
Refresh Interval
By default, a document becomes searchable about 1 second after it is indexed.
# Refresh immediately
POST /articles/_refresh
# Change the refresh interval
PUT /articles/_settings
{
"index": {
"refresh_interval": "5s"
}
}
# Disable automatic refresh (during bulk imports)
PUT /articles/_settings
{
"index": {
"refresh_interval": "-1"
}
}
Refreshing on Write
# Refresh immediately after the write
PUT /articles/_doc/1?refresh=true
{
"title": "新文章"
}
# Do not force a refresh; respond only after the next refresh has made the change searchable
PUT /articles/_doc/1?refresh=wait_for
{
"title": "新文章"
}
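For a large import, the refresh settings above are usually combined into a fixed sequence: disable refresh, send the bulk requests, restore the interval, then refresh once. The sketch below only builds that sequence as data; `bulk_import_plan` is a hypothetical helper, and it assumes that sending a JSON null for refresh_interval resets the setting to its default.

```python
def bulk_import_plan(index, bulk_bodies, restore_interval=None):
    """Return the ordered (method, path, body) requests for a bulk import.

    restore_interval=None is serialized as JSON null, which is assumed to
    reset refresh_interval to the index default.
    """
    settings_path = f"/{index}/_settings"
    plan = [("PUT", settings_path, {"index": {"refresh_interval": "-1"}})]
    plan += [("POST", f"/{index}/_bulk", body) for body in bulk_bodies]
    plan.append(
        ("PUT", settings_path, {"index": {"refresh_interval": restore_interval}})
    )
    # One explicit refresh makes everything imported searchable at once.
    plan.append(("POST", f"/{index}/_refresh", None))
    return plan


plan = bulk_import_plan("products", ["<ndjson chunk 1>", "<ndjson chunk 2>"])
for method, path, _ in plan:
    print(method, path)
```

In real code the restore step belongs in a `finally` block, so that a failed import does not leave the index with refresh disabled.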
Practical Examples
Blog Article Management
# Create an article
POST /articles/_doc
{
"title": "Python 异步编程指南",
"content": "详细介绍 asyncio 的使用方法...",
"author": "张三",
"category": "Python",
"tags": ["python", "asyncio", "教程"],
"status": "published",
"views": 0,
"likes": 0,
"created_at": "2024-01-15 10:30:00",
"updated_at": "2024-01-15 10:30:00"
}
# Increment the view count
POST /articles/_update/abc123
{
"script": {
"source": "ctx._source.views += 1"
}
}
# Add a like
POST /articles/_update/abc123
{
"script": {
"source": "ctx._source.likes += 1"
}
}
# Change the status
POST /articles/_update/abc123
{
"doc": {
"status": "featured",
"updated_at": "2024-01-16 15:00:00"
}
}
Bulk Importing Data
# Import with the Bulk API
POST /_bulk
{"index":{"_index":"products"}}
{"name":"iPhone 15","price":6999,"category":"手机"}
{"index":{"_index":"products"}}
{"name":"MacBook Pro","price":14999,"category":"电脑"}
{"index":{"_index":"products"}}
{"name":"iPad Pro","price":7999,"category":"平板"}
{"index":{"_index":"products"}}
{"name":"AirPods Pro","price":1999,"category":"耳机"}
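Summary
In this chapter we covered:
- Creating documents (with a specified ID or an auto-generated ID)
- Retrieving documents (single get, multi-get, field filtering)
- Updating documents (full replacement, partial update, scripted update)
- Deleting documents (by ID, by query)
- Bulk operations (Bulk API)
- Document routing
- Refresh
Exercises
- Create a product index and add 10 product documents
- Import 1,000 documents using a bulk operation
- Write a script that increments the view count
- Update an article's status automatically after it is published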