
Document Operations

A document is the basic unit of data in Elasticsearch. This chapter covers creating, retrieving, updating, and deleting documents.

Creating Documents

Creating with a Specified ID

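PUT /articles/_doc/1
{
  "title": "Getting Started with Elasticsearch",
  "content": "An introduction to the basic concepts and usage of Elasticsearch",
  "author": "Zhang San",
  "category": "Technology",
  "views": 1000,
  "tags": ["elasticsearch", "search", "tutorial"],
  "status": "published",
  "created_at": "2024-01-15T10:30:00"
}

Note: created_at is written in ISO 8601 form (with a T between date and time). Dynamic mapping recognizes that as a date by default, whereas a value such as "2024-01-15 10:30:00" would be mapped as text.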

Response:

{
  "_index": "articles",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

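Explanation

  • _id: the document's unique identifier; you can set it yourself or let Elasticsearch generate one
  • _version: the document version number, incremented automatically on every write (including updates and deletes)
  • result: the outcome of the operation (created/updated/deleted, or noop when an update changed nothing)
  • _shards: how many shard copies the write targeted and how many acknowledged it; total 2 with successful 1 typically means the replica is unassigned, for example on a single-node cluster
  • _seq_no / _primary_term: the sequence number and primary term of the write, used for optimistic concurrency control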

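Auto-Generated ID

Use POST without an ID and Elasticsearch generates one (PUT always requires an ID):

POST /articles/_doc
{
  "title": "Advanced Elasticsearch",
  "content": "An in-depth look at the advanced features of Elasticsearch",
  "author": "Li Si"
}

The response contains the generated _id (abridged; the actual value will differ):

{
  "_id": "abc123def456",
  "result": "created"
}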

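Create Only If the Document Does Not Exist

# _create fails with 409 Conflict if a document with this ID already exists
PUT /articles/_create/1
{
  "title": "New article"
}

# Equivalent: the op_type=create parameter
PUT /articles/_doc/1?op_type=create
{
  "title": "New article"
}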
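To make the difference between the two write modes concrete, here is a toy in-memory model in Python. It is not Elasticsearch code: InMemoryIndex and ConflictError are hypothetical names, and only _version and result are modeled. A plain write (op_type=index) creates or overwrites and bumps _version, while op_type=create refuses to overwrite, which Elasticsearch reports as 409 Conflict.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for Elasticsearch's HTTP 409 version conflict."""


class InMemoryIndex:
    """Toy model of an index that tracks only _id, _version and result."""

    def __init__(self):
        self._docs = {}  # _id -> (version, source)

    def put(self, doc_id, source, op_type="index"):
        """Models PUT /<index>/_doc/<id> (op_type=index) and op_type=create."""
        existing = self._docs.get(doc_id)
        if existing is not None and op_type == "create":
            # create mode never overwrites an existing document
            raise ConflictError(f"document [{doc_id}] already exists")
        version = 1 if existing is None else existing[0] + 1
        self._docs[doc_id] = (version, dict(source))
        result = "created" if existing is None else "updated"
        return {"_id": doc_id, "_version": version, "result": result}


idx = InMemoryIndex()
print(idx.put("1", {"title": "Getting Started"}))  # result: created, _version: 1
print(idx.put("1", {"title": "Complete Guide"}))   # result: updated, _version: 2
try:
    idx.put("1", {"title": "New article"}, op_type="create")
except ConflictError as exc:
    print("409:", exc)
```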

Retrieving Documents

Get by ID

GET /articles/_doc/1

Response (_source abridged):

{
  "_index": "articles",
  "_id": "1",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "title": "Getting Started with Elasticsearch",
    "content": "An introduction to the basic concepts and usage of Elasticsearch",
    "author": "Zhang San"
  }
}

Retrieving Only the Source

GET /articles/_source/1

Response:

{
  "title": "Getting Started with Elasticsearch",
  "content": "An introduction to the basic concepts and usage of Elasticsearch",
  "author": "Zhang San"
}

Retrieving Specific Fields

# Return only the title and author fields
GET /articles/_doc/1?_source=title,author

# Exclude certain fields
GET /articles/_doc/1?_source_excludes=content
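The filtering rules can be approximated in a few lines of Python. This is a simplified model, not the server's implementation: it handles top-level fields only (Elasticsearch also filters nested object paths), and it uses fnmatch to imitate wildcard patterns such as `ti*`. Fields matching both lists are dropped, on the assumption that excludes narrow the included subset.

```python
from fnmatch import fnmatchcase


def filter_source(source, includes=None, excludes=None):
    """Approximates _source_includes / _source_excludes on top-level fields."""
    result = {}
    for field, value in source.items():
        if includes and not any(fnmatchcase(field, p) for p in includes):
            continue  # not selected by any include pattern
        if excludes and any(fnmatchcase(field, p) for p in excludes):
            continue  # excludes win over includes
        result[field] = value
    return result


doc = {"title": "Getting Started", "content": "...", "author": "Zhang San"}
print(filter_source(doc, includes=["title", "author"]))
print(filter_source(doc, excludes=["content"]))
```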

Checking Whether a Document Exists

HEAD /articles/_doc/1

# Returns 200 OK if the document exists
# Returns 404 Not Found if it does not

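Multi-Get

GET /_mget
{
  "docs": [
    { "_index": "articles", "_id": "1" },
    { "_index": "articles", "_id": "2" },
    { "_index": "products", "_id": "1" }
  ]
}

# Shorthand when all documents are in the same index
GET /articles/_mget
{
  "ids": ["1", "2", "3"]
}

Documents come back in the same order as requested; an ID that does not exist appears with "found": false instead of failing the whole request.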

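Updating Documents

Full Replacement

Indexing with PUT to an existing ID replaces the whole document; _version is incremented and result is "updated":

PUT /articles/_doc/1
{
  "title": "The Complete Elasticsearch Guide",
  "content": "Updated content",
  "author": "Zhang San"
}

Note: this replaces the original document entirely. Any field not included in the request (here category, views, tags, status, and created_at) is removed.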

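Partial Update

Use _update to change only the specified fields:

POST /articles/_update/1
{
  "doc": {
    "views": 1500,
    "title": "The Complete Elasticsearch Guide"
  }
}

Internally, Elasticsearch still reads the stored _source, merges the changes into it, and reindexes the whole document; the saving is in request size and client code, not in indexing work.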
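The difference between full replacement and a partial update comes down to how the new content is combined with the stored _source. The sketch below models the merge as it is commonly described: nested objects are merged recursively, while scalars and arrays are replaced wholesale (so sending `tags` replaces the entire array). merge_partial and partial_update are hypothetical helpers, and the "noop" result reflects the update API's detect_noop behavior, which is on by default.

```python
import copy


def merge_partial(source, partial):
    """Merges a partial "doc" into a stored _source without mutating either."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)  # objects merge recursively
        else:
            merged[key] = copy.deepcopy(value)  # scalars and arrays are replaced
    return merged


def partial_update(source, partial):
    merged = merge_partial(source, partial)
    # An update that changes nothing is reported as "noop" and not reindexed
    return merged, ("noop" if merged == source else "updated")


stored = {
    "title": "Getting Started with Elasticsearch",
    "author": "Zhang San",
    "views": 1000,
    "tags": ["elasticsearch", "search", "tutorial"],
}
doc, result = partial_update(stored, {"views": 1500, "title": "The Complete Elasticsearch Guide"})
print(result, doc)  # updated; author and tags are kept
```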

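Updating with a Script

POST /articles/_update/1
{
  "script": {
    "source": "ctx._source.views += params.increment",
    "params": {
      "increment": 100
    }
  }
}

# Conditional update: setting ctx.op to 'noop' skips reindexing when the condition is not met
POST /articles/_update/1
{
  "script": {
    "source": "if (ctx._source.views > 1000) { ctx._source.status = 'hot' } else { ctx.op = 'noop' }"
  }
}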
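Why pass 100 through params instead of writing it into the script source? Elasticsearch caches compiled scripts by their source, so baking a value into the source string forces a fresh compilation for every distinct value and can run into the script compilation rate limit. The hypothetical helpers below only build request bodies, so they run without a cluster; they show that the parameterized version keeps a single source string whatever the value.

```python
def increment_views_body(amount):
    """Parameterized _update body: the value travels in params."""
    return {
        "script": {
            "lang": "painless",
            "source": "ctx._source.views += params.increment",
            "params": {"increment": amount},
        }
    }


def hardcoded_body(amount):
    # Anti-pattern: each distinct amount yields a new source to compile
    return {"script": {"source": f"ctx._source.views += {amount}"}}


amounts = [1, 5, 100]
print(len({increment_views_body(n)["script"]["source"] for n in amounts}))  # 1
print(len({hardcoded_body(n)["script"]["source"] for n in amounts}))        # 3
```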

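Upsert

Update the document if it exists, otherwise create it:

POST /articles/_update/2
{
  "doc": {
    "title": "New article",
    "views": 0
  },
  "doc_as_upsert": true
}

# Or use the upsert field: if the document does not exist, the upsert content is inserted as-is; if it exists, the script runs
POST /articles/_update/2
{
  "script": {
    "source": "ctx._source.views += 1"
  },
  "upsert": {
    "title": "New article",
    "views": 1
  }
}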
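The two upsert styles differ in what runs when the document is missing. The toy model below covers the upsert-plus-script form, assuming scripted_upsert is left at its default (false): a missing document receives the upsert body unchanged and the script is skipped, while an existing document only gets the script. update_with_upsert is a hypothetical name, and add_view is a Python stand-in for the Painless source, not something Elasticsearch executes.

```python
import copy


def update_with_upsert(store, doc_id, script, upsert):
    """Models _update with "script" plus "upsert" (scripted_upsert=false)."""
    if doc_id not in store:
        store[doc_id] = copy.deepcopy(upsert)  # inserted as-is; script does not run
        return "created"
    script(store[doc_id])  # document exists: only the script runs
    return "updated"


def add_view(source):
    # Python stand-in for the Painless source "ctx._source.views += 1"
    source["views"] += 1


store = {}
upsert_body = {"title": "New article", "views": 1}
print(update_with_upsert(store, "2", add_view, upsert_body), store["2"]["views"])  # created 1
print(update_with_upsert(store, "2", add_view, upsert_body), store["2"]["views"])  # updated 2
```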

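Optimistic Concurrency Control

Use if_seq_no and if_primary_term to prevent concurrent writers from silently overwriting each other:

# First read the current values
GET /articles/_doc/1
# The response includes _seq_no: 0, _primary_term: 1

# Pass them with the update; if the document changed in the meantime, the request fails with 409 Conflict
POST /articles/_update/1?if_seq_no=0&if_primary_term=1
{
  "doc": {
    "views": 1600
  }
}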
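In practice a conditional write is usually wrapped in a read-modify-write loop that re-reads and retries on conflict. The toy store below is a simplification (in Elasticsearch _seq_no is assigned per shard and can jump by more than 1 between writes to one document), and Store, VersionConflict, and add_views are hypothetical names. For plain script increments, the update API's retry_on_conflict parameter can do a similar retry on the server side.

```python
class VersionConflict(Exception):
    """Hypothetical stand-in for HTTP 409 version_conflict_engine_exception."""


class Store:
    """Toy single-document store tracking _seq_no and _primary_term."""

    def __init__(self, source):
        self.source = dict(source)
        self.seq_no = 0
        self.primary_term = 1

    def get(self):
        return dict(self.source), self.seq_no, self.primary_term

    def update(self, partial, if_seq_no, if_primary_term):
        if (if_seq_no, if_primary_term) != (self.seq_no, self.primary_term):
            raise VersionConflict(f"current seq_no={self.seq_no}, expected {if_seq_no}")
        self.source.update(partial)
        self.seq_no += 1  # each successful write gets a new sequence number
        return self.seq_no


def add_views(store, n, max_retries=3):
    """Read-modify-write that re-reads and retries when another writer wins."""
    for _ in range(max_retries):
        source, seq_no, term = store.get()
        try:
            return store.update({"views": source["views"] + n}, seq_no, term)
        except VersionConflict:
            continue
    raise VersionConflict("gave up after retries")


doc = Store({"views": 1500})
_, seq_no, term = doc.get()
doc.update({"views": 1600}, if_seq_no=seq_no, if_primary_term=term)  # succeeds
try:
    doc.update({"views": 1700}, if_seq_no=seq_no, if_primary_term=term)  # stale
except VersionConflict as exc:
    print("409:", exc)
add_views(doc, 10)
print(doc.source["views"])  # 1610
```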

Deleting Documents

Delete by ID

DELETE /articles/_doc/1

Response (abridged; note that deleting also increments _version):

{
  "_index": "articles",
  "_id": "1",
  "_version": 2,
  "result": "deleted"
}

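Delete by Query

POST /articles/_delete_by_query
{
  "query": {
    "match": {
      "status": "draft"
    }
  }
}

If a matching document is modified while the request runs, the version conflict aborts the operation by default; add conflicts=proceed to count such conflicts and keep going.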

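Delete All Documents

POST /articles/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}

This deletes documents one by one and keeps the index with its mappings and settings. To empty a large index, deleting and recreating the index is usually much faster.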

Bulk Operations

Bulk API

The Bulk API performs multiple index, create, update, and delete operations in a single request:

POST /_bulk
{"index": {"_index": "articles", "_id": "1"}}
{"title": "Article 1", "author": "Zhang San"}
{"index": {"_index": "articles", "_id": "2"}}
{"title": "Article 2", "author": "Li Si"}
{"update": {"_index": "articles", "_id": "1"}}
{"doc": {"views": 100}}
{"delete": {"_index": "articles", "_id": "3"}}

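Explanation

  • The body is newline-delimited JSON (NDJSON): every action and every document must be a complete JSON object on a single line, with no commas between lines, and the body must end with a newline; over plain HTTP, send it with Content-Type: application/x-ndjson
  • Action types: index, create, update, delete
  • index/create must be followed by a line containing the document source
  • update must be followed by a line containing doc or script
  • delete takes no following line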
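A _bulk request can succeed at the HTTP level even when individual operations fail, so the response needs to be checked: the top-level errors flag is true if any item failed, and each entry in items carries a status and, on failure, an error object. The hypothetical helper below extracts the failures from a parsed response. The sample response is hand-written and abridged for illustration, not captured from a cluster; note that deleting a missing document reports not_found without an error object.

```python
def failed_items(response):
    """Returns (position, action, _id, error type) for each failed bulk item."""
    if not response.get("errors"):
        return []  # fast path: nothing failed
    failures = []
    for pos, item in enumerate(response["items"]):
        # each item is a one-key dict such as {"index": {...}} or {"update": {...}}
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((pos, action, result.get("_id"), result["error"].get("type")))
    return failures


sample = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_index": "articles", "_id": "1", "status": 200, "result": "updated"}},
        {"update": {"_index": "articles", "_id": "9", "status": 404,
                    "error": {"type": "document_missing_exception", "reason": "document missing"}}},
        {"delete": {"_index": "articles", "_id": "3", "status": 404, "result": "not_found"}},
    ],
}
print(failed_items(sample))  # [(1, 'update', '9', 'document_missing_exception')]
```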

Bulk Request Format

{ "action": { "metadata" }}
{ "document body" }
{ "action": { "metadata" }}
{ "document body" }
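Hand-building NDJSON is error-prone, so it can help to generate it. The sketch below uses only Python's json module and serializes hypothetical (action, metadata, body) tuples into a payload that follows the rules listed above: one compact JSON object per line, no source line after delete, and a trailing newline. to_bulk_body is an illustrative name, not part of any client library.

```python
import json


def to_bulk_body(operations):
    """Serializes (action, metadata, body) tuples into a _bulk NDJSON payload.

    body must be None for "delete"; for "update" it is {"doc": ...} or {"script": ...}.
    """
    lines = []
    for action, metadata, body in operations:
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if action == "delete":
            continue  # delete has no source line
        lines.append(json.dumps(body, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the payload must end with a newline


ops = [
    ("index", {"_index": "articles", "_id": "1"}, {"title": "Article 1", "author": "Zhang San"}),
    ("update", {"_index": "articles", "_id": "1"}, {"doc": {"views": 100}}),
    ("delete", {"_index": "articles", "_id": "3"}, None),
]
print(to_bulk_body(ops), end="")
```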

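Python Bulk Example

from elasticsearch import Elasticsearch, helpers

# Connect to a local cluster
es = Elasticsearch("http://localhost:9200")


def generate_actions(n):
    """Yield one index action per document; a generator avoids building a huge list in memory."""
    for i in range(n):
        yield {
            "_index": "articles",
            "_id": str(i),  # document IDs are strings in Elasticsearch
            "_source": {
                "title": f"Article {i}",
                "author": "Zhang San",
                "views": i * 100,
            },
        }


# helpers.bulk sends the actions in chunks (500 per request by default) and returns
# (number of successful actions, list of errors); with the default raise_on_error=True
# it raises helpers.BulkIndexError if any document fails
success, errors = helpers.bulk(es, generate_actions(1000))
print(f"indexed {success} documents")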

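Document Routing

By default, a document's routing value is its _id. Elasticsearch hashes the routing value and maps it to one of the index's primary shards (roughly hash(_routing) % number_of_primary_shards).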
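The routing rule can be sketched in Python under stated assumptions: zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually uses, and index.number_of_routing_shards is ignored, so the shard numbers will not match a real cluster. What the sketch does show is the property that matters: the same routing value always maps to the same shard, and changing the number of primary shards changes the mapping, which is why that number is fixed at index creation (short of split, shrink, or reindex). pick_shard and shard_for are hypothetical names.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Simplified shard selection: hash(routing) % number_of_primary_shards.

    zlib.crc32 is a stand-in for Elasticsearch's Murmur3-based hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def shard_for(doc_id, primaries, routing=None):
    # The routing value defaults to the document _id
    return pick_shard(routing if routing is not None else doc_id, primaries)


# Without custom routing, documents spread across shards by _id
print(sorted({shard_for(str(i), 5) for i in range(20)}))
# With routing=user_123, every document lands on the same shard
print({shard_for(str(i), 5, routing="user_123") for i in range(20)})
```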

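Custom Routing

# Specify routing when writing
PUT /articles/_doc/1?routing=user_123
{
  "title": "User article",
  "user_id": "user_123"
}

# Pass the same routing when getting by ID; without it the request may go to the wrong shard and return "found": false
GET /articles/_doc/1?routing=user_123

# Specify routing when searching (only the matching shard is searched)
GET /articles/_search?routing=user_123
{
  "query": {
    "match_all": {}
  }
}

Explanation: custom routing keeps related documents on the same shard, so a search that passes the routing value queries one shard instead of all of them. Routing only selects the shard; the search still returns every matching document on that shard, including documents with other routing values, so add a filter (for example on user_id) when you need exact results.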

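Requiring Routing

# Configure when creating the index
PUT /articles
{
  "mappings": {
    "_routing": {
      "required": true
    }
  }
}

With this setting, index, get, update, and delete requests that omit routing are rejected with a routing_missing_exception.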

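Refresh Mechanism

Refresh Interval

By default, index.refresh_interval is 1s, so a newly indexed document becomes searchable after roughly one second. GET by ID is real-time and does not wait for a refresh.

# Refresh immediately
POST /articles/_refresh

# Change the refresh interval
PUT /articles/_settings
{
  "index": {
    "refresh_interval": "5s"
  }
}

# Disable automatic refresh (e.g. during a bulk import); set it back afterwards
PUT /articles/_settings
{
  "index": {
    "refresh_interval": "-1"
  }
}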
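The disable, import, restore sequence is easy to leave half-done when an import fails, so it is worth wrapping in a context manager. The sketch below assumes the elasticsearch-py 8.x client (indices.put_settings with a settings= argument, and indices.refresh); refresh_disabled is a hypothetical helper name. It restores refresh_interval by sending null, which resets the setting to its default of 1s.

```python
from contextlib import contextmanager


@contextmanager
def refresh_disabled(es, index):
    """Disables periodic refresh for a bulk load, then resets it and refreshes.

    Assumes an elasticsearch-py 8.x client.
    """
    es.indices.put_settings(index=index, settings={"index": {"refresh_interval": "-1"}})
    try:
        yield
    finally:
        # None is sent as JSON null, which resets refresh_interval to its default
        es.indices.put_settings(index=index, settings={"index": {"refresh_interval": None}})
        # Make everything imported so far searchable
        es.indices.refresh(index=index)


# Usage (requires a running cluster):
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch("http://localhost:9200")
# with refresh_disabled(es, "articles"):
#     helpers.bulk(es, actions)
```

If the index had a custom interval such as 5s, restore that value instead of null.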

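Refresh on Write

# Refresh the affected shards immediately after the write (costly under heavy indexing)
PUT /articles/_doc/1?refresh=true
{
  "title": "New article"
}

# Do not force a refresh; respond only after the next scheduled refresh has made the change visible
PUT /articles/_doc/1?refresh=wait_for
{
  "title": "New article"
}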

Practical Examples

Blog Article Management

# Create an article
POST /articles/_doc
{
  "title": "A Guide to Asynchronous Programming in Python",
  "content": "A detailed walkthrough of how to use asyncio...",
  "author": "Zhang San",
  "category": "Python",
  "tags": ["python", "asyncio", "tutorial"],
  "status": "published",
  "views": 0,
  "likes": 0,
  "created_at": "2024-01-15T10:30:00",
  "updated_at": "2024-01-15T10:30:00"
}

# Increment the view count (abc123 stands for the _id returned when the article was created)
POST /articles/_update/abc123
{
  "script": {
    "source": "ctx._source.views += 1"
  }
}

# Like the article
POST /articles/_update/abc123
{
  "script": {
    "source": "ctx._source.likes += 1"
  }
}

# Change the status
POST /articles/_update/abc123
{
  "doc": {
    "status": "featured",
    "updated_at": "2024-01-16T15:00:00"
  }
}

Bulk Importing Data

# Import in bulk with the Bulk API (IDs are auto-generated)
POST /_bulk
{"index":{"_index":"products"}}
{"name":"iPhone 15","price":6999,"category":"Phone"}
{"index":{"_index":"products"}}
{"name":"MacBook Pro","price":14999,"category":"Computer"}
{"index":{"_index":"products"}}
{"name":"iPad Pro","price":7999,"category":"Tablet"}
{"index":{"_index":"products"}}
{"name":"AirPods Pro","price":1999,"category":"Earphones"}

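Summary

In this chapter we covered:

  1. Creating documents (with a specified ID or an auto-generated ID)
  2. Retrieving documents (single get, multi-get, field filtering)
  3. Updating documents (full replacement, partial update, scripted update)
  4. Deleting documents (by ID, by query)
  5. Bulk operations (Bulk API)
  6. Document routing
  7. The refresh mechanism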

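Exercises

  1. Create a product index and add 10 product documents
  2. Import 1,000 documents using bulk operations
  3. Write a script that increments the view count
  4. Automatically update an article's status after it is published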

References