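Index Management

An index is the container in which Elasticsearch stores documents, roughly comparable to a table in a relational database (older material, written when an index could hold several mapping types, compares it to a database instead). This chapter covers how to create, configure, and manage indices so that you understand Elasticsearch's core storage mechanism.

Basic Index Concepts

Before getting hands-on, let's go over a few core concepts:

An index is a logical collection of documents. Each index has a unique name (which must be lowercase) that is used to reference it in API calls. For example, an index that stores article data could be named articles, and one that stores user data could be named users.

A shard is the physical storage unit of an index. An index can be split into multiple shards; each shard is a self-contained Lucene index and can be placed on a different node in the cluster. The number of primary shards is fixed when the index is created and cannot be changed in place afterwards; changing it requires the split or shrink API, or reindexing into a new index.

A replica is a copy of a primary shard. Replicas provide data redundancy and spread query load. The number of replicas can be adjusted dynamically.

Creating an Index

Basic Creation

The simplest approach is to create an index with the default settings: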

PUT /articles

This command creates an index named articles with the default configuration: 1 primary shard and 1 replica. The response looks like this:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "articles"
}

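acknowledged: true means the cluster accepted the index creation request. shards_acknowledged: true means the required number of shard copies (by default, just the primary of each shard) started before the request timed out.

Specifying Settings at Creation Time

In production, you usually specify settings such as the number of shards and replicas when creating the index: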

PUT /articles
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

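Parameter details:

number_of_shards: the number of primary shards. Default: 1. It cannot be changed in place after creation. A common guideline is to keep each shard between 10 GB and 50 GB: shards that are too small add cluster management overhead, and shards that are too large slow down recovery after a failure.
number_of_replicas: the number of replicas for each primary shard. Default: 1. It can be changed dynamically. A value of 0 creates no replicas, which suits development environments or temporary data.

How do you choose the number of shards? It depends on data volume and cluster size. Suppose you expect the index to grow to 150 GB and want each shard to stay around 30 GB; then 5 primary shards is a reasonable setting (150 / 30 = 5). Also consider query concurrency: under heavy query load you may need more shards, or more replicas, to spread the work.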
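
The sizing arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under the assumptions stated in the text (an expected total size and a target size per shard); the function names are hypothetical and not part of any Elasticsearch client.

```python
import math


def estimate_primary_shards(expected_size_gb: float, target_shard_size_gb: float = 30.0) -> int:
    """Rough primary shard count: expected index size / target shard size, rounded up."""
    if expected_size_gb <= 0 or target_shard_size_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(expected_size_gb / target_shard_size_gb))


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary has `replicas` copies, so the cluster hosts primaries * (1 + replicas) shards."""
    return primaries * (1 + replicas)


primaries = estimate_primary_shards(150, 30)
print(primaries)                         # 5
print(total_shard_copies(primaries, 1))  # 10
```

Rounding up errs on the side of slightly smaller shards; the second function is a reminder that every replica multiplies the number of shards the cluster has to host.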

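Specifying Mappings at Creation Time

A mapping defines the fields of a document and their types. You can specify the mapping in the same request that creates the index. The ik_max_word analyzer used below comes from the third-party IK analysis plugin, which must be installed for this request to succeed: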

PUT /articles
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "content": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "author": {
        "type": "keyword"
      },
      "category": {
        "type": "keyword"
      },
      "views": {
        "type": "integer"
      },
      "tags": {
        "type": "keyword"
      },
      "status": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

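Guidelines for choosing field types:

text: for full-text search, such as article titles and bodies. Values are processed by an analyzer.
keyword: for exact matching, sorting, and aggregations, such as tags, status, and category. Values are not analyzed.
integer/long: integer types; choose based on the range of values.
date: date type. The format is optional; if none is given, the default strict_date_optional_time||epoch_millis is used.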
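
To make the difference between text and keyword concrete, here is a toy Python simulation. It is not how Elasticsearch is implemented: the analyze function is a crude stand-in for an analyzer (it lowercases and splits on non-word characters, whereas the mapping above uses ik_max_word), and all helper names are made up for illustration.

```python
import re


def analyze(value: str) -> list[str]:
    """Crude stand-in for an analyzer: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", value.lower())


def text_field_matches(stored: str, query: str) -> bool:
    """A text field matches if any analyzed query term appears among the stored terms."""
    stored_terms = set(analyze(stored))
    return any(term in stored_terms for term in analyze(query))


def keyword_field_matches(stored: str, query: str) -> bool:
    """A keyword field is not analyzed, so only the exact original value matches."""
    return stored == query


title = "Elasticsearch Index Management"
print(text_field_matches(title, "index"))               # True
print(keyword_field_matches(title, "index"))            # False
print(keyword_field_matches("published", "published"))  # True
```

The same value would therefore be findable by a single word when mapped as text, but only by its complete, case-sensitive value when mapped as keyword.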

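Using Index Templates

When you need to create many indices with a similar structure, an index template simplifies the work. A template is applied automatically to any new index whose name matches one of its patterns: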

PUT /_index_template/articles_template
{
  "index_patterns": ["articles*"],
  "priority": 100,
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    },
    "mappings": {
      "properties": {
        "title": { "type": "text" },
        "author": { "type": "keyword" },
        "created_at": { "type": "date" }
      }
    }
  }
}

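After the template is created, any new index whose name starts with articles picks up these settings automatically:

# The template is applied automatically when the index is created
PUT /articles_2024

# Equivalent to specifying all of the settings and mappings by hand

Priority rule: when several composable templates match the same index name, only the one with the highest priority value is applied; matching templates are not merged. To share settings between templates, use component templates through composed_of.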
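
A small Python sketch of how selection by priority plays out. It assumes composable templates (the _index_template API used above), where the highest-priority matching template is the one that gets applied. Pattern matching is approximated with fnmatch, which also understands ? and [...], so it is only a stand-in for the * wildcard; fallback_template is a hypothetical second template.

```python
from fnmatch import fnmatchcase


def pick_template(index_name, templates):
    """Return the name of the highest-priority template whose pattern matches, or None."""
    matching = [
        t for t in templates
        if any(fnmatchcase(index_name, pattern) for pattern in t["index_patterns"])
    ]
    if not matching:
        return None
    # A template without an explicit priority is treated as priority 0.
    return max(matching, key=lambda t: t.get("priority", 0))["name"]


templates = [
    {"name": "articles_template", "index_patterns": ["articles*"], "priority": 100},
    # Hypothetical catch-all template, added only for this example.
    {"name": "fallback_template", "index_patterns": ["*"], "priority": 10},
]

print(pick_template("articles_2024", templates))  # articles_template
print(pick_template("users", templates))          # fallback_template
```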

Viewing Indices

Listing All Indices

The cat API gives a quick overview of every index in the cluster:

GET /_cat/indices?v

Sample output:

health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   articles     xxx                    3   1          5            0     10.5kb         10.5kb
yellow open   products     xxx                    1   1          0            0       225b           225b

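What each column means:

Column	Description
health	Health: green (all shards allocated), yellow (all primaries allocated, but at least one replica is not), red (at least one primary shard is unavailable)
status	Index state: open or close
index	Index name
pri	Number of primary shards
rep	Number of replicas
docs.count	Number of documents
docs.deleted	Number of documents marked as deleted but not yet purged
store.size	Total store size (primaries and replicas)
pri.store.size	Store size of the primary shards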
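
If you want to process this table in a script, a minimal Python sketch is shown below. It assumes every column has a value, as in the sample output above; parse_cat_indices is a hypothetical helper, not a client API.

```python
def parse_cat_indices(text: str) -> list[dict[str, str]]:
    """Turn the whitespace-separated table from GET /_cat/indices?v into a list of row dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
health status index        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   articles     xxx                    3   1          5            0     10.5kb         10.5kb
yellow open   products     xxx                    1   1          0            0       225b           225b
"""

rows = parse_cat_indices(sample)
not_green = [row["index"] for row in rows if row["health"] != "green"]
print(not_green)  # ['products']
```

Splitting on whitespace breaks as soon as a column is empty, so for anything beyond a quick check it is usually safer to request GET /_cat/indices?format=json and parse the JSON instead.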

Viewing Index Settings

View the detailed settings of a specific index:

GET /articles/_settings

Sample response:

{
  "articles": {
    "settings": {
      "index": {
        "number_of_shards": "3",
        "number_of_replicas": "1",
        "uuid": "xxx",
        "version": {
          "created": "8120099"
        }
      }
    }
  }
}

Viewing Index Mappings

View the field mappings of an index:

GET /articles/_mapping

Sample response:

{
  "articles": {
    "mappings": {
      "properties": {
        "title": { "type": "text" },
        "author": { "type": "keyword" },
        "views": { "type": "integer" }
      }
    }
  }
}

Modifying an Index

Updating Settings

Dynamic index settings can be changed at any time:

# Change the number of replicas
PUT /articles/_settings
{
  "number_of_replicas": 2
}

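Important: number_of_shards (the primary shard count) is a static setting and cannot be changed on an existing index. To change it, create a new index with the desired count and reindex the data, or use the split or shrink API.

Opening and Closing an Index

Closing an index frees cluster resources while keeping the data:

# Close the index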
POST /articles/_close

# Open the index
POST /articles/_open

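A closed index cannot be read from or written to, but its data stays on disk. This is useful when you need to take some data offline temporarily.

Adding Field Mappings

You can add new fields to an existing index at any time: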

PUT /articles/_mapping
{
  "properties": {
    "summary": {
      "type": "text"
    },
    "rating": {
      "type": "float"
    }
  }
}

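Note: the type of an existing field cannot be changed; you can only add new fields. This is an important Elasticsearch limitation: data already indexed under the old type would have to be reindexed. If you really need to change a field's type, create a new index and migrate the data.

Reindexing

When you need to change the type of an existing field or make large structural changes, reindex into a new index:

# Step 1: create the new index (author changes from keyword to text)
PUT /articles_new
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "author": { "type": "text" }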
    }
  }
}

# Step 2: copy the data
POST /_reindex
{
  "source": {
    "index": "articles"
  },
  "dest": {
    "index": "articles_new"
  }
}

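# Step 3: point the alias articles at the new index and remove the old index
# in a single atomic request, so there is no window in which the name is missing
POST /_aliases
{
  "actions": [
    { "add": { "index": "articles_new", "alias": "articles" } },
    { "remove_index": { "index": "articles" } }
  ]
}

Reindexing can take a long time, especially with large data volumes. Add the wait_for_completion=false query parameter to run it in the background; the response then contains a task ID that you can poll with the tasks API (GET /_tasks/<task_id>).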

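Deleting an Index

Deleting an index permanently removes all of its data and cannot be undone:

# Delete a single index
DELETE /articles

# Delete several indices
DELETE /articles,products

# Delete indices matching a pattern
# (rejected when action.destructive_requires_name is true, the default since 8.0)
DELETE /articles_*

Warning: deletion is dangerous. In production, keep the cluster setting action.destructive_requires_name set to true so that wildcard and _all deletes are rejected, and consider letting index lifecycle management (ILM) handle routine deletion instead of running it by hand.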
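
The same protection can be mirrored on the client side. The sketch below is a hypothetical guard for scripts that build DELETE requests; it only inspects the target string and does not talk to a cluster.

```python
def requires_explicit_name(target: str) -> bool:
    """True if a delete target uses a wildcard or _all rather than concrete index names."""
    names = [name.strip() for name in target.split(",")]
    return any("*" in name or name == "_all" for name in names)


def check_delete_target(target: str) -> str:
    """Client-side guard: refuse wildcard deletes before any request is sent."""
    if requires_explicit_name(target):
        raise ValueError(f"refusing to delete by pattern: {target!r}")
    return f"DELETE /{target}"


print(check_delete_target("articles,products"))  # DELETE /articles,products
try:
    check_delete_target("articles_*")
except ValueError as err:
    print(err)  # refusing to delete by pattern: 'articles_*'
```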

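Index Aliases

An alias is an alternative name for one or more indices. Aliases add a lot of flexibility: they enable zero-downtime index switches, queries across several indices, and more.

Why Use Aliases?

Suppose your application queries the index articles directly by name. When you need to rebuild the index, you have to change the application's configuration to point at the new index name, which interrupts service. With an alias, the application only needs to know the alias name, say articles, and you can move that alias from the old index to the new one behind the scenes.

Creating an Alias

# Create an alias for a single index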
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "articles",
        "alias": "articles_alias"
      }
    }
  ]
}

# Create aliases together with the index
PUT /articles
{
  "aliases": {
    "articles_alias": {},
    "search_articles": {}
  }
}

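An Alias Pointing to Multiple Indices

An alias can point to several indices, which lets you query partitioned data through one name: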

POST /_aliases
{
  "actions": [
    { "add": { "index": "articles_2023", "alias": "articles_all" } },
    { "add": { "index": "articles_2024", "alias": "articles_all" } }
  ]
}

# Searching articles_all searches both indices
GET /articles_all/_search
{
  "query": { "match_all": {} }
}

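This is very useful when data is partitioned by time, for example one new index per month queried through a single alias.

Filtered Aliases

An alias can carry a filter that automatically restricts what a query sees: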

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "articles",
        "alias": "published_articles",
        "filter": {
          "term": { "status": "published" }
        }
      }
    }
  ]
}

# Searching published_articles returns only published articles
GET /published_articles/_search
{
  "query": { "match_all": {} }
}

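This technique gives different consumers different views of the same index. On its own it is not an access control mechanism, because anyone who can query the underlying index directly bypasses the filter.

Switching an Alias

Removing the old index and adding the new one happen in a single atomic operation, so service is never interrupted: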

POST /_aliases
{
  "actions": [
    { "remove": { "index": "articles_v1", "alias": "articles" } },
    { "add": { "index": "articles_v2", "alias": "articles" } }
  ]
}
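
When the switch is part of a deployment script, the request body can be generated rather than written by hand. A minimal Python sketch, assuming you send the result to POST /_aliases with whatever HTTP client you already use (build_alias_swap is a hypothetical helper):

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases that moves `alias` from old_index to new_index in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = build_alias_swap("articles", "articles_v1", "articles_v2")
print(json.dumps(body, indent=2))
```

Keeping both actions in one body is what makes the switch atomic; sending them as two separate requests would reintroduce a gap.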

Viewing Aliases

# List all aliases of an index
GET /articles/_alias

# List the indices an alias points to
GET /_alias/articles_alias

# List all aliases
GET /_alias

Write Index

When an alias points to multiple indices, you can designate which one receives writes:

POST /_aliases
{
  "actions": [
    { "add": { "index": "articles_v1", "alias": "articles" } },
    { "add": { "index": "articles_v2", "alias": "articles", "is_write_index": true } }
  ]
}

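With this configuration, searches through the alias hit both indices, while writes go only to articles_v2. Without is_write_index, an alias that points to more than one index rejects write requests.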
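
The resolution rule can be written down as a small Python function. This is a simplified model for illustration, not Elasticsearch code; resolve_write_index and its input shape are assumptions of the sketch.

```python
from typing import Optional


def resolve_write_index(members: dict[str, Optional[bool]]) -> Optional[str]:
    """Pick the index that receives writes sent to an alias.

    `members` maps each index behind the alias to its is_write_index value
    (None when the flag was not set). Returns None when writes would be rejected.
    """
    flagged = [name for name, flag in members.items() if flag is True]
    if len(flagged) > 1:
        raise ValueError("only one index per alias may be the write index")
    if flagged:
        return flagged[0]
    if len(members) == 1:
        name, flag = next(iter(members.items()))
        # A lone index acts as the write index unless it was explicitly set to false.
        return name if flag is None else None
    return None


print(resolve_write_index({"articles_v1": None, "articles_v2": True}))  # articles_v2
print(resolve_write_index({"articles_v1": None, "articles_v2": None}))  # None
print(resolve_write_index({"articles": None}))                          # articles
```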

Index Statistics and Monitoring

Index Statistics

GET /articles/_stats

It returns detailed statistics. An abridged example of the primaries section:

{
  "primaries": {
    "docs": {
      "count": 1000,
      "deleted": 50
    },
    "store": {
      "size_in_bytes": 1048576
    },
    "indexing": {
      "index_total": 1050,
      "index_time_in_millis": 5000
    },
    "search": {
      "query_total": 100,
      "query_time_in_millis": 500
    }
  }
}

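Key fields:

docs.count: current number of documents
docs.deleted: number of deleted documents (they still take up space until segments are merged; a force merge can reclaim it)
store.size_in_bytes: store size in bytes
indexing.index_total: total number of indexing operations
search.query_time_in_millis: total time spent on queries, in milliseconds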
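
These raw counters are easier to read as ratios. The Python sketch below derives a few from the abridged primaries section shown above; summarize_stats is a hypothetical helper. Because the counters are cumulative, the averages cover the whole period the counters span rather than the current moment.

```python
def summarize_stats(primaries: dict) -> dict:
    """Derive average latencies and the deleted-document ratio from index stats counters."""
    docs = primaries["docs"]
    indexing = primaries["indexing"]
    search = primaries["search"]
    total_docs = docs["count"] + docs["deleted"]
    return {
        "deleted_ratio": docs["deleted"] / total_docs if total_docs else 0.0,
        "avg_index_ms": indexing["index_time_in_millis"] / indexing["index_total"]
        if indexing["index_total"] else 0.0,
        "avg_query_ms": search["query_time_in_millis"] / search["query_total"]
        if search["query_total"] else 0.0,
    }


primaries = {
    "docs": {"count": 1000, "deleted": 50},
    "store": {"size_in_bytes": 1048576},
    "indexing": {"index_total": 1050, "index_time_in_millis": 5000},
    "search": {"query_total": 100, "query_time_in_millis": 500},
}

summary = summarize_stats(primaries)
print(summary["avg_query_ms"])             # 5.0
print(round(summary["avg_index_ms"], 3))   # 4.762
print(round(summary["deleted_ratio"], 3))  # 0.048
```

A rising deleted_ratio is one signal that a force merge with only_expunge_deletes may be worth considering on an index that is updated heavily.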

Index Segments

A Lucene segment is the underlying storage unit of an index:

GET /articles/_segments

Segment information shows the physical structure of an index. Too many segments hurt query performance; a force merge reduces their number.

Index Settings in Detail

Summary of Common Settings

PUT /articles
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    
    "refresh_interval": "1s",
    
    "index.translog.durability": "async",
    "index.translog.sync_interval": "5s",
    "index.translog.flush_threshold_size": "512mb",
    
    "index.max_result_window": 10000,
    
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}

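Settings explained:

Setting	Description	Default
`refresh_interval`	Refresh interval; newly indexed documents become searchable after the next refresh	1s
`index.max_result_window`	Maximum value of `from + size` for paginated searches	10000
`index.translog.durability`	Translog durability policy: `request` (fsync before acknowledging each request) or `async` (fsync in the background every `sync_interval`; writes since the last sync can be lost on a crash)	request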
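
The index.max_result_window limit is simple arithmetic, sketched here in Python so an application can validate a page request before sending it. check_page is a hypothetical helper, and the default of 10000 is the one listed in the table.

```python
def check_page(from_: int, size: int, max_result_window: int = 10000) -> bool:
    """True if a from/size page stays within index.max_result_window."""
    if from_ < 0 or size < 0:
        raise ValueError("from and size must be non-negative")
    return from_ + size <= max_result_window


print(check_page(9990, 10))  # True: 9990 + 10 = 10000
print(check_page(9991, 10))  # False: 10001 exceeds the window
```

For paging beyond the window, search_after is generally a better fit than raising the limit.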

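Write Performance Tuning

When bulk-importing large amounts of data, you can temporarily adjust settings to speed up writes:

# Before the import: optimize for write throughput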
PUT /articles/_settings
{
  "number_of_replicas": 0,
  "refresh_interval": "-1"
}

# Import the data...

# After the import: restore the normal settings
PUT /articles/_settings
{
  "number_of_replicas": 1,
  "refresh_interval": "1s"
}

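# Refresh manually so the data becomes searchable
POST /articles/_refresh

# Force-merge segments to improve query performance
# (best reserved for indices that will no longer be written to)
POST /articles/_forcemerge?max_num_segments=1

Why do this?

Zero replicas: avoids the cost of writing every document to replica shards during the import
Refresh interval of -1: disables automatic refresh, so fewer small segments are created
Force merge: reduces the number of segments, which speeds up queries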
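
One risk with this procedure is forgetting to restore the settings when the import fails halfway. A Python context manager makes the restore step automatic. This is a sketch only: put_settings is a caller-supplied function (an assumption here) that sends PUT /<index>/_settings, and the example uses a stand-in that merely records calls.

```python
from contextlib import contextmanager


@contextmanager
def bulk_load_settings(put_settings, index, replicas=1, refresh_interval="1s"):
    """Relax replica and refresh settings for a bulk import, then restore them."""
    put_settings(index, {"number_of_replicas": 0, "refresh_interval": "-1"})
    try:
        yield
    finally:
        # Runs even if the import raises, so the index is not left without replicas.
        put_settings(index, {"number_of_replicas": replicas, "refresh_interval": refresh_interval})


# Usage with a stand-in that only records the calls it receives.
calls = []
with bulk_load_settings(lambda index, body: calls.append((index, body)), "articles"):
    pass  # the bulk import would run here

for index, body in calls:
    print(index, body)
```

The replicas and refresh_interval arguments are the values to restore; pass the index's real settings if they differ from the defaults used in this chapter.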

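Index Lifecycle Management

Index lifecycle management (ILM) automates an index's whole lifecycle from creation to deletion. It is especially well suited to time-series data such as logs and metrics.

Creating a Lifecycle Policy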

PUT /_ilm/policy/articles_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "30d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "60d",
        "actions": {
          "freeze": {},
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

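Lifecycle phases (when rollover is used, min_age is measured from the time an index rolls over rather than from its creation):

Phase	Description	Typical actions
hot	Active data, frequently read and written	rollover (roll over to a new index)
warm	No longer updated, occasionally queried	shrink (reduce the shard count), forcemerge (merge segments)
cold	Rarely queried but must be retained	freeze (freeze the index; deprecated in recent Elasticsearch versions, so check the documentation for your version)
delete	No longer needed	delete (delete the index)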
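
To see how the min_age thresholds of articles_policy partition an index's life, here is a small Python sketch. It is a simplification: real ILM evaluates policies periodically and only advances once the actions of the current phase have completed. PHASES and phase_for_age are names invented for the example.

```python
# (phase, min_age in days) taken from articles_policy above, in ascending order.
PHASES = [("hot", 0), ("warm", 30), ("cold", 60), ("delete", 90)]


def phase_for_age(age_days: float) -> str:
    """Return the last phase whose min_age has been reached."""
    if age_days < 0:
        raise ValueError("age must be non-negative")
    current = PHASES[0][0]
    for name, min_age in PHASES:
        if age_days >= min_age:
            current = name
    return current


for age in (10, 45, 60, 120):
    print(age, phase_for_age(age))
# 10 hot
# 45 warm
# 60 cold
# 120 delete
```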

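Applying a Lifecycle Policy

Attach the policy to new indices through an index template. Note that this request reuses the name articles_template, so it replaces the template defined earlier in this chapter: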

PUT /_index_template/articles_template
{
  "index_patterns": ["articles-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "articles_policy",
      "index.lifecycle.rollover_alias": "articles"
    }
  }
}
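
For rollover to work with index.lifecycle.rollover_alias, a first index has to exist whose name ends in a number and which is the write index of that alias. The Python sketch below builds such a bootstrap request; build_bootstrap_request is a hypothetical helper, and sending the request is left to your HTTP client. Keep in mind that an alias cannot share its name with an existing index, so an index called articles must not exist when the alias articles is created.

```python
import json


def build_bootstrap_request(rollover_alias: str, index_prefix: str) -> tuple[str, str, dict]:
    """Build the request that creates the first index of a rollover series.

    The name ends in -000001 so that rollover can increment the number, and the
    alias is marked as the write index.
    """
    index_name = f"{index_prefix}-000001"
    body = {"aliases": {rollover_alias: {"is_write_index": True}}}
    return "PUT", f"/{index_name}", body


method, path, body = build_bootstrap_request("articles", "articles")
print(method, path)      # PUT /articles-000001
print(json.dumps(body))  # {"aliases": {"articles": {"is_write_index": true}}}
```

The resulting name articles-000001 matches the articles-* pattern of the template above, so the new index picks up the lifecycle policy when it is created.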

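Summary

This chapter covered the core of index management in Elasticsearch:

Index basics: indices, shards, and replicas, and how they relate
Creating indices: basic creation, specifying settings and mappings, using index templates
Index operations: viewing, modifying, and deleting indices
Index aliases: creating, switching, and filtering aliases
Index monitoring: using statistics and segment information
Performance tuning: adjusting settings for bulk imports
Lifecycle management: automating the full lifecycle of an index

Exercises

Create a product index named products with fields such as name, price, category, and stock, choosing an appropriate type for each field
Create a search alias products_search and an admin alias products_admin for the product index
Use an index template to create two indices, logs-2024-01 and logs-2024-02
Configure an index lifecycle policy that shrinks shards automatically after 30 days and deletes the index automatically after 90 days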
