
Cluster Management

This chapter covers how to configure and manage an Elasticsearch cluster.

Cluster Architecture

Node Types

┌─────────────────────────────────────────────────────────────┐
│                    Elasticsearch cluster                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│     Master node          Data node       Coordinating node  │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Manage cluster  │ │ Store data      │ │ Handle requests │ │
│ │ Create indices  │ │ Run searches    │ │ Route requests  │ │
│ │ Allocate shards │ │ Aggregate data  │ │ Merge results   │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│                                                             │
└─────────────────────────────────────────────────────────────┘
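Node type      Configuration          Description
Master         node.roles: [master]   Manages cluster state; creates and deletes indices
Data           node.roles: [data]     Stores data; runs searches and aggregations
Ingest         node.roles: [ingest]   Runs ingest (pre-processing) pipelines
Coordinating   node.roles: []         Handles requests and routes them to the right nodes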
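
As a rough illustration of the table above, the following Python sketch maps a node.roles list to the duties it implies. ROLE_DUTIES and describe_node are hypothetical names invented for this example; they are not part of any Elasticsearch client.

```python
# Hypothetical helper: maps node.roles values from the table to their duties.
ROLE_DUTIES = {
    "master": "manages cluster state; creates and deletes indices",
    "data": "stores data; runs searches and aggregations",
    "ingest": "runs ingest pipelines",
}


def describe_node(roles):
    """Return one description per role in a node.roles list."""
    if not roles:
        # An empty list means the node only coordinates requests.
        return ["coordinating only: handles requests and routes them"]
    return [
        f"{role}: {ROLE_DUTIES.get(role, 'not covered in the table above')}"
        for role in roles
    ]


print(describe_node(["master", "data"]))
print(describe_node([]))
```

Every node can also coordinate requests; an empty role list simply leaves that as its only job.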

Cluster Health Status

GET /_cluster/health

# Response
{
  "cluster_name": "my-cluster",
  "status": "green", # one of green, yellow, red
  "number_of_nodes": 3,
  "number_of_data_nodes": 2,
  "active_primary_shards": 10,
  "active_shards": 20,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0
}

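Status Descriptions

Status   Description
green    All primary and replica shards are allocated
yellow   All primary shards are allocated, but some replicas are not
red      Some primary shards are unassigned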
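
The status rules above can be sketched as a small Python function. This is a simplified model for illustration only: it takes a list of (is_primary, is_assigned) pairs, which is an assumed input shape, and it ignores transitional states such as initializing shards.

```python
def cluster_status(shards):
    """Derive green/yellow/red from (is_primary, is_assigned) pairs."""
    shards = list(shards)
    # Any unassigned primary makes the cluster red.
    if any(is_primary and not assigned for is_primary, assigned in shards):
        return "red"
    # All primaries are assigned; any unassigned replica makes it yellow.
    if any(not assigned for _, assigned in shards):
        return "yellow"
    return "green"


print(cluster_status([(True, True), (False, True)]))
print(cluster_status([(True, True), (False, False)]))
print(cluster_status([(True, False), (False, False)]))
```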

Cluster Configuration

Basic Configuration

# elasticsearch.yml

# Cluster name (must be identical on every node in the cluster)
cluster.name: my-application

# Node name
node.name: node-1

# Node roles
node.roles: [master, data]

# Network settings
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300

# Discovery settings
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

# Memory settings (lock process memory to prevent swapping)
bootstrap.memory_lock: true

Three-Node Cluster Example

Node 1 configuration (Master + Data):

cluster.name: my-cluster
node.name: node-1
node.roles: [master, data]
network.host: 192.168.1.10
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
# List only master-eligible nodes here
cluster.initial_master_nodes: ["node-1", "node-2"]

Node 2 configuration (Master + Data):

cluster.name: my-cluster
node.name: node-2
node.roles: [master, data]
network.host: 192.168.1.11
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
# List only master-eligible nodes here
cluster.initial_master_nodes: ["node-1", "node-2"]

Node 3 configuration (Data only):

cluster.name: my-cluster
node.name: node-3
node.roles: [data]
network.host: 192.168.1.12
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
# cluster.initial_master_nodes is omitted: node-3 is not master-eligible

With only two master-eligible nodes, the cluster cannot safely tolerate losing one of them; three master-eligible nodes are usually recommended for production.
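
The three files differ only in name, roles, and address, so they can be generated from one node list. The Python sketch below is one way to do that. NODES and render_config are hypothetical names, and the sketch assumes that cluster.initial_master_nodes should list only master-eligible nodes and be written only on those nodes.

```python
# Node inventory taken from the example above.
NODES = [
    {"name": "node-1", "host": "192.168.1.10", "roles": ["master", "data"]},
    {"name": "node-2", "host": "192.168.1.11", "roles": ["master", "data"]},
    {"name": "node-3", "host": "192.168.1.12", "roles": ["data"]},
]


def render_config(node, nodes, cluster_name="my-cluster"):
    """Render elasticsearch.yml text for one node of the cluster."""
    seed_hosts = ", ".join(f'"{n["host"]}"' for n in nodes)
    masters = ", ".join(f'"{n["name"]}"' for n in nodes if "master" in n["roles"])
    lines = [
        f"cluster.name: {cluster_name}",
        f"node.name: {node['name']}",
        f"node.roles: [{', '.join(node['roles'])}]",
        f"network.host: {node['host']}",
        f"discovery.seed_hosts: [{seed_hosts}]",
    ]
    # Assumption: only master-eligible nodes carry the bootstrap list.
    if "master" in node["roles"]:
        lines.append(f"cluster.initial_master_nodes: [{masters}]")
    return "\n".join(lines)


for node in NODES:
    print(render_config(node, NODES))
    print()
```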

Shard Management

Shard Allocation Settings

# View allocation settings
GET /_cluster/settings?include_defaults=true&filter_path=**.allocation.*

# Configure shard allocation
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all", # all/primaries/new_primaries/none
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2,
    "cluster.routing.allocation.node_concurrent_recoveries": 2
  }
}
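
Since cluster.routing.allocation.enable accepts only the four values listed in the comment, a client can validate the value before sending the request. The following Python sketch builds the request body; allocation_settings is a hypothetical helper name, and the sketch only constructs the JSON without contacting a cluster.

```python
import json

# The four values accepted by cluster.routing.allocation.enable.
ALLOWED_ENABLE = {"all", "primaries", "new_primaries", "none"}


def allocation_settings(enable="all", concurrent_rebalance=2, concurrent_recoveries=2):
    """Build the body for PUT /_cluster/settings shown above."""
    if enable not in ALLOWED_ENABLE:
        raise ValueError(f"enable must be one of {sorted(ALLOWED_ENABLE)}, got {enable!r}")
    return {
        "persistent": {
            "cluster.routing.allocation.enable": enable,
            "cluster.routing.allocation.cluster_concurrent_rebalance": concurrent_rebalance,
            "cluster.routing.allocation.node_concurrent_recoveries": concurrent_recoveries,
        }
    }


print(json.dumps(allocation_settings("primaries"), indent=2))
```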

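Shard Allocation Filtering

# Restrict shard allocation to specific nodes. This cluster-level setting applies
# to all indices; to pin a single index, set index.routing.allocation.include._name
# in that index's settings instead.
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.include._name": "node-1,node-2"
  }
}

# Exclude a specific node
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "node-3"
  }
}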
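
To make the include and exclude semantics concrete, here is a minimal Python model of which nodes remain eligible for shards. It is a simplification under stated assumptions: eligible_nodes is a hypothetical function, names are matched exactly (wildcards are not modeled), and only the _name attribute is considered.

```python
def eligible_nodes(node_names, include="", exclude=""):
    """Return the nodes that may hold shards under include/exclude filters.

    include: comma-separated names; a node must match at least one (if set).
    exclude: comma-separated names; a node must match none.
    """
    included = {name for name in include.split(",") if name}
    excluded = {name for name in exclude.split(",") if name}
    return [
        name
        for name in node_names
        if (not included or name in included) and name not in excluded
    ]


nodes = ["node-1", "node-2", "node-3"]
print(eligible_nodes(nodes, include="node-1,node-2"))
print(eligible_nodes(nodes, exclude="node-3"))
```

In this three-node example both filters leave the same two nodes eligible, which is why excluding a node is a common way to drain it before maintenance.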

Moving Shards Manually

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "articles",
        "shard": 0,
        "from_node": "node-1",
        "to_node": "node-2"
      }
    }
  ]
}

Cluster Monitoring

Cluster State APIs

# Cluster health
GET /_cluster/health?pretty

# Cluster state
GET /_cluster/state?pretty

# Cluster statistics
GET /_cluster/stats?pretty

# Node statistics
GET /_nodes/stats?pretty

Cat API

# List nodes
GET /_cat/nodes?v

# List shards
GET /_cat/shards?v

# List indices
GET /_cat/indices?v

# Show shard counts and disk usage per node
GET /_cat/allocation?v

# Show recovery status
GET /_cat/recovery?v
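
Cat API output is whitespace-aligned text meant for humans, but it can be parsed for quick scripts. The Python sketch below parses a small sample of /_cat/shards?v output; SAMPLE is made-up illustrative data, and parse_cat and unassigned are hypothetical helper names. For anything robust, requesting format=json from the Cat API is likely the better choice.

```python
# Made-up sample resembling GET /_cat/shards?v output.
SAMPLE = """\
index    shard prirep state      docs store ip           node
articles 0     p      STARTED    120  1.2mb 192.168.1.10 node-1
articles 0     r      UNASSIGNED
"""


def parse_cat(text):
    """Parse whitespace-separated Cat API text into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # zip stops at the shorter sequence, so short rows just omit trailing keys.
    return [dict(zip(header, line.split())) for line in lines[1:]]


def unassigned(rows):
    """Keep only the rows whose state column is UNASSIGNED."""
    return [row for row in rows if row.get("state") == "UNASSIGNED"]


rows = parse_cat(SAMPLE)
print(unassigned(rows))
```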

Cluster Settings

# View settings
GET /_cluster/settings?include_defaults=true

# Update a setting dynamically (persistent: survives restarts)
PUT /_cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "100mb"
  }
}

# Transient setting (lost after a full cluster restart)
PUT /_cluster/settings
{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "50mb"
  }
}
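
When the same key is set in several places, Elasticsearch checks transient settings first, then persistent settings, then elasticsearch.yml, and finally the built-in default. The Python sketch below models that lookup order; effective_setting is a hypothetical name, and the "40mb" default is an assumption that may differ by version.

```python
def effective_setting(key, transient, persistent, config_file, defaults):
    """Return the value that wins under the precedence order described above."""
    for source in (transient, persistent, config_file, defaults):
        if key in source:
            return source[key]
    raise KeyError(key)


KEY = "indices.recovery.max_bytes_per_sec"
defaults = {KEY: "40mb"}  # assumed default; check your version's documentation
persistent = {KEY: "100mb"}
transient = {KEY: "50mb"}

# While the transient value is set, it overrides the persistent one.
print(effective_setting(KEY, transient, persistent, {}, defaults))
# After a full cluster restart the transient value is gone.
print(effective_setting(KEY, {}, persistent, {}, defaults))
```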

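Troubleshooting

Node Failure

# List shards sorted by state to find unassigned ones
GET /_cat/shards?v&s=state

# Explain why a shard is unassigned
GET /_cluster/allocation/explain
{
  "index": "articles",
  "shard": 0,
  "primary": true
}

# Retry allocation of shards that failed too many times. This does not force
# anything; forcing a primary with the allocate_stale_primary or
# allocate_empty_primary reroute commands is the dangerous operation, because
# it can lose data.
POST /_cluster/reroute?retry_failed=true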
Shard Recovery

# Show recovery progress
GET /_cat/recovery?v&active_only=true

# Cancel a recovery
POST /_cluster/reroute
{
  "commands": [
    {
      "cancel": {
        "index": "articles",
        "shard": 0,
        "node": "node-1"
      }
    }
  ]
}

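Cluster Restart

# Before restarting: restrict shard allocation to primaries (replica allocation is paused)
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

# Flush indices (POST /_flush/synced was deprecated in 7.6 and removed in 8.0)
POST /_flush

# Restart the nodes...

# Re-enable shard allocation
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}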
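
After allocation is re-enabled, it usually makes sense to wait until the cluster reports green before restarting the next node. The Python sketch below polls a health source until that happens. wait_for_green is a hypothetical helper, and fetch_health is an injected callable assumed to return the parsed /_cluster/health response, so the sketch runs without a live cluster.

```python
import time


def wait_for_green(fetch_health, attempts=30, delay_seconds=2.0):
    """Poll fetch_health() until status is green; return False on giving up."""
    for _ in range(attempts):
        if fetch_health().get("status") == "green":
            return True
        time.sleep(delay_seconds)
    return False


# Usage with canned responses standing in for a real cluster.
responses = iter([{"status": "yellow"}, {"status": "yellow"}, {"status": "green"}])
print(wait_for_green(lambda: next(responses), attempts=5, delay_seconds=0))
```

The health API can also block on the server side with GET /_cluster/health?wait_for_status=green&timeout=50s, which avoids client-side polling.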

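Snapshot and Restore

Creating a Repository

# Create a shared file system repository
# (location must be under a path listed in path.repo in elasticsearch.yml)
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/my_backup",
    "compress": true
  }
}

# Create an S3 repository
PUT /_snapshot/s3_backup
{
  "type": "s3",
  "settings": {
    "bucket": "my-bucket",
    "region": "us-east-1"
  }
}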

Creating Snapshots

# Snapshot all indices
PUT /_snapshot/my_backup/snapshot_1

# Snapshot specific indices
PUT /_snapshot/my_backup/snapshot_2
{
  "indices": "articles,products",
  "ignore_unavailable": true
}

# Check snapshot progress
GET /_snapshot/my_backup/snapshot_1/_status

Restoring Snapshots

# Restore all indices
POST /_snapshot/my_backup/snapshot_1/_restore

# Restore specific indices under new names
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": "articles",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1"
}

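Summary

In this chapter we covered:

  1. Cluster architecture and node types
  2. Cluster configuration and deployment
  3. Shard management
  4. Cluster monitoring
  5. Troubleshooting
  6. Snapshot and restore

Exercises

  1. Configure a three-node Elasticsearch cluster
  2. Use the Cat API to monitor cluster status
  3. Configure a snapshot repository and create a snapshot
  4. Simulate a node failure and recover from it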
