Cluster Management
This chapter covers how to configure and manage an Elasticsearch cluster.
Cluster Architecture
Node Types
┌─────────────────────────────────────────────────────────────────┐
│                      Elasticsearch cluster                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Master node          Data node            Coordinating node    │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │ Manage cluster  │  │ Store data      │  │ Handle requests │  │
│  │ Create indices  │  │ Run searches    │  │ Route requests  │  │
│  │ Allocate shards │  │ Aggregate data  │  │ Merge results   │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
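| Node type | Configuration | Description |
|---|---|---|
| Master | node.roles: [master] | Manages cluster state; creates and deletes indices |
| Data | node.roles: [data] | Stores data; runs searches and aggregations |
| Ingest | node.roles: [ingest] | Runs ingest pipelines that pre-process documents |
| Coordinating | node.roles: [] | Receives requests, routes them to the right nodes, and merges the results |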
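As a quick way to read a `node.roles` value against the table above, here is a minimal Python sketch. The function name `describe_roles` and the wording of the duties are illustrative assumptions, not part of Elasticsearch.

```python
def describe_roles(roles):
    """Return the duties implied by a node.roles list, per the table above."""
    duties = {
        "master": "manages cluster state",
        "data": "stores data and runs searches",
        "ingest": "runs ingest pipelines",
    }
    if not roles:
        # An empty list leaves only the coordinating duty.
        return ["coordinating only: routes requests and merges results"]
    # Roles outside the table are reported rather than guessed at.
    return [duties.get(role, "not covered in this chapter") for role in roles]


print(describe_roles(["master", "data"]))
print(describe_roles([]))
```

Every node can coordinate requests; the empty list simply removes all other duties.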
Cluster Health Status
GET /_cluster/health
# Response
{
"cluster_name": "my-cluster",
"status": "green", # green/yellow/red
"number_of_nodes": 3,
"number_of_data_nodes": 2,
"active_primary_shards": 10,
"active_shards": 20,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0
}
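Status values:
| Status | Meaning |
|---|---|
| green | All primary and replica shards are assigned |
| yellow | All primary shards are assigned, but one or more replicas are not |
| red | One or more primary shards are unassigned |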
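The status table can be turned into a small check that a monitoring script might run on the parsed `GET /_cluster/health` response. This is a sketch only: `summarize_health` and its message strings are hypothetical, and the input is assumed to be the response body already decoded into a dict.

```python
def summarize_health(health):
    """Translate a parsed /_cluster/health response into a one-line summary."""
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "ok: all shards assigned"
    if status == "yellow":
        return f"warning: replicas not fully assigned ({unassigned} unassigned)"
    if status == "red":
        return f"critical: primary shards missing ({unassigned} unassigned)"
    raise ValueError(f"unexpected status: {status!r}")


sample = {"cluster_name": "my-cluster", "status": "green", "unassigned_shards": 0}
print(summarize_health(sample))
```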
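Cluster Configuration
Basic Configuration
# elasticsearch.yml
# Cluster name (must be identical on every node in the cluster)
cluster.name: my-application
# Node name
node.name: node-1
# Node roles
node.roles: [master, data]
# Network settings
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
# Discovery settings
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
# Memory settings (lock the JVM heap in RAM to prevent swapping)
bootstrap.memory_lock: true
Three-Node Cluster Example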
Node 1 configuration (master + data):
cluster.name: my-cluster
node.name: node-1
node.roles: [master, data]
network.host: 192.168.1.10
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
# List only master-eligible nodes here; node-3 is data-only
cluster.initial_master_nodes: ["node-1", "node-2"]
Node 2 configuration (master + data):
cluster.name: my-cluster
node.name: node-2
node.roles: [master, data]
network.host: 192.168.1.11
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
# List only master-eligible nodes here; node-3 is data-only
cluster.initial_master_nodes: ["node-1", "node-2"]
Node 3 configuration (data only):
cluster.name: my-cluster
node.name: node-3
node.roles: [data]
network.host: 192.168.1.12
discovery.seed_hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
# cluster.initial_master_nodes is omitted: it only applies to master-eligible nodes
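The example above has only two master-eligible nodes (node-1 and node-2). Electing a master needs a majority of the voting nodes, so it is worth checking how many failures a given layout can absorb. The sketch below is a simplified model of that arithmetic; the function name is hypothetical, and Elasticsearch manages the actual voting configuration itself.

```python
def master_fault_tolerance(master_eligible):
    """Return (majority needed, failures tolerated) for N master-eligible nodes."""
    majority = master_eligible // 2 + 1
    tolerated = master_eligible - majority
    return majority, tolerated


for n in (2, 3, 5):
    majority, tolerated = master_fault_tolerance(n)
    print(f"{n} master-eligible: majority={majority}, tolerated failures={tolerated}")
```

Under this model two master-eligible nodes tolerate no failure, which is why three is the usual minimum for a resilient cluster.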
Shard Management
Shard Allocation Settings
# View allocation settings
GET /_cluster/settings?include_defaults=true&filter_path=**.allocation.*
# Update shard allocation settings
PUT /_cluster/settings
{
"persistent": {
"cluster.routing.allocation.enable": "all", # all/primaries/new_primaries/none
"cluster.routing.allocation.cluster_concurrent_rebalance": 2,
"cluster.routing.allocation.node_concurrent_recoveries": 2
}
}
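Shard Allocation Filtering
# Restrict shard allocation for all indices to specific nodes (cluster-wide filter).
# To pin a single index instead, set index.routing.allocation.include._name via PUT /<index>/_settings.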
PUT /_cluster/settings
{
"persistent": {
"cluster.routing.allocation.include._name": "node-1,node-2"
}
}
# Exclude a node (its shards are moved to the remaining nodes)
PUT /_cluster/settings
{
"persistent": {
"cluster.routing.allocation.exclude._name": "node-3"
}
}
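To see how the `include` and `exclude` filters combine, the following Python sketch models them on a list of node names: a node must match at least one `include` value and none of the `exclude` values. The function `eligible_nodes` is hypothetical, and `fnmatch` is used as a stand-in for Elasticsearch's simple `*` wildcard matching, which it only approximates.

```python
from fnmatch import fnmatchcase


def eligible_nodes(nodes, include="", exclude=""):
    """Filter node names the way include._name / exclude._name are described."""
    include_patterns = [p for p in include.split(",") if p]
    exclude_patterns = [p for p in exclude.split(",") if p]
    result = []
    for name in nodes:
        if include_patterns and not any(fnmatchcase(name, p) for p in include_patterns):
            continue  # include is set and the node matches none of its values
        if any(fnmatchcase(name, p) for p in exclude_patterns):
            continue  # the node matches an excluded value
        result.append(name)
    return result


nodes = ["node-1", "node-2", "node-3"]
print(eligible_nodes(nodes, include="node-1,node-2"))
print(eligible_nodes(nodes, exclude="node-3"))
```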
Moving Shards Manually
POST /_cluster/reroute
{
"commands": [
{
"move": {
"index": "articles",
"shard": 0,
"from_node": "node-1",
"to_node": "node-2"
}
}
]
}
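When several shards have to be moved, building the reroute body in code avoids hand-editing JSON. A minimal sketch, assuming the body is later sent to `POST /_cluster/reroute` by whatever HTTP client you use; `move_command` and `reroute_body` are hypothetical helper names.

```python
import json


def move_command(index, shard, from_node, to_node):
    """Build one "move" command for the reroute API."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def reroute_body(commands):
    """Wrap a list of commands in the request body shape shown above."""
    return {"commands": list(commands)}


body = reroute_body(move_command("articles", n, "node-1", "node-2") for n in (0, 1))
print(json.dumps(body, indent=2))
```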
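Cluster Monitoring
Cluster State APIs
# Cluster health
GET /_cluster/health?pretty
# Cluster state
GET /_cluster/state?pretty
# Cluster statistics
GET /_cluster/stats?pretty
# Node statistics
GET /_nodes/stats?pretty
Cat APIs
# List nodes
GET /_cat/nodes?v
# List shards
GET /_cat/shards?v
# List indices
GET /_cat/indices?v
# Show shard allocation and disk usage per node
GET /_cat/allocation?v
# Show shard recovery status
GET /_cat/recovery?v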
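The Cat APIs are meant for people reading a terminal; for scripts, adding `format=json` to a request such as `GET /_cat/shards?format=json` returns the same rows as a JSON array. The sketch below filters such rows for unassigned shards. The sample rows are made up for illustration, and `unassigned_shards` is a hypothetical helper.

```python
def unassigned_shards(rows):
    """Return (index, shard number, kind) for every UNASSIGNED row."""
    found = []
    for row in rows:
        if row["state"] == "UNASSIGNED":
            kind = "primary" if row["prirep"] == "p" else "replica"
            found.append((row["index"], int(row["shard"]), kind))
    return found


# Illustrative rows in the shape of GET /_cat/shards?format=json
rows = [
    {"index": "articles", "shard": "0", "prirep": "p", "state": "STARTED", "node": "node-1"},
    {"index": "articles", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "node": None},
]
print(unassigned_shards(rows))
```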
Cluster Settings
# View settings
GET /_cluster/settings?include_defaults=true
# Update a setting dynamically (persistent: survives cluster restarts)
PUT /_cluster/settings
{
"persistent": {
"indices.recovery.max_bytes_per_sec": "100mb"
}
}
# Transient setting (overrides the persistent value; lost after a full cluster restart)
PUT /_cluster/settings
{
"transient": {
"indices.recovery.max_bytes_per_sec": "50mb"
}
}
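When the same key is set in more than one place, the transient value wins over the persistent one, which wins over `elasticsearch.yml`, which wins over the built-in default. The following sketch models that lookup order with plain dicts; `effective_setting` is a hypothetical name, and the default value used in the example is an assumption for illustration only.

```python
def effective_setting(key, transient, persistent, config_file, default=None):
    """Resolve a setting using transient > persistent > elasticsearch.yml > default."""
    for source in (transient, persistent, config_file):
        if key in source:
            return source[key]
    return default


key = "indices.recovery.max_bytes_per_sec"
persistent = {key: "100mb"}
transient = {key: "50mb"}

print(effective_setting(key, transient, persistent, {}, default="40mb"))
# After a full cluster restart the transient value is gone:
print(effective_setting(key, {}, persistent, {}, default="40mb"))
```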
Troubleshooting
Node Failure
# List shards sorted by state to spot UNASSIGNED ones
GET /_cat/shards?v&s=state
# Explain why a shard is unassigned
GET /_cluster/allocation/explain
{
"index": "articles",
"shard": 0,
"primary": true
}
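# Retry shards whose allocation failed too many times (safe to run).
# The commands that truly force allocation, allocate_stale_primary and allocate_empty_primary, can lose data.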
POST /_cluster/reroute?retry_failed=true
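Shard Recovery
# View recovery progress
GET /_cat/recovery?v&active_only=true
# Cancel a recovery (replicas only by default; cancelling a primary also needs "allow_primary": true)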
POST /_cluster/reroute
{
"commands": [
{
"cancel": {
"index": "articles",
"shard": 0,
"node": "node-1"
}
}
]
}
Cluster Restart
# Before restarting: limit allocation to primaries so replicas are not rebuilt while nodes are down
PUT /_cluster/settings
{
"persistent": {
"cluster.routing.allocation.enable": "primaries"
}
}
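# Flush indices to disk (POST /_flush/synced was deprecated in 7.6 and removed in 8.0)
POST /_flush
# Restart the nodes...
# Re-enable shard allocation once the nodes have rejoined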
PUT /_cluster/settings
{
"persistent": {
"cluster.routing.allocation.enable": "all"
}
}
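The restart procedure is easy to get out of order, so it can help to write it down as data before automating it. The sketch below only builds and prints the sequence; it does not contact a cluster. `restart_steps` and the `MANUAL` marker are hypothetical, and the flush step uses `POST /_flush` on the assumption that the cluster runs a version without synced flush.

```python
import json


def restart_steps():
    """Return the restart procedure as an ordered list of (method, target, body)."""
    def allocation(value):
        body = {"persistent": {"cluster.routing.allocation.enable": value}}
        return ("PUT", "/_cluster/settings", body)

    return [
        allocation("primaries"),                  # stop replica reallocation
        ("POST", "/_flush", None),                # persist in-memory operations
        ("MANUAL", "restart node, wait for it to rejoin", None),
        allocation("all"),                        # let replicas recover again
    ]


for method, target, body in restart_steps():
    print(method, target, json.dumps(body) if body else "")
```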
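Snapshot and Restore
Creating a Repository
# Create a shared file system repository (the location must be listed under path.repo in elasticsearch.yml)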
PUT /_snapshot/my_backup
{
"type": "fs",
"settings": {
"location": "/mount/backups/my_backup",
"compress": true
}
}
# Create an S3 repository
PUT /_snapshot/s3_backup
{
"type": "s3",
"settings": {
"bucket": "my-bucket",
"region": "us-east-1"
}
}
Creating a Snapshot
# Snapshot all indices
PUT /_snapshot/my_backup/snapshot_1
# Snapshot specific indices
PUT /_snapshot/my_backup/snapshot_2
{
"indices": "articles,products",
"ignore_unavailable": true
}
# Check snapshot progress
GET /_snapshot/my_backup/snapshot_1/_status
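A script that waits for a snapshot can derive a rough progress figure from the `_status` response. The sketch below assumes the response contains a `snapshots` array whose entries carry `state` and a `shards_stats` object with `done` and `total` counters; verify those field names against your Elasticsearch version. The sample data and `snapshot_progress` are illustrative.

```python
def snapshot_progress(status_response):
    """Return (state, percent of shards done) for the first snapshot listed."""
    snapshot = status_response["snapshots"][0]
    stats = snapshot["shards_stats"]
    total = stats["total"]
    percent = 100.0 * stats["done"] / total if total else 0.0
    return snapshot["state"], round(percent, 1)


# Illustrative response fragment, not captured from a real cluster
sample_status = {
    "snapshots": [
        {"snapshot": "snapshot_1", "state": "STARTED",
         "shards_stats": {"done": 2, "failed": 0, "total": 4}}
    ]
}
print(snapshot_progress(sample_status))
```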
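Restoring a Snapshot
# Restore all indices (an open index with the same name must be closed or deleted first)
POST /_snapshot/my_backup/snapshot_1/_restore
# Restore specific indices, renaming them during the restore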
POST /_snapshot/my_backup/snapshot_1/_restore
{
"indices": "articles",
"ignore_unavailable": true,
"include_global_state": false,
"rename_pattern": "(.+)",
"rename_replacement": "restored_$1"
}
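To preview what `rename_pattern` and `rename_replacement` will produce before running a restore, the same substitution can be tried locally. Elasticsearch applies the pattern as a Java regular expression with `$1`-style group references, so this Python sketch converts those to Python's `\g<1>` form first. It is an approximation: the two regex dialects differ in corner cases, and `restored_name` is a hypothetical helper.

```python
import re


def restored_name(index, rename_pattern, rename_replacement):
    """Apply a restore rename rule to an index name."""
    # Convert Java-style group references ($1) to Python-style (\g<1>).
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, py_replacement, index)


print(restored_name("articles", "(.+)", "restored_$1"))
```

With the settings from the request above, `articles` is restored as `restored_articles`, leaving the existing index untouched.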
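Summary
In this chapter we covered:
- Cluster architecture and node types
- Cluster configuration and deployment
- Shard management
- Cluster monitoring
- Troubleshooting
- Snapshot and restore
Exercises
- Configure a three-node Elasticsearch cluster
- Monitor cluster state with the Cat APIs
- Configure a snapshot repository and create a snapshot
- Simulate a node failure and recover from it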