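Query Syntax

Elasticsearch provides the Query DSL (Domain Specific Language), a JSON-based query language that covers everything from simple lookups to complex search scenarios. This chapter walks through each part of the query syntax so that you can write Elasticsearch queries with confidence.

Understanding query context and filter context

Before looking at specific queries, two core concepts need to be clear: query context and filter context.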

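Query context

In query context, a query clause computes a relevance score (_score) for every matching document; the higher the score, the better the document matches the query. Query context answers the question: "How well does this document match the query?"

Query context is suited to:

Full-text search
Results that need to be ranked by relevance
Any scenario that needs a score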

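Filter context

In filter context, a query clause only decides whether a document matches; no score is computed. Filter context answers the question: "Does this document match the query?"

Advantages of filter context:

Faster: no scoring is needed
Cacheable: Elasticsearch automatically caches frequently used filters, which speeds up later queries
A good fit for structured data: date ranges, exact matches, and so on

Side-by-side example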

GET /articles/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Elasticsearch" } }    # query context: scored
      ],
      "filter": [
        { "term": { "status": "published" } },       # filter context: not scored
        { "range": { "views": { "gte": 100 } } }     # filter context: not scored
      ]
    }
  }
}
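
The same request body can be assembled in application code. The sketch below is only an illustration: the helper name build_search_body and its parameters are hypothetical, and it only builds the JSON body without sending it to a cluster.

```python
import json


def build_search_body(text, status, min_views):
    """Build a bool query: `match` runs in query context, `term`/`range` in filter context."""
    return {
        "query": {
            "bool": {
                # Query context: this clause contributes to _score.
                "must": [{"match": {"title": text}}],
                # Filter context: these clauses only include or exclude documents.
                "filter": [
                    {"term": {"status": status}},
                    {"range": {"views": {"gte": min_views}}},
                ],
            }
        }
    }


body = build_search_body("Elasticsearch", "published", 100)
print(json.dumps(body, indent=2, ensure_ascii=False))
```

Keeping the scored clause and the filters in separate lists makes it easy to add further filters later without changing how documents are ranked.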

Basic query structure

A complete search request usually contains the following parts:

GET /articles/_search
{
  "query": {
    "match": {
      "title": "Elasticsearch"
    }
  },
  "from": 0,              # pagination offset
  "size": 10,             # number of documents to return
  "sort": [               # sort rules
    { "created_at": "desc" },
    { "_score": "desc" }
  ],
  "_source": [            # fields to return
    "title",
    "author",
    "views"
  ],
  "highlight": {          # highlighting
    "fields": {
      "title": {}
    }
  }
}

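Full-text queries

Full-text queries search text fields; the query string is run through an analyzer before matching.

match query

match is the most commonly used full-text query and suits standard full-text search: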

GET /articles/_search
{
  "query": {
    "match": {
      "title": "Elasticsearch 教程"
    }
  }
}

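How it works:

The analyzer splits the query string "Elasticsearch 教程" into terms, for example ["elasticsearch", "教程"] with a Chinese-aware analyzer such as IK (the default standard analyzer would instead split the Chinese text into single characters)
The inverted index is searched for documents that contain any of the terms
A relevance score is computed for each matching document
The highest-scoring documents are returned

Advanced parameters: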

GET /articles/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Elasticsearch 教程",
        "operator": "and",              # or "or" (the default)
        "minimum_should_match": "75%",  # minimum share of terms that must match
        "analyzer": "ik_max_word"       # analyzer for the query text (requires the IK plugin)
      }
    }
  }
}

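The operator parameter:

or (default): a document matches if any term matches, like SQL OR
and: every term must match, like SQL AND

The minimum_should_match parameter:

Gives finer control over how many terms must match, and accepts several formats:

"75%": at least 75% of the terms must match (the computed number is rounded down)
"2": at least 2 terms must match
"2<75%": with 2 terms or fewer, all must match; with more than 2 terms, 75% must match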

match_phrase query

match_phrase matches an exact phrase: the terms must appear in the same order:

GET /articles/_search
{
  "query": {
    "match_phrase": {
      "title": "Elasticsearch 入门教程"
    }
  }
}

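This query only matches documents whose title contains the phrase "Elasticsearch 入门教程", with its terms adjacent and in that order.

The slop parameter allows gaps between the terms: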

GET /articles/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "Elasticsearch 教程",
        "slop": 2    # allow up to 2 positions between the terms
      }
    }
  }
}

This query can match titles such as "Elasticsearch 入门教程" and "Elasticsearch 高级教程", where an extra word sits between the two terms.

multi_match query

multi_match searches several fields at once:

GET /articles/_search
{
  "query": {
    "multi_match": {
      "query": "Python 教程",
      "fields": ["title", "content", "author"]
    }
  }
}

Field weights: use the ^ suffix to boost a specific field:

GET /articles/_search
{
  "query": {
    "multi_match": {
      "query": "Python 教程",
      "fields": ["title^3", "content"],  # title is weighted 3x
      "type": "best_fields"
    }
  }
}

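The type parameter:

Type	Description	Typical use
`best_fields`	Uses the score of the best-matching field (default)	All terms are best found together in one field
`most_fields`	Adds up the scores of all matching fields	The same text indexed in several ways (synonyms, stemming)
`cross_fields`	Matches across fields, treating them as one large field	Names, addresses, and similar
`phrase`	Runs match_phrase on each field	Phrase search
`phrase_prefix`	Runs match_phrase_prefix on each field	Search-as-you-type
`bool_prefix`	Runs match_bool_prefix on each field	Search-as-you-type without a fixed term order

cross_fields example: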

GET /users/_search
{
  "query": {
    "multi_match": {
      "query": "张 三",
      "fields": ["first_name", "last_name"],
      "type": "cross_fields",
      "operator": "and"
    }
  }
}

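This query finds users for whom both terms, "张" and "三", appear in at least one of the listed fields: for example first_name "张" with last_name "三", or the other way round.

query_string query

query_string supports Lucene query syntax. It is powerful, but the syntax is complex and an invalid query string returns an error: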

GET /articles/_search
{
  "query": {
    "query_string": {
      "query": "title:(Elasticsearch OR Python) AND author:张三",
      "default_field": "content"
    }
  }
}

Common syntax:

title:Python: search the title field
title:"Python 教程": exact phrase
title:(Python OR Java): boolean OR
title:(Python AND 教程): boolean AND
title:(Python NOT Java): exclusion
title:Py*: wildcard
title:/py[a-z]+/: regular expression

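simple_query_string query

A more forgiving alternative to query_string: it does not fail on syntax errors and instead ignores the invalid parts of the query string: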

GET /articles/_search
{
  "query": {
    "simple_query_string": {
      "query": "Python +教程 -Java",
      "fields": ["title", "content"],
      "default_operator": "and"
    }
  }
}

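Syntax:

+ means AND
| means OR
- negates a single term (NOT)
" wraps a phrase
* at the end of a term means a prefix query

Exact (term-level) queries

Exact queries target structured data; the query value is not analyzed.

term query

The term query matches a single exact value: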

# Match a keyword field
GET /articles/_search
{
  "query": {
    "term": {
      "status": "published"
    }
  }
}

# Match a numeric field
GET /articles/_search
{
  "query": {
    "term": {
      "views": 1000
    }
  }
}

# Match a date field
GET /articles/_search
{
  "query": {
    "term": {
      "created_at": "2024-01-15"
    }
  }
}

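Important: the term query does not analyze the query value. Using term on a text field can give surprising results. For example, if a document's title is "Elasticsearch 教程", the index stores the analyzed terms (such as ["elasticsearch", "教程"]), so a term query for "Elasticsearch 教程" finds nothing.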
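
A toy model can make the difference concrete. The sketch below is an assumption-laden illustration, not how Elasticsearch is implemented: toy_analyze simply lowercases and splits on whitespace, and the function names are hypothetical.

```python
def toy_analyze(text):
    """Toy analyzer: lowercase and split on whitespace (real analyzers do far more)."""
    return text.lower().split()


def build_index(docs):
    """Build a tiny inverted index: token -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in toy_analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_lookup(index, value):
    """Like `term`: look the value up verbatim, with no analysis."""
    return index.get(value, set())


def match_lookup(index, text):
    """Like `match` with the default `or` operator: analyze, then union the postings."""
    hits = set()
    for token in toy_analyze(text):
        hits |= index.get(token, set())
    return hits


index = build_index({1: "Elasticsearch 教程"})
print(term_lookup(index, "Elasticsearch 教程"))   # set(): no such single term in the index
print(term_lookup(index, "elasticsearch"))        # {1}
print(match_lookup(index, "Elasticsearch 教程"))  # {1}
```

The verbatim lookup fails because the index only contains the individual analyzed tokens, which is why term is normally used on keyword fields.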

terms query

The terms query matches any of several values, like SQL IN:

GET /articles/_search
{
  "query": {
    "terms": {
      "category": ["Python", "Java", "Go"]
    }
  }
}

Looking up the terms from another index:

GET /articles/_search
{
  "query": {
    "terms": {
      "author_id": {
        "index": "authors",
        "id": "1",
        "path": "favorite_author_ids"
      }
    }
  }
}

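This query reads the favorite_author_ids field of the document with ID "1" in the authors index and uses those values as the terms to match.

range query

The range query matches values within a range:

# Numeric range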
GET /articles/_search
{
  "query": {
    "range": {
      "views": {
        "gte": 1000,      # greater than or equal to (use "gt" for strictly greater than)
        "lt": 10000,      # less than (use "lte" for less than or equal to)
        "boost": 2.0      # score multiplier
      }
    }
  }
}

# Date range
GET /articles/_search
{
  "query": {
    "range": {
      "created_at": {
        "gte": "2024-01-01",
        "lt": "2024-02-01",
        "time_zone": "+08:00"
      }
    }
  }
}

# Relative dates
GET /articles/_search
{
  "query": {
    "range": {
      "created_at": {
        "gte": "now-7d/d",   # 7 days ago, rounded down to the start of the day
        "lt": "now/d"        # before the start of today
      }
    }
  }
}

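Date math expressions:

now: the current time
now+1h: 1 hour from now
now-1d: 1 day ago
now-1w: 1 week ago
now-1M: 1 month ago
/d: round to the day (down to the start of the day for gte and lt)
/M: round to the month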
exists query

The exists query checks whether a field has an indexed, non-null value:

# Documents that have an author field
GET /articles/_search
{
  "query": {
    "exists": {
      "field": "author"
    }
  }
}

# Documents that lack an author field
GET /articles/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "author"
        }
      }
    }
  }
}

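A field counts as missing when:

The field is absent from the document
The field value is null
The field value is an empty array []
The field value is [null]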
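
The rules above can be sketched as a small predicate over a JSON-like document. This is only an approximation for top-level fields: the function name field_exists is hypothetical, and mapping settings that stop a value from being indexed are not modeled.

```python
def field_exists(doc, field):
    """Approximate the `exists` rules for a top-level field of a JSON-like document."""
    if field not in doc:
        return False                      # field is absent
    value = doc[field]
    values = value if isinstance(value, list) else [value]
    # null, [] and [null] all count as missing; one non-null value is enough.
    return any(v is not None for v in values)


print(field_exists({"title": "t"}, "author"))          # False
print(field_exists({"author": None}, "author"))        # False
print(field_exists({"author": []}, "author"))          # False
print(field_exists({"author": [None]}, "author"))      # False
print(field_exists({"author": [None, "张三"]}, "author"))  # True
```

Note that an array mixing null with a real value still counts as existing, because at least one value is non-null.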

ids query

The ids query looks documents up by their ID:

GET /articles/_search
{
  "query": {
    "ids": {
      "values": ["1", "2", "3"]
    }
  }
}

prefix query

The prefix query matches terms that start with the given prefix:

GET /articles/_search
{
  "query": {
    "prefix": {
      "title.keyword": "Python"
    }
  }
}

wildcard query

The wildcard query matches terms against a wildcard pattern:

GET /articles/_search
{
  "query": {
    "wildcard": {
      "title.keyword": {
        "value": "Py*n",
        "boost": 1.0
      }
    }
  }
}

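Wildcards:

*: matches any sequence of characters, including none
?: matches any single character

Performance warning: wildcard queries can be slow, especially patterns that begin with a wildcard (such as *thon); avoid them where possible.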
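
The two wildcard characters can be illustrated by translating a pattern into a regular expression. This is a sketch of the matching semantics only, under the assumption that the pattern is compared against a whole term; the helper names are hypothetical, and backslash escaping is not handled.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern: `*` -> any sequence, `?` -> exactly one character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))   # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    """Return True when the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("Py*n", "Python"))    # True
print(wildcard_match("Py*n", "Pyn"))       # True: * may match nothing
print(wildcard_match("Py?hon", "Python"))  # True
print(wildcard_match("Py*n", "python"))    # False: matching here is case-sensitive
```

A leading wildcard gives the matcher no fixed prefix to narrow the candidate terms, which is the intuition behind the performance warning.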

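regexp query

The regexp query matches terms against a regular expression. The pattern must match the whole term, and on a keyword field matching is case-sensitive by default: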

GET /articles/_search
{
  "query": {
    "regexp": {
      "title.keyword": "py[a-z]+"
    }
  }
}

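Compound queries

Compound queries combine multiple query clauses.

bool query

The bool query is the most commonly used compound query; it combines several query clauses: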

GET /articles/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Python" } }
      ],
      "must_not": [
        { "term": { "status": "draft" } }
      ],
      "should": [
        { "term": { "category": "技术" } },
        { "term": { "tags": "教程" } }
      ],
      "filter": [
        { "range": { "views": { "gte": 100 } } }
      ],
      "minimum_should_match": 1
    }
  }
}

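The four clauses:

Clause	Description	Scored	Cacheable
`must`	Must match; contributes to the score	Yes	No
`must_not`	Must not match	No	Yes
`should`	Optional match; raises the score	Yes	No
`filter`	Must match; does not contribute to the score	No	Yes

The minimum_should_match parameter:

When the bool query has a must or filter clause, the should clauses are optional (the default is 0). When it has only should clauses, at least one of them must match (the default is 1). minimum_should_match sets the minimum explicitly:

# At least 2 of the should clauses must match
"minimum_should_match": 2

# At least 75% of the should clauses must match
"minimum_should_match": "75%"

How should clauses behave:

# Case 1: only should clauses, so at least 1 must match (or the number given by minimum_should_match)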
{
  "query": {
    "bool": {
      "should": [
        { "term": { "tags": "热门" } },
        { "term": { "tags": "推荐" } }
      ]
    }
  }
}

# Case 2: with must or filter present, should only affects the score, not whether a document matches
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Python" } }
      ],
      "should": [
        { "term": { "tags": "热门" } }
      ]
    }
  }
}
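
The defaults in the two cases, and the numeric and percentage formats of minimum_should_match, can be sketched as follows. This is a simplified model, not the Elasticsearch implementation: both function names are hypothetical, and only positive integers, positive percentages, and a single "N<spec" condition are handled.

```python
def resolve_minimum_should_match(spec, clause_count):
    """Resolve a positive integer, "P%", or one conditional "N<spec" to a clause count."""
    spec = str(spec).strip()
    if "<" in spec:
        threshold, rest = spec.split("<", 1)
        if clause_count <= int(threshold):
            return clause_count           # at or below the threshold: all are required
        return resolve_minimum_should_match(rest, clause_count)
    if spec.endswith("%"):
        return clause_count * int(spec[:-1]) // 100   # percentages round down
    return int(spec)


def effective_minimum_should_match(bool_query):
    """Return how many should clauses must match for a bool query given as a dict."""
    should = bool_query.get("should", [])
    if "minimum_should_match" in bool_query:
        return resolve_minimum_should_match(bool_query["minimum_should_match"], len(should))
    has_required = bool(bool_query.get("must") or bool_query.get("filter"))
    # Default: 0 when must/filter is present (or there is no should), otherwise 1.
    return 0 if has_required or not should else 1


hot = {"term": {"tags": "热门"}}
recommended = {"term": {"tags": "推荐"}}
print(effective_minimum_should_match({"should": [hot, recommended]}))            # 1
print(effective_minimum_should_match({"must": [{"match_all": {}}], "should": [hot]}))  # 0
print(resolve_minimum_should_match("75%", 4))                                    # 3
```

Setting minimum_should_match explicitly overrides the default in both cases, which is how a should clause can be made mandatory even when must is present.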

boosting query

The boosting query lowers the score of certain documents instead of excluding them outright:

GET /articles/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": { "title": "Python" }
      },
      "negative": {
        "term": { "category": "广告" }
      },
      "negative_boost": 0.5
    }
  }
}

Documents that also match the negative query have their score multiplied by negative_boost (0.5 here), which lowers their rank without excluding them.
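
The effect on ranking can be shown with a small calculation. The scores and document names below are made up for illustration, and boosting_score is a hypothetical helper that applies only the multiplication described above.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Multiply the positive score by negative_boost when the negative query matches."""
    return positive_score * negative_boost if matches_negative else positive_score


# (name, score from the positive query, matches the negative query?) -- made-up values
candidates = [("ad", 4.0, True), ("tutorial", 3.0, False)]

ranked = sorted(
    ((name, boosting_score(score, negative, 0.5)) for name, score, negative in candidates),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)  # [('tutorial', 3.0), ('ad', 2.0)]
```

The advertisement started with the higher score but drops below the tutorial after demotion, while still remaining in the result set.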

constant_score query

The constant_score query skips relevance scoring: every matching document gets a score equal to the given boost (1.0 by default):

GET /articles/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "status": "published" }
      },
      "boost": 1.2
    }
  }
}

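Suited to cases where no scoring is needed, such as exact filtering.

dis_max query

The dis_max query uses the score of the best-matching sub-query instead of adding the sub-query scores together: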

GET /articles/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Python" } },
        { "match": { "content": "Python" } }
      ],
      "tie_breaker": 0.3
    }
  }
}

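The tie_breaker parameter controls how much the other sub-queries contribute:

0: only the highest score counts (the default)
1: all sub-query scores are added together (similar to bool should)
0 < tie_breaker < 1: highest score + tie_breaker × (sum of the other scores)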
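
The formula can be checked with a few numbers. The helper dis_max_score below is a hypothetical name, and the sub-query scores are made-up values rather than real BM25 output.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Highest sub-query score plus tie_breaker times the sum of the remaining scores."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)


scores = [2.0, 1.0]   # e.g. the title match and the content match
print(dis_max_score(scores, 0.0))  # 2.0: only the best sub-query counts
print(dis_max_score(scores, 1.0))  # 3.0: plain sum of all sub-queries
print(dis_max_score(scores, 0.3))  # 2.0 + 0.3 * 1.0
```

A small tie_breaker keeps the best field dominant while still rewarding documents that match in more than one field.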

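Nested queries

nested query

When a document holds an array of objects mapped with the nested field type, use the nested query to search those objects:

# Assumed document structure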
{
  "title": "文章标题",
  "comments": [
    { "user": "张三", "content": "好文章" },
    { "user": "李四", "content": "学习了" }
  ]
}

# Query the nested objects
GET /articles/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "match": { "comments.user": "张三" } },
            { "match": { "comments.content": "好文章" } }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}

inner_hits: returns the nested documents that matched, so you can see which nested object satisfied the query.
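
Why nested objects need their own query can be seen by comparing two matching strategies on the same comments array. The sketch below is a toy model with hypothetical function names; it uses exact string equality in place of full-text matching.

```python
comments = [
    {"user": "张三", "content": "学习了"},
    {"user": "李四", "content": "好文章"},
]


def flattened_match(comments, user, content):
    """Plain object arrays: field values are pooled across all objects in the array."""
    users = {c["user"] for c in comments}
    contents = {c["content"] for c in comments}
    return user in users and content in contents


def nested_match(comments, user, content):
    """nested semantics: both conditions must hold within the same object."""
    return any(c["user"] == user and c["content"] == content for c in comments)


# 张三 never wrote "好文章", but the pooled view cannot tell.
print(flattened_match(comments, "张三", "好文章"))  # True (false positive)
print(nested_match(comments, "张三", "好文章"))     # False
print(nested_match(comments, "李四", "好文章"))     # True
```

The pooled view loses the link between user and content, which is the association that the nested type and the nested query preserve.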

Highlighting

Highlight the matched keywords in the search results:

GET /articles/_search
{
  "query": {
    "match": { "content": "Elasticsearch" }
  },
  "highlight": {
    "pre_tags": ["<em>"],
    "post_tags": ["</em>"],
    "fields": {
      "content": {
        "fragment_size": 150,        # fragment size in characters
        "number_of_fragments": 3     # maximum number of fragments to return
      },
      "title": {}
    }
  }
}

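Highlighter types:

unified (default): built on the Lucene Unified Highlighter; splits text into sentences and scores them with BM25
plain: the standard Lucene highlighter
fvh: the Fast Vector Highlighter; requires term_vector set to with_positions_offsets on the field

Sorting

Basic sorting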

GET /articles/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "views": "desc" },
    { "created_at": "desc" }
  ]
}

Sorting by relevance score

GET /articles/_search
{
  "query": {
    "match": { "title": "Python" }
  },
  "sort": [
    "_score",
    { "views": "desc" }
  ]
}

Sorting by script

GET /articles/_search
{
  "query": { "match_all": {} },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "source": "doc['likes'].value + doc['views'].value"
      },
      "order": "desc"
    }
  }
}

Handling missing values

GET /articles/_search
{
  "query": { "match_all": {} },
  "sort": [
    {
      "views": {
        "order": "desc",
        "missing": "_last"    # documents without the field sort last
      }
    }
  ]
}

Pagination

Basic pagination

GET /articles/_search
{
  "query": { "match_all": {} },
  "from": 0,    # offset (starts at 0)
  "size": 10    # page size
}

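The deep pagination problem

By default, from + size may not exceed 10,000 (the index.max_result_window setting); a request beyond that limit returns an error. The limit exists because deep pagination is memory-hungry: with from=10000 and size=10, every shard has to produce its top 10,010 documents, and the coordinating node then has to sort all of them.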
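
The arithmetic behind that cost is easy to reproduce. The helper below is a hypothetical back-of-the-envelope estimate; the shard count of 5 is an assumption chosen for the example, and shards holding fewer documents would return fewer entries.

```python
def coordinator_candidates(from_, size, shards):
    """Return (entries per shard, total entries the coordinating node must merge)."""
    per_shard = from_ + size          # each shard must produce its own top from+size
    return per_shard, per_shard * shards


per_shard, total = coordinator_candidates(from_=10000, size=10, shards=5)
print(per_shard)  # 10010
print(total)      # 50050 entries merged just to return 10 hits
```

The work grows with the offset rather than with the page size, which is why later pages get steadily more expensive.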

search_after pagination

search_after is the recommended way to page deep into a result set:

# Step 1: fetch the first page and note the sort values of its last hit
GET /articles/_search
{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [
    { "created_at": "desc" },
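    { "_id": "desc" }    # tiebreaker so every sort key is unique (sorting on _id may be disabled on recent versions; a unique keyword field also works)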
  ]
}

# Step 2: pass search_after to fetch the next page
GET /articles/_search
{
  "query": { "match_all": {} },
  "size": 10,
  "sort": [
    { "created_at": "desc" },
    { "_id": "desc" }
  ],
  "search_after": ["2024-01-15T10:30:00", "abc123"]
}

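Notes:

The search_after values must line up with the sort fields, in the same order; copy them from the sort array of the last hit on the previous page
A unique tiebreaker field (such as _id) is needed to keep pagination stable
Random page jumps are not supported; pages can only be read in sequence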
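
The two steps generalize to a simple loop. In the sketch below everything is an assumption made for illustration: iterate_pages expects a caller-supplied search(size, search_after) function, and make_fake_search is an in-memory stand-in for a cluster that sorts by (created_at, id) descending, so the example runs without Elasticsearch.

```python
def iterate_pages(search, page_size):
    """Yield hits page by page, feeding each page's last sort values into the next call."""
    search_after = None
    while True:
        hits = search(page_size, search_after)
        if not hits:
            return
        yield from hits
        search_after = hits[-1]["sort"]   # sort values of the last hit on this page


def make_fake_search(docs):
    """In-memory stand-in for a search call, sorted by (created_at, id) descending."""
    ordered = sorted(docs, key=lambda d: (d["created_at"], d["id"]), reverse=True)

    def search(size, search_after):
        rows = ordered
        if search_after is not None:
            # Keep only documents that sort strictly after the previous page's last hit.
            rows = [d for d in ordered if (d["created_at"], d["id"]) < tuple(search_after)]
        return [{"_id": d["id"], "sort": [d["created_at"], d["id"]]} for d in rows[:size]]

    return search


# Several documents share a created_at value, so the id tiebreaker matters.
docs = [{"id": f"doc{i}", "created_at": f"2024-01-{10 + i // 2:02d}"} for i in range(5)]
ids = [hit["_id"] for hit in iterate_pages(make_fake_search(docs), page_size=2)]
print(ids)  # ['doc4', 'doc3', 'doc2', 'doc1', 'doc0']
```

Without the id in the sort key, two documents with the same created_at could straddle a page boundary and one of them would be skipped.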

scroll query

scroll suits fetching a large amount of data in one pass, such as a data export:

# Open a scroll context and keep it alive for 1 minute
GET /articles/_search?scroll=1m
{
  "query": { "match_all": {} },
  "size": 1000
}

# Continue with the scroll_id
GET /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFl..."
}

# Clear the scroll context
DELETE /_search/scroll
{
  "scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFl..."
}

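Characteristics of scroll:

The scroll context expires after the specified time
Scrolling reads a snapshot of the data; later updates are not visible
Heavy use of scroll consumes server resources

Suggesters

term suggester (term suggestions)

Used for spelling correction: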

POST /articles/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "pyton tutoril",
      "term": {
        "field": "title"
      }
    }
  }
}

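The response contains suggested corrections for each misspelled term.

completion suggester (autocomplete)

Requires a field of type completion:

# Configure the mapping when creating the index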
PUT /articles
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "title_suggest": { "type": "completion" }
    }
  }
}

# Request suggestions
POST /articles/_search
{
  "suggest": {
    "title-suggest": {
      "prefix": "pyth",
      "completion": {
        "field": "title_suggest",
        "size": 5
      }
    }
  }
}

Worked examples

Combined search

GET /articles/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "Python 异步编程",
            "fields": ["title^2", "content"],
            "type": "best_fields"
          }
        }
      ],
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "views": { "gte": 100 } } }
      ],
      "should": [
        { "term": { "category": { "value": "Python", "boost": 2 } } },
        { "term": { "tags": "教程" } }
      ]
    }
  },
  "sort": [
    "_score",
    { "created_at": "desc" }
  ],
  "from": 0,
  "size": 20,
  "highlight": {
    "fields": {
      "title": {},
      "content": {
        "fragment_size": 150
      }
    }
  },
  "_source": ["title", "author", "summary", "created_at", "views"]
}

Filtered search with multiple conditions

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "手机"
          }
        }
      ],
      "filter": [
        {
          "terms": {
            "brand": ["Apple", "Samsung", "Huawei"]
          }
        },
        {
          "range": {
            "price": {
              "gte": 2000,
              "lte": 8000
            }
          }
        },
        {
          "term": {
            "in_stock": true
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "status": "discontinued"
          }
        }
      ]
    }
  },
  "sort": [
    { "_score": "desc" },
    { "sales": "desc" }
  ]
}

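Summary

This chapter covered the core of the Elasticsearch Query DSL:

Query context and filter context: the difference between scoring queries and filtering queries
Full-text queries: match, match_phrase, multi_match, query_string, simple_query_string
Exact (term-level) queries: term, terms, range, exists, ids, prefix, wildcard, regexp
Compound queries: bool, boosting, constant_score, dis_max
Nested queries: using the nested query
Highlighting: highlighting matched terms in search results
Sorting and pagination: multi-field sorting and solutions for deep pagination
Suggesters: spelling correction and autocomplete

Exercises

Implement a product search that supports keyword search, category filtering, and price-range filtering
Implement an article search that highlights the matched keywords
Use search_after to page beyond the 10,000-result limit
Implement spelling correction that offers suggestions when the user mistypes a query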
