Mapping Configuration

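A mapping defines the structure of documents and the types of their fields, much like a table schema in a relational database. Understanding mappings is fundamental to using Elasticsearch correctly, because the mapping determines how data is indexed, stored, and searched. This chapter covers the main mapping options and how to use them.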

Mapping Overview

Explicit Mapping vs Dynamic Mapping

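Explicit mapping: field types and properties are defined up front, when the index is created. This suits cases where the data structure is well defined.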

Dynamic mapping: Elasticsearch infers field types automatically from document content. This suits rapid prototyping or data whose structure is not fixed.

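Production systems usually use explicit mappings. They guarantee the intended data types and avoid query problems caused by incorrect automatic inference.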

Mapping Immutability

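Once a field has been mapped, its type cannot be changed. Elasticsearch has already built the index structures for that field (such as the inverted index) from the mapping, and the existing indexed data would be incompatible with a new type.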

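If you really need to change a field's type, the only option is to create a new index with the new mapping and reindex the data into it, for example with the _reindex API.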

Core Data Types

String Types

text: used for full-text search. Values are run through an analyzer and split into terms.

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

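Use cases: titles, body content, descriptions, and other fields that need full-text search. Note that ik_max_word comes from the third-party IK Analysis plugin, which must be installed separately.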

keyword: used for exact matching. The value is not analyzed and is indexed as a single term.

PUT /articles
{
  "mappings": {
    "properties": {
      "status": {
        "type": "keyword"
      },
      "tags": {
        "type": "keyword"
      }
    }
  }
}

Use cases: status, category, tags, IDs, email addresses, and other fields that need exact matching, sorting, or aggregations.

text + keyword (multi-field): a single field supports both full-text search and exact matching.

PUT /articles
{
  "mappings": {
    "properties": {
      "category": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Usage:

Full-text search: match { "category": "技术" }
Exact match: term { "category.keyword": "技术" }
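The following minimal Python sketch shows why the same value behaves differently in a text field and a keyword field. It approximates analysis with a regex split plus lowercasing, which only roughly models the real standard analyzer. The helper names are hypothetical.

```python
import re


def simple_analyze(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer: split into word runs, then lowercase
    return [t.lower() for t in re.findall(r"\w+", text)]


def index_text(value: str) -> set[str]:
    # text field: the value is analyzed into several terms
    return set(simple_analyze(value))


def index_keyword(value: str) -> set[str]:
    # keyword field: the whole value is stored as one unchanged term
    return {value}


def term_query(indexed_terms: set[str], value: str) -> bool:
    # term query: the query value is NOT analyzed
    return value in indexed_terms


def match_query(indexed_terms: set[str], query: str) -> bool:
    # match query: the query string is analyzed first; any term may match (OR)
    return any(t in indexed_terms for t in simple_analyze(query))


value = "Elasticsearch Guide"
text_terms = index_text(value)        # {"elasticsearch", "guide"}
keyword_terms = index_keyword(value)  # {"Elasticsearch Guide"}

print(match_query(text_terms, "guide"))                  # True
print(term_query(text_terms, "Elasticsearch Guide"))     # False
print(term_query(keyword_terms, "Elasticsearch Guide"))  # True
```

This is why term queries normally target the `.keyword` sub-field, while match queries target the text field.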

Numeric Types

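Type	Size	Minimum	Maximum	Typical use
`integer`	32-bit	-2^31	2^31-1	Age, quantity
`long`	64-bit	-2^63	2^63-1	IDs, timestamps
`short`	16-bit	-32,768	32,767	Small values
`byte`	8-bit	-128	127	Very small values
`float`	32-bit (IEEE 754 single)	≈ -3.4×10^38	≈ 3.4×10^38	Ratings, measurements
`double`	64-bit (IEEE 754 double)	≈ -1.8×10^308	≈ 1.8×10^308	Higher-precision floating-point values
`half_float`	16-bit (IEEE 754 half)	-65,504	65,504	Low-precision values where storage matters
`scaled_float`	64-bit (backed by long)	-2^63 ÷ scaling_factor	(2^63-1) ÷ scaling_factor	Currency amounts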

PUT /products
{
  "mappings": {
    "properties": {
      "price": {
        "type": "scaled_float",
        "scaling_factor": 100    # stored as a long: price × 100, rounded
      },
      "stock": {
        "type": "integer"
      },
      "rating": {
        "type": "half_float"
      }
    }
  }
}

Guidelines:

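For integers, choose the smallest type that fits your values. This makes indexing and search somewhat more efficient, although on-disk storage depends on the actual values rather than on the declared type
For currency amounts, prefer scaled_float to avoid floating-point precision issues
Use long for large values such as IDs and timestamps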
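The precision argument for scaled_float is easier to see in a minimal Python sketch of the encoding. Values are multiplied by scaling_factor, rounded to the nearest long, and divided again when read back. Half-up rounding is an assumption of this sketch, and encode/decode are hypothetical helper names.

```python
import math

SCALING_FACTOR = 100  # same value as "scaling_factor" in the mapping above


def encode(value: float) -> int:
    # Multiply by the factor and round to the nearest integer (half-up assumed)
    return math.floor(value * SCALING_FACTOR + 0.5)


def decode(stored: int) -> float:
    # Values are read back as doubles: stored long / factor
    return stored / SCALING_FACTOR


print(0.1 + 0.2 == 0.3)                          # False: binary floating-point error
print(encode(0.1) + encode(0.2) == encode(0.3))  # True: 10 + 20 == 30
print(encode(19.99), decode(encode(19.99)))      # 1999 19.99
print(encode(19.999))                            # 2000: extra digits are rounded away
```

The last line illustrates the trade-off: the scaling factor fixes the precision, so choose it according to the smallest unit you need (cents, for a factor of 100).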

Date Type

PUT /articles
{
  "mappings": {
    "properties": {
      "created_at": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      },
      "updated_at": {
        "type": "date"
      }
    }
  }
}

Date format notes:

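format accepts several formats separated by ||; they are tried in turn until one matches
epoch_millis: Unix timestamp in milliseconds
epoch_second: Unix timestamp in seconds
Default format: strict_date_optional_time||epoch_millis. Whatever the input format, dates are converted to UTC and stored internally as milliseconds since the epoch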
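A minimal Python sketch of how a multi-format value such as "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" is resolved. Each format is tried in order, and every accepted input becomes UTC epoch milliseconds. The strptime patterns are assumed equivalents of the Java patterns, and to_epoch_millis is a hypothetical helper.

```python
from datetime import datetime, timezone

# Assumed strptime equivalents of the Java patterns, tried in the same order
PATTERNS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"]


def to_epoch_millis(value) -> int:
    """Return UTC epoch milliseconds, trying each format in turn like a || list."""
    if isinstance(value, str):
        for pattern in PATTERNS:
            try:
                parsed = datetime.strptime(value, pattern)
            except ValueError:
                continue  # this format did not match; try the next one
            # The patterns carry no time zone, so the value is interpreted as UTC
            return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
        if value.isdigit():
            return int(value)  # epoch_millis supplied as a string of digits
    elif isinstance(value, int):
        return value  # epoch_millis supplied as a JSON number
    raise ValueError(f"cannot parse date value {value!r}")


print(to_epoch_millis("2024-01-01 08:00:00"))  # 1704096000000
print(to_epoch_millis("2024-01-01"))           # 1704067200000
print(to_epoch_millis(1704067200000))          # 1704067200000
```

Because every format normalizes to the same UTC millisecond value, a range query such as "gte": "2024-01-01" compares correctly against documents indexed in any of the accepted formats.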

Date range query:

GET /articles/_search
{
  "query": {
    "range": {
      "created_at": {
        "gte": "2024-01-01",
        "lt": "2024-02-01"
      }
    }
  }
}

Boolean Type

PUT /articles
{
  "mappings": {
    "properties": {
      "is_published": {
        "type": "boolean"
      }
    }
  }
}

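Accepted values: JSON true/false, the strings "true"/"false", and the empty string "" (treated as false). Since Elasticsearch 6.0, values such as "on"/"off", "yes"/"no", and 1/0 are rejected.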
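The strict parsing rule can be summarized in a short Python sketch. parse_es_boolean is a hypothetical helper, and its error message is illustrative rather than Elasticsearch's actual message.

```python
def parse_es_boolean(value) -> bool:
    """Sketch of the values a boolean field accepts (Elasticsearch 6.0 and later)."""
    if value is True or value == "true":
        return True
    if value is False or value in ("false", ""):
        return False  # the empty string counts as false
    # Anything else (1, 0, "yes", "on", ...) causes the document to be rejected
    raise ValueError(f"unsupported boolean value: {value!r}")


print(parse_es_boolean("true"), parse_es_boolean(""))  # True False
```

When you migrate data that uses 1/0 or "yes"/"no", convert those values to true/false before indexing, for example in an ingest pipeline or in application code.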

Binary Type

PUT /articles
{
  "mappings": {
    "properties": {
      "image_data": {
        "type": "binary"
      }
    }
  }
}

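The binary type accepts binary data as a Base64-encoded string, which must not contain embedded newlines. By default the value is not stored separately and is not searchable.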
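On the client side, a document for this field can be prepared with Python's standard base64 module, as in the sketch below. Use base64.b64encode rather than base64.encodebytes, because the latter inserts a newline every 76 characters. The sample bytes and the document dict are illustrative.

```python
import base64
import json

raw = bytes([0x89, 0x50, 0x4E, 0x47])  # first bytes of a PNG file, used as sample data
encoded = base64.b64encode(raw).decode("ascii")  # single-line Base64, no newlines
doc = {"image_data": encoded}

print(json.dumps(doc))  # {"image_data": "iVBORw=="}
```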

Range Types

PUT /events
{
  "mappings": {
    "properties": {
      "age_range": {
        "type": "integer_range"
      },
      "time_range": {
        "type": "date_range"
      }
    }
  }
}

# Index a document
POST /events/_doc
{
  "age_range": {
    "gte": 18,
    "lt": 30
  },
  "time_range": {
    "gte": "2024-01-01",
    "lt": "2024-12-31"
  }
}

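Range types store intervals such as time periods or age brackets. A range query on a range field matches, by default, every document whose stored range intersects the query range; set the relation parameter to INTERSECTS, CONTAINS, or WITHIN to change this.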
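The three relation values can be modeled in Python with closed integer intervals, as in the following sketch. The conversion of "lt": 30 to an inclusive upper bound of 29 reflects how integer ranges behave. IntRange and relation_matches are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    lo: int  # inclusive lower bound
    hi: int  # inclusive upper bound

    @classmethod
    def from_gte_lt(cls, gte: int, lt: int) -> "IntRange":
        # For integers, "lt": 30 is equivalent to "lte": 29
        return cls(gte, lt - 1)


def relation_matches(stored: IntRange, query: IntRange, relation: str = "INTERSECTS") -> bool:
    if relation == "INTERSECTS":  # the two ranges overlap anywhere
        return stored.lo <= query.hi and query.lo <= stored.hi
    if relation == "WITHIN":      # stored range lies entirely inside the query range
        return query.lo <= stored.lo and stored.hi <= query.hi
    if relation == "CONTAINS":    # stored range entirely covers the query range
        return stored.lo <= query.lo and query.hi <= stored.hi
    raise ValueError(f"unknown relation {relation}")


age_range = IntRange.from_gte_lt(18, 30)  # the document indexed above: [18, 29]
print(relation_matches(age_range, IntRange(25, 40)))              # True
print(relation_matches(age_range, IntRange(25, 40), "WITHIN"))    # False
print(relation_matches(age_range, IntRange(10, 50), "WITHIN"))    # True
print(relation_matches(age_range, IntRange(20, 25), "CONTAINS"))  # True
```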

Complex Data Types

Object Type

Elasticsearch flattens JSON objects into dotted field paths such as author.address.city:

PUT /articles
{
  "mappings": {
    "properties": {
      "author": {
        "properties": {
          "name": { "type": "keyword" },
          "email": { "type": "keyword" },
          "address": {
            "properties": {
              "city": { "type": "keyword" },
              "country": { "type": "keyword" }
            }
          }
        }
      }
    }
  }
}

Inner object fields are queried with dot notation:

GET /articles/_search
{
  "query": {
    "term": {
      "author.name": "张三"
    }
  }
}

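Caveat: arrays of objects. When a field such as authors holds an array of objects, the association between the values inside each object is lost: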

{
  "authors": [
    { "name": "张三", "email": "zhangsan@example.com" },
    { "name": "李四", "email": "lisi@example.com" }
  ]
}

Internally, this document is flattened into:

authors.name: ["张三", "李四"]
authors.email: ["zhangsan@example.com", "lisi@example.com"]

As a result, a query for name: "张三" AND email: "lisi@example.com" wrongly matches this document, even though no single author has both values.
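The flattening step can be reproduced with a small Python sketch that collapses objects and arrays of objects into dotted paths mapped to value lists. It shows how the cross-object match arises. flatten is a hypothetical helper; it mirrors the behavior described above and is not Elasticsearch's implementation.

```python
from collections import defaultdict


def flatten(doc: dict, prefix: str = "") -> dict[str, list]:
    """Collapse objects and arrays of objects into dotted paths -> lists of values."""
    fields: dict[str, list] = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields[sub_path].extend(sub_values)
            else:
                fields[path].append(item)
    return dict(fields)


doc = {"authors": [
    {"name": "张三", "email": "zhangsan@example.com"},
    {"name": "李四", "email": "lisi@example.com"},
]}
flat = flatten(doc)
# Each condition is checked against its own flattened value list, independently
matches = "张三" in flat["authors.name"] and "lisi@example.com" in flat["authors.email"]
print(flat)
print(matches)  # True, although no single author has both values
```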

Nested Type

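The nested type solves the lost-association problem for arrays of objects. Each nested object is indexed as a separate hidden document, so all conditions in a query are evaluated within a single object: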

PUT /articles
{
  "mappings": {
    "properties": {
      "comments": {
        "type": "nested",
        "properties": {
          "user": { "type": "keyword" },
          "content": { "type": "text" },
          "created_at": { "type": "date" }
        }
      }
    }
  }
}

Nested fields are queried with a nested query:

GET /articles/_search
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "term": { "comments.user": "张三" } },
            { "match": { "comments.content": "好文章" } }
          ]
        }
      },
      "inner_hits": {}
    }
  }
}

inner_hits: returns the nested objects that matched, so you can see exactly which comment satisfied the query.
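The difference between object and nested semantics can be summarized in a few lines of Python. The comments below are hypothetical sample data, and plain equality stands in for the term and match clauses.

```python
comments = [
    {"user": "张三", "content": "一般"},    # hypothetical data ("so-so")
    {"user": "李四", "content": "好文章"},
]
conditions = {"user": "张三", "content": "好文章"}

# object mapping: each condition is checked against the flattened value list
object_hit = all(
    value in [c[field] for c in comments] for field, value in conditions.items()
)

# nested mapping: every condition must hold within one and the same comment
inner_hits = [c for c in comments if all(c[f] == v for f, v in conditions.items())]

print(object_hit)  # True: a cross-object false positive
print(inner_hits)  # []: no single comment satisfies both conditions
```

The nested behavior costs extra hidden documents at index time, so reserve it for arrays of objects whose fields are queried together.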

Geo Types

geo_point: a geographic point given by latitude and longitude

PUT /locations
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}

# Index a document
POST /locations/_doc
{
  "name": "北京",
  "location": {
    "lat": 39.9042,
    "lon": 116.4074
  }
}

# Or use a "lat,lon" string (note that the array form is [lon, lat])
POST /locations/_doc
{
  "name": "上海",
  "location": "31.2304,121.4737"
}

# Geo-distance query
GET /locations/_search
{
  "query": {
    "geo_distance": {
      "distance": "100km",
      "location": {
        "lat": 39.9042,
        "lon": 116.4074
      }
    }
  }
}
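To check which of the two indexed cities the query returns, the following Python sketch computes the great-circle (haversine) distance between them. It assumes a spherical Earth with a mean radius of 6371 km, so Elasticsearch's own arc distance may differ slightly. Shanghai comes out at roughly 1,070 km from Beijing, well outside the 100km radius, so only the Beijing document matches.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


origin = (39.9042, 116.4074)  # query point of the geo_distance example
places = {"北京": (39.9042, 116.4074), "上海": (31.2304, 121.4737)}
for name, (lat, lon) in places.items():
    d = haversine_km(*origin, lat, lon)
    print(name, round(d), "km", "within 100km" if d <= 100 else "outside 100km")
```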

geo_shape: arbitrary geographic shapes such as polygons and lines

PUT /areas
{
  "mappings": {
    "properties": {
      "area": {
        "type": "geo_shape"
      }
    }
  }
}

IP Type

PUT /logs
{
  "mappings": {
    "properties": {
      "client_ip": {
        "type": "ip"
      }
    }
  }
}

# IP range query
GET /logs/_search
{
  "query": {
    "range": {
      "client_ip": {
        "gte": "192.168.0.0",
        "lte": "192.168.255.255"
      }
    }
  }
}
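The range in this query covers exactly the CIDR block 192.168.0.0/16. ip fields also accept CIDR notation directly in a term query, for example {"term": {"client_ip": "192.168.0.0/16"}}. The equivalence can be checked with Python's standard ipaddress module, as sketched below. in_range is a hypothetical helper.

```python
import ipaddress


def in_range(ip: str, gte: str, lte: str) -> bool:
    # Addresses compare numerically, like the gte/lte bounds of the range query
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(gte) <= addr <= ipaddress.ip_address(lte)


net = ipaddress.ip_network("192.168.0.0/16")
print(net[0], net[-1])                                            # 192.168.0.0 192.168.255.255
print(in_range("192.168.3.7", "192.168.0.0", "192.168.255.255"))  # True
print(ipaddress.ip_address("192.168.3.7") in net)                 # True
print(ipaddress.ip_address("10.0.0.1") in net)                    # False
```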

Completion Type (Autocomplete)

PUT /articles
{
  "mappings": {
    "properties": {
      "title_suggest": {
        "type": "completion"
      }
    }
  }
}

# Request suggestions
POST /articles/_search
{
  "suggest": {
    "title-suggest": {
      "prefix": "elas",
      "completion": {
        "field": "title_suggest"
      }
    }
  }
}
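Conceptually, a completion suggester is a fast prefix lookup over the indexed inputs. The toy Python sketch below uses a sorted list and bisect. The real completion suggester uses an in-memory FST, analyzes inputs with the simple analyzer by default, and ranks results by weight, whereas this sketch lowercases and returns alphabetical order. PrefixSuggester and the sample inputs are hypothetical.

```python
import bisect


class PrefixSuggester:
    """Toy prefix lookup; lowercasing mimics the default simple analyzer."""

    def __init__(self, inputs: list[str]):
        self._terms = sorted({s.lower() for s in inputs})

    def suggest(self, prefix: str, size: int = 5) -> list[str]:
        prefix = prefix.lower()
        start = bisect.bisect_left(self._terms, prefix)  # first term >= prefix
        out: list[str] = []
        for term in self._terms[start:]:
            if not term.startswith(prefix) or len(out) == size:
                break
            out.append(term)
        return out


suggester = PrefixSuggester(["Elasticsearch", "Elastic Stack", "Kibana", "Logstash"])
print(suggester.suggest("elas"))  # ['elastic stack', 'elasticsearch']
```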

Dynamic Mapping

Dynamic Mapping Rules

When Elasticsearch encounters an unknown field, it adds the field to the mapping according to these default rules:

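JSON value	Elasticsearch type
null	No field is added
true/false	boolean
Integer	long
Floating-point number	float
String matching a date format (date_detection, enabled by default)	date
String that looks like a number (only if numeric_detection is enabled; it is off by default)	float or long
Any other string	text with a keyword sub-field (ignore_above: 256)
Object	object
Array	Type of the first non-null element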
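The rules in the table can be expressed as a Python function. The date regex is a rough approximation of strict_date_optional_time, and both it and the function name infer_type are assumptions of this sketch.

```python
import re

# Rough approximation of strict_date_optional_time (date with optional time part)
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$")
NUMBER = re.compile(r"^-?\d+(\.\d+)?$")


def infer_type(value, date_detection: bool = True, numeric_detection: bool = False):
    if value is None:
        return None                   # null: no field is added
    if isinstance(value, bool):       # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        if date_detection and ISO_DATE.match(value):
            return "date"
        if numeric_detection and NUMBER.match(value):
            return "float" if "." in value else "long"
        return "text (+ keyword sub-field)"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        for item in value:            # the first non-null element decides
            if item is not None:
                return infer_type(item, date_detection, numeric_detection)
        return None
    raise TypeError(f"unsupported JSON value: {type(value).__name__}")


print(infer_type("2024-01-01"), infer_type("42"), infer_type([None, 3]))
# date text (+ keyword sub-field) long
```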

Controlling Dynamic Mapping

PUT /articles
{
  "mappings": {
    "dynamic": "true",     # one of: true, false, strict, runtime
    "properties": {
      "title": { "type": "text" }
    }
  }
}

Values of the dynamic parameter:

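Value	Behavior
`true`	New fields are added to the mapping automatically (default)
`false`	New fields are ignored: not indexed and not searchable, but still kept in _source
`strict`	A document containing an unknown field is rejected with an error
`runtime`	New fields are added as runtime fields, which are not indexed and are evaluated from _source at query time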
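The following Python sketch shows how each setting treats the top-level fields of an incoming document. It models only the outcomes listed in the table. handle_new_fields and StrictMappingError are hypothetical names.

```python
class StrictMappingError(Exception):
    """Stands in for the rejection that dynamic: strict causes."""


def handle_new_fields(known: set[str], doc: dict, dynamic: str) -> dict:
    """Classify a document's top-level fields under a dynamic setting (sketch)."""
    result: dict[str, list[str]] = {"indexed": [], "runtime": [], "source_only": []}
    for field in doc:
        if field in known:
            result["indexed"].append(field)
        elif dynamic == "true":
            known.add(field)  # the mapping is updated; the field becomes searchable
            result["indexed"].append(field)
        elif dynamic == "runtime":
            known.add(field)  # added as a runtime field: queryable but not indexed
            result["runtime"].append(field)
        elif dynamic == "false":
            result["source_only"].append(field)  # kept only in _source
        elif dynamic == "strict":
            raise StrictMappingError(f"unknown field [{field}]; document rejected")
        else:
            raise ValueError(f"invalid dynamic value {dynamic!r}")
    return result


print(handle_new_fields({"title"}, {"title": "a", "extra": 1}, "false"))
# {'indexed': ['title'], 'runtime': [], 'source_only': ['extra']}
```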

Dynamic Templates

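Dynamic templates let you customize how dynamically added fields are mapped, based on the field name, path, or detected type: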

PUT /articles
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keyword": {
          "match_mapping_type": "string",
          "match": "*_code",
          "mapping": {
            "type": "keyword"
          }
        }
      },
      {
        "strings_as_text": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    ]
  }
}

Matching rules (templates are evaluated in order, and the first matching template wins):

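match: matches the field name (wildcards supported)
unmatch: excludes field names that match the pattern
match_mapping_type: matches the detected JSON data type
path_match: matches the full dotted path of the field
path_unmatch: excludes paths that match the pattern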
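The first-match-wins evaluation can be sketched in Python, using the two templates defined above. fnmatchcase stands in for Elasticsearch's simple wildcard matching, which is fine for a pattern like "*_code" but is an approximation in general. path_match and path_unmatch are omitted for brevity, and pick_template is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Mirrors the two templates in the request above, in the same order
TEMPLATES = [
    ("strings_as_keyword", {"match_mapping_type": "string", "match": "*_code"},
     {"type": "keyword"}),
    ("strings_as_text", {"match_mapping_type": "string"},
     {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}),
]


def pick_template(field_name: str, json_type: str):
    """Return (name, mapping) of the FIRST template whose conditions all hold."""
    for name, cond, mapping in TEMPLATES:
        if "match_mapping_type" in cond and cond["match_mapping_type"] not in ("*", json_type):
            continue
        if "match" in cond and not fnmatchcase(field_name, cond["match"]):
            continue
        if "unmatch" in cond and fnmatchcase(field_name, cond["unmatch"]):
            continue
        return name, mapping
    return None, None  # no template applies: the default dynamic rules are used


print(pick_template("country_code", "string")[0])  # strings_as_keyword
print(pick_template("title", "string")[0])         # strings_as_text
```

The order of the two templates matters. If strings_as_text came first, it would also capture country_code, because it matches every string field.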

Field Parameters in Detail

The index Parameter

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "index": true           # whether the field is indexed (default: true)
      },
      "content_preview": {
        "type": "text",
        "index": false          # not indexed, so not searchable
      }
    }
  }
}

index: false suits fields that are only displayed and never searched.

Doc Values

Doc values are an on-disk, column-oriented structure used for sorting, aggregations, and script access:

PUT /articles
{
  "mappings": {
    "properties": {
      "views": {
        "type": "integer",
        "doc_values": true      # default: true
      },
      "internal_notes": {
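        "type": "keyword",      # doc_values applies to non-text types; text fields have none
        "doc_values": false     # not used for sorting or aggregations; disable to save disk space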
      }
    }
  }
}

Norms

Norms store normalization factors, such as field length, that are used to compute relevance scores:

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "norms": true           # default for text fields: true
      },
      "tags": {
        "type": "text",
        "norms": false          # disable when scoring is not needed
      }
    }
  }
}

If a field is used only for filtering and never for scoring, disabling norms saves disk space.

Index Options

Controls how much information is stored in the inverted index:

PUT /articles
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "index_options": "positions"    # docs, freqs, positions, or offsets
      }
    }
  }
}

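Option	Stored information	Used for
`docs`	Document numbers only	Checking whether a term occurs in a document
`freqs`	Document numbers + term frequencies	Relevance scoring
`positions`	+ term positions (default for text fields)	Phrase and proximity queries
`offsets`	+ character offsets	Faster highlighting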
Store

Stores a field's value separately, outside _source:

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true          # store this field separately
      },
      "content": {
        "type": "text"
      }
    }
  }
}

# Retrieve only the stored field
GET /articles/_doc/1?stored_fields=title

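By default, field values are retrieved from _source. store: true keeps an additional copy of the field, which helps when you need only a few fields from very large documents.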

copy_to

Copies the values of several fields into one field:

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "copy_to": "full_content"
      },
      "content": {
        "type": "text",
        "copy_to": "full_content"
      },
      "author": {
        "type": "keyword",
        "copy_to": "full_content"
      },
      "full_content": {
        "type": "text",
        "store": true
      }
    }
  }
}

# Searches only need to target the full_content field
GET /articles/_search
{
  "query": {
    "match": {
      "full_content": "搜索关键词"
    }
  }
}

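copy_to does not modify _source; the copied values are only indexed, so full_content does not appear in the document source. In this example it can still be retrieved with stored_fields, because the field sets store: true.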

Ignore Above

Skips strings that are too long:

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "keyword",
        "ignore_above": 256     # values longer than 256 characters are not indexed
      }
    }
  }
}
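The following Python sketch shows which values of a keyword field end up indexed. Values that are too long are only skipped for indexing and doc values; they remain in _source. For arrays, the limit applies to each element separately. ignore_above counts characters, whereas Lucene's hard per-term limit is 32,766 bytes of UTF-8, which is why very large limits can still fail on non-ASCII text. keyword_terms is a hypothetical helper, and Python's len counts code points, which approximates Elasticsearch's character count.

```python
IGNORE_ABOVE = 256  # same value as in the mapping above


def keyword_terms(value, ignore_above: int = IGNORE_ABOVE) -> list[str]:
    """Terms a keyword field would index; the limit is applied per array element."""
    values = value if isinstance(value, list) else [value]
    return [v for v in values if len(v) <= ignore_above]


print(keyword_terms(["short title", "x" * 300]))  # ['short title']
```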

Analyzer Configuration

Built-in Analyzers

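Analyzer	Description
`standard`	Default analyzer; splits text on word boundaries (Unicode Text Segmentation) and lowercases
`simple`	Splits on any non-letter character and lowercases
`whitespace`	Splits on whitespace only; does not lowercase
`stop`	Like simple, plus stop-word removal
`keyword`	No-op: the whole input becomes a single term
`pattern`	Splits with a regular expression (default \W+) and lowercases
Language analyzers	Language-specific analyzers such as english, french, or cjk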

Testing Analyzers

# Test the standard analyzer
GET /_analyze
{
  "analyzer": "standard",
  "text": "Hello World, Elasticsearch!"
}

# Test the analyzer configured for a specific field
GET /articles/_analyze
{
  "field": "title",
  "text": "测试分词效果"
}

# Test an ad-hoc combination of tokenizer and filters
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "stop"],
  "text": "The Quick Brown Fox"
}
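An analyzer is a tokenizer followed by token filters applied in the listed order. The Python sketch below models that pipeline for the two _analyze requests above. The regex tokenizer approximates the standard tokenizer, and the stop list is a small subset of the English stop words. Like Elasticsearch's stop filter by default, the sketch compares case-sensitively, so filter order changes the result.

```python
import re

# A small subset of English stop words, as used by the "stop" token filter
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is"}


def standard_tokenizer(text: str) -> list[str]:
    # Approximation: split the text into runs of word characters
    return re.findall(r"\w+", text)


def lowercase(tokens: list[str]) -> list[str]:
    return [t.lower() for t in tokens]


def stop(tokens: list[str]) -> list[str]:
    # Case-sensitive comparison, so "The" survives unless lowercased first
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text: str, tokenizer, filters) -> list[str]:
    tokens = tokenizer(text)
    for f in filters:  # filters run in the order they are listed
        tokens = f(tokens)
    return tokens


print(analyze("Hello World, Elasticsearch!", standard_tokenizer, [lowercase]))
# ['hello', 'world', 'elasticsearch']
print(analyze("The Quick Brown Fox", standard_tokenizer, [lowercase, stop]))
# ['quick', 'brown', 'fox']
print(analyze("The Quick Brown Fox", standard_tokenizer, [stop, lowercase]))
# ['the', 'quick', 'brown', 'fox']
```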

Custom Analyzers

PUT /articles
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stopwords",
            "my_synonyms"
          ]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["的", "是", "在", "了"]
        },
        "my_synonyms": {
          "type": "synonym",
          "synonyms": [
            "手机,移动电话,智能机",
            "电脑,计算机,PC"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "my_analyzer"
      }
    }
  }
}
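The filter chain of my_analyzer (lowercase, then my_stopwords, then my_synonyms) can be sketched in Python. The sketch starts from already-tokenized words, because the standard tokenizer splits Chinese text into single characters; real Chinese setups usually use a dedicated tokenizer such as IK. Elasticsearch parses synonym rules through the preceding tokenizer and filters, so "PC" is effectively lowercased; the sketch writes it in lowercase directly. Each token is expanded to its whole synonym group at one position, as equivalent (comma-separated) rules do.

```python
STOPWORDS = {"的", "是", "在", "了"}
SYNONYM_GROUPS = [
    {"手机", "移动电话", "智能机"},
    {"电脑", "计算机", "pc"},  # lowercased, because the lowercase filter runs first
]


def expand_synonyms(tokens: list[str]) -> list[list[str]]:
    """Replace each token with its whole synonym group, kept at one position."""
    out = []
    for token in tokens:
        group = next((g for g in SYNONYM_GROUPS if token in g), None)
        out.append(sorted(group) if group else [token])
    return out


def my_analyzer(tokens: list[str]) -> list[list[str]]:
    tokens = [t.lower() for t in tokens]                # "lowercase"
    tokens = [t for t in tokens if t not in STOPWORDS]  # "my_stopwords"
    return expand_synonyms(tokens)                      # "my_synonyms"


print(my_analyzer(["PC", "的", "价格"]))  # [['pc', '电脑', '计算机'], ['价格']]
```

Because the same analyzer runs at search time, a query for 计算机 also yields pc and 电脑 at the same position, so documents using any of the three terms match.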

Index Analyzer vs Search Analyzer

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",       # used at index time
        "search_analyzer": "ik_smart"    # used at search time
      }
    }
  }
}

Why use different analyzers?

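At index time, a fine-grained analyzer (such as ik_max_word) splits text into as many terms as possible, which improves recall
At search time, a coarse-grained analyzer (such as ik_smart) produces fewer, more sensible terms, which improves precision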

Updating Mappings

Adding New Fields

PUT /articles/_mapping
{
  "properties": {
    "summary": {
      "type": "text"
    },
    "rating": {
      "type": "float"
    }
  }
}

Limits on Changing Existing Fields

A few parameters can be changed on an existing field:

PUT /articles/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "norms": false    # norms can be disabled (but not re-enabled)
    }
  }
}

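Updatable parameters include:

norms (can only be disabled)
search_analyzer (the index-time analyzer cannot be changed)
ignore_above
New multi-fields added under fields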

Summary

This chapter covered the core of Elasticsearch mapping configuration:

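Core data types: text, keyword, numeric, date, boolean, binary, range
Complex data types: object, nested, geo, IP, completion
Dynamic mapping: automatic type inference and dynamic templates
Field parameters: index, doc_values, norms, index_options, store, copy_to, ignore_above
Analyzer configuration: built-in analyzers, custom analyzers, index-time vs search-time analyzers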

Exercises

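Create a product index with appropriate field types for name, price, category, tags, stock, and so on
Use the nested type to store product variants (color, size)
Use copy_to to search the product name and description together
Configure a custom analyzer with stop words and synonyms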
