Data Modeling

This chapter covers data modeling in MongoDB: document design principles, how to model relationships, and common design patterns.

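Data Modeling Principles

1. Design around your query patterns

Analyze how the application queries its data first, then decide how to store it:

// If products are frequently queried by category
db.products.createIndex({ category: 1 })

// If orders are frequently queried by customer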
db.orders.createIndex({ customerId: 1 })

2. Prefer embedding

In MongoDB, prefer embedding related data that is read together in the same document:

// Good: the order contains all of its line items
{
  orderId: "ORD001",
  customer: { name: "张三", id: 123 },
  items: [
    { product: "iPhone", price: 999, qty: 1 },
    { product: "Case", price: 30, qty: 2 }
  ]
}
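
Because the items live inside the order, a single read returns everything needed to display or total it. A minimal sketch in plain JavaScript, assuming the document shape shown above (the helper name orderTotal is illustrative, not part of any MongoDB API):

```javascript
// Order document as it would come back from a single findOne() call.
const order = {
  orderId: "ORD001",
  customer: { name: "张三", id: 123 },
  items: [
    { product: "iPhone", price: 999, qty: 1 },
    { product: "Case", price: 30, qty: 2 }
  ]
};

// Hypothetical helper: sum price * qty over the embedded items.
function orderTotal(doc) {
  return doc.items.reduce((sum, item) => sum + item.price * item.qty, 0);
}

console.log(orderTotal(order)); // 1059
```

No second query or join is needed, which is the main argument for embedding data that is always read together.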

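3. Avoid over-embedding

Do not embed arrays that can grow without bound. A single document is limited to 16 MB, and large documents are slow to read and update:

// Bad: the order embeds a large comment history
{
  orderId: "ORD001",
  comments: [/* 1,000 comments */]  // document grows too large
}

// Good: store each comment as its own document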
{
  _id: ObjectId("..."),
  orderId: "ORD001",
  comment: "great product"
}
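
To get a feel for when an embedded array becomes a problem, the growth can be estimated up front. The sketch below assumes Node.js and uses the JSON byte length as a rough stand-in for the BSON size, so the numbers are approximations; the helper names and the 200-byte comment size are illustrative assumptions.

```javascript
// Rough size estimate: JSON byte length is used as a stand-in for BSON size.
// This is an approximation, not the exact size MongoDB would store.
const MAX_DOC_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB document limit

function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical helper: project the document size after `count` comments
// of roughly `commentBytes` each have been embedded.
function projectedBytes(baseDoc, commentBytes, count) {
  return approxBytes(baseDoc) + commentBytes * count;
}

const base = { orderId: "ORD001", comments: [] };
const fitsSmall = projectedBytes(base, 200, 1000) < MAX_DOC_BYTES;   // true
const fitsHuge = projectedBytes(base, 200, 100000) < MAX_DOC_BYTES;  // false
```

Even when a document stays under the limit, a large array still makes each read and update of that document more expensive, so the limit is a hard ceiling rather than a target.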

Relationship Modeling

One-to-One Relationships

// Embedded (recommended)
{
  username: "zhangsan",
  profile: {
    bio: "Software engineer",
    avatar: "https://...",
    location: "Beijing"
  }
}

// Referenced (when the profile changes independently of the user)
{
  _id: ObjectId("..."),
  username: "zhangsan",
  profileId: ObjectId("...")
}

One-to-Many Relationships

// Option 1: embed (suits a small, bounded number of subdocuments)
{
  name: "Order 1",
  items: [
    { product: "iPhone", qty: 1 },
    { product: "Case", qty: 2 }
  ]
}

// Option 2: reference (suits an unbounded or very large number of child documents)
// orders collection
{
  _id: 1,
  customerId: 100,
  total: 1000
}

// items collection
{ orderId: 1, product: "iPhone", qty: 1 }
{ orderId: 1, product: "Case", qty: 2 }
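
With the referenced layout, reading an order together with its items takes a join. One way to do that is a $lookup stage in an aggregation pipeline. The sketch below only builds the pipeline as a plain array, using the collection and field names from the example above.

```javascript
// Aggregation pipeline that joins one order with its line items.
const orderWithItems = [
  { $match: { _id: 1 } },
  {
    $lookup: {
      from: "items",            // collection holding the child documents
      localField: "_id",        // orders._id
      foreignField: "orderId",  // items.orderId
      as: "items"               // name of the output array field
    }
  }
];

// In mongosh this would be run as: db.orders.aggregate(orderWithItems)
// An index on the child side keeps the join fast:
//   db.items.createIndex({ orderId: 1 })
```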

Many-to-Many Relationships

// Option 1: embed (suits small data sets)
{
  studentId: 1,
  courses: ["Math", "Physics", "Chemistry"]
}

// Option 2: reference (recommended, more flexible)
// students collection
{ _id: 1, name: "Alice", courseIds: [101, 102, 103] }

// courses collection
{ _id: 101, name: "Math" }
{ _id: 102, name: "Physics" }
{ _id: 103, name: "Chemistry" }
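
Resolving the references is a second query with $in on the stored IDs. The sketch below shows the filter that would be sent to MongoDB and, so that it runs without a server, an in-memory stand-in for the same lookup. The helper name resolveCourses is hypothetical.

```javascript
const student = { _id: 1, name: "Alice", courseIds: [101, 102, 103] };
const courses = [
  { _id: 101, name: "Math" },
  { _id: 102, name: "Physics" },
  { _id: 103, name: "Chemistry" }
];

// Filter equivalent to: db.courses.find({ _id: { $in: student.courseIds } })
const courseFilter = { _id: { $in: student.courseIds } };

// In-memory stand-in for that query.
function resolveCourses(s, allCourses) {
  const wanted = new Set(s.courseIds);
  return allCourses.filter((c) => wanted.has(c._id)).map((c) => c.name);
}

console.log(resolveCourses(student, courses)); // [ 'Math', 'Physics', 'Chemistry' ]
```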

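Design Patterns

1. Extended Pattern

Use this when documents need new fields over time. MongoDB does not enforce a fixed schema by default, so documents in the same collection can carry different fields:

// Initial design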
{
  name: "Product",
  price: 99
}

// Add new attributes (existing documents need no changes)
{
  name: "New Product",
  price: 199,
  specs: { color: "red", weight: "500g" }
}
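
The cost of this flexibility is that application code has to tolerate documents with and without the newer fields. A minimal sketch of one way to do that, assuming the two document shapes above; the helper name describeProduct and the "n/a" defaults are illustrative.

```javascript
const oldDoc = { name: "Product", price: 99 };
const newDoc = { name: "New Product", price: 199, specs: { color: "red", weight: "500g" } };

// Hypothetical helper: treat `specs` as optional and fall back to defaults.
function describeProduct(doc) {
  const { color = "n/a", weight = "n/a" } = doc.specs ?? {};
  return `${doc.name}: ${doc.price} (color: ${color}, weight: ${weight})`;
}

console.log(describeProduct(oldDoc)); // Product: 99 (color: n/a, weight: n/a)
console.log(describeProduct(newDoc)); // New Product: 199 (color: red, weight: 500g)
```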

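2. Attribute Pattern

Use this when documents have many optional attributes:

// Problem: every product has different attributes, so indexing each field is impractical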
{
  name: "Laptop",
  screenSize: 15,
  ram: "16GB",
  storage: "512GB",
  color: "silver"
}

{
  name: "T-Shirt",
  size: "L",
  material: "cotton",
  color: "blue"
}

// Solution: apply the attribute pattern
{
  _id: "prod1",
  name: "Laptop",
  attributes: [
    { key: "screenSize", value: 15, unit: "inch" },
    { key: "ram", value: "16GB" },
    { key: "storage", value: "512GB" }
  ]
}

// Create an index so attribute lookups are fast
db.products.createIndex({ "attributes.key": 1, "attributes.value": 1 })
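
A sketch of how a flat product could be converted to this shape and then queried. Both helper names are hypothetical. The query uses $elemMatch so that the key and the value must match on the same array element, which is the kind of filter the compound index above is meant to serve.

```javascript
// Hypothetical helper: move every field except `name` into the attributes array.
function toAttributeDoc(flat) {
  const { name, ...rest } = flat;
  const attributes = Object.entries(rest).map(([key, value]) => ({ key, value }));
  return { name, attributes };
}

// Filter that matches key and value on the SAME array element.
function attributeFilter(key, value) {
  return { attributes: { $elemMatch: { key, value } } };
}

const laptop = toAttributeDoc({ name: "Laptop", screenSize: 15, ram: "16GB", storage: "512GB" });
// In mongosh: db.products.find(attributeFilter("ram", "16GB"))
```

Without $elemMatch, a filter on "attributes.key" and "attributes.value" could match a document where the key appears in one element and the value in another.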

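3. Bucket Pattern

Group data into buckets. This suits time-series data:

// Bad: one document per reading, which produces a huge number of documents
// the sensors collection holds 1,000,000+ documents

// Solution: group readings by time period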
{
  sensorId: 1,
  date: "2024-01-15",
  readings: [
    { time: "00:00", temp: 20.5 },
    { time: "00:01", temp: 20.6 },
    // ... 1,440 entries (one per minute)
  ],
  minTemp: 18.0,
  maxTemp: 25.0,
  avgTemp: 21.5
}
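
Writing into a bucket is typically a single upsert that appends the reading and maintains the summary fields at the same time. The sketch below builds such an update as plain objects. The count and sumTemp fields are assumptions added here so that the average can be derived as sumTemp / count; they are not part of the example document above.

```javascript
// Hypothetical helper: build the filter and update for adding one reading
// to the bucket identified by (sensorId, date).
function bucketUpdate(sensorId, date, time, temp) {
  return {
    filter: { sensorId, date },
    update: {
      $push: { readings: { time, temp } },
      $min: { minTemp: temp },           // lowers minTemp only if temp is smaller
      $max: { maxTemp: temp },           // raises maxTemp only if temp is larger
      $inc: { count: 1, sumTemp: temp }  // assumed fields for deriving the average
    },
    options: { upsert: true }            // create the bucket on the day's first reading
  };
}

const op = bucketUpdate(1, "2024-01-15", "00:02", 20.4);
// In mongosh: db.sensors.updateOne(op.filter, op.update, op.options)
```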

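4. Outlier Pattern

Handle extreme differences in data volume:

// Most users have a handful of orders; a few have over ten thousand
// Approach: embed for typical users, reference for heavy users

// Typical user
{ userId: 1, orders: [{ id: 1 }, { id: 2 }] }

// Outlier user (orders are stored externally and referenced)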
{ userId: 999, orderCount: 15000, orderIds: [...] }
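
The switch from embedding to referencing can be made at write time. The following in-memory sketch assumes a small illustrative threshold and a hasExtras flag (both are assumptions, not from the example above); the overflow array stands in for a separate collection.

```javascript
const EMBED_LIMIT = 3; // illustrative threshold; a real one would be much higher

// Hypothetical sketch: embed orders until the limit is reached, then flag
// the user and route further orders to the overflow store.
function addOrder(user, order, overflow) {
  if (user.orders.length < EMBED_LIMIT) {
    user.orders.push(order);
  } else {
    user.hasExtras = true;
    overflow.push({ userId: user.userId, ...order });
  }
  return user;
}

const user = { userId: 999, orders: [] };
const overflow = [];
for (let id = 1; id <= 5; id++) addOrder(user, { id }, overflow);
// user.orders holds ids 1 to 3, overflow holds ids 4 and 5, user.hasExtras is true
```

Readers check the flag: typical users are served from one document, and only flagged users need the extra query.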

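5. Pre-aggregation Pattern

Store computed results to improve read performance:

// Recomputing the figures on every query is slow

// Approach: compute in advance and store the result
// monthly_sales collection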
{
  year: 2024,
  month: 1,
  totalSales: 150000,
  orderCount: 1234,
  topProducts: [
    { productId: 1, count: 100 },
    { productId: 2, count: 80 }
  ]
}

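// Update the pre-aggregated totals when a new order arrives
// (orderTotal stands for the amount of that order)
db.monthly_sales.updateOne(
  { year: 2024, month: 1 },
  // topProducts is a ranking, so recompute it periodically rather than
  // $push-ing an entry per order, which would grow the array without bound
  { $inc: { totalSales: orderTotal, orderCount: 1 } },
  { upsert: true }  // create the month's document on its first order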
)
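
A ranking such as topProducts is easier to keep correct when it is recomputed from the orders than when it is appended to on every write. A minimal in-memory sketch of that recomputation, assuming orders whose items carry productId and quantity; the helper name and sample data are illustrative.

```javascript
// Hypothetical helper: rank products by units sold across a list of orders.
function topProducts(orders, limit) {
  const counts = new Map();
  for (const order of orders) {
    for (const { productId, quantity } of order.items) {
      counts.set(productId, (counts.get(productId) ?? 0) + quantity);
    }
  }
  return [...counts.entries()]
    .map(([productId, count]) => ({ productId, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

const sampleOrders = [
  { items: [{ productId: 1, quantity: 2 }, { productId: 2, quantity: 1 }] },
  { items: [{ productId: 2, quantity: 4 }] },
  { items: [{ productId: 3, quantity: 1 }] }
];
const ranking = topProducts(sampleOrders, 2);
// ranking: [{ productId: 2, count: 5 }, { productId: 1, count: 2 }]
```

The result could then be written to the monthly document with a single $set on a schedule that matches how fresh the ranking needs to be.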

Schema Validation

MongoDB 3.6+ supports JSON Schema validation:

// Create a collection with validation rules
db.createCollection("products", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "price"],
      properties: {
        name: {
          bsonType: "string",
          description: "Product name"
        },
        price: {
          bsonType: "number",
          minimum: 0
        },
        category: {
          bsonType: "string",
          enum: ["electronics", "clothing", "food"]
        }
      }
    }
  }
})
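
For a collection that already exists, a validator can be attached with the collMod command instead. The sketch below builds the command document. The choice of "moderate" and "warn" is an assumption that suits a gradual rollout; stricter settings exist.

```javascript
// Command document that attaches a validator to an existing collection.
// validationLevel "moderate": updates to documents that were already invalid
//   are not checked, so legacy data does not block writes.
// validationAction "warn": violations are logged instead of rejected.
const addValidator = {
  collMod: "products",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "price"],
      properties: {
        name: { bsonType: "string" },
        price: { bsonType: "number", minimum: 0 }
      }
    }
  },
  validationLevel: "moderate",
  validationAction: "warn"
};

// In mongosh: db.runCommand(addValidator)
```

Once the warnings in the log have been dealt with, validationAction can be switched to "error" so that invalid writes are rejected.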

Data Modeling Decision Flow
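
One way to condense the guidelines from the earlier sections into a rough decision procedure is sketched below. The criteria and their order of precedence are assumptions drawn from this chapter's principles, not an official algorithm, and the function name chooseModel is hypothetical.

```javascript
// Hypothetical decision helper for one relationship between a parent
// document and its related data.
function chooseModel({ unboundedGrowth, sharedByManyParents, changesIndependently, readTogether }) {
  if (unboundedGrowth) return "reference";      // avoid ever-growing documents
  if (sharedByManyParents) return "reference";  // many-to-many: avoid duplication
  if (changesIndependently && !readTogether) return "reference";
  return "embed";                               // default: prefer embedding
}

// Order line items: bounded, owned by one order, always read with it.
const lineItems = chooseModel({
  unboundedGrowth: false, sharedByManyParents: false,
  changesIndependently: false, readTogether: true
}); // "embed"

// Article comments: can grow without bound.
const comments = chooseModel({
  unboundedGrowth: true, sharedByManyParents: false,
  changesIndependently: true, readTogether: false
}); // "reference"
```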

Practical Examples

1. E-commerce order system

// Order document
{
  _id: ObjectId("..."),
  orderNumber: "ORD-2024-001",
  customerId: ObjectId("..."),
  customer: {
    name: "张三",
    email: "zhangsan@example.com",
    address: { city: "Beijing", detail: "..." }
  },
  items: [
    { productId: ObjectId("..."), name: "iPhone", price: 999, quantity: 1 },
    { productId: ObjectId("..."), name: "Case", price: 30, quantity: 2 }
  ],
  subtotal: 1059,
  tax: 106,
  shipping: 10,
  total: 1175,
  status: "pending",
  createdAt: ISODate("2024-01-15T10:30:00Z"),
  updatedAt: ISODate("2024-01-15T10:30:00Z")
}

// Index design
db.orders.createIndex({ customerId: 1, createdAt: -1 })
db.orders.createIndex({ orderNumber: 1 }, { unique: true })
db.orders.createIndex({ status: 1, createdAt: -1 })
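
The monetary fields in the order are derived from the line items and stored so that reads do not have to recompute them. A sketch of that derivation follows; the 10% tax rate, the flat shipping fee, and the rounding to whole currency units are assumptions chosen to reproduce the sample document, and the helper name is illustrative.

```javascript
// Hypothetical helper: derive the monetary fields from the line items.
function computeTotals(items, taxRate, shipping) {
  const subtotal = items.reduce((sum, i) => sum + i.price * i.quantity, 0);
  const tax = Math.round(subtotal * taxRate);
  return { subtotal, tax, shipping, total: subtotal + tax + shipping };
}

const totals = computeTotals(
  [
    { name: "iPhone", price: 999, quantity: 1 },
    { name: "Case", price: 30, quantity: 2 }
  ],
  0.1,
  10
);
// totals: { subtotal: 1059, tax: 106, shipping: 10, total: 1175 }
```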

2. Blog system

// Article document
{
  _id: ObjectId("..."),
  title: "MongoDB Tutorial",
  slug: "mongodb-tutorial",
  content: "...",
  author: {
    id: ObjectId("..."),
    name: "张三"
  },
  tags: ["MongoDB", "Database", "NoSQL"],
  category: "Tutorials",
  status: "published",
  viewCount: 1000,
  createdAt: ISODate("2024-01-15T10:30:00Z"),
  publishedAt: ISODate("2024-01-15T12:00:00Z")
}

// Comment (stored separately and referenced by articleId)
{
  _id: ObjectId("..."),
  articleId: ObjectId("..."),
  author: "User A",
  content: "Great tutorial!",
  createdAt: ISODate("2024-01-15T14:00:00Z")
}

// Indexes
db.articles.createIndex({ status: 1, publishedAt: -1 })
db.articles.createIndex({ tags: 1 })
db.articles.createIndex({ "author.id": 1 })
db.comments.createIndex({ articleId: 1, createdAt: -1 })
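
Each index above is intended to match a query the blog runs often. The sketch below spells out two of those queries as plain filter and sort objects. The builder names are hypothetical, and the pairing with indexes is the intended design, to be confirmed with explain() on real data.

```javascript
// Intended to be served by { status: 1, publishedAt: -1 }
function latestPublished(limit) {
  return { filter: { status: "published" }, sort: { publishedAt: -1 }, limit };
}

// Intended to be served by { articleId: 1, createdAt: -1 }
function commentsFor(articleId, limit) {
  return { filter: { articleId }, sort: { createdAt: -1 }, limit };
}

const q = latestPublished(10);
// In mongosh: db.articles.find(q.filter).sort(q.sort).limit(q.limit)
```

In both cases the equality field comes first in the index and the sort field second, so MongoDB can return results in index order without an in-memory sort.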

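Summary

In this chapter we covered:

Data modeling principles: design around query patterns, prefer embedding, avoid over-embedding
Relationship modeling: how to implement one-to-one, one-to-many, and many-to-many relationships
Design patterns: the extended, attribute, bucket, outlier, and pre-aggregation patterns
Schema validation: using JSON Schema to validate document structure
Decision flow: how to choose between embedding and referencing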
