跳到主要内容

MySQL 窗口函数 (Window Functions)

窗口函数(Window Functions)是 SQL 中强大的高级特性,允许对一组相关行(窗口)进行计算,同时保留每行的详细信息。与聚合函数不同,窗口函数不会将多行合并为一行,而是为每行返回一个值。

什么是窗口函数?

窗口函数对一组行(称为"窗口")执行计算,并为每一行返回一个值。窗口函数可以用于:

  • 排名计算(ROW_NUMBER、RANK、DENSE_RANK)
  • 前后行访问(LAG、LEAD)
  • 累计计算(累计求和、移动平均)
  • 分组内的聚合计算

窗口函数 vs 聚合函数

-- 聚合函数:将多行合并为一行
SELECT category, SUM(amount) AS total
FROM orders
GROUP BY category;

-- 窗口函数:保留所有行,添加计算结果
SELECT
id,
category,
amount,
SUM(amount) OVER (PARTITION BY category) AS category_total
FROM orders;

基本语法

函数名 OVER (
PARTITION BY 分组列
ORDER BY 排序列
窗口框架子句
)

各部分说明

子句说明是否必需
PARTITION BY将数据分组,类似 GROUP BY可选
ORDER BY窗口内的排序方式可选
窗口框架定义窗口的范围可选

排名函数

ROW_NUMBER()

为每一行分配唯一的序号,从 1 开始,相同值也会有不同序号。

SELECT 
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS rank
FROM students;

结果:

| name  | score | rank |
|-------|-------|------|
| 张三 | 95 | 1 |
| 李四 | 90 | 2 |
| 王五 | 90 | 3 |
| 赵六 | 85 | 4 |

RANK()

计算排名,相同值获得相同排名,但会跳过后续排名。

SELECT 
name,
score,
RANK() OVER (ORDER BY score DESC) AS rank
FROM students;

结果:

| name  | score | rank |
|-------|-------|------|
| 张三 | 95 | 1 |
| 李四 | 90 | 2 |
| 王五 | 90 | 2 |
| 赵六 | 85 | 4 | -- 跳过了第3名

DENSE_RANK()

计算排名,相同值获得相同排名,但不会跳过后续排名。

SELECT 
name,
score,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM students;

结果:

| name  | score | dense_rank |
|-------|-------|------------|
| 张三 | 95 | 1 |
| 李四 | 90 | 2 |
| 王五 | 90 | 2 |
| 赵六 | 85 | 3 | -- 不跳过,继续第3名

三种排名函数对比

SELECT 
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
RANK() OVER (ORDER BY score DESC) AS rank,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM students;

结果:

| name  | score | row_num | rank | dense_rank |
|-------|-------|---------|------|------------|
| 张三 | 95 | 1 | 1 | 1 |
| 李四 | 90 | 2 | 2 | 2 |
| 王五 | 90 | 3 | 2 | 2 |
| 赵六 | 85 | 4 | 4 | 3 |
| 孙七 | 80 | 5 | 5 | 4 |

分组排名(PARTITION BY)

使用 PARTITION BY 在分组内进行排名:

-- 每个班级内的成绩排名
SELECT
class,
name,
score,
RANK() OVER (PARTITION BY class ORDER BY score DESC) AS class_rank
FROM students;

结果:

| class | name  | score | class_rank |
|-------|-------|-------|------------|
| A | 张三 | 95 | 1 |
| A | 李四 | 90 | 2 |
| A | 王五 | 85 | 3 |
| B | 赵六 | 92 | 1 |
| B | 孙七 | 88 | 2 |
| B | 周八 | 85 | 3 |

LAG 和 LEAD 函数

LAGLEAD 用于访问前一行或后一行的数据,常用于计算环比增长、时间序列分析。

LAG() - 获取前一行

LAG(列名, 偏移量, 默认值)
-- 计算日销售额和前一天的对比
SELECT
date,
amount,
LAG(amount) OVER (ORDER BY date) AS prev_amount,
amount - LAG(amount) OVER (ORDER BY date) AS diff
FROM daily_sales;

结果:

| date       | amount | prev_amount | diff |
|------------|--------|-------------|------|
| 2024-01-01 | 1000 | NULL | NULL |
| 2024-01-02 | 1200 | 1000 | 200 |
| 2024-01-03 | 1150 | 1200 | -50 |
| 2024-01-04 | 1300 | 1150 | 150 |

LEAD() - 获取后一行

LEAD(列名, 偏移量, 默认值)
-- 查看下一天的销售额
SELECT
date,
amount,
LEAD(amount) OVER (ORDER BY date) AS next_amount
FROM daily_sales;

计算增长率

SELECT 
date,
amount,
LAG(amount, 1, 0) OVER (ORDER BY date) AS prev_amount,
ROUND((amount - LAG(amount, 1, 0) OVER (ORDER BY date)) /
LAG(amount, 1, 0) OVER (ORDER BY date) * 100, 2) AS growth_rate
FROM daily_sales
WHERE LAG(amount, 1, 0) OVER (ORDER BY date) > 0;

聚合窗口函数

聚合函数(SUM、AVG、COUNT、MAX、MIN)也可以作为窗口函数使用。

累计求和

-- 计算累计销售额
SELECT
date,
amount,
SUM(amount) OVER (ORDER BY date) AS running_total
FROM daily_sales;

结果:

| date       | amount | running_total |
|------------|--------|---------------|
| 2024-01-01 | 1000 | 1000 |
| 2024-01-02 | 1200 | 2200 |
| 2024-01-03 | 1150 | 3350 |
| 2024-01-04 | 1300 | 4650 |

分组累计求和

-- 每个部门的累计工资
SELECT
department,
name,
salary,
SUM(salary) OVER (PARTITION BY department ORDER BY name) AS dept_running_total
FROM employees;

移动平均

-- 计算3天移动平均值
SELECT
date,
amount,
AVG(amount) OVER (
ORDER BY date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3
FROM daily_sales;

结果:

| date       | amount | moving_avg_3 |
|------------|--------|--------------|
| 2024-01-01 | 1000 | 1000.00 |
| 2024-01-02 | 1200 | 1100.00 |
| 2024-01-03 | 1150 | 1116.67 |
| 2024-01-04 | 1300 | 1216.67 |
| 2024-01-05 | 1250 | 1233.33 |

窗口框架(Frame Clause)

窗口框架定义了窗口函数操作的行范围。

语法

ROWS BETWEEN 起点 AND 终点
-- 或
RANGE BETWEEN 起点 AND 终点

框架边界

边界说明
UNBOUNDED PRECEDING窗口起点(分区的第一行)
n PRECEDING当前行之前的第 n 行
CURRENT ROW当前行
n FOLLOWING当前行之后的第 n 行
UNBOUNDED FOLLOWING窗口终点(分区的最后一行)

常用框架示例

-- 从分区起点到当前行(默认行为)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- 当前行前后各2行
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING

-- 当前行到分区终点
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

-- 前3行到当前行
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW

ROWS vs RANGE

  • ROWS:基于物理行数
  • RANGE:基于逻辑值范围
-- ROWS: 前2行到当前行(物理行)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW

-- RANGE: 当前值减100到当前值加100的行(逻辑范围)
RANGE BETWEEN 100 PRECEDING AND 100 FOLLOWING

NTILE 函数

将行分成指定数量的桶,并为每行分配桶号。

-- 将学生按成绩分成4个等级
SELECT
name,
score,
NTILE(4) OVER (ORDER BY score DESC) AS quartile
FROM students;

结果:

| name  | score | quartile |
|-------|-------|----------|
| 张三 | 95 | 1 | -- 前25%
| 李四 | 92 | 1 |
| 王五 | 90 | 2 | -- 25%-50%
| 赵六 | 88 | 2 |
| 孙七 | 85 | 3 | -- 50%-75%
| 周八 | 82 | 3 |
| 吴九 | 78 | 4 | -- 后25%
| 郑十 | 75 | 4 |

FIRST_VALUE 和 LAST_VALUE

获取窗口内第一个或最后一个值。

SELECT 
date,
amount,
FIRST_VALUE(amount) OVER (ORDER BY date) AS first_amount,
LAST_VALUE(amount) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_amount
FROM daily_sales;
注意

LAST_VALUE 默认的窗口框架是 RANGE UNBOUNDED PRECEDING AND CURRENT ROW,这会导致结果不正确。需要显式指定 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 才能获取真正的最后一个值。

NTH_VALUE

获取窗口内第 n 个值。

SELECT 
date,
amount,
NTH_VALUE(amount, 2) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS second_amount
FROM daily_sales;

实战案例

案例1:查找每个部门的最高薪资员工

WITH ranked_employees AS (
SELECT
department,
name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees
)
SELECT department, name, salary
FROM ranked_employees
WHERE salary_rank = 1;

案例2:计算同比和环比增长

SELECT 
year,
month,
revenue,
-- 同比(与去年同期比较)
LAG(revenue, 12) OVER (ORDER BY year, month) AS last_year_revenue,
-- 环比(与上月比较)
LAG(revenue, 1) OVER (ORDER BY year, month) AS last_month_revenue,
-- 环比增长率
ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY year, month)) /
LAG(revenue, 1) OVER (ORDER BY year, month) * 100, 2) AS mom_growth
FROM monthly_revenue;

案例3:计算每个用户的累计消费和排名

SELECT 
user_id,
order_date,
amount,
-- 累计消费
SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS cumulative_amount,
-- 消费排名(按金额降序)
RANK() OVER (ORDER BY amount DESC) AS amount_rank,
-- 消费百分比
ROUND(amount / SUM(amount) OVER () * 100, 2) AS percent_of_total
FROM orders;

案例4:计算移动平均和标准差

SELECT 
date,
price,
-- 7天移动平均
AVG(price) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS ma_7,
-- 30天移动平均
AVG(price) OVER (
ORDER BY date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS ma_30,
-- 7天移动标准差
STDDEV(price) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS std_7
FROM stock_prices;

案例5:去除重复数据

-- 删除重复记录,只保留每个用户的最新记录
WITH ranked AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) AS rn
FROM user_data
)
DELETE FROM user_data
WHERE id IN (
SELECT id FROM ranked WHERE rn > 1
);

窗口函数执行顺序

理解 SQL 查询的执行顺序对正确使用窗口函数很重要:

  1. FROM / JOIN
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. 窗口函数
  6. SELECT
  7. DISTINCT
  8. ORDER BY
  9. LIMIT
注意

窗口函数在 WHERE、GROUP BY、HAVING 之后执行,因此不能在这些子句中直接引用窗口函数的结果。如果需要这样做,应使用 CTE 或子查询。

性能优化建议

  1. 合理使用 PARTITION BY:分区越少,性能越好
  2. 避免过度使用窗口函数:能用简单聚合函数解决的不要用窗口函数
  3. 注意索引:ORDER BY 和 PARTITION BY 的列应该有索引
  4. 限制窗口大小:使用框架子句限制处理的行数

小结

本章我们学习了:

  1. 窗口函数概念:对一组行进行计算,保留每行详情
  2. 排名函数:ROW_NUMBER、RANK、DENSE_RANK、NTILE
  3. 导航函数:LAG、LEAD、FIRST_VALUE、LAST_VALUE、NTH_VALUE
  4. 聚合窗口函数:SUM、AVG、COUNT、MAX、MIN
  5. 窗口框架:ROWS 和 RANGE 子句定义窗口范围
  6. PARTITION BY:分组计算
  7. 实战案例:累计计算、增长率、移动平均等

练习

  1. 查询每个部门的员工,按薪资从高到低排名
  2. 计算每天的销售额和前一天的对比(增长/下降)
  3. 计算每个用户的累计消费金额
  4. 查询每个类别中价格最高的前3个商品
  5. 计算7天移动平均温度