跳到主要内容

MySQL 窗口函数 (Window Functions)

窗口函数(Window Functions)是 SQL 中强大的高级特性,允许对一组相关行(窗口)进行计算,同时保留每行的详细信息。与聚合函数不同,窗口函数不会将多行合并为一行,而是为每行返回一个值。

什么是窗口函数?

窗口函数对一组行(称为"窗口")执行计算,并为每一行返回一个值。窗口函数可以用于:

  • 排名计算(ROW_NUMBER、RANK、DENSE_RANK)
  • 前后行访问(LAG、LEAD)
  • 累计计算(累计求和、移动平均)
  • 分组内的聚合计算

窗口函数 vs 聚合函数

-- 聚合函数:将多行合并为一行
SELECT category, SUM(amount) AS total
FROM orders
GROUP BY category;

-- 窗口函数:保留所有行,添加计算结果
SELECT
id,
category,
amount,
SUM(amount) OVER (PARTITION BY category) AS category_total
FROM orders;

核心区别

特性聚合函数窗口函数
结果行数减少为一行保持原行数
GROUP BY必需不需要
保留明细
适用场景汇总统计排名、累计、对比

基本语法

函数名 OVER (
PARTITION BY 分组列
ORDER BY 排序列
窗口框架子句
)

各部分说明

子句说明是否必需
PARTITION BY将数据分组,类似 GROUP BY可选
ORDER BY窗口内的排序方式可选
窗口框架定义窗口的范围可选

语法示例

-- 最简单的窗口函数
SELECT name, salary,
AVG(salary) OVER () AS avg_salary
FROM employees;

-- 带 PARTITION BY 的窗口函数
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

-- 带 ORDER BY 的窗口函数
SELECT name, salary,
SUM(salary) OVER (ORDER BY salary) AS running_total
FROM employees;

-- 完整语法
SELECT name, department, salary,
AVG(salary) OVER (
PARTITION BY department
ORDER BY hire_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg
FROM employees;

排名函数

排名函数是最常用的窗口函数类型,用于计算行在分组内的排名位置。

ROW_NUMBER()

为每一行分配唯一的序号,从 1 开始,相同值也会有不同序号。

SELECT 
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS rank
FROM students;

结果:

| name  | score | rank |
|-------|-------|------|
| 张三 | 95 | 1 |
| 李四 | 90 | 2 |
| 王五 | 90 | 3 |
| 赵六 | 85 | 4 |

使用场景

  • 分页查询(获取每组前 N 条记录)
  • 去除重复数据
  • 生成唯一序号

RANK()

计算排名,相同值获得相同排名,但会跳过后续排名。

SELECT 
name,
score,
RANK() OVER (ORDER BY score DESC) AS rank
FROM students;

结果:

| name  | score | rank |
|-------|-------|------|
| 张三 | 95 | 1 |
| 李四 | 90 | 2 |
| 王五 | 90 | 2 |
| 赵六 | 85 | 4 | -- 跳过了第3名

使用场景

  • 体育比赛排名
  • 考试成绩排名(允许并列)

DENSE_RANK()

计算排名,相同值获得相同排名,不会跳过后续排名。

SELECT 
name,
score,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM students;

结果:

| name  | score | dense_rank |
|-------|-------|------------|
| 张三 | 95 | 1 |
| 李四 | 90 | 2 |
| 王五 | 90 | 2 |
| 赵六 | 85 | 3 | -- 不跳过,继续编号

使用场景

  • 奖项分配(需要连续的名次)
  • 等级划分

三种排名函数对比

SELECT 
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
RANK() OVER (ORDER BY score DESC) AS rank,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM students;

结果:

| name  | score | row_num | rank | dense_rank |
|-------|-------|---------|------|------------|
| 张三 | 95 | 1 | 1 | 1 |
| 李四 | 90 | 2 | 2 | 2 |
| 王五 | 90 | 3 | 2 | 2 |
| 赵六 | 85 | 4 | 4 | 3 |
| 孙七 | 80 | 5 | 5 | 4 |

选择建议

  • 需要唯一序号:使用 ROW_NUMBER()
  • 需要真实排名(跳过):使用 RANK()
  • 需要连续排名:使用 DENSE_RANK()

分组排名(PARTITION BY)

使用 PARTITION BY 在分组内进行排名:

-- 每个班级内的成绩排名
SELECT
class,
name,
score,
RANK() OVER (PARTITION BY class ORDER BY score DESC) AS class_rank
FROM students;

结果:

| class | name  | score | class_rank |
|-------|-------|-------|------------|
| A | 张三 | 95 | 1 |
| A | 李四 | 90 | 2 |
| A | 王五 | 85 | 3 |
| B | 赵六 | 92 | 1 |
| B | 孙七 | 88 | 2 |
| B | 周八 | 85 | 3 |

获取每组前 N 条记录

-- 获取每个部门薪资前 3 的员工
WITH ranked AS (
SELECT
department,
name,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT department, name, salary
FROM ranked
WHERE rn <= 3;

LAG 和 LEAD 函数

LAGLEAD 用于访问前一行或后一行的数据,常用于计算环比增长、时间序列分析。

LAG() - 获取前一行

LAG(列名, 偏移量, 默认值)
-- 计算日销售额和前一天的对比
SELECT
date,
amount,
LAG(amount) OVER (ORDER BY date) AS prev_amount,
amount - LAG(amount) OVER (ORDER BY date) AS diff
FROM daily_sales;

结果:

| date       | amount | prev_amount | diff |
|------------|--------|-------------|------|
| 2024-01-01 | 1000 | NULL | NULL |
| 2024-01-02 | 1200 | 1000 | 200 |
| 2024-01-03 | 1150 | 1200 | -50 |
| 2024-01-04 | 1300 | 1150 | 150 |

参数说明

  • 第一个参数:要获取的列名
  • 第二个参数:偏移量(默认 1)
  • 第三个参数:默认值(当没有前一行时返回)
-- 使用默认值避免 NULL
SELECT
date,
amount,
LAG(amount, 1, 0) OVER (ORDER BY date) AS prev_amount
FROM daily_sales;

LEAD() - 获取后一行

LEAD(列名, 偏移量, 默认值)
-- 查看下一天的销售额
SELECT
date,
amount,
LEAD(amount) OVER (ORDER BY date) AS next_amount
FROM daily_sales;

计算增长率

SELECT 
date,
amount,
LAG(amount, 1, 0) OVER (ORDER BY date) AS prev_amount,
CASE
WHEN LAG(amount, 1, 0) OVER (ORDER BY date) > 0
THEN ROUND((amount - LAG(amount, 1, 0) OVER (ORDER BY date)) /
LAG(amount, 1, 0) OVER (ORDER BY date) * 100, 2)
ELSE NULL
END AS growth_rate
FROM daily_sales;

分组内的 LAG/LEAD

-- 计算每个用户的连续登录间隔
SELECT
user_id,
login_date,
LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date) AS prev_login,
DATEDIFF(login_date, LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date)) AS days_diff
FROM user_logins;

聚合窗口函数

聚合函数(SUM、AVG、COUNT、MAX、MIN)也可以作为窗口函数使用。

累计求和

-- 计算累计销售额
SELECT
date,
amount,
SUM(amount) OVER (ORDER BY date) AS running_total
FROM daily_sales;

结果:

| date       | amount | running_total |
|------------|--------|---------------|
| 2024-01-01 | 1000 | 1000 |
| 2024-01-02 | 1200 | 2200 |
| 2024-01-03 | 1150 | 3350 |
| 2024-01-04 | 1300 | 4650 |

分组累计求和

-- 每个部门的累计工资
SELECT
department,
name,
salary,
SUM(salary) OVER (PARTITION BY department ORDER BY name) AS dept_running_total
FROM employees;

分组总计

-- 每行显示部门总计
SELECT
name,
department,
salary,
SUM(salary) OVER (PARTITION BY department) AS dept_total,
SUM(salary) OVER () AS grand_total
FROM employees;

移动平均

-- 计算3天移动平均值
SELECT
date,
amount,
AVG(amount) OVER (
ORDER BY date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3
FROM daily_sales;

结果:

| date       | amount | moving_avg_3 |
|------------|--------|--------------|
| 2024-01-01 | 1000 | 1000.00 |
| 2024-01-02 | 1200 | 1100.00 |
| 2024-01-03 | 1150 | 1116.67 |
| 2024-01-04 | 1300 | 1216.67 |
| 2024-01-05 | 1250 | 1233.33 |

窗口框架(Frame Clause)

窗口框架定义了窗口函数操作的行范围。

语法

ROWS BETWEEN 起点 AND 终点
-- 或
RANGE BETWEEN 起点 AND 终点

框架边界

边界说明
UNBOUNDED PRECEDING窗口起点(分区的第一行)
n PRECEDING当前行之前的第 n 行
CURRENT ROW当前行
n FOLLOWING当前行之后的第 n 行
UNBOUNDED FOLLOWING窗口终点(分区的最后一行)

常用框架示例

-- 从分区起点到当前行(默认行为)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- 当前行前后各 2 行
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING

-- 当前行到分区终点
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

-- 前 3 行到当前行
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW

ROWS vs RANGE

  • ROWS:基于物理行数
  • RANGE:基于逻辑值范围
-- ROWS: 前 2 行到当前行(物理行)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW

-- RANGE: 当前值减100到当前值加100的行(逻辑范围)
RANGE BETWEEN 100 PRECEDING AND 100 FOLLOWING

框架示例:滑动窗口

-- 计算7天滑动平均
SELECT
date,
amount,
AVG(amount) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS ma_7
FROM daily_sales;

-- 计算当前行前后各3行的统计
SELECT
date,
amount,
MAX(amount) OVER (
ORDER BY date
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
) AS local_max,
MIN(amount) OVER (
ORDER BY date
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
) AS local_min
FROM daily_sales;

NTILE 函数

将行分成指定数量的桶,并为每行分配桶号。

-- 将学生按成绩分成4个等级
SELECT
name,
score,
NTILE(4) OVER (ORDER BY score DESC) AS quartile
FROM students;

结果:

| name  | score | quartile |
|-------|-------|----------|
| 张三 | 95 | 1 | -- 前25%
| 李四 | 92 | 1 |
| 王五 | 90 | 2 | -- 25%-50%
| 赵六 | 88 | 2 |
| 孙七 | 85 | 3 | -- 50%-75%
| 周八 | 82 | 3 |
| 吴九 | 78 | 4 | -- 后25%
| 郑十 | 75 | 4 |

使用场景

  • 数据分桶分析
  • 用户分层(如 VIP 等级)
  • A/B 测试分组

FIRST_VALUE 和 LAST_VALUE

获取窗口内第一个或最后一个值。

SELECT 
date,
amount,
FIRST_VALUE(amount) OVER (ORDER BY date) AS first_amount,
LAST_VALUE(amount) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_amount
FROM daily_sales;
注意

LAST_VALUE 默认的窗口框架是 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,这会导致结果不正确。需要显式指定 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 才能获取真正的最后一个值。

分组内的首尾值

-- 获取每个部门的最高和最低薪资
SELECT
name,
department,
salary,
FIRST_VALUE(name) OVER (PARTITION BY department ORDER BY salary DESC) AS highest_paid,
FIRST_VALUE(name) OVER (PARTITION BY department ORDER BY salary ASC) AS lowest_paid
FROM employees;

NTH_VALUE

获取窗口内第 n 个值。

SELECT 
date,
amount,
NTH_VALUE(amount, 2) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS second_amount
FROM daily_sales;

PERCENT_RANK 和 CUME_DIST

PERCENT_RANK

计算某行在分区内的百分位排名(0 到 1):

SELECT 
name,
score,
PERCENT_RANK() OVER (ORDER BY score) AS percent_rank
FROM students;

CUME_DIST

计算某行在分区内的累积分布(小于等于当前值的行数占比):

SELECT 
name,
score,
CUME_DIST() OVER (ORDER BY score) AS cume_dist
FROM students;

实战案例

案例1:查找每个部门的最高薪资员工

WITH ranked_employees AS (
SELECT
department,
name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees
)
SELECT department, name, salary
FROM ranked_employees
WHERE salary_rank = 1;

案例2:计算同比和环比增长

SELECT 
year,
month,
revenue,
-- 同比(与去年同期比较)
LAG(revenue, 12) OVER (ORDER BY year, month) AS last_year_revenue,
-- 环比(与上月比较)
LAG(revenue, 1) OVER (ORDER BY year, month) AS last_month_revenue,
-- 环比增长率
ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY year, month)) /
LAG(revenue, 1) OVER (ORDER BY year, month) * 100, 2) AS mom_growth
FROM monthly_revenue;

案例3:计算每个用户的累计消费和排名

SELECT 
user_id,
order_date,
amount,
-- 累计消费
SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS cumulative_amount,
-- 消费排名(按金额降序)
RANK() OVER (ORDER BY amount DESC) AS amount_rank,
-- 消费百分比
ROUND(amount / SUM(amount) OVER () * 100, 2) AS percent_of_total
FROM orders;

案例4:计算移动平均和标准差

SELECT 
date,
price,
-- 7天移动平均
AVG(price) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS ma_7,
-- 30天移动平均
AVG(price) OVER (
ORDER BY date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS ma_30,
-- 7天移动标准差
STDDEV(price) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS std_7
FROM stock_prices;

案例5:去除重复数据

-- 删除重复记录,只保留每个用户的最新记录
DELETE FROM user_data
WHERE id IN (
SELECT id FROM (
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) AS rn
FROM user_data
) t
WHERE rn > 1
);

案例6:连续登录问题

-- 查找连续3天登录的用户
WITH login_groups AS (
SELECT
user_id,
login_date,
DATE_SUB(login_date, INTERVAL
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) DAY
) AS group_date
FROM user_logins
)
SELECT DISTINCT user_id
FROM login_groups
GROUP BY user_id, group_date
HAVING COUNT(*) >= 3;

窗口函数执行顺序

理解 SQL 查询的执行顺序对正确使用窗口函数很重要:

  1. FROM / JOIN
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. 窗口函数
  6. SELECT
  7. DISTINCT
  8. ORDER BY
  9. LIMIT
注意

窗口函数在 WHERE、GROUP BY、HAVING 之后执行,因此不能在这些子句中直接引用窗口函数的结果。如果需要这样做,应使用 CTE 或子查询。

-- 错误:不能在 WHERE 中使用窗口函数
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
WHERE rnk <= 3; -- 错误!

-- 正确:使用 CTE 或子查询
WITH ranked AS (
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT name, salary
FROM ranked
WHERE rnk <= 3;

性能优化建议

1. 合理使用 PARTITION BY

分区越少,性能越好。但也要避免分区过多。

-- 避免:对高基数列分区(如用户ID)
SUM(amount) OVER (PARTITION BY user_id)

-- 推荐:对低基数列分区(如部门、状态)
SUM(amount) OVER (PARTITION BY department)

2. 避免过度使用窗口函数

能用简单聚合函数解决的不要用窗口函数:

-- 如果只需要部门平均值,使用 GROUP BY
SELECT department, AVG(salary) FROM employees GROUP BY department;

-- 只有需要保留明细时才用窗口函数
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

3. 注意索引

ORDER BY 和 PARTITION BY 的列应该有索引:

-- 为窗口函数常用的排序列创建索引
CREATE INDEX idx_hire_date ON employees(hire_date);
CREATE INDEX idx_dept_salary ON employees(department, salary);

4. 限制窗口大小

使用框架子句限制处理的行数:

-- 避免无限制的窗口
SUM(amount) OVER () -- 全表累加

-- 推荐:限制窗口范围
SUM(amount) OVER (
ORDER BY date
ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
)

窗口函数完整列表

排名函数

函数说明
ROW_NUMBER()行号(唯一)
RANK()排名(跳过)
DENSE_RANK()排名(不跳过)
PERCENT_RANK()百分位排名
CUME_DIST()累积分布
NTILE(n)分桶

导航函数

函数说明
LAG(expr, n, default)前 n 行的值
LEAD(expr, n, default)后 n 行的值
FIRST_VALUE(expr)窗口内第一个值
LAST_VALUE(expr)窗口内最后一个值
NTH_VALUE(expr, n)窗口内第 n 个值

聚合函数

函数说明
SUM(expr)求和
AVG(expr)平均值
COUNT(expr)计数
MAX(expr)最大值
MIN(expr)最小值

小结

本章我们学习了:

  1. 窗口函数概念:对一组行进行计算,保留每行详情
  2. 排名函数:ROW_NUMBER、RANK、DENSE_RANK、NTILE
  3. 导航函数:LAG、LEAD、FIRST_VALUE、LAST_VALUE、NTH_VALUE
  4. 聚合窗口函数:SUM、AVG、COUNT、MAX、MIN
  5. 窗口框架:ROWS 和 RANGE 子句定义窗口范围
  6. PARTITION BY:分组计算
  7. 实战案例:累计计算、增长率、移动平均等

练习

  1. 查询每个部门的员工,按薪资从高到低排名
  2. 计算每天的销售额和前一天的对比(增长/下降)
  3. 计算每个用户的累计消费金额
  4. 查询每个类别中价格最高的3个商品
  5. 计算7天移动平均温度

参考资料