跳到主要内容

窗口函数详解

窗口函数是 SQL 中最强大的分析工具之一,也是面试中的高频考点。它允许你在不减少行数的情况下进行聚合计算,解决许多传统聚合函数无法处理的问题。MySQL 8.0 正式引入了窗口函数,掌握它是从"会写 SQL"到"精通 SQL"的关键一步。

什么是窗口函数?

窗口函数(Window Function)是一种特殊的函数,它对一组查询行(称为"窗口")执行计算,为每一行返回一个结果。

与普通聚合函数的区别

理解窗口函数的关键是理解它与普通聚合函数的区别:

-- 普通聚合函数:多行合并为一行
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
-- 结果:每个部门一行

-- 窗口函数:每行都返回结果,不减少行数
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;
-- 结果:每个员工一行,但包含部门平均工资

核心区别

特性普通聚合函数窗口函数
结果行数减少为一行/组保持原行数不变
计算范围整个组可定义的窗口范围
能否访问其他行不能能(如 LAG、LEAD)
使用场景汇总统计排名、累计、对比分析

基本语法

窗口函数的基本语法结构如下:

函数名(参数) OVER (
PARTITION BY 分组表达式 -- 可选:定义分区
ORDER BY 排序表达式 -- 可选:定义排序
frame_clause -- 可选:定义窗口帧
)

语法拆解

1. PARTITION BY 子句

将数据分成多个分区,每个分区独立计算窗口函数。类似于 GROUP BY,但不会减少行数。

-- 计算每个员工的工资在其部门内的排名
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

如果不指定 PARTITION BY,整个结果集被视为一个分区。

2. ORDER BY 子句

在分区内定义排序顺序。对于排名函数(RANKDENSE_RANKROW_NUMBER)和偏移函数(LAGLEAD),这是必须的。

-- 按日期排序计算累计销售额
SELECT date, amount,
SUM(amount) OVER (ORDER BY date) AS running_total
FROM sales;

3. frame_clause 窗口帧

定义窗口的具体范围,用于聚合窗口函数(SUMAVG 等)。

-- 计算每行及其前两行的移动平均
SELECT date, amount,
AVG(amount) OVER (
ORDER BY date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg
FROM daily_sales;

窗口函数分类

MySQL 提供的窗口函数可以分为以下几类:

1. 排名函数

函数说明
ROW_NUMBER()为每行分配唯一的连续编号
RANK()分配排名,相同值并列,后续排名跳过
DENSE_RANK()分配排名,相同值并列,后续排名不跳过
NTILE(n)将分区内的行分成 n 个桶

2. 偏移函数

函数说明
LAG(expr, n, default)获取当前行前 n 行的值
LEAD(expr, n, default)获取当前行后 n 行的值
FIRST_VALUE(expr)获取窗口第一行的值
LAST_VALUE(expr)获取窗口最后一行的值
NTH_VALUE(expr, n)获取窗口第 n 行的值

3. 分布函数

函数说明
PERCENT_RANK()百分比排名(0 到 1)
CUME_DIST()累积分布值

4. 聚合窗口函数

普通的聚合函数(SUMAVGCOUNTMAXMIN)也可以作为窗口函数使用。

排名函数详解

排名函数是面试中最常考的窗口函数。三种排名函数的核心区别在于处理"并列"的方式。

ROW_NUMBER()

为每行分配唯一的连续编号,不考虑并列情况。

-- 示例数据:salary = [100, 100, 90, 80]
SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;

-- 结果:
-- | name | salary | row_num |
-- | A | 100 | 1 |
-- | B | 100 | 2 | <- 相同薪水,但编号不同
-- | C | 90 | 3 |
-- | D | 80 | 4 |

使用场景

  • 需要唯一编号时(如分页查询)
  • 随机抽取排名(当有并列时)

RANK()

分配排名,相同值获得相同排名,但后续排名会跳过。

-- 示例数据:salary = [100, 100, 90, 80]
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rank_num
FROM employees;

-- 结果:
-- | name | salary | rank_num |
-- | A | 100 | 1 |
-- | B | 100 | 1 | <- 相同薪水,相同排名
-- | C | 90 | 3 | <- 跳过了 2
-- | D | 80 | 4 |

使用场景

  • 竞赛排名(有并列时保留名次的"空缺")
  • 需要反映真实排名情况

DENSE_RANK()

分配排名,相同值获得相同排名,后续排名不跳过。

-- 示例数据:salary = [100, 100, 90, 80]
SELECT name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;

-- 结果:
-- | name | salary | dense_rank |
-- | A | 100 | 1 |
-- | B | 100 | 1 | <- 相同薪水,相同排名
-- | C | 90 | 2 | <- 不跳过,继续是 2
-- | D | 80 | 3 |

使用场景

  • 需要连续排名(如"薪水等级")
  • 查询"前 N 高"(如部门工资前三高)

排名函数对比表

场景函数选择
分页查询,需要唯一编号ROW_NUMBER()
竞赛排名,有空缺名次RANK()
连续排名,无空缺DENSE_RANK()
部门工资前三高DENSE_RANK()

偏移函数详解

偏移函数允许你访问"其他行"的数据,这是窗口函数独有的能力。

LAG 函数

获取当前行之前第 n 行的值。

-- 语法
LAG(expr, n, default) OVER (...)

-- expr: 要获取的列或表达式
-- n: 向前偏移的行数(默认 1)
-- default: 当没有前 n 行时的默认值(默认 NULL)

示例:计算环比增长

SELECT date, revenue,
LAG(revenue, 1) OVER (ORDER BY date) AS prev_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY date) AS growth,
ROUND((revenue - LAG(revenue) OVER (ORDER BY date))
/ LAG(revenue) OVER (ORDER BY date) * 100, 2) AS growth_rate
FROM daily_revenue;

-- 结果:
-- | date | revenue | prev_revenue | growth | growth_rate |
-- | 2024-01-01 | 1000 | NULL | NULL | NULL |
-- | 2024-01-02 | 1200 | 1000 | 200 | 20.00 |
-- | 2024-01-03 | 1150 | 1200 | -50 | -4.17 |

示例:判断连续登录

-- 找出连续 3 天登录的用户
SELECT DISTINCT user_id
FROM (
SELECT user_id, login_date,
LAG(login_date, 1) OVER (PARTITION BY user_id ORDER BY login_date) AS prev1,
LAG(login_date, 2) OVER (PARTITION BY user_id ORDER BY login_date) AS prev2
FROM login_log
) t
WHERE DATEDIFF(login_date, prev1) = 1
AND DATEDIFF(prev1, prev2) = 1;

LEAD 函数

获取当前行之后第 n 行的值。语法与 LAG 相同。

-- 示例:查看下一天的销售额
SELECT date, revenue,
LEAD(revenue, 1) OVER (ORDER BY date) AS next_revenue
FROM daily_revenue;

FIRST_VALUE 和 LAST_VALUE

获取窗口内第一行和最后一行的值。

-- 计算每个部门员工工资与最高工资的差距
SELECT name, department, salary,
FIRST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY salary DESC
) AS max_salary,
FIRST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY salary DESC
) - salary AS diff
FROM employees;

注意LAST_VALUE 的默认窗口范围是 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,这可能导致意外结果。如果需要整个分区的最后一个值,应该显式指定窗口范围:

SELECT name, department, salary,
LAST_VALUE(salary) OVER (
PARTITION BY department
ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS max_salary
FROM employees;

聚合窗口函数

普通聚合函数配合 OVER 子句可以成为窗口函数,实现滚动计算。

累计求和

-- 计算累计销售额
SELECT date, amount,
SUM(amount) OVER (ORDER BY date) AS running_total
FROM sales;

-- 等价于
SELECT date, amount,
SUM(amount) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM sales;

分组内累计

-- 计算每个部门的累计销售额
SELECT department, month, revenue,
SUM(revenue) OVER (
PARTITION BY department
ORDER BY month
) AS dept_running_total
FROM monthly_revenue;

移动平均

-- 计算 3 天移动平均
SELECT date, amount,
AVG(amount) OVER (
ORDER BY date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS moving_avg_3
FROM daily_sales;

-- 计算 7 天移动平均(当前 + 前 6 天)
SELECT date, amount,
AVG(amount) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7
FROM daily_sales;

分组内统计

-- 计算每个员工工资占部门总工资的比例
SELECT name, department, salary,
SUM(salary) OVER (PARTITION BY department) AS dept_total,
ROUND(salary / SUM(salary) OVER (PARTITION BY department) * 100, 2) AS pct
FROM employees;

窗口帧详解

窗口帧(Frame)定义了窗口函数的计算范围,是高级应用的关键。

帧语法

{ROWS | RANGE} BETWEEN frame_start AND frame_end

帧边界

边界说明
UNBOUNDED PRECEDING分区的第一行
n PRECEDING当前行之前的第 n 行
CURRENT ROW当前行
n FOLLOWING当前行之后的第 n 行
UNBOUNDED FOLLOWING分区的最后一行

ROWS vs RANGE

  • ROWS:基于物理行号
  • RANGE:基于逻辑值(排序键的值)
-- ROWS:精确的行数
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING -- 前2行 + 当前行 + 后2行 = 5行

-- RANGE:值的范围(对于相同值的行会一起包含)
RANGE BETWEEN INTERVAL 7 DAY PRECEDING AND CURRENT ROW -- 过去7天的所有行

常用帧模式

-- 从开头到当前行(累计)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- 当前行到末尾
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

-- 整个分区
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

-- 滑动窗口(前后 n 行)
ROWS BETWEEN n PRECEDING AND n FOLLOWING

实战案例

案例一:部门工资前三高

-- 使用 DENSE_RANK 找出每个部门工资前三高的员工
WITH ranked AS (
SELECT name, department_id, salary,
DENSE_RANK() OVER (
PARTITION BY department_id
ORDER BY salary DESC
) AS rnk
FROM employees
)
SELECT r.name AS Employee, d.name AS Department, r.salary AS Salary
FROM ranked r
JOIN departments d ON r.department_id = d.id
WHERE r.rnk <= 3;

为什么用 DENSE_RANK 而不是 RANK?

如果部门有两个并列第一的工资,用 DENSE_RANK,下一个是第二高;用 RANK,下一个是第三高(跳过了第二)。题目要求的是"前三高",所以用 DENSE_RANK

案例二:连续出现三次的数字

-- 方法一:使用 LAG
SELECT DISTINCT num AS ConsecutiveNums
FROM (
SELECT num,
LAG(num, 1) OVER (ORDER BY id) AS prev1,
LAG(num, 2) OVER (ORDER BY id) AS prev2
FROM Logs
) t
WHERE num = prev1 AND num = prev2;

-- 方法二:使用自连接
SELECT DISTINCT l1.num AS ConsecutiveNums
FROM Logs l1
JOIN Logs l2 ON l1.id = l2.id - 1
JOIN Logs l3 ON l1.id = l3.id - 2
WHERE l1.num = l2.num AND l2.num = l3.num;

案例三:滚动累计直到达到阈值

-- 找出累计重量不超过 1000 的最后一个人
SELECT person_name
FROM (
SELECT person_name, weight, turn,
SUM(weight) OVER (ORDER BY turn) AS total_weight
FROM Queue
) t
WHERE total_weight <= 1000
ORDER BY total_weight DESC
LIMIT 1;

案例四:同比和环比计算

-- 年同比:与去年同期比较
-- 月环比:与上月比较
SELECT
year, month, revenue,
-- 月环比
LAG(revenue, 1) OVER (ORDER BY year, month) AS prev_month,
ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY year, month))
/ LAG(revenue, 1) OVER (ORDER BY year, month) * 100, 2) AS month_growth,
-- 年同比
LAG(revenue, 12) OVER (ORDER BY year, month) AS prev_year,
ROUND((revenue - LAG(revenue, 12) OVER (ORDER BY year, month))
/ LAG(revenue, 12) OVER (ORDER BY year, month) * 100, 2) AS year_growth
FROM monthly_revenue;

性能考虑

窗口函数虽然强大,但也有性能上的考量:

1. 执行时机

窗口函数在 WHEREGROUP BYHAVING 之后执行,不能在这些子句中直接使用窗口函数的结果。

-- 错误:不能在 WHERE 中使用窗口函数
SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
WHERE rnk <= 3; -- 报错!

-- 正确:使用子查询或 CTE
WITH ranked AS (
SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT * FROM ranked WHERE rnk <= 3;

2. 索引使用

  • ORDER BY 子句中的列如果有索引,可以提高性能
  • 多个窗口函数使用相同的 PARTITION BYORDER BY,可以共享排序结果

3. 窗口帧范围

  • 大的窗口帧范围(如 UNBOUNDED PRECEDING)需要更多内存
  • 尽量使用具体的行数限制(如 ROWS 100 PRECEDING)而非无界范围

4. 与子查询的比较

在许多情况下,窗口函数比相关子查询性能更好:

-- 子查询方式:对每行执行一次子查询
SELECT name, salary,
(SELECT AVG(salary) FROM employees e2 WHERE e2.dept_id = e1.dept_id) AS avg_salary
FROM employees e1;

-- 窗口函数方式:只排序一次
SELECT name, salary,
AVG(salary) OVER (PARTITION BY dept_id) AS avg_salary
FROM employees;

知识点总结

语法速查

-- 基本语法
函数名() OVER (
PARTITION BY 分组列
ORDER BY 排序列
ROWS/RANGE BETWEEN ... AND ...
)

-- 排名
ROW_NUMBER() OVER (ORDER BY col)
RANK() OVER (ORDER BY col)
DENSE_RANK() OVER (ORDER BY col)

-- 偏移
LAG(col, n, default) OVER (ORDER BY col)
LEAD(col, n, default) OVER (ORDER BY col)

-- 聚合窗口
SUM(col) OVER (ORDER BY col)
AVG(col) OVER (PARTITION BY grp ORDER BY col ROWS BETWEEN ... AND ...)

函数选择指南

需求推荐函数
唯一编号ROW_NUMBER()
竞赛排名RANK()
连续排名DENSE_RANK()
前一行值LAG(col, 1)
下一行值LEAD(col, 1)
累计求和SUM(col) OVER (ORDER BY ...)
移动平均AVG(col) OVER (ORDER BY ... ROWS ...)
分组排名DENSE_RANK() OVER (PARTITION BY ... ORDER BY ...)

注意事项

  1. 窗口函数在 WHERE/GROUP BY/HAVING 之后执行
  2. RANK 会跳号,DENSE_RANK 不跳号
  3. LAST_VALUE 默认窗口范围可能不是整个分区
  4. 使用子查询或 CTE 过滤窗口函数结果
  5. 相同的 PARTITION BYORDER BY 可以共享排序