SQL 窗口函数
窗口函数(Window Functions)是 SQL 中强大的高级特性,允许对一组相关行(窗口)进行计算,同时保留每行的详细信息。与聚合函数不同,窗口函数不会将多行合并为一行,而是为每行返回一个值。
什么是窗口函数?
窗口函数对一组行(称为"窗口")执行计算,并为每一行返回一个值。窗口函数可以用于:
- 排名计算(ROW_NUMBER、RANK、DENSE_RANK)
- 前后行访问(LAG、LEAD)
- 累计计算(累计求和、移动平均)
- 分组内的聚合计算
窗口函数 vs 聚合函数
-- 聚合函数:将多行合并为一行
SELECT category, SUM(amount) AS total
FROM orders
GROUP BY category;
-- 窗口函数:保留所有行,添加计算结果
SELECT
id,
category,
amount,
SUM(amount) OVER (PARTITION BY category) AS category_total
FROM orders;
基本语法
函数名 OVER (
PARTITION BY 分组列
ORDER BY 排序列
窗口框架子句
)
各部分说明
| 子句 | 说明 | 是否必需 |
|---|---|---|
PARTITION BY | 将数据分组,类似 GROUP BY | 可选 |
ORDER BY | 窗口内的排序方式 | 可选 |
窗口框架 | 定义窗口的范围 | 可选 |
排名函数
ROW_NUMBER()
为每一行分配唯一的序号,从 1 开始,相同值也会有不同序号。
SELECT
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS rank
FROM students;
结果:
| name | score | rank |
|-------|-------|------|
| 张三 | 95 | 1 |
| 李四 | 90 | 2 |
| 王五 | 90 | 3 |
| 赵六 | 85 | 4 |
RANK()
计算排名,相同值获得相同排名,但会跳过后续排名。
SELECT
name,
score,
RANK() OVER (ORDER BY score DESC) AS rank
FROM students;
结果:
| name | score | rank |
|-------|-------|------|
| 张三 | 95 | 1 |
| 李四 | 90 | 2 |
| 王五 | 90 | 2 |
| 赵六 | 85 | 4 | -- 跳过了第3名
DENSE_RANK()
计算排名,相同值获得相同排名,但不会跳过后续排名。
SELECT
name,
score,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM students;
结果:
| name | score | dense_rank |
|-------|-------|------------|
| 张三 | 95 | 1 |
| 李四 | 90 | 2 |
| 王五 | 90 | 2 |
| 赵六 | 85 | 3 | -- 不跳过,继续第3名
三种排名函数对比
SELECT
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
RANK() OVER (ORDER BY score DESC) AS rank,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
FROM students;
结果:
| name | score | row_num | rank | dense_rank |
|-------|-------|---------|------|------------|
| 张三 | 95 | 1 | 1 | 1 |
| 李四 | 90 | 2 | 2 | 2 |
| 王五 | 90 | 3 | 2 | 2 |
| 赵六 | 85 | 4 | 4 | 3 |
| 孙七 | 80 | 5 | 5 | 4 |
分组排名(PARTITION BY)
使用 PARTITION BY 在分组内进行排名:
-- 每个班级内的成绩排名
SELECT
class,
name,
score,
RANK() OVER (PARTITION BY class ORDER BY score DESC) AS class_rank
FROM students;
结果:
| class | name | score | class_rank |
|-------|-------|-------|------------|
| A | 张三 | 95 | 1 |
| A | 李四 | 90 | 2 |
| A | 王五 | 85 | 3 |
| B | 赵六 | 92 | 1 |
| B | 孙七 | 88 | 2 |
| B | 周八 | 85 | 3 |
LAG 和 LEAD 函数
LAG 和 LEAD 用于访问前一行或后一行的数据,常用于计算环比增长、时间序列分析。
LAG() - 获取前一行
LAG(列名, 偏移量, 默认值)
-- 计算日销售额和前一天的对比
SELECT
date,
amount,
LAG(amount) OVER (ORDER BY date) AS prev_amount,
amount - LAG(amount) OVER (ORDER BY date) AS diff
FROM daily_sales;
结果:
| date | amount | prev_amount | diff |
|------------|--------|-------------|------|
| 2024-01-01 | 1000 | NULL | NULL |
| 2024-01-02 | 1200 | 1000 | 200 |
| 2024-01-03 | 1150 | 1200 | -50 |
| 2024-01-04 | 1300 | 1150 | 150 |
LEAD() - 获取后一行
LEAD(列名, 偏移量, 默认值)
-- 查看下一天的销售额
SELECT
date,
amount,
LEAD(amount) OVER (ORDER BY date) AS next_amount
FROM daily_sales;
计算增长率
SELECT
date,
amount,
LAG(amount, 1, 0) OVER (ORDER BY date) AS prev_amount,
ROUND((amount - LAG(amount, 1, 0) OVER (ORDER BY date)) /
LAG(amount, 1, 0) OVER (ORDER BY date) * 100, 2) AS growth_rate
FROM daily_sales
WHERE LAG(amount, 1, 0) OVER (ORDER BY date) > 0;
聚合窗口函数
聚合函数(SUM、AVG、COUNT、MAX、MIN)也可以作为窗口函数使用。
累计求和
-- 计算累计销售额
SELECT
date,
amount,
SUM(amount) OVER (ORDER BY date) AS running_total
FROM daily_sales;
结果:
| date | amount | running_total |
|------------|--------|---------------|
| 2024-01-01 | 1000 | 1000 |
| 2024-01-02 | 1200 | 2200 |
| 2024-01-03 | 1150 | 3350 |
| 2024-01-04 | 1300 | 4650 |
分组累计求和
-- 每个部门的累计工资
SELECT
department,
name,
salary,
SUM(salary) OVER (PARTITION BY department ORDER BY name) AS dept_running_total
FROM employees;
移动平均
-- 计算3天移动平均值
SELECT
date,
amount,
AVG(amount) OVER (
ORDER BY date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3
FROM daily_sales;
结果:
| date | amount | moving_avg_3 |
|------------|--------|--------------|
| 2024-01-01 | 1000 | 1000.00 |
| 2024-01-02 | 1200 | 1100.00 |
| 2024-01-03 | 1150 | 1116.67 |
| 2024-01-04 | 1300 | 1216.67 |
| 2024-01-05 | 1250 | 1233.33 |
窗口框架(Frame Clause)
窗口框架定义了窗口函数操作的行范围。
语法
ROWS BETWEEN 起点 AND 终点
-- 或
RANGE BETWEEN 起点 AND 终点
框架边界
| 边界 | 说明 |
|---|---|
UNBOUNDED PRECEDING | 窗口起点(分区的第一行) |
n PRECEDING | 当前行之前的第 n 行 |
CURRENT ROW | 当前行 |
n FOLLOWING | 当前行之后的第 n 行 |
UNBOUNDED FOLLOWING | 窗口终点(分区的最后一行) |
常用框架示例
-- 从分区起点到当前行(默认行为)
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
-- 当前行前后各2行
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
-- 当前行到分区终点
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
-- 前3行到当前行
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
ROWS vs RANGE
- ROWS:基于物理行数
- RANGE:基于逻辑值范围
-- ROWS: 前2行到当前行(物理行)
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
-- RANGE: 当前值减100到当前值加100的行(逻辑范围)
RANGE BETWEEN 100 PRECEDING AND 100 FOLLOWING
NTILE 函数
将行分成指定数量的桶,并为每行分配桶号。
-- 将学生按成绩分成4个等级
SELECT
name,
score,
NTILE(4) OVER (ORDER BY score DESC) AS quartile
FROM students;
结果:
| name | score | quartile |
|-------|-------|----------|
| 张三 | 95 | 1 | -- 前25%
| 李四 | 92 | 1 |
| 王五 | 90 | 2 | -- 25%-50%
| 赵六 | 88 | 2 |
| 孙七 | 85 | 3 | -- 50%-75%
| 周八 | 82 | 3 |
| 吴九 | 78 | 4 | -- 后25%
| 郑十 | 75 | 4 |
FIRST_VALUE 和 LAST_VALUE
获取窗口内第一个或最后一个值。
SELECT
date,
amount,
FIRST_VALUE(amount) OVER (ORDER BY date) AS first_amount,
LAST_VALUE(amount) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_amount
FROM daily_sales;
注意
LAST_VALUE 默认的窗口框架是 RANGE UNBOUNDED PRECEDING AND CURRENT ROW,这会导致结果不正确。需要显式指定 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 才能获取真正的最后一个值。
NTH_VALUE
获取窗口内第 n 个值。
SELECT
date,
amount,
NTH_VALUE(amount, 2) OVER (
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS second_amount
FROM daily_sales;
实战案例
案例1:查找每个部门的最高薪资员工
WITH ranked_employees AS (
SELECT
department,
name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees
)
SELECT department, name, salary
FROM ranked_employees
WHERE salary_rank = 1;
案例2:计算同比和环比增长
SELECT
year,
month,
revenue,
-- 同比(与去年同期比较)
LAG(revenue, 12) OVER (ORDER BY year, month) AS last_year_revenue,
-- 环比(与上月比较)
LAG(revenue, 1) OVER (ORDER BY year, month) AS last_month_revenue,
-- 环比增长率
ROUND((revenue - LAG(revenue, 1) OVER (ORDER BY year, month)) /
LAG(revenue, 1) OVER (ORDER BY year, month) * 100, 2) AS mom_growth
FROM monthly_revenue;
案例3:计算每个用户的累计消费和排名
SELECT
user_id,
order_date,
amount,
-- 累计消费
SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS cumulative_amount,
-- 消费排名(按金额降序)
RANK() OVER (ORDER BY amount DESC) AS amount_rank,
-- 消费百分比
ROUND(amount / SUM(amount) OVER () * 100, 2) AS percent_of_total
FROM orders;
案例4:计算移动平均和标准差
SELECT
date,
price,
-- 7天移动平均
AVG(price) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS ma_7,
-- 30天移动平均
AVG(price) OVER (
ORDER BY date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS ma_30,
-- 7天移动标准差
STDDEV(price) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS std_7
FROM stock_prices;
案例5:去除重复数据
-- 删除重复记录,只保留每个用户的最新记录
WITH ranked AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY updated_at DESC) AS rn
FROM user_data
)
DELETE FROM user_data
WHERE id IN (
SELECT id FROM ranked WHERE rn > 1
);
窗口函数执行顺序
理解 SQL 查询的执行顺序对正确使用窗口函数很重要:
- FROM / JOIN
- WHERE
- GROUP BY
- HAVING
- 窗口函数
- SELECT
- DISTINCT
- ORDER BY
- LIMIT
注意
窗口函数在 WHERE、GROUP BY、HAVING 之后执行,因此不能在这些子句中直接引用窗口函数的结果。如果需要这样做,应使用 CTE 或子查询。
性能优化建议
- 合理使用 PARTITION BY:分区越少,性能越好
- 避免过度使用窗口函数:能用简单聚合函数解决的不要用窗口函数
- 注意索引:ORDER BY 和 PARTITION BY 的列应该有索引
- 限制窗口大小:使用框架子句限制处理的行数
小结
本章我们学习了:
- 窗口函数概念:对一组行进行计算,保留每行详情
- 排名函数:ROW_NUMBER、RANK、DENSE_RANK、NTILE
- 导航函数:LAG、LEAD、FIRST_VALUE、LAST_VALUE、NTH_VALUE
- 聚合窗口函数:SUM、AVG、COUNT、MAX、MIN
- 窗口框架:ROWS 和 RANGE 子句定义窗口范围
- PARTITION BY:分组计算
- 实战案例:累计计算、增长率、移动平均等
练习
- 查询每个部门的员工,按薪资从高到低排名
- 计算每天的销售额和前一天的对比(增长/下降)
- 计算每个用户的累计消费金额
- 查询每个类别中价格最高的前3个商品
- 计算7天移动平均温度