跳到主要内容

SQL 子查询

子查询(Subquery)是嵌套在另一个查询中的查询。子查询可以在 SELECT、FROM、WHERE 和 HAVING 子句中使用,是构建复杂查询的重要工具。

什么是子查询?

子查询是嵌套在其他 SQL 语句中的 SELECT 语句,用括号括起来:

SELECT * 
FROM users
WHERE id IN (SELECT user_id FROM orders WHERE amount > 1000);

子查询的特点:

  • 必须用括号括起来
  • 可以返回单行、多行或整个表
  • 可以嵌套多层
  • 外层查询可以使用内层查询的结果

子查询的分类

按返回结果分类

类型返回结果适用运算符
标量子查询单个值=, >, <, >=, <=, <>
列子查询单列多行IN, NOT IN, ANY, ALL
行子查询单行多列=, <>
表子查询多行多列EXISTS, FROM 子句

按位置分类

位置说明
WHERE 子句作为过滤条件
HAVING 子句作为分组过滤条件
FROM 子句作为临时表(派生表)
SELECT 子句作为计算字段

WHERE 子句中的子查询

标量子查询

返回单个值,可以与比较运算符一起使用:

-- 查询薪资高于平均薪资的员工
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- 查询与"张三"同部门的员工
SELECT name, department
FROM employees
WHERE department = (SELECT department FROM employees WHERE name = '张三');

使用 IN 的子查询

返回多行单列,用于检查是否在列表中:

-- 查询有订单的用户
SELECT *
FROM users
WHERE id IN (SELECT DISTINCT user_id FROM orders);

-- 查询没有订单的用户
SELECT *
FROM users
WHERE id NOT IN (SELECT DISTINCT user_id FROM orders);
注意

使用 NOT IN 时要特别注意 NULL 值。如果子查询返回 NULL,NOT IN 的结果会是 UNKNOWN,可能导致意外结果:

-- 危险:如果 subquery 返回包含 NULL 的结果
-- 整个 NOT IN 条件结果为 UNKNOWN
WHERE column NOT IN (SELECT ... WHERE column IS NOT NULL)

使用 ANY / ALL 的子查询

-- ANY:满足任意一个条件即可
-- 查询薪资高于任意一个部门最低薪资的员工
SELECT name, salary
FROM employees
WHERE salary > ANY (SELECT MIN(salary) FROM employees GROUP BY department);

-- 等价于
WHERE salary > (SELECT MIN(salary) FROM employees WHERE department = 'IT')
OR salary > (SELECT MIN(salary) FROM employees WHERE department = 'HR')
...;

-- ALL:必须满足所有条件
-- 查询薪资高于所有部门平均薪资的员工
SELECT name, salary
FROM employees
WHERE salary > ALL (SELECT AVG(salary) FROM employees GROUP BY department);

-- 等价于
WHERE salary > (SELECT AVG(salary) FROM employees WHERE department = 'IT')
AND salary > (SELECT AVG(salary) FROM employees WHERE department = 'HR')
...;

EXISTS 子查询

检查子查询是否返回任何行,返回 TRUE 或 FALSE:

-- 查询有订单的用户(使用 EXISTS)
SELECT u.name
FROM users u
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

-- 查询没有订单的用户(使用 NOT EXISTS)
SELECT u.name
FROM users u
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

IN vs EXISTS

场景推荐使用
子查询表小,主查询表大IN
子查询表大,主查询表小EXISTS
需要检查 NULLEXISTS
子查询需要关联主查询EXISTS
-- 通常 EXISTS 对于大子查询更高效
-- 因为 EXISTS 找到第一条匹配就停止

-- 使用 IN
SELECT * FROM large_table
WHERE id IN (SELECT id FROM small_table);

-- 使用 EXISTS(小主表 + 大子表时更高效)
SELECT * FROM small_table s
WHERE EXISTS (SELECT 1 FROM large_table l WHERE l.id = s.id);

FROM 子句中的子查询

FROM 子句中的子查询称为派生表(Derived Table):

-- 查询每个部门的员工数和平均薪资
SELECT
d.department_name,
e_stats.employee_count,
e_stats.avg_salary
FROM departments d
JOIN (
SELECT
department_id,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) e_stats ON d.id = e_stats.department_id;

派生表必须使用别名

-- 错误:派生表没有别名
SELECT * FROM (SELECT * FROM users);

-- 正确:派生表必须有别名
SELECT * FROM (SELECT * FROM users) AS u;

派生表与 CTE 的选择

-- 派生表:简单场景
SELECT u.name, o.order_count
FROM users u
JOIN (
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
) o ON u.id = o.user_id;

-- CTE:复杂场景,需要多次引用
WITH order_stats AS (
SELECT user_id, COUNT(*) AS order_count
FROM orders
GROUP BY user_id
)
SELECT
u.name,
o.order_count,
CASE WHEN o.order_count > avg_stats.avg_count THEN 'Above Average'
ELSE 'Below Average' END AS comparison
FROM users u
JOIN order_stats o ON u.id = o.user_id
CROSS JOIN (SELECT AVG(order_count) AS avg_count FROM order_stats) avg_stats;

SELECT 子句中的子查询

在 SELECT 子句中使用子查询作为计算字段:

SELECT 
name,
salary,
(SELECT AVG(salary) FROM employees) AS avg_salary,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;

相关子查询

相关子查询引用外部查询的列,对外部查询的每一行都会执行一次:

-- 查询每个员工及其部门平均薪资
SELECT
name,
department,
salary,
(SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e1.department) AS dept_avg
FROM employees e1;
性能注意

相关子查询会对外部查询的每一行执行一次,可能影响性能。对于大数据量,考虑使用 JOIN 或窗口函数替代。

相关子查询 vs 窗口函数

-- 相关子查询(性能较差)
SELECT
name,
department,
salary,
(SELECT AVG(salary) FROM employees e2 WHERE e2.department = e1.department) AS dept_avg
FROM employees e1;

-- 窗口函数(性能更好)
SELECT
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

HAVING 子句中的子查询

-- 查询订单数超过平均订单数的用户
SELECT
user_id,
COUNT(*) AS order_count
FROM orders
GROUP BY user_id
HAVING COUNT(*) > (SELECT AVG(cnt) FROM (SELECT COUNT(*) AS cnt FROM orders GROUP BY user_id) t);

-- 查询总消费超过部门平均消费的用户
SELECT
user_id,
SUM(amount) AS total_amount
FROM orders
GROUP BY user_id
HAVING SUM(amount) > (SELECT AVG(total) FROM (SELECT SUM(amount) AS total FROM orders GROUP BY user_id) t);

行子查询

返回单行多列,可以与行构造器比较:

-- 查询与张三同部门同薪资的员工
SELECT name, department, salary
FROM employees
WHERE (department, salary) = (
SELECT department, salary
FROM employees
WHERE name = '张三'
);

表子查询

返回多行多列,通常用于 FROM 子句或 EXISTS:

-- 在 FROM 子句中使用表子查询
SELECT u.name, recent_orders.order_id
FROM users u
JOIN (
SELECT user_id, order_id
FROM orders
WHERE order_date > DATE_SUB(NOW(), INTERVAL 30 DAY)
) recent_orders ON u.id = recent_orders.user_id;

-- 在 EXISTS 中使用表子查询
SELECT * FROM products p
WHERE EXISTS (
SELECT 1 FROM order_items oi
JOIN orders o ON oi.order_id = o.id
WHERE oi.product_id = p.id
AND o.status = 'completed'
);

嵌套子查询

子查询可以嵌套多层:

-- 三层嵌套:查询消费金额最高的用户所在的城市
SELECT city
FROM users
WHERE id = (
SELECT user_id
FROM orders
GROUP BY user_id
ORDER BY SUM(amount) DESC
LIMIT 1
);
建议

虽然子查询可以嵌套多层,但超过 2-3 层会影响可读性。建议使用 CTE 来分解复杂查询。

子查询与 JOIN 的选择

很多时候子查询可以用 JOIN 替代,选择哪种方式取决于具体情况:

子查询更合适的场景

-- 需要聚合结果作为条件
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- 检查存在性
SELECT * FROM orders
WHERE EXISTS (SELECT 1 FROM order_items WHERE order_id = orders.id);

-- 计算、统计作为列
SELECT name, (SELECT COUNT(*) FROM orders WHERE user_id = users.id) AS order_count
FROM users;

JOIN 更合适的场景

-- 需要来自多个表的列
SELECT u.name, o.order_id, o.amount
FROM users u
JOIN orders o ON u.id = o.user_id;

-- 多表关联过滤
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;

实战案例

案例1:查找各部门薪资最高的员工

-- 方法1:相关子查询
SELECT e1.name, e1.department, e1.salary
FROM employees e1
WHERE e1.salary = (
SELECT MAX(salary)
FROM employees e2
WHERE e2.department = e1.department
);

-- 方法2:窗口函数(推荐)
WITH ranked AS (
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT name, department, salary
FROM ranked
WHERE rnk = 1;

案例2:查找连续 N 天登录的用户

-- 查找连续 3 天登录的用户
SELECT DISTINCT user_id
FROM (
SELECT
user_id,
login_date,
DATE_SUB(login_date, INTERVAL rn DAY) AS group_date
FROM (
SELECT
user_id,
login_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS rn
FROM user_logins
) t
) t2
GROUP BY user_id, group_date
HAVING COUNT(*) >= 3;

案例3:获取每个分组的前 N 条记录

-- 获取每个部门薪资前 3 的员工
SELECT * FROM employees e1
WHERE (
SELECT COUNT(*)
FROM employees e2
WHERE e2.department = e1.department
AND e2.salary > e1.salary
) < 3
ORDER BY department, salary DESC;

-- 或使用窗口函数(更高效)
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT * FROM ranked WHERE rn <= 3;

案例4:同比环比计算

-- 月度销售额及同比环比
SELECT
m1.month,
m1.sales,
m2.sales AS last_month_sales,
m3.sales AS last_year_sales,
ROUND((m1.sales - m2.sales) / m2.sales * 100, 2) AS mom_growth,
ROUND((m1.sales - m3.sales) / m3.sales * 100, 2) AS yoy_growth
FROM monthly_sales m1
LEFT JOIN monthly_sales m2 ON DATE_ADD(m2.month, INTERVAL 1 MONTH) = m1.month
LEFT JOIN monthly_sales m3 ON DATE_ADD(m3.month, INTERVAL 1 YEAR) = m1.month;

性能优化建议

1. 使用 EXISTS 替代 IN

-- 对于大子查询,EXISTS 通常更高效
SELECT * FROM large_table l
WHERE EXISTS (SELECT 1 FROM small_table s WHERE s.id = l.id);

2. 避免在 WHERE 中使用相关子查询

-- 较慢(相关子查询)
SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE region = 'Asia');

-- 较快(JOIN)
SELECT DISTINCT o.*
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'Asia';

3. 使用窗口函数替代相关子查询

-- 较慢
SELECT name, salary,
(SELECT AVG(salary) FROM employees WHERE department = e.department) AS avg_dept
FROM employees e;

-- 较快
SELECT name, salary,
AVG(salary) OVER (PARTITION BY department) AS avg_dept
FROM employees;

4. 使用 CTE 提高可读性和性能

-- 复杂子查询难以优化
SELECT ... FROM ... WHERE ... IN (SELECT ... FROM ... WHERE ... IN (...));

-- CTE 更清晰,某些数据库可以物化 CTE
WITH cte1 AS (SELECT ... FROM ...),
cte2 AS (SELECT ... FROM cte1 ...)
SELECT ... FROM cte2;

小结

本章我们学习了:

  1. 子查询概念:嵌套在其他查询中的查询
  2. 子查询分类:标量、列、行、表子查询
  3. 使用位置:WHERE、HAVING、FROM、SELECT
  4. IN vs EXISTS:不同场景的选择
  5. 相关子查询:引用外部查询的列
  6. 子查询 vs JOIN:如何选择
  7. 性能优化:使用窗口函数、CTE 替代复杂子查询

练习

  1. 查询薪资高于公司平均薪资的员工
  2. 查询没有任何订单的用户
  3. 查询每个部门的员工数和平均薪资(使用派生表)
  4. 查询每个类别中价格最高的商品
  5. 使用子查询实现分页查询