PostgreSQL 性能优化
性能优化是数据库管理的重要技能。本章将介绍 PostgreSQL 性能优化的各个方面,包括查询优化、索引优化、配置调优等。
性能优化概述
优化层次
优化原则
- 先诊断后优化:使用工具分析问题,而非盲目优化
- 优化高频操作:优先优化执行频繁的慢查询
- 监控持续改进:建立监控体系,持续跟踪性能
- 平衡成本收益:考虑优化成本与收益的平衡
查询分析工具
EXPLAIN 基础
EXPLAIN 是分析查询执行计划的核心工具:
-- 查看执行计划
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;
-- 查看详细执行计划(实际执行)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;
-- 查看更详细的信息
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 100;
-- JSON 格式输出
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE customer_id = 100;
读懂执行计划
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;
输出示例:
Index Scan using idx_orders_customer on orders (cost=0.43..8.45 rows=1 width=100) (actual time=0.025..0.026 rows=1 loops=1)
Index Cond: (customer_id = 100)
Buffers: shared hit=4
Planning Time: 0.100 ms
Execution Time: 0.050 ms
关键指标解释:
| 指标 | 说明 |
|---|---|
cost=0.43..8.45 | 启动成本..总成本(相对值) |
rows=1 | 预估返回行数 |
width=100 | 每行平均字节数 |
actual time | 实际执行时间(毫秒) |
rows=1 | 实际返回行数 |
loops=1 | 执行次数 |
Buffers: shared hit=4 | 缓存命中的数据块数 |
常见扫描类型
-- 1. Seq Scan(顺序扫描)- 扫描全表
EXPLAIN ANALYZE SELECT * FROM orders;
-- 注意:大表上出现 Seq Scan 可能需要优化
-- 2. Index Scan(索引扫描)- 使用索引读取数据
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;
-- 3. Index Only Scan(仅索引扫描)- 只读取索引
EXPLAIN ANALYZE SELECT customer_id FROM orders WHERE customer_id = 100;
-- 4. Bitmap Index Scan(位图索引扫描)- 多条件查询
EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 100 AND status = 'pending';
-- 5. Bitmap Heap Scan(位图堆扫描)- 与 Bitmap Index Scan 配合
常见连接类型
-- Nested Loop(嵌套循环)- 适合小表驱动大表
EXPLAIN ANALYZE SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.id = 100;
-- Hash Join(哈希连接)- 适合大表连接
EXPLAIN ANALYZE SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id;
-- Merge Join(归并连接)- 适合已排序的数据
EXPLAIN ANALYZE SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
ORDER BY o.customer_id;
查询优化
避免全表扫描
-- 问题:全表扫描
SELECT * FROM orders WHERE customer_id = 100;
-- 解决:添加索引
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- 验证:使用 EXPLAIN 检查
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;
优化 WHERE 条件
-- 问题:函数导致索引失效
SELECT * FROM orders WHERE UPPER(status) = 'PENDING';
-- 解决:使用表达式索引
CREATE INDEX idx_orders_status_upper ON orders(UPPER(status));
-- 问题:类型转换导致索引失效
SELECT * FROM orders WHERE customer_id = '100'; -- 字符串比较
-- 解决:使用正确类型
SELECT * FROM orders WHERE customer_id = 100; -- 整数比较
-- 问题:LIKE 前置通配符
SELECT * FROM orders WHERE product_name LIKE '%Product%';
-- 解决:使用全文搜索或 trigram 索引
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_orders_product_trgm ON orders USING GIN (product_name gin_trgm_ops);
SELECT * FROM orders WHERE product_name LIKE '%Product%';
优化 ORDER BY
-- 问题:排序操作消耗大
SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;
-- 解决:添加索引
CREATE INDEX idx_orders_created_desc ON orders(created_at DESC);
-- 多列排序
SELECT * FROM orders
WHERE customer_id = 100
ORDER BY created_at DESC;
-- 解决:复合索引
CREATE INDEX idx_orders_customer_created ON orders(customer_id, created_at DESC);
优化 LIMIT 和 OFFSET
-- 问题:大 OFFSET 性能差
SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 100000;
-- 解决:使用键集分页(seek pagination)
SELECT * FROM orders
WHERE id > 100000 -- 上一页最后一条记录的 ID
ORDER BY id
LIMIT 10;
-- 或者使用子查询
SELECT * FROM orders o
JOIN (SELECT id FROM orders ORDER BY id LIMIT 10 OFFSET 100000) t
ON o.id = t.id;
优化子查询
-- 问题:相关子查询效率低
SELECT * FROM orders o1
WHERE total_amount > (
SELECT AVG(total_amount) FROM orders o2
WHERE o2.customer_id = o1.customer_id
);
-- 解决:使用窗口函数
SELECT * FROM (
SELECT *,
AVG(total_amount) OVER (PARTITION BY customer_id) AS avg_amount
FROM orders
) t
WHERE total_amount > avg_amount;
-- 问题:IN 子查询效率低
SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE status = 'active');
-- 解决:使用 JOIN
SELECT o.* FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.status = 'active';
-- 或使用 EXISTS
SELECT * FROM orders o
WHERE EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.status = 'active');
优化 UNION
-- 问题:UNION 会去重,消耗资源
SELECT name FROM customers
UNION
SELECT name FROM suppliers;
-- 解决:如果确定无重复,使用 UNION ALL
SELECT name FROM customers
UNION ALL
SELECT name FROM suppliers;
索引优化
索引选择性分析
-- 查看列的选择性(唯一值比例)
SELECT
COUNT(DISTINCT customer_id) AS distinct_values,
COUNT(*) AS total_rows,
ROUND(COUNT(DISTINCT customer_id)::NUMERIC / COUNT(*) * 100, 2) AS selectivity_pct
FROM orders;
-- 选择性越高,索引效果越好
-- 选择性 < 10%:索引效果一般
-- 选择性 > 50%:索引效果很好
复合索引设计
-- 复合索引列顺序原则:
-- 1. 等值查询的列放前面
-- 2. 范围查询的列放后面
-- 3. 排序的列可以考虑加入
-- 查询模式:
SELECT * FROM orders
WHERE customer_id = 100
AND status = 'pending'
ORDER BY created_at DESC;
-- 推荐索引:
CREATE INDEX idx_orders_customer_status_created
ON orders(customer_id, status, created_at DESC);
部分索引
-- 只索引满足条件的行
CREATE INDEX idx_orders_active ON orders(customer_id)
WHERE status = 'active';
-- 查询时自动使用部分索引
SELECT * FROM orders
WHERE status = 'active' AND customer_id = 100;
覆盖索引
-- 包含查询所需的所有列,避免回表
CREATE INDEX idx_orders_covering ON orders(customer_id, status)
INCLUDE (total_amount, created_at);
-- 查询可以仅从索引获取数据
SELECT customer_id, status, total_amount, created_at
FROM orders
WHERE customer_id = 100 AND status = 'pending';
索引使用分析
-- 查看索引使用情况
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
-- 找出未使用的索引
SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS size
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT i.indisunique
AND idx_scan = 0
AND pg_relation_size(i.indexrelid) > 1024 * 1024
ORDER BY pg_relation_size(i.indexrelid) DESC;
表结构优化
选择合适数据类型
-- 1. 整数类型选择
-- SMALLINT:-32768 ~ 32767
-- INTEGER:-21亿 ~ 21亿(最常用)
-- BIGINT:非常大的数值
-- 2. 字符串类型选择
-- CHAR(n):固定长度,适合定长编码等
-- VARCHAR(n):可变长度,有限制
-- TEXT:可变长度,无限制
-- 3. 数值精度
-- NUMERIC:精确计算(货币)
-- REAL/DOUBLE PRECISION:近似计算
-- 示例:优化前
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name CHAR(200), -- 浪费空间
description TEXT,
price NUMERIC(20, 10), -- 过高精度
stock INTEGER,
status VARCHAR(50) -- 应该用 ENUM
);
-- 优化后
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200), -- 可变长度
description TEXT,
price NUMERIC(10, 2), -- 合适精度
stock INTEGER,
status product_status -- 使用枚举
);
CREATE TYPE product_status AS ENUM ('draft', 'active', 'inactive');
规范化与反规范化
-- 规范化:减少数据冗余
-- 优点:数据一致性好,更新简单
-- 缺点:查询需要多表连接
-- 反规范化:适当冗余提高查询性能
-- 示例:在订单表中冗余客户名称
-- 规范化设计
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
total_amount NUMERIC(10, 2)
);
-- 反规范化设计(适合读多写少场景)
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
customer_name VARCHAR(100), -- 冗余字段
total_amount NUMERIC(10, 2)
);
表分区
对于大表,分区可以提高查询和维护效率:
-- 创建分区表(按范围)
CREATE TABLE orders (
id SERIAL,
customer_id INTEGER,
order_date DATE,
total_amount NUMERIC(10, 2)
) PARTITION BY RANGE (order_date);
-- 创建分区
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
-- 创建默认分区
CREATE TABLE orders_default PARTITION OF orders DEFAULT;
-- 查询时自动分区裁剪
SELECT * FROM orders WHERE order_date = '2024-02-15';
-- 只扫描 orders_2024_q1 分区
配置优化
内存配置
# postgresql.conf 关键内存参数
# 共享内存缓冲区(建议:系统内存的 25%)
shared_buffers = 4GB
# 工作内存(每个排序/哈希操作)
work_mem = 64MB
# 维护工作内存(VACUUM、CREATE INDEX)
maintenance_work_mem = 512MB
# 有效缓存大小( planner 估计)
effective_cache_size = 12GB
# WAL 缓冲区
wal_buffers = 64MB
连接配置
# 最大连接数
max_connections = 200
# 连接池推荐使用 PgBouncer
# 每个连接的内存
# 每个连接大约消耗:work_mem × max_connections
查询规划器配置
# 随机页面成本(SSD 可降低)
random_page_cost = 1.1 # 默认 4.0
# 有效 I/O 并发
effective_io_concurrency = 200
# 并行查询配置
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
日志配置
# 慢查询日志
log_min_duration_statement = 1000 # 毫秒
# 记录执行计划
log_auto_explain_min_duration = 5000
# 记录锁等待
log_lock_waits = on
deadlock_timeout = 1s
统计信息
更新统计信息
-- 更新单个表的统计信息
ANALYZE orders;
-- 更新所有表
ANALYZE;
-- 详细分析(更精确但更慢)
VACUUM ANALYZE orders;
-- 设置统计信息详细程度
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
ANALYZE orders;
查看统计信息
-- 查看表统计信息
SELECT
tablename,
n_live_tup,
n_dead_tup,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze
FROM pg_stat_user_tables;
-- 查看列统计信息
SELECT
attname,
n_distinct,
most_common_vals,
most_common_freqs
FROM pg_stats
WHERE tablename = 'orders';
VACUUM 和清理
为什么需要 VACUUM?
PostgreSQL 使用 MVCC,DELETE 和 UPDATE 不会立即删除旧数据,需要 VACUUM 清理:
-- 手动执行 VACUUM
VACUUM orders;
-- VACUUM ANALYZE(清理并更新统计)
VACUUM ANALYZE orders;
-- VACUUM FULL(重建表,会锁表)
VACUUM FULL orders;
-- 查看表的膨胀情况
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
n_dead_tup,
n_live_tup,
ROUND(n_dead_tup::NUMERIC / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC;
自动清理配置
# postgresql.conf
# 启用自动清理
autovacuum = on
# 触发阈值
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.1
# 清理速度限制
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 200
# 分析阈值
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.05
性能监控
慢查询监控
-- 启用 pg_stat_statements 扩展
CREATE EXTENSION pg_stat_statements;
-- 查看最慢的查询
SELECT
calls,
ROUND(total_exec_time::NUMERIC, 2) AS total_time_ms,
ROUND(mean_exec_time::NUMERIC, 2) AS avg_time_ms,
ROUND((100 * total_exec_time / SUM(total_exec_time) OVER ())::NUMERIC, 2) AS percent,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- 查看执行次数最多的查询
SELECT
calls,
ROUND(mean_exec_time::NUMERIC, 2) AS avg_time_ms,
query
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
锁监控
-- 查看当前锁
SELECT
pid,
locktype,
mode,
granted,
query_start,
query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;
-- 查看阻塞的查询
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks blocked_locks ON blocked.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocked_locks.locktype = blocking_locks.locktype
AND blocked_locks.database IS NOT DISTINCT FROM blocking_locks.database
AND blocked_locks.relation IS NOT DISTINCT FROM blocking_locks.relation
AND blocked_locks.page IS NOT DISTINCT FROM blocking_locks.page
AND blocked_locks.tuple IS NOT DISTINCT FROM blocking_locks.tuple
AND blocked_locks.pid != blocking_locks.pid
JOIN pg_stat_activity blocking ON blocking_locks.pid = blocking.pid
WHERE NOT blocked_locks.granted;
-- 终止阻塞的查询
SELECT pg_terminate_backend(pid);
系统监控视图
-- 查看数据库活动
SELECT * FROM pg_stat_activity;
-- 查看表访问统计
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;
-- 查看索引使用统计
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- 查看数据库大小
SELECT
datname,
pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
-- 查看表大小
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS table_size,
pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS indexes_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
小结
本章我们学习了 PostgreSQL 性能优化:
- 查询分析:EXPLAIN、执行计划解读
- 查询优化:避免全表扫描、优化条件、优化排序
- 索引优化:选择性分析、复合索引、部分索引
- 表结构优化:数据类型选择、规范化、分区
- 配置优化:内存、连接、规划器配置
- 统计信息:ANALYZE、统计信息查看
- VACUUM:清理死元组、自动清理配置
- 性能监控:慢查询、锁、系统视图
练习
- 使用 EXPLAIN 分析一个复杂查询的执行计划
- 找出并优化一个慢查询
- 分析索引使用情况,删除未使用的索引
- 配置慢查询日志,监控数据库性能