跳到主要内容

PostgreSQL 性能优化

性能优化是数据库管理的重要技能。本章将介绍 PostgreSQL 性能优化的各个方面,包括查询优化、索引优化、配置调优等。

性能优化概述

优化层次

优化原则

  1. 先诊断后优化:使用工具分析问题,而非盲目优化
  2. 优化高频操作:优先优化执行频繁的慢查询
  3. 监控持续改进:建立监控体系,持续跟踪性能
  4. 平衡成本收益:考虑优化成本与收益的平衡

查询分析工具

EXPLAIN 基础

EXPLAIN 是分析查询执行计划的核心工具:

-- 查看执行计划
EXPLAIN SELECT * FROM orders WHERE customer_id = 100;

-- 查看详细执行计划(实际执行)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;

-- 查看更详细的信息
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 100;

-- JSON 格式输出
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE customer_id = 100;

读懂执行计划

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;

输出示例:

Index Scan using idx_orders_customer on orders  (cost=0.43..8.45 rows=1 width=100) (actual time=0.025..0.026 rows=1 loops=1)
Index Cond: (customer_id = 100)
Buffers: shared hit=4
Planning Time: 0.100 ms
Execution Time: 0.050 ms

关键指标解释

指标说明
cost=0.43..8.45启动成本..总成本(相对值)
rows=1预估返回行数
width=100每行平均字节数
actual time实际执行时间(毫秒)
rows=1实际返回行数
loops=1执行次数
Buffers: shared hit=4缓存命中的数据块数

常见扫描类型

-- 1. Seq Scan(顺序扫描)- 扫描全表
EXPLAIN ANALYZE SELECT * FROM orders;
-- 注意:大表上出现 Seq Scan 可能需要优化

-- 2. Index Scan(索引扫描)- 使用索引读取数据
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;

-- 3. Index Only Scan(仅索引扫描)- 只读取索引
EXPLAIN ANALYZE SELECT customer_id FROM orders WHERE customer_id = 100;

-- 4. Bitmap Index Scan(位图索引扫描)- 多条件查询
EXPLAIN ANALYZE SELECT * FROM orders
WHERE customer_id = 100 AND status = 'pending';

-- 5. Bitmap Heap Scan(位图堆扫描)- 与 Bitmap Index Scan 配合

常见连接类型

-- Nested Loop(嵌套循环)- 适合小表驱动大表
EXPLAIN ANALYZE SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.id = 100;

-- Hash Join(哈希连接)- 适合大表连接
EXPLAIN ANALYZE SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id;

-- Merge Join(归并连接)- 适合已排序的数据
EXPLAIN ANALYZE SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
ORDER BY o.customer_id;

查询优化

避免全表扫描

-- 问题:全表扫描
SELECT * FROM orders WHERE customer_id = 100;

-- 解决:添加索引
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- 验证:使用 EXPLAIN 检查
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 100;

优化 WHERE 条件

-- 问题:函数导致索引失效
SELECT * FROM orders WHERE UPPER(status) = 'PENDING';

-- 解决:使用表达式索引
CREATE INDEX idx_orders_status_upper ON orders(UPPER(status));

-- 问题:类型转换导致索引失效
SELECT * FROM orders WHERE customer_id = '100'; -- 字符串比较

-- 解决:使用正确类型
SELECT * FROM orders WHERE customer_id = 100; -- 整数比较

-- 问题:LIKE 前置通配符
SELECT * FROM orders WHERE product_name LIKE '%Product%';

-- 解决:使用全文搜索或 trigram 索引
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_orders_product_trgm ON orders USING GIN (product_name gin_trgm_ops);
SELECT * FROM orders WHERE product_name LIKE '%Product%';

优化 ORDER BY

-- 问题:排序操作消耗大
SELECT * FROM orders ORDER BY created_at DESC LIMIT 10;

-- 解决:添加索引
CREATE INDEX idx_orders_created_desc ON orders(created_at DESC);

-- 多列排序
SELECT * FROM orders
WHERE customer_id = 100
ORDER BY created_at DESC;

-- 解决:复合索引
CREATE INDEX idx_orders_customer_created ON orders(customer_id, created_at DESC);

优化 LIMIT 和 OFFSET

-- 问题:大 OFFSET 性能差
SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 100000;

-- 解决:使用键集分页(seek pagination)
SELECT * FROM orders
WHERE id > 100000 -- 上一页最后一条记录的 ID
ORDER BY id
LIMIT 10;

-- 或者使用子查询
SELECT * FROM orders o
JOIN (SELECT id FROM orders ORDER BY id LIMIT 10 OFFSET 100000) t
ON o.id = t.id;

优化子查询

-- 问题:相关子查询效率低
SELECT * FROM orders o1
WHERE total_amount > (
SELECT AVG(total_amount) FROM orders o2
WHERE o2.customer_id = o1.customer_id
);

-- 解决:使用窗口函数
SELECT * FROM (
SELECT *,
AVG(total_amount) OVER (PARTITION BY customer_id) AS avg_amount
FROM orders
) t
WHERE total_amount > avg_amount;

-- 问题:IN 子查询效率低
SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE status = 'active');

-- 解决:使用 JOIN
SELECT o.* FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.status = 'active';

-- 或使用 EXISTS
SELECT * FROM orders o
WHERE EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.status = 'active');

优化 UNION

-- 问题:UNION 会去重,消耗资源
SELECT name FROM customers
UNION
SELECT name FROM suppliers;

-- 解决:如果确定无重复,使用 UNION ALL
SELECT name FROM customers
UNION ALL
SELECT name FROM suppliers;

索引优化

索引选择性分析

-- 查看列的选择性(唯一值比例)
SELECT
COUNT(DISTINCT customer_id) AS distinct_values,
COUNT(*) AS total_rows,
ROUND(COUNT(DISTINCT customer_id)::NUMERIC / COUNT(*) * 100, 2) AS selectivity_pct
FROM orders;

-- 选择性越高,索引效果越好
-- 选择性 < 10%:索引效果一般
-- 选择性 > 50%:索引效果很好

复合索引设计

-- 复合索引列顺序原则:
-- 1. 等值查询的列放前面
-- 2. 范围查询的列放后面
-- 3. 排序的列可以考虑加入

-- 查询模式:
SELECT * FROM orders
WHERE customer_id = 100
AND status = 'pending'
ORDER BY created_at DESC;

-- 推荐索引:
CREATE INDEX idx_orders_customer_status_created
ON orders(customer_id, status, created_at DESC);

部分索引

-- 只索引满足条件的行
CREATE INDEX idx_orders_active ON orders(customer_id)
WHERE status = 'active';

-- 查询时自动使用部分索引
SELECT * FROM orders
WHERE status = 'active' AND customer_id = 100;

覆盖索引

-- 包含查询所需的所有列,避免回表
CREATE INDEX idx_orders_covering ON orders(customer_id, status)
INCLUDE (total_amount, created_at);

-- 查询可以仅从索引获取数据
SELECT customer_id, status, total_amount, created_at
FROM orders
WHERE customer_id = 100 AND status = 'pending';

索引使用分析

-- 查看索引使用情况
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

-- 找出未使用的索引
SELECT
schemaname || '.' || relname AS table,
indexrelname AS index,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS size
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT i.indisunique
AND idx_scan = 0
AND pg_relation_size(i.indexrelid) > 1024 * 1024
ORDER BY pg_relation_size(i.indexrelid) DESC;

表结构优化

选择合适数据类型

-- 1. 整数类型选择
-- SMALLINT:-32768 ~ 32767
-- INTEGER:-21亿 ~ 21亿(最常用)
-- BIGINT:非常大的数值

-- 2. 字符串类型选择
-- CHAR(n):固定长度,适合定长编码等
-- VARCHAR(n):可变长度,有限制
-- TEXT:可变长度,无限制

-- 3. 数值精度
-- NUMERIC:精确计算(货币)
-- REAL/DOUBLE PRECISION:近似计算

-- 示例:优化前
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name CHAR(200), -- 浪费空间
description TEXT,
price NUMERIC(20, 10), -- 过高精度
stock INTEGER,
status VARCHAR(50) -- 应该用 ENUM
);

-- 优化后
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200), -- 可变长度
description TEXT,
price NUMERIC(10, 2), -- 合适精度
stock INTEGER,
status product_status -- 使用枚举
);

CREATE TYPE product_status AS ENUM ('draft', 'active', 'inactive');

规范化与反规范化

-- 规范化:减少数据冗余
-- 优点:数据一致性好,更新简单
-- 缺点:查询需要多表连接

-- 反规范化:适当冗余提高查询性能
-- 示例:在订单表中冗余客户名称

-- 规范化设计
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
total_amount NUMERIC(10, 2)
);

-- 反规范化设计(适合读多写少场景)
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
customer_name VARCHAR(100), -- 冗余字段
total_amount NUMERIC(10, 2)
);

表分区

对于大表,分区可以提高查询和维护效率:

-- 创建分区表(按范围)
CREATE TABLE orders (
id SERIAL,
customer_id INTEGER,
order_date DATE,
total_amount NUMERIC(10, 2)
) PARTITION BY RANGE (order_date);

-- 创建分区
CREATE TABLE orders_2024_q1 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

CREATE TABLE orders_2024_q2 PARTITION OF orders
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

-- 创建默认分区
CREATE TABLE orders_default PARTITION OF orders DEFAULT;

-- 查询时自动分区裁剪
SELECT * FROM orders WHERE order_date = '2024-02-15';
-- 只扫描 orders_2024_q1 分区

配置优化

内存配置

# postgresql.conf 关键内存参数

# 共享内存缓冲区(建议:系统内存的 25%)
shared_buffers = 4GB

# 工作内存(每个排序/哈希操作)
work_mem = 64MB

# 维护工作内存(VACUUM、CREATE INDEX)
maintenance_work_mem = 512MB

# 有效缓存大小( planner 估计)
effective_cache_size = 12GB

# WAL 缓冲区
wal_buffers = 64MB

连接配置

# 最大连接数
max_connections = 200

# 连接池推荐使用 PgBouncer

# 每个连接的内存
# 每个连接大约消耗:work_mem × max_connections

查询规划器配置

# 随机页面成本(SSD 可降低)
random_page_cost = 1.1 # 默认 4.0

# 有效 I/O 并发
effective_io_concurrency = 200

# 并行查询配置
max_parallel_workers_per_gather = 4
max_parallel_workers = 8

日志配置

# 慢查询日志
log_min_duration_statement = 1000 # 毫秒

# 记录执行计划
log_auto_explain_min_duration = 5000

# 记录锁等待
log_lock_waits = on
deadlock_timeout = 1s

统计信息

更新统计信息

-- 更新单个表的统计信息
ANALYZE orders;

-- 更新所有表
ANALYZE;

-- 详细分析(更精确但更慢)
VACUUM ANALYZE orders;

-- 设置统计信息详细程度
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
ANALYZE orders;

查看统计信息

-- 查看表统计信息
SELECT
tablename,
n_live_tup,
n_dead_tup,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze
FROM pg_stat_user_tables;

-- 查看列统计信息
SELECT
attname,
n_distinct,
most_common_vals,
most_common_freqs
FROM pg_stats
WHERE tablename = 'orders';

VACUUM 和清理

为什么需要 VACUUM?

PostgreSQL 使用 MVCC,DELETE 和 UPDATE 不会立即删除旧数据,需要 VACUUM 清理:

-- 手动执行 VACUUM
VACUUM orders;

-- VACUUM ANALYZE(清理并更新统计)
VACUUM ANALYZE orders;

-- VACUUM FULL(重建表,会锁表)
VACUUM FULL orders;

-- 查看表的膨胀情况
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
n_dead_tup,
n_live_tup,
ROUND(n_dead_tup::NUMERIC / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC;

自动清理配置

# postgresql.conf

# 启用自动清理
autovacuum = on

# 触发阈值
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.1

# 清理速度限制
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 200

# 分析阈值
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.05

性能监控

慢查询监控

-- 启用 pg_stat_statements 扩展
CREATE EXTENSION pg_stat_statements;

-- 查看最慢的查询
SELECT
calls,
ROUND(total_exec_time::NUMERIC, 2) AS total_time_ms,
ROUND(mean_exec_time::NUMERIC, 2) AS avg_time_ms,
ROUND((100 * total_exec_time / SUM(total_exec_time) OVER ())::NUMERIC, 2) AS percent,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- 查看执行次数最多的查询
SELECT
calls,
ROUND(mean_exec_time::NUMERIC, 2) AS avg_time_ms,
query
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

锁监控

-- 查看当前锁
SELECT
pid,
locktype,
mode,
granted,
query_start,
query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;

-- 查看阻塞的查询
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks blocked_locks ON blocked.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocked_locks.locktype = blocking_locks.locktype
AND blocked_locks.database IS NOT DISTINCT FROM blocking_locks.database
AND blocked_locks.relation IS NOT DISTINCT FROM blocking_locks.relation
AND blocked_locks.page IS NOT DISTINCT FROM blocking_locks.page
AND blocked_locks.tuple IS NOT DISTINCT FROM blocking_locks.tuple
AND blocked_locks.pid != blocking_locks.pid
JOIN pg_stat_activity blocking ON blocking_locks.pid = blocking.pid
WHERE NOT blocked_locks.granted;

-- 终止阻塞的查询
SELECT pg_terminate_backend(pid);

系统监控视图

-- 查看数据库活动
SELECT * FROM pg_stat_activity;

-- 查看表访问统计
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;

-- 查看索引使用统计
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- 查看数据库大小
SELECT
datname,
pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;

-- 查看表大小
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS total_size,
pg_size_pretty(pg_relation_size(schemaname || '.' || tablename)) AS table_size,
pg_size_pretty(pg_indexes_size(schemaname || '.' || tablename)) AS indexes_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;

小结

本章我们学习了 PostgreSQL 性能优化:

  1. 查询分析:EXPLAIN、执行计划解读
  2. 查询优化:避免全表扫描、优化条件、优化排序
  3. 索引优化:选择性分析、复合索引、部分索引
  4. 表结构优化:数据类型选择、规范化、分区
  5. 配置优化:内存、连接、规划器配置
  6. 统计信息:ANALYZE、统计信息查看
  7. VACUUM:清理死元组、自动清理配置
  8. 性能监控:慢查询、锁、系统视图

练习

  1. 使用 EXPLAIN 分析一个复杂查询的执行计划
  2. 找出并优化一个慢查询
  3. 分析索引使用情况,删除未使用的索引
  4. 配置慢查询日志,监控数据库性能

参考资源