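Monitoring and Management
Spring Boot Actuator provides production-ready monitoring and management features, including health checks, metrics collection, auditing, and HTTP tracing. This chapter explains how to use and extend Actuator.
Actuator Overview
What Is Actuator?
Actuator is Spring Boot's production-ready feature module. It provides:
- Health checks: monitoring of application health status
- Metrics collection: performance metrics and business metrics
- Endpoint exposure: access over HTTP or JMX
- Auditing: recording of important events
- Remote management: remote configuration and debugging
Adding the Dependency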
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
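Built-in Endpoints
In Spring Boot 2.x most endpoints were exposed over JMX by default. Since Spring Boot 3.0, only health is exposed by default, over both HTTP and JMX; everything else must be exposed explicitly.
| Endpoint | Description | Exposed by default (Spring Boot 3.x) |
|---|---|---|
| health | Application health status | HTTP/JMX |
| info | Application information | No |
| beans | List of Spring beans | No |
| conditions | Auto-configuration condition report | No |
| configprops | Configuration properties | No |
| env | Environment properties | No |
| loggers | Logging configuration | No |
| metrics | Metrics information | No |
| mappings | URL mappings | No |
| shutdown | Gracefully shuts down the application | No (access is also disabled by default) |
| threaddump | Thread dump | No |
| heapdump | Heap dump | No |
| caches | Cache information | No |
| scheduledtasks | Scheduled tasks | No |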
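Endpoint Configuration
Exposing Endpoints
management:
  endpoints:
    web:
      exposure:
        # Expose all endpoints
        include: "*"
        # Exclude specific endpoints
        exclude: shutdown,heapdump
    # JMX exposure
    jmx:
      exposure:
        include: "*"
Recommended practice:
management:
  endpoints:
    web:
      exposure:
        # In production, expose only the endpoints you need
        include: health,info,metrics,prometheus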
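Endpoint Access Control (Spring Boot 3.4+)
Spring Boot 3.4 introduced a finer-grained endpoint access model that supports a read-only access level:
management:
  endpoints:
    access:
      default: read-only      # default access level
  endpoint:
    health:
      access: unrestricted    # full access
    loggers:
      access: read-only       # read-only access
    shutdown:
      access: none            # access disabled
Access levels:
| Access level | Description |
|---|---|
| none | Access to the endpoint is disabled |
| read-only | Only read operations are allowed; modifications are blocked |
| unrestricted | Full access; both reads and modifications are allowed |
Capping the maximum access level:
management:
  endpoints:
    access:
      max-permitted: read-only  # cap every endpoint at read-only
Example: even if the loggers endpoint is configured as unrestricted, setting max-permitted to read-only means log levels can be read but not changed.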
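The capping rule described above can be sketched in plain Java. This is only an illustration, assuming the effective level is the lower of the configured level and max-permitted; the `Access` enum and its `cap` method are stand-ins written for this example, not types taken from this chapter.

```java
/** Illustrative model of the three access levels, ordered from most to least restrictive. */
enum Access {
    NONE, READ_ONLY, UNRESTRICTED;

    /** Returns this level, lowered to maxPermitted if it exceeds it. */
    Access cap(Access maxPermitted) {
        return this.ordinal() > maxPermitted.ordinal() ? maxPermitted : this;
    }
}

class AccessCapDemo {
    public static void main(String[] args) {
        // loggers is configured as unrestricted, but max-permitted is read-only
        System.out.println(Access.UNRESTRICTED.cap(Access.READ_ONLY)); // READ_ONLY
        // a stricter setting is never widened by the cap
        System.out.println(Access.NONE.cap(Access.READ_ONLY));         // NONE
    }
}
```

The cap only ever lowers a level, so an endpoint set to none stays disabled regardless of max-permitted.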
Compatibility with the old configuration:
# Old configuration (deprecated but still works)
management:
  endpoints:
    enabled-by-default: true
  endpoint:
    health:
      enabled: true
# New configuration (recommended)
management:
  endpoints:
    access:
      default: read-only
  endpoint:
    health:
      access: unrestricted
Endpoint Security
management:
  endpoints:
    web:
      base-path: /actuator  # default path
      exposure:
        include: health,info
  endpoint:
    health:
      show-details: when-authorized  # show details only to authorized users
      # show-details: always         # always show details
      # show-details: never          # never show details
With Spring Security:
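// EndpointRequest package shown here is the Spring Boot 3.x location
import org.springframework.boot.actuate.autoconfigure.security.servlet.EndpointRequest;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
public class ActuatorSecurityConfig {
    @Bean
    public SecurityFilterChain actuatorSecurity(HttpSecurity http) throws Exception {
        http
            // Apply this filter chain to actuator endpoints only
            .securityMatcher(EndpointRequest.toAnyEndpoint())
            // Servlet API: authorizeHttpRequests/anyRequest (authorizeExchange is WebFlux-only)
            .authorizeHttpRequests(auth -> auth
                .requestMatchers(EndpointRequest.to("health", "info")).permitAll()
                .anyRequest().hasRole("ACTUATOR")
            )
            .httpBasic(Customizer.withDefaults());
        return http.build();
    }
}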
Custom Endpoint Paths
management:
  endpoints:
    web:
      base-path: /management  # change the base path
  server:
    port: 8081          # use a separate port
    address: 127.0.0.1  # only allow local access
Health Checks
Basic Usage
Request GET /actuator/health:
{
"status": "UP",
"components": {
"db": {
"status": "UP",
"details": {
"database": "MySQL",
"validationQuery": "isValid()"
}
},
"diskSpace": {
"status": "UP",
"details": {
"total": 107374182400,
"free": 53687091200,
"threshold": 10485760,
"exists": true
}
},
"ping": {
"status": "UP"
},
"redis": {
"status": "UP",
"details": {
"version": "7.0.0"
}
}
}
}
Health Statuses
| Status | Description |
|---|---|
| UP | Running normally |
| DOWN | Service unavailable |
| OUT_OF_SERVICE | Service taken out of service |
| UNKNOWN | Status unknown |
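When several components report different statuses, the overall status is the most severe one. The sketch below illustrates that idea, assuming the severity order DOWN, OUT_OF_SERVICE, UP, UNKNOWN (Spring Boot's documented default order); the `HealthAggregator` class is a hypothetical helper written for this example, not an Actuator type.

```java
import java.util.List;

/** Hypothetical aggregator: picks the most severe status among the components. */
class HealthAggregator {
    // Earlier entries are treated as more severe (assumed default order).
    private static final List<String> ORDER = List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    static String aggregate(List<String> componentStatuses) {
        String result = "UNKNOWN"; // also the answer when there are no components
        for (String status : componentStatuses) {
            int index = ORDER.indexOf(status);
            if (index >= 0 && index < ORDER.indexOf(result)) {
                result = status;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(aggregate(List.of("UP", "UP", "DOWN"))); // DOWN
        System.out.println(aggregate(List.of("UP", "UNKNOWN")));    // UP
    }
}
```

A single DOWN component is enough to make the whole application report DOWN, which is why readiness groups should include only dependencies the application truly cannot serve without.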
Custom Health Checks
@Component
public class CustomHealthIndicator implements HealthIndicator {
@Autowired
private ExternalService externalService;
@Override
public Health health() {
try {
// Check the external service
if (externalService.isAvailable()) {
return Health.up()
.withDetail("service", "External Service")
.withDetail("responseTime", "100ms")
.build();
} else {
return Health.down()
.withDetail("service", "External Service")
.withDetail("error", "Service unavailable")
.build();
}
} catch (Exception e) {
return Health.down(e)
.withDetail("service", "External Service")
.build();
}
}
}
Composite Health Checks
@Component
public class DatabaseHealthIndicator implements HealthIndicator {
@Autowired
private DataSource dataSource;
@Override
public Health health() {
try (Connection conn = dataSource.getConnection()) {
if (conn.isValid(1)) {
return Health.up()
.withDetail("database", "MySQL")
.withDetail("validationQuery", "isValid()")
.build();
}
return Health.down().withDetail("error", "Connection invalid").build();
} catch (SQLException e) {
return Health.down(e).build();
}
}
}
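Health Check Configuration
management:
  endpoint:
    health:
      show-details: always
      group:
        # Custom health groups
        liveness:
          include: ping,diskSpace
        readiness:
          include: db,redis
      probes:
        enabled: true  # enable Kubernetes probes
SSL Health Check (Spring Boot 3.4+)
Spring Boot 3.4 added an SSL certificate health check that monitors certificate validity:
management:
  health:
    ssl:
      enabled: true  # enable the SSL health check
      certificate-validity-warning-threshold: 14d  # warn when a certificate expires within this period
Example health response (illustrative; the exact detail fields depend on the Spring Boot version):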
{
"status": "UP",
"components": {
"ssl": {
"status": "UP",
"details": {
"validChains": 2,
"invalidChains": 0,
"expiringSoonChains": 0
}
}
}
}
Example response when a certificate chain is invalid or about to expire (illustrative):
{
"status": "OUT_OF_SERVICE",
"components": {
"ssl": {
"status": "OUT_OF_SERVICE",
"details": {
"validChains": 1,
"invalidChains": 1,
"expiringSoonChains": 1,
"expiredCertificates": [
{
"alias": "server-cert",
"expires": "2024-12-15T00:00:00Z"
}
]
}
}
}
}
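SSL Info Endpoint (Spring Boot 3.4+)
When the SSL info contributor is enabled, SSL information appears in the /actuator/info endpoint:
# Enable the SSL info contributor (check the default for your Spring Boot version)
management:
  info:
    ssl:
      enabled: true
Request /actuator/info to view the SSL information: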
{
"ssl": {
"bundles": {
"server": {
"certificateChain": [
{
"subject": "CN=example.com, O=Example Inc",
"issuer": "CN=Let's Encrypt Authority X3",
"notBefore": "2024-01-01T00:00:00Z",
"notAfter": "2024-12-31T23:59:59Z",
"daysUntilExpiry": 180
}
]
}
}
}
}
Tip: certificates that are about to expire are flagged in the info endpoint, so operators can renew them in time.
Kubernetes Probes
Spring Boot 2.3+ supports Kubernetes probes:
management:
  endpoint:
    health:
      probes:
        enabled: true
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
Endpoints:
- /actuator/health/liveness: liveness probe
- /actuator/health/readiness: readiness probe
Kubernetes configuration:
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
Metrics Monitoring
Built-in Metrics
Request GET /actuator/metrics:
{
"names": [
"jvm.memory.max",
"jvm.memory.used",
"jvm.gc.pause",
"process.cpu.usage",
"system.cpu.usage",
"http.server.requests",
"tomcat.threads.busy"
]
}
Viewing a Single Metric
GET /actuator/metrics/jvm.memory.used:
{
"name": "jvm.memory.used",
"description": "The amount of used memory",
"baseUnit": "bytes",
"measurements": [
{
"statistic": "VALUE",
"value": 123456789
}
],
"availableTags": [
{
"tag": "area",
"values": ["heap", "nonheap"]
},
{
"tag": "id",
"values": ["G1 Survivor Space", "G1 Old Gen", "G1 Eden Space"]
}
]
}
Filtering by Tag
GET /actuator/metrics/jvm.memory.used?tag=area:heap
Custom Metrics
@Service
public class OrderService {
    private final MeterRegistry meterRegistry;
    private final Counter orderCounter;
    private final Timer orderTimer;

    // Explicit constructor injection. Lombok's @RequiredArgsConstructor is omitted
    // because it would generate a second, conflicting constructor.
    public OrderService(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        // Order counter
        this.orderCounter = Counter.builder("orders.created")
                .description("Total orders created")
                .tag("type", "online")
                .register(meterRegistry);
        // Order processing timer
        this.orderTimer = Timer.builder("orders.processing.time")
                .description("Order processing time")
                .register(meterRegistry);
    }

    public Order createOrder(OrderDTO dto) {
        return orderTimer.record(() -> {
            // Process the order
            Order order = processOrder(dto);
            // Increment the counter
            orderCounter.increment();
            return order;
        });
    }
}
Metric Types
| Type | Description | Typical use |
|---|---|---|
| Counter | A counter that only increases | Request count, error count |
| Gauge | A value that can go up and down | Current connections, queue size |
| Timer | Timing statistics | Request latency |
| DistributionSummary | Distribution statistics | Request size distribution |
Example:
@Service
public class MetricsService {
    private final MeterRegistry registry;
    // Counter: monotonically increasing count
    private final Counter requestCounter;
    // Gauge: current value, sampled when read
    private final AtomicInteger activeConnections = new AtomicInteger(0);
    // Timer: durations
    private final Timer requestTimer;
    // DistributionSummary: distribution of recorded values
    private final DistributionSummary requestSize;

    public MetricsService(MeterRegistry registry) {
        this.registry = registry;
        // Create the Counter
        this.requestCounter = Counter.builder("app.requests")
                .description("Total requests")
                .tag("endpoint", "/api/orders")
                .register(registry);
        // Create the Gauge; it reads the AtomicInteger each time it is sampled
        registry.gauge("app.connections.active", activeConnections);
        // Create the Timer
        this.requestTimer = Timer.builder("app.request.duration")
                .description("Request duration")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(registry);
        // Create the DistributionSummary
        this.requestSize = DistributionSummary.builder("app.request.size")
                .description("Request size in bytes")
                .baseUnit("bytes")
                .register(registry);
    }

    public void incrementRequest() {
        requestCounter.increment();
    }

    public void recordRequestDuration(long millis) {
        requestTimer.record(millis, TimeUnit.MILLISECONDS);
    }

    public void recordRequestSize(long bytes) {
        requestSize.record(bytes);
    }

    public void connectionAdded() {
        activeConnections.incrementAndGet();
    }

    public void connectionRemoved() {
        activeConnections.decrementAndGet();
    }
}
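HTTP Request Metrics
HTTP request metrics are collected automatically:
management:
  metrics:
    web:
      server:
        request:
          autotime:
            enabled: true
            percentiles: 0.5,0.95,0.99
Note: these autotime properties come from Spring Boot 2.x. On 3.x, HTTP server requests are timed through the Observation infrastructure, and percentiles are usually configured with management.metrics.distribution.percentiles instead.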
Prometheus Integration
Adding the Dependency
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
Exposing the Endpoint
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  metrics:
    tags:
      application: ${spring.application.name}  # add an application tag
Accessing the Prometheus Endpoint
GET /actuator/prometheus:
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Old Gen",} 1.23456789E8
jvm_memory_used_bytes{area="heap",id="G1 Eden Space",} 5.67890123E7
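The relationship between a meter name such as jvm.memory.used and the exposed line above can be sketched as follows. This is a simplified illustration, assuming only two rules (dots become underscores, and the base unit is appended once); the real naming convention has more cases, label values are not escaped here, and `PrometheusLineSketch` is a hypothetical class written for this example.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PrometheusLineSketch {
    /** Simplified naming: dots become underscores and the base unit is appended once. */
    static String metricName(String meterName, String baseUnit) {
        String name = meterName.replace('.', '_');
        return name.endsWith("_" + baseUnit) ? name : name + "_" + baseUnit;
    }

    /** Renders one sample line as name{label="value",...} value, labels sorted by key. */
    static String sampleLine(String name, Map<String, String> tags, double value) {
        String labels = new TreeMap<>(tags).entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return name + "{" + labels + "} " + value;
    }

    public static void main(String[] args) {
        String name = metricName("jvm.memory.used", "bytes");
        // prints: jvm_memory_used_bytes{area="heap",id="G1 Old Gen"} 1.23456789E8
        System.out.println(sampleLine(name, Map.of("area", "heap", "id", "G1 Old Gen"), 1.23456789E8));
    }
}
```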
Prometheus Configuration
# prometheus.yml
scrape_configs:
  - job_name: 'spring-boot'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']
Grafana Visualization
Use Grafana to display the metrics:
- Add a Prometheus data source
- Import a Spring Boot dashboard (ID: 12900)
- Build custom monitoring panels
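Application Information
Configuring Application Information
management:
  info:
    env:
      enabled: true  # required since Spring Boot 2.6 for info.* properties to be exposed
info:
  app:
    name: "@project.name@"
    version: "@project.version@"
    description: "@project.description@"
    java:
      version: "@java.version@"
  author: 张三
  contact: [email protected]
Request GET /actuator/info: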
{
"app": {
"name": "myapp",
"version": "1.0.0",
"description": "My Application",
"java": {
"version": "17"
}
},
"author": "张三",
"contact": "[email protected]"
}
Git Information
Add the git-commit-id-maven-plugin:
<plugin>
<groupId>io.github.git-commit-id</groupId>
<artifactId>git-commit-id-maven-plugin</artifactId>
<version>6.0.0</version>
<executions>
<execution>
<goals>
<goal>revision</goal>
</goals>
</execution>
</executions>
</plugin>
Enable full Git information:
management:
  info:
    git:
      mode: full
Custom Endpoints
Creating a Custom Endpoint
@Component
@Endpoint(id = "custom")
public class CustomEndpoint {
@ReadOperation
public Map<String, Object> info() {
Map<String, Object> info = new HashMap<>();
info.put("timestamp", System.currentTimeMillis());
info.put("status", "running");
return info;
}
@ReadOperation
public Map<String, Object> detail(@Selector String name) {
Map<String, Object> detail = new HashMap<>();
detail.put("name", name);
detail.put("value", "detail value");
return detail;
}
@WriteOperation
public void update(@Selector String name, @Nullable String value) {
// Update operation
}
@DeleteOperation
public void delete(@Selector String name) {
// Delete operation
}
}
Access:
- GET /actuator/custom: calls info()
- GET /actuator/custom/myname: calls detail("myname")
Web-Specific Endpoints
@Component
@WebEndpoint(id = "customweb")
public class CustomWebEndpoint {
@ReadOperation
public WebEndpointResponse<Map<String, Object>> info() {
Map<String, Object> data = new HashMap<>();
data.put("message", "Hello from custom endpoint");
return new WebEndpointResponse<>(data, HttpStatus.OK.value());
}
}
Controller Endpoints
@Component
@ControllerEndpoint(id = "customcontroller")
public class CustomControllerEndpoint {
@GetMapping("/hello")
@ResponseBody
public String hello(@RequestParam String name) {
return "Hello, " + name;
}
}
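Access: GET /actuator/customcontroller/hello?name=World
Note: @ControllerEndpoint is deprecated as of Spring Boot 3.3; for new code, @Endpoint or a regular Spring MVC controller is usually preferable.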
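Auditing
Configuring Auditing
Audit event storage is controlled by management.auditevents.enabled (on by default) and only becomes active once an AuditEventRepository bean is present:
management:
  auditevents:
    enabled: true
Custom Audit Events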
@Configuration
public class AuditConfig {
@Bean
public AuditEventRepository auditEventRepository() {
return new InMemoryAuditEventRepository();
}
}
// requires: import java.util.Map;
@Service
@RequiredArgsConstructor
public class UserService {
    private final AuditEventRepository auditRepository;

    public void login(String username, boolean success) {
        // AuditEvent data is a key/value map; "result" is an illustrative key name
        auditRepository.add(new AuditEvent(
                username,
                "AUTHENTICATION",
                Map.of("result", success ? "SUCCESS" : "FAILURE")
        ));
    }
}
Best Practices
1. Security configuration
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    health:
      show-details: when-authorized
2. Separate port
management:
  server:
    port: 8081
    address: 127.0.0.1
3. Metric tags
management:
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active}
4. Monitoring and alerting
Combine with Prometheus alerting rules and Alertmanager:
groups:
  - name: spring-boot
    rules:
      - alert: HighErrorRate
        expr: rate(http_server_requests_seconds_count{status=~"5.."}[5m]) > 0.1
        for: 5m
        annotations:
          summary: "High error rate detected"
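It helps to be clear about what the threshold in this rule means: rate() yields a per-second increase, so 0.1 means roughly one 5xx response every ten seconds, not a 10% error ratio. The sketch below shows the arithmetic in simplified form; it deliberately ignores counter resets and the extrapolation Prometheus applies, and `RateSketch` is a hypothetical helper written for this example.

```java
class RateSketch {
    /** Per-second increase of a counter between two samples; ignores resets and extrapolation. */
    static double perSecondRate(double firstValue, double lastValue, double windowSeconds) {
        return (lastValue - firstValue) / windowSeconds;
    }

    public static void main(String[] args) {
        // 45 additional 5xx responses over a 5-minute (300 s) window
        double rate = perSecondRate(100, 145, 300);
        System.out.println(rate > 0.1); // true: 0.15 errors per second exceeds the threshold
    }
}
```

To alert on a ratio instead, the usual approach is to divide the 5xx rate by the rate of all requests.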
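Observability
Observability is the ability to understand the internal state of a running system from the outside. It rests on three pillars: logging, metrics, and traces. Spring Boot provides observability support through Micrometer.
The Three Pillars of Observability
| Pillar | Description | Questions it answers |
|---|---|---|
| Logs | Records of discrete events | What happened, and when? |
| Metrics | Aggregated numeric measurements | What is the system's state? What is the trend? |
| Traces | The full path of a request | Which services did the request pass through? Where was the time spent? |
Micrometer Observation API
Spring Boot uses the Micrometer Observation API to handle metrics and traces in a unified way.
Creating a Custom Observation
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import org.springframework.stereotype.Component;

@Component
public class OrderService {
    private final ObservationRegistry observationRegistry;

    public OrderService(ObservationRegistry observationRegistry) {
        this.observationRegistry = observationRegistry;
    }

    public Order createOrder(OrderDTO dto) {
        // Create an observation; metrics and traces are produced from it
        return Observation.createNotStarted("order.create", observationRegistry)
                .lowCardinalityKeyValue("type", dto.getType())       // low cardinality: added to metrics and traces
                .highCardinalityKeyValue("userId", dto.getUserId())  // high cardinality: added to traces only
                .observe(() -> {
                    // Actual business logic
                    return doCreateOrder(dto);
                });
    }
}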
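Tag cardinality:
- Low-cardinality tags: a bounded set of values, such as type, status, method. They are added to both metrics and traces.
- High-cardinality tags: an unbounded set of values, such as userId, orderId. They are added to traces only, which avoids a metrics explosion.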
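The "metrics explosion" can be made concrete with a little arithmetic: in the worst case, a meter produces one time series per combination of tag values. The following sketch computes that upper bound; the tag counts are made-up numbers for illustration, and `CardinalityEstimate` is a hypothetical helper.

```java
import java.util.List;

class CardinalityEstimate {
    /** Upper bound on time series for one meter: the product of distinct values per tag. */
    static long maxSeries(List<Integer> distinctValuesPerTag) {
        long series = 1;
        for (int count : distinctValuesPerTag) {
            series = Math.multiplyExact(series, (long) count);
        }
        return series;
    }

    public static void main(String[] args) {
        // type (3 values) x status (2 values) x method (4 values)
        System.out.println(maxSeries(List.of(3, 2, 4)));          // 24
        // the same meter with userId (100,000 distinct users) added as a metric tag
        System.out.println(maxSeries(List.of(3, 2, 4, 100_000))); // 2400000
    }
}
```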
Observation Lifecycle
// Option 1: use observe() (recommended)
Observation.createNotStarted("my.operation", observationRegistry)
        .lowCardinalityKeyValue("key", "value")
        .observe(() -> {
            // business logic
        });
// Option 2: control the lifecycle manually
Observation observation = Observation.createNotStarted("my.operation", observationRegistry)
        .lowCardinalityKeyValue("key", "value")
        .start();
try {
    // business logic
    observation.event(Observation.Event.of("step1", "step one completed"));
    // more business logic
} catch (Exception e) {
    observation.error(e); // record the error
    throw e;
} finally {
    observation.stop(); // must always be called
}
Custom Observation Conventions
// Define the observation convention
public class OrderObservationConvention implements GlobalObservationConvention<OrderObservationContext> {
    @Override
    public boolean supportsContext(Observation.Context context) {
        // Required by ObservationConvention: apply only to order contexts
        return context instanceof OrderObservationContext;
    }

    @Override
    public String getName() {
        return "order.process";
    }

    @Override
    public String getContextualName(OrderObservationContext context) {
        return "order-" + context.getOrderType();
    }

    @Override
    public KeyValues getLowCardinalityKeyValues(OrderObservationContext context) {
        return KeyValues.of("order.type", context.getOrderType());
    }
}

// Custom observation context
public class OrderObservationContext extends Observation.Context {
    private String orderType;

    public String getOrderType() {
        return orderType;
    }

    public void setOrderType(String orderType) {
        this.orderType = orderType;
    }
}
Distributed Tracing
Distributed tracing follows a request along its full call chain through a microservice architecture, which helps locate performance bottlenecks and failures.
How Tracing Works
┌─────────────────────────────────────────────────────────────────────┐
│                    How distributed tracing works                    │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Service A ────────────> Service B ────────────> Service C          │
│  TraceId: abc123         TraceId: abc123         TraceId: abc123    │
│  SpanId: span1           SpanId: span2           SpanId: span3      │
│  ParentSpanId: -         ParentSpanId: span1     ParentSpanId: span2│
│                                                                     │
│  TraceId: unique ID of the whole request, passed between services   │
│  SpanId: ID of one unit of work; each service creates a new one     │
│  ParentSpanId: ID of the parent span, used to build the call tree   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
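The identifier rules in the diagram can be sketched in a few lines of Java. This is a minimal illustration rather than how any tracing library is implemented: `SpanContext` is a hypothetical record, the IDs are random hex strings, and the header format shown is the W3C traceparent layout (version, trace ID, span ID, flags).

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical span context holding the three identifiers from the diagram. */
record SpanContext(String traceId, String spanId, String parentSpanId) {

    /** Starts a new trace: fresh trace ID, fresh span ID, no parent. */
    static SpanContext root() {
        return new SpanContext(randomHex(32), randomHex(16), null);
    }

    /** Context for the next hop: same trace ID, new span ID, parent set to this span. */
    SpanContext child() {
        return new SpanContext(traceId, randomHex(16), spanId);
    }

    /** Renders the context as a W3C traceparent header value with the sampled flag set. */
    String toTraceparent() {
        return "00-" + traceId + "-" + spanId + "-01";
    }

    private static String randomHex(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(Character.forDigit(ThreadLocalRandom.current().nextInt(16), 16));
        }
        return sb.toString();
    }
}

class TraceDemo {
    public static void main(String[] args) {
        SpanContext a = SpanContext.root(); // service A
        SpanContext b = a.child();          // service B
        SpanContext c = b.child();          // service C
        System.out.println(a.toTraceparent());
        System.out.println(c.traceId().equals(a.traceId()));     // true: one trace ID end to end
        System.out.println(c.parentSpanId().equals(b.spanId())); // true: C's parent is B's span
    }
}
```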
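Adding Tracing Dependencies
OpenTelemetry + OTLP (recommended)
Note: the spring-boot-starter-opentelemetry starter is available from Spring Boot 4.0. On 3.x, the usual combination is micrometer-tracing-bridge-otel together with opentelemetry-exporter-otlp.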
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-opentelemetry</artifactId>
</dependency>
OpenTelemetry + Zipkin
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
<groupId>io.opentelemetry</groupId>
<artifactId>opentelemetry-exporter-zipkin</artifactId>
</dependency>
Brave + Zipkin
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-zipkin</artifactId>
</dependency>
Tracing Configuration
# Trace sampling
management:
  tracing:
    sampling:
      probability: 1.0  # 100% sampling (0.1 or lower is recommended in production)
    # Baggage
    baggage:
      remote-fields: userId,tenantId  # baggage propagated across services
      correlation:
        fields: userId,tenantId       # baggage added to the MDC
  # OpenTelemetry OTLP
  opentelemetry:
    tracing:
      export:
        enabled: true
        otlp:
          endpoint: http://localhost:4318/v1/traces
          transport: http  # or grpc
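Log Correlation IDs
Trace IDs are added to log output automatically, which makes it easy to correlate logs with traces:
logging:
  pattern:
    correlation: "[%X{traceId:-},%X{spanId:-}] "
Example log output:
2024-01-15 10:30:00.000 [abc123,span1] INFO c.e.UserService - User logged in successfully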
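The correlation pattern reads the trace and span IDs from the logging MDC, and the `:-` suffix supplies an empty default when a key is missing. The sketch below mimics that substitution with a plain map, purely to show what the prefix looks like inside and outside a trace; `CorrelationPrefix` is a hypothetical helper, not part of any logging framework.

```java
import java.util.Map;

class CorrelationPrefix {
    /** Mimics "[%X{traceId:-},%X{spanId:-}] ": missing keys render as empty strings. */
    static String format(Map<String, String> mdc) {
        return "[" + mdc.getOrDefault("traceId", "") + "," + mdc.getOrDefault("spanId", "") + "] ";
    }

    public static void main(String[] args) {
        // Inside a traced request
        System.out.println(format(Map.of("traceId", "abc123", "spanId", "span1")) + "INFO ...");
        // Outside any trace, for example during startup
        System.out.println(format(Map.of()) + "INFO ...");
    }
}
```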
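Trace Propagation
When you use the auto-configured HTTP client builders, trace context is propagated automatically:
@Service
public class RemoteService {
    // Recommended: use the auto-configured builder
    private final RestClient restClient;

    public RemoteService(RestClient.Builder restClientBuilder) {
        this.restClient = restClientBuilder
                .baseUrl("http://remote-service")
                .build();
    }

    public User getUser(Long id) {
        // Trace context is propagated to the remote service automatically
        return restClient.get()
                .uri("/users/{id}", id)
                .retrieve()
                .body(User.class);
    }
}
Note: if you create a RestTemplate, RestClient, or WebClient directly instead of through the auto-configured builder, trace context is not propagated automatically.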
Creating a Custom Span
import io.micrometer.tracing.Tracer;
import io.micrometer.tracing.Span;
import org.springframework.stereotype.Service;

@Service
public class PaymentService {
    private final Tracer tracer;

    public PaymentService(Tracer tracer) {
        this.tracer = tracer;
    }

    public void processPayment(PaymentRequest request) {
        // Create a new span
        Span span = tracer.nextSpan().name("payment.process");
        try (Tracer.SpanInScope ws = tracer.withSpan(span.start())) {
            span.tag("payment.method", request.getMethod());
            span.event("payment.started");
            // Business logic
            doPayment(request);
            span.event("payment.completed");
        } catch (Exception e) {
            // Record the exception on the span
            span.error(e);
            throw e;
        } finally {
            span.end();
        }
    }
}
Using Baggage
Baggage carries contextual information along the trace:
import io.micrometer.tracing.Tracer;
import io.micrometer.tracing.BaggageInScope;
import org.springframework.stereotype.Service;

@Service
public class TenantService {
    private final Tracer tracer;

    public TenantService(Tracer tracer) {
        this.tracer = tracer;
    }

    public void processWithTenant(String tenantId) {
        // Create baggage; it is propagated to downstream services
        try (BaggageInScope baggage = tracer.createBaggageInScope("tenantId", tenantId)) {
            // Within this scope, tenantId travels with every downstream call
            doSomething();
        }
    }

    public String getCurrentTenant() {
        // Read the current baggage value
        return tracer.getBaggage("tenantId").get();
    }
}
Zipkin Integration
Starting Zipkin
# Start Zipkin with Docker
docker run -d -p 9411:9411 openzipkin/zipkin
# Or download the JAR and run it directly
curl -sSL https://zipkin.io/quickstart.sh | bash -s
java -jar zipkin.jar
Configuring Spring Boot
spring:
  application:
    name: my-service
management:
  tracing:
    sampling:
      probability: 1.0
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans
Opening the Zipkin UI
Open http://localhost:9411 to view trace data:
- Service dependency graph
- Request call chains
- Time spent in each span
Jaeger Integration
Starting Jaeger
docker run -d --name jaeger \
-p 16686:16686 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/all-in-one:latest
Configuring Spring Boot
management:
  tracing:
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces
      transport: http
Opening the Jaeger UI
Open http://localhost:16686 to view trace data.
Grafana LGTM Integration
LGTM (Loki + Grafana + Tempo + Mimir) is a complete observability stack:
# Run the Grafana LGTM container
docker run -d --name lgtm \
-p 3000:3000 \
-p 4317:4317 \
-p 4318:4318 \
grafana/otel-lgtm
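Configure Spring Boot:
management:
  tracing:
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://localhost:4317
      transport: grpc
    logging:
      endpoint: http://localhost:4317
      transport: grpc  # port 4317 is the gRPC port, so the transport must match
In Grafana (http://localhost:3000) you can view:
- Logs
- Metrics
- Traces
- Correlated analysis across all three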
OTLP Log Export (Spring Boot 3.4+)
Spring Boot 3.4 added support for exporting logs over OTLP, so logs can be sent to an OpenTelemetry Collector:
management:
  otlp:
    logging:
      endpoint: http://localhost:4318/v1/logs  # OTLP logs endpoint
      transport: http                          # use HTTP transport
      # transport: grpc                        # or use gRPC transport
      connect-timeout: 5s                      # connection timeout
    tracing:
      transport: grpc                          # traces use gRPC
      connect-timeout: 5s
Enabling or disabling log export:
management:
  otlp:
    logging:
      export:
        enabled: true  # enable log export (default: true)
Complete OTLP configuration example:
management:
  tracing:
    sampling:
      probability: 1.0
  otlp:
    # Logs
    logging:
      endpoint: http://otel-collector:4318/v1/logs
      transport: http
      connect-timeout: 5s
      export:
        enabled: true
    # Traces
    tracing:
      endpoint: http://otel-collector:4318/v1/traces
      transport: http
      connect-timeout: 5s
      export:
        enabled: true
OpenTelemetry Collector configuration example:
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
  # The Collector has no "tempo" exporter; Tempo ingests OTLP, so use a named otlp exporter
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [loki]
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
Complete Docker Compose example:
version: '3.8'
services:
  app:
    build: .
    environment:
      - MANAGEMENT_OTLP_LOGGING_ENDPOINT=http://otel-collector:4318/v1/logs
      - MANAGEMENT_OTLP_TRACING_ENDPOINT=http://otel-collector:4318/v1/traces
    depends_on:
      - otel-collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"
      - "4318:4318"
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
  tempo:
    image: grafana/tempo:latest
    ports:
      - "3200:3200"
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
Testing Tracing
Tracing components are not auto-configured in tests. To test observations:
@SpringBootTest
@Import(TestObservationConfig.class)
class TracingTest {
    @Autowired
    private ObservationRegistry observationRegistry;

    @Test
    void testObservation() {
        // Uses the ObservationRegistry from the test configuration
        Observation.createNotStarted("test.observation", observationRegistry)
                .observe(() -> {
                    // test logic
                });
    }
}

@Configuration
class TestObservationConfig {
    @Bean
    ObservationRegistry observationRegistry() {
        ObservationRegistry registry = ObservationRegistry.create();
        // ObservationTextPublisher (io.micrometer.observation) logs each lifecycle event
        registry.observationConfig().observationHandler(new ObservationTextPublisher());
        return registry;
    }
}
Observability Best Practices
1. Name observations sensibly
// Recommended: dot-separated hierarchical names
Observation.createNotStarted("order.create", observationRegistry)
Observation.createNotStarted("order.payment.process", observationRegistry)
Observation.createNotStarted("db.query.users.findById", observationRegistry)
// Not recommended: arbitrary names
Observation.createNotStarted("创建订单", observationRegistry)
Observation.createNotStarted("doSomething", observationRegistry)
2. Use tags with care
// Recommended: low-cardinality tags
.lowCardinalityKeyValue("status", "success")  // bounded values: success, failed
.lowCardinalityKeyValue("method", "credit")   // bounded values: credit, debit
// Not recommended: a high-cardinality value used as a metric tag
.lowCardinalityKeyValue("orderId", "12345")   // unbounded values cause a metrics explosion
// Correct: high-cardinality values go to traces only
.highCardinalityKeyValue("orderId", "12345")  // used in traces only
3. Sampling rate
# Development: sample everything
management:
  tracing:
    sampling:
      probability: 1.0
# Production: low sampling rate
management:
  tracing:
    sampling:
      probability: 0.1  # 10% sampling
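To see what a sampling probability means in practice, here is a minimal sketch of a probability sampler. It is an assumption-laden illustration, not the sampler that Micrometer Tracing, Brave, or OpenTelemetry actually uses: the decision is derived from the trace ID so that every span of one trace gets the same answer, and `ProbabilitySampler` is a hypothetical class.

```java
class ProbabilitySampler {
    private static final long BUCKETS = 10_000L;
    private final long threshold; // number of buckets (out of 10,000) that are sampled

    ProbabilitySampler(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.threshold = Math.round(probability * BUCKETS);
    }

    /** Decides from the trace ID alone, so the same trace always gets the same decision. */
    boolean isSampled(long traceIdLowBits) {
        return Math.floorMod(traceIdLowBits, BUCKETS) < threshold;
    }

    public static void main(String[] args) {
        ProbabilitySampler sampler = new ProbabilitySampler(0.1);
        int sampled = 0;
        for (long id = 0; id < 10_000; id++) {
            if (sampler.isSampled(id)) {
                sampled++;
            }
        }
        System.out.println(sampled); // 1000, that is 10% of 10,000 trace IDs
    }
}
```

With probability 0.1, roughly nine out of ten traces are never exported, which is the trade-off between storage cost and the chance of capturing a rare failure.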
4. Handling sensitive information
// Do not record sensitive information in traces
// Wrong
.highCardinalityKeyValue("password", request.getPassword())
// Correct
.highCardinalityKeyValue("hasPassword", String.valueOf(request.getPassword() != null))
5. Tracing exceptions
Observation observation = Observation.createNotStarted("my.operation", observationRegistry)
        .start();
try {
    // business logic
} catch (BusinessException e) {
    // Record a business exception
    observation.lowCardinalityKeyValue("error.type", "business");
    observation.highCardinalityKeyValue("error.code", e.getCode());
    throw e;
} catch (Exception e) {
    // Record a system exception
    observation.error(e); // attaches the exception to the observation
    throw e;
} finally {
    observation.stop();
}
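Summary
In this chapter we covered:
- Actuator overview: the built-in endpoints
- Endpoint configuration: exposure, security, and paths
- Health checks: custom health indicators and Kubernetes probes
- Metrics monitoring: built-in and custom metrics
- Prometheus integration: connecting to a monitoring system
- Observability: the three pillars and the Micrometer Observation API
- Distributed tracing: how tracing works, plus OpenTelemetry, Zipkin, and Jaeger integration
- Custom endpoints: extending monitoring capabilities
- Best practices: security, performance, and alerting
Exercises
- Configure the health endpoint to show the database connection status
- Create a custom health indicator
- Add custom business metrics and observations
- Integrate Prometheus and Grafana
- Configure distributed tracing and inspect the call chain
- Configure Kubernetes probes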