Observability
Observability is a key capability for AI applications running in production. This chapter covers the monitoring, tracing, and metrics features in Spring AI.
Overview
Observability for AI applications rests on three core pillars:
┌─────────────────────────────────────────────────────────────┐
│ The Three Pillars of Observability                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ 1. Metrics                                                  │
│    - Request counts, latency, token usage                   │
│    - Success and error rates                                │
│                                                             │
│ 2. Traces                                                   │
│    - End-to-end request tracing                             │
│    - Performance bottleneck analysis                        │
│                                                             │
│ 3. Logs                                                     │
│    - Detailed execution logs                                │
│    - Error stack traces                                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Spring AI Observability Integration
Spring AI builds on Spring Boot's observability stack and supports:
- Micrometer: metrics collection
- OpenTelemetry: distributed tracing
- Spring Boot Actuator: monitoring endpoints
Adding the Dependencies
<!-- Spring Boot Actuator -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<!-- Micrometer (metrics) -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

<!-- OpenTelemetry (tracing) -->
<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-spring-boot-starter</artifactId>
</dependency>
Basic Configuration
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  metrics:
    tags:
      application: ${spring.application.name}
  tracing:
    enabled: true
    sampling:
      probability: 1.0  # 100% sampling rate

spring:
  ai:
    chat:
      observations:
        include-completion-tokens: true
        include-prompt-tokens: true
        include-model-name: true
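The sampling.probability setting performs head-based sampling: each trace is kept with probability p, decided once up front. A minimal standalone sketch of that decision logic (this is an illustration, not the actual OpenTelemetry sampler implementation):

```java
import java.util.Random;

public class SamplerSketch {
    private final double probability;
    private final Random random;

    SamplerSketch(double probability, long seed) {
        this.probability = probability;
        this.random = new Random(seed); // seeded here only for reproducibility
    }

    // Decide once per trace whether to record and export its spans
    boolean shouldSample() {
        return random.nextDouble() < probability;
    }

    public static void main(String[] args) {
        SamplerSketch always = new SamplerSketch(1.0, 42);
        SamplerSketch never = new SamplerSketch(0.0, 42);
        System.out.println(always.shouldSample()); // true: p=1.0 keeps every trace
        System.out.println(never.shouldSample());  // false: p=0.0 drops every trace
    }
}
```

At probability 1.0 every trace is exported, which is useful in development but expensive in production; the best-practices section below lowers it to 0.1.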
Metrics Collection
AI Model Metrics
Spring AI collects the following metrics automatically:
| Metric | Description |
|---|---|
| spring.ai.chat.model.requests | Total chat requests |
| spring.ai.chat.model.latency | Request latency |
| spring.ai.chat.model.tokens.prompt | Input (prompt) tokens |
| spring.ai.chat.model.tokens.completion | Output (completion) tokens |
| spring.ai.chat.model.tokens.total | Total tokens |
| spring.ai.embedding.model.requests | Total embedding requests |
| spring.ai.vectorstore.requests | Vector store requests |
Accessing the Metrics
# List all available metrics
curl http://localhost:8080/actuator/metrics

# Inspect a specific metric
curl http://localhost:8080/actuator/metrics/spring.ai.chat.model.tokens.total

# Prometheus format
curl http://localhost:8080/actuator/prometheus
Custom Metrics
@Service
public class CustomMetricsService {

    private final ChatClient chatClient;
    private final Counter requestCounter;
    private final Timer responseTimer;

    public CustomMetricsService(ChatClient.Builder builder, MeterRegistry meterRegistry) {
        this.chatClient = builder.build();
        this.requestCounter = Counter.builder("ai.custom.requests")
            .description("Custom AI request count")
            .tag("type", "chat")
            .register(meterRegistry);
        this.responseTimer = Timer.builder("ai.custom.response.time")
            .description("AI response time")
            .register(meterRegistry);
    }

    public String chatWithMetrics(String message) {
        requestCounter.increment();
        return responseTimer.record(() -> {
            // Execute the AI request inside the timer
            return chatClient.prompt()
                .user(message)
                .call()
                .content();
        });
    }
}
Metrics Dashboards
Build a dashboard with Grafana, scraping via Prometheus:
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'spring-ai-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']
Common Grafana queries:
# Request rate
rate(spring_ai_chat_model_requests_total[5m])

# Average latency
rate(spring_ai_chat_model_latency_sum[5m]) / rate(spring_ai_chat_model_latency_count[5m])

# Token usage by model
sum(spring_ai_chat_model_tokens_total_total) by (model)

# Error rate
rate(spring_ai_chat_model_errors_total[5m])
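The average-latency query divides the per-second rate of the _sum series by that of the _count series; since both rates share the same time window, this reduces to Δsum/Δcount over the window. A quick standalone check of that identity (the sample numbers are made up):

```java
public class AvgLatency {
    // Average latency over a window = (sum2 - sum1) / (count2 - count1),
    // i.e. total latency accrued divided by requests completed in the window
    static double windowAverage(double sum1, double sum2, long count1, long count2) {
        return (sum2 - sum1) / (count2 - count1);
    }

    public static void main(String[] args) {
        // The latency sum grew by 3000 ms while 30 requests completed
        System.out.println(windowAverage(1000.0, 4000.0, 10, 40)); // prints 100.0
    }
}
```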
Distributed Tracing
Configuring OpenTelemetry
management:
  tracing:
    enabled: true
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces

spring:
  ai:
    chat:
      observations:
        record-prompt: true  # Log prompt content (use with caution in production)
Trace Data
Spring AI creates spans automatically for the following operations:
| Span name | Description |
|---|---|
| chat_model.call | Chat model invocation |
| embedding_model.embed | Embedding model invocation |
| vector_store.add | Vector store add |
| vector_store.search | Vector store search |
| rag.retrieve | RAG retrieval |
Custom Tracing
@Service
public class TracingService {

    @Autowired
    private Tracer tracer;

    @Autowired
    private ChatClient chatClient;

    public String chatWithTracing(String message) {
        // Create a custom span
        Span span = tracer.spanBuilder("custom.chat.operation")
            .setAttribute("user.message", message)
            .startSpan();
        try (Scope scope = span.makeCurrent()) {
            String response = chatClient.prompt()
                .user(message)
                .call()
                .content();
            span.setAttribute("response.length", response.length());
            return response;
        } finally {
            span.end();
        }
    }

    public String multiStepChat(String message) {
        Span parentSpan = tracer.spanBuilder("multi.step.chat").startSpan();
        try (Scope scope = parentSpan.makeCurrent()) {
            // Step 1: preprocessing
            Span step1Span = tracer.spanBuilder("step.1.preprocess").startSpan();
            String processedMessage = preprocess(message);
            step1Span.end();
            // Step 2: model call
            Span step2Span = tracer.spanBuilder("step.2.model.call").startSpan();
            String response = chatClient.prompt().user(processedMessage).call().content();
            step2Span.end();
            // Step 3: post-processing
            Span step3Span = tracer.spanBuilder("step.3.postprocess").startSpan();
            String finalResponse = postprocess(response);
            step3Span.end();
            return finalResponse;
        } finally {
            parentSpan.end();
        }
    }
}
Trace Visualization
View traces with Jaeger or Zipkin:
# Jaeger configuration
spring:
  application:
    name: spring-ai-app

management:
  tracing:
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces
Logging
Configuring Logging
logging:
  level:
    org.springframework.ai: DEBUG
    org.springframework.ai.chat: TRACE
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n"
Using SimpleLoggerAdvisor
@Configuration
public class LoggingConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder
            .defaultAdvisors(new SimpleLoggerAdvisor())
            .build();
    }
}
Custom Logging Advisor
public class CustomLoggingAdvisor implements RequestResponseAdvisor {

    private static final Logger log = LoggerFactory.getLogger(CustomLoggingAdvisor.class);

    @Override
    public AdvisedRequest adviseRequest(AdvisedRequest request, Map<String, Object> context) {
        log.info("Request: userText={}, systemText={}",
            request.userText(),
            request.systemText());
        context.put("startTime", System.currentTimeMillis());
        return request;
    }

    @Override
    public ChatResponse adviseResponse(ChatResponse response, Map<String, Object> context) {
        long startTime = (long) context.get("startTime");
        long duration = System.currentTimeMillis() - startTime;
        log.info("Response: duration={}ms, tokens={}",
            duration,
            response.getMetadata().getUsage().getTotalTokens());
        return response;
    }
}
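The start-time handoff through the shared context map is the core of the advisor pair: the request side stashes a timestamp, the response side reads it back. Stripped of the Spring AI types, the pattern looks like this (TimedCall and the Supplier stand-in for the model call are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class TimedCall {
    // Mimics the advisor pair: put a start time into the shared context
    // before the call, read it back afterwards to compute the duration
    static long timeCall(Supplier<String> call, Map<String, Object> context) {
        context.put("startTime", System.currentTimeMillis());
        call.get(); // the "model call" happens in between
        long startTime = (long) context.get("startTime");
        return System.currentTimeMillis() - startTime;
    }

    public static void main(String[] args) {
        Map<String, Object> context = new HashMap<>();
        long duration = timeCall(() -> "fake model response", context);
        System.out.println(duration >= 0); // prints true
    }
}
```

The context map is what lets two otherwise independent advice methods cooperate across a single request/response cycle.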
Monitoring Dashboard
Creating a Monitoring Service
@Service
public class AiMonitoringService {

    private final MeterRegistry meterRegistry;

    public AiMonitoringService(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    /**
     * Retrieve aggregate AI model statistics.
     */
    public AiStats getStats() {
        double totalRequests = getCounterValue("spring.ai.chat.model.requests");
        double totalTokens = getCounterValue("spring.ai.chat.model.tokens.total");
        double avgLatency = getTimerMean("spring.ai.chat.model.latency");
        return new AiStats(
            (long) totalRequests,
            (long) totalTokens,
            avgLatency
        );
    }

    private double getCounterValue(String name) {
        Counter counter = meterRegistry.find(name).counter();
        return counter != null ? counter.count() : 0;
    }

    private double getTimerMean(String name) {
        Timer timer = meterRegistry.find(name).timer();
        return timer != null ? timer.mean(TimeUnit.MILLISECONDS) : 0;
    }

    record AiStats(long totalRequests, long totalTokens, double avgLatencyMs) {}
}
Monitoring Endpoints
@RestController
@RequestMapping("/api/monitoring")
public class MonitoringController {

    @Autowired
    private AiMonitoringService monitoringService;

    @Autowired
    private ChatClient chatClient;

    @GetMapping("/stats")
    public AiMonitoringService.AiStats getStats() {
        return monitoringService.getStats();
    }

    @GetMapping("/health/ai")
    public AiHealth checkAiHealth() {
        try {
            // Probe the AI connection
            String response = chatClient.prompt()
                .user("ping")
                .call()
                .content();
            return new AiHealth("UP", "AI service healthy");
        } catch (Exception e) {
            return new AiHealth("DOWN", "AI service error: " + e.getMessage());
        }
    }

    record AiHealth(String status, String message) {}
}
Alerting
Prometheus Alert Rules
# alerts.yml
groups:
  - name: spring-ai-alerts
    rules:
      # High latency
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(spring_ai_chat_model_latency_bucket[5m])) > 5000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "AI model response latency is too high"
          description: "p95 latency exceeds 5 seconds"

      # High error rate
      - alert: HighErrorRate
        expr: rate(spring_ai_chat_model_errors_total[5m]) / rate(spring_ai_chat_model_requests_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "AI model error rate is too high"
          description: "Error rate exceeds 10%"

      # High token usage (increase() gives the total over the hour,
      # matching the threshold; rate() would give tokens per second)
      - alert: HighTokenUsage
        expr: sum(increase(spring_ai_chat_model_tokens_total_total[1h])) > 100000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Token usage is too high"
          description: "Hourly token usage exceeds 100,000"
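histogram_quantile estimates the p95 from Micrometer's latency histogram buckets. As a rough standalone illustration of what a 95th percentile means (nearest-rank over raw samples, which is not the interpolation Prometheus performs on buckets):

```java
import java.util.Arrays;

public class Percentile {
    // Nearest-rank percentile: the smallest value with at least
    // q * n of the samples at or below it
    static double percentile(double[] samples, double q) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        double[] latenciesMs = new double[100];
        for (int i = 0; i < 100; i++) latenciesMs[i] = i + 1; // 1..100 ms
        System.out.println(percentile(latenciesMs, 0.95)); // prints 95.0
    }
}
```

Alerting on p95 rather than the mean catches tail-latency regressions that an average would smooth over.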
Spring Boot Health Checks
@Component
public class AiModelHealthIndicator implements HealthIndicator {

    @Autowired
    private ChatModel chatModel;

    @Override
    public Health health() {
        try {
            // Simple connectivity probe
            ChatResponse response = chatModel.call(
                new Prompt(new UserMessage("health check"))
            );
            if (response != null && response.getResult() != null) {
                return Health.up()
                    .withDetail("model", "available")
                    .build();
            }
            return Health.down()
                .withDetail("error", "Empty response")
                .build();
        } catch (Exception e) {
            return Health.down()
                .withDetail("error", e.getMessage())
                .build();
        }
    }
}
Cost Tracking
Token Cost Monitoring
@Service
public class CostTrackingService {

    private final Counter costCounter;

    // Illustrative input/output prices per 1K tokens (USD); check current vendor pricing
    private final Map<String, ModelPricing> pricing = Map.of(
        "gpt-4o", new ModelPricing(0.005, 0.015),
        "gpt-4o-mini", new ModelPricing(0.00015, 0.0006),
        "gpt-3.5-turbo", new ModelPricing(0.0005, 0.0015),
        "claude-3-sonnet", new ModelPricing(0.003, 0.015)
    );

    public CostTrackingService(MeterRegistry meterRegistry) {
        this.costCounter = Counter.builder("ai.cost.total")
            .description("Accumulated AI spend in USD")
            .register(meterRegistry);
    }

    public CostReport calculateCost(String model, int promptTokens, int completionTokens) {
        ModelPricing modelPricing = pricing.getOrDefault(model, new ModelPricing(0, 0));
        double inputCost = (promptTokens / 1000.0) * modelPricing.inputPrice();
        double outputCost = (completionTokens / 1000.0) * modelPricing.outputPrice();
        double totalCost = inputCost + outputCost;
        costCounter.increment(totalCost);
        return new CostReport(
            model,
            promptTokens,
            completionTokens,
            inputCost,
            outputCost,
            totalCost
        );
    }

    record ModelPricing(double inputPrice, double outputPrice) {}

    record CostReport(
        String model,
        int promptTokens,
        int completionTokens,
        double inputCost,
        double outputCost,
        double totalCost
    ) {}
}
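The per-thousand-token arithmetic above can be checked standalone (the prices here are the illustrative figures from the pricing map, not authoritative vendor rates):

```java
public class CostMath {
    // Cost = tokens / 1000 * price-per-1K, summed over input and output
    static double cost(int promptTokens, int completionTokens,
                       double inPricePer1K, double outPricePer1K) {
        return (promptTokens / 1000.0) * inPricePer1K
             + (completionTokens / 1000.0) * outPricePer1K;
    }

    public static void main(String[] args) {
        // gpt-4o-mini style pricing: $0.00015 in / $0.0006 out per 1K tokens
        // 2000 input + 500 output tokens => $0.0003 + $0.0003
        System.out.printf("%.5f%n", cost(2000, 500, 0.00015, 0.0006));
    }
}
```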
Best Practices
1. Production Configuration
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active}
  tracing:
    sampling:
      probability: 0.1  # Lower the sampling rate in production

spring:
  ai:
    chat:
      observations:
        record-prompt: false  # Don't record sensitive content in production
2. Handling Sensitive Information
@Configuration
public class ObservabilityConfig {

    @Bean
    public ObservationRegistryCustomizer<ObservationRegistry> observationRegistryCustomizer() {
        return registry -> registry.observationConfig()
            .observationPredicate((name, context) ->
                // Drop observations whose name suggests sensitive content
                !name.contains("password") && !name.contains("token"));
    }
}
3. Performance Impact
// Prometheus is pull-based: metrics are only serialized at scrape time,
// so metric recording stays cheap on the request hot path
@Bean
public MeterRegistry meterRegistry() {
    return new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
}
Summary
In this chapter we covered:
- Observability concepts: metrics, traces, and logs
- Metrics collection: tokens, latency, request counts, and more
- Distributed tracing: OpenTelemetry integration
- Logging: SimpleLoggerAdvisor
- Monitoring dashboards: visualization with Grafana
- Alerting: Prometheus Alertmanager rules
- Cost tracking: monitoring token spend
Exercises
- Set up Prometheus + Grafana monitoring for an AI application
- Implement a custom health-check endpoint
- Build a token-cost monitoring dashboard