
Observability

Observability is a key capability for AI applications in production. This chapter covers the monitoring, tracing, and metrics features in Spring AI.

Overview

Observability for AI applications rests on three core pillars:

┌─────────────────────────────────────────────────────────────┐
│               Three Pillars of Observability                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Metrics                                                 │
│     - Request count, latency, token usage                   │
│     - Success rate and error rate                           │
│                                                             │
│  2. Traces                                                  │
│     - End-to-end request tracing                            │
│     - Performance bottleneck analysis                       │
│                                                             │
│  3. Logs                                                    │
│     - Detailed execution logs                               │
│     - Error stack traces                                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Spring AI Observability Integration

Spring AI builds on Spring Boot's observability framework and supports:

  • Micrometer: metrics collection
  • OpenTelemetry: distributed tracing
  • Spring Boot Actuator: monitoring endpoints

Adding Dependencies

<!-- Spring Boot Actuator -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<!-- Micrometer (metrics) -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

<!-- OpenTelemetry (tracing) -->
<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-spring-boot-starter</artifactId>
</dependency>

Basic Configuration

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  metrics:
    tags:
      application: ${spring.application.name}
  tracing:
    enabled: true
    sampling:
      probability: 1.0 # 100% sampling rate

spring:
  ai:
    chat:
      observations:
        include-completion-tokens: true
        include-prompt-tokens: true
        include-model-name: true

Metrics Collection

AI Model Metrics

Spring AI automatically collects the following metrics:

Metric Name                              Description
spring.ai.chat.model.requests            Total chat requests
spring.ai.chat.model.latency             Request latency
spring.ai.chat.model.tokens.prompt       Input (prompt) token count
spring.ai.chat.model.tokens.completion   Output (completion) token count
spring.ai.chat.model.tokens.total        Total token count
spring.ai.embedding.model.requests       Total embedding requests
spring.ai.vectorstore.requests           Vector store request count

Accessing Metrics

# List all metrics
curl http://localhost:8080/actuator/metrics

# Inspect a specific metric
curl http://localhost:8080/actuator/metrics/spring.ai.chat.model.tokens.total

# Prometheus format
curl http://localhost:8080/actuator/prometheus

Custom Metrics

@Service
public class CustomMetricsService {

    private final ChatClient chatClient;
    private final Counter requestCounter;
    private final Timer responseTimer;

    public CustomMetricsService(ChatClient chatClient, MeterRegistry meterRegistry) {
        this.chatClient = chatClient;

        this.requestCounter = Counter.builder("ai.custom.requests")
                .description("Custom AI request count")
                .tag("type", "chat")
                .register(meterRegistry);

        this.responseTimer = Timer.builder("ai.custom.response.time")
                .description("AI response time")
                .register(meterRegistry);
    }

    public String chatWithMetrics(String message) {
        requestCounter.increment();

        return responseTimer.record(() -> {
            // Execute the AI request
            return chatClient.prompt()
                    .user(message)
                    .call()
                    .content();
        });
    }
}

Metrics Dashboards

Create a dashboard with Grafana:

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'spring-ai-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']

Common Grafana queries:

# Request rate
rate(spring_ai_chat_model_requests_total[5m])

# Average latency
rate(spring_ai_chat_model_latency_sum[5m]) / rate(spring_ai_chat_model_latency_count[5m])

# Token usage
sum(spring_ai_chat_model_tokens_total_total) by (model)

# Error rate
rate(spring_ai_chat_model_errors_total[5m])

Distributed Tracing

Configuring OpenTelemetry

management:
  tracing:
    enabled: true
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces

spring:
  ai:
    chat:
      observations:
        record-prompt: true # Records prompt content (use with caution in production)

Trace Data

Spring AI automatically creates spans for the following operations:

Span Name               Description
chat_model.call         Chat model invocation
embedding_model.embed   Embedding model invocation
vector_store.add        Vector store insert
vector_store.search     Vector store search
rag.retrieve            RAG retrieval

Custom Tracing

@Service
public class TracingService {

    @Autowired
    private Tracer tracer;

    @Autowired
    private ChatClient chatClient;

    public String chatWithTracing(String message) {
        // Create a custom span
        Span span = tracer.spanBuilder("custom.chat.operation")
                .setAttribute("user.message", message)
                .startSpan();

        try (Scope scope = span.makeCurrent()) {
            String response = chatClient.prompt()
                    .user(message)
                    .call()
                    .content();

            span.setAttribute("response.length", response.length());
            return response;
        } finally {
            span.end();
        }
    }

    public String multiStepChat(String message) {
        Span parentSpan = tracer.spanBuilder("multi.step.chat").startSpan();

        try (Scope scope = parentSpan.makeCurrent()) {
            // Step 1: preprocessing
            Span step1Span = tracer.spanBuilder("step.1.preprocess").startSpan();
            String processedMessage = preprocess(message);
            step1Span.end();

            // Step 2: model call
            Span step2Span = tracer.spanBuilder("step.2.model.call").startSpan();
            String response = chatClient.prompt().user(processedMessage).call().content();
            step2Span.end();

            // Step 3: post-processing
            Span step3Span = tracer.spanBuilder("step.3.postprocess").startSpan();
            String finalResponse = postprocess(response);
            step3Span.end();

            return finalResponse;
        } finally {
            parentSpan.end();
        }
    }
}

Trace Visualization

View traces with Jaeger or Zipkin:

# Jaeger configuration
spring:
  application:
    name: spring-ai-app
management:
  tracing:
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces

Logging

Configuring Logging

logging:
  level:
    org.springframework.ai: DEBUG
    org.springframework.ai.chat: TRACE
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n"

Using SimpleLoggerAdvisor

@Configuration
public class LoggingConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
    }
}

Custom Logging Advisor

public class CustomLoggingAdvisor implements RequestResponseAdvisor {

    private static final Logger log = LoggerFactory.getLogger(CustomLoggingAdvisor.class);

    @Override
    public AdvisedRequest adviseRequest(AdvisedRequest request, Map<String, Object> context) {
        log.info("Request: userText={}, systemText={}",
                request.userText(),
                request.systemText());

        context.put("startTime", System.currentTimeMillis());

        return request;
    }

    @Override
    public ChatResponse adviseResponse(ChatResponse response, Map<String, Object> context) {
        long startTime = (long) context.get("startTime");
        long duration = System.currentTimeMillis() - startTime;

        log.info("Response: duration={}ms, tokens={}",
                duration,
                response.getMetadata().getUsage().getTotalTokens());

        return response;
    }
}
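The custom advisor only takes effect once it is registered on the ChatClient, the same way SimpleLoggerAdvisor is registered above. A minimal wiring sketch (class and bean names are illustrative):

```java
@Configuration
public class AdvisorConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                // Advisors run in registration order: custom timing/logging
                // first, then the built-in SimpleLoggerAdvisor.
                .defaultAdvisors(new CustomLoggingAdvisor(), new SimpleLoggerAdvisor())
                .build();
    }
}
```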

Monitoring Dashboard

Creating a Monitoring Service

@Service
public class AiMonitoringService {

    private final MeterRegistry meterRegistry;

    public AiMonitoringService(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    /**
     * Returns aggregate AI model statistics.
     */
    public AiStats getStats() {
        double totalRequests = getCounterValue("spring.ai.chat.model.requests");
        double totalTokens = getCounterValue("spring.ai.chat.model.tokens.total");
        double avgLatency = getTimerMean("spring.ai.chat.model.latency");

        return new AiStats(
                (long) totalRequests,
                (long) totalTokens,
                avgLatency
        );
    }

    private double getCounterValue(String name) {
        Counter counter = meterRegistry.find(name).counter();
        return counter != null ? counter.count() : 0;
    }

    private double getTimerMean(String name) {
        Timer timer = meterRegistry.find(name).timer();
        return timer != null ? timer.mean(TimeUnit.MILLISECONDS) : 0;
    }

    record AiStats(long totalRequests, long totalTokens, double avgLatencyMs) {}
}

Monitoring Endpoint

@RestController
@RequestMapping("/api/monitoring")
public class MonitoringController {

    @Autowired
    private AiMonitoringService monitoringService;

    @Autowired
    private ChatClient chatClient;

    @GetMapping("/stats")
    public AiStats getStats() {
        return monitoringService.getStats();
    }

    @GetMapping("/health/ai")
    public AiHealth checkAiHealth() {
        try {
            // Probe the AI connection
            chatClient.prompt()
                    .user("ping")
                    .call()
                    .content();

            return new AiHealth("UP", "AI service is healthy");
        } catch (Exception e) {
            return new AiHealth("DOWN", "AI service error: " + e.getMessage());
        }
    }

    record AiHealth(String status, String message) {}
}

Alerting

Prometheus Alert Rules

# alerts.yml
groups:
  - name: spring-ai-alerts
    rules:
      # High latency alert
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(spring_ai_chat_model_latency_bucket[5m])) > 5000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "AI model response latency is too high"
          description: "95th percentile latency exceeds 5 seconds"

      # Error rate alert
      - alert: HighErrorRate
        expr: rate(spring_ai_chat_model_errors_total[5m]) / rate(spring_ai_chat_model_requests_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "AI model error rate is too high"
          description: "Error rate exceeds 10%"

      # Token usage alert
      - alert: HighTokenUsage
        expr: sum(increase(spring_ai_chat_model_tokens_total_total[1h])) > 100000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Token usage is too high"
          description: "Hourly token usage exceeds 100000"
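The rules above only decide when an alert fires; delivering notifications is the job of Alertmanager routing. A minimal sketch (the receiver name and webhook URL are placeholders, not part of any real deployment):

```yaml
# alertmanager.yml (sketch; receiver name and URL are placeholders)
route:
  receiver: ai-team
  group_by: ['alertname']

receivers:
  - name: ai-team
    webhook_configs:
      - url: http://example.internal/alert-hook
```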

Spring Boot Health Check

@Component
public class AiModelHealthIndicator implements HealthIndicator {

    @Autowired
    private ChatModel chatModel;

    @Override
    public Health health() {
        try {
            // Simple connectivity test
            ChatResponse response = chatModel.call(
                    new Prompt(new UserMessage("health check"))
            );

            if (response != null && response.getResult() != null) {
                return Health.up()
                        .withDetail("model", "available")
                        .build();
            }

            return Health.down()
                    .withDetail("error", "Empty response")
                    .build();
        } catch (Exception e) {
            return Health.down()
                    .withDetail("error", e.getMessage())
                    .build();
        }
    }
}
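Note that by default Actuator's /actuator/health response hides per-indicator details, so the model detail above will not appear until details are exposed (a standard Spring Boot setting):

```yaml
management:
  endpoint:
    health:
      show-details: always
```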

Cost Tracking

Token Cost Monitoring

@Service
public class CostTrackingService {

    private final Counter costCounter;
    private final Map<String, ModelPricing> pricing = Map.of(
            "gpt-4o", new ModelPricing(0.005, 0.015), // input/output price per 1K tokens
            "gpt-4o-mini", new ModelPricing(0.00015, 0.0006),
            "gpt-3.5-turbo", new ModelPricing(0.0005, 0.0015),
            "claude-3-sonnet", new ModelPricing(0.003, 0.015)
    );

    public CostTrackingService(MeterRegistry meterRegistry) {
        this.costCounter = Counter.builder("ai.cost.total")
                .description("Accumulated token cost")
                .register(meterRegistry);
    }

    public CostReport calculateCost(String model, int promptTokens, int completionTokens) {
        ModelPricing modelPricing = pricing.getOrDefault(model, new ModelPricing(0, 0));

        double inputCost = (promptTokens / 1000.0) * modelPricing.inputPrice();
        double outputCost = (completionTokens / 1000.0) * modelPricing.outputPrice();
        double totalCost = inputCost + outputCost;

        costCounter.increment(totalCost);

        return new CostReport(
                model,
                promptTokens,
                completionTokens,
                inputCost,
                outputCost,
                totalCost
        );
    }

    record ModelPricing(double inputPrice, double outputPrice) {}
    record CostReport(
            String model,
            int promptTokens,
            int completionTokens,
            double inputCost,
            double outputCost,
            double totalCost
    ) {}
}
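A quick, standalone sanity check of the per-1K-token arithmetic used in calculateCost above (the token counts are made-up values; the rates are the gpt-4o entries from the pricing map):

```java
// Standalone check of the pricing formula: cost = (tokens / 1000) * pricePer1K.
public class CostExample {

    record ModelPricing(double inputPrice, double outputPrice) {}

    public static void main(String[] args) {
        ModelPricing gpt4o = new ModelPricing(0.005, 0.015); // per 1K tokens

        int promptTokens = 2000;     // illustrative billed input tokens
        int completionTokens = 1000; // illustrative billed output tokens

        double inputCost = (promptTokens / 1000.0) * gpt4o.inputPrice();       // 2.0 * 0.005 = 0.010
        double outputCost = (completionTokens / 1000.0) * gpt4o.outputPrice(); // 1.0 * 0.015 = 0.015

        System.out.printf("total cost: $%.3f%n", inputCost + outputCost); // prints "total cost: $0.025"
    }
}
```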

Best Practices

1. Production Configuration

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active}
  tracing:
    sampling:
      probability: 0.1 # Lower the sampling rate in production

spring:
  ai:
    chat:
      observations:
        record-prompt: false # Do not record sensitive content in production

2. Handling Sensitive Information

@Configuration
public class ObservabilityConfig {

    @Bean
    public ObservationRegistryCustomizer<ObservationRegistry> observationRegistryCustomizer() {
        return registry -> registry.observationConfig()
                .observationPredicate((name, context) ->
                        // Skip observations whose names suggest sensitive data
                        !name.contains("password") && !name.contains("token"));
    }
}

3. Performance Impact

// Prometheus uses a pull model: metrics are scraped out of band,
// so collection adds minimal overhead to request threads
@Bean
public MeterRegistry meterRegistry() {
    return new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
}

Summary

In this chapter we covered:

  1. Observability concepts: metrics, traces, and logs
  2. Metrics collection: tokens, latency, request counts, and more
  3. Distributed tracing: OpenTelemetry integration
  4. Logging: SimpleLoggerAdvisor
  5. Monitoring dashboards: Grafana visualization
  6. Alerting: Prometheus Alertmanager
  7. Cost tracking: monitoring token usage cost

Exercises

  1. Set up Prometheus + Grafana monitoring for an AI application
  2. Implement a custom health check endpoint
  3. Build a token cost monitoring dashboard

References