
Observability

Observability is a key capability for AI applications in production. This chapter covers the monitoring, tracing, and metrics features in Spring AI.

Overview

Observability for AI applications rests on three core pillars:

┌─────────────────────────────────────────────────────────────┐
│               Three Pillars of Observability                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. Metrics                                                 │
│     - Request count, latency, token usage                   │
│     - Success rate and error rate                           │
│                                                             │
│  2. Traces                                                  │
│     - End-to-end request tracing                            │
│     - Performance bottleneck analysis                       │
│                                                             │
│  3. Logs                                                    │
│     - Detailed execution logs                               │
│     - Error stack traces                                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Spring AI Observability Integration

Spring AI builds on Spring Boot's observability framework and supports:

  • Micrometer: metrics collection
  • OpenTelemetry: distributed tracing
  • Spring Boot Actuator: monitoring endpoints

Adding Dependencies

<!-- Spring Boot Actuator -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<!-- Micrometer (metrics) -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

<!-- OpenTelemetry (tracing) -->
<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-spring-boot-starter</artifactId>
</dependency>

Basic Configuration

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  metrics:
    tags:
      application: ${spring.application.name}
  tracing:
    enabled: true
    sampling:
      probability: 1.0 # 100% sampling rate

spring:
  ai:
    chat:
      observations:
        include-completion-tokens: true
        include-prompt-tokens: true
        include-model-name: true

Metrics Collection

AI Model Metrics

Spring AI automatically collects the following metrics:

Metric Name                              Description
spring.ai.chat.model.requests            Total chat requests
spring.ai.chat.model.latency             Request latency
spring.ai.chat.model.tokens.prompt       Input (prompt) token count
spring.ai.chat.model.tokens.completion   Output (completion) token count
spring.ai.chat.model.tokens.total        Total token count
spring.ai.embedding.model.requests       Total embedding requests
spring.ai.vectorstore.requests           Vector store request count

Accessing Metrics

# List all metrics
curl http://localhost:8080/actuator/metrics

# Inspect a specific metric
curl http://localhost:8080/actuator/metrics/spring.ai.chat.model.tokens.total

# Prometheus format
curl http://localhost:8080/actuator/prometheus

Custom Metrics

@Service
public class CustomMetricsService {

    private final ChatClient chatClient;
    private final Counter requestCounter;
    private final Timer responseTimer;

    public CustomMetricsService(ChatClient chatClient, MeterRegistry meterRegistry) {
        this.chatClient = chatClient;

        this.requestCounter = Counter.builder("ai.custom.requests")
                .description("Custom AI request count")
                .tag("type", "chat")
                .register(meterRegistry);

        this.responseTimer = Timer.builder("ai.custom.response.time")
                .description("AI response time")
                .register(meterRegistry);
    }

    public String chatWithMetrics(String message) {
        requestCounter.increment();

        return responseTimer.record(() -> {
            // Execute the AI request
            return chatClient.prompt()
                    .user(message)
                    .call()
                    .content();
        });
    }
}

Metrics Dashboards

Create a dashboard with Grafana:

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'spring-ai-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']

Common Grafana queries:

# Request rate
rate(spring_ai_chat_model_requests_total[5m])

# Average latency
rate(spring_ai_chat_model_latency_sum[5m]) / rate(spring_ai_chat_model_latency_count[5m])

# Token usage
sum(spring_ai_chat_model_tokens_total_total) by (model)

# Error rate
rate(spring_ai_chat_model_errors_total[5m])

Distributed Tracing

Configuring OpenTelemetry

management:
  tracing:
    enabled: true
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces

spring:
  ai:
    chat:
      observations:
        record-prompt: true # Records prompt content (use with caution in production)

Trace Data

Spring AI automatically creates spans for the following operations:

Span Name               Description
chat_model.call         Chat model invocation
embedding_model.embed   Embedding model invocation
vector_store.add        Vector store insert
vector_store.search     Vector store search
rag.retrieve            RAG retrieval

Custom Tracing

@Service
public class TracingService {

    @Autowired
    private Tracer tracer;

    @Autowired
    private ChatClient chatClient;

    public String chatWithTracing(String message) {
        // Create a custom span
        Span span = tracer.spanBuilder("custom.chat.operation")
                .setAttribute("user.message", message)
                .startSpan();

        try (Scope scope = span.makeCurrent()) {
            String response = chatClient.prompt()
                    .user(message)
                    .call()
                    .content();

            span.setAttribute("response.length", response.length());
            return response;
        } finally {
            span.end();
        }
    }

    public String multiStepChat(String message) {
        Span parentSpan = tracer.spanBuilder("multi.step.chat").startSpan();

        try (Scope scope = parentSpan.makeCurrent()) {
            // Step 1: preprocessing
            Span step1Span = tracer.spanBuilder("step.1.preprocess").startSpan();
            String processedMessage = preprocess(message);
            step1Span.end();

            // Step 2: model call
            Span step2Span = tracer.spanBuilder("step.2.model.call").startSpan();
            String response = chatClient.prompt().user(processedMessage).call().content();
            step2Span.end();

            // Step 3: post-processing
            Span step3Span = tracer.spanBuilder("step.3.postprocess").startSpan();
            String finalResponse = postprocess(response);
            step3Span.end();

            return finalResponse;
        } finally {
            parentSpan.end();
        }
    }
}

Trace Visualization

View traces with Jaeger or Zipkin:

# Jaeger configuration
spring:
  application:
    name: spring-ai-app
management:
  tracing:
    sampling:
      probability: 1.0
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces

Logging

Configuring Logging

logging:
  level:
    org.springframework.ai: DEBUG
    org.springframework.ai.chat: TRACE
  pattern:
    console: "%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n"

Using SimpleLoggerAdvisor

@Configuration
public class LoggingConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
    }
}

Custom Logging Advisor

public class CustomLoggingAdvisor implements RequestResponseAdvisor {

    private static final Logger log = LoggerFactory.getLogger(CustomLoggingAdvisor.class);

    @Override
    public AdvisedRequest adviseRequest(AdvisedRequest request, Map<String, Object> context) {
        log.info("Request: userText={}, systemText={}",
                request.userText(),
                request.systemText());

        context.put("startTime", System.currentTimeMillis());

        return request;
    }

    @Override
    public ChatResponse adviseResponse(ChatResponse response, Map<String, Object> context) {
        long startTime = (long) context.get("startTime");
        long duration = System.currentTimeMillis() - startTime;

        log.info("Response: duration={}ms, tokens={}",
                duration,
                response.getMetadata().getUsage().getTotalTokens());

        return response;
    }
}
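The custom advisor only takes effect once it is registered on the ChatClient, the same way SimpleLoggerAdvisor is registered above. A minimal wiring sketch (class and bean names are illustrative):

```java
@Configuration
public class AdvisorConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                // Advisors run in registration order: custom timing/logging
                // first, then the built-in SimpleLoggerAdvisor.
                .defaultAdvisors(new CustomLoggingAdvisor(), new SimpleLoggerAdvisor())
                .build();
    }
}
```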

Monitoring Dashboard

Creating a Monitoring Service

@Service
public class AiMonitoringService {

    private final MeterRegistry meterRegistry;

    public AiMonitoringService(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    /**
     * Returns aggregate AI model statistics.
     */
    public AiStats getStats() {
        double totalRequests = getCounterValue("spring.ai.chat.model.requests");
        double totalTokens = getCounterValue("spring.ai.chat.model.tokens.total");
        double avgLatency = getTimerMean("spring.ai.chat.model.latency");

        return new AiStats(
                (long) totalRequests,
                (long) totalTokens,
                avgLatency
        );
    }

    private double getCounterValue(String name) {
        Counter counter = meterRegistry.find(name).counter();
        return counter != null ? counter.count() : 0;
    }

    private double getTimerMean(String name) {
        Timer timer = meterRegistry.find(name).timer();
        return timer != null ? timer.mean(TimeUnit.MILLISECONDS) : 0;
    }

    record AiStats(long totalRequests, long totalTokens, double avgLatencyMs) {}
}

Monitoring Endpoint

@RestController
@RequestMapping("/api/monitoring")
public class MonitoringController {

    @Autowired
    private AiMonitoringService monitoringService;

    @Autowired
    private ChatClient chatClient;

    @GetMapping("/stats")
    public AiStats getStats() {
        return monitoringService.getStats();
    }

    @GetMapping("/health/ai")
    public AiHealth checkAiHealth() {
        try {
            // Probe the AI connection
            chatClient.prompt()
                    .user("ping")
                    .call()
                    .content();

            return new AiHealth("UP", "AI service is healthy");
        } catch (Exception e) {
            return new AiHealth("DOWN", "AI service error: " + e.getMessage());
        }
    }

    record AiHealth(String status, String message) {}
}

Alerting

Prometheus Alert Rules

# alerts.yml
groups:
  - name: spring-ai-alerts
    rules:
      # High latency alert
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(spring_ai_chat_model_latency_bucket[5m])) > 5000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "AI model response latency is too high"
          description: "95th percentile latency exceeds 5 seconds"

      # Error rate alert
      - alert: HighErrorRate
        expr: rate(spring_ai_chat_model_errors_total[5m]) / rate(spring_ai_chat_model_requests_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "AI model error rate is too high"
          description: "Error rate exceeds 10%"

      # Token usage alert
      - alert: HighTokenUsage
        expr: sum(increase(spring_ai_chat_model_tokens_total_total[1h])) > 100000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Token usage is too high"
          description: "Hourly token usage exceeds 100000"
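The rules above only decide when an alert fires; delivering notifications is the job of Alertmanager routing. A minimal sketch (the receiver name and webhook URL are placeholders, not part of any real deployment):

```yaml
# alertmanager.yml (sketch; receiver name and URL are placeholders)
route:
  receiver: ai-team
  group_by: ['alertname']

receivers:
  - name: ai-team
    webhook_configs:
      - url: http://example.internal/alert-hook
```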

Spring Boot Health Check

@Component
public class AiModelHealthIndicator implements HealthIndicator {

    @Autowired
    private ChatModel chatModel;

    @Override
    public Health health() {
        try {
            // Simple connectivity test
            ChatResponse response = chatModel.call(
                    new Prompt(new UserMessage("health check"))
            );

            if (response != null && response.getResult() != null) {
                return Health.up()
                        .withDetail("model", "available")
                        .build();
            }

            return Health.down()
                    .withDetail("error", "Empty response")
                    .build();
        } catch (Exception e) {
            return Health.down()
                    .withDetail("error", e.getMessage())
                    .build();
        }
    }
}
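Note that by default Actuator's /actuator/health response hides per-indicator details, so the model detail above will not appear until details are exposed (a standard Spring Boot setting):

```yaml
management:
  endpoint:
    health:
      show-details: always
```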

Cost Tracking

Token Cost Monitoring

@Service
public class CostTrackingService {

    private final Counter costCounter;
    private final Map<String, ModelPricing> pricing = Map.of(
            "gpt-4o", new ModelPricing(0.005, 0.015), // input/output price per 1K tokens
            "gpt-4o-mini", new ModelPricing(0.00015, 0.0006),
            "gpt-3.5-turbo", new ModelPricing(0.0005, 0.0015),
            "claude-3-sonnet", new ModelPricing(0.003, 0.015)
    );

    public CostTrackingService(MeterRegistry meterRegistry) {
        this.costCounter = Counter.builder("ai.cost.total")
                .description("Accumulated token cost")
                .register(meterRegistry);
    }

    public CostReport calculateCost(String model, int promptTokens, int completionTokens) {
        ModelPricing modelPricing = pricing.getOrDefault(model, new ModelPricing(0, 0));

        double inputCost = (promptTokens / 1000.0) * modelPricing.inputPrice();
        double outputCost = (completionTokens / 1000.0) * modelPricing.outputPrice();
        double totalCost = inputCost + outputCost;

        costCounter.increment(totalCost);

        return new CostReport(
                model,
                promptTokens,
                completionTokens,
                inputCost,
                outputCost,
                totalCost
        );
    }

    record ModelPricing(double inputPrice, double outputPrice) {}
    record CostReport(
            String model,
            int promptTokens,
            int completionTokens,
            double inputCost,
            double outputCost,
            double totalCost
    ) {}
}
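A quick, standalone sanity check of the per-1K-token arithmetic used in calculateCost above (the token counts are made-up values; the rates are the gpt-4o entries from the pricing map):

```java
// Standalone check of the pricing formula: cost = (tokens / 1000) * pricePer1K.
public class CostExample {

    record ModelPricing(double inputPrice, double outputPrice) {}

    public static void main(String[] args) {
        ModelPricing gpt4o = new ModelPricing(0.005, 0.015); // per 1K tokens

        int promptTokens = 2000;     // illustrative billed input tokens
        int completionTokens = 1000; // illustrative billed output tokens

        double inputCost = (promptTokens / 1000.0) * gpt4o.inputPrice();       // 2.0 * 0.005 = 0.010
        double outputCost = (completionTokens / 1000.0) * gpt4o.outputPrice(); // 1.0 * 0.015 = 0.015

        System.out.printf("total cost: $%.3f%n", inputCost + outputCost); // prints "total cost: $0.025"
    }
}
```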

Best Practices

1. Production Configuration

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  metrics:
    tags:
      application: ${spring.application.name}
      environment: ${spring.profiles.active}
  tracing:
    sampling:
      probability: 0.1 # Lower the sampling rate in production

spring:
  ai:
    chat:
      observations:
        record-prompt: false # Do not record sensitive content in production

2. Handling Sensitive Information

@Configuration
public class ObservabilityConfig {

    @Bean
    public ObservationRegistryCustomizer<ObservationRegistry> observationRegistryCustomizer() {
        return registry -> registry.observationConfig()
                .observationPredicate((name, context) ->
                        // Skip observations whose names suggest sensitive data
                        !name.contains("password") && !name.contains("token"));
    }
}

3. Performance Impact

// Prometheus uses a pull model: metrics are scraped out of band,
// so collection adds minimal overhead to request threads
@Bean
public MeterRegistry meterRegistry() {
    return new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);
}

Summary

In this chapter we covered:

  1. Observability concepts: metrics, traces, and logs
  2. Metrics collection: tokens, latency, request counts, and more
  3. Distributed tracing: OpenTelemetry integration
  4. Logging: SimpleLoggerAdvisor
  5. Monitoring dashboards: Grafana visualization
  6. Alerting: Prometheus Alertmanager
  7. Cost tracking: monitoring token usage cost

Exercises

  1. Set up Prometheus + Grafana monitoring for an AI application
  2. Implement a custom health check endpoint
  3. Build a token cost monitoring dashboard

References