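Image Generation

Image generation is a major application area of AI: a model creates a picture from a text description. This chapter covers the image generation support in Spring AI.

Overview

A text-to-image model turns a written description into an image: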

┌─────────────────────────────────────────────────────────────┐
│                    Image generation flow                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   Text prompt                                               │
│   "An orange cat on a sunlit windowsill, watercolor style"  │
│          │                                                  │
│          ▼                                                  │
│   ┌─────────────────────────────────────┐                   │
│   │        Image generation model       │                   │
│   │     (DALL-E / Stable Diffusion)     │                   │
│   └─────────────────────────────────────┘                   │
│          │                                                  │
│          ▼                                                  │
│   Generated image                                           │
│   [PNG/JPEG/WEBP]                                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘

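Supported models

Model	Provider	Characteristics
DALL-E 3	OpenAI	High quality, strong prompt understanding
DALL-E 2	OpenAI	Fast, supports editing
Stable Diffusion	Ollama/local	Open source, customizable
Flux	Ollama	High-quality open-source model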

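The ImageModel interface

Spring AI provides a single ImageModel interface shared by all image providers. In simplified form:

public interface ImageModel extends Model<ImagePrompt, ImageResponse> {

    // Generate one or more images for the given prompt
    ImageResponse call(ImagePrompt prompt);

    // Convenience overload: wrap a plain string in an ImagePrompt
    default ImageResponse generate(String prompt) {
        return call(new ImagePrompt(prompt));
    }
}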

Configuration

OpenAI DALL-E

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      image:
        options:
          model: dall-e-3  # or dall-e-2
          quality: standard  # standard or hd
          size: 1024x1024  # image size
          style: vivid  # vivid or natural
          response-format: url  # url or b64_json

Ollama local models

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      image:
        model: stable-diffusion  # or flux; check that your Spring AI and Ollama versions provide image models

# Pull the model
ollama pull stable-diffusion

Basic usage

Simple image generation

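import java.util.Base64;

import org.springframework.ai.image.ImageModel;
import org.springframework.ai.image.ImageOptions;
import org.springframework.ai.image.ImagePrompt;
import org.springframework.ai.image.ImageResponse;
import org.springframework.ai.openai.OpenAiImageOptions;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class ImageGenerationService {

    @Autowired
    private ImageModel imageModel;

    /**
     * Generates an image and returns its URL.
     */
    public String generateImage(String prompt) {
        ImageResponse response = imageModel.call(
            new ImagePrompt(prompt)
        );

        // Return the URL of the generated image
        return response.getResult().getOutput().getUrl();
    }

    /**
     * Generates an image and returns its raw bytes.
     */
    public byte[] generateImageBytes(String prompt) {
        // Ask for base64 data explicitly; with the "url" response format
        // getB64Json() returns null.
        ImageOptions options = OpenAiImageOptions.builder()
            .withResponseFormat("b64_json")
            .build();

        ImageResponse response = imageModel.call(
            new ImagePrompt(prompt, options)
        );

        // Decode the base64-encoded image data
        String b64 = response.getResult().getOutput().getB64Json();
        return Base64.getDecoder().decode(b64);
    }
}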
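
Before writing decoded bytes to disk or returning them to a client, it can help to check that the payload is present and looks like an image. The following is a minimal sketch in plain Java; the class name ImagePayloads and its methods are hypothetical helpers, not part of Spring AI, and the check only covers PNG.

```java
import java.util.Base64;

/** Hypothetical helper: decodes a b64_json payload and checks for a PNG header. */
final class ImagePayloads {

    // The eight-byte signature that starts every PNG file.
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A
    };

    private ImagePayloads() {}

    /** Returns the decoded bytes, or throws if the payload is missing or malformed. */
    static byte[] decode(String b64Json) {
        if (b64Json == null || b64Json.isBlank()) {
            // Typically means the request used the "url" response format.
            throw new IllegalArgumentException("no base64 payload in response");
        }
        // Base64.Decoder throws IllegalArgumentException on invalid input.
        return Base64.getDecoder().decode(b64Json);
    }

    /** True if the bytes begin with the PNG signature. */
    static boolean isPng(byte[] data) {
        if (data == null || data.length < PNG_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < PNG_SIGNATURE.length; i++) {
            if (data[i] != PNG_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A service could call ImagePayloads.decode on the value returned by getB64Json() and reject the result when isPng is false.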

Generating an image from a controller endpoint

@GetMapping("/generate")
public String generate(@RequestParam String prompt) {
    return imageModel.call(new ImagePrompt(prompt))
        .getResult()
        .getOutput()
        .getUrl();
}

Image options

OpenAI image options

@GetMapping("/generate-options")
public String generateWithOptions(@RequestParam String prompt) {
    // Builder method names have changed between Spring AI releases;
    // check the Javadoc for your version.
    ImageOptions options = OpenAiImageOptions.builder()
        .withModel("dall-e-3")
        .withQuality("hd")           // standard or hd
        .withSize("1024x1024")       // dall-e-3: 1024x1024, 1792x1024, 1024x1792
        .withStyle("vivid")          // vivid or natural
        .withN(1)                    // number of images; dall-e-3 accepts only 1
        .withResponseFormat("url")   // url or b64_json
        .build();
    
    ImagePrompt imagePrompt = new ImagePrompt(prompt, options);
    ImageResponse response = imageModel.call(imagePrompt);
    
    return response.getResult().getOutput().getUrl();
}

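Option reference

Parameter	Description	Allowed values
`model`	Model name	dall-e-2, dall-e-3
`quality`	Image quality	standard, hd (DALL-E 3 only)
`size`	Image size	256x256, 512x512, 1024x1024, 1792x1024, 1024x1792 (depends on the model)
`style`	Image style	vivid, natural (DALL-E 3 only)
`n`	Number of images	1-10 for DALL-E 2; 1 for DALL-E 3
`responseFormat`	Response format	url, b64_json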
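
Because the allowed value of `n` depends on the model, a request for several images may need to be split into multiple calls. The sketch below plans those calls in plain Java. It assumes a per-call limit of 10 for DALL-E 2 and 1 for DALL-E 3 (the logo example later in this chapter relies on the same DALL-E 3 limit); BatchPlanner is a hypothetical helper, not a Spring AI class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a requested image count into per-call values of n. */
final class BatchPlanner {

    private BatchPlanner() {}

    /** Largest n accepted in one call (assumed: 1 for dall-e-3, 10 for dall-e-2). */
    static int maxPerCall(String model) {
        return switch (model) {
            case "dall-e-3" -> 1;
            case "dall-e-2" -> 10;
            default -> throw new IllegalArgumentException("unknown model: " + model);
        };
    }

    /** Returns the n to send on each call so that the values sum to count. */
    static List<Integer> plan(String model, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must not be negative");
        }
        int max = maxPerCall(model);
        List<Integer> calls = new ArrayList<>();
        for (int remaining = count; remaining > 0; remaining -= max) {
            calls.add(Math.min(max, remaining));
        }
        return calls;
    }
}
```

For example, 23 images from dall-e-2 become three calls with n = 10, 10, and 3, while 3 images from dall-e-3 become three calls with n = 1.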

Size comparison

// Sizes supported by DALL-E 3
// - 1024x1024 (square)
// - 1792x1024 (landscape)
// - 1024x1792 (portrait)

// Sizes supported by DALL-E 2
// - 256x256
// - 512x512
// - 1024x1024
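
The size lists above can be turned into a small check that runs before a request is sent, so an unsupported combination fails fast instead of costing an API round trip. This is a sketch in plain Java; ImageSizeRules is a hypothetical helper and the size sets are copied from the lists above.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: validates a size against the model's supported sizes. */
final class ImageSizeRules {

    // Sizes per model, copied from the comparison above.
    private static final Map<String, Set<String>> SIZES = Map.of(
        "dall-e-3", Set.of("1024x1024", "1792x1024", "1024x1792"),
        "dall-e-2", Set.of("256x256", "512x512", "1024x1024"));

    private ImageSizeRules() {}

    /** True if the model is known and lists the given size. */
    static boolean isSupported(String model, String size) {
        if (model == null || size == null) {
            return false; // immutable collections reject null lookups
        }
        return SIZES.getOrDefault(model, Set.of()).contains(size);
    }

    /** Returns size unchanged, or throws if the model does not support it. */
    static String requireSupported(String model, String size) {
        if (!isSupported(model, size)) {
            throw new IllegalArgumentException(
                "size " + size + " is not supported by model " + model);
        }
        return size;
    }
}
```

A controller could call ImageSizeRules.requireSupported(model, size) before building the image options.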

Practical examples

1. Image generation service

@Service
public class CreativeImageService {

    @Autowired
    private ImageModel imageModel;

    /**
     * Generates an image in a given art style.
     */
    public GeneratedImage generateArtwork(String subject, String style) {
        String prompt = String.format(
            "A %s-style artwork of %s, high quality, richly detailed",
            style, subject
        );
        
        ImageOptions options = OpenAiImageOptions.builder()
            .withModel("dall-e-3")
            .withQuality("hd")
            .withSize("1024x1024")
            .withStyle("vivid")
            .build();
        
        ImageResponse response = imageModel.call(
            new ImagePrompt(prompt, options)
        );
        
        Image result = response.getResult().getOutput();
        
        return new GeneratedImage(
            result.getUrl(),
            prompt,
            LocalDateTime.now()
        );
    }

    /**
     * Generates logo concepts.
     */
    public List<String> generateLogoConcepts(String companyType, int count) {
        String prompt = String.format(
            "A clean, modern logo for a %s company, minimalist style, suitable for commercial use",
            companyType
        );
        
        // DALL-E 3 returns one image per request, so call it once per concept
        return IntStream.range(0, count)
            .mapToObj(i -> {
                ImageResponse response = imageModel.call(
                    new ImagePrompt(prompt + ", variation " + (i + 1))
                );
                return response.getResult().getOutput().getUrl();
            })
            .toList();
    }

    record GeneratedImage(String url, String prompt, LocalDateTime createdAt) {}
}

2. Generating image variations

@Service
public class ImageVariationService {

    @Autowired
    private ImageModel imageModel;

    /**
     * Generates several variations of a base prompt, one per style.
     */
    public List<String> generateVariations(String basePrompt, int count) {
        List<String> variations = new ArrayList<>();
        
        String[] styleModifiers = {
            ", watercolor style",
            ", oil painting style",
            ", digital art style",
            ", sketch style",
            ", cartoon style"
        };
        
        for (int i = 0; i < Math.min(count, styleModifiers.length); i++) {
            String prompt = basePrompt + styleModifiers[i];
            
            ImageResponse response = imageModel.call(
                new ImagePrompt(prompt)
            );
            
            variations.add(response.getResult().getOutput().getUrl());
        }
        
        return variations;
    }
}

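3. Image editing (DALL-E 2)

@Service
public class ImageEditService {

    /**
     * Edits an image (requires DALL-E 2).
     * Note: DALL-E 3 does not support editing.
     */
    public String editImage(byte[] imageData, byte[] maskData, String editPrompt) {
        // Spring AI's ImageModel currently focuses on generation, so the
        // OpenAI edit endpoint has to be called directly over HTTP.
        throw new UnsupportedOperationException(
            "Image editing is not available through ImageModel; call the OpenAI edit endpoint with DALL-E 2");
    }
}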

4. Batch generation

@Service
public class BatchImageService {

    @Autowired
    private ImageModel imageModel;

    @Async
    public CompletableFuture<List<GeneratedImage>> batchGenerate(
            List<String> prompts) {
        
        List<GeneratedImage> results = prompts.stream()
            .map(prompt -> {
                ImageResponse response = imageModel.call(
                    new ImagePrompt(prompt)
                );
                
                Image result = response.getResult().getOutput();
                
                return new GeneratedImage(
                    result.getUrl(),
                    prompt,
                    LocalDateTime.now()
                );
            })
            .toList();
        
        return CompletableFuture.completedFuture(results);
    }

    /**
     * Generates images in batch and saves them to files.
     */
    public List<Path> generateAndSave(List<String> prompts, 
                                       String outputDir) throws IOException {
        Path dir = Paths.get(outputDir);
        Files.createDirectories(dir);
        
        List<Path> savedPaths = new ArrayList<>();
        
        for (int i = 0; i < prompts.size(); i++) {
            String prompt = prompts.get(i);
            
            ImageOptions options = OpenAiImageOptions.builder()
                .withResponseFormat("b64_json")
                .build();
            
            ImageResponse response = imageModel.call(
                new ImagePrompt(prompt, options)
            );
            
            String b64 = response.getResult().getOutput().getB64Json();
            byte[] imageData = Base64.getDecoder().decode(b64);
            
            Path imagePath = dir.resolve("image_" + i + ".png");
            Files.write(imagePath, imageData);
            savedPaths.add(imagePath);
        }
        
        return savedPaths;
    }

    record GeneratedImage(String url, String prompt, LocalDateTime createdAt) {}
}

5. Image generation API

@RestController
@RequestMapping("/api/images")
public class ImageController {

    @Autowired
    private ImageModel imageModel;

    /**
     * Simple generation.
     */
    @PostMapping("/generate")
    public ImageResponse generateImage(@RequestBody ImageRequest request) {
        ImageOptions options = OpenAiImageOptions.builder()
            .withModel(request.model() != null ? request.model() : "dall-e-3")
            .withQuality(request.quality() != null ? request.quality() : "standard")
            .withSize(request.size() != null ? request.size() : "1024x1024")
            .withStyle(request.style() != null ? request.style() : "vivid")
            .build();
        
        return imageModel.call(new ImagePrompt(request.prompt(), options));
    }

    /**
     * Generate and download.
     */
    @PostMapping("/generate-download")
    public ResponseEntity<byte[]> generateAndDownload(
            @RequestBody ImageRequest request) {
        
        ImageOptions options = OpenAiImageOptions.builder()
            .withResponseFormat("b64_json")
            .build();
        
        ImageResponse response = imageModel.call(
            new ImagePrompt(request.prompt(), options)
        );
        
        String b64 = response.getResult().getOutput().getB64Json();
        byte[] imageData = Base64.getDecoder().decode(b64);
        
        return ResponseEntity.ok()
            .header(HttpHeaders.CONTENT_DISPOSITION, 
                   "attachment; filename=generated-image.png")
            .contentType(MediaType.IMAGE_PNG)
            .body(imageData);
    }

    /**
     * Batch generation.
     */
    @PostMapping("/batch")
    public List<ImageResult> batchGenerate(
            @RequestBody List<String> prompts) {
        
        return prompts.stream()
            .map(prompt -> {
                ImageResponse response = imageModel.call(
                    new ImagePrompt(prompt)
                );
                
                return new ImageResult(
                    prompt,
                    response.getResult().getOutput().getUrl()
                );
            })
            .toList();
    }
}

record ImageRequest(
    String prompt,
    String model,
    String quality,
    String size,
    String style
) {}

record ImageResult(String prompt, String imageUrl) {}
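
The controller above fills in missing request fields with inline conditionals. One alternative is to move the defaults into the record itself through a compact constructor, so every consumer sees a fully populated request. The sketch below uses a hypothetical record name, ImageRequestWithDefaults, and the same default values as the controller.

```java
import java.util.Objects;

/** Hypothetical variant of ImageRequest that applies defaults on construction. */
record ImageRequestWithDefaults(
    String prompt,
    String model,
    String quality,
    String size,
    String style
) {
    /** Compact constructor: reject a missing prompt and fill optional fields. */
    ImageRequestWithDefaults {
        if (prompt == null || prompt.isBlank()) {
            throw new IllegalArgumentException("prompt must not be blank");
        }
        // Same defaults as the /generate endpoint above.
        model = Objects.requireNonNullElse(model, "dall-e-3");
        quality = Objects.requireNonNullElse(quality, "standard");
        size = Objects.requireNonNullElse(size, "1024x1024");
        style = Objects.requireNonNullElse(style, "vivid");
    }
}
```

With this shape the endpoint can pass request.model(), request.quality(), and the other accessors straight to the options builder without null checks.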

Prompt engineering

Techniques for high-quality prompts

@Service
public class PromptEnhancementService {

    /**
     * Enhances a user prompt.
     */
    public String enhancePrompt(String userPrompt, ImageStyle style) {
        StringBuilder enhanced = new StringBuilder(userPrompt);
        
        // Append the style modifier
        enhanced.append(", ").append(style.getDescription());
        
        // Append quality modifiers
        enhanced.append(", high quality, detailed");
        
        return enhanced.toString();
    }

    /**
     * Builds a prompt from a scene description.
     */
    public String generateScenePrompt(SceneRequest request) {
        return String.format(
            "%s, %s lighting, %s atmosphere, %s composition, %s style, %s quality",
            request.subject(),
            request.lighting(),
            request.mood(),
            request.composition(),
            request.style(),
            request.quality()
        );
    }
}

enum ImageStyle {
    PHOTOGRAPH("professional photography, realistic"),
    DIGITAL_ART("digital art, vibrant colors"),
    OIL_PAINTING("oil painting, classical art style"),
    WATERCOLOR("watercolor painting, soft colors"),
    SKETCH("pencil sketch, hand-drawn"),
    ANIME("anime style, manga art"),
    MINIMALIST("minimalist design, clean lines"),
    VINTAGE("vintage poster, retro style");

    private final String description;

    ImageStyle(String description) {
        this.description = description;
    }

    public String getDescription() {
        return description;
    }
}

record SceneRequest(
    String subject,
    String lighting,
    String mood,
    String composition,
    String style,
    String quality
) {}
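
A fixed format string produces fragments such as "null lighting" when a scene field is left empty. The sketch below builds the same kind of prompt but skips blank fields. ScenePrompts is a hypothetical helper written in plain Java; the suffix words mirror the format string used in generateScenePrompt.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: joins scene fields into a prompt, skipping blank ones. */
final class ScenePrompts {

    private ScenePrompts() {}

    /** Adds "value suffix" (or just the value if suffix is empty) when value is non-blank. */
    private static void addIfPresent(List<String> parts, String value, String suffix) {
        if (value != null && !value.isBlank()) {
            parts.add(suffix.isEmpty() ? value.trim() : value.trim() + " " + suffix);
        }
    }

    /** Builds a comma-separated prompt; only the subject is mandatory. */
    static String build(String subject, String lighting, String mood,
                        String composition, String style, String quality) {
        if (subject == null || subject.isBlank()) {
            throw new IllegalArgumentException("subject is required");
        }
        List<String> parts = new ArrayList<>();
        addIfPresent(parts, subject, "");
        addIfPresent(parts, lighting, "lighting");
        addIfPresent(parts, mood, "atmosphere");
        addIfPresent(parts, composition, "composition");
        addIfPresent(parts, style, "style");
        addIfPresent(parts, quality, "quality");
        return String.join(", ", parts);
    }
}
```

Calling ScenePrompts.build("a lighthouse", "golden hour", null, " ", "watercolor", "high") yields "a lighthouse, golden hour lighting, watercolor style, high quality".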

Costs and limits

OpenAI DALL-E pricing

Model	Quality	Size	Price per image
DALL-E 3	Standard	1024×1024	$0.040
DALL-E 3	Standard	1024×1792	$0.080
DALL-E 3	HD	1024×1024	$0.080
DALL-E 3	HD	1024×1792	$0.120
DALL-E 2	Standard	1024×1024	$0.020
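
To see what a batch will cost before running it, the table can be encoded as a lookup. The sketch below is plain Java with a hypothetical class name, ImageCostEstimator; the prices are copied from the table above and will need updating if the provider changes them. Combinations that the table does not list are rejected instead of guessed.

```java
import java.math.BigDecimal;
import java.util.Map;

/** Hypothetical helper: estimates batch cost from the pricing table above. */
final class ImageCostEstimator {

    // Key format: model|quality|size. Values are USD per image.
    private static final Map<String, BigDecimal> PRICE_PER_IMAGE = Map.of(
        "dall-e-3|standard|1024x1024", new BigDecimal("0.040"),
        "dall-e-3|standard|1024x1792", new BigDecimal("0.080"),
        "dall-e-3|hd|1024x1024", new BigDecimal("0.080"),
        "dall-e-3|hd|1024x1792", new BigDecimal("0.120"),
        "dall-e-2|standard|1024x1024", new BigDecimal("0.020"));

    private ImageCostEstimator() {}

    /** Returns the estimated cost in USD for count images of the given kind. */
    static BigDecimal estimateUsd(String model, String quality, String size, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must not be negative");
        }
        BigDecimal unit = PRICE_PER_IMAGE.get(model + "|" + quality + "|" + size);
        if (unit == null) {
            throw new IllegalArgumentException(
                "no listed price for " + model + ", " + quality + ", " + size);
        }
        // BigDecimal avoids floating-point rounding in money arithmetic.
        return unit.multiply(BigDecimal.valueOf(count));
    }
}
```

For instance, five HD portrait images from DALL-E 3 come to 5 × $0.120 = $0.60.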

Rate limiting

@Service
public class RateLimitedImageService {

    @Autowired
    private ImageModel imageModel;

    // Guava's RateLimiter.create takes permits per second, so 5 images per minute is 5/60
    private final RateLimiter rateLimiter = RateLimiter.create(5.0 / 60.0);

    public String generateWithRateLimit(String prompt) {
        rateLimiter.acquire();
        return imageModel.call(new ImagePrompt(prompt))
            .getResult()
            .getOutput()
            .getUrl();
    }
}
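
If adding Guava is not an option, the same spacing can be done with a few lines of plain Java. The sketch below is an illustration of the idea, not a replacement for a production limiter: MinuteRateLimiter is a hypothetical class that spaces calls evenly, and the reserve method takes the current time as a parameter so the logic can be tested without sleeping.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical limiter: spaces calls evenly at permitsPerMinute per minute. */
final class MinuteRateLimiter {

    private final long intervalNanos;  // gap between consecutive call starts
    private long nextFreeNanos;        // earliest time the next call may start
    private boolean started;           // false until the first reservation

    MinuteRateLimiter(int permitsPerMinute) {
        if (permitsPerMinute <= 0) {
            throw new IllegalArgumentException("permitsPerMinute must be positive");
        }
        this.intervalNanos = TimeUnit.MINUTES.toNanos(1) / permitsPerMinute;
    }

    /** Reserves the next slot and returns how long the caller should wait, in nanoseconds. */
    synchronized long reserve(long nowNanos) {
        long start = started ? Math.max(nowNanos, nextFreeNanos) : nowNanos;
        started = true;
        nextFreeNanos = start + intervalNanos;
        return start - nowNanos;
    }

    /** Blocks the calling thread until its reserved slot arrives. */
    void acquire() throws InterruptedException {
        long waitNanos = reserve(System.nanoTime());
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

With new MinuteRateLimiter(5), the first call proceeds immediately and each later call starts at least 12 seconds after the previous one.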

Best practices

1. Cache generated results

@Service
public class CachedImageService {

    @Autowired
    private ImageModel imageModel;

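    // OpenAI image URLs are temporary (they expire about an hour after
    // generation), so cached entries must expire sooner than that.
    private final Cache<String, String> imageCache = 
        Caffeine.newBuilder()
            .maximumSize(1000)
            .expireAfterWrite(Duration.ofMinutes(50))
            .build();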

    public String generateImage(String prompt) {
        String cacheKey = DigestUtils.md5Hex(prompt);
        
        return imageCache.get(cacheKey, key -> {
            ImageResponse response = imageModel.call(new ImagePrompt(prompt));
            return response.getResult().getOutput().getUrl();
        });
    }
}
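
A key derived from the prompt alone treats the same prompt at two sizes or qualities as one entry. The sketch below derives the key from the prompt together with the options, using SHA-256 from the JDK. ImageCacheKeys is a hypothetical helper, and which fields belong in the key is an assumption to adapt to the options you actually vary.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Objects;

/** Hypothetical helper: builds a cache key from the prompt and its options. */
final class ImageCacheKeys {

    private ImageCacheKeys() {}

    /** Returns a 64-character hex SHA-256 digest of the options and the prompt. */
    static String of(String prompt, String model, String quality, String size, String style) {
        Objects.requireNonNull(prompt, "prompt");
        // Options come first and the prompt last, so a newline inside the
        // prompt cannot be mistaken for a field separator.
        String canonical = String.join("\n", model, quality, size, style, prompt.strip());
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(canonical.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The returned string can be used directly as the key passed to the cache in generateImage.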

2. Asynchronous generation

@Service
public class AsyncImageService {

    @Autowired
    private ImageModel imageModel;

    @Async
    public CompletableFuture<String> generateAsync(String prompt) {
        ImageResponse response = imageModel.call(new ImagePrompt(prompt));
        return CompletableFuture.completedFuture(
            response.getResult().getOutput().getUrl()
        );
    }
}
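
When several prompts need images at once, the blocking calls can run on a bounded thread pool so that throughput improves without flooding the provider. The following sketch uses only the JDK. ParallelGeneration is a hypothetical helper, and the generator function stands in for a call such as prompt -> imageModel.call(new ImagePrompt(prompt)).getResult().getOutput().getUrl().

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical helper: runs a blocking generator for many prompts in parallel. */
final class ParallelGeneration {

    private ParallelGeneration() {}

    /** Returns one result per prompt, in the same order as the input list. */
    static List<String> generateAll(List<String> prompts,
                                    Function<String, String> generator,
                                    int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Submit every task first; toList() forces the whole stream to run.
            List<CompletableFuture<String>> futures = prompts.stream()
                .map(p -> CompletableFuture.supplyAsync(() -> generator.apply(p), pool))
                .toList();
            // join() in list order keeps results aligned with prompts.
            // A failed task surfaces here as a CompletionException.
            return futures.stream().map(CompletableFuture::join).toList();
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping parallelism small, for example 2 to 4, makes it easier to stay within the provider's rate limits.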

3. Error handling

@Service
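public class ResilientImageService {

    // SLF4J logger used in generateWithRetry
    private static final Logger log = LoggerFactory.getLogger(ResilientImageService.class);

    @Autowired
    private ImageModel imageModel;

    // ApiException is a placeholder for the exception type your client throws on API errors
    @Retryable(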
        value = {ApiException.class},
        maxAttempts = 3,
        backoff = @Backoff(delay = 1000, multiplier = 2)
    )
    public String generateWithRetry(String prompt) {
        try {
            ImageResponse response = imageModel.call(new ImagePrompt(prompt));
            return response.getResult().getOutput().getUrl();
        } catch (ApiException e) {
            log.error("Image generation failed: {}", e.getMessage());
            throw e;
        }
    }

    @Recover
    public String recoverFromFailure(Exception e, String prompt) {
        return "Image generation temporarily unavailable";
    }
}
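
The retry settings above (three attempts, a 1000 ms initial delay, a multiplier of 2) mean waits of 1000 ms and then 2000 ms between attempts. The sketch below reproduces that schedule in plain Java to make the behavior concrete. BackoffRetry is a hypothetical helper, not how Spring Retry is implemented, and the sleeper parameter exists so the logic can be exercised without real delays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

/** Hypothetical helper: retries an action with exponential backoff. */
final class BackoffRetry {

    private BackoffRetry() {}

    /** Delays between attempts: initial, initial * multiplier, ... (maxAttempts - 1 entries). */
    static List<Long> delays(int maxAttempts, long initialMillis, double multiplier) {
        List<Long> result = new ArrayList<>();
        double delay = initialMillis;
        for (int i = 1; i < maxAttempts; i++) {
            result.add((long) delay);
            delay *= multiplier;
        }
        return result;
    }

    /** Runs action up to maxAttempts times; rethrows the last failure if all attempts fail. */
    static <T> T call(Supplier<T> action, int maxAttempts, long initialMillis,
                      double multiplier, LongConsumer sleeper) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        List<Long> waits = delays(maxAttempts, initialMillis, multiplier);
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < waits.size()) {
                    sleeper.accept(waits.get(attempt)); // no wait after the final attempt
                }
            }
        }
        throw last;
    }
}
```

In production code the sleeper would wrap Thread.sleep, and the catch clause would be narrowed to the exceptions that are worth retrying.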

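Summary

This chapter covered:

Image generation concepts: converting text to images
Supported models: DALL-E, Stable Diffusion, and others
Basic usage: generating images and configuring options
Practical applications: artwork, logo design, batch generation
Prompt engineering: techniques for high-quality prompts
Best practices: caching, asynchronous calls, error handling

Exercises

Create an art image generation service
Implement batch image generation with saving to disk
Design a prompt enhancement system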
