
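Image Generation

Image generation is a major application area of AI: a model creates a picture from a text description. This chapter covers the image generation support in Spring AI.

Overview

A text-to-image model turns a written description into an image: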

┌─────────────────────────────────────────────────────────────┐
│                    Image generation flow                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Text prompt                                                │
│  "An orange cat sitting on a windowsill in sunlight,        │
│   watercolor style"                                         │
│                     │                                       │
│                     ▼                                       │
│  ┌─────────────────────────────────────┐                    │
│  │       Image generation model        │                    │
│  │     (DALL-E / Stable Diffusion)     │                    │
│  └─────────────────────────────────────┘                    │
│                     │                                       │
│                     ▼                                       │
│  Generated image                                            │
│  [PNG/JPEG/WEBP]                                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘

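Supported models

Model             Provider        Characteristics
DALL-E 3          OpenAI          High quality, strong prompt understanding
DALL-E 2          OpenAI          Fast, supports editing
Stable Diffusion  Ollama / local  Open source, customizable
Flux              Ollama          High-quality open-source model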

The ImageModel interface

Spring AI provides a single ImageModel interface that every image provider implements:

@FunctionalInterface
public interface ImageModel extends Model<ImagePrompt, ImageResponse> {

    // Generates one or more images for the given prompt
    ImageResponse call(ImagePrompt request);
}

Configuration

OpenAI DALL-E

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      image:
        options:
          model: dall-e-3        # or dall-e-2
          quality: standard      # standard or hd
          size: 1024x1024        # image size
          style: vivid           # vivid or natural
          response-format: url   # url or b64_json

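Local models with Ollama

# Property names as given in this guide; check them against your Spring AI version
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      image:
        model: stable-diffusion   # or flux

# Pull the model
ollama pull stable-diffusion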

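Basic usage

Simple image generation

import java.util.Base64;

import org.springframework.ai.image.ImageModel;
import org.springframework.ai.image.ImagePrompt;
import org.springframework.ai.image.ImageResponse;
import org.springframework.ai.openai.OpenAiImageOptions;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class ImageGenerationService {

    @Autowired
    private ImageModel imageModel;

    /**
     * Generates an image and returns its URL.
     */
    public String generateImage(String prompt) {
        ImageResponse response = imageModel.call(new ImagePrompt(prompt));

        // With response format "url", the output carries a hosted image URL
        return response.getResult().getOutput().getUrl();
    }

    /**
     * Generates an image and returns its raw bytes.
     */
    public byte[] generateImageBytes(String prompt) {
        // getB64Json() is only populated when b64_json is requested
        OpenAiImageOptions options = OpenAiImageOptions.builder()
                .withResponseFormat("b64_json")
                .build();

        ImageResponse response = imageModel.call(new ImagePrompt(prompt, options));

        // Decode the base64-encoded image data
        String b64 = response.getResult().getOutput().getB64Json();
        return Base64.getDecoder().decode(b64);
    }
}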
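Once the base64 payload has been decoded, it can be useful to confirm which format the bytes actually are before choosing a file extension or a Content-Type. The following is a minimal sketch that inspects the well-known PNG, JPEG, and WebP file signatures. The class `ImageFormatSniffer` and its methods are hypothetical helpers written for this illustration; they are not part of Spring AI.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper (not part of Spring AI): identifies an image format
// from the leading bytes of the decoded data.
final class ImageFormatSniffer {

    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    private ImageFormatSniffer() {}

    /** Returns "png", "jpeg", "webp", or "unknown". */
    static String detect(byte[] data) {
        if (startsWith(data, PNG)) {
            return "png";
        }
        if (startsWith(data, JPEG)) {
            return "jpeg";
        }
        // WebP is a RIFF container: "RIFF", a 4-byte length, then "WEBP"
        if (data.length >= 12
                && "RIFF".equals(ascii(data, 0, 4))
                && "WEBP".equals(ascii(data, 8, 4))) {
            return "webp";
        }
        return "unknown";
    }

    /** Decodes a base64 string and detects the format of the result. */
    static String detectBase64(String b64) {
        return detect(Base64.getDecoder().decode(b64));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    private static String ascii(byte[] data, int offset, int length) {
        return new String(data, offset, length, StandardCharsets.US_ASCII);
    }
}
```

A caller could pass the value returned by `getB64Json()` to `detectBase64` and use the result to pick the file extension instead of assuming PNG.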

Generating an image from a controller endpoint

@GetMapping("/generate")
public String generate(@RequestParam String prompt) {
    return imageModel.call(new ImagePrompt(prompt))
            .getResult()
            .getOutput()
            .getUrl();
}

Image options

OpenAI image options

@GetMapping("/generate-options")
public String generateWithOptions(@RequestParam String prompt) {
    // Method names follow the with* builder style of pre-1.0 Spring AI releases
    ImageOptions options = OpenAiImageOptions.builder()
            .withModel("dall-e-3")
            .withQuality("hd")            // standard or hd
            .withWidth(1024)              // DALL-E 3 sizes: 1024x1024, 1792x1024, 1024x1792
            .withHeight(1024)
            .withStyle("vivid")           // vivid or natural
            .withN(1)                     // number of images; DALL-E 3 accepts only 1
            .withResponseFormat("url")    // url or b64_json
            .build();

    ImagePrompt imagePrompt = new ImagePrompt(prompt, options);
    ImageResponse response = imageModel.call(imagePrompt);

    return response.getResult().getOutput().getUrl();
}

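Option reference

Option          Description        Allowed values
model           Model name         dall-e-2, dall-e-3
quality         Image quality      standard, hd (DALL-E 3)
size            Image dimensions   256x256, 512x512, 1024x1024, 1792x1024, 1024x1792
style           Image style        vivid (vibrant), natural (DALL-E 3)
n               Number of images   1-10 (DALL-E 2); 1 only (DALL-E 3)
responseFormat  Response format    url, b64_json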

Size comparison

// Sizes supported by DALL-E 3
// - 1024x1024 (square)
// - 1792x1024 (landscape)
// - 1024x1792 (portrait)

// Sizes supported by DALL-E 2
// - 256x256
// - 512x512
// - 1024x1024
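The two size lists above can be turned into a small pre-flight check so that an unsupported combination is rejected before a paid API call is made. This is a minimal sketch; the class `ImageSizeRules` and the method `isSupported` are hypothetical names chosen for this illustration and do not exist in Spring AI.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Spring AI): checks a requested size
// against the per-model size lists given in this section.
final class ImageSizeRules {

    private static final Map<String, Set<String>> SUPPORTED = Map.of(
            "dall-e-3", Set.of("1024x1024", "1792x1024", "1024x1792"),
            "dall-e-2", Set.of("256x256", "512x512", "1024x1024"));

    private ImageSizeRules() {}

    /** Returns true when the model is known and lists the given size. */
    static boolean isSupported(String model, String size) {
        // Map.of and Set.of reject null lookups, so guard first
        if (model == null || size == null) {
            return false;
        }
        Set<String> sizes = SUPPORTED.get(model);
        return sizes != null && sizes.contains(size);
    }
}
```

For example, `ImageSizeRules.isSupported("dall-e-3", "512x512")` returns false, because 512x512 appears only in the DALL-E 2 list.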

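Practical examples

1. Image generation service

import java.time.LocalDateTime;
import java.util.List;
import java.util.stream.IntStream;

import org.springframework.ai.image.Image;
import org.springframework.ai.image.ImageModel;
import org.springframework.ai.image.ImageOptions;
import org.springframework.ai.image.ImagePrompt;
import org.springframework.ai.image.ImageResponse;
import org.springframework.ai.openai.OpenAiImageOptions;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class CreativeImageService {

    @Autowired
    private ImageModel imageModel;

    /**
     * Generates an artwork in the given style.
     */
    public GeneratedImage generateArtwork(String subject, String style) {
        String prompt = String.format(
                "A %s-style artwork of %s, high quality, richly detailed",
                style, subject);

        ImageOptions options = OpenAiImageOptions.builder()
                .withModel("dall-e-3")
                .withQuality("hd")
                .withWidth(1024)
                .withHeight(1024)
                .withStyle("vivid")
                .build();

        ImageResponse response = imageModel.call(new ImagePrompt(prompt, options));

        Image result = response.getResult().getOutput();

        return new GeneratedImage(result.getUrl(), prompt, LocalDateTime.now());
    }

    /**
     * Generates several logo concepts.
     */
    public List<String> generateLogoConcepts(String companyType, int count) {
        String prompt = String.format(
                "A clean, modern logo for a %s company, minimalist style, suitable for commercial use",
                companyType);

        // DALL-E 3 returns one image per request, so call it once per concept
        return IntStream.range(0, count)
                .mapToObj(i -> {
                    ImageResponse response = imageModel.call(
                            new ImagePrompt(prompt + ", variation " + (i + 1)));
                    return response.getResult().getOutput().getUrl();
                })
                .toList();
    }

    record GeneratedImage(String url, String prompt, LocalDateTime createdAt) {}
}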

2. Generating image variations

@Service
public class ImageVariationService {

    @Autowired
    private ImageModel imageModel;

    /**
     * Generates variations of a base prompt, one per style modifier.
     * At most styleModifiers.length (5) variations are returned.
     */
    public List<String> generateVariations(String basePrompt, int count) {
        List<String> variations = new ArrayList<>();

        String[] styleModifiers = {
                ", watercolor style",
                ", oil painting style",
                ", digital art style",
                ", sketch style",
                ", cartoon style"
        };

        for (int i = 0; i < Math.min(count, styleModifiers.length); i++) {
            String prompt = basePrompt + styleModifiers[i];

            ImageResponse response = imageModel.call(new ImagePrompt(prompt));

            variations.add(response.getResult().getOutput().getUrl());
        }

        return variations;
    }
}

3. Image editing (DALL-E 2)

@Service
public class ImageEditService {

    /**
     * Edits an image. Requires DALL-E 2; DALL-E 3 does not support editing.
     */
    public String editImage(byte[] imageData, byte[] maskData, String editPrompt) {
        // Spring AI currently focuses on generation, so the OpenAI edit
        // endpoint has to be called directly over HTTP.
        throw new UnsupportedOperationException(
                "DALL-E 3 does not support editing; call the OpenAI edit endpoint with DALL-E 2");
    }
}

4. Batch generation

@Service
public class BatchImageService {

    @Autowired
    private ImageModel imageModel;

    /**
     * Generates one image per prompt on a background thread
     * (requires @EnableAsync on a configuration class).
     */
    @Async
    public CompletableFuture<List<GeneratedImage>> batchGenerate(List<String> prompts) {
        List<GeneratedImage> results = prompts.stream()
                .map(prompt -> {
                    ImageResponse response = imageModel.call(new ImagePrompt(prompt));

                    Image result = response.getResult().getOutput();

                    return new GeneratedImage(result.getUrl(), prompt, LocalDateTime.now());
                })
                .toList();

        return CompletableFuture.completedFuture(results);
    }

    /**
     * Generates one image per prompt and saves each one to a file.
     */
    public List<Path> generateAndSave(List<String> prompts,
                                      String outputDir) throws IOException {
        Path dir = Paths.get(outputDir);
        Files.createDirectories(dir);

        // Request base64 data so the bytes can be written without a second download
        ImageOptions options = OpenAiImageOptions.builder()
                .withResponseFormat("b64_json")
                .build();

        List<Path> savedPaths = new ArrayList<>();

        for (int i = 0; i < prompts.size(); i++) {
            ImageResponse response = imageModel.call(
                    new ImagePrompt(prompts.get(i), options));

            String b64 = response.getResult().getOutput().getB64Json();
            byte[] imageData = Base64.getDecoder().decode(b64);

            Path imagePath = dir.resolve("image_" + i + ".png");
            Files.write(imagePath, imageData);
            savedPaths.add(imagePath);
        }

        return savedPaths;
    }

    record GeneratedImage(String url, String prompt, LocalDateTime createdAt) {}
}
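The batch methods above process prompts one after another, so total latency grows linearly with the number of prompts. One possible alternative is to run a bounded number of requests in parallel. The sketch below uses only the JDK and takes the generator as a `Function<String, String>`, so it stays independent of any provider. `BoundedBatchRunner` and `runAll` are hypothetical names for this illustration; in a real service the function would wrap a call such as `imageModel.call(...)`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical helper: runs a generator function over many prompts with a
// fixed upper bound on concurrent calls.
final class BoundedBatchRunner {

    private BoundedBatchRunner() {}

    /**
     * Applies generator to every prompt using at most maxConcurrent threads.
     * Results are returned in the same order as the prompts. A failed call
     * surfaces as a CompletionException from join().
     */
    static List<String> runAll(List<String> prompts,
                               Function<String, String> generator,
                               int maxConcurrent) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            // Submit everything first so the pool can work in parallel
            List<CompletableFuture<String>> futures = prompts.stream()
                    .map(p -> CompletableFuture.supplyAsync(() -> generator.apply(p), pool))
                    .toList();

            // Then wait for each result in submission order
            return futures.stream()
                    .map(CompletableFuture::join)
                    .toList();
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping `maxConcurrent` small matters here, because image endpoints are rate limited and every call is billed.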

5. Image generation API

@RestController
@RequestMapping("/api/images")
public class ImageController {

    @Autowired
    private ImageModel imageModel;

    /**
     * Simple generation.
     */
    @PostMapping("/generate")
    public ImageResponse generateImage(@RequestBody ImageRequest request) {
        int[] dims = parseSize(request.size());

        ImageOptions options = OpenAiImageOptions.builder()
                .withModel(request.model() != null ? request.model() : "dall-e-3")
                .withQuality(request.quality() != null ? request.quality() : "standard")
                .withWidth(dims[0])
                .withHeight(dims[1])
                .withStyle(request.style() != null ? request.style() : "vivid")
                .build();

        return imageModel.call(new ImagePrompt(request.prompt(), options));
    }

    /**
     * Generates an image and returns it as a file download.
     */
    @PostMapping("/generate-download")
    public ResponseEntity<byte[]> generateAndDownload(
            @RequestBody ImageRequest request) {

        ImageOptions options = OpenAiImageOptions.builder()
                .withResponseFormat("b64_json")
                .build();

        ImageResponse response = imageModel.call(
                new ImagePrompt(request.prompt(), options));

        String b64 = response.getResult().getOutput().getB64Json();
        byte[] imageData = Base64.getDecoder().decode(b64);

        return ResponseEntity.ok()
                .header(HttpHeaders.CONTENT_DISPOSITION,
                        "attachment; filename=generated-image.png")
                .contentType(MediaType.IMAGE_PNG)
                .body(imageData);
    }

    /**
     * Batch generation.
     */
    @PostMapping("/batch")
    public List<ImageResult> batchGenerate(
            @RequestBody List<String> prompts) {

        return prompts.stream()
                .map(prompt -> {
                    ImageResponse response = imageModel.call(new ImagePrompt(prompt));

                    return new ImageResult(
                            prompt,
                            response.getResult().getOutput().getUrl());
                })
                .toList();
    }

    // Parses "WIDTHxHEIGHT"; falls back to 1024x1024 when no size is given
    private static int[] parseSize(String size) {
        String[] parts = (size != null ? size : "1024x1024").split("x");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }
}

record ImageRequest(
        String prompt,
        String model,
        String quality,
        String size,
        String style
) {}

record ImageResult(String prompt, String imageUrl) {}

Prompt engineering

Techniques for high-quality prompts

@Service
public class PromptEnhancementService {

    /**
     * Enhances a user prompt with style and quality modifiers.
     */
    public String enhancePrompt(String userPrompt, ImageStyle style) {
        StringBuilder enhanced = new StringBuilder(userPrompt);

        // Append the style modifier
        enhanced.append(", ").append(style.getDescription());

        // Append the quality modifier
        enhanced.append(", high quality, detailed");

        return enhanced.toString();
    }

    /**
     * Builds a prompt from a structured scene description.
     */
    public String generateScenePrompt(SceneRequest request) {
        return String.format(
                "%s, %s lighting, %s atmosphere, %s composition, %s style, %s quality",
                request.subject(),
                request.lighting(),
                request.mood(),
                request.composition(),
                request.style(),
                request.quality());
    }
}

enum ImageStyle {
    PHOTOGRAPH("professional photography, realistic"),
    DIGITAL_ART("digital art, vibrant colors"),
    OIL_PAINTING("oil painting, classical art style"),
    WATERCOLOR("watercolor painting, soft colors"),
    SKETCH("pencil sketch, hand-drawn"),
    ANIME("anime style, manga art"),
    MINIMALIST("minimalist design, clean lines"),
    VINTAGE("vintage poster, retro style");

    private final String description;

    ImageStyle(String description) {
        this.description = description;
    }

    public String getDescription() {
        return description;
    }
}

record SceneRequest(
        String subject,
        String lighting,
        String mood,
        String composition,
        String style,
        String quality
) {}
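Appending style and quality modifiers makes prompts longer, and the image API rejects prompts over a model-specific length. The sketch below trims an enhanced prompt to fit. It assumes limits of 1000 characters for dall-e-2 and 4000 for dall-e-3, as listed in OpenAI's API reference at the time of writing; both the limits and the `PromptLengthGuard` class are assumptions of this example, so verify them against the current documentation.

```java
// Hypothetical helper: keeps a prompt within an assumed per-model length limit.
final class PromptLengthGuard {

    private PromptLengthGuard() {}

    /** Assumed limits: 1000 characters for dall-e-2, 4000 otherwise. */
    static int maxLength(String model) {
        return "dall-e-2".equals(model) ? 1000 : 4000;
    }

    /**
     * Returns the prompt unchanged when it fits; otherwise cuts it at the
     * last space at or before the limit so that no word is split. Falls
     * back to a hard cut when the text contains no usable space.
     */
    static String fit(String prompt, String model) {
        int max = maxLength(model);
        if (prompt.length() <= max) {
            return prompt;
        }
        int cut = prompt.lastIndexOf(' ', max);
        return prompt.substring(0, cut > 0 ? cut : max).trim();
    }
}
```

Calling `fit` on the result of `enhancePrompt` before building the `ImagePrompt` keeps the appended modifiers from pushing a long user prompt over the limit, at the cost of dropping trailing text.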

Costs and limits

OpenAI DALL-E pricing

Model     Quality   Size       Price per image
DALL-E 3  Standard  1024×1024  $0.040
DALL-E 3  Standard  1024×1792  $0.080
DALL-E 3  HD        1024×1024  $0.080
DALL-E 3  HD        1024×1792  $0.120
DALL-E 2  Standard  1024×1024  $0.020
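The pricing table above is enough to estimate the cost of a batch before running it. The following minimal sketch encodes exactly those five rows and nothing else. `ImageCostEstimator` is a hypothetical helper for this illustration, and the prices are the ones listed in this section, which may differ from current OpenAI pricing.

```java
import java.math.BigDecimal;
import java.util.Map;

// Hypothetical helper: estimates batch cost from the price table in this section.
final class ImageCostEstimator {

    // Key format: model|quality|size, prices in US dollars per image
    private static final Map<String, BigDecimal> PRICE_PER_IMAGE = Map.of(
            "dall-e-3|standard|1024x1024", new BigDecimal("0.040"),
            "dall-e-3|standard|1024x1792", new BigDecimal("0.080"),
            "dall-e-3|hd|1024x1024", new BigDecimal("0.080"),
            "dall-e-3|hd|1024x1792", new BigDecimal("0.120"),
            "dall-e-2|standard|1024x1024", new BigDecimal("0.020"));

    private ImageCostEstimator() {}

    /** Returns the price per image times count, or throws if the row is not listed. */
    static BigDecimal estimate(String model, String quality, String size, int count) {
        String key = model + "|" + quality + "|" + size;
        BigDecimal unit = PRICE_PER_IMAGE.get(key);
        if (unit == null) {
            throw new IllegalArgumentException("No price listed for " + key);
        }
        return unit.multiply(BigDecimal.valueOf(count));
    }
}
```

For instance, 25 HD images at 1024×1024 come to 25 × $0.080 = $2.00, and the same 25 images at standard quality cost half that.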

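Rate limits

import com.google.common.util.concurrent.RateLimiter;

@Service
public class RateLimitedImageService {

    @Autowired
    private ImageModel imageModel;

    // Guava's RateLimiter takes permits per second, so 5 images per minute is 5/60
    private final RateLimiter rateLimiter = RateLimiter.create(5.0 / 60.0);

    public String generateWithRateLimit(String prompt) {
        rateLimiter.acquire(); // blocks until a permit is available
        return imageModel.call(new ImagePrompt(prompt))
                .getResult()
                .getOutput()
                .getUrl();
    }
}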

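Best practices

1. Cache generated results

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.apache.commons.codec.digest.DigestUtils;

@Service
public class CachedImageService {

    @Autowired
    private ImageModel imageModel;

    // OpenAI-hosted image URLs expire about an hour after generation, so cached
    // URLs must expire sooner; store the image bytes yourself for longer retention.
    private final Cache<String, String> imageCache =
            Caffeine.newBuilder()
                    .maximumSize(1000)
                    .expireAfterWrite(Duration.ofMinutes(50))
                    .build();

    public String generateImage(String prompt) {
        String cacheKey = DigestUtils.md5Hex(prompt);

        // Calls the model only when the prompt is not already cached
        return imageCache.get(cacheKey, key -> {
            ImageResponse response = imageModel.call(new ImagePrompt(prompt));
            return response.getResult().getOutput().getUrl();
        });
    }
}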

2. Asynchronous generation

@Service
public class AsyncImageService {

    @Autowired
    private ImageModel imageModel;

    // Runs on Spring's task executor; requires @EnableAsync on a configuration class
    @Async
    public CompletableFuture<String> generateAsync(String prompt) {
        ImageResponse response = imageModel.call(new ImagePrompt(prompt));
        return CompletableFuture.completedFuture(
                response.getResult().getOutput().getUrl());
    }
}

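3. Error handling

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.retry.TransientAiException;
import org.springframework.retry.annotation.Backoff;
import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;

@Service
public class ResilientImageService {

    private static final Logger log = LoggerFactory.getLogger(ResilientImageService.class);

    @Autowired
    private ImageModel imageModel;

    // Requires spring-retry and @EnableRetry on a configuration class.
    // TransientAiException is Spring AI's exception type for retryable failures.
    @Retryable(
            value = {TransientAiException.class},
            maxAttempts = 3,
            backoff = @Backoff(delay = 1000, multiplier = 2)
    )
    public String generateWithRetry(String prompt) {
        try {
            ImageResponse response = imageModel.call(new ImagePrompt(prompt));
            return response.getResult().getOutput().getUrl();
        } catch (TransientAiException e) {
            log.error("Image generation failed: {}", e.getMessage());
            throw e;
        }
    }

    // Called after the final failed attempt
    @Recover
    public String recoverFromFailure(TransientAiException e, String prompt) {
        return "Image generation temporarily unavailable";
    }
}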
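To make the retry settings above concrete: with three attempts, an initial delay of 1000 ms, and a multiplier of 2, a call that keeps failing waits 1000 ms after the first failure and 2000 ms after the second, then gives up. The sketch below reproduces that schedule with the JDK only. `BackoffRetry` is a hypothetical helper for illustration, not a replacement for Spring Retry, and it retries on any `RuntimeException` for simplicity.

```java
import java.util.function.Supplier;

// Hypothetical helper: a plain-Java retry loop with exponential backoff.
final class BackoffRetry {

    private BackoffRetry() {}

    /** Delay in milliseconds before the retry that follows the given failed attempt (1-based). */
    static long delayMillis(int failedAttempt, long initialDelayMillis, double multiplier) {
        return (long) (initialDelayMillis * Math.pow(multiplier, failedAttempt - 1));
    }

    /** Runs action up to maxAttempts times and rethrows the last failure. */
    static <T> T call(Supplier<T> action, int maxAttempts,
                      long initialDelayMillis, double multiplier) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no sleep after the final attempt
                }
                try {
                    Thread.sleep(delayMillis(attempt, initialDelayMillis, multiplier));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted during backoff", ie);
                }
            }
        }
        throw last;
    }
}
```

A production version would usually retry only on errors known to be transient, such as rate-limit or server errors, since retrying a rejected prompt only adds cost and delay.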

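Summary

In this chapter we covered:

  1. Image generation concepts: converting text to images
  2. Supported models: DALL-E, Stable Diffusion, and others
  3. Basic usage: generating images and configuring options
  4. Practical applications: artwork, logo design, batch generation
  5. Prompt engineering: techniques for high-quality prompts
  6. Best practices: caching, asynchronous calls, error handling

Exercises

  1. Build an artwork generation service
  2. Implement batch image generation with saving to disk
  3. Design a prompt enhancement system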

References