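State Management

Terraform uses state to track the infrastructure resources it manages. Understanding how state works is essential to using Terraform correctly. This chapter covers what state is, where it can be stored, and best practices for managing it.

What Is Terraform State

Terraform state is a JSON file that records every resource Terraform manages, together with the last known values of its attributes.

What the State File Does

  1. Resource mapping: maps resource addresses in the configuration to the IDs of real infrastructure objects
  2. Metadata storage: stores metadata such as resource dependencies and provider configuration
  3. Performance: caches resource attributes so that Terraform does not have to query provider APIs as often
  4. Change detection: lets Terraform compare the recorded state with the desired configuration to determine which changes to make

Local State File

By default, Terraform stores state in a local file named terraform.tfstate: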

{
  "version": 4,
  "terraform_version": "1.6.0",
  "serial": 15,
  "lineage": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "outputs": {
    "instance_ip": {
      "value": "54.123.45.67",
      "type": "string"
    }
  },
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "schema_version": 1,
          "attributes": {
            "ami": "ami-0c55b159cbfafe1f0",
            "instance_type": "t2.micro",
            "id": "i-0123456789abcdef0",
            "public_ip": "54.123.45.67",
            "tags": {
              "Name": "WebServer"
            }
          }
        }
      ]
    }
  ]
}

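Important: state files can contain sensitive information such as passwords and keys, so they should not be committed to version control.

Remote State Storage

When working in a team, use remote state storage so that everyone works against the same state.

S3 Backend (Recommended)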

terraform {
  backend "s3" {
    # S3 bucket settings
    bucket = "my-terraform-state-bucket"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"

    # Enable server-side encryption of the state object
    encrypt = true

    # DynamoDB table used for state locking
    dynamodb_table = "terraform-state-lock"

    # Optional: encrypt with a KMS key
    kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/my-key"
  }
}

Creating the S3 Backend Infrastructure

# backend-setup.tf
# Creates the infrastructure that stores the state

# S3 bucket
resource "aws_s3_bucket" "terraform_state" {
  bucket = "my-terraform-state-bucket"
}

# Enable versioning
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Enable encryption
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# Block public access
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# DynamoDB table used for state locking
resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

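Azure Blob Storage Backend

terraform {
  backend "azurerm" {
    resource_group_name  = "terraform-state-rg"
    storage_account_name = "terraformstate"
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"

    # Authenticate with a managed identity; this setting does not control encryption
    # (Azure Storage encrypts data at rest by default)
    use_msi = true
  }
}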

GCS Backend (Google Cloud)

terraform {
  backend "gcs" {
    bucket = "my-terraform-state-bucket"
    prefix = "prod"

    # Optional: customer-supplied encryption key (32 bytes, base64-encoded)
    encryption_key = "base64-encoded-key"
  }
}

Terraform Cloud Backend

terraform {
  cloud {
    organization = "my-organization"

    workspaces {
      name = "my-workspace"
    }
  }
}

HTTP Backend

terraform {
  backend "http" {
    address        = "https://my-terraform-backend.example.com/state"
    lock_address   = "https://my-terraform-backend.example.com/state/lock"
    unlock_address = "https://my-terraform-backend.example.com/state/unlock"
  }
}

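State Locking

State locking prevents multiple users or processes from modifying state at the same time, which protects it from corruption.

Locking Mechanism

When you run terraform apply:

  1. Terraform tries to acquire the state lock
  2. If the lock is already held, the operation fails by default, or keeps retrying for the duration given by -lock-timeout
  3. When the operation finishes, the lock is released

S3 + DynamoDB Locking Example

When dynamodb_table is set, Terraform acquires and releases the lock through the DynamoDB table automatically:

terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}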
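
Newer Terraform releases can also lock state in the S3 bucket itself, without a DynamoDB table. The following is a minimal sketch that assumes Terraform 1.10 or later; the bucket and key are the same placeholders used above, and the version requirement is an assumption to check against the release you actually run.

```hcl
terraform {
  backend "s3" {
    bucket  = "my-terraform-state-bucket"
    key     = "prod/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # Assumption: Terraform 1.10 or later.
    # Locks by writing a lock object next to the state object in the same bucket,
    # so no dynamodb_table is needed.
    use_lockfile = true
  }
}
```

With this setting, the role running Terraform also needs permission to create and delete the lock object in the bucket.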

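Force Unlock

If a lock is left behind (for example, after a process crashes), you can release it by force:

# Release the stale lock; the lock ID is shown in the lock error message.
# -force skips the interactive confirmation.
terraform force-unlock -force <LOCK_ID>

Warning: force-unlocking can corrupt state. Use it only after confirming that no other process is still running.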

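State Commands

terraform state list

Lists all managed resources:

# List all resources
terraform state list

# Example output:
# aws_instance.web
# aws_security_group.web
# aws_vpc.main

# Filter by resource address (wildcards are not supported)
terraform state list aws_instance.web

# Filter by the ID of the real infrastructure object
terraform state list -id=i-0123456789abcdef0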

terraform state show

Shows the detailed state of a single resource:

terraform state show aws_instance.web

# Example output:
# resource "aws_instance" "web" {
#     ami           = "ami-0c55b159cbfafe1f0"
#     instance_type = "t2.micro"
#     id            = "i-0123456789abcdef0"
#     public_ip     = "54.123.45.67"
#     tags          = {
#         "Name" = "WebServer"
#     }
# }

terraform state pull

Prints the raw JSON of the current state:

terraform state pull > current-state.json

terraform state push

Uploads a state file manually (use with caution):

terraform state push current-state.json

terraform state rm

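Removes resources from state without destroying the real infrastructure:

# Remove a single resource
terraform state rm aws_instance.web

# Remove one instance of a resource that uses count
terraform state rm 'aws_instance.web[0]'

# Remove several resources in one command
terraform state rm aws_instance.web aws_security_group.web

Use cases:

  • Moving a resource to another Terraform configuration
  • No longer managing a specific resource with Terraform
  • Repairing corrupted state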
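
The same "forget without destroying" operation can also be declared in configuration, so that it goes through the normal plan and apply review. This is a minimal sketch that assumes Terraform 1.7 or later, where removed blocks are available; the resource address is the one used in the commands above.

```hcl
# Assumption: Terraform 1.7 or later.
# Delete the original resource "aws_instance" "web" block from the configuration,
# then add this block in its place.
removed {
  from = aws_instance.web

  lifecycle {
    # Drop the resource from state but leave the real instance running
    destroy = false
  }
}
```

After one successful apply, the removed block has done its job and can be deleted from the configuration.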

terraform state mv

Moves or renames resources in state:

# Rename a resource
terraform state mv aws_instance.web aws_instance.frontend

# Move a resource from one module to another
terraform state mv module.vpc.aws_subnet.public module.network.aws_subnet.public
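
A rename like the first command above can also be recorded in configuration instead of being run by hand. This is a minimal sketch that assumes Terraform 1.1 or later, where moved blocks are available, and that the resource block itself has already been renamed to aws_instance.frontend.

```hcl
# Assumption: Terraform 1.1 or later.
# Tells Terraform that the object tracked as aws_instance.web is now
# aws_instance.frontend, so the next plan shows a move rather than a
# destroy followed by a create.
moved {
  from = aws_instance.web
  to   = aws_instance.frontend
}
```

Because the block lives in version control, everyone who applies the configuration gets the same move without having to run a state command.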

terraform state replace-provider

Replaces the provider recorded for resources in state:

terraform state replace-provider \
  registry.terraform.io/-/aws \
  registry.terraform.io/hashicorp/aws

terraform import

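Brings existing resources under Terraform management:

# Import an AWS EC2 instance
terraform import aws_instance.web i-0123456789abcdef0

# Import an AWS VPC
terraform import aws_vpc.main vpc-0123456789abcdef0

# Import one instance of a resource that uses for_each
terraform import 'aws_instance.web["a"]' i-0123456789abcdef0

Import steps:

  1. Define the resource in the configuration (it does not need every attribute yet)
  2. Run terraform import to import the resource
  3. Run terraform plan to check whether the configuration matches the imported resource
  4. Adjust the configuration until the plan shows no changes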
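
The steps above can also be expressed declaratively. This is a minimal sketch that assumes Terraform 1.5 or later, where import blocks are available; the instance ID and attribute values are the placeholders used elsewhere in this chapter.

```hcl
# Assumption: Terraform 1.5 or later.
# Declares that the existing instance should be imported as aws_instance.web
# during the next terraform apply.
import {
  to = aws_instance.web
  id = "i-0123456789abcdef0"
}

# The target resource must be defined; adjust the attributes until
# terraform plan reports the import and no other changes.
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}
```

Unlike the terraform import command, the import appears in the plan output, so it can be reviewed before state is modified.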

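State Workspaces

Workspaces let you manage multiple independent states with the same configuration.

Workspace Commands

# List workspaces
terraform workspace list

# Create a new workspace
terraform workspace new dev

# Switch to a workspace
terraform workspace select dev

# Show the current workspace
terraform workspace show

# Delete a workspace (you cannot delete the one that is currently selected)
terraform workspace delete dev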

Workspace Use Cases

# Choose settings based on the current workspace
locals {
  environment = terraform.workspace

  config = {
    dev = {
      instance_type  = "t2.micro"
      instance_count = 1
    }
    staging = {
      instance_type  = "t2.small"
      instance_count = 2
    }
    prod = {
      instance_type  = "t2.medium"
      instance_count = 3
    }
  }
}

resource "aws_instance" "web" {
  # This lookup fails in a workspace with no entry in local.config, such as "default"
  count = local.config[local.environment].instance_count

  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = local.config[local.environment].instance_type

  tags = {
    Name = "web-${local.environment}-${count.index + 1}"
  }
}
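
The per-workspace map has no entry for the default workspace, so indexing it there produces an error. One possible guard, shown as a sketch rather than the chapter's own approach, is to fall back to a known entry with try. The shortened map and the names selected and instance_type are illustrative, and falling back to dev is an assumption; failing loudly may be the better choice for production use.

```hcl
locals {
  # Shortened, illustrative version of the per-workspace settings
  config = {
    dev  = { instance_type = "t2.micro", instance_count = 1 }
    prod = { instance_type = "t2.medium", instance_count = 3 }
  }

  # Use the entry for the current workspace, or the "dev" settings when the
  # workspace (for example "default") has no entry of its own.
  selected = try(local.config[terraform.workspace], local.config["dev"])
}

output "instance_type" {
  value = local.selected.instance_type
}
```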

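Workspaces vs. Directory Structure

Approach             Best for                                 Pros                              Cons
Workspaces           Small differences between environments   Configuration reuse, simplicity   Access control is difficult
Directory structure  Large differences between environments   Permission isolation, clarity     Duplicated configuration

Recommended project layout (directory-structure approach):

terraform-project/
├── modules/
│   └── vpc/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── backend.tf
│   ├── staging/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── backend.tf
│   └── prod/
│       ├── main.tf
│       ├── variables.tf
│       └── backend.tf
└── global/
    └── iam/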
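State Security

Handling Sensitive Data

State files can contain sensitive information. Marking outputs and variables as sensitive hides their values in CLI output, but the values are still stored in plaintext in the state file:

# Mark an output as sensitive
output "db_password" {
  value     = aws_db_instance.main.password
  sensitive = true
}

# Declare a sensitive variable
variable "api_key" {
  type      = string
  sensitive = true
}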

State Encryption

S3 Backend Encryption

terraform {
  backend "s3" {
    bucket     = "my-terraform-state"
    key        = "terraform.tfstate"
    region     = "us-east-1"
    encrypt    = true
    kms_key_id = "arn:aws:kms:us-east-1:123456789012:alias/terraform"
  }
}

Encryption with Terraform Cloud (state is stored remotely and encrypted at rest, instead of being kept on local disk):

terraform {
  cloud {
    organization = "my-org"
    workspaces {
      name = "my-workspace"
    }
  }
}

State Backup

With S3 versioning enabled, every write of the state is kept as a separate object version that can serve as a backup:

resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

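Access Control

S3 Bucket Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/TerraformRole"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-terraform-state-bucket"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/TerraformRole"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-terraform-state-bucket/*"
    }
  ]
}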

State Best Practices

1. Use Remote State

Always use remote state storage (S3, Terraform Cloud, and so on) rather than local state:

terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "project/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

2. Separate State

Split state by environment or project so that no single state file grows too large:

# Each environment uses its own state file
environments/
├── dev/
│   └── backend.tf      # key = "dev/terraform.tfstate"
├── staging/
│   └── backend.tf      # key = "staging/terraform.tfstate"
└── prod/
    └── backend.tf      # key = "prod/terraform.tfstate"

3. State Locking

Always enable state locking to prevent concurrent modification:

terraform {
  backend "s3" {
    # ...
    dynamodb_table = "terraform-state-lock"
  }
}

4. Versioning

Enable S3 versioning to keep a history of the state:

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

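5. Handling Sensitive Information

  • Mark sensitive variables and outputs with sensitive = true
  • Encrypt remote state
  • Audit access logs for the state file regularly

6. State File Management

# Clean up old versions periodically (keep the 30 most recent); the rules are defined in lifecycle.json
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-terraform-state \
  --lifecycle-configuration file://lifecycle.json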
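
The contents of lifecycle.json are not shown in this chapter. As one possible equivalent, the same retention rule can be managed in Terraform with the AWS provider's aws_s3_bucket_lifecycle_configuration resource. This is a sketch: the rule ID and the 90-day threshold are assumed values, and the rule only has an effect when versioning is enabled on the bucket.

```hcl
resource "aws_s3_bucket_lifecycle_configuration" "terraform_state" {
  bucket = "my-terraform-state"

  rule {
    id     = "expire-old-state-versions" # hypothetical rule name
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket
    filter {}

    noncurrent_version_expiration {
      # Always retain the 30 most recent noncurrent versions
      newer_noncurrent_versions = 30

      # Versions beyond those 30 are deleted once they have been
      # noncurrent for 90 days (assumed value)
      noncurrent_days = 90
    }
  }
}
```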

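Troubleshooting

Recovering from State Corruption

  1. Restore from backup

    # List the stored versions of the state object (requires bucket versioning)
    aws s3api list-object-versions --bucket my-bucket --prefix terraform.tfstate
    # Download the version you want to restore
    aws s3api get-object --bucket my-bucket --key terraform.tfstate --version-id <VERSION_ID> terraform.tfstate
    # Upload it; the push is rejected if its serial is lower than the remote serial, unless -force is given
    terraform state push terraform.tfstate
  2. Manual repair

    # Export the state
    terraform state pull > state.json
    # Edit state.json (and increment its "serial" value)
    terraform state push state.json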

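Lock Problems

# Release a stale lock (-force skips the confirmation prompt)
terraform force-unlock -force <LOCK_ID>

# Delete the lock item manually in DynamoDB.
# The lock item's LockID is "<bucket>/<key>"; the item ending in "-md5" stores the state checksum, not the lock.
aws dynamodb delete-item \
  --table-name terraform-state-lock \
  --key '{"LockID": {"S": "my-bucket/terraform.tfstate"}}'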

Next Steps

With state management covered, the next chapter turns to modules: how to create and use reusable Terraform configuration.