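Provisioners

Provisioners let you perform additional actions after a resource is created or before it is destroyed, such as uploading files, running scripts, or configuring services. However, provisioners are a "last resort" feature in Terraform and should be used with caution.

Overview

What is a provisioner

A provisioner is a block that runs at a specific stage of a resource's lifecycle to perform actions that a provider cannot carry out directly. Typical uses include:

  • Running configuration scripts on an instance
  • Uploading configuration files to a remote server
  • Running commands locally
  • Performing cleanup before a resource is destroyed

Why provisioners are a "last resort"

The official documentation explicitly describes provisioners as a last resort, for the following reasons:

  1. Unpredictability: Terraform cannot model what a provisioner does, so its effects never show up in a plan
  2. State issues: a failed provisioner can leave the resource in an inconsistent state
  3. Security: provisioners need direct network access and credentials, which increases security risk
  4. Maintainability: provisioner logic is hard to debug and hard to manage under version control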

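Recommended alternatives

Before reaching for a provisioner, consider these options first:

| Scenario | Recommended approach |
| --- | --- |
| Instance initialization | Use user_data / cloud-init |
| Pre-installed software | Build a custom image with Packer |
| Configuration management | Use tools such as Ansible, Chef, or Puppet |
| File distribution | Use object storage plus a startup script that downloads the files |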
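
As a rough sketch of the "object storage plus startup script" approach, the following assumes an S3 bucket that already exists and an IAM instance profile that can read from it. The bucket name, object key, and `var.instance_profile_name` are hypothetical placeholders, and the AWS CLI is assumed to be present on the image.

```hcl
# Hypothetical variable: an existing instance profile with s3:GetObject on the bucket.
variable "instance_profile_name" {
  type = string
}

# Publish the configuration file to object storage instead of pushing it over SSH.
resource "aws_s3_object" "app_conf" {
  bucket = "my-config-bucket" # assumed to exist already
  key    = "web/app.conf"
  source = "${path.module}/files/app.conf"
  etag   = filemd5("${path.module}/files/app.conf")
}

resource "aws_instance" "web" {
  ami                  = "ami-0c55b159cbfafe1f0"
  instance_type        = "t2.micro"
  iam_instance_profile = var.instance_profile_name

  # The startup script pulls the file at boot, so no provisioner or inbound SSH is needed.
  user_data = <<-EOF
    #!/bin/bash
    aws s3 cp s3://${aws_s3_object.app_conf.bucket}/${aws_s3_object.app_conf.key} /etc/app.conf
  EOF
}
```

Because the instance fetches the file itself, Terraform never needs credentials for, or network access to, the instance.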

Provisioner types

local-exec

Runs a command on the local machine where Terraform is running.

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  # Run a command locally after the instance is created
  provisioner "local-exec" {
    command = "echo Instance ${self.public_ip} created >> instances.log"
  }
}

Common scenarios

# Call an API
provisioner "local-exec" {
  command = "curl -X POST https://api.example.com/notify -d 'instance=${self.public_ip}'"
}

# Run Ansible
provisioner "local-exec" {
  command = "ansible-playbook -i ${self.public_ip}, playbook.yml"
}

# Run a local script
provisioner "local-exec" {
  command = "${path.module}/scripts/setup.sh ${self.public_ip}"
}

Environment variables

provisioner "local-exec" {
  command = "echo $ENV_VAR $INSTANCE_IP"

  environment = {
    ENV_VAR     = "value"
    INSTANCE_IP = self.public_ip
  }
}

# Running on Windows
provisioner "local-exec" {
  command     = "echo %INSTANCE_IP%"
  when        = create
  working_dir = "C:/scripts"

  environment = {
    INSTANCE_IP = self.public_ip
  }
}
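
Besides `command`, `working_dir`, and `environment`, local-exec also accepts `interpreter` and `quiet`. The sketch below is illustrative only: it assumes bash is installed on the machine running Terraform, and the log file name is arbitrary.

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  provisioner "local-exec" {
    # Run the command through bash rather than the platform default shell
    interpreter = ["/bin/bash", "-c"]
    command     = "set -euo pipefail; echo \"Created $INSTANCE_IP\" | tee -a instances.log"

    # Do not echo the command line itself in Terraform's output;
    # the output produced by the command is still printed
    quiet = true

    environment = {
      INSTANCE_IP = self.public_ip
    }
  }
}
```

Setting the interpreter explicitly is mostly useful when the command relies on shell features that a plain POSIX shell does not provide.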

remote-exec

Runs commands on the remote resource over SSH or WinRM.

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
  key_name      = "my-key"

  # Connection settings
  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("~/.ssh/my-key.pem")
    host        = self.public_ip
  }

  # Run commands on the remote instance
  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo yum install -y nginx",
      "sudo systemctl start nginx",
    ]
  }
}

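Inline commands and scripts

# Option 1: inline commands
provisioner "remote-exec" {
  inline = [
    "mkdir -p /opt/myapp",
    "echo 'Hello World' > /opt/myapp/config.txt",
  ]
}

# Option 2: run a local script file on the remote host
provisioner "remote-exec" {
  script = "${path.module}/scripts/setup.sh"
}

# Option 3: a script with arguments
# "script" cannot pass arguments, so upload the file first and then call it inline
provisioner "file" {
  source      = "${path.module}/scripts/setup.sh"
  destination = "/tmp/setup.sh"
}

provisioner "remote-exec" {
  inline = [
    "chmod +x /tmp/setup.sh",
    "/tmp/setup.sh ${var.environment} ${var.region}",
  ]
}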
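
remote-exec takes exactly one of `inline`, `script`, or `scripts`. As a small sketch of the third form, the block below runs several local script files on the remote host in the order listed; the script file names are hypothetical.

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
  key_name      = "my-key"

  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("~/.ssh/my-key.pem")
    host        = self.public_ip
  }

  # Each script is copied to the remote host and executed in list order
  provisioner "remote-exec" {
    scripts = [
      "${path.module}/scripts/install.sh",
      "${path.module}/scripts/configure.sh",
    ]
  }
}
```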

file

Copies files or directories from the local machine to the remote resource.

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
  key_name      = "my-key"

  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("~/.ssh/my-key.pem")
    host        = self.public_ip
  }

  # Copy a single file
  provisioner "file" {
    source      = "${path.module}/files/app.conf"
    destination = "/tmp/app.conf"
  }

  # Copy a directory; the trailing slash on source uploads its contents into destination
  # (over SSH, the destination directory is expected to exist already)
  provisioner "file" {
    source      = "${path.module}/files/config/"
    destination = "/tmp/config/"
  }

  # Create a file from inline content
  provisioner "file" {
    content     = "server {\n listen 80;\n}"
    destination = "/tmp/nginx.conf"
  }
}

Full example: deploying a web application

resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"
  key_name      = var.key_name

  vpc_security_group_ids = [aws_security_group.web.id]
  subnet_id              = aws_subnet.public.id

  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file(var.private_key_path)
    host        = self.public_ip
  }

  # Preparation: create the target directories
  # (over SSH, the file provisioner expects the destination directory to exist)
  provisioner "remote-exec" {
    inline = ["mkdir -p /tmp/app/code"]
  }

  # Step 1: upload configuration files
  provisioner "file" {
    source      = "${path.module}/files/"
    destination = "/tmp/app"
  }

  # Step 2: upload application code
  provisioner "file" {
    source      = "${path.module}/app/"
    destination = "/tmp/app/code"
  }

  # Step 3: run the deployment script
  provisioner "remote-exec" {
    inline = [
      "chmod +x /tmp/app/deploy.sh",
      "sudo /tmp/app/deploy.sh ${var.environment}",
    ]
  }

  # Step 4: record the deployment locally
  provisioner "local-exec" {
    command = "echo 'Deployed to ${self.public_ip}' >> deploy.log"
  }
}

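Connection settings

The connection block defines how Terraform connects to the remote resource.

SSH connections

connection {
  type        = "ssh"
  user        = "ec2-user"             # SSH user name
  password    = var.password           # password authentication (optional)
  private_key = file("~/.ssh/key.pem") # key authentication (recommended)
  host        = self.public_ip         # target host
  port        = 22                     # SSH port (default 22)
  timeout     = "5m"                   # connection timeout
}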
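
If reading the private key into Terraform is undesirable, the SSH connection can instead authenticate through a running ssh-agent. This is a minimal sketch that assumes the matching key has already been loaded into the agent on the machine running Terraform (for example with `ssh-add`); the AMI and key name are the same placeholders used elsewhere in this guide.

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
  key_name      = "my-key"

  connection {
    type    = "ssh"
    user    = "ec2-user"
    host    = self.public_ip
    agent   = true # authenticate with keys held by the local ssh-agent
    timeout = "5m"
  }

  provisioner "remote-exec" {
    inline = ["echo connected"]
  }
}
```

With this form no `private_key` argument appears in the configuration at all.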

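Default SSH users on common cloud images

| Platform | Default user |
| --- | --- |
| Amazon Linux | ec2-user |
| Ubuntu | ubuntu |
| CentOS | centos |
| Debian | admin or debian |
| RHEL | ec2-user or root |
| Alibaba Cloud Linux | root |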

WinRM connections (Windows)

connection {
  type     = "winrm"
  user     = "Administrator"
  password = var.admin_password
  host     = self.public_ip
  port     = 5985 # HTTP port
  insecure = true # skip certificate validation
  timeout  = "10m"
}

# Using HTTPS
connection {
  type     = "winrm"
  user     = "Administrator"
  password = var.admin_password
  host     = self.public_ip
  port     = 5986
  https    = true
  insecure = true
}

Windows instances need WinRM to be configured

# Configure WinRM in user_data
# Basic authentication over unencrypted HTTP is only suitable for test environments
resource "aws_instance" "windows" {
  ami           = data.aws_ami.windows.id
  instance_type = "t2.micro"

  user_data = <<-EOF
    <powershell>
    Enable-PSRemoting -Force
    Set-Item WSMan:\localhost\Service\Auth\Basic $true
    Set-Item WSMan:\localhost\Service\AllowUnencrypted $true
    winrm set winrm/config/service '@{AllowUnencrypted="true"}'
    winrm set winrm/config/service/auth '@{Basic="true"}'
    netsh advfirewall firewall add rule name="WinRM HTTP" protocol=TCP dir=in localport=5985 action=allow
    </powershell>
  EOF

  connection {
    type     = "winrm"
    user     = "Administrator"
    password = var.admin_password
    host     = self.public_ip
  }
}

Connecting through a bastion host

connection {
  type        = "ssh"
  user        = "ec2-user"
  private_key = file("~/.ssh/key.pem")
  host        = self.private_ip # private IP of the target host

  # Bastion host settings
  bastion_host        = "bastion.example.com"
  bastion_user        = "bastion-user"
  bastion_private_key = file("~/.ssh/bastion-key.pem")
  bastion_port        = 22
}

Connecting through a proxy

connection {
  type        = "ssh"
  user        = "ec2-user"
  private_key = file("~/.ssh/key.pem")
  host        = self.public_ip

  # HTTP/SOCKS5 proxy
  proxy_scheme        = "http" # or "https", "socks5"
  proxy_host          = "proxy.example.com"
  proxy_port          = 8080
  proxy_user_name     = "proxy_user" # optional
  proxy_user_password = "proxy_pass" # optional
}

When provisioners run

At creation time (default)

By default, a provisioner runs immediately after the resource is created:

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  provisioner "local-exec" {
    command = "echo 'Instance created'"
    when    = create # may be omitted; create is the default
  }
}

At destroy time

Setting when = destroy runs the provisioner before the resource is destroyed:

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("~/.ssh/key.pem")
    host        = self.public_ip
  }

  # Clean up before the instance is destroyed
  provisioner "remote-exec" {
    when = destroy

    inline = [
      "sudo systemctl stop myapp",
      "sudo rm -rf /opt/myapp/data",
    ]
  }
}

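Notes

  1. A destroy-time provisioner runs before the resource is destroyed
  2. If the provisioner fails, the destroy operation is blocked, and Terraform runs the provisioner again on the next apply
  3. If the resource block is removed entirely from the .tf files, the destroy-time provisioner does not run
  4. A destroy-time provisioner and its connection block may only reference self, count.index, and each.key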
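
Terraform restricts what a destroy-time provisioner can reference: only `self`, `count.index`, and `each.key` are allowed, so variables and other resources are out of reach at destroy time. One common workaround, sketched here under assumptions, is to copy the needed values into a `terraform_data` resource (Terraform 1.4+) and read them back through `self.input`. The registry URL variable and the DELETE endpoint are hypothetical, and `aws_instance.web` is assumed to be defined elsewhere in the configuration.

```hcl
# Hypothetical variable: base URL of a service registry to deregister from.
variable "registry_url" {
  type = string
}

resource "terraform_data" "deregister" {
  # Everything the destroy step needs is stored on the resource itself
  input = {
    api_url = var.registry_url
    node_ip = aws_instance.web.private_ip
  }

  provisioner "local-exec" {
    when    = destroy
    command = "curl -X DELETE \"$API_URL/nodes/$NODE_IP\""

    # Only self.* references are used, which is permitted at destroy time
    environment = {
      API_URL = self.input.api_url
      NODE_IP = self.input.node_ip
    }
  }
}
```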

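Safely removing a resource that has a destroy-time provisioner

# Step 1: set count = 0
resource "aws_instance" "web" {
  count = 0 # add this line
  # ...other configuration...
}

# Step 2: apply to trigger the destroy-time provisioner
terraform apply

# Step 3: remove the resource block entirely
# Delete the whole resource "aws_instance" "web" block

# Step 4: apply again
terraform apply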

Failure handling

The on_failure option

Controls what happens when a provisioner fails:

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  # Continue on failure (the default is to fail)
  provisioner "local-exec" {
    command    = "curl https://api.example.com/notify || true"
    on_failure = continue # keep going if the command fails
  }

  # Fail and mark the resource as tainted
  provisioner "local-exec" {
    command    = "important-command"
    on_failure = fail # default behavior
  }
}

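Option reference

| Option | Behavior |
| --- | --- |
| continue | Ignore the error and carry on |
| fail | Return an error and mark the resource as tainted (default) |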

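Recovering from a provisioner failure

When a creation-time provisioner fails:

  1. Terraform marks the resource as "tainted"
  2. The next terraform apply destroys and recreates the resource
  3. The provisioner runs again during the rebuild

# See which resources are tainted (the plan reports "is tainted, so must be replaced")
terraform plan

# Remove the taint manually (if you have confirmed the resource is fine)
terraform untaint aws_instance.web

# Force a rebuild (terraform taint is deprecated since v0.15.2 in favor of -replace)
terraform apply -replace="aws_instance.web"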

Using self references

Inside a provisioner, use the self object to refer to attributes of the parent resource:

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  provisioner "local-exec" {
    # Use self to reference instance attributes
    command = "echo ${self.id} ${self.public_ip} ${self.private_ip}"
  }

  # self is also used in the connection block
  connection {
    host = self.public_ip
  }
}

Why the resource cannot be referenced by name

# Wrong: this creates a dependency cycle
resource "aws_instance" "web" {
  provisioner "local-exec" {
    command = "echo ${aws_instance.web.public_ip}" # error!
  }
}

# Right: use self
resource "aws_instance" "web" {
  provisioner "local-exec" {
    command = "echo ${self.public_ip}" # correct
  }
}

The terraform_data resource

When a provisioner needs to run without being tied to a particular resource, use the terraform_data resource (Terraform 1.4+):

# Used to run a standalone provisioner
resource "terraform_data" "bootstrap" {
  # Re-run whenever the cluster instances change
  triggers_replace = [
    aws_instance.cluster[*].id
  ]

  connection {
    host        = aws_instance.cluster[0].public_ip
    user        = "ec2-user"
    private_key = file("~/.ssh/key.pem")
  }

  provisioner "remote-exec" {
    inline = [
      "cluster-init.sh ${join(" ", aws_instance.cluster[*].private_ip)}",
    ]
  }
}

Use cases

# 1. Run initialization after the infrastructure has been created
resource "terraform_data" "init_cluster" {
  depends_on = [
    aws_eks_cluster.main,
    aws_eks_node_group.workers
  ]

  provisioner "local-exec" {
    command = "kubectl apply -f manifests/"

    environment = {
      KUBECONFIG = "kubeconfig.yaml"
    }
  }
}

# 2. Pass data through input
resource "terraform_data" "deploy" {
  input = {
    cluster_endpoint = aws_eks_cluster.main.endpoint
    cluster_name     = aws_eks_cluster.main.name
  }

  provisioner "local-exec" {
    command = "deploy.sh"

    environment = {
      CLUSTER_ENDPOINT = self.input.cluster_endpoint
      CLUSTER_NAME     = self.input.cluster_name
    }
  }
}
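
Because creation-time provisioners only run when the resource is created, `triggers_replace` is what makes a `terraform_data` provisioner run again. As an illustrative sketch, hashing a script with `filesha256` replaces the resource, and so re-runs the script, whenever the file content changes; the script path is hypothetical.

```hcl
resource "terraform_data" "bootstrap_script" {
  # A new hash forces replacement, which re-runs the provisioner
  triggers_replace = filesha256("${path.module}/scripts/bootstrap.sh")

  provisioner "local-exec" {
    command = "${path.module}/scripts/bootstrap.sh"
  }
}
```

If the script is unchanged between applies, the resource is left alone and nothing is executed.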

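Handling sensitive data

Provisioners often touch sensitive data (passwords, keys, and so on), which needs to be handled correctly:

Using sensitive variables

variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_instance" "db" {
  provisioner "remote-exec" {
    inline = [
      # Because a sensitive value is referenced, Terraform suppresses this provisioner's log output
      "export DB_PASSWORD='${var.db_password}'",
      "configure-db.sh",
    ]
  }
}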

Using ephemeral values (Terraform 1.10+)

ephemeral "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db/password"
}

resource "aws_instance" "db" {
  provisioner "remote-exec" {
    inline = [
      "export DB_PASSWORD='${ephemeral.aws_secretsmanager_secret_version.db_password.secret_string}'",
    ]
  }
}

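Avoid exposing secrets on the command line

# Dangerous: the password can show up in the process list
provisioner "remote-exec" {
  inline = [
    "mysql -u root -p${var.db_password}", # insecure!
  ]
}

# Better: pass it through an environment variable
# (MYSQL_PWD is deprecated as of MySQL 8.0, so prefer the option file approach that follows)
provisioner "remote-exec" {
  inline = [
    "export MYSQL_PWD='${var.db_password}'", # use an environment variable
    "mysql -u root",
  ]
}

# Safer: use an option file
provisioner "file" {
  content     = "[client]\npassword=${var.db_password}"
  destination = "/tmp/.my.cnf"
}

provisioner "remote-exec" {
  inline = [
    "chmod 600 /tmp/.my.cnf", # restrict access to the file
    "mysql --defaults-file=/tmp/.my.cnf -u root",
    "rm /tmp/.my.cnf", # delete it after use
  ]
}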
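
For local-exec, a secret can be kept out of the command string by handing it over through the `environment` argument. This is a sketch under assumptions: `var.db_password` is the sensitive variable declared earlier in this section, and `scripts/migrate.sh` is a hypothetical script that reads `$DB_PASSWORD` from its environment.

```hcl
resource "terraform_data" "db_migrate" {
  provisioner "local-exec" {
    # The command line contains no secret, so it is not visible in process listings
    command = "${path.module}/scripts/migrate.sh"

    # The secret reaches the script only through the process environment
    environment = {
      DB_PASSWORD = var.db_password
    }
  }
}
```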

Complete examples

Example 1: deploying a web application

variable "environment" {
  default = "production"
}

variable "private_key_path" {
  default = "~/.ssh/app-key.pem"
}

# Look up the latest Amazon Linux AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# Security group (SSH is open to the world here for brevity; restrict the CIDR in real use)
resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Web server security group"

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Web server instance
resource "aws_instance" "web" {
  ami                    = data.aws_ami.amazon_linux.id
  instance_type          = "t2.micro"
  key_name               = "app-key"
  vpc_security_group_ids = [aws_security_group.web.id]

  tags = {
    Name        = "web-${var.environment}"
    Environment = var.environment
  }

  # SSH connection settings
  # A literal key path is used because this resource has a destroy-time provisioner,
  # and Terraform then only allows self, count.index, and each.key in the connection block
  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("~/.ssh/app-key.pem")
    host        = self.public_ip
    timeout     = "5m"
  }

  # Preparation: create the upload directory
  # (over SSH, the file provisioner expects the destination directory to exist)
  provisioner "remote-exec" {
    inline = ["mkdir -p /tmp/app"]
  }

  # Step 1: upload application files
  provisioner "file" {
    source      = "${path.module}/app/"
    destination = "/tmp/app"
  }

  # Step 2: upload the rendered configuration file
  provisioner "file" {
    content = templatefile("${path.module}/config/app.conf.tftpl", {
      environment = var.environment
      server_port = 8080
    })
    destination = "/tmp/app/app.conf"
  }

  # Step 3: install dependencies and move the application into place
  provisioner "remote-exec" {
    inline = [
      "set -e",
      "sudo yum update -y",
      "sudo yum install -y python3",
      "cd /tmp/app",
      "pip3 install -r requirements.txt",
      "sudo mv /tmp/app /opt/webapp",
      "sudo systemctl daemon-reload",
    ]
  }

  # Step 4: record the deployment locally
  provisioner "local-exec" {
    command = <<-EOT
      echo "[$(date)] Deployed web server" >> deploy.log
      echo " Instance ID: ${self.id}" >> deploy.log
      echo " Public IP: ${self.public_ip}" >> deploy.log
      echo " Environment: ${var.environment}" >> deploy.log
    EOT
  }

  # Step 5: clean up on destroy
  provisioner "remote-exec" {
    when = destroy

    inline = [
      "sudo systemctl stop webapp || true",
      "sudo rm -rf /opt/webapp || true",
    ]
  }
}

# Outputs
output "web_public_ip" {
  value = aws_instance.web.public_ip
}

Example 2: initializing a multi-server cluster

variable "cluster_size" {
  default = 3
}

# Create several instances
resource "aws_instance" "cluster" {
  count         = var.cluster_size
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t2.micro"

  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("~/.ssh/key.pem")
    host        = self.public_ip
  }

  # Configure each instance individually
  provisioner "remote-exec" {
    inline = [
      "echo 'Initializing node ${count.index}'",
      "sudo hostnamectl set-hostname node-${count.index}", # changing the hostname requires root
    ]
  }
}

# Cluster initialization (after all nodes are ready)
resource "terraform_data" "cluster_init" {
  depends_on = [aws_instance.cluster]

  triggers_replace = [
    join(",", aws_instance.cluster[*].id)
  ]

  connection {
    host        = aws_instance.cluster[0].public_ip
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("~/.ssh/key.pem")
  }

  provisioner "remote-exec" {
    inline = [
      "cluster-join.sh ${join(" ", aws_instance.cluster[*].private_ip)}",
    ]
  }
}

Best practices

1. Prefer the alternatives

# Not recommended: installing software with a provisioner
resource "aws_instance" "web" {
  provisioner "remote-exec" {
    inline = [
      "sudo yum install -y nginx",
      "sudo systemctl start nginx",
    ]
  }
}

# Recommended: use user_data
resource "aws_instance" "web" {
  user_data = <<-EOF
    #!/bin/bash
    yum install -y nginx
    systemctl start nginx
  EOF
}

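2. Provisioners should be idempotent

# Idempotent operations: running them again does not cause errors
provisioner "remote-exec" {
  inline = [
    "mkdir -p /opt/app", # no error if the directory already exists
    "test -f /opt/app/config || cp /tmp/config /opt/app/config", # copy only if missing
  ]
}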

3. Use explicit error handling

provisioner "remote-exec" {
  inline = [
    "set -e", # exit immediately on error
    "set -x", # print each command as it runs

    # Double quotes are needed so that $(date) is expanded by the shell
    "echo \"Starting deployment at $(date)\"",
    "sudo yum install -y nginx || { echo 'Failed to install nginx'; exit 1; }",
    "sudo systemctl start nginx || { echo 'Failed to start nginx'; exit 1; }",
    "echo 'Deployment completed successfully'",
  ]
}

4. Keep a log of what was done

provisioner "local-exec" {
  # Process substitution, >(...), requires bash, which is not always what /bin/sh provides
  interpreter = ["/bin/bash", "-c"]
  command     = <<-EOT
    exec > >(tee -a deploy.log) 2>&1
    echo "=== Deployment started at $(date) ==="
    echo "Instance: ${self.id}"
    echo "IP: ${self.public_ip}"
    ansible-playbook -i ${self.public_ip}, playbook.yml
    echo "=== Deployment completed at $(date) ==="
  EOT
}

5. Handle connection timeouts

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  # Wait for the instance to finish booting
  provisioner "local-exec" {
    command = "sleep 30" # give the instance time to start
  }

  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("~/.ssh/key.pem")
    host        = self.public_ip
    timeout     = "10m" # raise the timeout; Terraform keeps retrying until it expires
  }

  provisioner "remote-exec" {
    inline = ["echo 'Connected!'"]
  }
}
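
A fixed sleep is only a guess at how long boot takes. One alternative, assuming the image runs cloud-init and ships a version that supports `cloud-init status --wait` (common on stock cloud images, but not guaranteed), is to block inside remote-exec until cloud-init reports that it has finished.

```hcl
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("~/.ssh/key.pem")
    host        = self.public_ip
    timeout     = "10m"
  }

  provisioner "remote-exec" {
    inline = [
      # Blocks until cloud-init has completed its boot-time work
      "cloud-init status --wait",
      "echo 'Instance is ready'",
    ]
  }
}
```

This ties the wait to an actual readiness signal rather than to elapsed time.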

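6. Separate configuration from provisioners

# Move provisioner logic into a module or a standalone resource
module "web_server" {
  source        = "./modules/web-server"
  instance_type = "t2.micro"
  provision_commands = [
    "sudo yum install -y nginx", # sudo is needed when connecting as a non-root user
    "sudo systemctl start nginx",
  ]
}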
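
What such a module might contain is sketched below. Everything here is hypothetical: the variable names other than `instance_type` and `provision_commands`, their defaults, and the choice to hang the provisioner on a separate `terraform_data` resource are illustrative assumptions, not a layout prescribed by this guide.

```hcl
# modules/web-server/main.tf (hypothetical layout)
variable "instance_type" {
  type = string
}

variable "provision_commands" {
  type    = list(string)
  default = []
}

variable "ami_id" {
  type    = string
  default = "ami-0c55b159cbfafe1f0"
}

variable "key_name" {
  type    = string
  default = "my-key"
}

variable "private_key_path" {
  type    = string
  default = "~/.ssh/my-key.pem"
}

resource "aws_instance" "this" {
  ami           = var.ami_id
  instance_type = var.instance_type
  key_name      = var.key_name
}

# The provisioner lives on its own resource, so the instance definition stays declarative
resource "terraform_data" "provision" {
  # Skip the resource entirely when there is nothing to run
  count = length(var.provision_commands) > 0 ? 1 : 0

  # Re-run the commands whenever the instance is replaced
  triggers_replace = [aws_instance.this.id]

  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file(var.private_key_path)
    host        = aws_instance.this.public_ip
  }

  provisioner "remote-exec" {
    inline = var.provision_commands
  }
}
```

Keeping the commands in a variable lets callers change provisioning steps without touching the module's resource definitions.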

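Troubleshooting

Common errors

1. Connection timeout

Error: timeout - last error: ssh: handshake failed

How to fix:

  • Check that the security group allows SSH access
  • Increase the connection timeout
  • Add a wait before connecting

2. Authentication failure

Error: ssh: unable to authenticate

How to fix:

  • Check that the user name is correct
  • Verify the path to the private key file
  • Confirm that the key pair matches the instance

3. Command execution failure

Error: Error running command

How to fix:

  • Use set -x to print detailed output
  • Check that the command exists on the target system
  • Confirm that the user has permission to run it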

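Debugging tips

# Enable verbose logging
TF_LOG=DEBUG terraform apply

# Allow more time for the connection while investigating
connection {
  # ...
  timeout = "10m"
}

# Add diagnostics inside the provisioner
# (double quotes so that the shell expands $(whoami) and $(pwd))
provisioner "remote-exec" {
  inline = [
    "echo \"Current user: $(whoami)\"",
    "echo \"Current directory: $(pwd)\"",
    "echo 'Environment:' && env",
  ]
}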

Next steps

Now that you understand provisioners, continue with Import to learn how to bring existing resources under Terraform management.