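Provisioners
Provisioners let you perform extra actions after a resource is created or before it is destroyed, such as uploading files, running scripts, or configuring services. However, provisioners are a last-resort feature in Terraform and should be used with care.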
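Overview
What is a provisioner
A provisioner is a block that runs at a specific point in a resource's lifecycle to perform actions that a provider cannot handle directly. Typical uses:
- Running configuration scripts on an instance
- Uploading configuration files to a remote server
- Running commands locally
- Cleaning up before a resource is destroyed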
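Why provisioners are a last resort
The official documentation explicitly describes provisioners as a last resort, for these reasons:
- Unpredictability: Terraform cannot model what a provisioner does as part of a plan, because a provisioner can in principle take any action.
- State problems: a failed provisioner can leave the resource in an inconsistent state.
- Security: provisioners need direct network access to the target and credentials for it, which increases security risk.
- Maintainability: provisioner logic is hard to debug and hard to version-control.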
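Recommended alternatives
Before reaching for a provisioner, consider these options first:
| Scenario | Recommended approach |
|---|---|
| Instance initialization | user_data or cloud-init |
| Pre-installed software | Build a custom image with Packer |
| Configuration management | Ansible, Chef, Puppet, or similar tools |
| File distribution | Object storage plus a startup script that downloads the files |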
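The file-distribution row can be sketched as follows. This is a minimal illustration, not a definitive setup: the bucket name, object key, instance profile name, and local file path are all hypothetical, and it assumes the image ships with the AWS CLI and that the instance profile grants read access to the bucket.

```hcl
# Sketch only: bucket, key, profile name, and paths are hypothetical.
resource "aws_s3_object" "app_conf" {
  bucket = "example-config-bucket" # assumed to exist already
  key    = "web/app.conf"
  source = "${path.module}/files/app.conf"
  etag   = filemd5("${path.module}/files/app.conf")
}

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  # Assumes an instance profile (not shown) that allows s3:GetObject on the bucket.
  iam_instance_profile = "example-web-profile"

  # The startup script pulls the file, so Terraform needs no SSH connection.
  user_data = <<-EOF
    #!/bin/bash
    aws s3 cp s3://${aws_s3_object.app_conf.bucket}/${aws_s3_object.app_conf.key} /etc/app.conf
  EOF
}
```

Because the instance fetches the file itself, nothing here depends on network access from the machine running Terraform to the instance.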
Provisioner types
local-exec
Runs a command on the machine where Terraform itself is running.
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
# Run a local command after the instance is created
provisioner "local-exec" {
command = "echo Instance ${self.public_ip} created >> instances.log"
}
}
Common use cases:
# Call an API
provisioner "local-exec" {
command = "curl -X POST https://api.example.com/notify -d 'instance=${self.public_ip}'"
}
# Run Ansible
provisioner "local-exec" {
command = "ansible-playbook -i ${self.public_ip}, playbook.yml"
}
# Run a local script
provisioner "local-exec" {
command = "${path.module}/scripts/setup.sh ${self.public_ip}"
}
Environment variables:
provisioner "local-exec" {
command = "echo $ENV_VAR $INSTANCE_IP"
environment = {
ENV_VAR = "value"
INSTANCE_IP = self.public_ip
}
}
# Run on Windows
provisioner "local-exec" {
command = "echo %INSTANCE_IP%"
when = create
working_dir = "C:/scripts"
environment = {
INSTANCE_IP = self.public_ip
}
}
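local-exec also accepts an interpreter argument, which matters when a command relies on features of a particular shell. A minimal sketch, assuming bash is installed at /bin/bash on the machine running Terraform:

```hcl
provisioner "local-exec" {
  # The first element is the program; the remaining elements are passed to it
  # before the command string.
  interpreter = ["/bin/bash", "-c"]
  command     = "set -euo pipefail; echo \"Running under bash $BASH_VERSION\""
}
```

Without interpreter, Terraform picks a default based on the operating system, so bash-only syntax is not guaranteed to work.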
remote-exec
Runs commands on the remote resource over SSH or WinRM.
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
key_name = "my-key"
# Connection settings
connection {
type = "ssh"
user = "ec2-user"
private_key = file("~/.ssh/my-key.pem")
host = self.public_ip
}
# Run commands on the remote instance
provisioner "remote-exec" {
inline = [
"sudo yum update -y",
"sudo yum install -y nginx",
"sudo systemctl start nginx",
]
}
}
Inline commands and scripts:
# Option 1: inline commands
provisioner "remote-exec" {
inline = [
"mkdir -p /opt/myapp",
"echo 'Hello World' > /opt/myapp/config.txt",
]
}
# Option 2: run a local script file on the remote host
provisioner "remote-exec" {
script = "${path.module}/scripts/setup.sh"
}
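# Option 3: a script with arguments. The script argument cannot pass arguments,
# so the script must already be on the remote host (for example, uploaded with a file provisioner).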
provisioner "remote-exec" {
inline = [
"chmod +x /tmp/setup.sh",
"/tmp/setup.sh ${var.environment} ${var.region}",
]
}
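Two related patterns, sketched under the assumption that the local script paths shown exist (they are hypothetical): uploading the script that the inline variant above calls at /tmp/setup.sh, and running several local scripts in order with the scripts argument.

```hcl
# Upload the script first so that an inline command can call it with arguments.
provisioner "file" {
  source      = "${path.module}/scripts/setup.sh"
  destination = "/tmp/setup.sh"
}

# Run several local scripts on the remote host, in the order listed.
provisioner "remote-exec" {
  scripts = [
    "${path.module}/scripts/install.sh",   # hypothetical
    "${path.module}/scripts/configure.sh", # hypothetical
  ]
}
```

Provisioners in one resource run in the order they are declared, so the file block has to come before the remote-exec block that uses the uploaded script.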
file
Copies files or directories from the local machine to the remote resource.
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
key_name = "my-key"
connection {
type = "ssh"
user = "ec2-user"
private_key = file("~/.ssh/my-key.pem")
host = self.public_ip
}
# Copy a single file
provisioner "file" {
source = "${path.module}/files/app.conf"
destination = "/tmp/app.conf"
}
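# Copy a directory (a trailing slash on source uploads the directory's contents).
# Over SSH, the documentation states that the destination directory must already exist.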
provisioner "file" {
source = "${path.module}/files/config/"
destination = "/tmp/config/"
}
# Create a file from inline content
provisioner "file" {
content = "server {\n listen 80;\n}"
destination = "/tmp/nginx.conf"
}
}
Complete example: deploying a web application:
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
key_name = var.key_name
vpc_security_group_ids = [aws_security_group.web.id]
subnet_id = aws_subnet.public.id
connection {
type = "ssh"
user = "ec2-user"
private_key = file(var.private_key_path)
host = self.public_ip
}
# Step 1: upload configuration files
provisioner "file" {
source = "${path.module}/files/"
destination = "/tmp/app"
}
# Step 2: upload application code
provisioner "file" {
source = "${path.module}/app/"
destination = "/tmp/app/code"
}
# Step 3: run the deployment script
provisioner "remote-exec" {
inline = [
"chmod +x /tmp/app/deploy.sh",
"sudo /tmp/app/deploy.sh ${var.environment}",
]
}
# Step 4: record the deployment locally
provisioner "local-exec" {
command = "echo 'Deployed to ${self.public_ip}' >> deploy.log"
}
}
Connection configuration
The connection block defines how Terraform connects to the remote resource.
SSH connection
connection {
type = "ssh"
user = "ec2-user" # SSH user name
password = var.password # Password authentication (optional)
private_key = file("~/.ssh/key.pem") # Key authentication (recommended)
host = self.public_ip # Target host
port = 22 # SSH port (default 22)
timeout = "5m" # Connection timeout
}
Default SSH users on common cloud images:
| Platform | Default user |
|---|---|
| Amazon Linux | ec2-user |
| Ubuntu | ubuntu |
| CentOS | centos |
| Debian | admin or debian |
| RHEL | ec2-user or root |
| Alibaba Cloud Linux | root |
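If the private key is already loaded into an SSH agent, the connection block does not need to read a key file at all. A minimal sketch, assuming an agent is running on the machine that executes Terraform; the identity name is hypothetical:

```hcl
connection {
  type  = "ssh"
  user  = "ec2-user"
  host  = self.public_ip
  agent = true # authenticate with keys held by ssh-agent; set to false to disable

  # Optional: the preferred identity to request from the agent (hypothetical name).
  # agent_identity = "deploy-key"
}
```

This keeps key material out of the configuration and out of file() calls.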
WinRM connection (Windows)
connection {
type = "winrm"
user = "Administrator"
password = var.admin_password
host = self.public_ip
port = 5985 # HTTP port
insecure = true # Skip certificate validation
timeout = "10m"
}
# Using HTTPS
connection {
type = "winrm"
user = "Administrator"
password = var.admin_password
host = self.public_ip
port = 5986
https = true
insecure = true
}
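Windows instances need WinRM configured first:
# Configure WinRM in user_data (unencrypted Basic auth, as shown here, is suitable for testing only)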
resource "aws_instance" "windows" {
ami = data.aws_ami.windows.id
instance_type = "t2.micro"
user_data = <<-EOF
<powershell>
Enable-PSRemoting -Force
Set-Item WSMan:\localhost\Service\Auth\Basic $true
Set-Item WSMan:\localhost\Service\AllowUnencrypted $true
winrm set winrm/config/service '@{AllowUnencrypted="true"}'
winrm set winrm/config/service/auth '@{Basic="true"}'
netsh advfirewall firewall add rule name="WinRM HTTP" protocol=TCP dir=in localport=5985 action=allow
</powershell>
EOF
connection {
type = "winrm"
user = "Administrator"
password = var.admin_password
host = self.public_ip
}
}
Connecting through a bastion host
connection {
type = "ssh"
user = "ec2-user"
private_key = file("~/.ssh/key.pem")
host = self.private_ip # Private IP of the target host
# Bastion host settings
bastion_host = "bastion.example.com"
bastion_user = "bastion-user"
bastion_private_key = file("~/.ssh/bastion-key.pem")
bastion_port = 22
}
Connecting through a proxy
connection {
type = "ssh"
user = "ec2-user"
private_key = file("~/.ssh/key.pem")
host = self.public_ip
# HTTP/SOCKS5 proxy
proxy_scheme = "http" # or "https", "socks5"
proxy_host = "proxy.example.com"
proxy_port = 8080
proxy_user_name = "proxy_user" # Optional
proxy_user_password = "proxy_pass" # Optional
}
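When provisioners run
Create-time (default)
By default, a provisioner runs once, immediately after the resource is created; it does not run again on later updates: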
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
provisioner "local-exec" {
command = "echo 'Instance created'"
when = create # Optional; create is the default
}
}
Destroy-time
Setting when = destroy runs the provisioner before the resource is destroyed:
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
connection {
type = "ssh"
user = "ec2-user"
private_key = file("~/.ssh/key.pem")
host = self.public_ip
}
# Clean up before the resource is destroyed
provisioner "remote-exec" {
when = destroy
inline = [
"sudo systemctl stop myapp",
"sudo rm -rf /opt/myapp/data",
]
}
}
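Notes:
- A destroy-time provisioner runs before the resource is destroyed.
- If the provisioner fails, Terraform reports an error and the destroy is blocked; the provisioner runs again on the next terraform apply.
- If the resource block is removed entirely from the .tf files, the destroy-time provisioner does not run.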
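One more constraint is worth knowing: a destroy-time provisioner and its connection block may reference only self, count.index, or each.key. A common workaround is to capture the values needed at destroy time on the resource itself. The sketch below does this with terraform_data (Terraform 1.4+); the resource name, var.registry_url, and deregister.sh are hypothetical.

```hcl
# Sketch: names, the variable, and the script are hypothetical.
resource "terraform_data" "deregister" {
  # Values captured here are stored in state and remain readable
  # through self.input when the resource is destroyed.
  input = {
    instance_id = aws_instance.web.id
    api_url     = var.registry_url
  }

  provisioner "local-exec" {
    when    = destroy
    command = "./deregister.sh"
    environment = {
      INSTANCE_ID = self.input.instance_id
      API_URL     = self.input.api_url
    }
  }
}
```

Everything the destroy-time command needs travels through self, so no reference to another resource or variable appears inside the provisioner.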
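Safely removing a resource that has a destroy-time provisioner:
# Step 1: set count = 0
resource "aws_instance" "web" {
count = 0 # Add this line
# ...other configuration...
}
# Step 2: apply to trigger the destroy-time provisioner
terraform apply
# Step 3: remove the resource block entirely
# Delete the whole resource "aws_instance" "web" block
# Step 4: apply again
terraform apply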
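Newer Terraform releases offer a declarative route for the same goal: a removed block that carries the destroy-time provisioner. The sketch below assumes a release in which removed blocks accept provisioners (to my understanding Terraform 1.9 or later; verify against the version you run), and it replaces the deleted resource block rather than sitting alongside it.

```hcl
# Assumes a Terraform release where removed blocks accept provisioners.
removed {
  from = aws_instance.web

  lifecycle {
    destroy = true # destroy the instance rather than only forgetting it
  }

  provisioner "local-exec" {
    when    = destroy
    command = "echo 'Instance ${self.id} destroyed' >> instances.log"
  }
}
```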
Failure handling
The on_failure option
Controls what happens when a provisioner fails:
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
# Continue on failure (the default is to fail)
provisioner "local-exec" {
command = "curl https://api.example.com/notify || true"
on_failure = continue # Continue on failure
}
# On failure, mark the resource as tainted
provisioner "local-exec" {
command = "important-command"
on_failure = fail # Default behavior
}
}
Options:
| Option | Behavior |
|---|---|
| continue | Ignore the error and continue |
| fail | Return an error and mark the resource as tainted (default) |
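Recovering from a provisioner failure
When a create-time provisioner fails:
- Terraform marks the resource as tainted.
- The next terraform apply replaces the resource.
- The provisioner runs again during the replacement.
# Show tainted resources (the plan reports them as tainted and due for replacement)
terraform plan
# Remove the taint manually (once you have confirmed the resource is healthy)
terraform untaint aws_instance.web
# Mark the resource as tainted manually to force replacement
# (newer releases recommend terraform apply -replace="aws_instance.web" instead)
terraform taint aws_instance.web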
Using the self reference
Inside a provisioner, use the self object to refer to attributes of the parent resource:
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
provisioner "local-exec" {
# Use self to reference instance attributes
command = "echo ${self.id} ${self.public_ip} ${self.private_ip}"
}
# self also works inside the connection block
connection {
host = self.public_ip
}
}
Why the resource cannot be referenced by name:
# Wrong: this creates a circular dependency
resource "aws_instance" "web" {
provisioner "local-exec" {
command = "echo ${aws_instance.web.public_ip}" # Wrong!
}
}
# Right: use self
resource "aws_instance" "web" {
provisioner "local-exec" {
command = "echo ${self.public_ip}" # Correct
}
}
The terraform_data resource
When a provisioner needs to run without being tied to a specific resource, use the terraform_data resource (Terraform 1.4+):
# Runs a standalone provisioner
resource "terraform_data" "bootstrap" {
# Re-run when the cluster instances change
triggers_replace = [
aws_instance.cluster[*].id
]
connection {
host = aws_instance.cluster[0].public_ip
user = "ec2-user"
private_key = file("~/.ssh/key.pem")
}
provisioner "remote-exec" {
inline = [
"cluster-init.sh ${join(" ", aws_instance.cluster[*].private_ip)}",
]
}
}
Use cases:
# 1. Run initialization after the infrastructure has been created
resource "terraform_data" "init_cluster" {
depends_on = [
aws_eks_cluster.main,
aws_eks_node_group.workers
]
provisioner "local-exec" {
command = "kubectl apply -f manifests/"
environment = {
KUBECONFIG = "kubeconfig.yaml"
}
}
}
# 2. Pass data through input
resource "terraform_data" "deploy" {
input = {
cluster_endpoint = aws_eks_cluster.main.endpoint
cluster_name = aws_eks_cluster.main.name
}
provisioner "local-exec" {
command = "deploy.sh"
environment = {
CLUSTER_ENDPOINT = self.input.cluster_endpoint
CLUSTER_NAME = self.input.cluster_name
}
}
}
Handling sensitive data
Provisioners often touch sensitive data (passwords, keys, and so on), which needs careful handling:
Using sensitive variables
variable "db_password" {
type = string
sensitive = true
}
resource "aws_instance" "db" {
provisioner "remote-exec" {
inline = [
# Terraform suppresses this provisioner's log output because its configuration contains a sensitive value
"export DB_PASSWORD='${var.db_password}'",
"configure-db.sh",
]
}
}
Using ephemeral values (Terraform 1.10+)
ephemeral "aws_secretsmanager_secret_version" "db_password" {
secret_id = "prod/db/password"
}
resource "aws_instance" "db" {
provisioner "remote-exec" {
inline = [
"export DB_PASSWORD='${ephemeral.aws_secretsmanager_secret_version.db_password.secret_string}'",
]
}
}
Avoid exposing secrets on the command line
# Dangerous: the password may show up in the process list
provisioner "remote-exec" {
inline = [
"mysql -u root -p${var.db_password}", # Insecure!
]
}
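# Better: pass the password through an environment variable or a file.
# MySQL documents MYSQL_PWD as insecure and deprecated, so prefer the config-file variant.
provisioner "remote-exec" {
inline = [
"export MYSQL_PWD='${var.db_password}'", # Use an environment variable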
"mysql -u root",
]
}
# Safer: use a config file
provisioner "file" {
content = "[client]\npassword=${var.db_password}"
destination = "/tmp/.my.cnf"
}
provisioner "remote-exec" {
inline = [
"chmod 600 /tmp/.my.cnf", # Restrict access to the uploaded file
"mysql --defaults-file=/tmp/.my.cnf -u root",
"rm /tmp/.my.cnf", # Delete after use
]
}
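The same idea applies to local-exec: hand the secret to the process through environment instead of interpolating it into the command string. A minimal sketch; configure-db.sh is a hypothetical script that reads DB_PASSWORD from its environment.

```hcl
provisioner "local-exec" {
  # The secret is not part of the command string, so it does not show up
  # in the command's arguments.
  command = "./scripts/configure-db.sh" # hypothetical script

  environment = {
    DB_PASSWORD = var.db_password
  }
}
```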
Complete examples
Example 1: deploying a web application
variable "environment" {
default = "production"
}
variable "private_key_path" {
default = "~/.ssh/app-key.pem"
}
# Look up the latest Amazon Linux AMI
data "aws_ami" "amazon_linux" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*-x86_64-gp2"]
}
}
# Security group
resource "aws_security_group" "web" {
name = "web-sg"
description = "Web server security group"
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# Web server instance
resource "aws_instance" "web" {
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
key_name = "app-key"
vpc_security_group_ids = [aws_security_group.web.id]
tags = {
Name = "web-${var.environment}"
Environment = var.environment
}
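# SSH connection settings. This resource has a destroy-time provisioner, so the
# connection block may reference only self, count.index, or each.key. The key
# path is therefore a literal here instead of var.private_key_path.
connection {
type = "ssh"
user = "ec2-user"
private_key = file("~/.ssh/app-key.pem")
host = self.public_ip
timeout = "5m"
}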
# Step 1: upload application files
provisioner "file" {
source = "${path.module}/app/"
destination = "/tmp/app"
}
# Step 2: upload the configuration file
provisioner "file" {
content = templatefile("${path.module}/config/app.conf.tftpl", {
environment = var.environment
server_port = 8080
})
destination = "/tmp/app/app.conf"
}
# Step 3: install dependencies and move the application into place
provisioner "remote-exec" {
inline = [
"set -e",
"sudo yum update -y",
"sudo yum install -y python3",
"cd /tmp/app",
"pip3 install -r requirements.txt",
"sudo mv /tmp/app /opt/webapp",
"sudo systemctl daemon-reload",
]
}
# Step 4: record the deployment locally
provisioner "local-exec" {
command = <<-EOT
echo "[$(date)] Deployed web server" >> deploy.log
echo " Instance ID: ${self.id}" >> deploy.log
echo " Public IP: ${self.public_ip}" >> deploy.log
echo " Environment: ${var.environment}" >> deploy.log
EOT
}
# Step 5: clean up on destroy
provisioner "remote-exec" {
when = destroy
inline = [
"sudo systemctl stop webapp || true",
"sudo rm -rf /opt/webapp || true",
]
}
}
# Outputs
output "web_public_ip" {
value = aws_instance.web.public_ip
}
Example 2: initializing a multi-server cluster
variable "cluster_size" {
default = 3
}
# Create several instances
resource "aws_instance" "cluster" {
count = var.cluster_size
ami = data.aws_ami.amazon_linux.id
instance_type = "t2.micro"
connection {
type = "ssh"
user = "ec2-user"
private_key = file("~/.ssh/key.pem")
host = self.public_ip
}
# Configure each instance individually
provisioner "remote-exec" {
inline = [
"echo 'Initializing node ${count.index}'",
"sudo hostnamectl set-hostname node-${count.index}", # Changing the hostname needs root
]
}
}
# Cluster initialization (after all nodes are ready)
resource "terraform_data" "cluster_init" {
depends_on = [aws_instance.cluster]
triggers_replace = [
join(",", aws_instance.cluster[*].id)
]
connection {
host = aws_instance.cluster[0].public_ip
type = "ssh"
user = "ec2-user"
private_key = file("~/.ssh/key.pem")
}
provisioner "remote-exec" {
inline = [
"cluster-join.sh ${join(" ", aws_instance.cluster[*].private_ip)}",
]
}
}
Best practices
1. Prefer the alternatives
# Not recommended: installing software with a provisioner
resource "aws_instance" "web" {
provisioner "remote-exec" {
inline = [
"sudo yum install -y nginx",
"sudo systemctl start nginx",
]
}
}
# Recommended: use user_data
resource "aws_instance" "web" {
user_data = <<-EOF
#!/bin/bash
yum install -y nginx
systemctl start nginx
EOF
}
2. Provisioners should be idempotent
# Idempotent operations: safe to run more than once
provisioner "remote-exec" {
inline = [
"mkdir -p /opt/app", # No error if the directory already exists
"test -f /opt/app/config || cp /tmp/config /opt/app/config", # Copy only if missing
]
}
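3. Use thorough error handling
provisioner "remote-exec" {
inline = [
"set -e", # Exit immediately on error
"set -x", # Print each command as it runs
"echo \"Starting deployment at $(date)\"", # Double quotes so that $(date) is expanded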
"sudo yum install -y nginx || { echo 'Failed to install nginx'; exit 1; }",
"sudo systemctl start nginx || { echo 'Failed to start nginx'; exit 1; }",
"echo 'Deployment completed successfully'",
]
}
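4. Log what happens
provisioner "local-exec" {
# Process substitution, >(...), needs bash; the default interpreter on Unix-like systems is /bin/sh.
interpreter = ["/bin/bash", "-c"]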
command = <<-EOT
exec > >(tee -a deploy.log) 2>&1
echo "=== Deployment started at $(date) ==="
echo "Instance: ${self.id}"
echo "IP: ${self.public_ip}"
ansible-playbook -i ${self.public_ip}, playbook.yml
echo "=== Deployment completed at $(date) ==="
EOT
}
5. Handle connection timeouts
resource "aws_instance" "web" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
# Wait for the instance to finish booting
provisioner "local-exec" {
command = "sleep 30" # Give the instance time to boot
}
connection {
type = "ssh"
user = "ec2-user"
private_key = file("~/.ssh/key.pem")
host = self.public_ip
timeout = "10m" # Raise the connection timeout
}
provisioner "remote-exec" {
inline = ["echo 'Connected!'"]
}
}
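A fixed sleep is only a guess at how long boot takes. On images that run cloud-init, one alternative is to let the first remote command block until initialization has finished. A minimal sketch, assuming cloud-init is present on the image and that the enclosing resource defines a connection block like the one above:

```hcl
provisioner "remote-exec" {
  inline = [
    # Blocks until cloud-init (and with it any user_data script) has finished.
    "cloud-init status --wait",
    "echo 'Instance is ready'",
  ]
}
```

The connection timeout still covers the period before SSH becomes reachable; this command covers the period after it.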
6. Separate configuration from provisioners
# Put provisioner logic in a module or a standalone resource
module "web_server" {
source = "./modules/web-server"
instance_type = "t2.micro"
provision_commands = [
"yum install -y nginx",
"systemctl start nginx",
]
}
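What such a module contains is up to you. One possible shape for ./modules/web-server, offered purely as a sketch: the file layout, resource names, AMI, and key path are assumptions, and the commands run as ec2-user, so the caller would need to add sudo where root is required.

```hcl
# modules/web-server/main.tf (hypothetical layout)
variable "instance_type" {
  type = string
}

variable "provision_commands" {
  type = list(string)
}

resource "aws_instance" "this" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = var.instance_type
}

resource "terraform_data" "provision" {
  # Re-run the commands when the instance is replaced or the list changes.
  triggers_replace = [aws_instance.this.id, var.provision_commands]

  connection {
    type        = "ssh"
    user        = "ec2-user"
    private_key = file("~/.ssh/key.pem")
    host        = aws_instance.this.public_ip
  }

  provisioner "remote-exec" {
    inline = var.provision_commands
  }
}
```

Keeping the provisioner on a separate terraform_data resource means a change to the command list re-runs the commands without replacing the instance.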
Troubleshooting
Common errors
1. Connection timeout
Error: timeout - last error: ssh: handshake failed
Fixes:
- Check that the security group allows SSH access
- Increase the connection timeout
- Add a wait before connecting
2. Authentication failure
Error: ssh: unable to authenticate
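Fixes:
- Check that the user name is correct
- Verify the private key file path
- Confirm that the key pair matches the instance
3. Command execution failure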
Error: Error running command
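Fixes:
- Use set -x to print detailed output
- Check that the command exists on the target system
- Confirm that the user has permission to run it
Debugging tips
# Enable verbose logging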
TF_LOG=DEBUG terraform apply
# Allow more time for the connection while debugging
connection {
# ...
timeout = "10m"
}
# Add debug output in the provisioner (double quotes so that $(...) is expanded)
provisioner "remote-exec" {
inline = [
"echo \"Current user: $(whoami)\"",
"echo \"Current directory: $(pwd)\"",
"echo 'Environment:' && env",
]
}
Next steps
Now that you understand provisioners, continue with Import to learn how to bring existing resources under Terraform management.