You would be surprised how many things must be considered when deploying a simple lambda function with Terraform. I've faced a lot of issues in the past and will try to summarize most of them in this post. You will find a short description of the problem domains and separate sections on how to tackle those issues. The code snippets will help you improve the quality of your lambda function deployments. I think it's good practice to understand the following use cases, as this enables you to build a custom lambda module that satisfies exactly what you need - nothing more, nothing less.
Don't rely on Terraform's native provider resources for complex objects like Lambda, API Gateway, S3 and many more. Define the use case for your purpose and design your own modules.
Definition of the use case
I typically use lambda functions for infrastructure components within multiple AWS accounts inside my organization. My requirements for a proper lambda deployment are:
The module should be optimized for Python
The module must be deployable via the different stages of my AWS CI/CD pipeline (CodePipeline)
The module should be lightweight and must not use S3 as a backend for the lambda code (zip files)
The module must support lambdas with cross-account functionality
The module should simplify the lambda deployment and reduce the lines of code needed in Terraform projects
The module should pack external libraries into a lambda layer
Updates of the lambda functions can happen via an in-place deployment. I do not need any fancy update logic (blue/green, rolling)
I do not need versioning / aliases - one lambda version at a time is sufficient
The module should implement frequently used patterns
A separate module for lambda layers should be available
Comparing Terraform against other IaC tools
There are different ways and tools to deploy a lambda function. Typically, there are two ways to publish your code. First, you create an S3 bucket and store a zipped version of your source code in the bucket. Second, you zip the lambda function locally and upload it via the AWS API. There is a hard quota of 50 MB for the (zipped) deployment package. This limitation will hit you hard if you try to use bigger libraries in your lambda function. I recommend using the lambda layer feature to prevent such issues - keep your code lean and hide all external dependencies in layers. Of course, nowadays it's best practice to let others do the work and make use of IaC. Next to Terraform, we have AWS-native options to upload and deploy lambdas.
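Before looking at those, here is a minimal, hypothetical sketch of the two publish paths using Terraform's aws_lambda_function resource - the bucket, key, role, and handler names are placeholders:

# Path 1: publish the zip from an S3 bucket
resource "aws_lambda_function" "from_s3" {
  function_name = "my-function"
  role          = "arn:aws:iam::111111111111:role/my-lambda-role" # placeholder
  handler       = "my_function.handler"
  runtime       = "python3.9"
  s3_bucket     = "my-artifact-bucket"
  s3_key        = "my-function.zip"
}

# Path 2: publish a local zip directly via the AWS API
resource "aws_lambda_function" "from_local_zip" {
  function_name    = "my-function-local"
  role             = "arn:aws:iam::111111111111:role/my-lambda-role" # placeholder
  handler          = "my_function.handler"
  runtime          = "python3.9"
  filename         = "my-function.zip"
  source_code_hash = filebase64sha256("my-function.zip")
}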
AWS CodeDeploy is a great service to deploy business-critical / frequently invoked lambdas in production. Most people use AWS SAM to develop their code. SAM is optimized for serverless workloads and integrates very well with CodeDeploy (see also: Tutorial: Deploy an updated Lambda function with CodeDeploy and the AWS Serverless Application Model - AWS CodeDeploy (amazon.com)).
There are also other frameworks and tools like the AWS CDK or the Serverless Framework, which use CloudFormation under the hood to publish a lambda. Like Terraform, these tools aren't designed exclusively for lambda deployments, which leaves room for us engineers to optimize our deployments.
How to fix the problems / use cases
Run your lambda in a CI/CD pipeline? Resolve the zipping problem
As mentioned above, we need a zipped version of the lambda either locally or on S3 to be able to deploy our function. However, a CI/CD pipeline in the cloud may consist of several steps hosted in different containers. This means there is a need for some logic that does the zipping and ensures that the generated zip is available in all dependent stages of the pipeline. The poor man's approach would be to store a zipped version of the code in your repository - this is very bad practice, as it clutters your repository and adds redundant content that doesn't align with the DRY principle. Instead, I've decided to solve this problem in Terraform itself:
variable "runtime" {
description = "Runtime of the lambda"
type = string
default = "python3.9"
}
variable "source_path" {
description = <<EOT
Root directory path of the lambda function:
<lambda-function-name = root>
├──requirements.txt
└──src
├──<lambda-function-name>.py
├──...
E.g. `modules/avm/lambda/myfancylambda`
EOT
type = string
}
locals {
is_python_lambda = substr(var.runtime, 0, 6) == "python"
is_requirements_file_present = fileexists("${var.source_path}/requirements.txt")
deploy_local_python_layer = local.is_python_lambda && local.is_requirements_file_present
}
resource "random_uuid" "lambda_src_hash" {
keepers = {
for filename in setunion(
fileset(var.source_path, "src/*.py"),
fileset(var.source_path, "src/**/*.py")
) :
filename => filemd5("${var.source_path}/${filename}")
}
}
resource "random_uuid" "lambda_external_dependencies_hash" {
keepers = {
for filename in setunion(
fileset(var.source_path, "requirements.txt")
) :
filename => filemd5("${var.source_path}/${filename}")
}
}
# See also https://docs.aws.amazon.com/lambda/latest/dg/python-package.html#python-package-dependencies
resource "null_resource" "install_dependencies" {
count = local.deploy_local_python_layer ? 1 : 0
provisioner "local-exec" {
command = "${var.runtime} -m pip install -r ${var.source_path}/requirements.txt -t ${var.source_path}/.layer/python --upgrade"
}
triggers = {
external_dependency_hash = random_uuid.lambda_external_dependencies_hash.result
}
}
# This zipping happens on every terraform run. I´ve found no solution to trigger it only on demand
data "archive_file" "lambda_source_package" {
type = "zip"
source_dir = "${var.source_path}/src"
output_path = "${var.source_path}/${module.lambda_label.id}.zip"
output_file_mode = "0755"
excludes = var.excluded_files_in_source_package
}
# This zipping happens on every terraform run. I´ve found no solution to trigger it only on demand
# When no layer is needed, it creates an empty zip containing only the .info file
data "archive_file" "lambda_layer_package" {
type = "zip"
source_dir = local.deploy_local_python_layer ? "${var.source_path}/.layer" : null
output_path = "${var.source_path}/${module.lambda_label.id}-layer.zip"
output_file_mode = "0755"
# dummy file required to allow source_dir to be empty when no layer is needed
dynamic "source" {
for_each = local.deploy_local_python_layer ? [] : [1]
content {
content = "Generated by Terraforms archive provider"
filename = ".info"
}
}
excludes = [
".gitkeep"
]
depends_on = [
null_resource.install_dependencies[0]
]
}
This code will ensure that your lambda only gets deployed/updated when there was a change to a file under the src folder. The hash function comes in handy to check whether any file contents changed. The cool thing is: you have full visibility into your zip file's content and hash values via the keepers of the random_uuid resources. The solution still has drawbacks:
There is a dependency on pip: You need to run the Terraform null resource to trigger the installation of external dependencies via pip. To my knowledge, there is no way around it. The code resolves the pip version based on the provided runtime. Depending on your setup you may need to add an alias.
Terraform will only trigger the installation of external dependencies when there is a change to requirements.txt. This can cause some unwanted behavior when Terraform replaces a lambda. One example is when you change the name of a lambda function, which is part of its ARN. To fix this, you can taint the null resource and trigger the download of the external dependencies again (see the CLI sketch after this list).
The zipping of your lambda code will always happen: Terraform's archive provider has no conditional zipping implemented. This means the code will always get zipped. The only optimization implemented in this solution is to not re-download the external dependencies. I think this is a good tradeoff, as your code typically isn't going to be that big (otherwise you most probably defeat the purpose of using lambda). The zipping in some of my projects with 100+ lambdas is done in under 3 seconds.
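For the tainting mentioned above, the commands could look like this - module.my_lambda is a hypothetical instance name for the module:

# Force a re-install of the external dependencies on the next apply
terraform taint 'module.my_lambda.null_resource.install_dependencies[0]'

# Or, since Terraform 0.15.2, force the replacement directly during apply
terraform apply -replace='module.my_lambda.null_resource.install_dependencies[0]'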
Apply proper naming to your lambda functions
Lambda functions don't support attribute-based access control as of now. This means if you want to protect your functions at scale, you may want to apply a common naming scheme. I've borrowed the labeling module from Cloudposse to have the same naming convention for all of my lambdas. This also enables proper baseline tagging for my resources.
# More information on github.com/cloudposse/terraform-null-label
module "lambda_label" {
  source              = "./modules/cloudposse-terraform-null-label/"
  namespace           = var.namespace
  stage               = var.stage
  name                = var.name
  regex_replace_chars = var.regex_replace_chars
}

variable "namespace" {
  type        = string
  default     = null
  description = "ID element. Usually an abbreviation of the product, e.g. 'fancyfunction'."
}

variable "stage" {
  type        = string
  default     = null
  description = "ID element. Usually used to indicate role or environment, e.g. 'prod', 'dev', 'test', 'staging'"
}

variable "name" {
  type        = string
  default     = null
  description = "ID element. Usually the purpose of the lambda function. e.g. 'notifications'"
}

variable "regex_replace_chars" {
  type        = string
  default     = "/[^a-zA-Z0-9-_]/"
  description = <<-EOT
    Terraform regular expression (regex) string.
    Characters matching the regex will be removed from the ID elements.
    If not set, `"/[^a-zA-Z0-9-_]/"` is used to remove all characters other than hyphens, underscore, letters and digits.
  EOT
}
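With a predictable prefix in place, the naming scheme can substitute for the missing ABAC support. A hedged sketch - account ID, region, and prefix are placeholders - of a policy that lets a principal manage only the functions carrying a given prefix:

data "aws_iam_policy_document" "manage_prefixed_lambdas" {
  statement {
    sid       = "AllowManagingPrefixedFunctions"
    actions   = ["lambda:*"]
    resources = ["arn:aws:lambda:eu-central-1:111111111111:function:fancyfunction-*"] # placeholder
    effect    = "Allow"
  }
}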
Ensure that code is visible in the lambda console by moving external logic into a lambda layer
If your lambda deployment package is too big, you will not be able to see any code in the lambda console. Yes, I know this shouldn't be needed or done extensively. However, sometimes it really is easier to troubleshoot or fix some minor issues in a lambda through the console. Since you have full control over the environment variables, it's easy to change the logging and search/fix bugs directly in the console.
In addition, I enable AWS Powertools as a default for all of my lambdas. Powertools is a great set of utilities that will help you increase the quality of your code. Here is a link to the documentation: Homepage - AWS Lambda Powertools for Python.
I can only recommend using its functionality. Especially the logging is pretty cool, as you can log additional data in JSON format and query your log groups via CloudWatch Logs Insights. My solution was to install all external dependencies in a separate layer - one for each function:
# Data sources used below (and in the IAM section further down)
data "aws_region" "current" {}
data "aws_caller_identity" "current" {}

locals {
  # repeated from the zipping section above for context
  is_requirements_file_present = fileexists("${var.source_path}/requirements.txt")
  deploy_local_python_layer    = local.is_python_lambda && local.is_requirements_file_present
  local_python_layer = (
    local.deploy_local_python_layer
    ? [aws_lambda_layer_version.lambda_layer[0].arn]
    : []
  )
  powertools_layer_arn = (
    contains(var.architectures, "arm64")
    ? "arn:aws:lambda:${data.aws_region.current.name}:017000801446:layer:AWSLambdaPowertoolsPythonV2-Arm64:${var.powertools_layer_version}"
    : "arn:aws:lambda:${data.aws_region.current.name}:017000801446:layer:AWSLambdaPowertoolsPythonV2:${var.powertools_layer_version}"
  )
  powertools_layer = (
    var.powertools_layer_version != null
    ? [local.powertools_layer_arn]
    : []
  )
  python_layer_list = concat(
    var.additional_layers,
    local.local_python_layer,
    local.powertools_layer
  )
}

resource "aws_lambda_layer_version" "lambda_layer" {
  count                    = local.deploy_local_python_layer ? 1 : 0
  depends_on               = [null_resource.install_dependencies]
  filename                 = data.archive_file.lambda_layer_package.output_path
  description              = "This layer provides the external dependencies of the lambda function."
  layer_name               = "${module.lambda_label.id}-${random_uuid.lambda_external_dependencies_hash.result}"
  compatible_architectures = var.architectures
  compatible_runtimes      = [var.runtime]
}
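A hedged sketch of how the layer list and the Powertools-related environment variables could be wired into the function. The handler name (derived from the source tree shown earlier) and the LOG_LEVEL value are assumptions, and the role is the one created in the IAM section below:

resource "aws_lambda_function" "lambda_function" {
  function_name    = module.lambda_label.id
  role             = module.lambda_execution_role.arn # created further below
  runtime          = var.runtime
  architectures    = var.architectures
  handler          = "${var.name}.handler" # assumption: src/<lambda-function-name>.py
  filename         = data.archive_file.lambda_source_package.output_path
  source_code_hash = data.archive_file.lambda_source_package.output_base64sha256
  layers           = local.python_layer_list
  environment {
    variables = {
      POWERTOOLS_SERVICE_NAME = module.lambda_label.id
      LOG_LEVEL               = "INFO" # flip to DEBUG in the console when troubleshooting
    }
  }
  tags = module.lambda_label.tags
}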
Support for custom lambda layers
Sometimes you will face the challenge that multiple lambda functions share the same business logic. In such cases, I recommend building custom lambda layers and linking them to your lambda functions.
During my tests, I've seen that there are also some pitfalls when using layers. I had the idea to share some of my layers with my organization. Since layers are managed in versions, you may not want to delete "old" versions, as they may still be linked to some lambdas. So be sure to enable the "skip_destroy" option of the layer. A second problem is the logic of AWS versioning: AWS will always increment the layer version by 1 - if you delete the layer and redeploy it with the same ARN, it will continue the count where it left off. This fact makes it difficult to determine the versions to share in an automated fashion - trust me: you will run into circular dependencies whenever you try to determine the "latest" or "current" version number. As a result, I've gone with an optimistic approach and just share all versions up to a number provided in a variable. I am really not satisfied with this kind of solution, as it needs your contribution every time a new version gets published. In addition, the solution breaks if you delete and redeploy the layer (as the first few versions will not be available). As an alternative, you could also provide a list of versions to share - however, I didn't want to implement this approach since the version numbers in different environments may differ. If you make it explicit, developers may tend to believe that version x is the same across all stages.
variable "organization_id" {
description = "Organization ID in which the lambda gets shared"
type = string
default = null
}
variable "share_up_to_version" {
description = "Version number up to which layer gets shared with the organization"
type = string
default = "1"
}
locals {
versions_to_share = var.organization_id == null ? 0 : var.share_up_to_version
}
# ATTENTION: Lambda layers will not be deleted by terraform - this is necessary
# in order to guarantee compatibility for all consumers when sharing the layer.
# DO NOT DELETE LAYERS manually as this may break your consumers deployments.
# There is no way to get the same layer version up and running after the layer is
# destroyed. AWS will always provision the next version id!
resource "aws_lambda_layer_version" "layer" {
filename = data.archive_file.lambda_layer_package.output_path
source_code_hash = data.archive_file.lambda_layer_package.output_base64sha256
description = var.description
layer_name = module.lambda_layer_label.id
compatible_architectures = var.compatible_architectures
skip_destroy = true #Even if this resource gets destroyed by terraform the layer will continue to exist!
compatible_runtimes = [var.runtime]
}
resource "aws_lambda_layer_version_permission" "lambda_layer_permission" {
depends_on = [
aws_lambda_layer_version.layer
]
count = tonumber(local.versions_to_share)
layer_name = module.lambda_layer_label.id
version_number = count.index + 1
principal = "*"
organization_id = var.organization_id
action = "lambda:GetLayerVersion"
statement_id = module.lambda_layer_label.id
}
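A hypothetical call of such a layer module - the module path, organization ID, and exact input names are placeholders that depend on how you wire your own module:

module "shared_business_logic_layer" {
  source                   = "./modules/avm/lambda-layer/" # assumption: your own layer module
  name                     = "business-logic"
  description              = "Shared business logic for the notification lambdas"
  runtime                  = "python3.9"
  compatible_architectures = ["arm64"]
  organization_id          = "o-abcd123456" # placeholder
  share_up_to_version      = "3"            # shares versions 1-3
}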
Simplify the deployment of your lambda role and enable cross-account support
When working with a lambda, you typically interact with a lot of different AWS services. Especially for deployments around governance, you may want to jump into a different account to retrieve some information or maintain resources gathered in service accounts. In general, I try to avoid such scenarios. However, as an example, just imagine your lambda needs to perform some actions in the management account, like loading account tags. Due to cyclic dependencies, I had to provide some data about the cross-account role as external information via a module variable. In addition, the code is pretty flexible and gives you room to decide whether you want to maintain your IAM policies outside or inside of the lambda module: you can specify policies directly inline when calling the module, or you can build your own policy and reuse it for multiple lambdas.
The code also integrates the base policies needed by lambda to write logs. This enables you to concentrate on what's important: the access your lambda actually needs.
variable "role_name" {
description = "Existing role name to be attached to the Lambda function, instead of generating one."
type = string
default = null
}
variable "role_path" {
description = "Path of role. Only supported if no role_arn is defined."
type = string
default = "/terraform/"
}
variable "policy_arns" {
description = "Policies to be attached to the Lambda role, generated by the module."
type = list(string)
default = []
}
variable "policy_documents" {
description = <<EOT
List of data.aws_iam_policy_document documents in json format.
Usage: [data.aws_iam_policy_document.example1.json, data.aws_iam_policy_document.example2.json]
EOT
type = list(string)
default = []
}
variable "cross_account_assume_role" {
description = <<EOT
Add support for cross Account Role.
Usage for role_path: Include a starting and trailing slash.
e.g.: '/' for no path
'/terraform/' for terraform/ path
EOT
type = object({
account_id = string
role_path = string
role_name = string
})
default = null
}
data "aws_iam_policy_document" "lambda_function_basic_execution_policy_document" {
statement {
sid = "AllowCreateLogGroup"
actions = ["logs:CreateLogGroup"]
resources = ["arn:aws:logs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:*"]
effect = "Allow"
}
statement {
sid = "AllowPutLogs"
actions = ["logs:CreateLogStream", "logs:PutLogEvents"]
resources = ["arn:aws:logs:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:log-group:/aws/lambda/${module.lambda_label.id}:*"]
effect = "Allow"
}
}
locals {
basic_execution_policy = (
var.cross_account_assume_role == null
? concat(
[data.aws_iam_policy_document.lambda_function_basic_execution_policy_document.json],
var.policy_documents
)
: concat(
[data.aws_iam_policy_document.lambda_function_basic_execution_policy_document.json],
var.policy_documents,
[data.aws_iam_policy_document.lambda_sts_cross_account[0].json]
)
)
cross_account_assume_role_arn = (
var.cross_account_assume_role != null
? "arn:aws:iam::${var.cross_account_assume_role["account_id"]}:role${var.cross_account_assume_role["role_path"]}${var.cross_account_assume_role["role_name"]}"
: null
)
env_vars_cross_account_assume_role = {
REGION = data.aws_region.current.name
ASSUME_ROLE_ARN = local.cross_account_assume_role_arn
}
}
module "lambda_execution_role" {
enabled = var.role_name == null
source = "./modules/cloudposse-terraform-aws-iam-role/"
use_fullname = local.use_fullname
namespace = var.namespace
stage = var.stage
name = var.name
regex_replace_chars = var.regex_replace_chars
path = var.role_path
role_description = var.description
principals = {
Service = ["lambda.amazonaws.com"]
}
policy_documents = local.basic_execution_policy
}
# Attaching policies directly by managed_policy_arns to the role can lead into cyclic dependency
# aws_iam_role_policy_attachment is used to avoid that
resource "aws_iam_role_policy_attachment" "lambda_role_additional_policy_attachments" {
count = var.role_name == null ? length(var.policy_arns) : 0
# The role is bound indirectly by the role name. If the role is deployed after the policy, it can not find it.
depends_on = [module.lambda_execution_role]
role = module.lambda_label.id
policy_arn = var.policy_arns[count.index]
}
# Attach additional policies to the external role
resource "aws_iam_role_policy_attachment" "lambda_external_role_additional_policy_attachments" {
count = var.role_name != null ? length(var.policy_arns) : 0
role = var.role_name
policy_arn = var.policy_arns[count.index]
}
# When an external role is given, attach the basic execution policy together with the provided policy_documents
data "aws_iam_policy_document" "lambda_function_basic_execution_policy" {
count = var.role_name != null ? 1 : 0
override_policy_documents = local.basic_execution_policy
}
resource "aws_iam_policy" "lambda_function_basic_execution_policy" {
count = var.role_name != null ? 1 : 0
name = "${module.lambda_label.id}-basic-execution"
policy = join("", data.aws_iam_policy_document.lambda_function_basic_execution_policy[*].json)
}
resource "aws_iam_role_policy_attachment" "lambda_role_basic_execution_policy_attachment" {
count = var.role_name != null ? 1 : 0
role = var.role_name
policy_arn = aws_iam_policy.lambda_function_basic_execution_policy[count.index].arn
}
data "aws_iam_policy_document" "lambda_sts_cross_account" {
count = local.cross_account_assume_role_arn != null ? 1 : 0
statement {
sid = "AllowAssumeCrossAccountRole"
actions = ["sts:AssumeRole"]
resources = [local.cross_account_assume_role_arn]
effect = "Allow"
}
}
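A hypothetical call of the lambda module with cross-account support enabled - account ID, role name, and module path are placeholders. The ASSUME_ROLE_ARN environment variable from the locals above then tells the function code which role to assume:

module "account_tag_loader" {
  source    = "./modules/avm/lambda/" # assumption: your own lambda module
  namespace = "governance"
  stage     = "prod"
  name      = "account-tag-loader"
  cross_account_assume_role = {
    account_id = "111111111111"        # placeholder: management account
    role_path  = "/terraform/"
    role_name  = "organization-reader" # placeholder
  }
}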
Enable AWS Service usage
This section is about the Lambda resource policy. A resource policy can be seen as an inbound policy for your lambda function. If you want to invoke a lambda function via another AWS service like API Gateway or EventBridge, you need to allow this in the resource policy. This is how you can enable such integrations:
variable "create_trigger" {
description = "Wether to allow triggers or not"
type = bool
default = false
}
variable "allowed_triggers" {
description = <<EOF
Map of allowed triggers to create Lambda permissions. Example:
allowed_triggers = {
approval_requests_creation_eventbus_rule = {
statement_id = "AllowExecutionFromEventBridge"
principal = "events.amazonaws.com"
source_arn = "arn:aws:events:eu-central-1::rule/eventbus/approval-requests-creation-rule"
}
}
EOF
type = map(any)
default = {}
}
resource "aws_lambda_permission" "lambda_permission" {
for_each = { for trigger, values in var.allowed_triggers : trigger => values if var.create_trigger }
function_name = aws_lambda_function.lambda_function.function_name
statement_id = try(each.value.statement_id, each.key)
action = try(each.value.action, "lambda:InvokeFunction")
principal = try(each.value.principal, format("%s.amazonaws.com", try(each.value.service, "")))
source_arn = try(each.value.source_arn, null)
source_account = try(each.value.source_account, null)
event_source_token = try(each.value.event_source_token, null)
}
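A hypothetical module call allowing EventBridge to invoke the function - rule ARN, account ID, and module path are placeholders:

module "notification_lambda" {
  source         = "./modules/avm/lambda/" # assumption
  name           = "notifications"
  create_trigger = true
  allowed_triggers = {
    eventbridge = {
      principal  = "events.amazonaws.com"
      source_arn = "arn:aws:events:eu-central-1:111111111111:rule/notifications-rule" # placeholder
    }
  }
}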
Optimize the lambda cost
Did you know that lambdas can be deployed on either arm or x86 architectures? Terraform's default is x86_64 - I assume this is to prevent any compatibility issues. However, if you just use plain Python code, you typically do not even need to worry about such issues. Hence, I recommend enabling arm as your default architecture. This will save you some cost and can also give you a performance boost.
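A minimal sketch of the corresponding module variable - it feeds the architectures argument of aws_lambda_function and the compatible_architectures of the layers shown above:

variable "architectures" {
  description = "Instruction set architecture of the lambda function"
  type        = list(string)
  default     = ["arm64"] # the provider default would be ["x86_64"]
}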
Optimize logging
Typically, the log group of a lambda function has no retention configured. Since this can cause some unnecessary costs, I use the following solution to enable retention for your logs:
variable "log_retention" {
description = "Log retention in days. A null value results in infinite logging"
type = number
default = 30
}
resource "aws_cloudwatch_log_group" "lambda_log_group" {
count = var.log_retention != null ? 1 : 0
name = "/aws/lambda/${module.lambda_label.id}"
retention_in_days = var.log_retention
}
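One hedged addition worth considering: let the function explicitly depend on the Terraform-managed log group. Otherwise the first invocation may create an unmanaged group without retention before Terraform does, which is possible because the basic execution policy above intentionally grants logs:CreateLogGroup as a fallback. Added to the aws_lambda_function resource sketched earlier:

  # Ensure the managed log group (with retention) exists before the first invocation
  depends_on = [aws_cloudwatch_log_group.lambda_log_group]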
Summary
This article shows that a lambda deployment is not as simple as just uploading some code and relying on AWS to optimize and configure everything for you. In my opinion, Terraform isn't designed for big serverless applications. However, with some additional effort, you can build a pretty solid deployment that also offers a lot of the features hidden in serverless frameworks like AWS SAM. I hope you learned something new or can use one of the code snippets in your future Terraform deployments.