Home

Awesome

AWS Lambda Power Tuning

Build Status Coverage Status Maintenance GitHub issues Open Source Love svg2

AWS Lambda Power Tuning is a state machine powered by AWS Step Functions that helps you optimize your Lambda functions for cost and/or performance in a data-driven way.

The state machine is designed to be easy to deploy and fast to execute. Also, it's language agnostic so you can optimize any Lambda functions in your account.

Basically, you can provide a Lambda function ARN as input and the state machine will invoke that function with multiple power configurations (from 128MB to 10GB, you decide which values). Then it will analyze all the execution logs and suggest you the best power configuration to minimize cost and/or maximize performance.

Please note that the input function will be executed in your AWS account - performing real HTTP requests, SDK calls, cold starts, etc. The state machine also supports cross-region invocations and you can enable parallel execution to generate results in just a few seconds.

What does the state machine look like?

It's pretty simple and you can visually inspect each step in the AWS management console.

state-machine

What results can I expect from Lambda Power Tuning?

The state machine will generate a visualization of average cost and speed for each power configuration.

For example, this is what the results look like for two CPU-intensive functions, which become cheaper AND faster with more power:

visualization1

How to interpret the chart above: execution time goes from 35s with 128MB to less than 3s with 1.5GB, while being 14% cheaper to run.

visualization2

How to interpret the chart above: execution time goes from 2.4s with 128MB to 300ms with 1GB, for the very same average cost.

How to deploy the state machine

There are 5 deployment options for deploying the tool using Infrastructure as Code (IaC).

  1. The easiest way is to deploy the app via the AWS Serverless Application Repository (SAR).
  2. Using the AWS SAM CLI
  3. Using the AWS CDK
  4. Using Terraform by Hashicorp and SAR
  5. Using native Terraform

Read more about the deployment options here.

State machine configuration (at deployment time)

The CloudFormation template (used for option 1 to 4) accepts the following parameters:

<div style="width:260px">Parameter</div>Description
PowerValues<br>type: list of numbers<br>default: [128,256,512,1024,1536,3008]These power values (in MB) will be used as the default in case no powerValues input parameter is provided at execution time.
visualizationURL<br>type: string<br>default: lambda-power-tuning.showThe base URL for the visualization tool, you can bring your own visualization tool.
totalExecutionTimeout<br>type: number<br>default: 300The timeout in seconds applied to all functions of the state machine.
lambdaResource<br>type: string<br>default: *The Resource used in IAM policies; it's * by default but you could restrict it to a prefix or a specific function ARN.
permissionsBoundary<br>type: string<br>The ARN of a permissions boundary (policy), applied to all functions of the state machine.
payloadS3Bucket<br>type: string<br>The S3 bucket name used for large payloads (>256KB); if provided, it's added to a custom managed IAM policy that grants read-only permission to the S3 bucket; more details in the S3 payloads section.
payloadS3Key<br>type: string<br>default: *The S3 object key used for large payloads (>256KB); the default value grants access to all S3 objects in the bucket specified with payloadS3Bucket; more details in the S3 payloads section.
layerSdkName<br>type: string<br>The name of the SDK layer, in case you need to customize it (optional).
logGroupRetentionInDays<br>type: number<br>default: 7The number of days to retain log events in the Lambda log groups (a week by default).
securityGroupIds<br>type: list of SecurityGroup IDs<br>List of Security Groups to use in every Lambda function's VPC Configuration (optional); please note that your VPC should be configured to allow public internet access (via NAT Gateway) or include VPC Endpoints to the Lambda service.
subnetIds<br>type: list of Subnet IDs<br>List of Subnets to use in every Lambda function's VPC Configuration (optional); please note that your VPC should be configured to allow public internet access (via NAT Gateway) or include VPC Endpoints to the Lambda service.
stateMachineNamePrefix<br>type: string<br>default: powerTuningStateMachineAllows you to customize the name of the state machine. Maximum 43 characters, only alphanumeric (plus - and _). The last portion of the AWS::StackId will be appended to this value, so the full name will look like powerTuningStateMachine-89549da0-a4f9-11ee-844d-12a2895ed91f. Note: StateMachineName has a maximum of 80 characters and 36+1 from the StackId are appended, allowing 43 for a custom prefix.

Please note that the total execution time should stay below 300 seconds (5 min), which is the default timeout. You can estimate the total execution timeout based on the average duration of your functions. For example, if your function's average execution time is 5 seconds and you haven't enabled parallelInvocation, you should set totalExecutionTimeout to at least num * 5: 50 seconds if num=10, 500 seconds if num=100, and so on. If you have enabled parallelInvocation, usually you don't need to tune the value of totalExecutionTimeout unless your average execution time is above 5 min. If you have set a sleep between invocations, remember to include that in your timeout calculations.

How to run the state machine

You can run the state machine manually or programmatically, see the documentation here.

State machine input (at execution time)

Each execution of the state machine will require an input where you can define the following input parameters:

<div style="width:260px">Parameter</div>Description
lambdaARN (required)<br/>type: string<br/>Unique identifier of the Lambda function you want to optimize.
num (required)<br/>type: integerThe # of invocations for each power configuration (minimum 5, recommended: between 10 and 100).
powerValues<br/>type: string or list of integersThe list of power values to be tested; if not provided, the default values configured at deploy-time are used; you can provide any power values between 128MB and 10,240MB. ⚠️ New AWS accounts have reduced concurrency and memory quotas (3008MB max).
payload<br/>type: string, object, or listThe static payload that will be used for every invocation (object or string); when using a list, a weighted payload is expected in the shape of [{"payload": {...}, "weight": X }, {"payload": {...}, "weight": Y }, {"payload": {...}, "weight": Z }], where the weights X, Y, and Z are treated as relative weights (not percentages); more details in the Weighted Payloads section.
payloadS3<br/>type: stringAn Amazon S3 object reference for large payloads (>256KB), formatted as s3://bucket/key; it requires read-only IAM permissions, see payloadS3Bucket and payloadS3Key below and find more details in the S3 payloads section.
parallelInvocation<br/>type: boolean<br/>default: falseIf true, all the invocations will run in parallel. ⚠️ Note: depending on the value of num, you might experience throttling.
strategy<br/>type: string<br/>default: "cost"It can be "cost" or "speed" or "balanced"; if you use "cost" the state machine will suggest the cheapest option (disregarding its performance), while if you use "speed" the state machine will suggest the fastest option (disregarding its cost). When using "balanced" the state machine will choose a compromise between "cost" and "speed" according to the parameter "balancedWeight".
balancedWeight<br/>type: number<br/>default: 0.5Parameter that represents the trade-off between cost and speed. Value is between 0 and 1, where 0.0 is equivalent to "speed" strategy, 1.0 is equivalent to "cost" strategy.
autoOptimize<br/>type: boolean<br/>default: falseIf true, the state machine will apply the optimal configuration at the end of its execution.
autoOptimizeAlias<br/>type: stringIf provided - and only if autoOptimize is true, the state machine will create or update this alias with the new optimal power value.
dryRun<br/>type: boolean<br/>default: falseIf true, the state machine will invoke the input function only once and disable every functionality related to logs analysis, auto-tuning, and visualization; this is intended for testing purposes, for example to verify that IAM permissions are set up correctly.
preProcessorARN<br/>type: stringThe ARN of a Lambda function that will be invoked before every invocation of lambdaARN; more details in the Pre/Post-processing functions section.
postProcessorARN<br/>type: stringThe ARN of a Lambda function that will be invoked after every invocation of lambdaARN; more details in the Pre/Post-processing functions section.
discardTopBottom<br/>type: number<br/>default: 0.2By default, the state machine will discard the top/bottom 20% of "outlier invocations" (the fastest and slowest) to filter out the effects of cold starts and remove any bias from overall averages. You can customize this parameter by providing a value between 0 and 0.4, where 0 means no results are discarded and 0.4 means 40% of the top/bottom results are discarded (i.e. only 20% of the results are considered).
sleepBetweenRunsMs<br/>type: integerIf provided, the time in milliseconds that the tuner will sleep/wait after invoking your function, but before carrying out the Post-Processing step, should that be provided. This could be used if you have aggressive downstream rate limits you need to respect. By default this will be set to 0 and the function won't sleep between invocations. This has no effect if running the invocations in parallel.
disablePayloadLogs<br/>type: boolean<br/>default: falseIf true, suppresses payload from error messages and logs. If preProcessorARN is provided, this also suppresses the output payload of the pre-processor.
includeOutputResults<br/>type: boolean<br/>default: falseIf true, the average cost and average duration for every power value configuration will be included in the state machine output.
onlyColdStarts<br/>type: boolean<br/>default: falseIf true, the tool will force all invocations to be cold starts. The initialization phase will be considerably slower as num versions/aliases need to be created for each power value.

Here's a typical state machine input with basic parameters:

{
    "lambdaARN": "your-lambda-function-arn",
    "powerValues": [128, 256, 512, 1024],
    "num": 50,
    "payload": {}
}

State Machine Output

The state machine will return the following output:

{
  "results": {
    "power": "128",
    "cost": 0.0000002083,
    "duration": 2.9066666666666667,
    "stateMachine": {
      "executionCost": 0.00045,
      "lambdaCost": 0.0005252,
      "visualization": "https://lambda-power-tuning.show/#<encoded_data>"
    },
    "stats": [{ "averagePrice": 0.0000002083, "averageDuration": 2.9066666666666667, "value": 128}, ... ]
  }
}

More details on each value:

Data visualization

You can visually inspect the tuning results to identify the optimal tradeoff between cost and performance.

visualization

The data visualization tool has been built by the community: it's a static website deployed via AWS Amplify Console and it's free to use. If you don't want to use the visualization tool, you can simply ignore the visualization URL provided in the execution output. No data is ever shared or stored by this tool.

Website repository: matteo-ronchetti/aws-lambda-power-tuning-ui

Optionally, you could deploy your own custom visualization tool and configure the CloudFormation Parameter named visualizationURL with your own URL.

Additional features, considerations, and internals

Here you can find out more about some advanced features of this project, its internals, and some considerations about security and execution cost.

Contributing

Feature requests and pull requests are more than welcome!

How to get started with local development?

For this repository, install dev dependencies with npm install. You can run tests with npm test, linting with npm run lint, and coverage with npm run coverage. Unit tests will run automatically on every commit and PR.