# Processing objects from S3 Events with a Lambda function
## Structure
`handler.py` is the entrypoint, which drives the behavior of the `processor.py` module.
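A minimal sketch of what `handler.py` could look like, assuming `processor.py` exposes a `process(bucket, key)` function (the actual function names in this project may differ):

```python
# handler.py (illustrative sketch)
from urllib.parse import unquote_plus

import processor


def handle(event, context):
    """Entrypoint for the S3 event: extract each object and hand it to the processor."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys are URL-encoded, e.g. spaces arrive as '+'
        key = unquote_plus(record["s3"]["object"]["key"])
        processor.process(bucket, key)
```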
## Use Cases
This project can be used as a template for receiving files in a given bucket. Files that are successfully processed are copied to a "processed" bucket, while files that fail processing go to a "bad data" bucket. Object keys may need to be rewritten to avoid collisions and overwriting.
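As an illustration, a hedged sketch of how the routing and key rewriting could work with boto3; the bucket names and the UUID prefix are assumptions, not values from this project:

```python
import uuid

import boto3

s3 = boto3.client("s3")

# Hypothetical destination buckets; the real names come from the deployed stack.
PROCESSED_BUCKET = "my-service-processed"
BAD_DATA_BUCKET = "my-service-bad-data"


def route(source_bucket, key, succeeded):
    """Copy the object to the processed or bad-data bucket under a collision-safe key."""
    destination = PROCESSED_BUCKET if succeeded else BAD_DATA_BUCKET
    # Prefix with a UUID so two files with the same original key never overwrite each other.
    new_key = f"{uuid.uuid4()}/{key}"
    s3.copy_object(
        Bucket=destination,
        Key=new_key,
        CopySource={"Bucket": source_bucket, "Key": key},
    )
```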
## Setup
- Run `virtualenv -p python3 env`, `source env/bin/activate`, and `pip install boto3` to get your local environment ready. The AWS Lambda runtime already includes Boto3.
- Make sure you give the service a name that's likely to be globally unique, since S3 bucket names are derived from the service name.
- Customize `processor.py` to process the incoming files however you want. Two output buckets are available, one for processed data and one for errored data. Optionally keep a record of objects in DynamoDB, as sketched below.
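If you opt to track objects in DynamoDB, a record per object could be written as below; the table name and attribute names are placeholders, not the ones defined in this project's `serverless.yml`:

```python
import datetime

import boto3

# Placeholder table name; in this project it would come from serverless.yml / the environment.
table = boto3.resource("dynamodb").Table("processed-objects")


def record_object(bucket, key, status):
    """Store a small audit record for each object that passed through the processor."""
    table.put_item(
        Item={
            "objectKey": key,        # assumed partition key
            "sourceBucket": bucket,
            "status": status,        # e.g. "processed" or "bad-data"
            "processedAt": datetime.datetime.utcnow().isoformat(),
        }
    )
```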
## Deploy
- `sls deploy`!
- Check the CloudFormation outputs to find the name of the bucket that accepts incoming files.
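The outputs can be read in the AWS console, or with boto3 as in this sketch; the stack name follows the usual Serverless Framework pattern `<service>-<stage>`, which is an assumption here:

```python
import boto3

cloudformation = boto3.client("cloudformation")

# Assumed stack name: "<service>-<stage>" as generated by the Serverless Framework.
stack = cloudformation.describe_stacks(StackName="my-service-dev")["Stacks"][0]

for output in stack.get("Outputs", []):
    print(output["OutputKey"], "=", output["OutputValue"])
```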
## Scaling
### AWS Lambda
By default, AWS Lambda limits the total concurrent executions across all functions within a given region to 100. This default is a safety limit that protects you from costs due to potential runaway or recursive functions during initial development and testing. To increase this limit above the default, follow the steps in *To request a limit increase for concurrent executions* in the AWS Lambda documentation.
### DynamoDB
When you create a table, you specify how much provisioned throughput capacity you want to reserve for reads and writes. DynamoDB will reserve the necessary resources to meet your throughput needs while ensuring consistent, low-latency performance. You can change the provisioned throughput, increasing or decreasing capacity as needed. This can be done via settings in the `serverless.yml`:
```yml
ProvisionedThroughput:
  ReadCapacityUnits: 1
  WriteCapacityUnits: 1
```
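Besides editing `serverless.yml` and redeploying, provisioned throughput can also be adjusted on a live table with boto3; the table name below is a placeholder:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Raise read capacity on an existing table; "processed-objects" is a placeholder name.
dynamodb.update_table(
    TableName="processed-objects",
    ProvisionedThroughput={
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 1,
    },
)
```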