
Lambda Stash

lambda-stash is an AWS Lambda script for shipping data from S3 or other cloud data sources to data stores, like Elasticsearch.

Features

Set Up

  1. Set up a Lambda deployment package that includes lambda-stash and a script with your configuration. See the example included (under example/) and the configuration documentation below to get started.

  2. Use the AWS Management Console to create a Lambda function using the Node.js 12.x runtime. Upload your package, configure it with event sources as desired (S3 buckets or CloudWatch logs). Be sure the Lambda function has an IAM role with any necessary permissions, like getting data from an S3 bucket or accessing an AWS Elasticsearch domain.

  3. Check CloudWatch Logs, where a Log Group is created the first time the Lambda function runs. Review the logs to make sure the script finishes successfully within the time allowed. You can set up a CloudWatch Alarm to be notified of errors and to track how long the function runs. The Elasticsearch client's default request timeout is 30 seconds; set config.elasticsearch.requestTimeout to a number of milliseconds or to 'Infinity' to change it.
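
For example, a configuration sketch that raises the client timeout to 60 seconds (the host value here is a placeholder, not a real domain):

```javascript
// Sketch: raise the Elasticsearch request timeout from the 30-second
// default to 60 seconds. The host below is a placeholder value.
var config = {
  elasticsearch: {
    host: 'https://search-example-es.us-east-1.es.amazonaws.com',
    region: 'us-east-1',
    useAWS: true,
    requestTimeout: 60000 // in milliseconds; 'Infinity' disables the timeout
  },
  mappings: []
};
```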

Configuration

lambda-stash is intended to be implemented by a script that provides the configuration, such as what processors to use for specific events.

See the included example in example/ and the handlers documentation below for details on how to implement the handlers.

Handlers

convertString

Converts an array of objects to key-value strings with the format: prefix key1="value1" key2="value2" ... suffix

Inputs:

Outputs:
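
As an illustration of that format (this is not the library's implementation, and the helper name is hypothetical), one record could be rendered like so:

```javascript
// Hypothetical helper: renders one record as prefix key1="value1" ... suffix.
function toKeyValueString(record, prefix, suffix) {
  var pairs = Object.keys(record).map(function (key) {
    return key + '="' + record[key] + '"';
  });
  return [prefix].concat(pairs).concat([suffix]).join(' ');
}

var line = toKeyValueString({ key1: 'value1', key2: 'value2' }, 'pre', 'post');
// line === 'pre key1="value1" key2="value2" post'
```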

decodeBase64

Decodes a Base64 encoded string.

Inputs:

Outputs:

decompressGzip

Decompresses gzipped data.

Inputs:

Outputs:

formatCloudfront

Processes parsed data from CloudFront access logs and normalizes it into key-value objects.

Raw CloudFront access log files in S3 should be processed with decompressGzip and parseTabs before using this format handler.

Inputs:

Outputs:
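
A mapping for raw CloudFront logs could therefore chain the processors in that order, mirroring the Recipes example below (the bucket name and Elasticsearch settings are placeholders):

```javascript
// Sketch: processor chain for raw CloudFront access logs in S3.
// The bucket name and index settings are placeholder values.
var mapping = {
  bucket: 'my-cloudfront-logs',
  processors: [
    'decompressGzip',
    'parseTabs',
    'formatCloudfront',
    'shipElasticsearch'
  ],
  elasticsearch: { index: 'logs', type: 'cloudfront' }
};
```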

formatCloudtrail

Processes parsed data from CloudTrail logs and normalizes it into key-value objects.

Raw CloudTrail log files in S3 should be processed with decompressGzip and parseJson before using this format handler.

Inputs:

Outputs:

formatCloudwatchLogs

Processes parsed data from CloudWatch Logs and normalizes it into key-value objects. Includes handling for Lambda function start, end, and report logs.

This handler is meant to be used with data provided through a CloudWatch Logs subscription event, which is automatically processed with decodeBase64, decompressGzip, and parseJson.

Inputs:

Outputs:

formatConfig

Processes parsed data from AWS Config logs and normalizes it into key-value objects.

Raw Config log files in S3 should be processed with decompressGzip and parseJson before using this format handler.

Inputs:

Outputs:

formatELBv1

Processes parsed data from AWS Classic Load Balancer (ELB version 1) logs and normalizes it into key-value objects.

Raw ELB log files in S3 should be processed with parseSpaces before using this format handler.

Inputs:

Outputs:

formatELBv2

Processes parsed data from AWS Application Load Balancer (ELB version 2) logs and normalizes it into key-value objects.

Raw ELB log files in S3 should be processed with parseSpaces before using this format handler.

Inputs:

Outputs:

formatS3Access

Processes parsed data from AWS Simple Storage Service (S3) access logs and normalizes it into key-value objects.

Raw S3 log files should be processed with parseSpaces before using this format handler.

Inputs:

Outputs:

getS3Object

Fetches an object from S3.

Inputs:

Outputs:

outputJsonLines

Transforms output data into a series of JSON strings delimited by a newline. This output format is compatible with the Loggly HTTP/S bulk endpoint.

Inputs:

Outputs:
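
The shape of that output can be illustrated in a few lines (this is not the library's implementation):

```javascript
// Illustration: one JSON document per line, newline-delimited.
var records = [
  { message: 'first' },
  { message: 'second' }
];
var body = records.map(function (record) {
  return JSON.stringify(record);
}).join('\n');
// body === '{"message":"first"}\n{"message":"second"}'
```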

parseCsv

Parses CSV data into an array of records, each of which is an array of column values.

Inputs:

Outputs:

parseJson

Parses a JSON string into its equivalent JavaScript object.

Inputs:

Outputs:

parseTabs

Parses tab-separated value data into an array of records, each of which is an array of column values.

Inputs:

Outputs:
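
The transformation can be illustrated like this (not the library's implementation, which may handle quoting and edge cases differently):

```javascript
// Illustration: tab-separated data parsed into an array of records,
// each record an array of column values.
var data = 'a1\tb1\tc1\na2\tb2\tc2';
var records = data.split('\n').map(function (line) {
  return line.split('\t');
});
// records[1] is ['a2', 'b2', 'c2']
```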

shipElasticsearch

Ships the data to an Elasticsearch index. The Elasticsearch _index and _type values are configurable, and request signing for AWS Elasticsearch domains is supported.

Inputs:

shipHttp

Ships the data to an HTTP/S endpoint.

Inputs:

shipTcp

Ships the data to a TCP service.

Inputs:

Recipes

Ship CloudTrail logs to AWS Elasticsearch

The following script ships CloudTrail logs from S3 to an AWS Elasticsearch domain.

```javascript
var shipper = require('lambda-stash');

var config = {
  elasticsearch: {
    host: 'https://search-test-es-abcdef...',
    region: 'us-east-1',
    useAWS: true
  },
  mappings: [
    {
      bucket: 'my-cloudtrail-logs',
      processors: [
        'decompressGzip',
        'parseJson',
        'formatCloudtrail',
        'shipElasticsearch'
      ],
      elasticsearch: {
        index: 'logs',
        type: 'cloudtrail'
      },
      dateField: 'date'
    }
  ]
};

// Export a Lambda handler that passes the event through lambda-stash.
exports.handler = function (event, context, callback) {
  shipper.handler(config, event, context, callback);
};
```