Home

Awesome

Lambda function to convert a tar-gzipped set of pnm files into one OCRed PDF and upload it to Google Drive.

Designed to be used as a processing step after scanning. See this very complete blog post for how to use it together with a scanner via a Raspberry Pi.
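For reference, an input archive of the kind described above can be produced with Python's standard tarfile module. This is only a minimal sketch: the helper name pack_scans, the flat archive layout, and the use of sorted file names as page order are assumptions for illustration, not something this project prescribes.

```python
import tarfile
from pathlib import Path


def pack_scans(scan_dir, archive_path):
    """Pack every .pnm file in scan_dir into a gzip-compressed tar archive.

    Hypothetical helper: the flat layout and sorted ordering are assumptions.
    """
    pages = sorted(Path(scan_dir).glob("*.pnm"))
    if not pages:
        raise ValueError(f"no .pnm files found in {scan_dir}")
    with tarfile.open(str(archive_path), "w:gz") as archive:
        for page in pages:
            # Store only the file name so the archive has no directory prefix.
            archive.add(str(page), arcname=page.name)
    return [page.name for page in pages]
```

Calling pack_scans("scans", "scan.tar.gz") would write scan.tar.gz and return the names of the pages it packed.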

Installation

Prerequisites

Before you start, you'll need..

Lambda function

Set up a lambda function with

You have two options for storing your OCRed PDFs:

  1. Google Drive:
    • First create a Google API project in the developers console and turn on the Google Drive API, as described here.
    • Copy the resulting client_secret.json into this project's root, run pip install oauth2client, and then run python scripts/get_drive_credentials. Copy the resulting values into the environment variables. This grants your Lambda function permission to create files in your Google Drive and to access the files it created (which it won't need). See here for more details about the rights you're granting.
    • Optional: If you want your PDFs stored in a specific folder, go to that folder in Google Drive, copy the part of the URL after /folders/, and put it into an additional environment variable named GDRIVE_FOLDER.
  2. S3: This is a lot easier, as you only need to create an S3 bucket (in the same region as your Lambda function) and add these lines to your policy (replace <dest-bucket>):
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": ["arn:aws:s3:::<dest-bucket>/*"]
    }
    
    Then set the environment variable UPLOAD_TYPE to s3 and S3_BUCKET to the name of the bucket you just created.
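The two storage options above come down to a few environment variables. The following is a rough sketch of how they could be read, not the project's actual code: the function names, the returned dictionary shape, the "gdrive" label, and the rule that anything other than UPLOAD_TYPE=s3 means Google Drive are all assumptions. The second helper shows what "the part of the URL after /folders/" means in practice.

```python
import os


def read_upload_config(env=None):
    """Return the upload destination described by the environment variables."""
    env = os.environ if env is None else env
    # Assumption: any value other than "s3" selects Google Drive.
    if env.get("UPLOAD_TYPE", "").lower() == "s3":
        bucket = env.get("S3_BUCKET")
        if not bucket:
            raise ValueError("UPLOAD_TYPE=s3 requires S3_BUCKET to be set")
        return {"type": "s3", "bucket": bucket}
    # GDRIVE_FOLDER is optional; None means no specific folder was configured.
    return {"type": "gdrive", "folder": env.get("GDRIVE_FOLDER") or None}


def folder_id_from_url(url):
    """Return the part of a Google Drive folder URL that follows /folders/."""
    marker = "/folders/"
    if marker not in url:
        raise ValueError("not a Google Drive folder URL")
    tail = url.split(marker, 1)[1]
    # Drop any query string or trailing path segment.
    return tail.split("?", 1)[0].split("/", 1)[0]
```

Passing a plain dictionary instead of os.environ makes the sketch easy to try locally before setting anything in the Lambda console.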

Now, upload the zip file to lambda:

Test

Upload a tar.gz with one or more pnm files into <source-bucket>, then add a new test event with this:

{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "<source-bucket>"
        },
        "object": {
          "key": "<filename-you-just-uploaded>.tar.gz"
        }
      }
    }
  ]
}
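To see which fields of this test event matter, here is a minimal sketch of how a handler might pull the bucket and key out of it. The function name archives_from_event is hypothetical, and the actual handler in this project may read the event differently. Keys in real S3 notifications are URL-encoded, so the sketch decodes them.

```python
from urllib.parse import unquote_plus


def archives_from_event(event):
    """Yield (bucket, key) for each tar.gz object named in an S3 event."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Real S3 notifications URL-encode the object key, so decode it first.
        key = unquote_plus(s3["object"]["key"])
        # Assumption for this sketch: anything that is not a tar.gz is skipped.
        if key.endswith(".tar.gz"):
            yield bucket, key
```

With the test event above, the generator yields a single pair: the source bucket name and the key of the archive you uploaded.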

Add trigger

From the Add triggers menu on the left, choose S3, then in the Configure trigger dialogue:

Build lambda function

cd root/of/repo
virtualenv --python=python3.6 .
pip install -r requirements.txt
scripts/zip.sh
aws s3 cp ocr-lambda.zip s3://<s3-bucket>/
aws lambda update-function-code --function-name <lambda-name> --s3-bucket <s3-bucket> --s3-key ocr-lambda.zip

Further docs