onyx-amazon-s3

Onyx plugin for Amazon S3.

Installation

In your project file:

[org.onyxplatform/onyx-amazon-s3 "0.14.5.1-SNAPSHOT"]

Functions

Input Task

In your peer boot-up namespace:

(:require [onyx.plugin.s3-input])

Catalog entry:

{:onyx/name <<TASK_NAME>>
 :onyx/plugin :onyx.plugin.s3-input/input
 :onyx/type :input
 :onyx/medium :s3
 :onyx/batch-size 20
 :onyx/max-peers 1
 :s3/bucket "mybucket"
 :s3/prefix "filter-prefix/example/"
 :s3/deserializer-fn :my.ns/deserializer-fn
 :s3/buffer-size-bytes 10000000
 :onyx/doc "Reads segments from keys in an S3 bucket."}

Lifecycle entry:

{:lifecycle/task <<TASK_NAME>>
 :lifecycle/calls :onyx.plugin.s3-input/s3-input-calls}

Attributes

| key | type | description |
|-----|------|-------------|
| :s3/bucket | string | The name of the S3 bucket to read objects from. |
| :s3/deserializer-fn | keyword | A namespaced keyword pointing to a fully qualified function that will deserialize from bytes to segments. Currently only newline-separated values are supported, so the deserializer must deserialize line by line. |
| :s3/prefix | string | Filter the keys to be read by a supplied prefix. |
| :s3/file-key | string | When set, the S3 key of the file from which the segment's line was read is included in the segment under this key. |

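
A minimal sketch of a line-oriented deserializer, assuming each line of the S3 object holds one EDN map. The namespace my.ns and the function name match the :s3/deserializer-fn value used in the catalog entry above. The EDN format and the choice to accept either a string or a byte array are assumptions made for illustration, not something the plugin prescribes.

```clojure
(ns my.ns
  (:require [clojure.edn :as edn]))

(defn deserializer-fn
  "Turns one line of an S3 object into a segment.
  Assumption: each line is an EDN map, passed either as a String
  or as a UTF-8 encoded byte array."
  [line]
  (let [text (if (string? line)
               line
               (String. ^bytes line "UTF-8"))]
    ;; edn/read-string only reads data, it never evaluates code
    (edn/read-string text)))
```

For example, (my.ns/deserializer-fn "{:id 1}") yields the segment {:id 1}.
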
Output Task

In your peer boot-up namespace:

(:require [onyx.plugin.s3-output])

Catalog entry:

{:onyx/name <<TASK_NAME>>
 :onyx/plugin :onyx.plugin.s3-output/output
 :s3/bucket <<BUCKET_NAME>>
 :s3/encryption :none
 :s3/serializer-fn :my.ns/serializer-fn
 :s3/key-naming-fn :onyx.plugin.s3-output/default-naming-fn
 :s3/prefix "filter-prefix/example/"
 :s3/prefix-separator "/"
 :s3/serialize-per-element? false
 :s3/max-concurrent-uploads 20
 :onyx/type :output
 :onyx/medium :s3
 :onyx/batch-size 20
 :onyx/doc "Writes segments to s3 files, one file per batch"}

Segments received by this task are serialized to bytes by the function named in :s3/serializer-fn and written as one file per batch. Each file is placed in the bucket at a key produced by the function named in :s3/key-naming-fn, which takes the Onyx event map and returns a string. The default naming function, :onyx.plugin.s3-output/default-naming-fn, names keys using UTC time in the following format: "yyyy-MM-dd-hh.mm.ss.SSS_batch_BATCH_UUID".
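
A custom naming function only has to accept the event map and return a string. The following is a sketch under stated assumptions: custom-naming-fn is a hypothetical name, and reading the task name from :onyx.core/task assumes that key is present in the event map (the function falls back to a placeholder when it is not).

```clojure
(ns my.ns
  (:import [java.time ZoneOffset ZonedDateTime]
           [java.time.format DateTimeFormatter]
           [java.util UUID]))

(def ^DateTimeFormatter day-format
  (DateTimeFormatter/ofPattern "yyyy-MM-dd"))

(defn custom-naming-fn
  "Takes the event map and returns an S3 key of the form
  <UTC date>_<task name>_<random UUID>.
  Assumption: the task name is available under :onyx.core/task."
  [event]
  (let [day  (.format day-format (ZonedDateTime/now ZoneOffset/UTC))
        task (some-> (:onyx.core/task event) name)]
    (str day "_" (or task "unknown-task") "_" (UUID/randomUUID))))
```

It would be referenced from the catalog entry as :s3/key-naming-fn :my.ns/custom-naming-fn. The random UUID keeps keys distinct when several batches are written on the same day.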

You can define :s3/encryption to be :aes256 if your S3 bucket has encryption enabled. The default value is :none.

When :s3/serialize-per-element? is set to true, the serializer is called on each individual segment rather than on the whole batch, and the serialized segments are separated by the string set in :s3/serialize-per-element-separator.
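
The difference between the two modes can be sketched with a pair of serializers. Both function names are hypothetical, EDN is chosen only for illustration, and the sketch assumes that the per-element serializer returns bytes just as the batch serializer does.

```clojure
(ns my.ns
  (:require [clojure.string :as str]))

(defn batch-serializer-fn
  "For :s3/serialize-per-element? false: receives the whole batch of
  segments and returns the bytes of one object, one EDN map per line."
  [segments]
  (.getBytes ^String (str/join "\n" (map pr-str segments)) "UTF-8"))

(defn element-serializer-fn
  "For :s3/serialize-per-element? true: receives a single segment and
  returns its bytes. Joining the results with the separator is left to
  the plugin."
  [segment]
  (.getBytes ^String (pr-str segment) "UTF-8"))
```

With the default newline separator, both approaches describe the same line-per-segment layout. The per-element form keeps the serializer free of any joining logic.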

An alternative upload mode can be selected via :s3/multi-upload, documented below. When this option is used, segments are partitioned into different objects by a grouping key.

Lifecycle entry:

{:lifecycle/task <<TASK_NAME>>
 :lifecycle/calls :onyx.plugin.s3-output/s3-output-calls}
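
The <<TASK_NAME>> placeholder must be the same keyword in the catalog entry and in the lifecycle entry. A minimal sketch with the placeholders filled in, where the task name :write-s3 and the bucket name "mybucket" are illustrative assumptions:

```clojure
;; Hypothetical task name shared by both entries
(def task-name :write-s3)

(def catalog-entry
  {:onyx/name task-name
   :onyx/plugin :onyx.plugin.s3-output/output
   :onyx/type :output
   :onyx/medium :s3
   :onyx/batch-size 20
   :s3/bucket "mybucket"
   :s3/serializer-fn :my.ns/serializer-fn
   :s3/key-naming-fn :onyx.plugin.s3-output/default-naming-fn})

(def lifecycle-entry
  {:lifecycle/task task-name
   :lifecycle/calls :onyx.plugin.s3-output/s3-output-calls})
```

Defining the name once and reusing it avoids a mismatch between the two entries.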

Attributes

| key | type | description |
|-----|------|-------------|
| :s3/bucket | string | The name of the S3 bucket to write to. |
| :s3/serializer-fn | keyword | A namespaced keyword pointing to a fully qualified function that will serialize the batch of segments to bytes. |
| :s3/key-naming-fn | keyword | A namespaced keyword pointing to a fully qualified function that will be supplied with the Onyx event map and produce an S3 key for the batch. |
| :s3/prefix | string | A prefix to prepend to the keys generated by :s3/key-naming-fn. |
| :s3/prefix-separator | string | A separator to add after :s3/prefix and before the result of :s3/key-naming-fn. Defaults to "/". |
| :s3/multi-upload | boolean | Flag that causes the plugin to group the batch of segments by :s3/prefix-key and upload one object per group. |
| :s3/prefix-key | any | Used with :s3/multi-upload. The segment key whose value is used to group the batch into prefixed objects. For example, with :s3/prefix-key :k, the batch [{:a 3 :k "batch1"} {:a 2 :k "batch2"}] causes two objects to be uploaded, under the prefixes "batch1" and "batch2". |
| :s3/content-type | string | Optional content type for the uploaded value. |
| :s3/encryption | keyword | Optional server-side encryption setting. One of :aes256 or :none. |
| :s3/region | string | The S3 region to write objects to. |
| :s3/endpoint-url | string | The S3 endpoint URL to connect to (for S3-compatible storage solutions). |
| :s3/max-concurrent-uploads | integer | Maximum number of simultaneous uploads. |
| :s3/serialize-per-element? | boolean | Flag for whether to serialize the entire batch at once, or to serialize per element and join the results with :s3/serialize-per-element-separator. |
| :s3/serialize-per-element-separator | string | String used to separate the per-element serialized values. Defaults to a newline character. |
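
The grouping performed under :s3/multi-upload can be pictured with plain Clojure. This is an illustration of the idea only, not the plugin's implementation. The helper group-by-prefix-key is a hypothetical name, and the sample batch extends the example from the table with a third segment.

```clojure
(defn group-by-prefix-key
  "Partitions a batch of segments by the value found under prefix-key.
  Each resulting group corresponds to one uploaded object."
  [prefix-key segments]
  (group-by #(get % prefix-key) segments))

(def batch
  [{:a 3 :k "batch1"}
   {:a 2 :k "batch2"}
   {:a 1 :k "batch1"}])

;; Two groups, so two objects: one under "batch1" and one under "batch2"
(def groups (group-by-prefix-key :k batch))
```

Here the "batch1" group holds two segments and the "batch2" group holds one.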

Acknowledgments

Many thanks to AdGoji for allowing this work to be open sourced and contributed back to the Onyx Platform community.

Contributing

Pull requests into the master branch are welcomed.

License

Copyright © 2017 Distributed Masonry LLC

Distributed under the Eclipse Public License, the same as Clojure.