
Content

  1. Introduction
  2. Features
  3. Deployment
     3.1. Basic
     3.2. Docker
       3.2.1. Standalone
       3.2.2. Distributed
         3.2.2.1. Additional Node
         3.2.2.2. Entry Node
  4. Configuration
     4.1. Specific Options
     4.2. Tuning
       4.2.1. Concurrency
       4.2.2. Base Storage Driver Usage Warnings
  5. Usage
     5.1. Event Stream Operations
       5.1.1. Create
         5.1.1.1. Transactional
       5.1.2. Read
       5.1.3. Update
       5.1.4. Delete
       5.1.5. End-to-end Latency
     5.2. Byte Stream Operations
       5.2.1. Create
       5.2.2. Read
       5.2.3. Update
       5.2.4. Delete
     5.3. Misc
       5.3.1. Manual Scaling
       5.3.2. Multiple Destination Streams
  6. Open Issues
  7. Development
     7.1. Build
     7.2. Test
       7.2.1. Automated
         7.2.1.1. Unit
         7.2.1.2. Integration
         7.2.1.3. Functional
       7.2.2. Manual

1. Introduction

Mongoose and Pravega use quite different concepts, so it is necessary to define how the Pravega-specific terms map to the Mongoose abstractions.

| Pravega | Mongoose |
|---------|----------|
| Stream | Item Path or Data Item |
| Scope | Storage Namespace |
| Event | Data Item |
| Stream Segment | N/A |

2. Features

3. Deployment

3.1. Basic

Java 11+ is required to build/run.

  1. Get the latest mongoose-base jar from the Maven repo and put it into your working directory. Note the particular version, which is referred to as BASE_VERSION below.

  2. Get the latest mongoose-storage-driver-preempt jar from the Maven repo and put it into the ~/.mongoose/<BASE_VERSION>/ext directory.

  3. Get the latest mongoose-storage-driver-pravega jar from the Maven repo and put it into the ~/.mongoose/<BASE_VERSION>/ext directory.
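
For reference, a sketch of fetching the jars with curl, assuming the artifacts are published under the com.github.emc-mongoose group on Maven Central (verify the exact coordinates and versions against the actual repo):

# Assumed coordinates and versions -- check the Maven repo for the actual ones
BASE_VERSION=<BASE_VERSION>
REPO=https://repo1.maven.org/maven2/com/github/emc-mongoose
curl -LO ${REPO}/mongoose-base/${BASE_VERSION}/mongoose-base-${BASE_VERSION}.jar
mkdir -p ~/.mongoose/${BASE_VERSION}/ext
# The driver extension versions may differ from BASE_VERSION
curl -L -o ~/.mongoose/${BASE_VERSION}/ext/mongoose-storage-driver-preempt.jar \
    ${REPO}/mongoose-storage-driver-preempt/<DRIVER_VERSION>/mongoose-storage-driver-preempt-<DRIVER_VERSION>.jar
curl -L -o ~/.mongoose/${BASE_VERSION}/ext/mongoose-storage-driver-pravega.jar \
    ${REPO}/mongoose-storage-driver-pravega/<DRIVER_VERSION>/mongoose-storage-driver-pravega-<DRIVER_VERSION>.jar

Then run Mongoose with the Pravega driver: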

java -jar mongoose-base-<BASE_VERSION>.jar \
    --storage-driver-type=pravega \
    --storage-namespace=scope1 \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --storage-net-node-port=9090 \
    --load-batch-size=100 \
    --storage-driver-limit-queue-input=10000 \
    ...

3.2. Docker

3.2.1. Standalone

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --storage-namespace=scope1 \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --load-batch-size=100 \
    --storage-driver-limit-queue-input=10000 \
    ...

Use emcmongoose/mongoose-storage-driver-pravega:4.2.29 for Pravega 0.7 or earlier.

3.2.2. Distributed

3.2.2.1. Additional Node

docker run \
    --network host \
    --expose 1099 \
    emcmongoose/mongoose-storage-driver-pravega \
    --run-node

3.2.2.2. Entry Node

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --load-step-node-addrs=<ADDR1,ADDR2,...> \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --storage-namespace=scope1 \
    --load-batch-size=100 \
    --storage-driver-limit-queue-input=10000 \
    ...

4. Configuration

4.1. Specific Options

| Name | Type | Default Value | Description |
|------|------|---------------|-------------|
| storage-driver-control-scope | boolean | true | Allow the driver to try to create the scope |
| storage-driver-control-timeoutMillis | integer | 2000 | The timeout for any Pravega Controller API call |
| storage-driver-create-timestamp | boolean | false | Write 8 bytes of timestamp at the beginning of each event. Required for the e2e latency mode, as there is currently no option to pass metadata along with an event |
| storage-driver-event-key-enabled | boolean | false | Specifies whether Mongoose generates its own routing key during the event creation |
| storage-driver-event-key-count | integer | 0 | The maximum count of unique routing keys to use during the event creation (may be considered as a routing key period). The value 0 means using a unique routing key for each new event |
| storage-driver-event-timeoutMillis | integer | 100 | The event read timeout in milliseconds |
| storage-driver-read-e2eMode | boolean | false | Enables the e2e mode for reads. This enables the tail read as well as writing the latency data to a CSV file |
| storage-driver-read-tail | boolean | false | Enables the tail read. The catch-up read is used by default |
| storage-driver-scaling-type | enum | "fixed" | The scaling policy type to use (fixed/event_rate/kbyte_rate). See the Pravega documentation for details |
| storage-driver-scaling-rate | integer | 0 | The scaling policy target rate. Measured in events per second or kilobytes per second, depending on the scaling policy type |
| storage-driver-scaling-factor | integer | 0 | The scaling policy factor. From the Pravega javadoc: the maximum number of splits of a segment for a scale-up event |
| storage-driver-scaling-segments | integer | 1 | From the Pravega javadoc: the minimum number of segments that a stream can have independent of the number of scale-down events |
| storage-driver-stream-data | enum | "events" | Work on events or byte streams (if set to "bytes") |
| storage-net-node-addrs | list of strings | 127.0.0.1 | The list of the Pravega storage node addresses to use for the load |
| storage-net-node-port | integer | 9090 | The port of the Pravega storage nodes; should be explicitly set to 9090 (the value used by Pravega by default) |
| storage-net-maxConnPerSegmentstore | integer | 5 | The number of connections per each Pravega segment store |
| storage-net-node-conn-pooling | boolean | true | Whether to use the connection pooling for the event writers. See the corresponding Pravega issue for details |
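
For illustration, a hypothetical command combining several of these options: it writes the events using at most 16 distinct routing keys into a stream created with the event_rate scaling policy (all the option values here are arbitrary examples, not recommendations):

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --storage-namespace=scope1 \
    --storage-driver-event-key-enabled \
    --storage-driver-event-key-count=16 \
    --storage-driver-scaling-type=event_rate \
    --storage-driver-scaling-rate=1000 \
    --storage-driver-scaling-segments=4 \
    --item-output-path=stream1 \
    --item-data-size=1KB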

4.2. Tuning

4.2.1. Concurrency

There are two configuration options controlling the load operations concurrency level: storage-driver-limit-concurrency and storage-driver-threads (both are used in the examples throughout this document).
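
For example, a hypothetical command capping both (the values are arbitrary):

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --storage-namespace=scope1 \
    --storage-driver-limit-concurrency=100 \
    --storage-driver-threads=100 \
    --item-output-path=stream1 \
    --item-data-size=10KB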

4.2.2. Base Storage Driver Usage Warnings

See the design notes

5. Usage

5.1. Event Stream Operations

Mongoose performs the load operations on the events when the configuration option item-type is set to data.

5.1.1. Create

Write new events into the specified stream(s). The operation latency is measured as the time between the moment the corresponding writeEvent call returns and the moment its completion callback is triggered.
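
A minimal sketch of such a run, by analogy with the other examples in this document (the stream name and sizes are arbitrary):

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --storage-namespace=scope1 \
    --item-output-path=stream1 \
    --item-data-size=10KB \
    --load-batch-size=100 \
    --storage-driver-limit-queue-input=10000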

5.1.1.1. Transactional

Using transactions to create the events allows writing the events in the batch mode. The maximum count of the events per transaction is defined by the load-batch-size configuration option value.

Example:

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --storage-namespace=scope1 \
    --storage-driver-event-batch \
    --load-step-limit-count=100000 \
    --load-batch-size=1024 \
    --item-output-path=eventsStream1 \
    --item-data-size=10KB

Note that in this mode the operation (transaction) latency is equal to the duration.

5.1.2. Read

Notes:

There is a configuration parameter called storage-driver-read-timeoutMillis. According to the Pravega documentation, it only takes effect when there is no event available in the stream: readNextEvent() blocks for the specified time in milliseconds. So, in theory, the values 0 and 1 should work just fine, but so far they do not. In practice, this value should be somewhere around 2000 ms (the Pravega default value).

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --storage-namespace=scope1 \
    --item-input-path=stream3 \
    --load-batch-size=100 \
    --storage-driver-limit-concurrency=0 \
    --load-op-type=read \
    --load-op-recycle  

Right now, only single-stream reading is supported. Each thread from the --storage-driver-threads parameter takes a reader from a joint pool. All the readers are in the same reader group, so it is the user's responsibility to choose the thread count such that there are no readers without EventSegmentReaders. That happens when no segments are assigned to a reader, as each segment is assigned to only one reader within a reader group.

It is important to know that Mongoose's item generator is driver-agnostic: it has no knowledge that a reader may effectively read nothing because it has no assigned EventSegmentReader, yet it still dispatches that reader's portion of the items. Thus, if you set, e.g., 5 Mongoose threads while having 1 stream segment and --load-batch-size=100, you will not get 100 successful operations, as some of them will not retrieve any data. Considering this, you may want to use time limits instead of an operation count limit for automation.

To achieve the maximum efficiency, the main point to consider is auto-scaling. If it is disabled, then the best thread count equals the number of segments in the stream, which stays constant during the read. With auto-scaling enabled, the best match is yet to be found.
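
For instance, a sketch of a read run against a stream known to have 4 segments, matching the thread count to the segment count and using a time limit instead of an operation count (the values are illustrative):

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --storage-namespace=scope1 \
    --item-input-path=stream3 \
    --load-op-type=read \
    --load-op-recycle \
    --storage-driver-threads=4 \
    --load-step-limit-time=5m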

There are two ways the reading can be done: the catch-up read (the default) and the tail read (see the storage-driver-read-tail option above).

5.1.3. Update

Not supported. A stream append may be performed using the create load operation type and the same stream previously used to write the events.
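
A sketch of such an append run, writing additional events into the already existing stream1 (by analogy with the create examples above):

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --storage-namespace=scope1 \
    --item-output-path=stream1 \
    --item-data-size=10KB \
    --load-step-limit-count=100000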

5.1.4. Delete

Not supported.

5.1.5. End-to-end Latency

The end-to-end latency is the time span between the CREATE and READ operations executed for the same item. It may be measured using the e2e latency mode:

  1. Start writing the messages to a stream with the timestamps recording enabled. Example command:
     docker run \
       --network host \
       emcmongoose/mongoose-storage-driver-pravega \
       --storage-namespace=scope1 \
       --load-service-threads=1 \
       --storage-driver-limit-concurrency=0 \
       --item-data-size=10B \
       --item-output-path=stream1 \
       --load-batch-size=1000 \
       --storage-driver-limit-queue-input=1000 \
       --storage-driver-create-timestamp \
       --load-op-limit-rate=100000

The last parameter is needed to make the writing slower than the reading, which is not always the case otherwise. While using the e2e latency mode, the maximum throughput is not the main point of interest, so this behaviour is acceptable.

  2. Start the e2eMode reading from the same stream:
     docker run \
      --network host \
      emcmongoose/mongoose-storage-driver-pravega \
      --storage-namespace=scope1 \
      --load-batch-size=1000 \
      --load-service-threads=1 \
      --storage-driver-limit-concurrency=0 \
      --load-op-type=read \
      --item-input-path=stream1 \
      --storage-driver-read-e2eMode \
      --load-op-recycle \
      --load-step-id=e2e_test
  3. Check the end-to-end time data in the log/e2e_test/op.trace.csv log file. The data is in the CSV format with 3 columns.

Note: the end-to-end time data will not be aggregated in the distributed mode.

5.2. Byte Stream Operations

Mongoose performs the load operations on the streams when the configuration option storage-driver-stream-data is set to bytes. This means that whole streams are accounted as the items.

Currently unsupported for Pravega 0.8+.

5.2.1. Create

Creates the byte streams. Each created byte stream is filled with content up to the size determined by the item-data-size option. The create operation fails with the status code #7 if the stream already exists.

Example:

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --storage-driver-stream-data=bytes \
    --storage-namespace=scope1 \
    --storage-driver-limit-concurrency=100 \
    --storage-driver-threads=100

5.2.2. Read

Reads the byte streams.

Example:

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --item-input-file=streams.csv \
    --read \
    --storage-driver-stream-data=bytes \
    --storage-driver-limit-concurrency=10 \
    --storage-driver-threads=10 \
    --storage-namespace=scope1

It is also possible to perform the byte stream read without the input stream items file:

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --item-input-path=scope1 \
    --read \
    --storage-driver-stream-data=bytes \
    --storage-driver-limit-concurrency=10 \
    --storage-driver-threads=10 \
    --storage-namespace=scope1

All the streams in the specified scope are listed and their current size is determined before the reading.

5.2.3. Update

Not implemented yet.

5.2.4. Delete

Due to the Pravega design, a stream must be sealed before it can be deleted, so the driver also seals the stream as a part of the delete operation.
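
A sketch of such a delete run, assuming the streams to delete are listed in the streams.csv items file (by analogy with the read example above):

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --item-input-file=streams.csv \
    --load-op-type=delete \
    --storage-driver-stream-data=bytes \
    --storage-namespace=scope1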

5.3. Misc

5.3.1. Manual Scaling

It is required to perform a manual destination stream scaling while the event writing load is in progress, in order to see whether the rate changes. An additional load step may be used to perform such scaling. In order to not produce any significant additional load, that step should be explicitly configured to do a minimal amount of work:
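
A hypothetical sketch of such a minimal step: it re-declares the same destination stream with a different fixed segment count while writing a single tiny event (the option values are illustrative and not taken from the actual scenario):

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --storage-namespace=scope1 \
    --item-output-path=stream1 \
    --item-data-size=1 \
    --load-step-limit-count=1 \
    --storage-driver-scaling-type=fixed \
    --storage-driver-scaling-segments=8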

For more details see the corresponding scenario content.

5.3.2. Multiple Destination Streams

The configuration expression language feature may be used to specify multiple destination streams to write the events to. An example of the command writing the events into 1000 destination streams (in random order):

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pravega \
    --storage-namespace=scope1 \
    --item-data-size=1000 \
    --item-output-path=stream-%p\{1000\;1\}

6. Open Issues

| Issue | Description |
|-------|-------------|

7. Development

7.1. Build

Note the Pravega commit # that should be used to build the corresponding Mongoose plugin. Specify the required Pravega commit # in the build.gradle file, then run:

./gradlew clean jar

7.2. Test

7.2.1. Automated

7.2.1.1. Unit

./gradlew clean test

7.2.1.2. Integration

docker run -d --name=storage --network=host pravega/pravega:<PRAVEGA_VERSION> standalone
./gradlew integrationTest

7.2.1.3. Functional

./gradlew jar
export SUITE=api.storage
TEST=create_event_stream ./gradlew robotest
TEST=create_byte_streams ./gradlew robotest
TEST=read_byte_streams ./gradlew robotest
TEST=read_all_byte_streams ./gradlew robotest
TEST=create_event_transactional_stream ./gradlew robotest

7.2.2. Manual

  1. Build the storage driver.
  2. Copy the storage driver's jar file into Mongoose's ext directory:
     cp -f build/libs/mongoose-storage-driver-pravega-*.jar ~/.mongoose/<MONGOOSE_BASE_VERSION>/ext/
     Note that the Pravega storage driver depends on the Preempt storage driver extension, so it should also be put into the ext directory.
  3. Build and install the corresponding Pravega version:
     ./gradlew pravegaExtract
  4. Run the Pravega standalone node:
     build/pravega_/bin/pravega-standalone
  5. Run Mongoose's default scenario with some specific command-line arguments:
     java -jar mongoose-<MONGOOSE_BASE_VERSION>.jar \
         --storage-driver-type=pravega \
         --storage-net-node-port=9090 \
         --storage-driver-limit-concurrency=10 \
         --item-output-path=goose-events-stream-0

8. CI

CI is located here:

8.1. CI Runner

Robotests often fail to complete when the shared CI runner is used, so one can create and attach their own CI runner with more resources.

  1. Start the runner:
docker run -d --name gitlab-runner --restart always \
    -v /srv/gitlab-runner/config:/etc/gitlab-runner \
    -v /var/run/docker.sock:/var/run/docker.sock \
    gitlab/gitlab-runner:latest
  2. Register the runner:
docker run --rm -it -v /srv/gitlab-runner/config:/etc/gitlab-runner gitlab/gitlab-runner register -n \
    --url https://gitlab.com/ \
    --registration-token <token> \
    --executor "docker" \
    --docker-image "docker:18.09.7" \
    --description "mong-runner" \
    --docker-privileged \
    --tag-list pravega

The main reason this is duplicated here is to serve as a reminder: since we build an image using dind, we need to set the privileged mode for the internal Docker process.