Awesome

Apache Pulsar extension for Mongoose

Content

Introduction
Features
Deployment 3.1. Basic 3.2. Docker 3.2.1. Standalone 3.2.2. Distributed 3.2.2.1. Additional Node 3.2.2.2. Entry Node
Configuration 4.1. Specific Options 4.2. Tuning
Usage 5.1. Message Operations 5.1.1. Create 5.1.2. Read 5.1.2.1. Basic 5.1.2.2. Tail 5.1.2.3. End-to-end Latency 5.2. Topic Operations 5.2.1. Create 5.2.2. Read 5.2.3. Update 5.2.4. Delete
Open Issues
Development 7.1. Build 7.2. Test 7.2.1. Automated 7.2.1.1. Unit 7.2.1.2. Integration 7.2.1.3. Functional 7.2.2. Manual

1. Introduction

Pulsar	Mongoose
Message	Data Item
Topic	Item Path or Data Item

2. Features

Authentication: https://mongoose-issues.atlassian.net/browse/PULSAR-4
SSL/TLS: https://mongoose-issues.atlassian.net/browse/PULSAR-1
Item Types:
- data: corresponds to a message
- path: corresponds to a topic
- token: not supported
Supported load operations:
- create (messages)
- read (messages)
- updae (topics appending, TODO)
- delete (topics, TODO)
Storage-specific:
- End-to-end latency measurement

3. Deployment

3.1. Basic

Java 11+ is required to build/run.

Get the latest mongoose-base jar from the maven repo and put it to your working directory. Note the particular version, which is referred as BASE_VERSION below.
Get the latest mongoose-storage-driver-coop jar from the maven repo and put it to the ~/.mongoose/<BASE_VERSION>/ext directory.
Get the latest mongoose-storage-driver-pulsar jar from the maven repo and put it to the ~/.mongoose/<BASE_VERSION>/ext directory.

java -jar mongoose-base-<BASE_VERSION>.jar \
    --storage-driver-type=pulsar \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --storage-net-node-port=6650 \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    ...

3.2. Docker

3.2.1. Standalone

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    ...

3.2.2. Distributed

3.2.2.1. Additional Node

docker run \
    --network host \
    --expose 1099 \
    emcmongoose/mongoose-storage-driver-pulsar \
    --run-node

3.2.2.2. Entry Node

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --load-step-node-addrs=<ADDR1,ADDR2,...> \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    ...

4. Configuration

Reference

4.1. Specific Options

Name	Type	Default Value	Description
storage-driver-create-batch-enabled	boolean	`true`	Producer batching
storage-driver-create-batch-delayMicros	long integer	1000	Maximum publish latency (microseconds)
storage-driver-create-compression	enum	`none`	Should compress data upon messages create or not (default). The available compression types are defined by the Pulsar client
storage-driver-create-timestamp	boolean	`false`	If enabled, will record the message creation timestamp into the message's metadata. Required for end-to-end latency measurement
storage-driver-read-tail	boolean	`false`	Should read the latest message of the topic or read from the topic beginning (default). Should be `true` for end-to-end latency measurement
storage-net-tcpNoDelay	boolean	`true`	The option specifies whether the server disables the delay of sending successive small packets on the network.
storage-net-timeoutMilliSec	integer	`0`	Connection timeout. `0` means no timeout
storage-net-node-addrs	list of strings	`127.0.0.1`	The list of the storage node IPs or hostnames to use for HTTP load. May include port numbers.
storage-net-node-port	integer	`6650`	The common port number to access the storage nodes, may be overriden adding the port number to the storage-driver-addrs, for example: "127.0.0.1:6650,127.0.0.1:6651,..."

4.2. Tuning

load-batch-size Determines how many operations the driver will try to submit at once. Large values may increase both the throughput and the memory consumption.
storage-driver-concurrency-limit Determines how many operations may be in flight at every moment of the time. Specifying 0 will cause the "burst mode", when the driver submits as many operations as possible regardless their completion. The unlimited concurrency is useful for the sustained rate measurement but may cause either Pulsar or Mongoose inconsistent state.
storage-driver-threads Determines the Pulsar client IO worker threads count.

5. Usage

5.1. Message Operations

5.1.1. Create

Example, write 1KB messages to the topic "topic1" in the Pulsar instance w/ address 12.34.56.78:

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=12.34.56.78 \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    --item-data-size=1KB \
    --item-output-path=topic1

5.1.2. Read

Notes:

Generally, load-op-recycle option should be set to true to make the messages reading working.
Mongoose couldn't determine the end of the topic(s), so it's mandatory to specify the count/time limit.

5.1.2.1. Basic

Example, read 1M messages from the beginning of the topic "topic1":

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=12.34.56.78 \
    --load-batch-size=100 \
    --storage-driver-limit-concurrency=100 \
    --read \ 
    --item-input-path=topic1 \
    --load-op-recycle \
    --load-op-limit-count=1000000

5.1.2.2. Tail

Example, read all new messages from the topic "topic1" during the 1 minute:

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=12.34.56.78 \
    --load-batch-size=100 \
    --storage-driver-limit-concurrency=100 \
    --read \ 
    --item-input-path=topic1 \
    --storage-driver-read-tail \
    --load-op-recycle \
    --load-step-limit-time=1m

5.1.2.3. End-To-End Latency

Start writing the messages to some topic with enabled timestamps recording. Example command:

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --item-data-size=1KB \
    --item-output-path=topic1 \
    --load-batch-size=1000 \
    --storage-driver-create-timestamp

Start the tail read from the same topic:

docker run \
    --network host \
    --volume $PWD/log:/root/.mongoose/4.2.16/log
    emcmongoose/mongoose-storage-driver-pulsar \
    --load-batch-size=1000 \
    --read \
    --item-input-path=topic1 \
    --storage-driver-read-tail \
    --load-op-recycle \
    --load-step-id=e2e_test

Check the end-to-end time data in the log/e2e_test/op.trace.csv log file. The data is in the CSV format with 3 columns:

internal message id
message payload size
end-to-end time in milliseconds

Note: the end-to-end time data will not be aggregated in the distributed mode.

5.2. Topic Operations

TODO: https://mongoose-issues.atlassian.net/browse/PULSAR-2

5.2.1. Create

TODO

5.2.2. Read

TODO

5.2.3. Update

TODO

5.2.4. Delete

TODO

6. Open Issues

Please refer to the issue tracker

7. Development

7.1. Build

./gradlew clean jar

7.2. Test

7.2.1. Automated

7.2.1.1. Unit

./gradlew clean test

7.2.1.2. Integration

docker run -it \
    -p 6650:6650 \
    -p 8080:8080 \
    apachepulsar/pulsar:<PULSAR_VERSION> \
    bin/pulsar standalone
./gradlew integrationTest

7.2.1.3. Functional

./gradlew jar
export SUITE=api.storage
TEST=<TODO_TEST_NAME> ./gradlew robotest

7.2.2. Manual

Build the storage driver
Copy the storage driver's jar file into the mongoose's ext directory:

cp -f build/libs/mongoose-storage-driver-pulsar-*.jar ~/.mongoose/<MONGOOSE_BASE_VERSION>/ext/

Note that the Pulsar storage driver depends on the Coop Storage Driver extension so it should be also put into the ext directory 3. Install and run the Apache Pulsar

docker run -it \
    -p 6650:6650 \
    -p 8080:8080 \
    apachepulsar/pulsar:<PULSAR_VERSION> \
    bin/pulsar standalone

Run Mongoose's default scenario with some specific command-line arguments:

java -jar mongoose-<MONGOOSE_BASE_VERSION>.jar \
    --storage-driver-type=pulsar \
    --storage-net-node-port=6650 \
    --storage-driver-limit-concurrency=10 \
    --item-output-path=topic-0