Apache Pulsar extension for Mongoose

Content

  1. Introduction
  2. Features
  3. Deployment
     3.1. Basic
     3.2. Docker
       3.2.1. Standalone
       3.2.2. Distributed
         3.2.2.1. Additional Node
         3.2.2.2. Entry Node
  4. Configuration
     4.1. Specific Options
     4.2. Tuning
  5. Usage
     5.1. Message Operations
       5.1.1. Create
       5.1.2. Read
         5.1.2.1. Basic
         5.1.2.2. Tail
         5.1.2.3. End-to-end Latency
     5.2. Topic Operations
       5.2.1. Create
       5.2.2. Read
       5.2.3. Update
       5.2.4. Delete
  6. Open Issues
  7. Development
     7.1. Build
     7.2. Test
       7.2.1. Automated
         7.2.1.1. Unit
         7.2.1.2. Integration
         7.2.1.3. Functional
       7.2.2. Manual

1. Introduction

| Pulsar | Mongoose |
|---|---|
| Message | Data Item |
| Topic | Item Path or Data Item |

2. Features

3. Deployment

3.1. Basic

Java 11+ is required to build/run.

  1. Get the latest mongoose-base jar from the Maven repo and put it into your working directory. Note the particular version, which is referred to as BASE_VERSION below.

  2. Get the latest mongoose-storage-driver-coop jar from the Maven repo and put it into the ~/.mongoose/<BASE_VERSION>/ext directory.

  3. Get the latest mongoose-storage-driver-pulsar jar from the Maven repo and put it into the ~/.mongoose/<BASE_VERSION>/ext directory.

java -jar mongoose-base-<BASE_VERSION>.jar \
    --storage-driver-type=pulsar \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --storage-net-node-port=6650 \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    ...
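
Steps 2 and 3 above can be scripted. The following is a minimal sketch, assuming the extension jars have already been downloaded to the local disk (no download step is included, since the repository URL is not given here). The function name install_mongoose_ext is hypothetical and is not part of Mongoose.

```bash
# Hypothetical helper (not part of Mongoose): copy already-downloaded
# extension jars into ~/.mongoose/<BASE_VERSION>/ext, creating the directory
# if it does not exist yet. Prints the resulting ext directory on success.
install_mongoose_ext() {
    local base_version="$1"
    shift
    local ext_dir="${HOME}/.mongoose/${base_version}/ext"
    mkdir -p "${ext_dir}" || return 1
    local jar
    for jar in "$@"; do
        # Fail early instead of silently skipping a missing jar
        if [ ! -f "${jar}" ]; then
            echo "missing jar: ${jar}" >&2
            return 1
        fi
        cp -f "${jar}" "${ext_dir}/"
    done
    echo "${ext_dir}"
}

# Usage (the jar file names are placeholders for the files you downloaded):
# install_mongoose_ext <BASE_VERSION> \
#     mongoose-storage-driver-coop-<VERSION>.jar \
#     mongoose-storage-driver-pulsar-<VERSION>.jar
```

The helper stops at the first missing jar, so a typo in a file name is reported before Mongoose is started with an incomplete ext directory.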

3.2. Docker

3.2.1. Standalone

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    ...

3.2.2. Distributed

3.2.2.1. Additional Node

docker run \
    --network host \
    --expose 1099 \
    emcmongoose/mongoose-storage-driver-pulsar \
    --run-node

3.2.2.2. Entry Node

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --load-step-node-addrs=<ADDR1,ADDR2,...> \
    --storage-net-node-addrs=<NODE_IP_ADDRS> \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    ...

4. Configuration

Reference

4.1. Specific Options

| Name | Type | Default Value | Description |
|---|---|---|---|
| storage-driver-create-batch-enabled | boolean | true | Producer batching |
| storage-driver-create-batch-delayMicros | long integer | 1000 | Maximum publish latency (microseconds) |
| storage-driver-create-compression | enum | none | The compression to apply to the data when creating messages; none (default) disables compression. The available compression types are defined by the Pulsar client |
| storage-driver-create-timestamp | boolean | false | If enabled, records the message creation timestamp in the message's metadata. Required for end-to-end latency measurement |
| storage-driver-read-tail | boolean | false | Whether to read the latest messages of the topic (true) or to read from the beginning of the topic (default). Should be true for end-to-end latency measurement |
| storage-net-tcpNoDelay | boolean | true | Specifies whether the server disables the delay of sending successive small packets on the network |
| storage-net-timeoutMilliSec | integer | 0 | Connection timeout. 0 means no timeout |
| storage-net-node-addrs | list of strings | 127.0.0.1 | The list of the storage node IPs or hostnames to use for the load. May include port numbers |
| storage-net-node-port | integer | 6650 | The common port number to access the storage nodes. May be overridden by adding the port number to the storage-net-node-addrs values, for example: "127.0.0.1:6650,127.0.0.1:6651,..." |
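
The port override rule described for storage-net-node-port can be illustrated with a small sketch. It is only an illustration of the rule as stated in the option description, not the driver's actual implementation. The function name resolve_node_addrs is hypothetical, and IPv6 literals are not handled.

```bash
# Hypothetical illustration (not the driver's code): append the common port
# to every address that does not carry its own port.
# The function body is a subshell, so the IFS and globbing changes stay local.
resolve_node_addrs() (
    addrs="$1"
    default_port="$2"
    result=""
    IFS=','   # split the address list on commas
    set -f    # disable pathname expansion while splitting
    for addr in $addrs; do
        case "$addr" in
            *:*) ;;                              # explicit port wins
            *)   addr="$addr:$default_port" ;;   # fall back to the common port
        esac
        result="${result:+$result,}$addr"
    done
    printf '%s\n' "$result"
)

# Usage:
# resolve_node_addrs "127.0.0.1,10.0.0.2:6651" 6650
```

With the arguments shown in the usage comment, the first address receives the common port 6650 while the second keeps its own port 6651.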

4.2. Tuning

5. Usage

5.1. Message Operations

5.1.1. Create

Example: write 1KB messages to the topic "topic1" in the Pulsar instance at the address 12.34.56.78:

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=12.34.56.78 \
    --load-batch-size=1000 \
    --storage-driver-limit-concurrency=1000 \
    --item-data-size=1KB \
    --item-output-path=topic1

5.1.2. Read

Notes:

  1. Generally, the load-op-recycle option should be set to true for message reading to work.
  2. Mongoose cannot determine the end of the topic(s), so it is mandatory to specify a count or time limit.

5.1.2.1. Basic

Example, read 1M messages from the beginning of the topic "topic1":

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=12.34.56.78 \
    --load-batch-size=100 \
    --storage-driver-limit-concurrency=100 \
    --read \
    --item-input-path=topic1 \
    --load-op-recycle \
    --load-op-limit-count=1000000

5.1.2.2. Tail

Example, read all new messages from the topic "topic1" for 1 minute:

docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --storage-net-node-addrs=12.34.56.78 \
    --load-batch-size=100 \
    --storage-driver-limit-concurrency=100 \
    --read \
    --item-input-path=topic1 \
    --storage-driver-read-tail \
    --load-op-recycle \
    --load-step-limit-time=1m

5.1.2.3. End-To-End Latency

  1. Start writing the messages to some topic with timestamp recording enabled. Example command:
docker run \
    --network host \
    emcmongoose/mongoose-storage-driver-pulsar \
    --item-data-size=1KB \
    --item-output-path=topic1 \
    --load-batch-size=1000 \
    --storage-driver-create-timestamp
  2. Start the tail read from the same topic:
docker run \
    --network host \
    --volume $PWD/log:/root/.mongoose/4.2.16/log \
    emcmongoose/mongoose-storage-driver-pulsar \
    --load-batch-size=1000 \
    --read \
    --item-input-path=topic1 \
    --storage-driver-read-tail \
    --load-op-recycle \
    --load-step-id=e2e_test
  3. Check the end-to-end time data in the log/e2e_test/op.trace.csv log file. The data is in CSV format with 3 columns.

Note: the end-to-end time data will not be aggregated in the distributed mode.
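
Since the data is not aggregated automatically, it may be convenient to summarize the trace file by hand. The following is a minimal sketch, assuming that the end-to-end time is stored as a non-negative integer in one of the CSV columns. The function name e2e_stats and its column index argument are hypothetical. Check which column actually holds the end-to-end time in your op.trace.csv before relying on the numbers.

```bash
# Hypothetical helper: print count, mean and max of a numeric CSV column.
#   $1 - path to the trace file (e.g. log/e2e_test/op.trace.csv)
#   $2 - 1-based index of the column holding the end-to-end time (an assumption
#        to be verified against the actual file)
# Rows whose selected column is not a plain integer (e.g. a header) are skipped.
e2e_stats() {
    awk -F',' -v col="$2" '
        $col ~ /^[0-9]+$/ {
            n++
            sum += $col
            if ($col + 0 > max + 0) max = $col
        }
        END {
            if (n > 0) printf "count=%d mean=%.1f max=%d\n", n, sum / n, max
        }
    ' "$1"
}

# Usage:
# e2e_stats log/e2e_test/op.trace.csv <COLUMN_INDEX>
```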

5.2. Topic Operations

TODO: https://mongoose-issues.atlassian.net/browse/PULSAR-2

5.2.1. Create

TODO

5.2.2. Read

TODO

5.2.3. Update

TODO

5.2.4. Delete

TODO

6. Open Issues

Please refer to the issue tracker

7. Development

7.1. Build

./gradlew clean jar

7.2. Test

7.2.1. Automated

7.2.1.1. Unit

./gradlew clean test

7.2.1.2. Integration

docker run -it \
    -p 6650:6650 \
    -p 8080:8080 \
    apachepulsar/pulsar:<PULSAR_VERSION> \
    bin/pulsar standalone
./gradlew integrationTest
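
The integration tests need the Pulsar standalone instance started above to be accepting connections, and the container may take some time to become ready. The following is a minimal sketch of a retry helper for scripting this. The function name wait_until is hypothetical, and the readiness check in the usage comment assumes an nc (netcat) variant that supports the -z option.

```bash
# Hypothetical helper: re-run a command once per second until it succeeds
# or the timeout (in seconds) is reached.
#   $1   - timeout in seconds
#   $2.. - the command to retry
# Returns 0 as soon as the command succeeds, 1 on timeout.
wait_until() {
    local timeout_sec="$1"
    shift
    local elapsed=0
    until "$@"; do
        if [ "${elapsed}" -ge "${timeout_sec}" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Usage (assumes Pulsar runs in another terminal or in detached mode, and
# that nc supports the -z option):
# wait_until 120 nc -z localhost 6650 && ./gradlew integrationTest
```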

7.2.1.3. Functional

./gradlew jar
export SUITE=api.storage
TEST=<TODO_TEST_NAME> ./gradlew robotest

7.2.2. Manual

  1. Build the storage driver
  2. Copy the storage driver's jar file into Mongoose's ext directory:
cp -f build/libs/mongoose-storage-driver-pulsar-*.jar ~/.mongoose/<MONGOOSE_BASE_VERSION>/ext/

Note that the Pulsar storage driver depends on the Coop Storage Driver extension, so its jar should also be put into the ext directory.

  3. Install and run Apache Pulsar:

docker run -it \
    -p 6650:6650 \
    -p 8080:8080 \
    apachepulsar/pulsar:<PULSAR_VERSION> \
    bin/pulsar standalone
  4. Run Mongoose's default scenario with some specific command-line arguments:
java -jar mongoose-<MONGOOSE_BASE_VERSION>.jar \
    --storage-driver-type=pulsar \
    --storage-net-node-port=6650 \
    --storage-driver-limit-concurrency=10 \
    --item-output-path=topic-0