MongoDB-VLS

MongoDB-VLS is a prototype implementation of VLS (Virtual Lightweight Snapshots) in MongoDB v2.5.5. VLS is a mechanism that enables consistent analytics in NoSQL stores without blocking incoming updates.

Links and References

For more detailed information about the VLS technique, please refer to our ICDE paper:

Virtual Lightweight Snapshots for Consistent Analytics in NoSQL Stores, F. Chirigati, J. Siméon, M. Hirzel, and J. Freire. In Proceedings of the 32nd International Conference on Data Engineering (ICDE), 2016

The team includes:

How To Install

Please note that MongoDB-VLS has only been tested on CentOS and Fedora 23 Server.

MongoDB-VLS

First, build and install TBB 4.2 Update 3; the source code and build instructions are available here. Make sure you add TBB's lib directory to the LD_LIBRARY_PATH environment variable.
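As a minimal sketch of the last step, the snippet below prepends TBB's lib directory to LD_LIBRARY_PATH for the current shell session. TBB_LIB_DIR and its default value /opt/tbb/lib are assumptions made for illustration; replace them with the directory where your TBB build actually placed its shared libraries.

```shell
# TBB_LIB_DIR is a hypothetical variable; /opt/tbb/lib is only a placeholder default.
TBB_LIB_DIR="${TBB_LIB_DIR:-/opt/tbb/lib}"

# Prepend TBB's lib directory, keeping any entries that are already set.
export LD_LIBRARY_PATH="$TBB_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show the resulting search path.
echo "$LD_LIBRARY_PATH"
```

To keep the setting across sessions, the same export line can be added to your shell's startup file.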

Next, build and install MongoDB-VLS. The procedure is the same as for building MongoDB itself, and it requires SCons:

$ cd vls
$ scons --disable-warnings-as-errors install

We recommend using GCC 4.7+. More information about how to build and install MongoDB is available here.

We recommend creating a data directory (e.g., /mongodb/data/) and a location for the log (e.g., /mongodb/log).
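One possible way to set this up is sketched below. MONGO_ROOT is a hypothetical variable introduced for illustration, and the sketch assumes the log location is a directory. Setting MONGO_ROOT=/mongodb reproduces the example paths above, although creating a directory at the filesystem root usually requires administrator privileges.

```shell
# MONGO_ROOT is a hypothetical base directory; it defaults to ./mongodb here.
MONGO_ROOT="${MONGO_ROOT:-$PWD/mongodb}"

# Create the data directory and the log location (assumed to be a directory).
mkdir -p "$MONGO_ROOT/data" "$MONGO_ROOT/log"

# List both paths to confirm that they exist.
ls -d "$MONGO_ROOT/data" "$MONGO_ROOT/log"
```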

YCSB

We extended YCSB 0.1.4 to include aggregate queries (using both full and index scans). You can build it with Maven as follows:

$ cd ycsb
$ mvn clean package

How To Use

MongoDB-VLS

To start the MongoDB server, please refer to experiments/start_mongod_server (for full scans) and experiments/start_mongod_server_index (for index scans). We use the numactl command to start the server.

To stop the server, please refer to experiments/stop_mongod_server.

For more information on MongoDB, please refer to the documentation.

YCSB

To load data into MongoDB-VLS using YCSB, use the following command:

$ ./experiments/scripts/ycsb-scripts/ycsb_load_data db_name n_records

where db_name is the name of the MongoDB collection and n_records is the number of records to load into it.

For more information on YCSB, please refer to the documentation.

Reproducing the Results

This section shows how to reproduce the results published in our ICDE paper. Here, we assume that a directory called experiments/results/ has been created for the results.
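If the results directory does not exist yet, it can be created as sketched below. The sketch assumes the command is run from the root of the repository, so that the relative path matches the one used in this section.

```shell
# Run from the repository root (an assumption of this sketch).
# -p creates missing parent directories and succeeds if the directory already exists.
mkdir -p experiments/results

# Warn if the directory cannot be written to.
[ -w experiments/results ] || echo "experiments/results is not writable" >&2
```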

Query Execution Time Results

To generate the results for full scan aggregates, run the following script:

$ ./experiments/scripts/ycsb-scripts/query_execution_time db_name n_records

To generate the results for index scan aggregates, run the following script:

$ ./experiments/scripts/ycsb-scripts/index_query_execution_time db_name n_records

Update Throughput and Latency Results

To generate the results for full scan aggregates, run the following script:

$ ./experiments/scripts/ycsb-scripts/query_updates db_name n_records

To generate the results for index scan aggregates, run the following script:

$ ./experiments/scripts/ycsb-scripts/index_query_updates db_name n_records

Remset Size Results

The scripts are the same as above. To collect information about remset size, first uncomment the debugging code in collection.h and collection_scan.cpp, then rebuild MongoDB-VLS.

Varying the Size of Bit Vectors

The scripts are the same as above, but the code must be changed to support 128-bit vectors and then rebuilt. For example, collection.h must be changed to use std::bitset<128> rather than uint64_t.

Plots

All plots were generated with matplotlib.

We have made available ReproZip packages (experiments/reprozip) to reproduce the generation of these plots. For instance, to generate Figure 7:

$ cd experiments/reprozip
$ reprounzip vagrant setup query_execution_time.rpz query_execution_time/
$ reprounzip vagrant run query_execution_time/
$ reprounzip vagrant download query_execution_time/ scan_duration.png
$ reprounzip vagrant download query_execution_time/ scan_duration_updates.png

Please refer to ReproZip's documentation for more information on how to install and use the tool.

Limitations