MongoDB-VLS

MongoDB-VLS is a prototype implementation of VLS (Virtual Lightweight Snapshots) in MongoDB v2.5.5. VLS is a mechanism that enables consistent analytics in NoSQL stores without blocking incoming updates.

Links and References

For more detailed information about the VLS technique, please refer to our ICDE paper:

Virtual Lightweight Snapshots for Consistent Analytics in NoSQL Stores, F. Chirigati, J. Siméon, M. Hirzel, and J. Freire. In Proceedings of the 32nd International Conference on Data Engineering (ICDE), 2016

The team includes:

Fernando Chirigati (New York University)
Jérôme Siméon (IBM Watson Research)
Martin Hirzel (IBM Watson Research)
Juliana Freire (New York University)

How To Install

Please note that MongoDB-VLS has only been tested on CentOS and Fedora 23 Server.

MongoDB-VLS

First, build and install TBB (Threading Building Blocks) 4.2 Update 3; the source code and build instructions are available here. Make sure to add TBB's lib directory to the LD_LIBRARY_PATH environment variable so that the dynamic linker can find the library at run time.
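
As a rough sketch of the LD_LIBRARY_PATH step, the snippet below assumes TBB's libraries ended up in a hypothetical $HOME/tbb/lib directory. TBB_LIB_DIR is an illustrative variable name, not something the TBB build defines, so point it at wherever your build actually placed the libtbb shared objects.

```shell
# Hypothetical location of TBB's lib directory (an assumption); adjust to your build.
TBB_LIB_DIR="${TBB_LIB_DIR:-$HOME/tbb/lib}"

# Prepend the directory so the dynamic linker can find libtbb at run time.
# The ${VAR:+...} form avoids a trailing colon when LD_LIBRARY_PATH is unset.
export LD_LIBRARY_PATH="$TBB_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Adding these lines to your shell profile keeps the setting across sessions.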

Next, build and install MongoDB-VLS, which follows the same build instructions as MongoDB (you will need SCons):

$ cd vls
$ scons --disable-warnings-as-errors install

We recommend using GCC 4.7+. More information about how to build and install MongoDB is available here.

We recommend creating a data directory (e.g., /mongodb/data/) and a location for the log (e.g., /mongodb/log).
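
One possible way to create these locations is sketched below. MONGO_BASE is a hypothetical variable introduced only for illustration. It defaults to a directory under $HOME rather than /mongodb so that the commands do not need root privileges, and the sketch treats the log location as a directory, which is an assumption.

```shell
# Hypothetical base directory (an assumption); set MONGO_BASE=/mongodb to match the text.
MONGO_BASE="${MONGO_BASE:-$HOME/mongodb}"

# Create the data directory and the log location (assumed here to be a directory).
mkdir -p "$MONGO_BASE/data" "$MONGO_BASE/log"
```

Check the start scripts under experiments/ for the paths they expect before choosing different locations.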

YCSB

We extended YCSB 0.1.4 to include aggregate queries (using both full and index scans). You can build it with Maven as follows:

$ cd ycsb
$ mvn clean package

How To Use

MongoDB-VLS

To start the MongoDB server, please refer to experiments/start_mongod_server (to start the server for full scans) and experiments/start_mongod_server_index (to start the server for index scans). We use the numactl command to start the server.

To stop the server, please refer to experiments/stop_mongod_server.

For more information on MongoDB, please refer to the documentation.

YCSB

To load data into MongoDB-VLS using YCSB, use the following command:

$ ./experiments/scripts/ycsb-scripts/ycsb_load_data db_name n_records

where db_name is the name of the MongoDB collection and n_records is the number of records to load into it.
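
For illustration, a guarded invocation might look like the sketch below. The collection name usertable and the record count of 1000000 are example values chosen here, not values taken from the paper, and the snippet assumes it is run from the repository root.

```shell
# Example arguments (assumptions): a collection name and the number of records.
DB_NAME="usertable"
N_RECORDS=1000000
LOAD_SCRIPT="./experiments/scripts/ycsb-scripts/ycsb_load_data"

# Run the loader only if it is present, so a wrong working directory fails clearly.
if [ -x "$LOAD_SCRIPT" ]; then
    "$LOAD_SCRIPT" "$DB_NAME" "$N_RECORDS"
else
    echo "ycsb_load_data not found or not executable; run this from the repository root" >&2
fi
```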

For more information on YCSB, please refer to the documentation.

Reproducing the Results

This section shows how to reproduce the results published in our ICDE paper. Here, we assume that a directory called experiments/results/ has been created for the results.
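
If the directory does not exist yet, it can be created as sketched below. This assumes the command is run from the repository root; the experiment scripts may resolve the path differently.

```shell
# Create the results directory (run from the repository root; -p makes this a no-op if it exists).
mkdir -p experiments/results
```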

Query Execution Time Results

To generate the results for full scan aggregates, run the following script:

$ ./experiments/scripts/ycsb-scripts/query_execution_time db_name n_records

To generate the results for index scan aggregates, run the following script:

$ ./experiments/scripts/ycsb-scripts/index_query_execution_time db_name n_records

Update Throughput and Latency Results

To generate the results for full scan aggregates, run the following script:

$ ./experiments/scripts/ycsb-scripts/query_updates db_name n_records

To generate the results for index scan aggregates, run the following script:

$ ./experiments/scripts/ycsb-scripts/index_query_updates db_name n_records

Remset Size Results

The scripts are the same as those above, but to collect information about remset size you must first uncomment the debugging code in collection.h and collection_scan.cpp and rebuild the server.

Varying the Size of Bit Vectors

The scripts are the same as those above, but the code must be changed to support 128-bit vectors and the server rebuilt; e.g., collection.h must use std::bitset<128> rather than uint64_t.

Plots

All plots were generated with matplotlib.

Figures 7a and 7b (query execution time): experiments/plots/query_execution_time.py
Figure 8a (update throughput for full scans): experiments/plots/full_scan_update_throughput.py
Figure 8b (update latency for full scans): experiments/plots/full_scan_update_latency.py
Figures 9a, 9b, and 9c (update throughput for index scans): experiments/plots/index_scan_update_throughput.py
Figures 10a, 10b, and 10c (update latency for index scans): experiments/plots/index_scan_update_latency.py
Figures 11a, 11b, 11c, and 11d (remset size): experiments/plots/remset_size.py

We have made available ReproZip packages (experiments/reprozip) to reproduce the generation of these plots. For instance, to generate Figure 7:

$ cd experiments/reprozip
$ reprounzip vagrant setup query_execution_time.rpz query_execution_time/
$ reprounzip vagrant run query_execution_time/
$ reprounzip vagrant download query_execution_time/ scan_duration.png
$ reprounzip vagrant download query_execution_time/ scan_duration_updates.png

Please refer to ReproZip's documentation for more information on how to install and use the tool.

Limitations

The prototype implementation does not support deleting indexed records (only inserting and updating them).
For now, the server must be started either for full scan aggregates (experiments/start_mongod_server) or for index scan aggregates (experiments/start_mongod_server_index); the mode is selected through the --index flag. The two aggregate types cannot be executed in the same server instance because of locking issues.