MongoDB-VLS
MongoDB-VLS is a prototype implementation of VLS (Virtual Lightweight Snapshots) in MongoDB v2.5.5. VLS is a mechanism that enables consistent analytics in NoSQL stores without blocking incoming updates.
Links and References
For more detailed information about the VLS technique, please refer to our ICDE paper:
The team includes:
- Fernando Chirigati (New York University)
- Jérôme Siméon (IBM Watson Research)
- Martin Hirzel (IBM Watson Research)
- Juliana Freire (New York University)
How To Install
Please note that MongoDB-VLS has only been tested on CentOS and Fedora 23 Server.
MongoDB-VLS
First, build and install TBB 4.2 Update 3; the source code and build instructions are available here. Make sure you add TBB's lib directory to the LD_LIBRARY_PATH environment variable.
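As a minimal sketch of that last step, the variable can be set for the current shell as shown below. TBB_LIB_DIR and its default value are placeholders for wherever you built TBB, not paths used by this repository.

```shell
# Placeholder location of TBB's lib directory; override it with your own path.
TBB_LIB_DIR="${TBB_LIB_DIR:-$HOME/tbb/lib}"

# Prepend the directory so this TBB build takes precedence.
# The ${VAR:+...} form avoids a trailing colon when the variable was empty.
export LD_LIBRARY_PATH="${TBB_LIB_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
```

To make the setting persistent, the same export line can be added to your shell's startup file.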
Next, build and install MongoDB-VLS, following the same instructions as for installing MongoDB (you will need SCons):
$ cd vls
$ scons --disable-warnings-as-errors install
We recommend using GCC 4.7+. More information about how to build and install MongoDB is available here.
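Before starting the build, it can help to confirm that the tools are on your PATH and that the compiler meets the recommendation. The following is only a sketch: the helper version_ok is a hypothetical name introduced here, and the check says nothing about whether much newer compilers can build this code base.

```shell
# Return success if the version string in $1 (e.g. "4.8.5" or "11") is 4.7 or newer.
version_ok() {
    major="${1%%.*}"
    rest="${1#*.}"
    # With no dot in the string, there is no minor component; treat it as 0.
    if [ "$rest" = "$1" ]; then
        minor=0
    else
        minor="${rest%%.*}"
    fi
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 7 ]; }
}

# Report any build tool that is not on the PATH.
for tool in scons gcc; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool" >&2
done

# Compare the installed GCC version against the recommended minimum.
if command -v gcc >/dev/null 2>&1; then
    gcc_version="$(gcc -dumpversion)"
    if version_ok "$gcc_version"; then
        echo "GCC ${gcc_version} is 4.7 or newer"
    else
        echo "GCC ${gcc_version} is older than the recommended 4.7" >&2
    fi
fi
```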
We recommend creating one data directory (e.g., /mongodb/data/) and a location for the log (e.g., /mongodb/log).
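A minimal sketch of creating these locations follows. MONGO_BASE is a hypothetical variable introduced for this example; setting it to /mongodb matches the paths above, which typically requires root privileges. The sketch also assumes the log location is a directory.

```shell
# Hypothetical base directory; use MONGO_BASE=/mongodb to match the examples above.
MONGO_BASE="${MONGO_BASE:-$HOME/mongodb}"

# Create the data directory and the log location (assumed here to be a directory).
mkdir -p "${MONGO_BASE}/data" "${MONGO_BASE}/log"

# Show the result so ownership and permissions can be checked.
ls -ld "${MONGO_BASE}/data" "${MONGO_BASE}/log"
```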
YCSB
We extended YCSB 0.1.4 to include aggregate queries (using both full and index scans). You can build it with Maven as follows:
$ cd ycsb
$ mvn clean package
How To Use
MongoDB-VLS
To start the MongoDB server, please refer to experiments/start_mongod_server (to start the server for full scans) and experiments/start_mongod_server_index (to start the server for index scans). We use the numactl command to start the server.
To stop the server, please refer to experiments/stop_mongod_server.
For more information on MongoDB, please refer to the documentation.
YCSB
To load data into MongoDB-VLS using YCSB, use the following command:
$ ./experiments/scripts/ycsb-scripts/ycsb_load_data db_name n_records
where db_name is the name of MongoDB's collection and n_records is the number of records to be loaded into db_name.
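As an illustration, the sketch below wraps the load command in a small function that checks its arguments first. The function name load_ycsb, the DRY_RUN switch, and the example values usertable and 1000 are assumptions made for this example; they are not part of the repository and not the sizes used in the paper.

```shell
# Hypothetical wrapper around the load script. By default it only prints the
# command; set DRY_RUN=0 to actually run it from the repository root.
load_ycsb() {
    db_name="${1:-}"
    n_records="${2:-}"

    if [ -z "$db_name" ]; then
        echo "db_name must not be empty" >&2
        return 1
    fi

    # Accept only strings made of digits, then require a value above zero.
    case "$n_records" in
        ''|*[!0-9]*)
            echo "n_records must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$n_records" -le 0 ]; then
        echo "n_records must be a positive integer" >&2
        return 1
    fi

    cmd="./experiments/scripts/ycsb-scripts/ycsb_load_data"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${cmd} ${db_name} ${n_records}"
    else
        "${cmd}" "${db_name}" "${n_records}"
    fi
}

# Placeholder arguments for illustration only.
load_ycsb usertable 1000
```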
For more information on YCSB, please refer to the documentation.
Reproducing the Results
This section shows how to reproduce the results published in our ICDE paper. Here, we assume that a directory called experiments/results/ has been created for the results.
Query Execution Time Results
To generate the results for full scan aggregates, run the following script:
$ ./experiments/scripts/ycsb-scripts/query_execution_time db_name n_records
To generate the results for index scan aggregates, run the following script:
$ ./experiments/scripts/ycsb-scripts/index_query_execution_time db_name n_records
Update Throughput and Latency Results
To generate the results for full scan aggregates, run the following script:
$ ./experiments/scripts/ycsb-scripts/query_updates db_name n_records
To generate the results for index scan aggregates, run the following script:
$ ./experiments/scripts/ycsb-scripts/index_query_updates db_name n_records
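The four scripts above take the same arguments and differ only by the index_ prefix, so the runs for one scan mode can be grouped as in the sketch below. The function run_experiments and the DRY_RUN switch are hypothetical names introduced here. The sketch assumes it is run from the repository root and that the server is already running in the matching mode.

```shell
# Hypothetical driver: run both experiment scripts for one scan mode.
# By default it only prints the commands; set DRY_RUN=0 to execute them.
run_experiments() {
    mode="${1:-}"
    db_name="${2:-}"
    n_records="${3:-}"
    scripts_dir="./experiments/scripts/ycsb-scripts"

    # Map the scan mode to the script name prefix used in this repository.
    case "$mode" in
        full)  prefix="" ;;
        index) prefix="index_" ;;
        *)
            echo "mode must be 'full' or 'index'" >&2
            return 1
            ;;
    esac

    # The results directory is expected to exist before a real run.
    if [ "${DRY_RUN:-1}" != "1" ]; then
        mkdir -p experiments/results || return 1
    fi

    for name in query_execution_time query_updates; do
        cmd="${scripts_dir}/${prefix}${name}"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "${cmd} ${db_name} ${n_records}"
        else
            "${cmd}" "${db_name}" "${n_records}" || return 1
        fi
    done
}

# Placeholder arguments for illustration only.
run_experiments index usertable 1000
```

Keeping the mode as an explicit argument makes it harder to run index scan scripts against a server that was started for full scans.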
Remset Size Results
The scripts are the same, but to get information about remset size, the debugging code in collection.h and collection_scan.cpp must be uncommented.
Varying the Size of Bit Vectors
The scripts are the same, but the code must be changed to support 128-bit vectors, e.g., collection.h must be changed to support std::bitset<128>, rather than uint64_t.
Plots
All plots were implemented using matplotlib.
- Figures 7a and 7b (query execution time): experiments/plots/query_execution_time.py
- Figure 8a (update throughput for full scans): experiments/plots/full_scan_update_throughput.py
- Figure 8b (update latency for full scans): experiments/plots/full_scan_update_latency.py
- Figures 9a, 9b, and 9c (update throughput for index scans): experiments/plots/index_scan_update_throughput.py
- Figures 10a, 10b, and 10c (update latency for index scans): experiments/plots/index_scan_update_latency.py
- Figures 11a, 11b, 11c, and 11d (remset size): experiments/plots/remset_size.py
We have made available ReproZip packages (experiments/reprozip) to reproduce the generation of these plots. For instance, to generate Figure 7:
$ cd experiments/reprozip
$ reprounzip vagrant setup query_execution_time.rpz query_execution_time/
$ reprounzip vagrant run query_execution_time/
$ reprounzip vagrant download query_execution_time/ scan_duration.png
$ reprounzip vagrant download query_execution_time/ scan_duration_updates.png
Please refer to ReproZip's documentation for more information on how to install and use the tool.
Limitations
- The prototype implementation does not support deleting indexed records (they can only be inserted or updated).
- For now, the server needs to be started either for full scan aggregates (experiments/start_mongod_server) or for index scan aggregates (experiments/start_mongod_server_index) through the use of the flag --index. The two aggregate types cannot be executed in the same server instance because of locking issues.