Awesome
Indeed has decided to archive this project, as it is no longer supported nor used internally. Thank you for your understanding. We won't be accepting pull requests or responding to issues for this project anymore. If you are a current user and would like to take over support for this project, please create a fork.
Minimal Perfect Hash Tables
About
Minimal Perfect Hash Tables are an immutable key/value store with efficient space utilization and fast reads. They are ideal for the use-case of tables built by batch processes and shipped to multiple servers.
Usage
Indeed MPH is available on Maven Central, just add the following dependency:
<dependency>
<groupId>com.indeed</groupId>
<artifactId>mph-table</artifactId>
<version>1.0.4</version>
</dependency>
The primary interfaces are TableReader, to construct a reader to an existing table, TableWriter, to build a table, and TableConfig, to specify the configuration for the writer.
How to write a table:
final TableConfig<Long, Long> config = new TableConfig()
.withKeySerializer(new SmartLongSerializer())
.withValueSerializer(new SmartVLongSerializer());
final Set<Pair<Long, Long>> entries = new HashSet<>();
for (long i = 0; i < 20; ++i) {
entries.add(new Pair(i, i * i));
}
TableWriter.write(new File("squares"), config, entries);
How to read a table:
try (final TableReader<Long, Long> reader = TableReader.open("squares")) {
final Long value = reader.get(3L); // get one
for (final Pair<Long, Long> p : reader) { // iterate over all
...
}
}
Command Line
In addition to the Java API, TableReader and TableWriter provide convenience command-line interfaces to read and write tables, allowing you to quickly get started without writing any code:
# print all key-values in a table as TSV
$ java com.indeed.mph.TableReader --dump <table>
# print the value for a single key
$ java com.indeed.mph.TableReader --get <key> <table>
# create a table from a TSV file of words with counts
$ java com.indeed.mph.TableWriter --valueSerializer .SmartVLongSerializer <table to create> <counts.tsv>
# create a table from a TSV file mapping movie ids to lists of actor names (compressed by reference)
$ java com.indeed.mph.TableWriter --keySerializer .SmartVLongSerializer --valueSerializer '.SmartListSerializer(.SmartDictionarySerializer)' <table to create> <movies.tsv>
# same as above, not actually storing the movie ids but still allowing retrieval by them
$ java com.indeed.mph.TableWriter --keyStorage IMPLICIT --keySerializer .SmartVLongSerializer --valueSerializer '.SmartListSerializer(.SmartDictionarySerializer)' <table to create> <movies.tsv>
Code of Conduct
This project is governed by the Contributor Covenant v 1.4.1
License
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.