Home

Awesome

JPruningRadixTrie<br> MIT License

Java port of wolfgarbe/PruningRadixTrie. See link for the background of the project etc. All credit to Wolf Garbe.

No features have been added/removed other than where necessary to match Java conventions. So functionality and performance should be unchanged, aside only from these exceptions:


Download Instructions:

Copy the source code into your project! There are only 4 classes to copy, and no external dependencies. I'm afraid I do not have plans to upload this project to Maven Central or the like.


Usage:

Create Object

PruningRadixTrie pruningRadixTrie = new PruningRadixTrie();

addTerm: insert term and term frequency count into Pruning Radix Trie. Frequency counts for same term are summed up.

pruningRadixTrie.addTerm("microsoft", 1000);

getTopkTermsForPrefix: retrieve the top-k most relevant terms for a given prefix from the Pruning Radix Trie.

String prefix = "micro";
int topK = 10;
List<TermAndFrequency> results = pruningRadixTrie.getTopkTermsForPrefix(prefix, topK);
for (TermAndFrequency result : results) {
  System.out.println(result.getTerm() + " " + result.getTermFrequencyCount());
}

readTermsFromFile: Deserialise the Pruning Radix Trie from disk for persistence. Note here that terms.txt contains tab-delimited lines, ie in the format term\tfrequency. If format is term,frequency then replace second argument with ",", etc.

pruningRadixTrie.readTermsFromFile("terms.txt", "\t");

writeTermsToFile: Serialise the Pruning Radix Trie to disk for persistence.

pruningRadixTrie.writeTermsToFile("terms.txt");