latin-stemming
Functions for getting the stems of Latin words, based on the Schinke algorithm. The key thing to note about this approach is that it can be applied to a word even when its part of speech (e.g. noun vs. verb) is unknown. In those cases it will usually return more than one possible stem.
var stemming = require("latin-stemming");
Constants
The module uses several hard-coded arrays for lookups and replacements. You can access them yourself via:
stemming.quewords
=> an array of words ending in -que where the -que is part of the word itself and NOT the enclitic 'and'
stemming.nounsuffixes
=> an array of regexes for matching noun suffixes
stemming.verbsuffixes
=> an array of regexes for matching verb suffixes, and what those suffixes should be replaced with in the stem
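To make the role of these three lists concrete, here is a minimal sketch of a Schinke-style pass in plain JavaScript. It is not the module's code. The names `QUE_WORDS`, `NOUN_SUFFIXES`, `VERB_SUFFIXES`, `applyFirstMatch`, and `sketchStem` are made up for this example, the lists are written from the published Schinke rules and may differ from what the module ships, and the shape of the verb entries (pattern plus replacement pairs) is an assumption.

```javascript
// Illustrative lists only; the module's own arrays may differ.
var QUE_WORDS = ["atque", "quoque", "neque", "itaque", "denique", "usque"];

// Noun suffixes, longest first.
var NOUN_SUFFIXES = [
  /ibus$/, /ius$/, /ae$/, /am$/, /as$/, /em$/, /es$/, /ia$/, /is$/, /nt$/,
  /os$/, /ud$/, /um$/, /us$/, /a$/, /e$/, /i$/, /o$/, /u$/
];

// [pattern, replacement] pairs, longest first (assumed shape).
var VERB_SUFFIXES = [
  [/iuntur$/, "i"], [/beris$/, "bi"], [/erunt$/, "i"], [/untur$/, "i"],
  [/iunt$/, "i"], [/mini$/, ""], [/ntur$/, ""], [/stis$/, ""],
  [/bor$/, "bi"], [/ero$/, "eri"], [/mur$/, ""], [/mus$/, ""],
  [/ris$/, ""], [/sti$/, ""], [/tis$/, ""], [/tur$/, ""],
  [/unt$/, "i"], [/bo$/, "bi"], [/ns$/, ""], [/nt$/, ""],
  [/ri$/, ""], [/m$/, ""], [/r$/, ""], [/s$/, ""], [/t$/, ""]
];

// Apply the first rule that matches; keep the word unchanged if the
// resulting stem would be shorter than two characters.
function applyFirstMatch(word, rules) {
  for (var i = 0; i < rules.length; i++) {
    if (rules[i][0].test(word)) {
      var stem = word.replace(rules[i][0], rules[i][1]);
      return stem.length >= 2 ? stem : word;
    }
  }
  return word;
}

function sketchStem(word) {
  // Normalise spelling: j is treated as i, v as u.
  var w = word.toLowerCase().replace(/j/g, "i").replace(/v/g, "u");

  if (/que$/.test(w)) {
    // Atomic -que words are returned untouched for both readings.
    if (QUE_WORDS.indexOf(w) !== -1) {
      return { noun: w, verb: w };
    }
    w = w.slice(0, -3); // otherwise drop the enclitic
  }

  var nounRules = NOUN_SUFFIXES.map(function (re) { return [re, ""]; });
  return {
    noun: applyFirstMatch(w, nounRules),
    verb: applyFirstMatch(w, VERB_SUFFIXES)
  };
}

console.log(sketchStem("portis"));  // { noun: 'port', verb: 'por' }
console.log(sketchStem("amabunt")); // { noun: 'amabu', verb: 'amabi' }
```

The two results per word show why an untyped lookup can yield more than one stem: without knowing the part of speech, both the noun and the verb reading are plausible.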
Functions
stemming.stem(word, config) // => []
Word, I hope, is self-explanatory. Config is an object that can contain several optional values:
- quewords - override the default quewords list provided by the module
- nounsuffixes - override the default noun suffix regexes
- verbsuffixes - override the default verb suffixes and replacements
- type - if this is "Noun", "Adjective", "Adverb", or "Verb", the stemmer will only apply the relevant stemming rules
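As a quick way to see what the type option changes, the sketch below calls a stemmer once without a type and once per type, then collects the results side by side. `stemsByType` is a hypothetical helper, not part of the module. It relies only on the stem(word, config) signature described above, so the stemmer is passed in as an argument.

```javascript
// Hypothetical helper, not part of latin-stemming.
// `stem` is any function with the documented (word, config) signature.
function stemsByType(stem, word, types) {
  types = types || ["Noun", "Verb"];

  // No type given: the stemmer may return several candidate stems.
  var result = { untyped: stem(word, {}) };

  // One call per type: only that type's rules should be applied.
  types.forEach(function (type) {
    result[type] = stem(word, { type: type });
  });

  return result;
}

// Usage, assuming the package is installed:
//   var stemming = require("latin-stemming");
//   console.log(stemsByType(stemming.stem, "portis"));
```

Passing the stemmer in keeps the helper usable with a customised config wrapper too, for example one that always supplies its own quewords list.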
stemming.couchkey(config) // => Function
Returns a contextless function that can be used for CouchDB indexes. Config can contain several optional values:
- condition - string to be used inside an if (...) statement; if you include this, documents will only be processed if they pass the condition
- wordkey (defaults to 'word') - the document property containing the word to be stemmed
- typekey (defaults to 'wordclass') - the document property that contains the word's part of speech (e.g. Noun, Verb); if the property is not available, keys for both verb and noun interpretations will be emitted
- quewords - override the default quewords list provided by the module
- nounsuffixes - override the default noun suffix regexes
- verbsuffixes - override the default verb suffixes and replacements
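CouchDB stores view map functions as source strings inside a design document, which is why a contextless function (one that does not depend on outside variables) matters here. The sketch below wraps such a function in a design document. `buildDesignDoc`, the `_design/latin` id, and the `stems` view name are placeholders invented for this example, and the sketch assumes the function returned by couchkey is meant to be used as the view's map function.

```javascript
// Hypothetical helper, not part of latin-stemming.
// Serialises a map function into a CouchDB design document.
function buildDesignDoc(mapFn, viewName) {
  var views = {};
  // CouchDB expects the map function as a string of source code.
  views[viewName || "stems"] = { map: mapFn.toString() };

  return {
    _id: "_design/latin", // placeholder id
    language: "javascript",
    views: views
  };
}

// Usage, assuming the package is installed and documents look like
// { lemma: "portis", pos: "Noun" }:
//   var stemming = require("latin-stemming");
//   var ddoc = buildDesignDoc(
//     stemming.couchkey({ wordkey: "lemma", typekey: "pos" })
//   );
//   // then save ddoc to the database with your CouchDB client of choice
```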