lancaster-stemmer
Contents
- What is this?
- When should I use this?
- Install
- Use
- API
- CLI
- Types
- Compatibility
- Related
- Contribute
- Security
- License
What is this?
This package exposes a stemming algorithm. It takes a string (typically an English word) and reduces it to a shorter form (a stem). Stems of different words can then be compared to check whether the words are (likely) the same term.
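As a rough illustration of comparing stems, the sketch below groups words that share a stem. `groupByStem` and `toyStem` are hypothetical names, not part of this package; `toyStem` is a deliberately crude stand-in so the snippet runs on its own, and `lancasterStemmer` could be passed in its place.

```javascript
// Hypothetical helper (not part of this package): group words by their stem.
// `stem` is any function that maps a word to its stem, such as `lancasterStemmer`.
function groupByStem(words, stem) {
  const groups = new Map()
  for (const word of words) {
    const key = stem(word)
    const group = groups.get(key)
    if (group) group.push(word)
    else groups.set(key, [word])
  }
  return groups
}

// Stand-in stemmer for illustration only: lowercases and drops one trailing "s".
const toyStem = (word) => word.toLowerCase().replace(/s$/, '')

const groups = groupByStem(['Cats', 'cat', 'dogs'], toyStem)
console.log(groups.get('cat')) // => ['Cats', 'cat']
```

Words that end up under the same key are treated as (likely) the same term.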
When should I use this?
You’re probably dealing with natural language and know you need this if you’re here!
Install
This package is ESM only. In Node.js (version 16+), install with npm:
npm install lancaster-stemmer
In Deno with esm.sh:
import {lancasterStemmer} from 'https://esm.sh/lancaster-stemmer@2'
In browsers with esm.sh:
<script type="module">
import {lancasterStemmer} from 'https://esm.sh/lancaster-stemmer@2?bundle'
</script>
Use
import {lancasterStemmer} from 'lancaster-stemmer'
console.log(lancasterStemmer('considerations')) // => 'consid'
console.log(lancasterStemmer('detestable')) // => 'detest'
console.log(lancasterStemmer('vileness')) // => 'vil'
console.log(lancasterStemmer('giggling')) // => 'giggl'
console.log(lancasterStemmer('anxious')) // => 'anxy'
// Case insensitive
console.log(lancasterStemmer('analytic') === lancasterStemmer('AnAlYtIc')) // => true
API
This package exports the identifier lancasterStemmer.
There is no default export.
lancasterStemmer(value, options?)
Get the stem from a given value.
Parameters
- value (string, required): value to stem
- options (Options, default: {}): configuration
Returns
Stem for value (string).
Options
Configuration (TypeScript type).
Fields
- style (Style, default: 'c'): style of algorithm
Style
Style of algorithm (TypeScript type).
There are small algorithmic differences between how the algorithm was implemented over the years. Looking at Algorithm Implementations on the archived website, there are four styles available, in addition to the original paper.
The only difference currently implemented in this package is whether a final s is kept before stopping (paper) or dropped before stopping (c).
Values
- 'c': rules from the ANSI C (Stark, 1994) and Perl (Taffet, 2001) implementations (compensation -> compen)
- 'paper': rules from the original paper (1990), and Pascal (Paice/Husk) and Java (O’Neill, 2000) implementations (compensation -> compens)
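If one style is used throughout a project, it may be convenient to bind it once. The sketch below assumes only the documented `(value, options?)` signature; `withStyle` and `echoStem` are hypothetical names, and `echoStem` merely echoes its arguments so the snippet runs without the package installed.

```javascript
// Hypothetical helper (not part of this package): bind a style once.
// `stem` must accept `(value, options?)`, as `lancasterStemmer` does.
function withStyle(stem, style) {
  if (style !== 'c' && style !== 'paper') {
    throw new TypeError("Expected `style` to be 'c' or 'paper'")
  }
  return (value) => stem(value, {style})
}

// Stand-in that only echoes its arguments, for illustration.
const echoStem = (value, options) => `${value} [style: ${options.style}]`

const paperStem = withStyle(echoStem, 'paper')
console.log(paperStem('compensation')) // => 'compensation [style: paper]'
```

With the real function, `withStyle(lancasterStemmer, 'paper')` would forward `{style: 'paper'}` on every call, so `compensation` should come back as `compens` according to the values listed above.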
CLI
Usage: lancaster-stemmer [options] <words...>
Lancaster stemming algorithm
Options:
-h, --help output usage information
-v, --version output version number
Usage:
# output stems
$ lancaster-stemmer considerations
consid
# output stems from stdin
$ echo "detestable vileness" | lancaster-stemmer
detest vil
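The stdin example can be approximated from JavaScript as follows. This is a sketch, not the CLI's actual implementation: splitting on whitespace is an assumption, `stemText` is a hypothetical name, and the stemmer is passed in so the snippet runs standalone.

```javascript
// Hypothetical helper (not part of this package): stem each whitespace-separated word.
// `stem` is any function that maps a word to its stem, such as `lancasterStemmer`.
function stemText(text, stem) {
  return text
    .split(/\s+/)
    .filter(Boolean) // drop empty strings left by leading or trailing whitespace
    // Call `stem` with one argument so the array index is not passed as `options`.
    .map((word) => stem(word))
    .join(' ')
}

// Stand-in stemmer for illustration only: keeps at most the first four letters.
const firstFour = (word) => word.slice(0, 4)

console.log(stemText('  detestable vileness\n', firstFour)) // => 'dete vile'
```

Passing `lancasterStemmer` as `stem` should give the stems shown in the CLI example for the same input.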
Types
This package is fully typed with TypeScript.
It exports the additional types Options and Style.
Compatibility
Projects maintained by the unified collective are compatible with maintained versions of Node.js.
When we cut a new major release, we drop support for unmaintained versions of Node.js.
This means we try to keep the current release line, lancaster-stemmer@^2, compatible with Node.js 16.
Related
- stemmer: porter stemmer algorithm
- double-metaphone: double metaphone algorithm
- soundex-code: soundex algorithm
- dice-coefficient: sørensen–dice coefficient
- levenshtein-edit-distance: levenshtein edit distance
- syllable: syllable count of English words
Contribute
Yes please! See How to Contribute to Open Source.
Security
This package is safe.