# open-korean-text-node

A Node.js binding for open-korean-text via the node-java interface.
## Dependency

Currently wraps open-korean-text 2.2.0.
## Requirement

Since this package uses Java code compiled with Java 8, make sure you have both the Java 8 JDK and JRE installed.

For more details about installing the Java interface, see the installation notes at the links below.
## Installation

```shell
npm install --save open-korean-text-node
```
## Usage

```typescript
import OpenKoreanText from 'open-korean-text-node';
// or
const OpenKoreanText = require('open-korean-text-node').default;
```
- See the API section for more information.
## Examples

## API

### OpenKoreanText
#### Tokenizing

```typescript
OpenKoreanText.tokenize(text: string): Promise<IntermediaryTokens>;
OpenKoreanText.tokenizeSync(text: string): IntermediaryTokens;
```

- `text`: a target string to tokenize
#### Detokenizing

```typescript
OpenKoreanText.detokenize(tokens: IntermediaryTokensObject): Promise<string>;
OpenKoreanText.detokenize(words: string[]): Promise<string>;
OpenKoreanText.detokenize(...words: string[]): Promise<string>;
OpenKoreanText.detokenizeSync(tokens: IntermediaryTokensObject): string;
OpenKoreanText.detokenizeSync(words: string[]): string;
OpenKoreanText.detokenizeSync(...words: string[]): string;
```

- `tokens`: an intermediary token object from `tokenize`
- `words`: an array of words to detokenize
#### Phrase Extracting

```typescript
OpenKoreanText.extractPhrases(tokens: IntermediaryTokens, options?: ExcludePhrasesOptions): Promise<KoreanToken>;
OpenKoreanText.extractPhrasesSync(tokens: IntermediaryTokens, options?: ExcludePhrasesOptions): KoreanToken;
```

- `tokens`: an intermediary token object from `tokenize` or `stem`
- `options`: an object of phrase-extraction options, where:
  - `filterSpam`: a flag to filter spam tokens; defaults to `true`
  - `includeHashtag`: a flag to include hashtag tokens; defaults to `false`
#### Normalizing

```typescript
OpenKoreanText.normalize(text: string): Promise<string>;
OpenKoreanText.normalizeSync(text: string): string;
```

- `text`: a target string to normalize
#### Sentence Splitting

```typescript
OpenKoreanText.splitSentences(text: string): Promise<Sentence[]>;
OpenKoreanText.splitSentencesSync(text: string): Sentence[];
```

- `text`: a target string to split into sentences
- returns an array of `Sentence` objects, each of which includes:
  - `text`: string - the sentence's text
  - `start`: number - the sentence's start position in the original string
  - `end`: number - the sentence's end position in the original string
#### Custom Dictionary

```typescript
OpenKoreanText.addNounsToDictionary(...words: string[]): Promise<void>;
OpenKoreanText.addNounsToDictionarySync(...words: string[]): void;
```

- `words`: words to add to the dictionary
#### toJSON

```typescript
OpenKoreanText.tokensToJsonArray(tokens: IntermediaryTokensObject, keepSpace?: boolean): Promise<KoreanToken[]>;
OpenKoreanText.tokensToJsonArraySync(tokens: IntermediaryTokensObject, keepSpace?: boolean): KoreanToken[];
```

- `tokens`: an intermediary token object from `tokenize` or `stem`
- `keepSpace`: a flag to keep 'Space' tokens in the output; defaults to `false` (spaces omitted)
### IntermediaryTokens object

An intermediary token object required for internal processing. It provides convenience wrapper functions to process text without using the processor object directly:

```typescript
tokens.extractPhrases(options?: ExcludePhrasesOptions): Promise<KoreanToken>;
tokens.extractPhrasesSync(options?: ExcludePhrasesOptions): KoreanToken;
tokens.detokenize(): Promise<string>;
tokens.detokenizeSync(): string;
tokens.toJSON(): KoreanToken[];
```

- NOTE: the `tokens.toJSON()` method is equivalent to `OpenKoreanText.tokensToJsonArraySync(tokens, false)`
### KoreanToken object

A JSON output object which contains:

- `text`: string - the token's text
- `stem`: string - the token's stem
- `pos`: string - the type of the token; possible entries are:
  - Word-level POS: `Noun`, `Verb`, `Adjective`, `Adverb`, `Determiner`, `Exclamation`, `Josa`, `Eomi`, `PreEomi`, `Conjunction`, `NounPrefix`, `VerbPrefix`, `Suffix`, `Unknown`
  - Chunk-level POS: `Korean`, `Foreign`, `Number`, `KoreanParticle`, `Alpha`, `Punctuation`, `Hashtag`, `ScreenName`, `Email`, `URL`, `CashTag`
  - Functional POS: `Space`, `Others`
- `offset`: number - the token's position in the original string
- `length`: number - the length of the token's text
- `isUnknown`: boolean
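The field list above can be sketched as a hypothetical TypeScript interface (this shape is reconstructed from this README, not taken from the package's own typings):

```typescript
// Hypothetical shape of the KoreanToken JSON output.
interface KoreanToken {
  text: string;       // the token's text
  stem?: string;      // the token's stem, when available
  pos: string;        // e.g. 'Noun', 'Verb', 'Punctuation', ...
  offset: number;     // position in the original string
  length: number;     // length of `text`
  isUnknown: boolean;
}

// An illustrative value conforming to the interface.
const sample: KoreanToken = {
  text: '커피',
  pos: 'Noun',
  offset: 0,
  length: 2,
  isUnknown: false,
};

console.log(sample);
```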