TJNGram

Chinese, Japanese and Korean articles often contain some English, but few n-gram libraries can parse this kind of mixed-language text. TJNGram was made to solve this problem.

Install

gem install tjngram

Usage

require 'tjngram'

text = <<eos
這是一個範例。
This is an example.
これは例です。

這裡有一個蘋果。
There is an apple.
これはリンゴです。
eos

puts text, "=========="

TJNGram.process(2, text) #=> {"一個"=>2, "これ"=>2, "is an"=>2, ...}
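The output above suggests that Chinese and Japanese runs are counted as character bigrams while English is counted as word bigrams. The sketch below reproduces that behaviour in plain Ruby, without the gem. It is built on assumptions: the hypothetical `mixed_ngrams` helper, the Unicode script classes, and the rule that punctuation and line breaks end a run are illustrative choices, not TJNGram's actual implementation.

```ruby
# encoding: utf-8

# Hypothetical sketch, NOT TJNGram's implementation: count n-grams in text
# that mixes CJK and English. Assumed rules: a run of Han, Hiragana,
# Katakana or Hangul characters yields character n-grams; a run of Latin
# words on one line yields word n-grams; punctuation and newlines end a run.
CJK_RUN   = /[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]+/
LATIN_RUN = /[A-Za-z0-9']+(?:[ \t]+[A-Za-z0-9']+)*/
RUN       = Regexp.union(CJK_RUN, LATIN_RUN)

def mixed_ngrams(n, text)
  counts = Hash.new(0)
  text.scan(RUN) do |run|
    if CJK_RUN =~ run
      # CJK run: each character is a unit; n-grams are joined without spaces.
      run.chars.each_cons(n) { |gram| counts[gram.join] += 1 }
    else
      # Latin run: each word is a unit; n-grams are joined with one space.
      run.split(/[ \t]+/).each_cons(n) { |gram| counts[gram.join(" ")] += 1 }
    end
  end
  counts
end

sample = <<eos
這是一個範例。
This is an example.
これは例です。

這裡有一個蘋果。
There is an apple.
これはリンゴです。
eos

counts = mixed_ngrams(2, sample)
p counts.values_at("一個", "これ", "is an") # prints [2, 2, 2]
```

Splitting by script first keeps an n-gram from spanning a Chinese character and an English word. Under these rules, characters in Unicode's Common script, such as the prolonged sound mark ー (U+30FC), may end a Katakana run, so a real tokenizer would need extra handling for them.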

Note

If your script file is UTF-8 encoded, run Ruby with the following option, which tells Ruby to treat the script as UTF-8:

ruby -Ku example.rb

It is strongly recommended that you save all your script files as UTF-8.
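On Ruby 1.9 and later, a `# encoding: utf-8` magic comment on the first line is another way to declare a UTF-8 source file, and Ruby 2.0 and later treat source files as UTF-8 by default. The short check below uses only core Ruby (no TJNGram calls) to show why the encoding matters for character n-grams: the same two Chinese characters are 2 characters in UTF-8 but 6 units when the string is treated as raw bytes, so a byte-oriented split would cut characters in half.

```ruby
# encoding: utf-8
# The magic comment above declares this source file as UTF-8 (Ruby 1.9+).

word = "一個"

puts word.encoding  # prints UTF-8
puts word.length    # prints 2 (characters)
puts word.bytesize  # prints 6 (each of these characters is 3 bytes in UTF-8)

# Treating the same bytes as binary data: every byte becomes one "character",
# so character n-grams would be built from fragments of characters.
binary = word.dup.force_encoding("ASCII-8BIT")
puts binary.length  # prints 6
```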