Languages.jl

Introduction

Languages.jl is a Julia package for working with human languages. It provides word lists such as articles and stopwords, script detection, and language detection.

Usage

using Languages

articles(Languages.English())
stopwords(Languages.English())

All word lists are returned as vectors of UTF-8 strings.
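Because the word lists are plain vectors of strings, they combine directly with ordinary Julia collections. The following is a minimal sketch of stopword filtering; `drop_stopwords` is a hypothetical helper written for illustration and is not part of Languages.jl.

```julia
# Hypothetical helper (not part of Languages.jl): remove every
# whitespace-separated token of `text` that appears in `stops`.
# `stops` can be any collection of strings, for example the vector
# returned by stopwords(Languages.English()).
function drop_stopwords(text::AbstractString, stops)
    # Build a set once so each membership check is cheap.
    stopset = Set(lowercase.(stops))
    # Compare case-insensitively, but keep the original tokens.
    return [w for w in split(text) if !(lowercase(w) in stopset)]
end
```

With the package loaded, a call might look like `drop_stopwords("To be or not to be", stopwords(Languages.English()))`. The result depends on the contents of the package's English stopword list.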

Script detection

The script detection model works by checking which Unicode character ranges are present in the input text.

Languages.detect_script("To be or not to be") # => Languages.LatinScript()
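To illustrate the idea of range-based script detection, here is a simplified, self-contained sketch. It is not the package's implementation: the `SCRIPT_RANGES` table, the `guess_script` function, and the three scripts it covers are assumptions chosen for illustration. A real detector handles many more ranges.

```julia
# Hypothetical, heavily reduced table of Unicode ranges per script.
const SCRIPT_RANGES = [
    "Latin"    => ['A':'Z', 'a':'z', '\u00C0':'\u024F'],
    "Cyrillic" => ['\u0400':'\u04FF'],
    "Greek"    => ['\u0370':'\u03FF'],
]

# Count how many characters fall into each script's ranges and
# return the name of the script with the most hits, or `nothing`
# when no character matches any known range.
function guess_script(text::AbstractString)
    counts = Dict{String,Int}()
    for c in text
        for (name, ranges) in SCRIPT_RANGES
            if any(r -> c in r, ranges)
                counts[name] = get(counts, name, 0) + 1
                break
            end
        end
    end
    isempty(counts) && return nothing
    # findmax on a Dict returns (largest value, its key).
    return last(findmax(counts))
end
```

Characters outside every listed range, such as spaces and digits, are simply ignored, so the decision rests only on letters that carry script information.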

Language Detection

A trigram-based model is used to detect the language of the text. The candidate languages considered by the model are filtered based on the detected script.
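A trigram model is built on counts of three-character sequences. The sketch below shows only that feature-extraction step, under the assumption that text is lowercased first; `trigram_counts` is a hypothetical name, and the package's actual profile format and scoring are not shown here.

```julia
# Hypothetical helper: count every overlapping three-character
# sequence (trigram) in `text` after lowercasing it.
function trigram_counts(text::AbstractString)
    # Collect into characters so indexing is safe for non-ASCII text.
    chars = collect(lowercase(text))
    counts = Dict{String,Int}()
    for i in 1:length(chars)-2
        tri = String(chars[i:i+2])
        counts[tri] = get(counts, tri, 0) + 1
    end
    return counts
end
```

A detector of this kind typically compares such counts for the input against stored trigram profiles for each candidate language and picks the closest match.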

We detect 84 of the most common languages spoken around the world. This usually covers most languages with more than 10 million native speakers.

detector = LanguageDetector()
detector("To be or not to be") # => (Languages.English(), Languages.LatinScript(), 1.0)

List All Supported Languages

You can use list_languages() to get all supported languages.

The LanguageDetector model returns the language, the script, and the confidence when applied to a string.
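Since the result is a three-element tuple, it can be destructured directly. The sketch below shows one way to act on the confidence value; `accept_detection` and its default threshold of 0.8 are assumptions made for illustration, not part of the package.

```julia
# Hypothetical helper: `result` is expected to be a
# (language, script, confidence) tuple such as the one returned
# by calling a LanguageDetector on a string.
function accept_detection(result; threshold = 0.8)
    lang, script, confidence = result
    # Keep the language only when the detector is confident enough.
    return confidence >= threshold ? lang : nothing
end
```

With the package loaded, this could be called as `accept_detection(LanguageDetector()("To be or not to be"))`. A suitable threshold depends on the application and on the length of the input text.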

The language and script detection code in this package is heavily inspired by the Rust package whatlang-rs, which is in turn derived from franc. See LICENSE.whatlang-rs for details.

Deprecations

The API of this package has been refurbished recently. If you have used this package earlier, please be aware of these changes.