PyThaiNLP: Thai Natural Language Processing in Python

PyThaiNLP is a Python package for text processing and linguistic analysis, similar to NLTK but focused on the Thai language.

For details in Thai, see README_TH.MD.

Quick install

pip install pythainlp

News

You can now contact the PyThaiNLP team or ask questions on Matrix. <a href="https://matrix.to/#/#thainlp:matrix.org" rel="noopener" target="_blank"><img src="https://matrix.to/img/matrix-badge.svg" alt="Chat on Matrix"></a>

| Version | Description | Status |
|---|---|---|
| 5.0.5 | Stable | Change Log |
| dev | Release Candidate for 5.1 | Change Log |

Getting Started

Capabilities

PyThaiNLP provides standard linguistic analysis for the Thai language and standard Thai locale utility functions. Some of these functions are also available through the command-line interface (run thainlp in your shell).
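
As a minimal sketch of the kind of analysis described above, the snippet below splits a Thai sentence into words with `word_tokenize` from `pythainlp.tokenize`. The wrapper `tokenize_if_available` is a hypothetical helper written for this example, not part of the library; it only guards against PyThaiNLP not being installed. The exact tokens depend on the installed release and the default engine.

```python
from importlib.util import find_spec
from typing import List, Optional


def tokenize_if_available(text: str) -> Optional[List[str]]:
    """Return word tokens for `text`, or None if PyThaiNLP is not installed.

    Hypothetical helper for illustration; only `word_tokenize` comes from
    PyThaiNLP itself.
    """
    if find_spec("pythainlp") is None:
        return None
    # Imported lazily so this module still loads without PyThaiNLP.
    from pythainlp.tokenize import word_tokenize

    return word_tokenize(text)


if __name__ == "__main__":
    tokens = tokenize_if_available("ฉันรักภาษาไทย")
    print(tokens if tokens is not None else "PyThaiNLP is not installed")
```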

Partial list of features:

Installation

pip install --upgrade pythainlp

This will install the latest stable release of PyThaiNLP.
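
To confirm which release pip actually installed, one option is to query the package metadata from Python. This is a sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "pythainlp") -> Optional[str]:
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "pythainlp is not installed")
```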

Install different releases:

Installation Options

Some features, such as Thai WordNet, may require extra packages. To install them, list the extras you need in square brackets immediately after pythainlp:

pip install "pythainlp[extra1,extra2,...]"

Possible extras:

For dependency details, look at the extras variable in setup.py.
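
If PyThaiNLP is already installed, the extras it declares can also be read from its package metadata rather than from setup.py. The sketch below relies on the standard `Provides-Extra` metadata field. Both `declared_extras` and `pip_requirement` are hypothetical helper names, and the list returned depends on the installed release.

```python
from importlib.metadata import PackageNotFoundError, metadata
from typing import List


def declared_extras(distribution: str = "pythainlp") -> List[str]:
    """List the extras a distribution declares; empty if none or not installed."""
    try:
        meta = metadata(distribution)
    except PackageNotFoundError:
        return []
    # Each declared extra appears as one "Provides-Extra" metadata header.
    return sorted(meta.get_all("Provides-Extra") or [])


def pip_requirement(distribution: str, extras: List[str]) -> str:
    """Build a requirement string such as 'pythainlp[extra1,extra2]'."""
    return f"{distribution}[{','.join(extras)}]" if extras else distribution


if __name__ == "__main__":
    print(declared_extras())
    # "extra1" and "extra2" are the placeholders used in the text above.
    print(pip_requirement("pythainlp", ["extra1", "extra2"]))
```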

Data Directory

Command-Line Interface

Some PyThaiNLP functionality is available from the command line through the thainlp command.

For example, to display a catalog of datasets:

thainlp data catalog

To show usage information:

thainlp help
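
The same commands can be driven from a Python script. The following is a sketch, assuming the `thainlp` executable is on `PATH`. `run_thainlp` is a hypothetical wrapper, not a PyThaiNLP function, and it returns None when the command cannot be found.

```python
import shutil
import subprocess
from typing import List, Optional


def run_thainlp(args: List[str], executable: str = "thainlp") -> Optional[str]:
    """Run the CLI with `args` and return its standard output.

    Returns None if the executable is not on PATH.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    completed = subprocess.run(
        [path, *args], capture_output=True, text=True, timeout=60
    )
    return completed.stdout


if __name__ == "__main__":
    output = run_thainlp(["help"])
    print(output if output is not None else "thainlp command not found")
```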

Testing and test suites

We test core functionalities on all officially supported Python versions.

Some functionality requiring extra dependencies may be tested less frequently due to potential version conflicts or incompatibilities between packages.

Test cases are categorized into three groups: core, compact, and extra. You can find these tests in the tests/ directory.

For more detailed information on testing, please refer to the tests README: tests/README.md

Licenses

| | License |
|---|---|
| PyThaiNLP source code and notebooks | Apache Software License 2.0 |
| Corpora, datasets, and documentation created by PyThaiNLP | Creative Commons Zero 1.0 Universal Public Domain Dedication License (CC0) |
| Language models created by PyThaiNLP | Creative Commons Attribution 4.0 International Public License (CC-by) |
| Other corpora and models that may be included in PyThaiNLP | See Corpus License |

Contribute to PyThaiNLP

Who uses PyThaiNLP?

See INTHEWILD.md.

Citations

If you use PyThaiNLP in your project or publication, please cite the library as follows:

Phatthiyaphaibun, Wannaphong, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, and Pattarawat Chormai. “PyThaiNLP: Thai Natural Language Processing in Python”. Zenodo, 2 June 2024. http://doi.org/10.5281/zenodo.3519354.

or by BibTeX entry:

@software{pythainlp,
    title = "{P}y{T}hai{NLP}: {T}hai Natural Language Processing in {P}ython",
    author = "Phatthiyaphaibun, Wannaphong  and
      Chaovavanich, Korakot  and
      Polpanumas, Charin  and
      Suriyawongkul, Arthit  and
      Lowphansirikul, Lalita  and
      Chormai, Pattarawat",
    doi = {10.5281/zenodo.3519354},
    license = {Apache-2.0},
    month = jun,
    url = {https://github.com/PyThaiNLP/pythainlp/},
    version = {v5.0.4},
    year = {2024},
}

Our NLP-OSS 2023 paper:

Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul, Lalita Lowphansirikul, Pattarawat Chormai, Peerat Limkonchotiwat, Thanathip Suntorntip, and Can Udomcharoenchaikit. 2023. PyThaiNLP: Thai Natural Language Processing in Python. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 25–36, Singapore, Singapore. Empirical Methods in Natural Language Processing.

and its BibTeX entry:

@inproceedings{phatthiyaphaibun-etal-2023-pythainlp,
    title = "{P}y{T}hai{NLP}: {T}hai Natural Language Processing in {P}ython",
    author = "Phatthiyaphaibun, Wannaphong  and
      Chaovavanich, Korakot  and
      Polpanumas, Charin  and
      Suriyawongkul, Arthit  and
      Lowphansirikul, Lalita  and
      Chormai, Pattarawat  and
      Limkonchotiwat, Peerat  and
      Suntorntip, Thanathip  and
      Udomcharoenchaikit, Can",
    editor = "Tan, Liling  and
      Milajevs, Dmitrijs  and
      Chauhan, Geeticka  and
      Gwinnup, Jeremy  and
      Rippeth, Elijah",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.4",
    pages = "25--36",
    abstract = "We present PyThaiNLP, a free and open-source natural language processing (NLP) library for Thai language implemented in Python. It provides a wide range of software, models, and datasets for Thai language. We first provide a brief historical context of tools for Thai language prior to the development of PyThaiNLP. We then outline the functionalities it provided as well as datasets and pre-trained language models. We later summarize its development milestones and discuss our experience during its development. We conclude by demonstrating how industrial and research communities utilize PyThaiNLP in their work. The library is freely available at https://github.com/pythainlp/pythainlp.",
}

Sponsors

| Sponsor | Description |
|---|---|
| VISTEC-depa Thailand Artificial Intelligence Research Institute | Since 2019, our contributors Korakot Chaovavanich and Lalita Lowphansirikul have been supported by VISTEC-depa Thailand Artificial Intelligence Research Institute. |
| MacStadium | MacStadium provides us with a free Mac Mini M1 for running CI builds. |

<div align="center"> Made with ❤️ | PyThaiNLP Team 💻 | "We build Thai NLP" 🇹🇭 </div>
<div align="center"> <strong>We have only one official repository at https://github.com/PyThaiNLP/pythainlp and one official mirror at https://gitlab.com/pythainlp/pythainlp</strong> </div> <div align="center"> <strong>Beware of malware if you use code from any mirror other than the official two on GitHub and GitLab.</strong> </div>