Polyglot
This repository is a public archive and will be deleted in the future. It contains duplicated code that can be found in the other pharo-ai repositories. We encourage you to look at the other NLP repositories inside pharo-ai.
A library for Natural Language Processing implemented in Pharo. To get more information, check out the Polyglot Booklet.
Installation
To install Polyglot, open the Playground (Ctrl+OW) in a fresh Pharo image and execute the following Metacello script (select it and press the Do-it button or Ctrl+D):
    Metacello new
        baseline: 'Polyglot';
        repository: 'github://PolyMathOrg/Polyglot/src';
        load.
List of Supported Features
- Tokenization
- N-grams
- Term Frequency-Inverse Document Frequency Scoring
- N-Gram Language Modelling
- Stemming
- Part of Speech Tagging
- Named Entity Recognizer
- Dependency Parser
- Modified Atlas Bridge
- Common Vector Metrics
Google Summer of Code 2019 Report
Author: Nikhil Pinnaparaju
Organisation: Pharo
Project: Polyglot
Mentors: Oleksandr Zaitsev, Alexandre Bergel
Code Contribution
Documentation
Blog Posts
- Representing Documents as Vectors and Visualizing them Using Polyglot in Pharo
- Stemming in Polyglot
- Working with the Atlas Pharo-Python Bridge
- Polyglot for Large Corpora
- Introducing Polyglot
- Tokenization — GSoC with Pharo Consortium
- Community Bonding Period — GSoC with Pharo Consortium
- Architecture Design For an NLP Library
- PCA in Pharo using PolyMath, DataFrame and Roassal
- My Journey Into Google Summer of Code — 2019