Home

Awesome

Term Frequency - Inverse Document Frequency (TF-IDF)

Build status Coverage Status License

This repository contains the implementation of TF-IDF algorithm in Pharo.

For more infomation please refer to the Pharo-AI wiki: https://github.com/pharo-ai/wiki

How to install it

To install TF-IDF, go to the Playground (Ctrl+OW) in your Pharo image and execute the following Metacello script (select it and press Do-it button or Ctrl+D):

Metacello new
  baseline: 'AITfIdf';
  repository: 'github://pharo-ai/tf-idf/src';
  load.

How to depend on it

If you want to add a dependency on TF-IDF to your project, include the following lines into your baseline method:

spec
  baseline: 'AITfIdf'
  with: [ spec repository: 'github://pharo-ai/tf-idf/src' ].

If you are new to baselines and Metacello, check out the Baselines tutorial on Pharo Wiki.

How to use it

Here is a simple example of how you can train a TF-IDF model and use it to assign scores to words. You are given an array of sentences where each sentence is represented as an array of words:

sentences := #(
  (I am Sam)
  (Sam I am)
  (I 'don''t' like green eggs and ham)).

Train a TF-IDF model on those sentences:

tfidf := AITermFrequencyInverseDocumentFrequency new.
tfidf trainOn: sentences.

Use it to assign TF-IDF scores to words:

tfidf scoreOf: 'Sam' in: #(I am Sam). "0.4054651081081644"

You can also encode any given text with a TF-IDF vector

tfidf vectorFor: #(I am green green ham). "#(0.0 0.0 0.4054651081081644 0.0 0.0 0.0 2.1972245773362196 1.0986122886681098 0.0)"