

<p align="center"> <img align="center" height="200" src="assets/logo_w_text.svg"> <br> <b>Topic modeling is your turf too.</b> <br> <i> Contextual topic models with representations from transformers. </i></p>

Features

This package is still a work in progress, and scientific papers on some of the novel methods are currently undergoing peer review. If you use this package and encounter any problems, let us know by opening an issue.

New in version 0.8.0

Automated Topic Naming

Turftopic now allows you to automatically assign human-readable names to topics using LLMs or n-gram retrieval!

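from turftopic import KeyNMF
from turftopic.namers import OpenAITopicNamer

# corpus: a list of raw document strings (see "Fitting a Model" below)
model = KeyNMF(10).fit(corpus)

# Calls the OpenAI API, so valid OpenAI credentials are required
# (typically via the OPENAI_API_KEY environment variable)
namer = OpenAITopicNamer("gpt-4o-mini")
model.rename_topics(namer)
model.print_topics()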
| Topic ID | Topic Name | Highest Ranking |
|---|---|---|
| 0 | Operating Systems and Software | windows, dos, os, ms, microsoft, unix, nt, memory, program, apps |
| 1 | Atheism and Belief Systems | atheism, atheist, atheists, belief, religion, religious, theists, beliefs, believe, faith |
| 2 | Computer Architecture and Performance | motherboard, ram, memory, cpu, bios, isa, speed, 486, bus, performance |
| 3 | Storage Technologies | disk, drive, scsi, drives, disks, floppy, ide, dos, controller, boot |
| ... | ... | ... |
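This snippet does not reproduce Turftopic's own n-gram namer. It is a hypothetical, self-contained sketch of the general idea behind naming a topic by n-gram retrieval: pick the frequent n-gram from the corpus that best covers the topic's highest-ranking words. The helper names `candidate_ngrams` and `name_topic`, the rank-based weighting, and the `min_count` threshold are all illustrative assumptions.

```python
from collections import Counter


def candidate_ngrams(documents, n_values=(2, 3)):
    """Count lowercased, whitespace-tokenized word n-grams in a corpus."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for n in n_values:
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i : i + n])] += 1
    return counts


def name_topic(top_words, documents, min_count=2):
    """Return the frequent n-gram that best covers a topic's top words.

    Higher-ranked words (earlier in top_words) get larger weights, 1 / (rank + 1).
    Ties on score are broken by corpus frequency. Returns None if no n-gram
    occurring at least min_count times contains any of the top words.
    """
    weights = {word: 1.0 / (rank + 1) for rank, word in enumerate(top_words)}
    best_ngram, best_key = None, (0.0, 0)
    for ngram, count in candidate_ngrams(documents).items():
        if count < min_count:
            continue
        score = sum(weights.get(token, 0.0) for token in set(ngram.split()))
        if score > 0 and (score, count) > best_key:
            best_ngram, best_key = ngram, (score, count)
    return best_ngram


docs = [
    "I upgraded to ms dos and windows nt last week",
    "windows nt is more stable than ms dos",
    "my bike needs new tires",
]
print(name_topic(["windows", "dos", "ms", "nt"], docs))  # windows nt
```

The `min_count` filter matters. Without it, rare n-grams such as "dos and windows" can win simply because they happen to contain several top words.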

Basics (Documentation)


Installation

Turftopic can be installed from PyPI.

pip install turftopic

If you intend to use CTMs, make sure to install the package with Pyro as an optional dependency.

pip install turftopic[pyro-ppl]

Fitting a Model

Turftopic's models follow the scikit-learn API conventions, and as such they are quite easy to use if you are familiar with scikit-learn workflows.

Here's an example of how to use KeyNMF, one of our models, on the 20 Newsgroups dataset from scikit-learn.

from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(
    subset="all",
    remove=("headers", "footers", "quotes"),
)
corpus = newsgroups.data

With the corpus loaded, you can fit a KeyNMF model with 20 topics.

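from turftopic import KeyNMF

model = KeyNMF(20)
# fit_transform() fits the model and returns the document-topic matrix,
# which is used below to find representative documents
document_topic_matrix = model.fit_transform(corpus)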

Interpreting Models

Turftopic comes with a number of pretty printing utilities for interpreting the models.

To see the most important words for each topic, use the print_topics() method.

model.print_topics()
<center>
| Topic ID | Top 10 Words |
|---|---|
| 0 | armenians, armenian, armenia, turks, turkish, genocide, azerbaijan, soviet, turkey, azerbaijani |
| 1 | sale, price, shipping, offer, sell, prices, interested, 00, games, selling |
| 2 | christians, christian, bible, christianity, church, god, scripture, faith, jesus, sin |
| 3 | encryption, chip, clipper, nsa, security, secure, privacy, encrypted, crypto, cryptography |
| ... | ... |
</center>
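The sketch below shows conceptually how a top-words listing like the one above is derived. It assumes a scikit-learn-style topic-term matrix with one row per topic and one column per vocabulary word, and sorts each row by weight. The helper `top_words`, the vocabulary, and the weights are hypothetical, and the sketch is not Turftopic's own implementation.

```python
import numpy as np


def top_words(components, vocab, top_n=10):
    """Return the top_n highest-weighted words for each topic (row of components)."""
    # argsort on the negated weights gives column indices in descending weight order
    order = np.argsort(-np.asarray(components), axis=1)[:, :top_n]
    return [[vocab[i] for i in row] for row in order]


# Toy topic-term matrix: 3 topics x 6 vocabulary words (made-up weights)
vocab = ["armenian", "turkish", "sale", "price", "bible", "church"]
components = np.array([
    [0.9, 0.7, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.6, 0.0, 0.1],
    [0.0, 0.1, 0.0, 0.0, 0.9, 0.5],
])
for topic_id, words in enumerate(top_words(components, vocab, top_n=2)):
    print(topic_id, ", ".join(words))
```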
# Print highest ranking documents for topic 0
model.print_representative_documents(0, corpus, document_topic_matrix)
<center>
| Document | Score |
|---|---|
| Poor 'Poly'. I see you're preparing the groundwork for yet another retreat from your... | 0.40 |
| Then you must be living in an alternate universe. Where were they? An Appeal to Mankind During the... | 0.40 |
| It is 'Serdar', 'kocaoglan'. Just love it. Well, it could be your head wasn't screwed on just right... | 0.39 |
</center>
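Representative documents for a topic can be understood as the rows of the document-topic matrix with the highest values in that topic's column. The sketch below illustrates this with a hypothetical helper, `representative_documents`, and made-up data. It is not Turftopic's implementation.

```python
import numpy as np


def representative_documents(topic_id, documents, document_topic_matrix, top_k=3):
    """Return (document, score) pairs for the documents scoring highest on a topic."""
    scores = np.asarray(document_topic_matrix)[:, topic_id]
    # Negate so argsort returns indices from highest to lowest score
    best = np.argsort(-scores)[:top_k]
    return [(documents[i], float(scores[i])) for i in best]


# Made-up data: 3 documents x 2 topics
documents = ["first document", "second document", "third document"]
document_topic_matrix = np.array([
    [0.40, 0.01],
    [0.00, 0.35],
    [0.39, 0.02],
])
for doc, score in representative_documents(0, documents, document_topic_matrix, top_k=2):
    print(f"{doc}\t{score:.2f}")
```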
model.print_topic_distribution(
    "I think guns should definitely banned from all public institutions, such as schools."
)
<center>
| Topic name | Score |
|---|---|
| 7_gun_guns_firearms_weapons | 0.05 |
| 17_mail_address_email_send | 0.00 |
| 3_encryption_chip_clipper_nsa | 0.00 |
| 19_baseball_pitching_pitcher_hitter | 0.00 |
| 11_graphics_software_program_3d | 0.00 |
</center>
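The topic names in the output above appear to consist of the topic ID joined with the topic's top four words. The sketch below mimics that format and ranks topics by one document's scores. The helpers `topic_label` and `topic_distribution` are hypothetical, and the scores are made up. In practice they would come from the fitted model (for example through `transform()`, assumed here).

```python
import numpy as np


def topic_label(topic_id, words, n_words=4):
    """Build a label such as '1_gun_guns_firearms_weapons' from a topic's top words."""
    return "_".join([str(topic_id), *words[:n_words]])


def topic_distribution(scores, topic_words, top_k=5):
    """Rank topics for a single document by score, highest first."""
    scores = np.asarray(scores, dtype=float)
    best = np.argsort(-scores)[:top_k]
    return [(topic_label(i, topic_words[i]), float(scores[i])) for i in best]


topic_words = [
    ["mail", "address", "email", "send", "list"],
    ["gun", "guns", "firearms", "weapons", "control"],
    ["encryption", "chip", "clipper", "nsa", "security"],
]
# Hypothetical scores for one new document (one value per topic)
for label, score in topic_distribution([0.001, 0.05, 0.0], topic_words, top_k=3):
    print(f"{label}\t{score:.2f}")
```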

Visualization

Turftopic does not come with built-in visualization utilities; however, topicwizard, an interactive topic model visualization library, is compatible with all models from Turftopic.

pip install topic-wizard

By far the easiest way to visualize your models for interpretation is to launch the topicwizard web app.

import topicwizard

topicwizard.visualize(corpus, model=model)
<figure> <img src="https://x-tabdeveloping.github.io/topicwizard/_images/screenshot_topics.png" width="70%" style="margin-left: auto;margin-right: auto;"> <figcaption>Screenshot of the topicwizard Web Application</figcaption> </figure>

Alternatively, you can use the Figures API in topicwizard to produce individual HTML figures.

References