Home

Awesome

<img align="left" width="82" height="82" src="docs/_static/icon.svg">

tweetopic

:zap: Blazing Fast topic modelling over short texts in Python <br>

PyPI version pip downloads python version Code style: black <br>

<p align="center"> <img src="docs/_static/banner.svg" height=400 align="center"> </p>

Features

New in version 0.4.0 ✨

You can now pass random_state to topic models to make your results reproducible.

from tweetopic import DMM

model = DMM(10, random_state=42)

🛠 Installation

Install from PyPI:

pip install tweetopic

👩‍💻 Usage (documentation)

Train your a topic model on a corpus of short texts:

from tweetopic import DMM
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Creating a vectorizer for extracting document-term matrix from the
# text corpus.
vectorizer = CountVectorizer(min_df=15, max_df=0.1)

# Creating a Dirichlet Multinomial Mixture Model with 30 components
dmm = DMM(n_components=30, n_iterations=100, alpha=0.1, beta=0.1)

# Creating topic pipeline
pipeline = Pipeline([
    ("vectorizer", vectorizer),
    ("dmm", dmm),
])

You may fit the model with a stream of short texts:

pipeline.fit(texts)

To investigate internal structure of topics and their relations to words and indicidual documents we recommend using topicwizard.

Install it from PyPI:

pip install topic-wizard

Then visualize your topic model:

import topicwizard

topicwizard.visualize(pipeline=pipeline, corpus=texts)

topicwizard visualization

🎓 References