MTAnchor

This repository contains demo code for MTAnchor, an interactive, multilingual topic modeling system. The code accompanies the paper Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages (Yuan et al., 2018).

Dependencies

All of the above packages can be installed with pip install.

Setup

Usage

Memory space

The demo saves data in a SQLite database stored as a local file. Because this is only a demo, the vocabulary includes only the top 1000 words from each corpus. The demo also does not let users submit their results, which keeps the database from growing too large. If the database still takes up too much space, run delete_db.sh to delete all data from it.
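The per-corpus vocabulary cap described above can be illustrated with a short sketch. It assumes that "top" means the most frequent tokens in each already-tokenized corpus; the function names top_k_vocabulary and build_vocabularies are hypothetical and are not taken from the MTAnchor code.

```python
from collections import Counter
from typing import Dict, Iterable, List


def top_k_vocabulary(documents: Iterable[List[str]], k: int = 1000) -> List[str]:
    """Return the k most frequent tokens in one tokenized corpus.

    Hypothetical helper: assumes "top" means highest raw token frequency.
    """
    counts: Counter = Counter()
    for doc in documents:
        counts.update(doc)  # count every token occurrence in the document
    # most_common sorts by count (descending); ties keep first-seen order
    return [word for word, _ in counts.most_common(k)]


def build_vocabularies(
    corpora: Dict[str, List[List[str]]], k: int = 1000
) -> Dict[str, List[str]]:
    """Cap each language's vocabulary separately (e.g. English and Chinese)."""
    return {lang: top_k_vocabulary(docs, k) for lang, docs in corpora.items()}


if __name__ == "__main__":
    corpora = {
        "en": [["film", "music", "film"], ["film", "food"]],
        "zh": [["电影", "音乐"], ["电影"]],
    }
    print(build_vocabularies(corpora, k=1))
```

Capping each corpus independently, rather than pooling both languages, keeps one language from crowding the other out of the shared vocabulary budget.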

Data

If you want the Wikipedia data used in the experiments, you can download it here. The dataset contains Chinese and English Wikipedia articles, each labeled with one of six categories: film, music, animals, politics, religion, and food. Please cite this paper if you use the data.

See also

Citation

@inproceedings{yuan2018mtanchor,
  title={Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages},
  author={Yuan, Michelle and Van Durme, Benjamin and Boyd-Graber, Jordan},
  booktitle={Advances in neural information processing systems},
  year={2018}
}

License

Copyright (C) 2018, Michelle Yuan

Licensed under the terms of the MIT License. A full copy of the license can be found in LICENSE.