WEDTM

The code for the paper "Inter and Intra Topic Structure Learning with Word Embeddings", published at ICML 2018.

Key features:

  1. WEDTM is a deep topic model that discovers topic hierarchies.
  2. WEDTM is also able to discover "sub-topics" with the help of word embeddings.
  3. WEDTM achieves excellent performance on perplexity, document classification, and topic coherence.

Run WEDTM

  1. The code has been tested on macOS and Linux (Ubuntu). To run it on Windows, you need to recompile GNBP_mex_collapsed_deep_WEDTM.c with MEX and a C++ compiler.

  2. Requirements: MATLAB R2016b (or later) and the code of GBN.

  3. Make sure GBN runs properly on your machine.

  4. We provide the WS dataset used in the paper, stored in MAT format, with the following contents:

Please prepare your own documents in the above format. If you want to use this dataset, please cite the original papers, which are cited in our paper.

  5. Run demo_WEDTM.m:
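
The steps above can be sketched as a short MATLAB session. This is only an illustration under stated assumptions: the GBN location (`gbn_dir`) and the dataset file name (`data_file`) are placeholders that do not come from this repository, the session is assumed to start in the WEDTM code directory, and the C file is assumed to compile without extra flags.

```matlab
% Sketch only: both values below are placeholders (assumptions), adjust them.
gbn_dir   = 'path/to/GBN';   % hypothetical location of the GBN code
data_file = 'WS.mat';        % hypothetical name of the provided MAT file

% Put GBN and its subfolders on the MATLAB path.
addpath(genpath(gbn_dir));

% Compile the sampler only if no MEX binary for this platform is found
% (exist returns 3 for a MEX-file on the search path), e.g. on Windows.
if exist('GNBP_mex_collapsed_deep_WEDTM', 'file') ~= 3
    mex GNBP_mex_collapsed_deep_WEDTM.c
end

% List the variables stored in the dataset without loading them.
whos('-file', data_file);

% Run the demo script.
demo_WEDTM
```

If `mex` reports that no compiler is configured, running `mex -setup` first lets you select one. Listing the MAT file with `whos` is a quick way to check the variable names and sizes your own documents need to match.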

Notes

  1. Because WEDTM adapts GBN for part of its model structure, the code relies heavily on GBN and largely follows its code structure.

  2. For the Polya-Gamma sampler (PolyaGamRnd_Gam.m), I used Mingyuan Zhou's implementation, described in "Parsimonious Bayesian deep networks". If you want to use the sampler, please cite the paper.

  3. For the sampling of W, I partly referred to the implementation of DPFA by Gan Zhe.