Topic Modelling via Community Mining of Term Co-occurrence Networks
Abstract
Topic modelling seeks to uncover the conceptual and thematic content of collections of documents. The resulting topics can be used as features for document indexing and classification. Beyond these uses, topic models are increasingly important as tools of applied research. As we seek to develop agents capable of holding real conversations with humans, topic models are needed to control topic drift and guide the conversation. Unfortunately, the most popular topic models in use today do not provide a suitable topic structure for these purposes, and the state-of-the-art models based on neural networks suffer from many of the same drawbacks while requiring specialized hardware and many hours to train.
We take a fundamentally different approach to topic modelling. Our algorithm, Community Topic, is based on mining communities of terms from term co-occurrence networks extracted from the documents. In addition to providing interpretable collections of terms as topics, the network representation provides a natural topic structure. The topics form a network, so topic similarity is inferred from the weights of the edges between them. Super-topics can be found by iteratively applying community detection on the topic network, grouping similar topics together. Sub-topics can be found by iteratively applying community detection on a single topic community. This can be done dynamically, with the user or conversation agent moving up and down the topic hierarchy as desired.
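The pipeline described above can be illustrated with a minimal Python sketch. It is not the Community Topic implementation. It assumes that two terms co-occur when they appear in the same document, and it swaps in a simple deterministic weighted label propagation as a stand-in, because the abstract does not name the community detection method. All function names and the toy documents are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_network(docs):
    """Adjacency map; edge weight = number of documents sharing both terms.
    (Document-level co-occurrence is an assumption of this sketch.)"""
    adj = defaultdict(dict)
    weights = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            weights[(a, b)] += 1
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    return adj

def label_propagation(adj, max_iter=100):
    """Stand-in community detection: each node repeatedly adopts the label
    with the largest total edge weight among its neighbours (ties broken
    by the smallest label, so the result is deterministic)."""
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            score = Counter()
            for nbr, w in adj[node].items():
                score[labels[nbr]] += w
            if not score:  # isolated node keeps its own label
                continue
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, lab in labels.items():
        groups[lab].add(node)
    return sorted(sorted(group) for group in groups.values())

def topic_network(adj, topics):
    """Collapse terms into topics; the weight between two topics is the
    summed weight of the term edges that cross between them."""
    owner = {term: i for i, topic in enumerate(topics) for term in topic}
    net = defaultdict(lambda: defaultdict(int))
    for a in adj:
        for b, w in adj[a].items():
            if owner[a] != owner[b]:
                net[owner[a]][owner[b]] += w
    return {i: dict(nbrs) for i, nbrs in net.items()}

# Toy corpus, invented purely for illustration (already tokenized).
docs = [
    ["goal", "match", "team"],
    ["goal", "team", "coach"],
    ["match", "coach", "team"],
    ["vote", "party", "election"],
    ["vote", "election", "senate"],
    ["party", "senate", "election"],
    ["coach", "senate"],  # a weak bridge between the two themes
]

adj = cooccurrence_network(docs)
topics = label_propagation(adj)       # communities of terms = topics
net = topic_network(adj, topics)      # edge weights = topic similarity
super_topics = label_propagation(net) # community detection on the topic network
```

On this toy corpus the sketch separates the sports terms from the politics terms, and the single bridging document gives the two topics a similarity weight of 1. Sub-topics would follow the same pattern, by running the detection step on the subgraph induced by one topic's terms, although a stand-in as coarse as label propagation will often leave a small, dense topic undivided.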
We evaluate Community Topic against two contenders. We find that our algorithm detects topics with the highest coherence as measured by two standard automated metrics. Our algorithm has the fastest run time and detects topics in a few seconds with no specialized hardware required. It is hyperparameter-free and can detect topics at multiple scales, finding coherent sub- and super-topics at multiple levels. This makes Community Topic an ideal topic modelling algorithm for both applied research and practical applications like conversational agents.
