Surrogate-based optimization of learning strategies for additively regularized topic models

Logic Journal of the IGPL (2023)

Abstract
Topic modelling is a popular unsupervised method for text processing that provides interpretable document representations. One of the most advanced approaches is additively regularized topic modelling (ARTM), which achieves better quality than other methods thanks to its flexibility and rich regularization capabilities. However, finding a learning strategy that yields high-quality topics is challenging: a user must select the regularizers, set their coefficients and determine the order in which they are applied. Moreover, this may require many real training runs of the model, which makes the task time consuming. At present, there is a lack of research on parameter optimization for ARTM-based models. Our work proposes an approach that formalizes the learning strategy as a vector of parameters, which can then be optimized with an evolutionary algorithm. We also propose a surrogate-based modification that uses machine learning models to make the parameter search time efficient. We investigate different optimization algorithms (evolutionary and Bayesian) and their surrogate-assisted modifications, applied to topic model optimization via the proposed learning-strategy formalization. An experimental study on English and Russian datasets indicates that the proposed approaches find high-quality parameter solutions for ARTM and substantially reduce the execution time of the search.
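The surrogate-assisted evolutionary search described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual pipeline: the fitness function stands in for an expensive ARTM training run, and the surrogate is a simple nearest-neighbour predictor over already-evaluated parameter vectors; all names and settings here are illustrative assumptions.

```python
import random

def expensive_fitness(params):
    # Hypothetical stand-in for training an ARTM model with the given
    # learning-strategy parameters and scoring the resulting topics.
    # Here: a cheap synthetic function with its optimum at 0.5 per gene.
    return -sum((p - 0.5) ** 2 for p in params)

def surrogate_predict(params, archive, k=3):
    # Predict fitness as the mean true fitness of the k nearest
    # previously evaluated parameter vectors (a toy surrogate model).
    if not archive:
        return 0.0
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(params, x)), f) for x, f in archive
    )
    nearest = dists[:k]
    return sum(f for _, f in nearest) / len(nearest)

def surrogate_assisted_ga(dim=4, pop_size=12, generations=15, seed=0):
    rng = random.Random(seed)
    archive = []  # (params, true fitness) pairs from real evaluations

    def evaluate(p):
        f = expensive_fitness(p)
        archive.append((p, f))
        return f

    pop = [tuple(rng.random() for _ in range(dim)) for _ in range(pop_size)]
    fits = [evaluate(p) for p in pop]
    for _ in range(generations):
        # Generate more offspring than we can afford to evaluate for real.
        children = []
        for _ in range(pop_size * 3):
            a, b = rng.sample(range(pop_size), 2)
            parent = pop[a] if fits[a] > fits[b] else pop[b]  # tournament
            child = tuple(
                min(1.0, max(0.0, g + rng.gauss(0, 0.1))) for g in parent
            )
            children.append(child)
        # The surrogate pre-screens offspring; only the most promising
        # ones receive a real (expensive) evaluation.
        children.sort(key=lambda c: surrogate_predict(c, archive),
                      reverse=True)
        for c in children[: pop_size // 2]:
            f = evaluate(c)
            worst = min(range(pop_size), key=lambda i: fits[i])
            if f > fits[worst]:
                pop[worst], fits[worst] = c, f
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]

best_params, best_fit = surrogate_assisted_ga()
print(best_params, best_fit)
```

The design point this sketch illustrates is the one the abstract claims saves time: the surrogate filters candidate learning strategies cheaply, so the number of expensive model-training runs per generation stays fixed even as more offspring are explored.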
Keywords
Topic modelling, learning strategy, genetic algorithm, surrogate models, machine learning, evolutionary optimization