Interpretable And Effective Hashing Via Bernoulli Variational Auto-Encoders

INTELLIGENT DATA ANALYSIS(2020)

Abstract
Due to the rapid increase in the amount of data generated in many fields of science and engineering, information retrieval methods tailored to large-scale datasets have become increasingly important in recent years. Semantic hashing is an emerging technique for this purpose that works on the idea of representing complex data objects, like images and text, using similarity-preserving binary codes that are then used for indexing and search. In this paper, we investigate a hashing algorithm that uses a deep variational auto-encoder to learn and predict the codes. Unlike previous approaches of this type, which learn a continuous (Gaussian) representation and then project the embedding to obtain hash codes, our method employs Bernoulli latent variables in both the training and the prediction stages. Constraining the model to use a binary encoding allows us to obtain a more interpretable representation for hashing: each factor in the generative model represents a bit that should help to reconstruct and thus identify the input pattern. Interestingly, we found that the binary constraint leads not to a loss but to an increase in search accuracy. We argue that continuous formulations learn a representation that can significantly differ from the code used for search. Minding this gap in the design of the auto-encoder can translate into more accurate retrieval results. Extensive experiments on seven datasets involving image data and text data illustrate these findings and demonstrate the advantages of our approach.
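The idea of sampling Bernoulli latent bits during training and using the same bits for search can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a binary (two-class) Gumbel-Softmax relaxation, suggested only by the keyword list, and it assumes a bit is set to 1 at prediction time when its probability exceeds 0.5. The function names `relaxed_bernoulli_sample`, `to_hash_code`, and `hamming_rank` are hypothetical, and the encoder that would produce the logits is omitted.

```python
import numpy as np


def relaxed_bernoulli_sample(logits, temperature, rng):
    """Draw a differentiable surrogate for Bernoulli(sigmoid(logits)) samples.

    Assumption: binary Concrete / two-class Gumbel-Softmax relaxation.
    Lower temperatures push the samples closer to 0 or 1.
    """
    # Logistic noise equals the difference of two independent Gumbel variables.
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=logits.shape)
    logistic_noise = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / temperature))


def to_hash_code(logits):
    """Deterministic code for search: bit is 1 when sigmoid(logit) > 0.5.

    Assumption: thresholding at 0.5 is an illustrative choice.
    """
    return (logits > 0).astype(np.uint8)


def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to the query code."""
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    order = np.argsort(distances, kind="stable")
    return order, distances


# Toy usage with made-up logits standing in for an encoder's output.
rng = np.random.default_rng(0)
logits = np.array([[2.0, -1.0, 0.5], [-3.0, 4.0, -0.2]])
soft_bits = relaxed_bernoulli_sample(logits, temperature=0.5, rng=rng)
codes = to_hash_code(logits)
database = np.array([[0, 1, 0], [1, 0, 1], [1, 1, 1]], dtype=np.uint8)
order, distances = hamming_rank(codes[0], database)
```

In this sketch the relaxed samples would be used only while training, whereas the thresholded bits are what gets indexed, so the quantity optimized and the code used for search stay close to each other.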
Keywords
Hashing, variational autoencoders, deep learning, Gumbel-Softmax distribution, neural information retrieval