Long short-term memory network with external memories for image caption generation

JOURNAL OF ELECTRONIC IMAGING (2019)

Abstract
In long short-term memory (LSTM) neural networks, the input and output gates control information flowing into and out of the memory cells. In sequence-to-sequence learning problems, each element is input to the network only once; if the input gates are closed at a certain step, that information is lost and is never input again. The same problem exists for the output gates. The input and output gates therefore do not fully fulfill their gating roles. An LSTM network with external memories is proposed, in which separate memories are installed for the input and output gates. Information blocked by the input gates is preserved in the input memories, enabling the cells to read these memories when necessary. Similarly, information blocked by the output gates is preserved in the output memories and flows out to the hidden units of the network at an appropriate time. In addition, a dynamic attention model is proposed that takes the attention history into account, providing guidance when predicting the attention weights at each step. The proposed model exploits an attention-based encoder-decoder architecture to generate image captions. Experiments were conducted on three benchmark datasets, namely Flickr8k, Flickr30k, and MSCOCO, to demonstrate the effectiveness of the proposed approach. Captions generated by the proposed method are longer and more informative than those obtained with the original LSTM network. (C) 2019 SPIE and IS&T
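The abstract does not give the paper's update equations, so the following is only a minimal sketch of the idea: a standard LSTM step augmented with two external memory vectors, one accumulating candidate input that the input gate blocked (the `1 - i` fraction) and one accumulating cell output that the output gate blocked (the `1 - o` fraction), which the cell and hidden state can read back at later steps. The exact read/write rules here (`m_in`, `m_out`, and the gating of the read-back terms) are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_em_step(x, h, c, m_in, m_out, W, U, b):
    """One step of an LSTM cell with external input/output memories.

    Hypothetical sketch: the memory read/write rules are illustrative.
    W (4H, X), U (4H, H), b (4H,) hold the parameters of the four gates
    stacked along the first axis in the order [input, forget, output, cand].
    """
    H = h.shape[0]
    z = W @ x + U @ h + b                   # gate pre-activations, shape (4H,)
    i = sigmoid(z[0:H])                     # input gate
    f = sigmoid(z[H:2 * H])                 # forget gate
    o = sigmoid(z[2 * H:3 * H])             # output gate
    g = np.tanh(z[3 * H:4 * H])             # candidate cell input

    # Candidate input blocked by the input gate is stored externally
    # instead of being lost; the cell reads it back, gated by i.
    m_in_new = m_in + (1.0 - i) * g
    c_new = f * c + i * g + i * np.tanh(m_in)

    # Cell output blocked by the output gate is likewise stored and can
    # flow out to the hidden state at a later step, gated by o.
    m_out_new = m_out + (1.0 - o) * np.tanh(c_new)
    h_new = o * np.tanh(c_new) + o * np.tanh(m_out)

    return h_new, c_new, m_in_new, m_out_new

# Small demo with random parameters (X = input size, H = hidden size).
rng = np.random.default_rng(0)
X, H = 8, 16
W = rng.normal(0.0, 0.1, (4 * H, X))
U = rng.normal(0.0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
x = rng.normal(size=X)
h = c = m_in = m_out = np.zeros(H)
h, c, m_in, m_out = lstm_em_step(x, h, c, m_in, m_out, W, U, b)
```

In a caption decoder, this step would replace the plain LSTM cell, with `x` being the attention-weighted image feature concatenated with the previous word embedding; the external memories persist across time steps alongside `h` and `c`.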
Keywords
long short-term memory,external memory,image caption generation,attention-based model