Optimizing image captioning algorithm to facilitate English writing

Xiaxia Cao, Yao Zhao, Xiang Li

Education and Information Technologies (2024)

Abstract
Various studies have applied intelligent recognition technology, especially speech recognition, to improve English learning, mostly listening and speaking. However, few studies have examined how image-to-text recognition technology can be used for writing. The present research aimed to fill this gap by exploring the optimization of a deep-learning-based image captioning algorithm to facilitate English writing, so that learners can overcome the limitations of time and space and learn English writing (including sentence patterns, spelling, vocabulary, and grammar) anytime and anywhere by taking pictures. To this end, this paper focused on image captioning based on CNN (Convolutional Neural Networks) and LSTM (Long Short-Term Memory), using DenseNet201 or Vision Transformer trained on the ImageNet-1K image classification dataset as the image encoder and LSTM as the decoder. First, pre-training was performed on the Flickr8k dataset. The best-trained model was then selected to provide the pre-trained weights for the COCO dataset, fine-tuning was performed on the COCO dataset, and an ablation experiment was designed around the attention mechanism. On the test set, the optimized model achieved BLEU-4, CIDEr, METEOR, and ROUGE scores of 0.3437, 1.121, 0.2750, and 0.5117, respectively. The results showed that the optimized model converged faster and performed better. The model was used to automatically caption 12 images that had never been used during training. The descriptions generated by the optimized image captioning algorithm were lexically and syntactically accurate and matched what the images expressed, showing that the improved algorithm could serve as a learning tool that helps English learners improve lexical and syntactic acquisition, and thereby their writing, through generated descriptions of pictures taken anytime and anywhere in real-life situations.
Keywords
English writing, Deep learning, Image captioning, Attention mechanism
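
The abstract reports BLEU-4 as one of its evaluation indexes. As a rough illustration of what that metric measures, the following is a minimal sentence-level sketch in Python, assuming whitespace-tokenized captions and no smoothing. The function names `ngrams` and `bleu4` are hypothetical, and this is not the evaluation code used in the paper, which reports scores over a whole test set and most likely relied on a standard captioning evaluation toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 without smoothing (illustrative helper).

    candidate: list of tokens for the generated caption.
    references: list of token lists, one per human reference caption.
    """
    precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is too short to contain any n-gram of this order
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        precisions.append(clipped / total)

    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the four modified n-gram precisions.
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)


if __name__ == "__main__":
    generated = "a brown dog runs on the grass".split()
    reference = "a brown dog runs across the grass".split()
    print(round(bleu4(generated, [reference]), 4))
```

A generated caption identical to a reference scores 1.0, and a caption sharing no 4-gram with any reference scores 0.0 in this unsmoothed form, which is one reason corpus-level aggregation is preferred when reporting results.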