Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller
CoRR (2024)
Abstract
Storytelling aims to generate reasonable and vivid narratives based on an
ordered image stream. The fidelity to the image story theme and the divergence
of story plots attract readers to keep reading. Previous works iteratively
improved the alignment of multiple modalities but ultimately resulted in the
generation of simplistic storylines for image streams. In this work, we propose
a new pipeline, termed LLaMS, to generate multimodal human-level stories that
embody expressiveness and consistency. Specifically, by fully
exploiting the commonsense knowledge within the LLM, we first employ a sequence
data auto-enhancement strategy to enhance factual content expression and
leverage a textual reasoning architecture for expressive story generation and
prediction. Second, we propose the SQ-Adapter module for story illustration
generation, which maintains sequence consistency. Human evaluations are
conducted to verify the superiority of the proposed LLaMS.
Evaluations show that LLaMS achieves state-of-the-art storytelling performance
and 86
SOTA methods. Furthermore, ablation experiments are conducted to verify the
effectiveness of the proposed sequence data enhancement and SQ-Adapter.