Incorporating Visual and Textual Cues in Dialogue Generation: An Application to Comic Strips

Semantic Scholar (2019)

Abstract
Conventional dialogue generation systems rely mostly on textual data and cannot detect or act on visual cues while conversing. They are therefore unsuited to generating dialogue-oriented compositions, such as television scripts or comic strips, that depend heavily on visual cues. In this work, we address this limitation and propose a system that exploits such cues to generate comic strips. First, we propose a baseline based on a conditional variational autoencoder, which can only predict the last speech bubble of a strip. We then model the task as a visual storytelling problem and adapt an encoder-decoder model to generate entire comic strips. To evaluate this storytelling-based approach, we propose new metrics and also perform a qualitative human evaluation of the results. We find that the model detects the setting of a strip and the characters involved in most cases, and that it can generate some coherent strips. We believe these results are promising and warrant further research in this area.
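To make the encoder-decoder visual storytelling formulation concrete, below is a minimal PyTorch sketch of one plausible instantiation: a strip-level GRU encodes precomputed panel image features, and a bubble-level GRU decoder emits one speech bubble per panel, initialized from the running strip context. This is an illustration of the general technique only, not the paper's actual architecture; all names, dimensions, and design choices (`ComicStripGenerator`, the 2048-dimensional CNN features, the GRU encoders) are assumptions for the sake of the example.

```python
import torch
import torch.nn as nn

class ComicStripGenerator(nn.Module):
    """Hypothetical encoder-decoder sketch for visual storytelling over comic panels.

    Each panel is represented by a precomputed image feature vector
    (e.g. from a CNN, dimension assumed here); the decoder emits one
    speech bubble per panel, conditioned on the strip-level context.
    """

    def __init__(self, vocab_size, img_feat_dim=2048, hidden_dim=512, embed_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)                  # project panel features
        self.context_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)  # strip-level encoder
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)       # bubble-level decoder
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, panel_feats, bubble_tokens):
        # panel_feats: (batch, n_panels, img_feat_dim) precomputed image features
        # bubble_tokens: (batch, n_panels, max_len) teacher-forcing token inputs
        ctx, _ = self.context_rnn(self.img_proj(panel_feats))  # (batch, n_panels, hidden)
        logits = []
        for p in range(panel_feats.size(1)):
            h0 = ctx[:, p].unsqueeze(0)            # init decoder from this panel's context
            emb = self.embed(bubble_tokens[:, p])  # (batch, max_len, embed)
            dec, _ = self.decoder(emb, h0)
            logits.append(self.out(dec))           # per-token vocabulary logits
        return torch.stack(logits, dim=1)          # (batch, n_panels, max_len, vocab)

# Usage with dummy inputs: two 4-panel strips, 12-token speech bubbles.
model = ComicStripGenerator(vocab_size=10000)
feats = torch.randn(2, 4, 2048)
toks = torch.randint(0, 10000, (2, 4, 12))
print(model(feats, toks).shape)  # torch.Size([2, 4, 12, 10000])
```

The key design point this sketch illustrates is the two-level structure the abstract implies: panel features are first fused into a strip-level context so that each generated bubble can reflect the setting and characters established by earlier panels, rather than conditioning on its own panel in isolation.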