Multi-caption Text-to-Face Synthesis: Dataset and Algorithm

International Multimedia Conference (2021)

Abstract
Text-to-Face synthesis with multiple captions is an important yet under-addressed problem because of the lack of effective algorithms and large-scale datasets. We accordingly propose a Semantic Embedding and Attention (SEA-T2F) network that takes multiple captions as input to generate highly semantically related face images. With a novel Sentence Features Injection Module, SEA-T2F can integrate any number of captions into the network. In addition, an attention mechanism named Attention for Multiple Captions is proposed to fuse multiple word features and synthesize fine-grained details. Considering that text-to-face generation is an ill-posed problem, we also introduce an attribute loss to guide the network to generate sentence-related attributes. Existing datasets for text-to-face are either too small or roughly generated from attribute labels, which is insufficient for training deep learning-based methods to synthesize natural face images. Therefore, we build a large-scale dataset named CelebAText-HQ, in which each image is manually annotated with 10 captions. Extensive experiments demonstrate the effectiveness of our algorithm.
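The abstract only names the Attention for Multiple Captions mechanism without giving its formulation. The following is a minimal, hypothetical sketch of one way to attend image region features to word features pooled across several captions; the function name, tensor shapes, and the single-softmax fusion rule are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch: fuse word features from multiple captions via attention.
# All shapes and the fusion rule are assumptions for illustration only.
import torch
import torch.nn.functional as F


def fuse_multi_caption_word_features(image_feats, word_feats):
    """Attend image regions to word features drawn from several captions.

    image_feats: (B, D, R)    -- R region features from the image branch
    word_feats:  (B, N, L, D) -- word features for N captions of length L
    Returns:     (B, D, R)    -- word-conditioned context for each region
    """
    B, D, R = image_feats.shape
    # Flatten the caption axis so every word of every caption competes in one
    # softmax (a simplifying assumption; captions could also be weighted separately).
    words = word_feats.reshape(B, -1, D)               # (B, N*L, D)
    attn = torch.bmm(words, image_feats)               # (B, N*L, R) similarity scores
    attn = F.softmax(attn / D ** 0.5, dim=1)           # normalize over all words
    context = torch.bmm(words.transpose(1, 2), attn)   # (B, D, R)
    return context


if __name__ == "__main__":
    img = torch.randn(2, 256, 64)        # 2 images, 256-d features, 8x8 regions
    words = torch.randn(2, 4, 18, 256)   # 4 captions of 18 words each
    print(fuse_multi_caption_word_features(img, words).shape)  # torch.Size([2, 256, 64])
```

Because the caption axis is flattened before the softmax, the same code handles any number of input captions, which is consistent with the abstract's claim that the network accepts an arbitrary number of captions.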