Text Guided Facial Image Synthesis Using StyleGAN and Variational Autoencoder Trained CLIP

Anagha Srinivasa, Anjali Praveen, Anusha Mavathur, Apurva Pothumarthi, Arti Arya, Pooja Agarwal

Artificial Intelligence and Soft Computing, ICAISC 2023, Part II (2023)

Abstract
The average user may have little to no artistic skill but can describe what they envision in words. With the aid of generative neural architectures, such user-provided text can be instantly transformed into a realistic image. This study proposes a novel approach to generating a facial image from a user-given textual description. Since prior works focus less on manipulation, the approach also emphasizes modifying the generated image based on additional textual descriptions, so that the expected face can be progressively refined. The architecture consists of a multi-level Vector-Quantized Variational Autoencoder (VQVAE) that provides the image encodings, a Contrastive Language-Image Pre-training (CLIP) module that interprets the text and measures how close the final image encodings and the text are within a common embedding space, and a StyleGAN2 decoder that generates the required image output. This combination of components is unseen in previous studies and yields promising results, capturing the context of the text and generating realistic, good-quality images of human faces.
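The matching step the abstract describes, scoring how close the image encodings and the text are within a common space, reduces to a cosine similarity between embeddings. The following is a minimal, dependency-free sketch of that scoring step; the toy vectors and function names are illustrative stand-ins, whereas the paper uses pretrained CLIP encoders to produce the actual embeddings:

```python
import math

def normalize(v):
    """Scale a vector to unit length, as CLIP does before comparison."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def clip_style_similarity(image_emb, text_emb):
    """Cosine similarity between an image embedding and a text embedding
    projected into a shared space: 1.0 means a perfect match, 0.0 means
    the two are unrelated (orthogonal)."""
    a, b = normalize(image_emb), normalize(text_emb)
    return sum(x * y for x, y in zip(a, b))

# Toy 3-d embeddings (real CLIP embeddings are e.g. 512-d):
face_emb = [0.9, 0.1, 0.2]       # hypothetical encoding of a generated face
caption_emb = [0.8, 0.2, 0.1]    # hypothetical encoding of the description
print(clip_style_similarity(face_emb, caption_emb))
```

During manipulation, a score like this can serve as a loss: the generator's latent code is nudged to increase the similarity between the decoded face and the new textual description.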
Keywords
Facial synthesis, Image manipulation, Vector-Quantized Variational Autoencoders (VQVAE), Contrastive Language-Image Pre-training (CLIP), StyleGAN2