Going Deeper Into Embedding Learning for Video Object Segmentation
2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019
Abstract
In this paper, we investigate the principles of consistent training, between the given reference and the predicted sequence, for better embedding learning in semi-supervised video object segmentation. To accurately segment the target objects given the mask at the first frame, we observe that the feature embeddings of any consecutive frames should satisfy the following properties: 1) global consistency in terms of both the foreground object(s) and the background; 2) robust local consistency under various object moving rates; 3) environment consistency between the training and inference processes; 4) receptive consistency between the receptive fields of the network and the variable scales of objects; 5) sampling consistency between foreground and background pixels to avoid training bias. With these principles in mind, we carefully design a simple pipeline that effectively lifts both the accuracy and efficiency of video object segmentation. With ResNet-101 as the backbone, our single model achieves a J&F score of 81.0% on the validation set of the YouTube-VOS benchmark without any bells and whistles. By applying multi-scale and flip augmentation at the testing stage, the accuracy can be further boosted to 82.4%. Code will be made available.
Keywords
video object segmentation, embedding learning, object recognition