Self-Supervised Video Interaction Classification using Image Representation of Skeleton Data.

Farzaneh Askari, Ruixi Jiang, Zhiwei Li, Jiatong Niu, Yuyan Shi, James J. Clark

CVPR Workshops (2023)

Abstract
Recognizing interactions in sports broadcast videos is an application of Interaction Recognition from Videos (IRV) that poses many challenges, because the interactions are complex and are often recorded from a suboptimal viewpoint. Annotating large-scale, sport-specific datasets is expensive and time-consuming. In this study, we therefore demonstrate the effectiveness of Self-Supervised Learning (SSL) methods for building useful representations from human skeleton pose data (pose for short) without requiring costly annotations for a large-scale dataset. Given the numerous well-established image-based SSL methods, we show how to adapt them to sequences of pose through data transformation and a series of pose-based augmentations. Specifically, we adapt Relational Reasoning SSL (Relational-SSL for short) [27] and achieve 68.18 ± 0% and 76.62 ± 2.7% in the linear evaluation and finetuning protocols, respectively, on the downstream task of IRV from sports broadcast videos. Lastly, we run ablation studies on different components of the method, including the effect of using estimated pose (versus ground truth) on downstream performance.
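The abstract does not spell out the data transformation that turns a pose sequence into an image. As a rough illustration only, the sketch below shows one common way such an encoding can be done: joints become rows, frames become columns, and each coordinate becomes a color channel scaled to 0..255. The function name `pose_sequence_to_image`, the nested-list input layout, and the per-channel min-max scaling are all assumptions made for this example and are not taken from the paper.

```python
def pose_sequence_to_image(poses):
    """Encode a pose sequence as an image-like nested list (hypothetical helper).

    poses[t][j][c] is coordinate c of joint j in frame t.
    The result image[j][t][c] holds integers in 0..255.
    This layout is an assumption for illustration, not the paper's encoding.
    """
    num_frames = len(poses)
    num_joints = len(poses[0])
    num_coords = len(poses[0][0])

    # Per-channel minimum and maximum over all frames and joints.
    lows, highs = [], []
    for c in range(num_coords):
        values = [poses[t][j][c]
                  for t in range(num_frames)
                  for j in range(num_joints)]
        lows.append(min(values))
        highs.append(max(values))

    image = []
    for j in range(num_joints):          # one image row per joint
        row = []
        for t in range(num_frames):      # one image column per frame
            pixel = []
            for c in range(num_coords):  # one channel per coordinate
                span = highs[c] - lows[c]
                # A constant channel carries no information; map it to 0.
                scaled = 0.0 if span == 0 else (poses[t][j][c] - lows[c]) / span
                pixel.append(round(scaled * 255))
            row.append(pixel)
        image.append(row)
    return image


# Two frames, two joints, (x, y) coordinates; y is constant in this toy input.
toy_poses = [
    [[0.0, 5.0], [1.0, 5.0]],
    [[2.0, 5.0], [4.0, 5.0]],
]
toy_image = pose_sequence_to_image(toy_poses)
```

Once a sequence is in this form, image-based SSL pipelines can in principle consume it like any other image, which is the general idea the abstract describes; the pose-based augmentations mentioned there would be applied in addition and are not shown here.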
Keywords
annotating large scale sports specific datasets, complex interactions, costly annotations, data transformation, human skeleton, image representation, Interaction Recognition, IRV, numerous well established image-based SSL methods, pose-based augmentations, Relational Reasoning SSL, Relational-SSL, scale dataset, Self-Supervised, skeleton data, sports broadcast videos, sports games broadcast videos, suboptimal view point, Supervised video Interaction classification, useful representations