What, When, and Where? - Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions.
ICLR 2024
Keywords
Self-supervised learning, Video grounding, Multimodal learning