MLLM-TA: Leveraging Multimodal Large Language Models for Precise Temporal Video Grounding
IEEE Signal Processing Letters (2025)
Keywords
Visualization, Grounding, Large language models, Feature extraction, Benchmark testing, Vectors, Training, Nickel, Electronic mail, Biological system modeling, Highlight detection, multimedia large language modeling, precise temporal alignment, video grounding