OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer
arXiv (2024)
Abstract
Open-vocabulary object detection focuses on detecting novel categories
guided by natural language. In this report, we propose the Open-Vocabulary
Light-Weighted Detection Transformer (OVLW-DETR), a deployment-friendly
open-vocabulary detector with strong performance and low latency. Building upon
OVLW-DETR, we provide an end-to-end training recipe that transfers knowledge
from a vision-language model (VLM) to an object detector through simple alignment. We
align the detector with the VLM's text encoder by replacing the fixed
classification layer weights in the detector with the class-name embeddings
extracted from the text encoder. Without an additional fusing module, OVLW-DETR is
flexible and deployment-friendly, making it easier to implement and modulate.
improving the efficiency of interleaved attention computation. Experimental
results demonstrate that the proposed approach outperforms existing
real-time open-vocabulary detectors on the standard Zero-Shot LVIS benchmark.
Source code and pre-trained models are available at
https://github.com/Atten4Vis/LW-DETR.
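
The alignment step described in the abstract, replacing the fixed classification weights with class-name embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 3-dimensional embeddings, and the use of cosine similarity as the scoring rule are all assumptions made for illustration. In practice the embeddings would come from the VLM text encoder and the query features from the detector's decoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def classify_queries(query_feats, class_embeds):
    """Score every query feature against every class-name embedding.

    The class-name embeddings play the role of the classification layer's
    weight matrix, so changing the vocabulary only means swapping embeddings.
    Cosine similarity is an assumption of this sketch.
    """
    weights = [l2_normalize(e) for e in class_embeds]
    logits = []
    for q in query_feats:
        qn = l2_normalize(q)
        logits.append([sum(a * b for a, b in zip(qn, w)) for w in weights])
    return logits


# Hypothetical stand-ins for text-encoder outputs (one vector per class name).
class_names = ["cat", "dog", "zebra"]
class_embeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]

# Hypothetical stand-ins for decoder query features (one vector per query).
query_feats = [[0.9, 0.1, 0.0], [0.0, 0.2, 3.0]]

logits = classify_queries(query_feats, class_embeds)

# Pick the highest-scoring class name for each query.
predictions = [
    class_names[max(range(len(row)), key=row.__getitem__)] for row in logits
]
print(predictions)
```

Because the class list enters only through `class_embeds`, a new category can be added at inference time by appending its embedding, with no separate fusing module between the image and text branches.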