Usage of compressed domain in fast frameworks

Hasan Sait Arslan, Simon Archambault, Prakruti Bhatt, Keita Watanabe, Josue Cuevaz, Phuc Le, Denis Miller, Viktor Zhumatiy

Signal, Image and Video Processing (2022)

Abstract
There has been considerable progress in applying Convolutional Neural Networks (CNNs) to computer vision tasks on RGB images. A few studies have investigated improving performance by replacing the RGB representation with block-wise Discrete Cosine Transform (DCT) coefficients. DCT coefficients, which are readily available during JPEG decoding, might be competitive with the output of the computationally costly initial CNN layers fed by the RGB representation. Despite the attractiveness of the approach, to the best of our knowledge only a single study has targeted the use of DCT coefficients with low-latency models. In this paper, we investigate the usage of DCT coefficients first with MnasNet, a mobile image classification model that processes thousands of images per second on a single modern GPU, and second with Yolov5, which holds benchmark performance on Average Precision (AP) and latency. After applying our methods to MnasNet (1.0) and evaluating performance on the ImageNet dataset, we observe accuracy competitive with RGB-based MnasNet (1.0) and significantly higher processing speed than RGB-based MnasNet (0.5). After applying our methods to Yolov5, we evaluate performance on three benchmark datasets. The resulting DCT-based object detection model processes up to 519 more images per second, with an AP drop of up to 4.7% on the MSCOCO test-dev set, up to 5.1% on the Pascal VOC 2007 test set, and up to 3.8% on the Crowd Human (Full-Body) validation set.
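
To make the "block-wise DCT coefficients" mentioned above concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a single-channel image whose sides are multiples of 8 and computes an orthonormal 2-D DCT-II on each 8x8 block from pixel values, whereas a JPEG decoder would instead expose already-quantized coefficients of the YCbCr planes. The function names `dct_matrix` and `blockwise_dct`, and the choice to lay the 64 frequencies of each block out as channels in row-major order, are illustrative assumptions.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Return the n x n orthonormal DCT-II basis matrix."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    x = np.arange(n).reshape(1, -1)   # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)      # the DC row uses a different scale
    return mat

def blockwise_dct(channel: np.ndarray, block: int = 8) -> np.ndarray:
    """Map an (H, W) channel to (H/block, W/block, block*block) coefficients.

    Hypothetical layout: each spatial position holds one block, and the
    block's frequencies are flattened in row-major order (not JPEG zigzag).
    """
    h, w = channel.shape
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    d = dct_matrix(block)
    # Split the image into a grid of (block x block) tiles.
    tiles = channel.reshape(h // block, block, w // block, block)
    tiles = tiles.transpose(0, 2, 1, 3)          # (H/b, W/b, b, b)
    coeffs = d @ tiles @ d.T                     # 2-D DCT of every tile
    return coeffs.reshape(h // block, w // block, block * block)
```

Under these assumptions a 224x224 channel becomes a 28x28 grid with 64 frequency channels, which illustrates why a network fed with such coefficients starts from a spatial resolution eight times lower per side than its RGB counterpart.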
Keywords
Discrete Cosine Transform, Image classification, Object detection