LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model
arxiv(2024)
摘要
Previous work has shown that well-crafted adversarial perturbations can
threaten the security of video recognition systems. Attackers can invade such
models with a low query budget when the perturbations are semantic-invariant,
such as StyleFool. Despite the query efficiency, the naturalness of the minutia
areas still requires amelioration, since StyleFool leverages style transfer to
all pixels in each frame. To close the gap, we propose LocalStyleFool, an
improved black-box video adversarial attack that superimposes regional
style-transfer-based perturbations on videos. Benefiting from the popularity
and scalably usability of Segment Anything Model (SAM), we first extract
different regions according to semantic information and then track them through
the video stream to maintain the temporal consistency. Then, we add
style-transfer-based perturbations to several regions selected based on the
associative criterion of transfer-based gradient information and regional area.
Perturbation fine adjustment is followed to make stylized videos adversarial.
We demonstrate that LocalStyleFool can improve both intra-frame and inter-frame
naturalness through a human-assessed survey, while maintaining competitive
fooling rate and query efficiency. Successful experiments on the
high-resolution dataset also showcase that scrupulous segmentation of SAM helps
to improve the scalability of adversarial attacks under high-resolution data.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要