
Cross-Modal Pixel-and-Stroke representation aligning networks for free-hand sketch recognition

EXPERT SYSTEMS WITH APPLICATIONS (2024)

Abstract
We consider the cross-modal alignment problem for free-hand sketches. Given a stroke sequence and its rasterized image, the objective is to improve sketch recognition through cross-modal interactions. Existing works mostly employ simple weighted addition and concatenation for late fusion, or shallow attention layers for cross-modal alignment. Because the sketch modalities are highly heterogeneous, these methods do not capture sufficiently meaningful feature representations. In this paper, we propose CMPS, a sketch recognition framework for aligning Cross-Modal Pixel-and-Stroke representations, which includes two novel components: the Semantic-Temporal Alignment Rasterization (STAR) module and the Pixel-Stroke Alignment (PSA) module. STAR aligns strokes with the image at the semantic and temporal levels during the rasterization preprocessing phase by utilizing color variations in the RGB space of the sketch. PSA, through its pre-alignment and post-alignment stages, learns to align semantic connections at both the pixel and stroke levels, capturing cross-modal dependencies rather than relying on shallow matrix operations for interaction. Additionally, we introduce a concise stroke processing network called StrokeFormer. It extracts two hierarchical features, i.e., point-level and stroke-level, based on the formation mechanism of sketches. StrokeFormer outperforms most RNN-based and CNN-based models by a significant margin. Our experimental results demonstrate that the proposed CMPS achieves new state-of-the-art performance on the Google QuickDraw-414K and TU-Berlin datasets. The code is available at https://github.com/WoodratTradeCo/CMPS.
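To make the idea of encoding drawing order through RGB color variations concrete, the following is a minimal illustrative sketch and not the authors' STAR implementation. The function name `rasterize_with_order_colors`, the input layout (a list of per-stroke coordinate arrays normalized to [0, 1]), and the red-to-blue color ramp are all assumptions chosen for illustration; the abstract does not specify the actual color mapping.

```python
import numpy as np

def rasterize_with_order_colors(strokes, size=64):
    """Draw strokes on a white RGB canvas, coloring each by its drawing order.

    strokes: list of (N_i, 2) arrays of (x, y) coordinates in [0, 1].
    The red-to-blue ramp is a hypothetical choice, not the mapping used by STAR.
    """
    canvas = np.full((size, size, 3), 255, dtype=np.uint8)
    n = len(strokes)
    for i, stroke in enumerate(strokes):
        # Normalized temporal position of this stroke: 0 = first, 1 = last.
        t = i / max(n - 1, 1)
        color = np.array([round(255 * (1 - t)), 0, round(255 * t)], dtype=np.uint8)
        pts = np.asarray(stroke, dtype=float) * (size - 1)
        if len(pts) == 1:
            # A single-point stroke becomes one colored pixel.
            x, y = np.rint(pts[0]).astype(int)
            canvas[y, x] = color
            continue
        # Draw each segment by sampling points along it and rounding to pixels.
        for (x0, y0), (x1, y1) in zip(pts[:-1], pts[1:]):
            steps = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
            xs = np.rint(np.linspace(x0, x1, steps + 1)).astype(int)
            ys = np.rint(np.linspace(y0, y1, steps + 1)).astype(int)
            canvas[ys, xs] = color
    return canvas
```

Under these assumptions, an image model receiving the canvas can in principle distinguish early strokes from late ones by color, which a plain black-and-white rasterization cannot convey.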
Keywords
Sketch recognition, Multi-modal alignment, Feature fusion