Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis
CVPR 2024
Abstract
Significant progress has been made in scene text detection models since the
rise of deep learning, but scene text layout analysis, which aims to group
detected text instances into paragraphs, has not kept pace. Previous works
either handled text detection and grouping with separate models or trained a
single unified model from scratch. None of them makes full use of already
well-trained text detectors and easily obtainable detection datasets. In this
paper, we present Text Grouping Adapter (TGA), a module that enables various
pre-trained text detectors to learn layout analysis, so that a well-trained
text detector can be adopted off the shelf or fine-tuned efficiently. Designed
to be compatible with various text detector architectures, TGA takes detected
text regions and image features as universal inputs to assemble text instance
features. To capture broader contextual information for layout analysis, we
propose predicting text group masks from text instance features via
one-to-many assignment. Our comprehensive experiments demonstrate that, even
with frozen pre-trained models, incorporating TGA into various pre-trained
text detectors and text spotters achieves superior layout analysis performance
while inheriting the generalized text detection ability gained from
pre-training. With full-parameter fine-tuning, layout analysis performance
improves further.
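The abstract describes two steps. First, image features are pooled inside each detected text region to form a text instance feature. Second, group-mask prediction is supervised through a one-to-many assignment. The sketch below shows one plausible reading of both steps in plain Python. It is not the authors' implementation. Three choices here are assumptions: using masked average pooling for feature assembly, taking the union of member regions as the group target, and using a dot-product mask head. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

FeatureMap = Sequence[Sequence[Sequence[float]]]  # [H][W][C] per-pixel features
Mask = List[List[int]]                            # [H][W] binary mask


def assemble_instance_features(feature_map: FeatureMap,
                               region_masks: Sequence[Mask]) -> List[List[float]]:
    """Assumed masked average pooling: average the pixel features that fall
    inside each detected text region to get one vector per text instance."""
    channels = len(feature_map[0][0])
    instance_feats = []
    for mask in region_masks:
        total = [0.0] * channels
        count = 0
        for y, row in enumerate(feature_map):
            for x, pixel in enumerate(row):
                if mask[y][x]:
                    for c in range(channels):
                        total[c] += pixel[c]
                    count += 1
        if count == 0:
            raise ValueError("empty text region mask")
        instance_feats.append([v / count for v in total])
    return instance_feats


def group_mask_targets(region_masks: Sequence[Mask],
                       group_ids: Sequence[int]) -> List[Mask]:
    """One-to-many assignment (assumed form): every text instance in a group
    is supervised with the same target, the union of all member regions, so
    one group mask is shared by many instance predictions."""
    unions: Dict[int, Mask] = {}
    for mask, gid in zip(region_masks, group_ids):
        union = unions.setdefault(gid, [[0] * len(row) for row in mask])
        for y, row in enumerate(mask):
            for x, v in enumerate(row):
                if v:
                    union[y][x] = 1
    # Return an independent copy per instance so targets can be edited safely.
    return [[row[:] for row in unions[gid]] for gid in group_ids]


def predict_group_mask(instance_feat: Sequence[float],
                       feature_map: FeatureMap,
                       threshold: float = 0.0) -> Mask:
    """Hypothetical mask head: score each pixel by the dot product between
    the instance feature and the pixel feature, then threshold."""
    return [[1 if sum(a * b for a, b in zip(instance_feat, pixel)) > threshold else 0
             for pixel in row]
            for row in feature_map]
```

In a real model, the pooled features would pass through learned layers, and the binary threshold would be replaced by a trained mask loss. The sketch keeps only the data flow: region masks and image features go in, instance features come out, and each instance's predicted mask is compared against its group's shared target.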