Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models
CoRR (2023)
Abstract
Scene text detection techniques have garnered significant attention due to
their wide-ranging applications. However, existing methods have a high demand
for training data, and obtaining accurate human annotations is labor-intensive
and time-consuming. As a solution, researchers have widely adopted synthetic
text images as a complementary resource to real text images during
pre-training. Yet synthetic datasets still have untapped potential to improve
the performance of scene text detectors. We contend that one main limitation of
existing generation methods is the insufficient integration of foreground text
with the background. To alleviate this problem, we present the Diffusion Model
based Text Generator (DiffText), a pipeline that utilizes the diffusion model
to seamlessly blend foreground text regions with the background's intrinsic
features. Additionally, we propose two strategies to generate visually coherent
text with fewer spelling errors. Despite containing fewer text instances, the
text images we produce consistently outperform other synthetic data in aiding
text detectors.
Extensive experiments on detecting horizontal, rotated, curved, and line-level
texts demonstrate the effectiveness of DiffText in producing realistic text
images.