Textual Inversion and Self-supervised Refinement for Radiology Report Generation
CoRR (2024)
Abstract
Existing mainstream approaches follow the encoder-decoder paradigm for
generating radiology reports. They focus on improving the network structure of
encoders and decoders, which leads to two shortcomings: overlooking the
modality gap and ignoring report content constraints. In this paper, we
propose Textual Inversion and Self-supervised Refinement (TISR) to address
these two issues. Specifically, textual inversion projects text and images
into the same space by representing images as pseudo words, eliminating the
cross-modal gap. Subsequently, self-supervised refinement refines these
pseudo words through contrastive loss computation between images and texts,
enhancing the fidelity of generated reports to images. Notably, TISR is
orthogonal to most existing methods and is plug-and-play. We conduct experiments on
two widely-used public datasets and achieve significant improvements on various
baselines, which demonstrates the effectiveness and generalization of TISR. The
code will be available soon.
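
The abstract does not spell out how the two components are computed, so the
following is only a minimal sketch of the general idea, not the authors'
implementation. It assumes that textual inversion can be approximated by a
linear projection from an image feature to a pseudo-word embedding, and that
the refinement objective is a symmetric InfoNCE-style contrastive loss over
matched image/report pairs. The function names, the linear projection, and the
temperature value 0.07 are all hypothetical choices made for illustration.

```python
import math


def project_to_pseudo_word(image_feature, weight):
    """Map an image feature to a pseudo-word embedding (assumed linear map)."""
    return [sum(w * x for w, x in zip(row, image_feature)) for row in weight]


def l2_normalize(vector):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the target of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(pseudo_words, text_embeddings, temperature=0.07):
    """Symmetric contrastive loss; pair i of each list is the positive pair."""
    images = [l2_normalize(v) for v in pseudo_words]
    texts = [l2_normalize(v) for v in text_embeddings]
    # Cosine similarity between every pseudo word and every text embedding.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(column) for column in zip(*logits)]
    image_to_text = cross_entropy_diag(logits)
    text_to_image = cross_entropy_diag(logits_transposed)
    return 0.5 * (image_to_text + text_to_image)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    image_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    pseudo = [project_to_pseudo_word(f, identity) for f in image_features]
    matched_texts = image_features
    shuffled_texts = image_features[1:] + image_features[:1]
    print("matched :", contrastive_loss(pseudo, matched_texts))
    print("shuffled:", contrastive_loss(pseudo, shuffled_texts))
```

Under these assumptions the loss is small when each pseudo word lies close to
the embedding of its own report and large when the pairs are misaligned, which
is the behaviour a refinement step of this kind would rely on.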