Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers
CoRR (2024)
Abstract
Direct image-to-graph transformation is a challenging task that solves object
detection and relationship prediction in a single model. Due to the complexity
of this task, large training datasets are rare in many domains, which makes the
training of large networks challenging. This data sparsity necessitates the
establishment of pre-training strategies akin to the state-of-the-art in
computer vision. In this work, we introduce a set of methods enabling
cross-domain and cross-dimension transfer learning for image-to-graph
transformers. We propose (1) a regularized edge sampling loss for sampling the
optimal number of object relationships (edges) across domains, (2) a domain
adaptation framework for image-to-graph transformers that aligns features from
different domains, and (3) a simple projection function that allows us to
pretrain 3D transformers on 2D input data. We demonstrate our method's utility
in cross-domain and cross-dimension experiments, where we pretrain our models
on 2D satellite images before applying them to vastly different target domains
in 2D and 3D. Our method consistently outperforms a series of baselines on
challenging benchmarks, such as retinal or whole-brain vessel graph extraction.
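Contribution (3) pretrains 3D transformers on 2D data via a simple projection. As an illustration only, a minimal sketch of one plausible such projection is shown below: embedding a 2D image as a single slice of a thin 3D volume and lifting 2D node coordinates by appending a fixed z-coordinate. The function names, the zero-slice placement, and the constant z = 0 are assumptions for this sketch, not the paper's actual method.

```python
import numpy as np

def lift_image_2d_to_3d(image_2d: np.ndarray, depth: int = 1) -> np.ndarray:
    """Embed a 2D image (H, W) into a thin 3D volume (depth, H, W).

    Hypothetical sketch: the image occupies slice 0; remaining slices
    (if depth > 1) are zero-filled.
    """
    h, w = image_2d.shape
    volume = np.zeros((depth, h, w), dtype=image_2d.dtype)
    volume[0] = image_2d
    return volume

def lift_nodes_2d_to_3d(nodes_2d: np.ndarray, z: float = 0.0) -> np.ndarray:
    """Lift 2D graph node positions (N, 2) to 3D (N, 3) by prepending
    a constant z-coordinate, matching the slice used above."""
    n = nodes_2d.shape[0]
    z_col = np.full((n, 1), z, dtype=nodes_2d.dtype)
    return np.concatenate([z_col, nodes_2d], axis=1)
```

With such a lifting, a 3D image-to-graph transformer can consume 2D pretraining data (e.g. satellite road images) in the same input format as 3D target data (e.g. whole-brain vessel volumes), which is the cross-dimension transfer the abstract describes.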