GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
arXiv (2023)
Abstract
GenerateCT, the first approach to generating 3D medical imaging conditioned
on free-form medical text prompts, incorporates a text encoder and three key
components: a novel causal vision transformer for encoding 3D CT volumes, a
text-image transformer for aligning CT and text tokens, and a text-conditional
super-resolution diffusion model. Given the absence of directly comparable
methods in 3D medical imaging, we established baselines with cutting-edge
methods to demonstrate our method's effectiveness. GenerateCT significantly
outperforms these methods across all key metrics. Importantly, we explored
GenerateCT's clinical applications by evaluating its utility in a
multi-abnormality classification task. First, we established a baseline by
training a multi-abnormality classifier on our real dataset. To further assess
the model's generalization to external datasets and its performance with unseen
prompts in a zero-shot scenario, we employed an external dataset to train the
classifier, setting an additional benchmark. We conducted two experiments in
which we doubled the training datasets by synthesizing an equal number of
volumes for each set using GenerateCT. The first experiment demonstrated an
11% improvement in the AP score when training the classifier jointly on real
and generated volumes. The second experiment showed a 7% improvement when
training the classifier on both real and generated volumes based on unseen
prompts. Moreover,
GenerateCT enables the scaling of synthetic training datasets to arbitrary
sizes. As an example, we generated 100,000 3D CT volumes, fivefold the number
in our real dataset, and trained the classifier exclusively on these synthetic
volumes. Impressively, this classifier surpassed the performance of the one
trained on all available real data by a margin of 8%. Lastly, radiologists
evaluated the generated volumes, confirming a high degree of alignment with the
text prompts.
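The three-stage architecture summarized above can be sketched as a shape-level pipeline. Everything below — the function names, tensor shapes, and stub implementations — is an illustrative assumption for exposition, not the paper's actual code or dimensions:

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    """Minimal stand-in for an array: tracks only its shape."""
    shape: tuple

# Illustrative dimensions (assumed, not taken from the paper).
LOW_RES = (20, 128, 128)   # depth x height x width of the low-res CT volume
HIGH_RES = (20, 512, 512)  # target resolution after super-resolution

def encode_text(prompt: str) -> Tensor:
    # Text encoder: free-form prompt -> one embedding per word token.
    return Tensor((len(prompt.split()), 512))

def predict_ct_tokens(text_emb: Tensor) -> Tensor:
    # The causal CT vision transformer defines a discrete token space for
    # 3D volumes; the text-image transformer predicts those CT tokens
    # conditioned on the text embeddings (token-grid shape assumed).
    return Tensor((10, 16, 16))

def decode_to_volume(ct_tokens: Tensor) -> Tensor:
    # CT-ViT decoder: token grid -> low-resolution 3D volume.
    return Tensor(LOW_RES)

def super_resolve(volume: Tensor, text_emb: Tensor) -> Tensor:
    # Text-conditional diffusion model upsamples the volume in-plane,
    # keeping the number of slices (depth) fixed.
    depth = volume.shape[0]
    return Tensor((depth,) + HIGH_RES[1:])

def generate_ct(prompt: str) -> Tensor:
    text_emb = encode_text(prompt)
    ct_tokens = predict_ct_tokens(text_emb)
    low_res = decode_to_volume(ct_tokens)
    return super_resolve(low_res, text_emb)

print(generate_ct("bilateral pleural effusion").shape)  # (20, 512, 512)
```

The sketch only traces how data flows between the three components and how the text conditioning reaches both the token-prediction and super-resolution stages; it says nothing about the models themselves.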