Actionability of Synthetic Data in a Heterogeneous and Rare Healthcare Demographic; Adolescents and Young Adults (AYAs) with Cancer

medRxiv (2024)

Abstract
Purpose: Research on rare diseases and atypical healthcare demographics is often slowed by high inter-subject heterogeneity and overall scarcity of data. Synthetic data (SD) has been proposed as a means for data sharing, enlargement, and diversification, by artificially generating 'real' phenomena while obscuring the 'real' subject data. The utility of SD is actively scrutinised in healthcare research, but the role of sample size in the actionability of SD is insufficiently explored. We aim to understand the interplay of actionability and sample size by generating SD sets of varying sizes from gradually diminishing amounts of real subjects' data. We evaluate the actionability of SD in a highly heterogeneous and rare demographic: adolescents and young adults (AYAs) with cancer.

Methodology: A population-based cross-sectional cohort study of 3735 AYAs was sub-sampled at random to produce 13 training datasets of varying sample sizes. We studied four distinct generator architectures built on the open-source Synthetic Data Vault library. Each architecture was used to generate SD of varying sizes from each of the aforementioned training subsets. SD actionability was assessed by comparing the resulting SD to its respective 'real' data on three metrics: veracity, utility, and privacy concealment.

Results: All examined generator architectures yielded actionable data when generating SD with sizes similar to the 'real' data. Larger SD sample sizes increased veracity but generally also increased privacy risks. Using fewer training subjects led to faster convergence in veracity, but partially exacerbated privacy concealment issues.

Conclusion: SD is a potentially promising option for data sharing and data augmentation, yet sample size plays a significant role in its actionability. SD generation should go hand-in-hand with consistent scrutiny, and sample size should be carefully considered in this process.
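The random sub-sampling step in the methodology can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not state the 13 training-set sizes, so the evenly spaced `sizes` below, the integer `cohort` stand-in for subject records, the helper name `subsample_training_sets`, and the fixed seed are all assumptions made for illustration. The generator-fitting and evaluation steps are deliberately left out.

```python
import random


def subsample_training_sets(records, sizes, seed=0):
    """Draw one random subset (without replacement) of `records` per size.

    Hypothetical helper: the abstract only says the cohort was
    "sub-sampled at random"; the exact procedure is assumed here.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    subsets = {}
    for n in sizes:
        if n > len(records):
            raise ValueError(f"requested {n} records, only {len(records)} available")
        subsets[n] = rng.sample(records, n)
    return subsets


# Stand-in for the 3735-subject cohort: integer IDs instead of real records.
cohort = list(range(3735))

# Assumed sizes: 13 evenly spaced steps from the full cohort downwards.
sizes = [round(len(cohort) * (13 - k) / 13) for k in range(13)]

training_sets = subsample_training_sets(cohort, sizes)
for n in sizes:
    print(n, len(training_sets[n]))
```

Each resulting subset would then serve as the 'real' training data for a generator, and the SD produced from it would be compared against that same subset.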
### Competing Interest Statement

The authors have declared no competing interest.

### Funding Statement

J. Hogenboom, A.L.A.J. Dekker, W.T.A. Van Der Graaf, O. Husson, and L.Y.L. Wee are supported by the European Union's Horizon 2020 research and innovation programme through The STRONG-AYA Initiative (Grant agreement ID: 101057482). A. Lobo Gomes is supported by Innovative Medicines Initiative (IMI), Digital Oncology Network for Europe (DigiONE), and the European Regional Development Fund (ERDF). O. Husson is also supported by the Netherlands Organization for Scientific Research through a Vidi grant (ID: 198.007). L.Y.L. Wee is also supported by ZonMW and Stichting Hanarth Fonds.

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The SURVAYA study was conducted in accordance with the Declaration of Helsinki, and was approved by the Netherlands Cancer Institute Institutional Review Board (IRBIRBd18122) on 6 February 2019. In the presented work, the SURVAYA study was re-used with permission of the principal investigator and the study sponsor.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes

All data produced and used in the presented work are available upon reasonable request to the authors. The code and software versioning used in the presented work are available on GitHub with a working example, see: https://github.com/MaastrichtU-CDS/AYA-synthetic-data.