Digital ethnicity data in population-wide electronic health records in England: a description of completeness, coverage, and granularity of diversity

medRxiv (2022)

Abstract
Background: The link between ethnicity and healthcare inequity, and the urgency for better data, are well recognised. This study describes ethnicity data in nationwide electronic health records in England, UK.

Methods: We conducted a retrospective cohort study using de-identified person-level records for the England population available in the National Health Service (NHS) Digital trusted research environment. Primary care records (GDPPR) were linked to hospital and national mortality records. We assessed completeness, consistency, and granularity of ethnicity records using all available SNOMED-CT concepts for ethnicity and NHS ethnicity categories.

Findings: Of 61.8 million individuals registered with a primary care practice in England, 51.5 million (83.3%) had at least one ethnicity record in GDPPR, increasing to 93.9% when linked with hospital records. Approximately 12.0% had at least two conflicting ethnicity codes in primary care records. Women were more likely to have ethnicity recorded than men. Ethnicity was missing most frequently in individuals from 18 to 39 years old and in the southern regions of England. Individuals with an ethnicity record had more comorbidities recorded than those without. Of 489 SNOMED-CT ethnicity concepts available, 255 were used in primary care records. Discrepancies between SNOMED-CT and NHS ethnicity categories were observed, specifically within “Other-” ethnicity groups.

Interpretation: More than 250 ethnicity sub-groups may be found in health records for the English population, although they are commonly categorised into “White”, “Black”, “Asian”, “Mixed”, and “Other”. One in ten individuals do not have ethnicity information recorded in primary care or hospital records. SNOMED-CT codes represent more diversity in ethnicity groups than the NHS ethnicity classification.
Improved recording of self-reported ethnicity at first point-of-care and consistency in ethnicity classification across healthcare settings can potentially improve the accuracy of ethnicity in research and ultimately care for all ethnicities.

Funding: British Heart Foundation Data Science Centre led by Health Data Research UK.

Evidence before this study: Ethnicity has been highlighted as a significant factor in the disproportionate impact of SARS-CoV-2 infection and mortality. Better knowledge of ethnicity data recorded in real clinical practice is required to improve health research and ultimately healthcare. We searched PubMed from database inception to 14th July 2022 for publications using the search terms “ethnicity” and “electronic health records” or “EHR,” without language restrictions. We identified 228 publications in 2019, before the COVID-19 pandemic, and 304 publications between 2020 and 2022. However, none of these publications used or reported any of the more than 400 available SNOMED-CT concepts for ethnicity to account for more granularity and diversity than is captured by traditional high-level classifications limited to 5 to 9 ethnicity groups.

Added value of this study: We provide a comprehensive study of the largest collection of ethnicity records from a national-level electronic health records trusted research environment, exploring completeness, consistency, and granularity. This work can serve as a data resource profile of ethnicity from routinely collected EHRs in England.

Implications of all the available evidence: To achieve equity in healthcare, we need to understand the differences between individuals, as well as the influence of ethnicity both on health status and on health interventions, including variation in the behaviour of tests and therapies. Thus, there is a need for measurements, thresholds, and risk estimates to be tailored to different ethnic groups.
This study presents the different medical concepts describing ethnicity in routinely collected data that are readily available to researchers and highlights key elements for improving their accuracy in research. We aim to encourage researchers to use more granular ethnicity than the typical approaches, which aggregate ethnicity into a limited number of categories and fail to reflect the diversity of underlying populations. Accurate ethnicity data will lead to a better understanding of individual diversity, which will help to address disparities and influence policy recommendations that can translate into better, fairer health for all.

### Competing Interest Statement

All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: AA is supported by Health Data Research UK (HDR-9006), which receives its funding from the UK Medical Research Council (MRC, grant MR/V028367/1), and Administrative Data Research UK, which is funded by the ESRC (grant ES/S007393/1). CT is supported by a UCL UKRI Centre for Doctoral Training in AI-enabled Healthcare studentship (EP/S021612/1), MRC Clinical Top-Up and a studentship from the NIHR Biomedical Research Centre at University College London Hospital NHS Trust. KK is the director of the Centre for Ethnic Health Research and a trustee of the South Asian Health Foundation. SK has received research grant funding from the UKRI and Alan Turing Institute outside this work. SK and DPA's research group has received grant/s from Amgen, Chiesi-Taylor, Lilly, Janssen, Novartis, and UCB Biopharma. His research group has received consultancy fees from Astra Zeneca and UCB Biopharma. Amgen, Astellas, Janssen, Synapse Management Partners and UCB Biopharma have funded or supported training programmes organised by DPA's department. No other relationships or activities that could appear to have influenced the submitted work.
### Funding Statement

This study was funded by the BHF Data Science Centre led by HDR UK (BHF Grant no. SP/19/3/34678) and by UK Research and Innovation (grant ref MC_PC_20058). This work was also supported by The Alan Turing Institute via "Towards Turing 2.0" EPSRC Grant Funding.

### Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethics committee/IRB of The North East - Newcastle and North Tyneside 2 research gave ethical approval for this work (REC no: 20/NE/0161).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

The data used in this study are available in NHS Digital's TRE for England but, as restrictions apply, they are not publicly available (https://digital.nhs.uk/coronavirus/coronavirus-data-services-updates/trusted-research-environment-service-for-england).
The CVD-COVID-UK/COVID-IMPACT programme led by the BHF Data Science Centre (https://www.hdruk.ac.uk/helping-with-health-data/bhf-data-science-centre/) received approval to access data in NHS Digital's TRE for England from the Independent Group Advising on the Release of Data (IGARD) (https://digital.nhs.uk/about-nhs-digital/corporate-information-and-documents/independent-group-advising-on-the-release-of-data) via an application made in the Data Access Request Service (DARS) Online system (ref. DARS-NIC-381078-Y9C5K) (https://digital.nhs.uk/services/data-access-request-service-dars/dars-products-and-services). The CVD-COVID-UK/COVID-IMPACT Approvals & Oversight Board (https://www.hdruk.ac.uk/projects/cvd-covid-uk-project/) subsequently granted approval to this project to access the data within NHS Digital's TRE for England. The de-identified data used in this study were made available to accredited researchers only. Those wishing to gain access to the data should contact bhfdsc{at}hdruk.ac.uk in the first instance.
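
As an illustration of the completeness, consistency, and granularity measures named in the Methods, the following is a minimal sketch on invented toy data. It is not the authors' pipeline: the function name `summarise_ethnicity_records`, the input layout, and the placeholder codes "A", "B", "C" are all hypothetical. It also assumes that "conflicting" means two or more distinct codes for the same person and that both proportions use all registered individuals as the denominator; the study may define either differently.

```python
from collections import defaultdict


def summarise_ethnicity_records(registered_ids, records):
    """Summarise ethnicity recording for a registered population.

    registered_ids: iterable of person identifiers (the denominator).
    records: iterable of (person_id, ethnicity_code) pairs; a person may
             appear several times, or not at all. Empty codes are ignored.
    """
    registered = set(registered_ids)
    if not registered:
        raise ValueError("registered_ids must not be empty")

    # Collect the distinct ethnicity codes seen for each registered person.
    codes_by_person = defaultdict(set)
    for person_id, code in records:
        if person_id in registered and code:
            codes_by_person[person_id].add(code)

    n = len(registered)
    # Assumption: a "conflict" is two or more distinct codes for one person.
    n_conflicting = sum(1 for codes in codes_by_person.values() if len(codes) >= 2)
    codes_used = set().union(*codes_by_person.values())

    return {
        "completeness": len(codes_by_person) / n,  # share with >= 1 record
        "conflicting": n_conflicting / n,          # share with >= 2 distinct codes
        "distinct_codes_used": len(codes_used),    # granularity actually observed
    }


# Toy data, invented for illustration only.
registered = ["p1", "p2", "p3", "p4", "p5"]
records = [
    ("p1", "A"), ("p1", "A"),  # repeated but consistent
    ("p2", "A"), ("p2", "B"),  # two distinct codes: counted as conflicting
    ("p3", "C"),
    ("p4", None),              # empty code: treated as no record
]
summary = summarise_ethnicity_records(registered, records)
print(summary)
```

On real data the same counts would normally be computed inside the trusted research environment with its own tooling; the point here is only to make the three quantities concrete.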
Keywords
digital ethnicity data, population-wide