Epidemiologic information discovery from open-access COVID-19 case reports via pretrained language model

SSRN Electronic Journal (2022)

Abstract
Although open-access data are increasingly common and useful to epidemiological research, curating such datasets is resource-intensive and time-consuming. Despite being a major source of COVID-19 data, the regularly disclosed case reports were often written in natural language in an unstructured format. Here we propose a computational framework that automatically extracts epidemiological information from open-access COVID-19 case reports. We develop this framework by coupling a language model built on deep neural networks with training samples compiled using an optimized data annotation strategy. When applied to COVID-19 case reports collected from mainland China, our framework outperforms all other state-of-the-art deep learning models. The information extracted by our approach is highly consistent with that obtained from gold-standard manual coding, with a matching rate of 80%. To implement our algorithm, we provide an open-access online platform that can accurately estimate epidemiological statistics in real time with a substantially reduced data curation burden.
Key words
Health sciences, Virology, Artificial intelligence, Machine learning