Using N-Grams For Syndromic Surveillance In A Turkish Emergency Department Without English Translation: A Feasibility Study

BIOMEDICAL INFORMATICS INSIGHTS(2013)

引用 0|浏览4
暂无评分
摘要
Introduction: Syndromic surveillance is designed for early detection of disease outbreaks. An important data source for syndromic surveillance is free-text chief complaints (CCs), which are generally recorded in the local language. For automated syndromic surveillance, CCs must be classified into predefined syndromic categories. The n-gram classifier is created by using text fragments to measure associations between chief complaints (CC) and a syndromic grouping of ICD codes.Objectives: The objective was to create a Turkish n-gram CC classifier for the respiratory syndrome and then compare daily volumes between the n-gram CC classifier and a respiratory ICD-10 code grouping on a test set of data.Methods: The design was a feasibility study based on retrospective cohort data. The setting was a university hospital emergency department (ED) in Turkey. Included were all ED visits in the 2002 database of this hospital. Two of the authors created a respiratory grouping of International Classification of Diseases, 10th Revision ICD-10-CM codes by consensus, chosen to be similar to a standard respiratory (RESP) grouping of ICD codes created by the Electronic Surveillance System for Early Notification of Community-based Epidemics (ESSENCE), a project of the Centers for Disease Control and Prevention. An n-gram method adapted from AT& T Labs' technologies was applied to the first 10 months of data as a training set to create a Turkish CC RESP classifier. The classifier was then tested on the subsequent 2 months of visits to generate a time series graph and determine the correlation with daily volumes measured by the CC classifier versus the RESP ICD-10 grouping.Results: The Turkish ED database contained 30,157 visits. The correlation (R2) of n-gram versus ICD-10 for the test set was 0.78.Conclusion: The n-gram method automatically created a CC RESP classifier of the Turkish CCs that performed similarly to the ICD-10 RESP grouping. The n-gram technique has the advantage of systematic, consistent, and rapid deployment as well as language independence.
更多
查看译文
关键词
disease outbreaks, epidemiology, public health, surveillance, n-gram
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要