Evaluation of four machine learning models for signal detection

THERAPEUTIC ADVANCES IN DRUG SAFETY(2023)

引用 0|浏览4
暂无评分
摘要
Background:Logistic regression-based signal detection algorithms have benefits over disproportionality analysis due to their ability to handle potential confounders and masking factors. Feature exploration and developing alternative machine learning algorithms can further strengthen signal detection.Objectives:Our objective was to compare the signal detection performance of logistic regression, gradient-boosted trees, random forest and support vector machine models utilizing Food and Drug Administration adverse event reporting system data.Design:Cross-sectional study.Methods:The quarterly data extract files from 1 October 2017 through 31 December 2020 were downloaded. Due to an imbalanced outcome, two training sets were used: one stratified on the outcome variable and another using Synthetic Minority Oversampling Technique (SMOTE). A crude model and a model with tuned hyperparameters were developed for each algorithm. Model performance was compared against a reference set using accuracy, precision, F1 score, recall, the receiver operating characteristic area under the curve (ROCAUC), and the precision-recall curve area under the curve (PRCAUC).Results:Models trained on the balanced training set had higher accuracy, F1 score and recall compared to models trained on the SMOTE training set. When using the balanced training set, logistic regression, gradient-boosted trees, random forest and support vector machine models obtained similar performance evaluation metrics. The gradient-boosted trees hyperparameter tuned model had the highest ROCAUC (0.646) and the random forest crude model had the highest PRCAUC (0.839) when using the balanced training set.Conclusion:All models trained on the balanced training set performed similarly. Logistic regression models had higher accuracy, precision and recall. Logistic regression, random forest and gradient-boosted trees hyperparameter tuned models had a PRCAUC > 0.8. All models had an ROCAUC > 0.5. Including both disproportionality analysis results and additional case report information in models resulted in higher performance evaluation metrics than disproportionality analysis alone. Evaluating new methods to detect potential harmful adverse drug events in spontaneous report databasesBackground:The Food and Drug Administration (FDA) adverse event reporting system (FAERS) is a database that contains adverse event reports, medication error reports, and product quality complaints. The FDA uses statistical methods to identify potentially harmful drug-adverse event combinations, also known as signals, within FAERS. This study compared several different methods to identify harmful drug-related events from adverse event reports in FAERS. The performance of each method was compared to see which method worked best.Methods:Logistic regression-based signal detection methods have demonstrated superior performance due to their ability to handle variables that can distort the effect of other variables or hide potential associations. The development of other machine learning models is of interest. Machine learning models can define complex relationships between risk factors and outcomes. Our objective was to compare the signal detection performance of multiple models.Results:Our study show that two models (logistic regression and random forest) were better at identifying true signals than other models.Conclusions:The four methods have differing abilities on how well they identify adverse drug events in voluntarily reported surveillance data. Including both results of searches for unexpected associations between drugs and adverse events and additional case report information in models resulted in identifying more true signals than unexpected association results alone. The models can be replicated or modified for use by drug safety programs.
更多
查看译文
关键词
adverse drug events,machine learning,pharmacovigilance,signal detection
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要