Prediction of 35 Target Per- and Polyfluoroalkyl Substances (PFASs) in California Groundwater Using Multilabel Semisupervised Machine Learning

Jichen Dong, Gary Tsai,Christopher I. Olivares

ACS ES&T water(2023)

引用 0|浏览0
暂无评分
摘要
Comprehensive monitoring of perfluoroalkyl and polyfluoroalkyl substances (PFASs) is challenging because of the high analytical cost and an increasing number of analytes. We developed a machine learning pipeline to understand environmental features influencing PFAS profiles in groundwater. By examining 23 public data sets (2016–2022) in California, we built a state-wide groundwater database (25,000 observations across 4200 wells) encompassing contamination sources, weather, air quality, soil, hydrology, and groundwater quality (PFASs and cocontaminants). We used supervised learning to prescreen total PFAS concentrations above 70 ng/L and multilabel semisupervised learning to predict 35 individual PFAS concentrations above 2 ng/L. Random forest with ADASYN oversampling performed the best for total PFASs (AUROC 99%). XGBoost with SMOTE oversampling achieved the AUROC of 73–100% for individual PFAS prediction. Contamination sources and soil variables contributed the most to accuracy. Individual PFASs were strongly correlated within each PFAS’s subfamily (i.e., short- vs long-chain PFCAs, sulfonamides). These associations improved prediction performance using classifier chains, which predicts a PFAS based on previously predicted species. We applied the model to reconstruct PFAS profiles in groundwater wells with missing data in previous years. Our approach can complement monitoring programs of environmental agencies to validate previous investigation results and prioritize sites for future PFAS sampling.
更多
查看译文
关键词
california groundwater,polyfluoroalkyl substances,machine learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要