Automatic identification of ensembles of critical futures in large datasets

crossref(2024)

Abstract
In climate risk modelling, the growing trend of simulating large ensembles is driven by the need to understand a wide range of possible future scenarios. This approach generates vast datasets, which presents a challenge: identifying the most critical scenarios that could have significant impacts. While mainstream data patterns offer general insights, outliers provide unique perspectives, pointing to areas for further investigation. However, focusing on single outliers is not optimal. Instead, analysing groups of outliers enables a more comprehensive exploration of patterns across multiple plausible future outcomes. In this context, we introduce the term "ensemble of outliers" to describe groups of data points deviating significantly from the mean of the dataset. An ensemble of outliers can help uncover underlying patterns and highlight areas for deeper exploration. Once identified, these ensembles can possess distinct properties and indicate phenomena that are not represented in the rest of the dataset. Our research proposes a new method to address the challenge of identifying these ensembles of outliers within large datasets. Our methodology, Mahalanobis distance-based Ensemble of Outlier Detection (MEOD), combines Gaussian Mixture Models for probabilistic clustering with Enhanced Mahalanobis distance-based statistical analysis to identify an ensemble of outliers in complex large datasets. MEOD's efficiency is validated through extensive testing on thousands of synthetic datasets, covering diverse configurations of both dataset and outlier-ensemble characteristics. The results indicate a high degree of accuracy for MEOD, with an average purity of 99.65% and an average F1 score of 0.92. To demonstrate the utility of MEOD for climate risk assessment, we implement our method on a large dataset of future agricultural production scenarios for the Indus River Basin (IRB).
This large dataset was generated using an Integrated Assessment Model, the Global Change Analysis Model, and encompasses 3,000 scenarios outlining potential socioeconomic, water supply-demand, and land use changes through the end of the century. Our goal is to use MEOD to identify and analyse a critical ensemble of outliers that significantly drives water scarcity in the IRB's agricultural sector. We successfully identified 150 scenarios as an ensemble of outliers, characterised by their unique socioeconomic attributes and agricultural practices. These scenarios predominantly fall into two categories: 1) those involving increased competition for resources due to regional disparities and 2) those incorporating a mix of sustainable and conventional agricultural practices. This dichotomy highlights both overuse and intensive water resource utilisation scenarios, signalling significant agricultural withdrawals and high scarcity risks. Our findings demonstrate MEOD's efficiency as a robust, versatile tool for analysing complex, large-scale datasets, providing nuanced insights into intricate data patterns.
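The general idea of pairing Gaussian Mixture Model clustering with a Mahalanobis distance test can be illustrated with a minimal sketch. This is not the authors' MEOD implementation: the abstract does not specify the "Enhanced" Mahalanobis step, so the code below uses the plain squared Mahalanobis distance. The function name `flag_outlier_ensemble`, the choice of two mixture components, the 99% cutoff, and the synthetic data are all assumptions made for illustration. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def flag_outlier_ensemble(X, n_components=2, p=0.99, seed=0):
    """Hypothetical helper: flag a group of points that a GMM places outside
    the dominant component AND that lie far from it in Mahalanobis distance.
    The chi-square cutoff is written in closed form for 2-D data only."""
    if X.shape[1] != 2:
        raise ValueError("this sketch hardcodes the 2-D chi-square quantile")
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed).fit(X)
    labels = gmm.predict(X)

    # Treat the heaviest component as the "mainstream" of the dataset.
    main = int(np.argmax(gmm.weights_))
    diff = X - gmm.means_[main]
    inv_cov = np.linalg.inv(gmm.covariances_[main])

    # Squared Mahalanobis distance of every point to the main component.
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

    # For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - p).
    threshold = -2.0 * np.log(1.0 - p)
    return (labels != main) & (d2 > threshold)


# Synthetic example (assumed data): 950 mainstream points plus a compact
# group of 50 points far from the bulk of the data.
rng = np.random.default_rng(0)
mainstream = rng.normal(loc=0.0, scale=1.0, size=(950, 2))
ensemble = rng.normal(loc=8.0, scale=0.5, size=(50, 2))
X = np.vstack([mainstream, ensemble])

mask = flag_outlier_ensemble(X)
print("flagged points:", int(mask.sum()))
```

Requiring both conditions (membership in a minority component and a large distance from the dominant one) is one simple way to target a coherent group rather than isolated extreme points; real scenario data with more dimensions would need a general chi-square quantile and a principled choice of the number of components.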