Exploratory functional data analysis of multivariate densities for the identification of agricultural soil contamination by risk elements

Journal of Geochemical Exploration (2024)

Abstract
Geochemical mapping of risk element concentrations in soils is performed in many countries around the world. It results in numerous large datasets of high analytical quality, which can be used to identify soils that violate individual legislative limits for safe food production. However, there is a lack of advanced data mining tools that would be suitable for sensitive exploratory data analysis of big data while respecting the natural variability of soil composition. To distinguish anthropogenic contamination from natural variations, the analysis of the entire data distribution for smaller subareas is key. In this article, we propose a new data mining methodology for geochemical mapping data based on functional data analysis of probability densities in the framework of Bayes spaces after post-stratification of a big dataset to smaller districts. The tools we propose allow us to analyse the entire distribution, going well beyond a superficial detection of extreme concentration anomalies. We illustrate the proposed methodology on a dataset gathered according to the Czech national legislation (1990-2009), whose information content has not yet been fully exploited. Taking into account specific properties of probability density functions and recent results for orthogonal decomposition of multivariate densities enabled us to reveal real contamination patterns that were so far only suspected in Czech agricultural soils. We process the above Czech soil composition dataset for Cu, Pb, and Zn by first compartmentalizing it into spatial units, the so-called districts, and by subsequently clustering these districts according to diagnostic features of their uni- and multivariate distributions at high concentration levels. These clusters were seen to correspond to compartments that show known features of contamination, such as historical metallurgy of non-ferrous metals and iron and steel production.
Comparison between compartments, notably neighbouring districts with similar natural factors controlling soil variability, is key to the reliable distinction of diffuse contamination. In this work, we used soil contamination by Cu-bearing pesticides as an example for empirical testing of the proposed data mining approach. In general, there are no natural and justifiable thresholds of risk element concentrations that would be valid for geographical areas with too much natural heterogeneity. Therefore, national (or larger) soil geochemistry datasets cannot be processed as a whole. As we demonstrate in this paper, empirical knowledge and careful tailoring of statistical tools for the characteristic types of soil contamination are essential for unequivocal identification of the anthropogenic component in real datasets.
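As a rough illustration of the idea of comparing whole per-district distributions in a Bayes space, the following sketch estimates a univariate density per district, maps it to an ordinary vector space with the centred log-ratio (clr) transformation, and measures the distance between districts there. It is not the authors' implementation: the histogram estimator, the pseudo-count, the equal-width bins, the function names, and the synthetic "districts" are all assumptions made for illustration, and the multivariate orthogonal decomposition mentioned in the abstract is not covered.

```python
import math
import random


def histogram_density(values, edges, pseudo=0.5):
    """Piecewise-constant density on fixed bins (hypothetical helper).

    The pseudo-count keeps every bin strictly positive so the logarithm
    used by the clr transformation is defined. Values outside the bin
    range are ignored.
    """
    counts = [pseudo] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            is_last = k == len(counts) - 1
            if edges[k] <= v < edges[k + 1] or (is_last and v == edges[-1]):
                counts[k] += 1
                break
    total = sum(c * (edges[k + 1] - edges[k]) for k, c in enumerate(counts))
    return [c / total for c in counts]


def clr(density):
    """Discretised clr transformation, assuming equal-width bins:
    log-density minus its mean, so the result sums to zero."""
    logs = [math.log(d) for d in density]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def bayes_distance(f, g, width):
    """L2 distance between clr-transformed densities on bins of equal width."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(f), clr(g))) * width)


# Synthetic example (made-up data, not the Czech dataset): log10 concentrations
# for two background-like districts and one with an added high-concentration tail.
random.seed(0)
width = 0.25
edges = [k * width for k in range(17)]  # 0.0 .. 4.0 on a log10 scale
districts = {
    "A": [math.log10(random.lognormvariate(3.0, 0.4)) for _ in range(300)],
    "B": [math.log10(random.lognormvariate(3.1, 0.4)) for _ in range(300)],
    "C": [math.log10(random.lognormvariate(3.0, 0.4)) for _ in range(250)]
    + [math.log10(random.lognormvariate(5.5, 0.5)) for _ in range(50)],
}
densities = {name: histogram_density(v, edges) for name, v in districts.items()}
for a, b in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(a, b, round(bayes_distance(densities[a], densities[b], width), 3))
```

The resulting pairwise distances could be passed to any standard clustering routine to group districts. Because the clr transformation works on log-ratios, differences in the sparsely populated upper tail are weighted relative to the local density level, which is one reason a Bayes space view can be more sensitive to high-concentration features than a comparison of raw densities.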
Keywords
FDA for geochemical maps, FDA of univariate and multivariate densities, Compartmentalisation, Identification of Czech agricultural soil contamination, Cu-bearing pesticides, Bayes spaces