Spatial Cross-Validation for Globally Distributed Data.

Rita Beigaite, Michael Mechenich, Indre Zliobaite

International Conference on Discovery Science (DS) (2022)

Abstract

Growing volumes of large-scale georeferenced data produced by Earth observation missions present new challenges for training and testing machine-learned predictive models. Most of this data is spatially auto-correlated, which violates the classical i.i.d. (independent and identically distributed) assumption commonly made in machine learning. One of the largest challenges posed by spatial auto-correlation is how to generate test sets that are sufficiently independent of the training data. In the geoscience and ecological literature, spatially stratified cross-validation is increasingly used as an alternative to standard random cross-validation. Spatial cross-validation, however, is not yet widely studied in the machine learning setting, and theoretical and empirical support is largely lacking. Our study aims to formally introduce spatial cross-validation to the machine learning community. We present experiments on data sets from two different domains (mammalian ecology and agriculture), which include globally distributed multi-target data, and show how standard cross-validation may lead to over-optimistic evaluation. We propose how to use tailored spatial cross-validation in this context to achieve a more realistic assessment of performance and prudent model selection.
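
To make the contrast with random cross-validation concrete, the sketch below shows one generic form of spatial blocking: points are grouped into latitude/longitude grid cells, and whole cells are assigned to folds, so that no cell contributes to both the training and the test set. This is an illustrative assumption, not the tailored procedure proposed in the paper; the function names `block_id` and `spatial_block_folds`, the 10-degree block size, and the synthetic coordinates are all hypothetical.

```python
import math
import random
from collections import defaultdict


def block_id(lat, lon, block_deg=10.0):
    """Map a coordinate to the (row, col) index of its lat/lon grid cell."""
    row = math.floor((lat + 90.0) / block_deg)
    col = math.floor((lon + 180.0) / block_deg)
    return (row, col)


def spatial_block_folds(coords, k=5, block_deg=10.0, seed=0):
    """Yield (train_idx, test_idx) pairs in which no grid cell is split
    between the training and the test indices."""
    # Group sample indices by the grid cell they fall into.
    blocks = defaultdict(list)
    for i, (lat, lon) in enumerate(coords):
        blocks[block_id(lat, lon, block_deg)].append(i)

    # Shuffle the cells (not the samples) and deal them out to k folds.
    block_keys = sorted(blocks)
    random.Random(seed).shuffle(block_keys)

    for fold in range(k):
        test_blocks = set(block_keys[fold::k])
        test_idx = [i for b in block_keys if b in test_blocks for i in blocks[b]]
        train_idx = [i for b in block_keys if b not in test_blocks for i in blocks[b]]
        yield sorted(train_idx), sorted(test_idx)


# Synthetic (lat, lon) pairs, used only to exercise the splitter.
rng = random.Random(42)
coords = [(rng.uniform(-60.0, 75.0), rng.uniform(-180.0, 180.0)) for _ in range(500)]
folds = list(spatial_block_folds(coords, k=5))
```

Equal-angle cells are not equal-area, and neighbouring cells can still be correlated across their shared border, so a real evaluation would likely need larger blocks, a buffer zone, or a clustering-based grouping chosen with the range of the auto-correlation in mind.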

Keywords

data, cross-validation
