Spatial Cross-Validation for Globally Distributed Data.

International Conference on Discovery Science (DS) (2022)

Abstract
Increasing amounts of large-scale georeferenced data produced by Earth observation missions present new challenges for training and testing machine-learned predictive models. Most of this data is spatially auto-correlated, which violates the classical i.i.d. assumption (independent and identically distributed data) commonly used in machine learning. One of the largest challenges posed by spatial auto-correlation is how to generate testing sets that are sufficiently independent of the training data. In the geoscience and ecological literature, spatially stratified cross-validation is increasingly used as an alternative to standard random cross-validation. Spatial cross-validation, however, is not yet widely studied in the machine learning setting, and theoretical and empirical support is largely lacking. Our study aims to formally introduce spatial cross-validation to the machine learning community. We present experiments on data sets from two different domains (mammalian ecology and agriculture), which include globally distributed multi-target data, and show how standard cross-validation may lead to over-optimistic evaluation. We propose how to use tailored spatial cross-validation in this context to achieve a more realistic assessment of performance and more prudent model selection.
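To make the contrast with random cross-validation concrete, the following is a minimal sketch of one common form of spatial cross-validation: samples are grouped into latitude/longitude grid blocks, and whole blocks are held out together so that nearby points never end up on both sides of a split. It is an illustration under stated assumptions, not the procedure used in the paper. The function name `spatial_block_folds`, the `block_deg` parameter, and the synthetic coordinates are all hypothetical, and blocks of fixed size in degrees are not equal-area on a globe.

```python
import math
import random
from collections import defaultdict


def spatial_block_folds(coords, n_folds=5, block_deg=10.0, seed=0):
    """Yield (train_idx, test_idx) pairs that hold out whole grid blocks.

    coords: list of (lat, lon) pairs in degrees.
    block_deg: hypothetical block edge length in degrees (illustrative choice).
    """
    # Group sample indices by the grid block their coordinates fall into.
    blocks = defaultdict(list)
    for i, (lat, lon) in enumerate(coords):
        key = (math.floor(lat / block_deg), math.floor(lon / block_deg))
        blocks[key].append(i)

    # Shuffle the blocks reproducibly, then deal them to folds round-robin.
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)
    fold_of_block = {key: j % n_folds for j, key in enumerate(keys)}

    for fold in range(n_folds):
        test = [i for k in keys if fold_of_block[k] == fold for i in blocks[k]]
        train = [i for k in keys if fold_of_block[k] != fold for i in blocks[k]]
        yield sorted(train), sorted(test)


# Usage on synthetic, globally scattered points (made-up data for illustration).
rng = random.Random(42)
coords = [(rng.uniform(-60, 60), rng.uniform(-180, 180)) for _ in range(200)]
folds = list(spatial_block_folds(coords, n_folds=5, block_deg=30.0))
```

The block size controls how far apart training and testing points are forced to be. In this sketch, a fold's test set is empty whenever there are fewer occupied blocks than folds, so the block size and fold count need to be chosen together.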
Keywords
data, cross-validation