A Practical Probabilistic Benchmark for AI Weather Models
CoRR (2024)
Abstract
Since the weather is chaotic, forecasts aim to predict the distribution of
future states rather than make a single prediction. Recently, multiple
data-driven weather models have emerged claiming breakthroughs in skill. However,
these have mostly been benchmarked using deterministic skill scores, and little
is known about their probabilistic skill. Unfortunately, it is hard to fairly
compare AI weather models in a probabilistic sense, since variations in choice
of ensemble initialization, definition of state, and noise injection
methodology become confounding. Moreover, even obtaining ensemble forecast
baselines is a substantial engineering challenge given the data volumes
involved. We sidestep both problems by applying a decades-old idea – lagged
ensembles – whereby an ensemble can be constructed from a moderately-sized
library of deterministic forecasts. This allows the first parameter-free
intercomparison of leading AI weather models' probabilistic skill against an
operational baseline. The results reveal that two leading AI weather models,
GraphCast and Pangu, are tied on the probabilistic CRPS metric even though
the former outperforms the latter in deterministic scoring. We also reveal how
multiple time-step loss functions, which many data-driven weather models have
employed, are counter-productive: they improve deterministic metrics at the
cost of increased dissipation, deteriorating probabilistic skill. This is
confirmed through ablations applied to a spherical Fourier Neural Operator
(SFNO) approach to AI weather forecasting. Separate SFNO ablations modulating
effective resolution reveal that it has a useful effect on ensemble dispersion,
which is relevant to achieving good ensemble calibration. We hope these and forthcoming
insights from lagged ensembles can help guide the development of AI weather
forecasts and have thus shared the diagnostic code.
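
The lagged-ensemble idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' diagnostic code. It assumes scalar forecasts stored in a dictionary keyed by (initialization time, lead time) in hours, and it scores the ensemble with the standard empirical CRPS estimator, which may differ from the exact estimator used in the paper. The names `lagged_ensemble` and `crps_ensemble`, and the choice of which lead times to pool, are hypothetical.

```python
def lagged_ensemble(forecasts, valid_time, lead_times):
    """Collect deterministic forecasts that verify at the same valid time.

    forecasts: dict mapping (init_time, lead_time) -> forecast value (assumed layout).
    lead_times: lead times to pool; each one implies init_time = valid_time - lead.
    """
    members = []
    for lead in lead_times:
        key = (valid_time - lead, lead)
        if key in forecasts:  # skip initializations missing from the library
            members.append(forecasts[key])
    return members


def crps_ensemble(members, obs):
    """Empirical CRPS: mean |x_i - obs| minus half the mean |x_i - x_j|."""
    n = len(members)
    if n == 0:
        raise ValueError("ensemble is empty")
    skill = sum(abs(x - obs) for x in members) / n
    spread = sum(abs(a - b) for a in members for b in members) / (2 * n * n)
    return skill - spread


# Two forecasts from different initializations, both valid at hour 24.
library = {(0, 24): 1.0, (12, 12): 3.0}
members = lagged_ensemble(library, valid_time=24, lead_times=[12, 24])
score = crps_ensemble(members, obs=2.0)
print(members, score)  # [3.0, 1.0] 0.5
```

Because the members come from an existing library of deterministic runs, no perturbation scheme or noise parameter has to be chosen, which is what makes the comparison parameter-free in the sense described above. With a single member the score reduces to the absolute error.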