Building Trust in Artificial Intelligence Weather Prediction

22 December 2025

Why does verification matter for AI weather models?

As artificial intelligence (AI) becomes more common in weather prediction, one key question is how to be sure these new AI models can be trusted. To help answer this, more than 200 experts from forecasting centres, research institutions, and universities gathered in October for a joint workshop of the Joint Working Group on Forecast Verification Research (JWGFVR) and the Working Group on Numerical Experimentation (WGNE), hosted by Environment and Climate Change Canada. The meeting focused on improving how AI-based forecasts are evaluated before they enter operational use.

A key message from the workshop was that AI is changing the role of verification. Traditionally, verification scores are applied after a forecast is produced to assess its quality. Many AI systems, however, use these same scores as training objectives, so the choice of metric can itself shape model behaviour. Some metrics reward forecasts that are too smooth, while others can lead to overprediction of rainfall. The discussion highlighted the need for fair, transparent verification scores and for practical tools that detect when AI models generate unrealistic or physically inconsistent results.
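
To illustrate how a score can favour overly smooth forecasts, here is a minimal sketch in Python. It is not taken from the workshop material: the one-dimensional "rain field", the grid size, the rainfall amounts, and the choice of mean squared error (MSE) as the score are all assumptions made for illustration. It compares a sharp forecast with the correct intensity but a small location error against a smeared-out forecast with the same total rainfall.

```python
import numpy as np


def mse(forecast, observed):
    """Mean squared error between two fields on the same grid."""
    return float(np.mean((forecast - observed) ** 2))


# Hypothetical observation: a single 10 mm rain cell on a 20-point grid.
observed = np.zeros(20)
observed[10] = 10.0

# Sharp forecast: correct intensity, displaced by two grid points.
sharp = np.zeros(20)
sharp[12] = 10.0

# Smooth forecast: the same total rainfall spread evenly over five points.
smooth = np.zeros(20)
smooth[8:13] = 2.0

mse_sharp = mse(sharp, observed)
mse_smooth = mse(smooth, observed)

print(f"sharp forecast:  MSE = {mse_sharp:.1f}, peak = {sharp.max():.1f} mm")
print(f"smooth forecast: MSE = {mse_smooth:.1f}, peak = {smooth.max():.1f} mm")
```

In this toy case the sharp forecast is penalised twice, once for missing the observed cell and once for raining where nothing fell, so it scores worse (MSE 10.0) than the smooth forecast (MSE 4.0), even though the smooth forecast never predicts more than 2 mm. A model trained to minimise such a score has an incentive to blur its output, which is one reason to pair pointwise scores with checks on intensity and realism.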

Following the workshop, the Weather Prediction Model Intercomparison Project (WP-MIP) community developed a coordinated plan to evaluate both AI-based and traditional numerical weather prediction models in a consistent way. The plan uses a shared global dataset and examines a wide range of aspects: basic skill scores, climate versus day-to-day weather differences, physical consistency checks, explainability studies, extreme event behaviour, tropical cyclone performance, and regional assessments. It places strong emphasis on improving verification in regions often underrepresented in global studies, including Africa, South Asia, South America, and the polar regions. The approach also addresses a key concern for both AI and traditional models: avoiding the use of overlapping data for training and testing.
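
One common way to keep training and testing data separate is to split by date. The sketch below is an illustration only and does not describe the WP-MIP protocol: the function name `split_by_date`, the `gap_days` buffer, and the example dates are hypothetical.

```python
from datetime import date, timedelta


def split_by_date(valid_dates, test_start, gap_days=0):
    """Split dates into non-overlapping training and test sets.

    Training keeps dates strictly before (test_start - gap_days);
    testing keeps dates on or after test_start. Dates that fall in
    the gap are dropped, which limits leakage between the two sets
    through day-to-day correlation in the weather.
    """
    train_end = test_start - timedelta(days=gap_days)
    train = [d for d in valid_dates if d < train_end]
    test = [d for d in valid_dates if d >= test_start]
    return train, test


# Hypothetical example: ten consecutive days, with testing from 8 January.
dates = [date(2020, 1, 1) + timedelta(days=i) for i in range(10)]
train, test = split_by_date(dates, date(2020, 1, 8), gap_days=2)

# The two sets must not share any date.
assert not set(train) & set(test)
print(len(train), "training days;", len(test), "test days")
```

Here the training set covers 1 to 5 January and the test set covers 8 to 10 January, with 6 and 7 January left out as a buffer. In practice the split would be on years rather than days, and the same separation would need to hold for any data used to tune a traditional model.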

This joint effort aims to build a fair and transparent foundation for comparing AI and physics-based forecasts and to support future verification standards under the WMO Integrated Processing and Prediction System (WIPPS). The group plans to publish results in an American Meteorological Society Special Collection beginning in June 2026.