The proposed project aims to develop a proof of concept for a machine-learning-based digital twin of the atmosphere for environmental applications. To accomplish this, the project is divided into two main parts. The first part focuses on the development of a machine-learning-based modelling core prototype, called AtmoRep, built on the concept of large-scale representation learning applied to Earth system science. In the second part, the modelling core will be integrated into the digital twin architecture currently under development within the CERN IT department by the InterTwin project.
Background
The atmosphere and its dynamics have a significant impact on human well-being, affecting agricultural decision making, policy making and the renewable energy sector. Accurate and equitable modelling of atmospheric dynamics is consequently of critical importance for evidence-based decision making that improves human well-being and minimizes adverse impacts for current and future generations. Very recently, AI-based models have shown tremendous potential for reducing the computational cost of numerical weather prediction. However, they lack the versatility of conventional models. The EMP2/AtmoRep project aims to develop an AI-based model of atmospheric dynamics for multi-purpose applications. The model will be implemented using large-scale representation learning, so as to encapsulate the information contained in the large amounts of available data. The implementation on the digital twin platform will make this information more accessible to the general public, allowing users to easily develop their own weather and climate applications.
Progress
In September 2023, the team publicly released a first prototype of the core model, which has been tested on multiple tasks in Earth system science, such as weather forecasting, downscaling, spatio-temporal interpolation and precipitation rate correction. The model, also referred to as AtmoRep, consists of a 3.5-billion-parameter network, trained for several weeks in summer 2023 at the Juelich Supercomputing Center using 4M core hours in total. The model shows competitive skill in weather forecasting when compared with recently released AI-based forecasting models, and it outperforms them on tasks such as downscaling and precipitation rate forecasting. One of the main innovations is a novel probabilistic loss, which allows the model to output probabilistic ensembles for each downstream task.
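To illustrate what an ensemble output makes possible downstream, here is a minimal Python sketch that reduces a set of ensemble members to a per-grid-point mean and spread. It is not taken from the AtmoRep code base: the function name `summarize_ensemble`, the list-of-lists data layout and the sample values are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def summarize_ensemble(members: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return the per-grid-point ensemble mean and spread (population std. dev.)."""
    # Transpose so that each tuple holds all member values for one grid point.
    per_point = list(zip(*members))
    means = [mean(values) for values in per_point]
    spreads = [pstdev(values) for values in per_point]
    return means, spreads


# Three hypothetical ensemble members, each covering two grid points.
members = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
ens_mean, ens_spread = summarize_ensemble(members)
```

In this toy case the members disagree at the first grid point and agree at the second, so the spread is non-zero only at the first. A spread of this kind is one simple way to read forecast uncertainty from an ensemble.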
The CERN team was mainly responsible for implementing the analysis workflow of the downstream applications. This resulted in the development of a dedicated analysis package together with collaborators at the Juelich Supercomputing Center. All the training code has recently been released on GitHub and made available to the public.
Next Steps
The next steps in the development of the modelling core include the implementation of an autoregressive rollout mechanism to reach medium-range weather forecasts (10-15 days) and the extension to multi-resolution dataset handling, to go beyond the quarter-degree spatial resolution of the forecasts available with the current setup.
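The general idea of an autoregressive rollout can be sketched in a few lines of Python: a one-step model is applied repeatedly, and each prediction is fed back in as the next input. This is a generic illustration under stated assumptions, not the project's implementation. The names `rollout` and `toy_step`, the 6-hour step length and the damping dynamics are hypothetical placeholders.

```python
from typing import Callable, List

State = List[float]  # hypothetical stand-in for a gridded atmospheric state


def rollout(step: Callable[[State], State], initial: State, n_steps: int) -> List[State]:
    """Apply a one-step model repeatedly, feeding each prediction back in."""
    states = [initial]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states


def toy_step(state: State) -> State:
    # Placeholder dynamics (10% damping per step); a trained network would go here.
    return [0.9 * x for x in state]


# Assumed 6-hour model step: a 10-day lead time then needs 40 steps.
hours_per_step = 6
n_steps = 10 * 24 // hours_per_step
trajectory = rollout(toy_step, [1.0, 2.0], n_steps)
```

Because each step consumes the previous prediction, errors can accumulate along the trajectory. This is one reason why extending a short-range model to lead times of 10-15 days is a development step in its own right.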
The second part of the project will involve the implementation of the core model on the InterTwin digital twin architecture, in collaboration with InterTwin and the other members of the Digital Twin initiative at CERN.
Project Coordinator: Alberto Di Meglio
Technical Team: Alberto Di Meglio, Ilaria Luise
Collaboration Liaisons: Christian Lessig (ECMWF, Magdeburg University), Martin Schultz (Juelich Supercomputing Center)
In partnership with: Juelich Supercomputing Center, ECMWF, Magdeburg University