interTwin: An Interdisciplinary Digital Twin Engine for Science

Project Goal

The Interdisciplinary Digital Twin (interTwin) project is an ambitious initiative aimed at revolutionizing digital twin technology. At its core, interTwin seeks to co-design and implement a prototype of an open-source Digital Twin Engine (DTE), built on open standards, that integrates seamlessly with application-specific Digital Twins (DTs). Rooted in a co-designed interoperability framework and a conceptual model of a DT for research, known as the DTE blueprint architecture, the platform aims to simplify and accelerate the development of complex application-specific DTs. By extending the technical capabilities of the European Open Science Cloud with integrated modelling and simulation tools, interTwin fosters trust and reproducibility in science and showcases the potential of combining data fusion with advanced modelling and prediction technologies. With its focus on the quality, reliability, and verifiability of DT outputs, and on simplifying application development through AI workflow management and open science practices, interTwin stands at the forefront of interdisciplinary innovation.

Background

interTwin develops and implements an open-source DTE that offers generic and customized software components for modelling and simulation, promoting interdisciplinary collaboration. The DTE blueprint architecture, guided by open standards, aims to establish a common approach applicable across scientific disciplines. Use cases span high-energy physics, radio astronomy, climate research, and environmental monitoring. The project leverages expertise from European research infrastructures, fostering the validation of the technology across facilities and enhancing accessibility. interTwin aligns with initiatives such as Destination Earth, EOSC, EuroGEO, and the EU data spaces for continuous development and collaboration.

Progress

We’ve pinpointed the components of the CERN digital twin (DT) application. The first is the Monte Carlo (MC) based simulation framework Geant4. The second is a deep learning component, the 3D Generative Adversarial Network (3DGAN), designed to simulate particle interactions for specific particle detector setups. We’re initially concentrating on the calorimeter use case; calorimeters are the detector components whose simulation demands the most computing power.
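
To make the second component more concrete, the sketch below shows what a 3D convolutional GAN generator can look like in PyTorch. The layer sizes and the output voxel grid are illustrative placeholders, not the actual 3DGAN architecture, and conditioning on particle energy and angle is omitted for brevity.

import torch
import torch.nn as nn

class Generator(nn.Module):
    """Illustrative 3D GAN generator: maps a latent vector to a 3D grid
    of energy deposits (a toy stand-in for a calorimeter shower)."""
    def __init__(self, latent_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            # Project the latent vector onto a small 3D volume.
            nn.Linear(latent_dim, 64 * 4 * 4 * 4),
            nn.Unflatten(1, (64, 4, 4, 4)),
            # Upsample to the target grid with transposed 3D convolutions.
            nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm3d(32),
            nn.ReLU(),
            nn.ConvTranspose3d(32, 1, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),  # energy deposits are non-negative
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

generator = Generator()
fake_showers = generator(torch.randn(8, 256))  # shape: (8, 1, 16, 16, 16)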

We’ve established the requirements for this use case, including its thematic modules, and have explored more sophisticated generative models. We’ve also incorporated the latest 3DGAN component into our AI workflow tool. The DT workflow can generate training data, preprocess it before it is fed to the machine learning model, store both input and output data, run distributed training across multiple GPUs, perform model inference, carry out validation and quality checks, and support continuous re-training to fine-tune the simulations.
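
To illustrate how these capabilities can be chained, here is a minimal, self-contained sketch of a stage-based pipeline in Python. The stage functions are stand-ins for the real steps and do not reflect the actual API of our AI workflow tool.

from typing import Any, Callable

class Pipeline:
    """Toy workflow: each stage consumes the previous stage's output."""
    def __init__(self, *stages: Callable[[Any], Any]):
        self.stages = stages

    def run(self, data: Any = None) -> Any:
        for stage in self.stages:
            data = stage(data)
        return data

def generate_data(_):
    return [0.1, 0.4, 0.7]               # stand-in for MC-generated training data

def preprocess(samples):
    mean = sum(samples) / len(samples)
    return [x - mean for x in samples]   # e.g. centre the inputs

def train(samples):
    return {"weights": samples}          # stand-in for (distributed) training

model = Pipeline(generate_data, preprocess, train).run()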

Requirements for all use cases regarding their specific AI/ML setups (model, data, infrastructure) have been collected, and an analysis of those requirements led to the DTE blueprint architecture. A first prototype was developed that includes basic machine learning functionality: training, saving models to a model registry using MLflow, and inference. A toy preprocessing module was created to support this. Workflow execution was tested in two ways: using a plain Python environment and using the Common Workflow Language (CWL). The prototype has been successfully tested locally, on CERN compute resources, and on the Jülich HDF-ML cluster.
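
The sketch below illustrates the train/register/infer loop described above using MLflow's standard API, assuming a local SQLite-backed registry and a scikit-learn toy model; the prototype's actual code, backends, and model names differ.

import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# The model registry needs a database-backed store; SQLite works locally.
mlflow.set_tracking_uri("sqlite:///mlflow.db")

X, y = load_iris(return_X_y=True)

with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Log the model and register it under a (hypothetical) name.
    mlflow.sklearn.log_model(model, "model", registered_model_name="toy-classifier")

# Later: load the registered model back for inference.
loaded = mlflow.sklearn.load_model("models:/toy-classifier/1")
print(loaded.predict(X[:3]))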

Finally, several use cases from both the Earth Observation and Physics domains have been successfully integrated: the MNIST toy use case, which serves as an example for other use cases to follow; the CERN use case; and the CMCC use case. Integration of the VIRGO use case is currently on hold due to data access policies; an MoU is being set up to resolve this issue.

Next Steps

The upcoming steps involve exploring the integration of our work with the MC-based framework and refining the data transformation processes. This includes integrating the 3DGAN model into the MC framework (provided this activity can be supported by the DTE), developing or incorporating tools for simultaneous training and hyperparameter optimization (adjusting hyperparameters as necessary for adversarial training), and selecting the solutions best suited to the GAN use case, keeping in mind the characteristics of the computing hardware, such as accelerators and how they communicate between nodes. We plan to adopt a continuous training approach that allows the model to be updated as soon as new data become available.
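
As one way to tune the two optimizers of an adversarial model jointly, the sketch below uses Optuna, a candidate hyperparameter optimization framework (not necessarily the one we will select). The objective function is a placeholder for a real GAN training run returning a validation metric.

import optuna

def objective(trial: optuna.Trial) -> float:
    # Generator and discriminator learning rates are tuned jointly,
    # since an imbalance destabilises adversarial training.
    lr_g = trial.suggest_float("lr_generator", 1e-5, 1e-2, log=True)
    lr_d = trial.suggest_float("lr_discriminator", 1e-5, 1e-2, log=True)
    return abs(lr_g - lr_d)  # placeholder for a real validation metric

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)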

Additionally, we’re working on a customizable validation framework in partnership with experts from the High Energy Physics (HEP) community. This entails developing complex multivariate distributions that cover a wide variety of input conditions, and establishing validation techniques that can evaluate different performance metrics, such as accuracy and agreement with classical simulation methods (e.g., uncertainty estimation and coverage of the support space). We will also advance the development of our DT’s thematic modules, ensuring our software components work well with, and meet the standards of, the DTE solutions created in other work packages. This involves integrating our components and rigorously testing them for compatibility and compliance.
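
As an illustration of the kind of distribution-level check such a framework could run, the sketch below compares a generated observable against its classical MC reference using standard statistical distances; the observables, data, and metrics are placeholders, not the project's agreed validation suite.

import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

rng = np.random.default_rng(seed=0)
reference = rng.normal(loc=1.00, scale=0.20, size=10_000)  # classical MC energies
generated = rng.normal(loc=1.02, scale=0.21, size=10_000)  # GAN-generated energies

# Two complementary agreement measures between the distributions.
w_dist = wasserstein_distance(reference, generated)
ks_stat, p_value = ks_2samp(reference, generated)
print(f"Wasserstein distance: {w_dist:.4f}, KS p-value: {p_value:.3f}")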

Our prototype has been packaged in a Docker container, and an integration test with WP5 using their interLink prototype is currently underway. The prototype is also being extended with more advanced machine learning capabilities, such as distributed training. Now that first experiments with hyperparameter optimization frameworks have been conducted, a production-ready implementation is planned for the next period, during which additional use cases will also be integrated.
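
For context, the sketch below shows one common pattern for multi-GPU data-parallel training with PyTorch's DistributedDataParallel, launched via torchrun; whether the extended prototype uses this exact backend is an open implementation choice, and the model here is a trivial stand-in.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(10, 1).cuda(local_rank)  # trivial stand-in model
ddp_model = DDP(model, device_ids=[local_rank])
optimizer = torch.optim.Adam(ddp_model.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(32, 10, device=local_rank)
    loss = ddp_model(x).pow(2).mean()   # placeholder loss
    optimizer.zero_grad()
    loss.backward()                     # gradients are averaged across ranks
    optimizer.step()

dist.destroy_process_group()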

Lastly, an easy-access user interface based on Jupyter notebooks, using frameworks such as Kubeflow, is planned.


Project Coordinator: EGI Foundation

Technical Team: Matteo Bunino, Xavier Espinal, Enrique Garcia, Maria Girone, Kalliopi Tsolaki, Sofia Vallecorsa, Alexander Zoechbauer

Collaboration Liaisons: Isabel Campos (CSIC), Charis Chatzikyriakou (EODC), Levente Farkas (EGI), Diana Gudu (KIT), Andreas Lintermann (FZJ), Paul Millar (DESY), David Rousseau (IJCLab, CNRS/IN2P3), Mario Rüttgers (FZJ), Rakesh Sarma (FZJ), Daniele Spiga (INFN)

In partnership with: EGI Foundation (EC-funded project). Discover more at intertwin.eu.