AI Models Registry in the Cloud

Project Goal

CERN is running a pilot project to demonstrate that ML training can be optimized across benchmark tasks of different sizes, complexity, and data. In this context, the project focuses on demonstrating the use of an ML model catalogue, evaluating the energy footprint of ML training, and testing the feasibility of using large models (foundation models).

Background

With the increasing use and complexity of ML algorithms at CERN, it is necessary to investigate performance tracking and cost optimization; furthermore, we need to ensure that models are generalizable and reusable. An ML catalogue aims to provide a centralized place to store models that can be used for performance tracking, model sharing, and reuse. From a data-science perspective, the state-of-the-art trend in AI is to use foundation models as a path towards model generalization. Together, these approaches point towards a more 'sustainable' AI by improving the efficiency of ML training and deployment.

Progress

For a thorough evaluation of the ML catalogue provided by Oracle and its accompanying Accelerated Data Science (ADS) SDK, we used two use cases previously developed at CERN openlab. Both consist of deep generative models, one for HEP and another for Earth observation, at different scales (number of network parameters and time to train). Apart from testing the platform and the use of the model catalogue, we conducted a study of energy consumption to understand the environmental impact of training a machine-learning model: we trained the two generative models and logged the energy consumed during training, for multiple hardware options and multiple training and hardware optimizations.

The second half of the year was dedicated to the exploration of a foundation model, using a diffusion-model method for the generation of calorimeter showers; following state-of-the-art trends, we use transformers for better generalization. We can currently generate, with high fidelity, showers like the ones produced by Geant4.
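
To make the catalogue workflow concrete, the sketch below shows how a trained model can be registered in the OCI Data Science model catalogue with the ADS SDK. It is illustrative only: the estimator is a small stand-in scikit-learn model rather than one of the project's generative networks (ADS provides analogous PyTorch and TensorFlow wrappers), and the conda environment name, authentication mode, and display name are assumptions that depend on the tenancy setup.

# Minimal sketch: registering a model in the OCI Data Science model
# catalogue with the oracle-ads SDK. Environment name, auth mode, and
# display name below are assumptions, not the project's actual settings.
import ads
import numpy as np
from ads.model.framework.sklearn_model import SklearnModel
from sklearn.linear_model import LogisticRegression

ads.set_auth(auth="resource_principal")  # typical inside an OCI notebook session

# Train a small stand-in model (the project used deep generative models).
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, 100)
estimator = LogisticRegression().fit(X, y)

# Wrap the estimator, generate the model artifact, and register it.
model = SklearnModel(estimator=estimator, artifact_dir="./model_artifact")
model.prepare(
    inference_conda_env="generalml_p38_cpu_v1",  # assumed conda pack slug
    force_overwrite=True,
)
model_id = model.save(display_name="demo-catalogue-entry")
print("Registered model:", model_id)

Once registered, the catalogue entry carries the serialized artifact and its metadata, which is what enables the performance tracking and model reuse described above.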
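
The energy study requires measuring power draw while training runs. The sketch below shows one way to do this on NVIDIA GPUs: sampling the instantaneous power reported by NVML (via the pynvml bindings) in a background thread and integrating it over time. It illustrates the measurement principle under these assumptions and is not necessarily the exact tooling used in the project.

# Minimal sketch: estimating GPU energy by sampling power with NVML and
# integrating P * dt. Sampling interval and single-GPU setup are assumptions.
import time
import threading
import pynvml

class GpuEnergyLogger:
    """Samples GPU power (watts) at a fixed interval and integrates to joules."""

    def __init__(self, gpu_index=0, interval_s=0.5):
        pynvml.nvmlInit()
        self.handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        self.interval_s = interval_s
        self.energy_j = 0.0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            # nvmlDeviceGetPowerUsage returns milliwatts; convert to watts.
            power_w = pynvml.nvmlDeviceGetPowerUsage(self.handle) / 1000.0
            self.energy_j += power_w * self.interval_s  # integrate P * dt
            time.sleep(self.interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        pynvml.nvmlShutdown()

# Usage around a training run (placeholder stands in for the real loop):
with GpuEnergyLogger() as logger:
    train_one_epoch = lambda: time.sleep(2)  # hypothetical training step
    train_one_epoch()
print(f"Estimated GPU energy: {logger.energy_j / 3.6e6:.6f} kWh")

Repeating such a measurement across hardware options and optimization settings is what allows the training configurations to be compared on energy as well as on time and accuracy.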

Next Steps

Our work in 2024 will be devoted to extending the generation approach towards a foundation model, capable of performing tasks beyond those it was trained on.


Project Coordinator: Sofia Vallecorsa

Technical Team: Renato Cardoso, Sofia Vallecorsa

Collaboration Liaisons from Oracle: Şengül Chardonnereau, Jérôme Designe, Allen Hosler, Sébastien Hurel, Cristobal Pedregal-Martin, Lyudmil Pelov, Bob Peulen, Garret Swart 

In partnership with: Oracle