Project goal

This project is testing and prototyping solutions that combine data engineering with machine-learning and deep-learning tools. These solutions are run on cloud resources, in particular resources and tools from Oracle Cloud Infrastructure (OCI), and address a number of use cases of interest to CERN's community. Notably, this activity makes it possible to compare the performance, maturity, and stability of solutions deployed on CERN's infrastructure with those deployed on OCI.

R&D topic
R&D Topic 3: Machine learning and data analytics
Project coordinator(s)
Eric Grancher and Eva Dafonte Perez
Technical team members
Luca Canali, Riccardo Castellotti
Collaborator liaison(s)
Barry Gleeson, Vincent Leocorbo, Don Mowbray, Cristobal Pedregal-Martin

Collaborators
Oracle

Project background

Big-data tools, particularly those related to data engineering and machine learning, are evolving rapidly. As these tools reach maturity and are adopted more broadly, new opportunities are arising for extracting value from large data sets.

Recent years have seen growing interest from the physics community in machine learning and deep learning. One important activity in this area has been the development of pipelines for real-time classification of particle-collision events recorded by the LHC detectors. Filtering events using so-called "trigger" systems is set to become increasingly complex as upgrades to the LHC increase the rate of particle collisions.

Recent progress

The project launched at the end of 2018. We began by developing data pipelines for data-analysis and machine-learning use cases of interest, and by porting them to the cloud to run on Kubernetes. These workloads had originally been developed to run on CERN's Hadoop and Spark infrastructure.
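As an indicative illustration, the fragment below sketches how a PySpark workload written for a Hadoop/YARN cluster can instead be pointed at a Kubernetes cluster. This is a minimal sketch only: the API endpoint, container image, and executor count are placeholders, and the project's actual pipelines are not shown.

    from pyspark.sql import SparkSession

    # Minimal sketch: directing a Spark session at a Kubernetes cluster
    # rather than a Hadoop/YARN cluster. The endpoint, image, and executor
    # count below are placeholders, not the project's actual settings.
    spark = (
        SparkSession.builder
        .appName("ported-data-pipeline")
        .master("k8s://https://<kubernetes-api-endpoint>:6443")
        .config("spark.kubernetes.container.image", "<spark-container-image>")
        .config("spark.executor.instances", "4")
        .getOrCreate()
    )

    # The pipeline logic itself is unchanged by the port; a trivial
    # stand-in computation is shown here.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df.groupBy("label").count().show()

    spark.stop()

In practice, such jobs are typically launched with spark-submit in cluster mode, which places the driver inside the cluster; the in-code configuration above is simply the most compact way to show the change of cluster manager.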

Next steps

In 2019, the project will investigate two main use cases. The first is the replication of data-reduction systems employed at the CMS experiment, with a view to exploiting the scalability of OCI to improve upon current performance. The second is the training of specific models on OCI using GPUs, in order to test the performance limits of such models on cloud-native solutions. The models in question are those detailed in the paper listed under "Publications" below.
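For the second use case, the sketch below shows the general shape of GPU-based training with TensorFlow/Keras. It is purely illustrative: the data and the network are random placeholders, not the topology-classification models from the paper cited below.

    import numpy as np
    import tensorflow as tf

    # Placeholder data: 10 000 "events" with 50 features each and a binary
    # label. Not the actual collision data used in the project.
    features = np.random.rand(10000, 50).astype("float32")
    labels = np.random.randint(0, 2, size=(10000,)).astype("float32")

    # A deliberately simple binary classifier; the real topology classifiers
    # are described in the Nguyen et al. paper listed under "Publications".
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(50,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])

    # TensorFlow places operations on a GPU automatically when one is
    # visible, so the same script can be benchmarked on CPU and GPU nodes.
    model.fit(features, labels, epochs=2, batch_size=256)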

Publications

    T. Nguyen et al., Topology classification with deep learning to improve real-time event selection at the LHC, 2018. http://cern.ch/go/8trZ