Data analytics in the cloud
This project is testing and prototyping solutions that combine data engineering with machine-learning and deep-learning tools. These solutions run on cloud resources, in particular resources and tools from Oracle Cloud Infrastructure (OCI), and address a number of use cases of interest to CERN’s community. Notably, this activity will make it possible to compare the performance, maturity, and stability of solutions deployed on CERN’s infrastructure with equivalent deployments on OCI.
Big-data tools, particularly those related to data engineering and machine learning, are evolving rapidly. As these tools mature and are adopted more broadly, new opportunities are arising to extract value from large data sets.
Recent years have seen growing interest from the physics community in machine learning and deep learning. One important activity in this area has been the development of pipelines for real-time classification of particle-collision events recorded by the LHC detectors. Filtering events using so-called “trigger” systems is set to become increasingly complex as upgrades to the LHC increase the rate of particle collisions.
The project launched at the end of 2018. We began by developing data pipelines for data-analysis and machine-learning use cases of interest and porting them to the cloud, using Kubernetes. These workloads had originally been developed to run on CERN’s Hadoop and Spark infrastructure.
In 2019, the project will investigate two main use cases. The first is replicating the data-reduction systems employed at the CMS experiment, with a view to exploiting the scalability of OCI to improve on current performance. The second is training specific deep-learning models on OCI using GPUs, to test the performance limits of such models on cloud-native solutions. The models in question are those described in the paper listed under Publications.
Publications
- T. Nguyen et al., Topology classification with deep learning to improve real-time event selection at the LHC, 2018. http://cern.ch/go/8trZ