Members of CERN’s research community expend significant efforts to understand how they can get the most value out of the data produced by the LHC experiments. They seek to maximise the potential for discovery and employ new techniques to help ensure that nothing is missed. At the same time, it is important to optimise resource usage (tape, disk, and CPU), both in the online and offline environments. Modern machine-learning technologies — in particular, deep-learning solutions — offer a promising research path to achieving these goals. Deep-learning techniques offer the LHC experiments the potential to improve performance in each of the following areas: particle detection, identification of interesting events, modelling detector response in simulations, monitoring experimental apparatus during data taking, and managing computing resources.

 

Fast detector simulation

Project goal

We are using artificial intelligence (AI) techniques to simulate the response of the particle detectors to collision events. Specifically, we are developing deep neural networks — in particular, generative adversarial networks (GANs) — to do this. Such tools will play a significant role in helping the research community cope with the vastly increased computing demands of the High Luminosity LHC (HL-LHC).

Once properly trained and optimised, generative models can simulate a variety of particles, energies, and detectors in just a fraction of the time required by classical simulation, which is based on detailed Monte Carlo methods. Our objective is to tune these new tools and integrate them into the experiments’ existing simulation frameworks.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Sofia Vallecorsa
Team members
Florian Rehm, Gulrukh Khattak, Kristina Jaruskova
Collaborator liaison(s)
Intel: Claudio Bellini, Andrea Luiselli, Vikram Saletore, Hans Pabst, Adel Chaibi, Eric Petit. | SURFsara BV: Valeriu Codreanu, Maxwell Cai, Damian Podareanu. | Barcelona Supercomputing Center: John Osorio Rios, Adrià Armejach, Marc Casas

Collaborators

Project background

Simulating the response of detectors to particle collisions — under a variety of conditions — is an important step on the path to new physics discoveries. However, this work is very computationally expensive. Over half of the computing workload of the Worldwide LHC Computing Grid (WLCG) is the result of this single activity.  

We are exploring an alternative approach, referred to as ‘fast simulation’, which trades some level of accuracy for speed. Fast-simulation strategies have been developed in the past, using different techniques (e.g. look-up tables or parametrised approaches). However, the latest developments in machine learning (particularly in relation to deep neural networks) make it possible to develop fast-simulation tools that are both more flexible and more accurate than those developed in the past.

Recent progress

Most of the work in 2019 focused on accelerating the training process using a data-parallel approach. In 2020, we turned our attention to optimising and accelerating the inference process. Industry is developing new hardware platforms that promise large acceleration factors for both training and inference of deep neural networks (e.g. Intel Xe). In most cases, low-precision data representation (e.g. half-precision floating-point or reduced-precision integer formats) is one of the key strategies for achieving significant acceleration. Given this, we have carefully studied the effect of low-precision data representation on the 3DGAN model. We obtained a 1.8x speedup by running inference with a reduced-precision integer representation, compared to single-precision floating point, and verified that the precision of the physics results is preserved with this approach. We also verified that training with a mixed-precision approach (dynamically switching between single-precision and half-precision floating point) converges to stable results.
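To give a flavour of the trade-off involved, the following minimal sketch (plain NumPy, not the actual 3DGAN code; sizes and data are arbitrary stand-ins) quantises a layer’s weights and activations to 8-bit integers and measures how far the reduced-precision result deviates from the full-precision one:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantisation of a float32 array to int8,
    returning the quantised values plus the scale needed to dequantise."""
    scale = max(np.abs(x).max() / 127.0, 1e-12)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)  # stand-in for a layer's weights
a = rng.normal(size=64).astype(np.float32)        # stand-in for its input activations

qw, sw = quantize_int8(w)
qa, sa = quantize_int8(a)

# Integer matrix-vector product, rescaled back to physical units
y_int8 = (qw.astype(np.int32) @ qa.astype(np.int32)) * (sw * sa)
y_fp32 = w @ a

rel_err = np.abs(y_int8 - y_fp32).max() / np.abs(y_fp32).max()
print(f"worst-case relative deviation: {rel_err:.4f}")
```

In the real study, the analogous check is performed on physics observables (e.g. shower shapes) rather than raw layer outputs.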

Next steps

The work done so far on 3DGAN can be considered an initial R&D phase. Our focus now is on moving from the prototyping stage to production, deployment, and integration within the simulation software. To achieve this goal, it is essential to optimise resource usage, stabilise the training process, and improve model convergence. At the same time, it is also important to perform systematic studies on model generalisation and robustness, as well as on the interpretability and reproducibility of results.

Validating the performance of a generative model is not an easy task. In particular, evaluating the number of missing modes (as well as their properties) is critical for ensuring that the simulated data are a good representation of the underlying theoretical models, so that they can safely be used to evaluate detector performance and model detector response.
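A crude version of such a mode-coverage check can be illustrated with histograms: flag the regions where a reference sample has probability mass but the generated sample has essentially none. This toy (plain NumPy, arbitrary one-dimensional distributions, not the 3DGAN validation code) catches the mode that a collapsed generator misses:

```python
import numpy as np

rng = np.random.default_rng(4)

# Reference 'physics' sample: a two-mode distribution
reference = np.concatenate([rng.normal(-2, 0.3, 5000), rng.normal(2, 0.3, 5000)])
# A mode-collapsed generator that only ever produces the second mode
generated = rng.normal(2, 0.3, 10000)

bins = np.linspace(-4, 4, 41)
ref_hist, _ = np.histogram(reference, bins, density=True)
gen_hist, _ = np.histogram(generated, bins, density=True)

# A bin is a 'missing mode' candidate if the reference populates it
# but the generator essentially never does.
missing = np.sum((ref_hist > 0.05) & (gen_hist < 0.01 * ref_hist))
print(f"bins flagged as missing: {missing}")
```

In practice the comparison is made in the high-dimensional space of calorimeter observables, but the principle is the same.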

Building on the work done to optimise the 3DGAN discriminator network, the plan is to design a convolutional neural network (CNN) able to analyse the GAN-generated images and to act as a feature extractor. The CNN output can then be analysed by an XGBoost-based analyser, solving the final classification or regression problem.
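The intended pipeline can be sketched in miniature: convolve the image with a bank of filters, pool each response down to a scalar, and hand the resulting fixed-length feature vector to a separate classifier. In the toy below, hand-written kernels stand in for the trained CNN and a simple comparison stands in for XGBoost; all shapes and values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv2d(img, kernel):
    """Valid-mode 2D convolution via sliding windows."""
    kh, kw = kernel.shape
    windows = np.lib.stride_tricks.sliding_window_view(img, (kh, kw))
    return np.einsum('ijkl,kl->ij', windows, kernel)

def extract_features(img, kernels):
    """Apply each kernel, ReLU, then global-average-pool: one scalar per kernel."""
    return np.array([np.maximum(conv2d(img, k), 0).mean() for k in kernels])

def make_image(width):
    """Toy 'shower image': a square energy deposit of the given half-width."""
    img = np.zeros((16, 16))
    img[8 - width:8 + width, 8 - width:8 + width] = rng.uniform(
        0.5, 1.0, (2 * width, 2 * width))
    return img

kernels = [np.ones((3, 3)) / 9.0,                 # local energy density
           np.array([[1, 0, -1]] * 3, float)]     # horizontal gradient

feats_broad = extract_features(make_image(4), kernels)   # broad deposit
feats_narrow = extract_features(make_image(1), kernels)  # narrow deposit
print(feats_broad, feats_narrow)
```

In the actual plan, the feature vectors produced this way would be the input to the XGBoost-based analyser that solves the final classification or regression problem.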

Motivated by the issue of missing modes, we also intend to develop and optimise a boosting approach to improve the convergence of the 3DGAN model.

 

Publications

    F. Carminati et al., A Deep Learning tool for fast detector simulation. Poster presented at the 18th International Supercomputing Conference 2018, Frankfurt, 2018. First prize awarded for best research poster. cern.ch/go/D9sn
    G. Khattak, Training Generative Adversarial Models over a Distributed Computing System (2018), revised selected papers. cern.ch/go/8Ssz
    D. Anderson, F. Carminati, G. Khattak, V. Loncar, T. Nguyen, F. Pantaleo, M. Pierini, S. Vallecorsa, J-R. Vlimant, A. Zlokapa, Large scale distributed training applied to Generative Adversarial Networks for calorimeter Simulation. Presented at the 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018). Proceedings in publication.
    F. Carminati, G. Khattak, S. Vallecorsa, 3D convolutional GAN for fast simulation. Presented at the 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018). Proceedings in publication.
    F. Carminati, S. Vallecorsa, G. Khattak, V. Codreanu, D. Podareanu, H. Pabst , V. Saletore, Distributed Training of Generative Adversarial Networks for Fast Detector Simulation. ISC 2018 Workshops, LNCS 11203, pp. 487–503, 2018. cern.ch/go/wLP6
    G. Khattak, S. Vallecorsa, F. Carminati, Three Dimensional Energy Parametrized Generative Adversarial Networks for Electromagnetic Shower Simulation. 2018 25th IEEE International Conference on Image Processing (ICIP), Geneva, Pages 3913-3917, 2018. cern.ch/go/7PHp
    G. Khattak, S. Vallecorsa, F. Carminati, D. Moise, Data-Parallel Training of Generative Adversarial Networks on HPC Systems for HEP Simulations. 2018 IEEE 25th International Conference on High Performance Computing (HiPC), Geneva, Pages 162-171, 2018. cern.ch/go/kTX9
    F. Carminati et al., Calorimetry with Deep Learning: Particle Classification, Energy Regression, and Simulation for High-Energy Physics, NIPS 2017. cern.ch/go/7vc8
    F. Carminati et al., Three dimensional Generative Adversarial Networks for fast simulation, ACAT 2017. cern.ch/go/BN6r
    F. Rehm, Reduced Precision Strategies for Deep Learning: A High Energy Physics Generative Adversarial Network Use Case. 10th International Conference on Pattern Recognition Applications and Methods, Vienna, Pages 251-258, 2021. cern.ch/go/v7wF
    F. Rehm, Validation of Deep Convolutional Generative Adversarial Networks for High Energy Physics Calorimeter Simulations, Combining Artificial Intelligence and Machine Learning with Physical Sciences, California, 2021. cern.ch/go/zFp7
    J. O. Rios, A. Armejach, G. Khattak, E. Petit, S. Vallecorsa, M. Casas, Mixed-Precision Arithmetic for 3DGAN to Simulate High Energy Physics Detectors. Published at the ICMLA, 2020.

Presentations

    D. Brayford, S. Vallecorsa, A. Atanasov, F. Baruffa, W. Riviera, Deploying AI Frameworks on Secure HPC Systems with Containers. Presented at 2019 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, 2019, pp. 1-6.
    G. R. Khattak, S. Vallecorsa, F. Carminati, G. M. Khan, Particle Detector Simulation using Generative Adversarial Networks with Domain Related Constraints. Presented at 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, 2019, pp. 28-33.
    F. Carminati, S. Vallecorsa, G. Khattak, 3D convolutional GAN for fast simulation (5 March). Presented at IXPUG Spring Conference, Bologna, 2018. cern.ch/go/9TqS
    F. Carminati, G. Khattak, S. Vallecorsa, Three-dimensional energy parametrized adversarial networks for electromagnetic shower simulation (7 October). Presented at 2018 IEEE International Conference on Image Processing, Athens, 2018. cern.ch/go/lVr8
    F. Carminati, V. Codreanu, G. Khattak, H. Pabst, D. Podareanu, V. Saletore, S. Vallecorsa, Fast Simulation with Generative Adversarial Networks (12 November). Presented at The International Conference for High Performance Computing, Networking, Storage, and Analysis, Dallas, 2018. cern.ch/go/Z6Wg
    F. Carminati, G. Khattak, D. Moise, S. Vallecorsa, Data-parallel Training of Generative Adversarial Networks on HPC Systems for HEP Simulations (18 December). Presented at 25th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC, Bengaluru, 2018.
    S. Vallecorsa, Machine Learning for Fast Simulation 2017 (June 24), Presented at ISC High Performance, Frankfurt, 2017. cern.ch/go/k6sV
    E. Orlova, Deep learning for fast simulation: development for distributed computing systems (15 August), Presented at CERN openlab summer students’ lightning talks, Geneva, 2017. cern.ch/go/NW9k
    A. Gheata, GeantV (Intel Code Modernisation) (21 September), Presented at CERN openlab Open Day, Geneva, 2017. cern.ch/go/gBS6
    S. Vallecorsa, GANs for simulation (May 2017), Fermilab, Talk at DS@HEP workshop, 2017. cern.ch/go/m9Bl
    S. Vallecorsa, GeantV – Adapting simulation to modern hardware (June 2017), Talk at PASC 2017 conference, Lugano, 2017. cern.ch/go/cPF8
    S. Vallecorsa, Machine Learning-based fast simulation for GeantV (June 2017), Talk at LPCC workshop, CERN, 2017. cern.ch/go/QqD7
    S. Vallecorsa, Generative models for fast simulation (August 2017), Plenary talk at ACAT conference, Seattle, 2017. cern.ch/go/gl7l
    S. Vallecorsa, Three dimensional Generative Adversarial Networks for fast simulation, ACAT 2017. cern.ch/go/jz6C
    S. Vallecorsa et al., Tutorial on "3D convolutional GAN implementation in Neon'', Intel HPC Developers Conference 2017. cern.ch/go/ZtZ7
    F. Rehm, Reduced Precision Strategies for Deep Learning: 3DGAN Use Case (March 2021), 10th international Conference on Pattern Recognition Applications and Methods, Vienna, 2021. cern.ch/go/9ZRr
    F. Rehm, Validation of GANs for High Energy Physics Simulations (February 2021), Combining Artificial Intelligence and Machine Learning with Physical Sciences, California, 2021. cern.ch/go/8Pfc
    F. Rehm, Reduced Precision Strategies in Deep Learning: A 3D GAN use case (October 2020), Accelerated Artificial Intelligence for Big-Data Experiments Conference, Urbana-Champaign, 2020. cern.ch/go/T6wj
    F. Rehm, Reduced Precision Strategies for Deep Learning: 3DGAN Use Case (October 2020), 4th Inter-experiment Machine Learning Workshop, CERN, Geneva, 2020. cern.ch/go/Hq6j
    J. O. Rios, A. Armejach, G. Khattak, E. Petit, S. Vallecorsa, M. Casas, Mixed-Precision Arithmetic for 3DGAN to Simulate High Energy Physics Detectors (13 October). Presented at the 2020 IXPUG Annual Meeting, US, 2020. cern.ch/go/7PXj
    F. Rehm, S. Vallecorsa, Reduced Precision Strategies for Deep Learning: 3DGAN Use Case (21 October). Presented at the 4th IML Machine Learning Workshop, CERN, Geneva, 2020. cern.ch/go/Tnf8

Exploring accelerated machine learning for experiment data analytics

Project goal

The project has two threads, each investigating a unique use case for the Micron Deep Learning Accelerator (a modular FPGA-based architecture). The first thread relates to the development of a real-time streaming machine-learning inference engine prototype for the level-1 trigger of the CMS experiment.

The second thread focuses on prototyping a particle-identification system based on deep learning for the DUNE experiment. DUNE is a leading-edge, international experiment for neutrino science and proton-decay studies. It will be built in the US and is scheduled to begin operation towards the end of this decade.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Emilio Meschi, Paola Sala, Maria Girone
Team members
Thomas Owen James, Dejan Golubovic, Maurizio Pierini, Manuel Jesus Rodriguez, Saul Alonso-Monsalve, Ema Puljak, Lorenzo Uboldi
Collaborator liaison(s)
Mark Hur, Stuart Grime, Michael Glapa, Eugenio Culurciello, Andre Chang, Marko Vitez, Dustin Werran, Aliasger Zaidy, Abhishek Chaurasia, Patrick Estep, Jason Adlard, Steve Pawlowski

Collaborators

Project background

The level-1 trigger of the CMS experiment selects relevant particle-collision events for further study, while rejecting 99.75% of collisions. This decision must be made with a fixed latency of a few microseconds. Machine-learning inference in FPGAs may be used to improve the capabilities of this system.

The DUNE experiment will consist of large arrays of sensors exposed to high-intensity neutrino beams. The use of convolutional neural networks has been shown to substantially boost particle-identification performance for such detectors. For DUNE, an FPGA solution is advantageous for processing ~5 TB/s of data.

Recent progress

The CMS team primarily focused on preparing a level-1 scouting system that uses Micron SB-852 FPGA processing boards to capture and process trigger-level data at 40 MHz. We developed and optimised neural networks to improve analysis performance using level-1 scouting objects. In addition, we developed a system for level-1 anomaly detection using a variational auto-encoder approach, implemented with the Micron deep-learning framework.
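The principle behind auto-encoder anomaly detection can be shown with a deliberately simplified stand-in: a linear auto-encoder (equivalent to PCA) is trained on "normal" events, and anything with a large reconstruction error is flagged. This is an illustration of the idea only, not the CMS variational auto-encoder or the Micron framework API:

```python
import numpy as np

rng = np.random.default_rng(2)

# 'Normal' events: two strongly correlated features
normal = rng.normal(size=(500, 2)) @ np.array([[1.0, 0.9], [0.0, 0.1]])

# Linear auto-encoder = project onto the top principal component and back
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
encode = vt[:1]                        # 2 -> 1 bottleneck

def reconstruction_error(x):
    z = (x - mean) @ encode.T          # encode
    xr = z @ encode + mean             # decode
    return np.linalg.norm(x - xr, axis=-1)

# Flag anything worse than the 99th percentile of training-set errors
threshold = np.quantile(reconstruction_error(normal), 0.99)

anomaly = np.array([[3.0, -3.0]])      # event off the learned correlation axis
print(reconstruction_error(anomaly)[0] > threshold)  # prints: True
```

A variational auto-encoder replaces the linear projection with a learned probabilistic encoder/decoder, but the anomaly score is used in the same way.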

The DUNE team organised a test on the protoDUNE Single Phase detector to analyse data from cosmic rays; this was the last opportunity to test the system before protoDUNE’s second run in 2022. The test examined the incoming triggered data using a triple AC-511 Micron FPGA setup as a hardware accelerator, running an image-segmentation neural network designed to identify regions of interest. This setup was able to analyse data at a rate of ~1.42 Gb/s. The network’s ability to identify low-energy events was tested on data taken by protoDUNE with a neutron generator.
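The role the segmentation network plays can be sketched as follows: given a detector image, return a bounding box around the activity worth keeping, so that only that region needs to be processed further. The toy below uses simple thresholding in place of the neural network; all sizes and values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy 'wire-plane image': low-level noise plus one bright track-like deposit
img = 0.1 * rng.random((32, 32))
img[12:14, 5:25] = 1.0

def regions_of_interest(image, threshold=0.5, pad=2):
    """Crude ROI finder standing in for the segmentation network: return the
    padded bounding box of all above-threshold pixels (None if nothing fires)."""
    rows, cols = np.nonzero(image > threshold)
    if rows.size == 0:
        return None
    return (max(rows.min() - pad, 0), min(rows.max() + pad, image.shape[0] - 1),
            max(cols.min() - pad, 0), min(cols.max() + pad, image.shape[1] - 1))

roi = regions_of_interest(img)
print(roi)  # prints: (10, 15, 3, 26)
```

Cropping to such regions is what makes it feasible to keep up with the very high input data rates on the FPGA accelerators.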

Next steps

The CMS team will focus on completing the installation and validation of the 40 MHz scouting system for LHC Run 3. This will require the integration and debugging of several software, FPGA-firmware, and hardware layers. The performance of a deep-learning-driven anomaly-detection algorithm will also be evaluated for use in LHC Run 3.

The DUNE team plans to continue analysing the data from the neutron generator, as it demonstrates huge potential for reconstruction techniques at DUNE. The goal is to perform real-time studies of the quality of the data taken, using Micron’s state-of-the-art hardware accelerators in the data-acquisition system of protoDUNE.

Publications

    D. Golubovic, 40 MHz scouting with deep learning in CMS. Published on Zenodo, 2020. cern.ch/go/vJD9
    M. Popa, Deep learning for 40 MHz scouting with level-1 trigger muons for CMS at LHC run-3. Published on Zenodo, 2020. cern.ch/go/99St

Presentations

    M. J. R. Alonso, Fast inference using FPGAs for DUNE data reconstruction (7 November). Presented at 24th International Conference on Computing in High Energy and Nuclear Physics, Adelaide, 2019. cern.ch/go/bl7n
    M. J. R. Alonso, Prototyping of a DL-based Particle Identification System for the Dune Neutrino Detector (22 January). Presented at CERN openlab Technical Workshop, Geneva, 2020. cern.ch/go/zH8W
    T. O. James, FPGA-based Machine Learning Inference for CMS with the Micron Deep Learning Accelerator (22 January). Presented at CERN openlab Technical Workshop, Geneva, 2020. cern.ch/go/pM7P
    M. Rodríguez, S. A. Monsalve, P. Sala, Prototyping of a DL-based Particle Identification System for the Dune Neutrino Detector (22 January). Presented at CERN openlab Technical Workshop, Geneva, 2020. cern.ch/go/zH8W
    D. Golubovic, T. James, E. Meschi, 40 MHz Scouting with Deep Learning in CMS (22-24 April). Presented at Connecting the Dots Workshop, New Jersey, 2020. cern.ch/go/C96N

TPUs for deep learning

Project goal

CERN is running pilot projects to investigate the potential of hardware accelerators, using a set of LHC workloads. In the context of these investigations, this project focuses on the testing and optimisation of machine-learning and deep-learning algorithms on Google TPUs. In particular, we are focusing on generative adversarial networks (GANs).

R&D topic
Machine learning and data analytics
Project coordinator(s)
Sofia Vallecorsa
Team members
Renato Cardoso
Collaborator liaison(s)
Renato Cardoso

Collaborators

Project background

The high-energy physics (HEP) community has a long tradition of using neural networks and machine-learning methods (random forests, boosted decision trees, multi-layer perceptrons) to solve specific tasks. In particular, they are used to improve the efficiency with which interesting particle-collision events can be selected from the background. In recent years, several studies have demonstrated the benefits of using deep learning (DL) to solve typical tasks related to data taking and analysis. Building on these examples, many HEP experiments are now working to integrate deep learning into their workflows for a wide range of applications (examples include data-quality assurance, simulation, data analysis, and real-time selection of interesting collision events). For example, generative models, from GANs to variational auto-encoders, are being tested as fast alternatives to simulation based on Monte Carlo methods. Anomaly-detection algorithms are being explored to improve data-quality monitoring, to design searches for rare new-physics processes, and to analyse and prevent faults in complex systems (such as those used for controlling accelerators and detectors).

The training of such models has been made tractable by improved optimisation methods and the availability of dedicated hardware that is well adapted to the highly parallelisable task of training neural networks. Storage and HPC technologies are often required by these kinds of projects, together with the availability of HPC multi-architecture frameworks (from large multi-core systems to hardware accelerators like Google TPUs).

Recent progress

Machine learning has been used in a wide range of areas. Nevertheless, the need to make it faster, while still maintaining accuracy (and thus the validity of results), is a growing challenge for data scientists. Our work explores the TensorFlow distributed-strategy approach, in order to run a GAN model efficiently in a parallel environment. This includes benchmarking different types of hardware for such an approach.

Specifically, we parallelised the training of a 3D convolutional GAN across multiple GPUs and multiple Google TPU cores. This involved two main approaches to the TensorFlow mirrored strategy:

The first approach uses the default implementation and the built-in logic of the TensorFlow strategy deployment model, with training on several GPUs. The second approach uses a custom training loop that we optimised in order to gain greater control over the training process, as well as to assign additional work to each GPU, increasing the overall speedup.

For the TPUs, we used the TPU distributed strategy present in TensorFlow, applying the same approaches as described for the mirrored strategy. This was validated by comparing with the results obtained with the original 3DGAN model, as well as the Monte Carlo simulated data. Additionally, we tested scalability over multiple GPU nodes by deploying the training process on different public cloud providers using Kubernetes.
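Conceptually, the data-parallel scheme used by both the mirrored and TPU strategies is: split each global batch across replicas, compute per-replica gradients, average them (an all-reduce), and apply the same update everywhere so all replicas stay in sync. The toy NumPy sketch below simulates this with a one-parameter model and simulated workers; it illustrates the scheme only and is not TensorFlow-specific:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy task: fit y = w*x (true w = 3) with data-parallel SGD on 4 workers
x = rng.uniform(-1, 1, 1024)
y = 3.0 * x
n_workers = 4
w = 0.0

for step in range(200):
    batch_idx = rng.integers(0, len(x), 64)
    shards = np.array_split(batch_idx, n_workers)   # scatter the global batch
    # Each worker computes the gradient of the MSE loss on its own shard
    grads = [np.mean(2 * (w * x[s] - y[s]) * x[s]) for s in shards]
    g = np.mean(grads)                              # all-reduce (average)
    w -= 0.3 * g                                    # identical update on every replica

print(round(w, 3))  # prints: 3.0
```

The custom-training-loop approach mentioned above corresponds to writing this loop explicitly (rather than relying on the built-in strategy logic), which is what gives the extra control over what each replica does per step.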

Next steps

Our work in 2021 will be devoted to the extension of this approach to more efficient parallelisation strategies. We will also work to optimise deep-learning architectures; these are more demanding in terms of computing resources, meaning that they stand to benefit greatly from TPU architectures.


Presentations

    R. Cardoso, Accelerating GAN training using distributed tensorflow and highly parallel hardware (22 October). Presented at the 4th IML workshop, Geneva, 2020. cern.ch/go/H6xt

Next Generation Archiver for WinCC OA

Project goal

Our aim is to make control systems used for the LHC more efficient and smarter. We are working to enhance the functionality of WinCC OA (a SCADA tool used widely at CERN) and to apply data-analytics techniques to the recorded monitoring data, in order to detect anomalies and systematic issues that may impact upon system operation and maintenance.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Rafal Kulaga
Team members
Anthony Hennessey, Mariusz Suder, Piotr Golonka, Peter Sollander, Fernando Varela, Marc Bengulescu, Filip Siroky
Collaborator liaison(s)
Thomas Hahn, Juergen Kazmeier, Alexey Fishkin, Tatiana Mangels, Mikhail Kalinkin, Elisabeth Bakany, Ewald Sperrer

Collaborators

Project background

The HL-LHC programme aims to increase the integrated luminosity — and hence the rate of particle collisions — by a factor of ten beyond the LHC’s design value. Monitoring and control systems will therefore become increasingly complex, with unprecedented data throughputs. Consequently, it is vital to further improve the performance of these systems, and to make use of data-analytics algorithms to detect anomalies and anticipate future behaviour. Achieving this involves a number of related lines of work. This project focuses on the development of a modular and future-proof archiving system (NextGen Archiver) that supports different SQL and NOSQL technologies to enable data analytics. It is important that this can be scaled up to meet our requirements beyond 2020.

Recent progress

The most important milestone for the NextGen Archiver (NGA) project in 2020 was the preparation of the first stable version with Oracle and InfluxDB backends for all users at CERN. Despite challenges, it was successfully released in July, receiving positive feedback.

The release is currently being deployed in the ALICE systems, where the NGA will be used with a custom backend to stream data from the control systems to the new physics data readout and analysis architecture after Long Shutdown 2. This represents a crucial validation step for the massive deployment of the new archiver in all CERN systems, planned for Long Shutdown 3.

After the release, the team focused on developing several features and improving reliability. These upgrades will be included in the subsequent versions. One notable upgrade is the ability to send queries to selected backends, with the option to specify different time ranges for each of them. This functionality will be indispensable in systems where parallel archiving into multiple databases is used to improve performance; it will enable new use cases.
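To illustrate the shape of such a query (this is a hypothetical mock for illustration, not the actual NGA API), each selected backend is queried over its own time range and the results are merged:

```python
from dataclasses import dataclass

@dataclass
class Backend:
    """Stand-in for an NGA archiving backend (e.g. Oracle or InfluxDB)."""
    name: str
    data: dict  # timestamp -> value, pretending to be archived history

    def query(self, start, end):
        return {t: v for t, v in self.data.items() if start <= t <= end}

def query_selected(backends, requests):
    """Fan a query out to the selected backends, each with its own time
    range, and merge the results (later backends win on duplicate stamps)."""
    merged = {}
    for name, (start, end) in requests.items():
        merged.update(backends[name].query(start, end))
    return merged

backends = {
    "oracle": Backend("oracle", {10: 1.0, 20: 1.1, 30: 1.2}),
    "influx": Backend("influx", {30: 1.25, 40: 1.3}),
}
# Older history from the Oracle backend, recent samples from InfluxDB
result = query_selected(backends, {"oracle": (0, 25), "influx": (26, 50)})
print(sorted(result))  # prints: [10, 20, 30, 40]
```

This is also the pattern that makes parallel archiving into multiple databases useful: each backend can serve the time window it holds most efficiently.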

Next steps

The work on the project will continue on several fronts. The NGA will be deployed in all systems in the ALICE experiment. A further increase of test coverage is also one of the priorities, with particular attention to performance, handling of failure scenarios, and redundancy. The work on multiple features of the archiver will continue, including extensions to the query mechanisms and improvements in all the backends.


Presentations

    F. M. Tilaro, R. Kulaga, Siemens Data Analytics and SCADA evolution status report (23 January). Presented at CERN openlab Technical Workshop, Geneva, 2019. cern.ch/go/kt7K
    A. Hennessey, P. Golonka, R. Kulaga, F. Varela, WinCC Open Architecture – Next Generation Archiver (23 January). cern.ch/go/8Kq7

Data analytics for industrial controls and monitoring

Project goal

The main goal of the project is to render the industrial control systems used for the LHC more efficient and more intelligent. The focus is to develop a data-analytics platform that capitalises on the latest advances in artificial intelligence (AI), cloud and edge-computing technologies. The ultimate goal is to make use of analytics solutions provided by Siemens to provide non-expert end users with a turnkey data-analytics service.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Rafal Kulaga
Team members
Filip Široký, Marc Bengulescu, Fernando Varela Rodriguez, Rafal Kulaga, Piotr Golonka, Peter Sollander
Collaborator liaison(s)
Thomas Hahn, Juergen Kazmeier, Alexey Fishkin, Tatiana Mangels, Elisabeth Bakany, Ewald Sperrer

Collaborators

Project background

The HL-LHC project aims to increase the integrated luminosity — and hence the rate of particle collisions — by a factor of ten beyond the LHC’s design value. Monitoring and control systems will therefore become increasingly complex, with unprecedented data throughputs. Consequently, it is vital to further improve the performance of these systems, and to make use of data-analytics algorithms to detect anomalies and to anticipate future behaviour. Achieving this involves a number of related lines of work. This particular project focuses on the development of a data-analytics platform that combines the benefits of cloud and edge computing.

Recent progress

One of the main achievements in 2020 was the experimental deployment of the Siemens Distributed Complex Event Processing (DCEP) technology to enable advanced data analytics and predictive maintenance for the oxygen-deficiency sensors in the LHC tunnel. This was initially done by deploying a suite of microservices on a pool of virtual machines in the cloud. In the latest phase, the cloud-edge gap was bridged by also adding a Siemens IoT 2050 box as a worker to the computing pool. 

Progress was also made on the optimisation of the ion-beam source for the LINAC3 accelerator at CERN. Multiple machine-learning and spectral techniques were employed and extensively tested, under the guidance of LINAC3 experts. We expect to continue this work in 2021.
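As a flavour of the spectral side of that work (a toy stand-in, not the LINAC3 analysis itself; the trace and its period are invented), the dominant periodicity of a noisy current signal can be located with a Fourier transform:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy stand-in for an ion-source current trace: slow periodic drift plus noise
t = np.arange(0, 600.0, 1.0)                 # one sample per second
signal = 5.0 + 0.5 * np.sin(2 * np.pi * t / 60.0) + 0.1 * rng.normal(size=t.size)

# Spectral analysis: locate the dominant oscillation period
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(t.size, d=1.0)
dominant_period = 1.0 / freqs[np.argmax(spectrum)]
print(f"dominant period: {dominant_period:.1f} s")
```

Identifying such dominant components is a typical first step before feeding the signal to predictive models of the kind mentioned above.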

A prototype platform was developed to provide a collection service for generic AI algorithms that can easily be employed by people who are not data scientists, helping them to perform advanced analytics on controls data. This is an ongoing effort together with colleagues from various groups at CERN, including the Cryogenics group in the Technology department and the Cooling and Ventilation group in the Engineering department.

Next steps

Depending on the priorities agreed with the company, the focus of the collaboration can shift to new areas, such as device management. A CERN-Siemens joint workshop has been scheduled in order to define the use cases for the following year.


Presentations

    F. Tilaro, F. Varela, Model Learning Algorithms for Anomaly Detection in CERN Control Systems (25 January). Presented at BE-CO Technical Meeting, Geneva, 2018. cern.ch/go/7SGK
    F. Tilaro, F. Varela, Industrial IoT in CERN Control Systems (21 February). Presented at Siemens IoT Conference, Nuremberg, 2018.
    F. Tilaro, F. Varela, Optimising CERN control systems through Anomaly Detection & Machine Learning (29 August). Presented at AI workshop for Future Production Systems, Lund, 2018.
    F. Široký, Data Analytics and IoT for Industrial Control Systems (22 January). Presented at CERN openlab technical workshop, Geneva, 2020. cern.ch/go/F6NM
    M. Bengulescu, F. Široký, Distributed Complex Event Processing at the Large Hadron Collider (2 July). Presented at IoT Siemens Conference, 2020.
    F. Široký, M. Bengulescu, Predicting LINAC3 current with LSTM and wavelet spectral analysis (22 September). Presented at Siemens Corporate Technology Seminar, 2020.
    M. Bengulescu, F. Široký, Predicting LINAC3 current using RNNs and spectral decomposition (19 June). Presented at CERN ML Coffee, Geneva, 2020.
    F. Široký, Bayesian ML – a practical approach (6 May). Presented at CERN ML Coffee, Geneva, 2020.
    M. Bengulescu, F. Široký, Data Analytics for Industrial Controls (31 January). Presented at CERN ML Coffee, Geneva, 2020.

Oracle Cloud technologies for data analytics on industrial control systems

Project goal

CERN’s control systems acquire more than 250 TB of data per day from the LHC and its experiments. Managing these extremely complex “Industrial Internet of Things” (IIoT) systems raises important challenges in terms of data management, retrieval, and analytics.

The main objective is to explore the capabilities of Oracle Autonomous Data Warehouse and Oracle Analytics Cloud for integrating heterogeneous control IIoT data, while improving performance and efficiency for the most challenging analytics requirements.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Manuel Martin Marquez, Sébastien Masson
Team members
Manuel Martin Marquez, Sébastien Masson, Aimilios Tsouvelekakis
Collaborator liaison(s)
Çetin Özbütün, Reiner Zimmermann, Michael Connaughton, Cristobal Pedregal-Martin, Engin Senel, Cemil Alper, Giuseppe Calabrese, David Ebert, Dmitrij Dolgušin

Collaborators

Project background

Keeping the LHC and the rest of the accelerator complex at CERN running efficiently requires state-of-the-art control systems. A complex IIoT system has been developed to persist the data these control systems produce, making it possible for engineers to gain insights about temperatures, magnetic-field strengths, and more. This plays a vital role in ensuring the highest levels of operational efficiency.

The current system for persisting, accessing and analysing this data is based on Oracle Database. Today, significant effort is dedicated to improving performance and coping with increasing demand — in terms of data volume, analysis and exploration of bigger data sets.

Recent progress

During 2020, the team focused on three main aspects:

(i) Assessing extended capabilities for supporting custom data types when importing and dealing with complex data schemas on Parquet files. The results showed that complex Parquet schemas can now be handled and automatically ingested by Oracle Data Warehouse technology. This enabled us to work with data coming from control-system devices based on CERN's custom controls middleware, CMW.
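The kind of schema handling involved can be illustrated with a small sketch (plain Python with a hypothetical device record; not the Oracle ingestion code): a nested, Parquet-style record is flattened into the dotted column names a relational table would use:

```python
def flatten(record, prefix=""):
    """Flatten a nested device record into dotted column names, the way a
    relational warehouse table would expose nested Parquet fields."""
    out = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out

# Hypothetical CMW-style device reading with a nested schema (invented values)
reading = {
    "device": "toy.magnet.device",
    "timestamp": 1604188800,
    "acquisition": {"current": 5432.1, "status": {"interlock": False}},
}
print(flatten(reading))
```

Automating this mapping for arbitrarily nested schemas is what removes the need for hand-written ingestion code per device type.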

(ii) Deploying a hybrid solution based on standard and external table partitions, with the objective of improving performance for data retrieval. This improved performance compared to the solution based solely on external partitions and is thus a good fit for the most demanding real-time applications consuming data from the IIoT control system.

(iii) Increasing data retrieval/analytics complexity using real-life scenarios to explore Oracle Analytics technologies as a front-end solution for control engineers and equipment experts.
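The schema handling described in point (i) can be pictured with a small sketch: nested, CMW-style device records are flattened into one column per leaf field, the shape in which they can be stored as Parquet rows and ingested. This is purely illustrative Python, not the Oracle ingestion code, and all field names ("device", "acquisition", "fields") are invented.

```python
# Illustrative sketch (not CERN or Oracle code): flattening a nested
# device record into flat, dot-separated columns prior to ingestion.

def flatten(record, prefix=""):
    """Recursively flatten nested dicts into a single-level column dict."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# A hypothetical reading from one control-system device.
record = {
    "device": "RF.CAVITY.1",
    "acquisition": {
        "timestamp": 1577836800,
        "fields": {"voltage": 4.2, "phase": 0.13},
    },
}

row = flatten(record)
# row now maps "acquisition.fields.voltage" -> 4.2, and so on,
# giving one flat column per leaf field of the nested schema.
```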

We presented the outcomes of this work at events across Asia Pacific, Europe, the Middle East, Africa, and North America. Through our participation in these events, CERN openlab was able to share the insights obtained with representatives of large companies from industries ranging from banking to telecommunications, as well as with research and educational institutions.

Next steps

The team will increase the data volume and complexity to assess the capabilities of the new Autonomous Database for drag-and-drop data loading and transformation, as well as for automatic insight discovery. This will be done with the goal of going one step further in terms of automating processes to improve operational efficiency.


Presentations

    M. Martin Marquez, Boosting Complex IoT Analysis with Oracle Autonomous Data Warehouse Cloud (June). Presented at Oracle Global Leaders Meeting – EMEA, Budapest, 2018.
    E. Grancher, M. Martin Marquez, S. Masson, Boosting Complex IoT Analysis with Oracle Autonomous Data Warehouse Cloud (23 October). Presented at Oracle OpenWorld 2018, San Francisco, 2018. cern.ch/go/RBZ6
    E. Grancher, M. Martin Marquez, S. Masson, Managing one of the largest IoT Systems in the world (December). Presented at Oracle Global Leaders Meeting – EMEA, Sevilla, 2018.
    E. Grancher, M. Martin, S. Masson, Research Analytics at Scale: CERN’s Experience with Oracle Cloud Solutions (16 January). Presented at Oracle OpenWorld 2019, London, 2019. cern.ch/go/S6qf
    A. Mendelsohn (Oracle), E. Grancher, M. Martin, Oracle Autonomous Database Keynote (16 January). Presented at Oracle OpenWorld 2019, London, 2019.
    M. Martin, J. Abel (Oracle), Enterprise Challenges and Outcomes (17 January). Presented at Oracle OpenWorld 2019, London, 2019.
    D. Ebert (Oracle), M. Martin, A. Nappi, Advancing research with Oracle Cloud (17 September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/9ZCg
    M. Martin, R. Zimmermann (Oracle), J. Otto (IDS GmbH), Oracle Autonomous Data Warehouse: Customer Panel (17 September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/nm9B
    S. Masson, M. Martin, Managing one of the largest IoT systems in the world with Oracle Autonomous Technologies (18 September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/SBc9
    S. Masson, M. Martin, Oracle Autonomous Data Warehouse and CERN Accelerator Control Systems (25 November). Presented at Modern Cloud Day, Paris, 2019.
    M. Martin, M. Connaughton (Oracle), Big Data Analytics and the Large Hadron Collider (26 November). Presented at Oracle Digital Days 2019, Dublin, 2019.
    M. Martin, Big Data, AI and Machine Learning at CERN (27 November). Presented at Trinity College Dublin and ADAPT Center, Dublin, 2019.
    M. Martin, M. Connaughton (Oracle), Big Data Analytics and the Large Hadron Collider (27 November). Presented at the National Analytics Summit 2019, Dublin, 2019. cern.ch/go/CF9p
    S. Masson, La gestion des données en gros volume au quotidien [Managing large volumes of data on a daily basis] (19 May). Presented at Oracle Technology Day 2020: Data, innovation et continuité d’activité, 2020. cern.ch/go/8xkP
    M. Martin Marquez, CERN Industrial IoT data with Oracle Autonomous Data Warehouse (2 June). Presented at the Oracle Global Leaders Summer Meeting, Miami, 2020. cern.ch/go/CH9d
    M. Martin Marquez, Managing 1 PB of Object Storage in the Oracle Cloud (9 July). Presented at Oracle Cloud Platform Virtual Summit: The Modern Data Warehouse North America, 2020. cern.ch/go/QK9Nj
    M. Martin Marquez, Managing 1 PB of Object Storage in the Oracle Cloud (22 July). Presented at Oracle Cloud Platform Virtual Summit: The Modern Data Warehouse, 2020. cern.ch/go/d7Vj
    M. Martin Marquez, Managing 1 PB of Object Storage in the Oracle Cloud (6 August). Presented at Oracle Cloud Platform Virtual Summit: The Modern Data Warehouse EMEA, 2020. cern.ch/go/r7cX
    M. Martin Marquez, Managing 1 PB of Object Storage in the Oracle Cloud (15 August). Presented at Oracle Cloud Platform Virtual Summit: The Modern Data Warehouse JAPAC, 2020. cern.ch/go/t6JQ

Data analytics in the cloud

Project goal

This project is evaluating solutions that combine data-engineering, machine-learning, and deep-learning tools. These are being run on cloud resources from Oracle Cloud Infrastructure (OCI) and address a number of use cases of interest to CERN’s community. This activity enables us to compare the performance, maturity, and stability of solutions deployed on CERN’s own infrastructure with those deployed on OCI.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Eric Grancher and Eva Dafonte Perez
Team members
Luca Canali, Riccardo Castellotti
Collaborator liaison(s)
Vincent Leocorbo, Cristobal Pedregal-Martin, David Ebert, Dmitrij Dolgušin

Collaborators

Project background

Big-data tools — particularly related to data engineering and machine learning — are evolving rapidly. As these tools reach maturity and are adopted more broadly, new opportunities are arising for extracting value out of large data sets.

Recent years have seen growing interest from the physics community in machine learning and deep learning. One important activity in this area has been the development of pipelines for real-time classification of particle-collision events recorded by the detectors of the LHC experiments. Filtering events using so-called “trigger” systems is set to become increasingly complex as upgrades to the LHC increase the rate of particle collisions.

Recent progress

SWAN is a platform for performing interactive data analysis in the cloud. It was developed at CERN and integrates software, compute, and storage resources used by CERN physicists and data scientists. In 2020, we deployed a proof-of-concept version of SWAN on OCI resources.

As part of this work, we developed a custom Kubernetes deployment on OCI, in order to take advantage of GPU resources. This proved that it is possible to run interactive analytics workflows and machine-learning workloads in OCI while accessing data sets from CERN’s storage systems.

We also performed a distributed machine-learning training exercise, training a recurrent neural network on over 250 GB of data. For this, we used 500 CPU cores and 10 GPUs, orchestrated with Kubernetes on OCI. This showed us that public clouds are particularly convenient for use cases that need a large number of resources for a short amount of time.
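The distributed training run follows the standard data-parallel pattern: each worker computes a gradient on its own shard of the data, the gradients are averaged across workers, and every replica applies the same update. The toy sketch below illustrates that pattern with a one-parameter least-squares fit; it is not the actual recurrent network or infrastructure used on OCI.

```python
# Minimal sketch of data-parallel training: per-worker gradients are
# averaged (an "all-reduce") before a single, identical parameter update.

def gradient(w, shard):
    """d/dw of the mean squared error of y = w * x over one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.01):
    grads = [gradient(w, s) for s in shards]   # in parallel, one per worker
    avg = sum(grads) / len(grads)              # all-reduce: average gradients
    return w - lr * avg                        # same update on every replica

# Toy data generated from y = 3x, split round-robin into four "worker" shards.
data = [(x, 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # -> 3.0, the true slope
```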

Next steps

The focus of the project in 2021 will include work to integrate CERN’s analytics platform with OCI, enabling users to run their workloads on remote cloud resources using a common interface.

Over the longer term, we are planning to add new features to the analytics platform, with a focus on improving the full lifecycle of the development of machine-learning use cases.

Publications

    M. Bień, Big Data Analysis and Machine Learning at Scale with Oracle Cloud Infrastructure. Zenodo, 2019. cern.ch/go/lhH9
    M. Migliorini, R. Castellotti, L. Canali, M. Zanetti, Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics. arXiv e-prints, p. arXiv:1909.10389 [cs.DC], 2019. cern.ch/go/8CpQ
    T. Nguyen et al., Topology classification with deep learning to improve real-time event selection at the LHC, 2018. cern.ch/go/8trZ
    M. Migliorini, R. Castellotti, L. Canali, M. Zanetti, Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics. Published in Computing and Software for Big Science 4, 2020. cern.ch/go/Z98M
    R. Castellotti, L. Canali, Distributed Deep Learning for Physics with TensorFlow and Kubernetes. Databases at CERN blog, 2020. cern.ch/go/8Tpl

Presentations

    L. Canali, “Big Data In HEP” - Physics Data Analysis, Machine learning and Data Reduction at Scale with Apache Spark (24 September). Presented at IXPUG 2019 Annual Conference, Geneva, 2019. cern.ch/go/6pr6
    L. Canali, Deep Learning Pipelines for High Energy Physics using Apache Spark with Distributed Keras on Analytics Zoo (16 October). Presented at Spark Summit Europe, Amsterdam, 2019. cern.ch/go/xp77
    R. Castellotti, L. Canali, P. Kothuri, SWAN: Powering CERN’s Data Analytics and Machine Learning Use cases (22 October). Presented at 4th Inter-experiment Machine Learning Workshop, CERN, 2020. cern.ch/go/9XPw

Evaluation of Power CPU architecture for deep learning

Project goal

We are investigating the performance of distributed training and inference of different deep-learning models on a cluster consisting of IBM Power8 CPUs (with NVIDIA V100 GPUs) installed at CERN. A series of deep neural networks is being developed to reproduce the initial steps in the data-processing chain of the DUNE experiment. More specifically, a combination of convolutional neural networks and graph neural networks is being designed to reduce noise and to select specific portions of the data to focus on during the reconstruction step (region selector).

R&D topic
Machine learning and data analytics
Project coordinator(s)
Maria Girone and Sofia Vallecorsa
Team members
Marco Rossi
Collaborator liaison(s)
Eric Aquaronne, Oliver Bethmann

Collaborators

Project background

Neutrinos are elusive particles: they have a very low probability of interacting with other matter. In order to maximise the likelihood of detection, neutrino detectors are built as large, sensitive volumes. Such detectors produce very large data sets. Although large in size, these data sets are usually very sparse, meaning dedicated techniques are needed to process them efficiently. Deep-learning methods are being investigated by the community with great success.

Recent progress

We have developed a deep neural network architecture based on a combination of two-dimensional convolutional layers and graph layers. These networks can analyse both real and simulated data from ProtoDUNE and perform the region-selection and de-noising tasks, which are usually applied to the raw detector data before any other processing is run.

Both of these methods improve on the classical approaches currently integrated in the experiment software stack. In order to reduce training time and set up hyper-parameter scans, the training process for the networks is parallelised and has been benchmarked on the IBM Minsky cluster.

In accordance with the concept of data-parallel distributed learning, we trained our models on a total of twelve GPUs, distributed over the three nodes that comprise the test Power cluster. Each GPU ingests a unique part of the physics dataset for training the model.
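The sharding step described above can be sketched as follows: the dataset is divided into twelve disjoint slices, one per GPU across the three nodes, so that every sample is seen by exactly one worker per epoch. The round-robin scheme and the toy event IDs below are illustrative, not the actual DUNE data pipeline.

```python
# Sketch of unique dataset sharding for data-parallel training on
# twelve GPUs (four per node, three nodes).

NUM_NODES, GPUS_PER_NODE = 3, 4
WORLD_SIZE = NUM_NODES * GPUS_PER_NODE  # 12 workers in total

def shard(events, rank, world_size=WORLD_SIZE):
    """Round-robin slice of the dataset assigned to one worker."""
    return events[rank::world_size]

events = list(range(120))  # 120 toy training events
shards = [shard(events, r) for r in range(WORLD_SIZE)]

# Every event lands in exactly one shard: nothing lost, no overlap.
assert sum(len(s) for s in shards) == len(events)
assert len({e for s in shards for e in s}) == len(events)
```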

Next steps

We will work to further optimise our region-selection and noise-reduction models for the DUNE data. We will test their performance on real data collected from ProtoDUNE, the prototype experiment built at CERN.

Today, high-resolution images (millions of pixels) representing DUNE data are split into a series of small crops (32×32 pixels). A new U-Net architecture is being investigated in order to overcome this limitation and process entire images in a single step, thus accelerating the whole data-processing chain.
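The current cropping step can be sketched as a simple tiling of the image into non-overlapping 32×32 patches, each fed to the network independently; this is the per-crop processing a full-image U-Net approach would remove. The image dimensions below are hypothetical.

```python
# Sketch of tiling an H×W image into non-overlapping 32×32 crops,
# the preprocessing step currently applied before inference.

CROP = 32

def crop_origins(height, width, crop=CROP):
    """Top-left corners of the non-overlapping crops covering the image."""
    return [(r, c)
            for r in range(0, height - crop + 1, crop)
            for c in range(0, width - crop + 1, crop)]

# A hypothetical 96×64 readout plane tiles into a 3×2 grid of crops.
origins = crop_origins(96, 64)
print(len(origins))  # -> 6
```

A real DUNE image with millions of pixels would yield thousands of such crops, each incurring its own network invocation, which is exactly the overhead that processing the full image in one step avoids.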

Our plan is to then extend this approach to perform several other steps in the data-processing chain. Our ultimate, long-term goal is to develop a tool capable of processing the raw data from the DUNE experiment, thus making it possible to replace the entire offline reconstruction approach.


Presentations

    A. Hesam, Evaluating IBM POWER Architecture for Deep Learning in High-Energy Physics (23 January). Presented at CERN openlab Technical Workshop, Geneva, 2018. cern.ch/go/7BsK
    D. H. Cámpora Pérez, ML based RICH reconstruction (8 May). Presented at Computing Challenges meeting, Geneva, 2018. cern.ch/go/xwr7
    D. H. Cámpora Pérez, Millions of circles per second. RICH at LHCb at CERN (7 June). Presented as a seminar at the University of Seville, Seville, 2018.