Members of CERN’s research community devote significant effort to understanding how to get the most value out of the data produced by the LHC experiments. They seek to maximise the potential for discovery and employ new techniques to help ensure that nothing is missed. At the same time, it is important to optimise resource usage (tape, disk, and CPU), both in the online and offline environments. Modern machine-learning technologies — in particular, deep-learning solutions — offer a promising research path to achieving these goals. Deep-learning techniques offer the LHC experiments the potential to improve performance in each of the following areas: particle detection, identification of interesting events, modelling detector response in simulations, monitoring experimental apparatus during data taking, and managing computing resources.

 

Oracle cloud technologies for data analytics on industrial control systems

Project goal

CERN’s control systems acquire more than 250 TB of data per day from over two million signals from across the LHC and its experiments. Managing these extremely complex “Industrial Internet of Things” (IIoT) systems raises important challenges in terms of data management, retrieval, and analytics.

The project team is working with Oracle Autonomous Database and analytics technologies. The goal is to assess their capabilities in terms of integrating heterogeneous control IIoT data sets and improving performance and efficiency for the most challenging analytics requirements, while reducing operational costs.

 

R&D topic
Machine learning and data analytics
Project coordinator(s)
Manuel Martin Marquez
Team members
Manuel Martin Marquez, Sébastien Masson, Franck Pachot, Ludovico Caldara
Collaborator liaison(s)
Çetin Özbütün, Reiner Zimmermann, Michael Connaughton, Cristobal Pedregal-Martin, Engin Senel, Cemil Alper, Giuseppe Calabrese, David Ebert, Dmitrij Dolgušin


Project background

Keeping the LHC and the rest of the accelerator complex at CERN running efficiently requires state-of-the-art control systems. A complex IIoT system is in place to persist the data these systems produce, making it possible for engineers to gain insights about temperatures, magnetic-field strengths, and more. This plays a vital role in ensuring the highest levels of operational efficiency.

The current system to persist, access, and analyse this data is based on Oracle Database. Today, significant effort is dedicated to improving performance and coping with increasing demand, both in terms of data volume and in terms of the analysis and exploration of ever-larger data sets.

Recent progress

During 2019, the team focused on three main aspects: (i) scaling data volumes, (ii) improving the efficiency of the candidate solutions in terms of automation and reduced operational costs, and (iii) increasing the complexity of data retrieval and analytics using real-life scenarios.

We began migrating one of the largest and most complex control data sets to the object-storage system of Oracle Cloud Infrastructure (OCI). Due to the large volume of this data set (about 1 PB), different solutions — based on standard networks, the GÉANT network, and Oracle’s appliance-based data-transfer solution — were tested. At the same time, the team worked together with development and management teams at Oracle to define the best data-model strategy to reduce associated costs and improve efficiency. To achieve this, a hybrid model was put in place. This model exploits the benefits of object storage as follows: transparent external tables, based on Parquet files, are used for data that is infrequently accessed, whereas normal database tables are used for data that requires close to real-time responses. To assess this, once a representative amount of data was available, real-life data loads were captured and simulated on OCI’s Autonomous Database.
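The listing below is a minimal, purely illustrative sketch of this hybrid model, using the python-oracledb driver and the DBMS_CLOUD package of Oracle Autonomous Database to expose archived Parquet files in object storage as an external table that can be queried together with a regular table. All table, credential, and bucket names are invented, and the exact format options should be checked against the Oracle documentation; this is not the project’s actual code.

```python
# Illustrative sketch only: hybrid "hot/cold" data model on Oracle Autonomous
# Database. Cold control data lives as Parquet files in OCI Object Storage and
# is exposed through an external table; hot data stays in a normal table.
# Table, credential and bucket names are invented.
import oracledb  # python-oracledb driver

conn = oracledb.connect(user="CONTROLS", password="...", dsn="adb_high")
cur = conn.cursor()

# External table over archived signal data (Parquet files in object storage).
cur.execute("""
    BEGIN
      DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
        table_name      => 'SIGNAL_HISTORY_EXT',
        credential_name => 'OCI_CRED',
        file_uri_list   => 'https://objectstorage.../signal_history/*.parquet',
        format          => '{"type": "parquet", "schema": "first"}'
      );
    END;""")

# Recent data stays in an ordinary table; a single query can span both tiers.
cur.execute("""
    SELECT signal_id, AVG(value) AS avg_value
    FROM (
      SELECT signal_id, value FROM signal_recent
      UNION ALL
      SELECT signal_id, value FROM signal_history_ext
    )
    GROUP BY signal_id""")
for signal_id, avg_value in cur:
    print(signal_id, avg_value)

conn.close()
```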

Next steps

In 2020, we will focus on finalising the migration of the historical data to OCI Object Storage, making it available to Oracle Autonomous Database instances. This will require us to address challenges related to the task flow for Oracle’s appliance-based data-transfer solution and the network configuration for GÉANT. In addition, we will work on automating data ingestion. In parallel, we will steadily increase the real analytics load and assess solutions for interactive data exploration based on Oracle Autonomous Database technologies.


Presentations

    E. Grancher, M. Martin, S. Masson, Research Analytics at Scale: CERN’s Experience with Oracle Cloud Solutions (16 January). Presented at Oracle OpenWorld 2019, London, 2019.
    A. Mendelsohn (Oracle), E. Grancher, M. Martin, Oracle Autonomous Database Keynote (16 January). Presented at Oracle OpenWorld 2019, London, 2019.
    M. Martin, J. Abel (Oracle), Enterprise Challenges and Outcomes (17 January). Presented at Oracle OpenWorld 2019, London, 2019.
    S. Masson, M. Martin, Managing one of the largest IoT systems in the world with Oracle Autonomous Technologies (18 September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/SBc9
    D. Ebert (Oracle), M. Martin, A. Nappi, Advancing research with Oracle Cloud (17 September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/9ZCg
    M. Martin, R. Zimmermann (Oracle), J. Otto (IDS GmbH), Oracle Autonomous Data Warehouse: Customer Panel (17 September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/nm9B
    S. Masson, M. Martin, Oracle Autonomous Data Warehouse and CERN Accelerator Control Systems (25 November). Presented at Modern Cloud Day, Paris, 2019.
    M. Martin, M. Connaughton (Oracle), Big Data Analytics and the Large Hadron Collider (26 November). Presented at Oracle Digital Days 2019, Dublin, 2019.
    M. Martin, Big Data, AI and Machine Learning at CERN (27 November). Presented at Trinity College Dublin and ADAPT Center, Dublin, 2019.
    M. Martin, M. Connaughton (Oracle), Big Data Analytics and the Large Hadron Collider (27 November). Presented at the National Analytics Summit 2019, Dublin, 2019. cern.ch/go/CF9p
    M. Martin Marquez, Boosting Complex IoT Analysis with Oracle Autonomous Data Warehouse Cloud (June). Presented at Oracle Global Leaders Meeting – EMEA, Budapest, 2018.
    E. Grancher, M. Martin Marquez, S. Masson, Boosting Complex IoT Analysis with Oracle Autonomous Data Warehouse Cloud (23 October). Presented at Oracle OpenWorld 2018, San Francisco, 2018. cern.ch/go/RBZ6
    E. Grancher, M. Martin Marquez, S. Masson, Managing one of the largest IoT Systems in the world (December). Presented at Oracle Global Leaders Meeting – EMEA, Sevilla, 2018.

Data analytics in the cloud

Project goal

This project is testing and prototyping solutions that combine data engineering with machine-learning and deep-learning tools. These solutions are being run using cloud resources — in particular, resources and tools from Oracle Cloud Infrastructure (OCI) — and address a number of use cases of interest to CERN’s community. Notably, this activity will make it possible to compare the performance, maturity, and stability of solutions deployed on CERN’s infrastructure with those deployed on OCI.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Eva Dafonte Perez, Eric Grancher
Team members
Luca Canali, Riccardo Castellotti
Collaborator liaison(s)
Barry Gleeson, Vincent Leocorbo, Don Mowbray, Cristobal Pedregal-Martin, David Ebert, Dmitrij Dolgušin


Project background

Big-data tools — particularly related to data engineering and machine learning — are evolving rapidly. As these tools reach maturity and are adopted more broadly, new opportunities are arising for extracting value out of large data sets.

Recent years have seen growing interest from the physics community in machine learning and deep learning. One important activity in this area has been the development of pipelines for real-time classification of particle-collision events recorded by the detectors of the LHC experiments. Filtering events using so-called “trigger” systems is set to become increasingly complex as upgrades to the LHC increase the rate of particle collisions.

Recent progress

In 2019, we tested and deployed data-analytics and machine-learning workloads of interest to CERN on OCI. Testing began with the deployment of Apache Spark on Kubernetes, using OCI resources.

During this initial phase, we were able to successfully deploy two workloads for processing physics data at scale:

•    Reduction of big data from the CMS experiment: This use case consists of running data-reduction workloads for data from particle collisions. Its goal is to demonstrate the scalability of a data-reduction workflow based on processing ROOT files using Apache Spark (a minimal sketch of such a workflow is shown after this list).

•    Spark deep-learning trigger: This use case entails the deployment of a full data-preparation and machine-learning pipeline (with 4.5 TB of ROOT data) using Apache Spark and TensorFlow.
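The following is a minimal sketch of the kind of data-reduction step used in the first workload. It assumes the collision data has already been converted from ROOT to Parquet and that the OCI object-storage connector is configured; the bucket paths, column names, and selection cut are invented for illustration.

```python
# Minimal sketch of a Spark data-reduction (skimming) step. Paths, column
# names and the selection cut are invented; the data is assumed to have been
# converted from ROOT to Parquet beforehand.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cms-data-reduction").getOrCreate()

events = spark.read.parquet("oci://bucket@namespace/cms/events/")

# Keep only the columns needed downstream and apply a simple event selection.
reduced = (events
           .select("run", "lumi", "event", "muon_pt", "muon_eta")
           .filter(F.col("muon_pt") > 20.0))

reduced.write.mode("overwrite").parquet("oci://bucket@namespace/cms/reduced/")
spark.stop()
```

Scaling such a job then amounts to adding Spark executors on the Kubernetes cluster, with no change to the application code.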

This activity has led to a number of improvements. In particular, we were able to improve the open-source connector between OCI and the Hadoop Distributed File System: we made it compatible with recent versions of Spark and we developed a mechanism to distribute workloads.

Next steps

In 2020, the focus of the project will also include work to improve user interfaces and ease of adoption. We will develop a proof-of-concept integration of CERN’s analytics platform (SWAN) with OCI resources.

Publications

    M. Bień, Big Data Analysis and Machine Learning at Scale with Oracle Cloud Infrastructure. Zenodo (2019). cern.ch/go/lhH9
    M. Migliorini, R. Castellotti, L. Canali, M. Zanetti, Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics. arXiv:1909.10389 [cs.DC], 2019. cern.ch/go/8CpQ
    T. Nguyen et al., Topology classification with deep learning to improve real-time event selection at the LHC, 2018. cern.ch/go/8trZ

Presentations

    L. Canali, “Big Data In HEP” - Physics Data Analysis, Machine learning and Data Reduction at Scale with Apache Spark (24 September). Presented at IXPUG 2019 Annual Conference, Geneva, 2019. cern.ch/go/6pr6
    L. Canali, Deep Learning Pipelines for High Energy Physics using Apache Spark with Distributed Keras on Analytics Zoo (16 October). Presented at Spark Summit Europe, Amsterdam, 2019. cern.ch/go/xp77

Fast detector simulation

Project goal

We are using artificial intelligence (AI) techniques to simulate the response of the HEP detectors to particle collision events. Specifically, we are developing deep neural networks and, in particular, generative adversarial networks (GANs) to do this. Such tools will play a significant role in helping the research community cope with the vastly increased computing demands of the High Luminosity LHC (HL-LHC).

Once properly trained and optimised, generative models are able to simulate a variety of particles, energies, and detectors in just a fraction of the time required by classical simulation, which is based on detailed Monte Carlo methods. Our objective is to tune and integrate these new tools in the experiments’ existing simulation frameworks.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Federico Carminati
Team members
Sofia Vallecorsa, Gulrukh Khattak
Collaborator liaison(s)
Claudio Bellini, Marie-Christine Sawley, Andrea Luiselli, Vikram Saletore, Hans Pabst, Sun Choi, Fabio Baruffa from Intel. Valeriu Codreanu, Maxwell Cai, Damian Podareanu from SURFsara B.V., which is also collaborating on the project.


Project background

Simulating the response of detectors to particle collisions — under a variety of conditions — is an important step on the path to new physics discoveries. However, this work is very computationally expensive. Over half of the computing workload of the Worldwide LHC Computing Grid (WLCG) is the result of this single activity.

We are exploring an alternative approach, referred to as ‘fast simulation’, which trades some level of accuracy for speed. Fast-simulation strategies have been developed in the past, using different techniques (e.g. look-up tables or parametrised approaches). However, the latest developments in machine learning (particularly in relation to deep neural networks) make it possible to develop fast-simulation tools that are both more flexible and more accurate than their predecessors.

Recent progress

Building on our work from 2018, we focused on optimising a more complex model that can simulate the effects of several particle types to within 5-10 % over a large energy range and for realistic kinematic conditions. The model is remarkably accurate: GANs can reproduce Monte Carlo predictions to within just a few percent.
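As an illustration of the type of model involved, the sketch below shows a generator network built from three-dimensional transposed convolutions, conditioned on the particle’s primary energy and producing a grid of calorimeter energy deposits. The layer sizes and grid dimensions are invented; this is not the project’s actual architecture.

```python
# Illustrative generator for a calorimeter GAN: maps a latent vector plus the
# particle's primary energy to a 3D grid of energy deposits. Layer sizes and
# the 16x16x16 output grid are invented, not the project's actual model.
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 200

def build_generator():
    noise = layers.Input(shape=(LATENT_DIM,))
    energy = layers.Input(shape=(1,))           # condition on primary energy
    x = layers.Concatenate()([noise, energy])
    x = layers.Dense(4 * 4 * 4 * 64, activation="relu")(x)
    x = layers.Reshape((4, 4, 4, 64))(x)
    x = layers.Conv3DTranspose(32, 4, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv3DTranspose(16, 4, strides=2, padding="same", activation="relu")(x)
    # One (non-negative) energy value per calorimeter cell.
    cells = layers.Conv3DTranspose(1, 4, strides=1, padding="same", activation="relu")(x)
    return tf.keras.Model([noise, energy], cells)

generator = build_generator()
generator.summary()   # outputs a 16x16x16x1 grid per generated shower
```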

Training time is, however, still a bottleneck for the meta-optimisation of the model. This includes not only the optimisation of the network weights, but also of the architecture and convergence parameters. Much of our work in 2019 concentrated on addressing this issue.

We followed up on the work, started in 2018, to develop distributed versions of our training code, both on GPUs and CPUs. We tested their performance and scalability in different environments, such as high-performance computing (HPC) clusters and clouds. The results are encouraging: we observed almost linear speed-up as the number of processors increased, with very limited degradation in the results.

We also began work to implement a genetic algorithm for optimisation. This simultaneously performs training and hyper-parameter optimisation of our network, making it easier to generalise our GAN to different detector geometries.
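A minimal sketch of such a genetic search is given below. The hyper-parameters, their ranges, and the fitness function (a placeholder here) are invented for illustration and do not correspond to the project’s actual implementation, where fitness would be derived from the physics accuracy of the trained GAN.

```python
# Minimal genetic-algorithm sketch for hyper-parameter search. The parameters,
# ranges and fitness function are invented placeholders; in the real setting
# fitness would come from training the GAN and scoring its physics accuracy.
import random

SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-2),
    "batch_size": (16, 256),
    "latent_dim": (50, 400),
}

def random_individual():
    return {name: random.uniform(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}

def fitness(params):
    # Placeholder score: in practice, train briefly with `params` and return
    # a physics-accuracy metric (higher is better).
    return -abs(params["learning_rate"] - 1e-3)

def crossover(a, b):
    return {name: random.choice([a[name], b[name]]) for name in SEARCH_SPACE}

def mutate(individual, rate=0.2):
    for name, (lo, hi) in SEARCH_SPACE.items():
        if random.random() < rate:
            individual[name] = random.uniform(lo, hi)
    return individual

population = [random_individual() for _ in range(20)]
for generation in range(10):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                       # keep the fittest individuals
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

print("Best hyper-parameters found:", max(population, key=fitness))
```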

Next steps

We will continue to investigate HPC training and will work on optimising physics accuracy in the distributed training mode. We will also complete the development of the genetic-algorithm approach for hyper-parameter optimisation.

More broadly, though, we now believe our model is mature enough to start planning its test integration with the classical approaches currently used by the LHC experiments. In addition, we will extend the tool to cover other detectors not currently simulated.

Publications

    F. Carminati et al., A Deep Learning tool for fast detector simulation. Poster presented at the 18th International Supercomputing Conference 2018, Frankfurt, 2018. First prize awarded for best research poster. cern.ch/go/D9sn
    G. Khattak, Training Generative Adversarial Models over Distributed Computing System (2018), revised selected papers. cern.ch/go/8Ssz
    D. Anderson, F. Carminati, G. Khattak, V. Loncar, T. Nguyen, F. Pantaleo, M. Pierini, S. Vallecorsa, J-R. Vlimant, A. Zlokapa, Large scale distributed training applied to Generative Adversarial Networks for calorimeter Simulation. Presented at the 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018). Proceedings in publication.
    F. Carminati, G. Khattak, S. Vallecorsa, 3D convolutional GAN for fast simulation. Presented at the 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018). Proceedings in publication.
    F. Carminati, S. Vallecorsa, G. Khattak, V. Codreanu, D. Podareanu, H. Pabst, V. Saletore, Distributed Training of Generative Adversarial Networks for Fast Detector Simulation. ISC 2018 Workshops, LNCS 11203, pp. 487–503, 2018. cern.ch/go/wLP6
    G. Khattak, S. Vallecorsa, F. Carminati, Three Dimensional Energy Parametrized Generative Adversarial Networks for Electromagnetic Shower Simulation. 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, pp. 3913-3917, 2018. cern.ch/go/7PHp
    G. Khattak, S. Vallecorsa, F. Carminati, D. Moise, Data-Parallel Training of Generative Adversarial Networks on HPC Systems for HEP Simulations. 2018 IEEE 25th International Conference on High Performance Computing (HiPC), Bengaluru, pp. 162-171, 2018. cern.ch/go/kTX9
    F. Carminati et al., Calorimetry with Deep Learning: Particle Classification, Energy Regression, and Simulation for High-Energy Physics, NIPS 2017. cern.ch/go/7vc8
    F. Carminati et al., Three dimensional Generative Adversarial Networks for fast simulation, ACAT 2017. cern.ch/go/BN6r

Presentations

    D. Brayford, S. Vallecorsa, A. Atanasov, F. Baruffa, W. Riviera, Deploying AI Frameworks on Secure HPC Systems with Containers. Presented at 2019 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, 2019, pp. 1-6.
    G. R. Khattak, S. Vallecorsa, F. Carminati, G. M. Khan, Particle Detector Simulation using Generative Adversarial Networks with Domain Related Constraints. Presented at 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, 2019, pp. 28-33.
    F. Carminati, S. Vallecorsa, G. Khattak, 3D convolutional GAN for fast simulation (5 March). Presented at IXPUG Spring Conference, Bologna, 2018. cern.ch/go/9TqS
    F. Carminati, G. Khattak, S. Vallecorsa, Three-dimensional energy parametrized adversarial networks for electromagnetic shower simulation (7 October). Presented at 2018 IEEE International Conference on Image Processing, Athens, 2018. cern.ch/go/lVr8
    F. Carminati, V. Codreanu, G. Khattak, H. Pabst, D. Podareanu, V. Saletore, S. Vallecorsa, Fast Simulation with Generative Adversarial Networks (12 November). Presented at The International Conference for High Performance Computing, Networking, Storage, and Analysis, Dallas, 2018. cern.ch/go/Z6Wg
    F. Carminati, G. Khattak, D. Moise, S. Vallecorsa, Data-parallel Training of Generative Adversarial Networks on HPC Systems for HEP Simulations (18 December). Presented at 25th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC, Bengaluru, 2018.
    S. Vallecorsa, Machine Learning for Fast Simulation (24 June). Presented at ISC High Performance, Frankfurt, 2017. cern.ch/go/k6sV
    E. Orlova, Deep learning for fast simulation: development for distributed computing systems (15 August). Presented at CERN openlab summer students’ lightning talks, Geneva, 2017. cern.ch/go/NW9k
    A. Gheata, GeantV (Intel Code Modernisation) (21 September). Presented at CERN openlab Open Day, Geneva, 2017. cern.ch/go/gBS6
    S. Vallecorsa, GANs for simulation (May 2017). Presented at the DS@HEP workshop, Fermilab, 2017. cern.ch/go/m9Bl
    S. Vallecorsa, GeantV – Adapting simulation to modern hardware (June 2017). Presented at the PASC 2017 conference, Lugano, 2017. cern.ch/go/cPF8
    S. Vallecorsa, Machine Learning-based fast simulation for GeantV (June 2017). Presented at the LPCC workshop, CERN, 2017. cern.ch/go/QqD7
    S. Vallecorsa, Generative models for fast simulation (August 2017). Plenary talk presented at the ACAT conference, Seattle, 2017. cern.ch/go/gl7l
    S. Vallecorsa, Three dimensional Generative Adversarial Networks for fast simulation, ACAT 2017. cern.ch/go/jz6C
    S. Vallecorsa et al., Tutorial on “3D convolutional GAN implementation in Neon”, Intel HPC Developers Conference 2017. cern.ch/go/ZtZ7

Exploring accelerated machine learning for experiment data analytics

Project goal

The project has two threads, each investigating a unique use case for the Micron Deep Learning Accelerator (a modular FPGA-based architecture). The first thread relates to the development of a prototype real-time streaming machine-learning inference engine for the level-1 trigger of the CMS experiment.

The second thread focuses on prototyping a particle-identification system based on deep learning for the DUNE experiment. DUNE is a leading-edge, international experiment for neutrino science and proton-decay studies. It will be built in the US and is scheduled to begin operation in the mid-2020s.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Emilio Meschi, Paola Sala, Maria Girone
Team members
Thomas Owen James, Dejan Golubovic, Maurizio Pierini, Manuel Jesus Rodriguez, Anwesha Bhattacharya, Saul Alonso-Monsalve, Debdeep Paul, Niklas Böhm, Ema Puljak
Collaborator liaison(s)
Mark Hur, Stuart Grime, Michael Glapa, Eugenio Culurciello, Andre Chang, Marko Vitez, Dustin Werran, Aliasger Zaidy, Abhishek Chaurasia, Patrick Estep, Jason Adlard, Steve Pawlowski


Project background

The level-1 trigger of the CMS experiment selects relevant particle-collision events for further study, while rejecting 99.75% of collisions. This decision must be made with a fixed latency of a few microseconds. Machine-learning inference in FPGAs may be used to improve the capabilities of this system.

The DUNE experiment will consist of large arrays of sensors exposed to high-intensity neutrino beams. The use of convolutional neural networks has been shown to substantially boost particle-identification performance for such detectors. For DUNE, an FPGA solution is advantageous for processing ~ 5 TB/s of data.
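The sketch below illustrates the kind of convolutional classifier involved, taking a single detector ‘view’ as an image and predicting an interaction class. The input size, network depth, and class labels are invented; it is not the network actually deployed on the Micron hardware.

```python
# Illustrative CNN for neutrino-interaction classification from one detector
# view treated as an image. Input size, depth and the three example classes
# are invented; this is not the network deployed on the Micron accelerator.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(128, 128, 1)),             # one detector view
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),         # e.g. nu_e CC / nu_mu CC / NC
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```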

Recent progress

For the CMS experiment, we studied in detail two potential use cases for a machine-learning approach using FPGAs. Data from Run 2 of the LHC was used to train a neural network. The goal of this is to improve the analysis potential of muon tracks from the level-1 trigger, as part of a 40 MHz ‘level-1 scouting’ data path. In addition, a convolutional neural network was developed for classifying and measuring energy showers for the planned high-granularity calorimeter upgrade of the CMS experiment. These networks were tested on the Micron FPGA hardware and were optimised for latency and precision.

For the DUNE part of the project, we tested the Micron inference engine and characterised its performance on existing software. Specifically, we tested it for running a neural network that can identify neutrino interactions in the DUNE detectors, based on simulated data. This enabled us to gain expertise with the board and fully understand its potential. The results of this benchmarking were presented at the 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019).

Next steps

The CMS team will focus on preparing a full scouting system for Run 3 of the LHC. This will comprise a system of around five Micron co-processors, receiving data on high-speed optical links.

The DUNE team plans to set up the inference engine as a demonstrator within the data-acquisition system of the ProtoDUNE experiment (a prototype of DUNE that has been built at CERN). This will work to find regions of interest (i.e. high activity) within the detector, decreasing the amount of data that needs to be sent to permanent storage.


Presentations

    M. J. R. Alonso, Fast inference using FPGAs for DUNE data reconstruction (7 November). Presented at 24th International Conference on Computing in High Energy and Nuclear Physics, Adelaide, 2019. cern.ch/go/bl7n
    M. J. R. Alonso, Prototyping of a DL-based Particle Identification System for the DUNE Neutrino Detector (22 January). Presented at CERN openlab Technical Workshop, Geneva, 2020. cern.ch/go/zH8W
    T. O. James, FPGA-based Machine Learning Inference for CMS with the Micron Deep Learning Accelerator (22 January). Presented at CERN openlab Technical Workshop, Geneva, 2020. cern.ch/go/pM7P

NextGeneration Archiver for WinCC OA

Project goal

Our aim is to make control systems used for the LHC more efficient and smarter. We are working to enhance the functionality of WinCC OA (a SCADA tool used widely at CERN) and to apply data-analytics techniques to the recorded monitoring data, in order to detect anomalies and systematic issues that may impact upon system operation and maintenance.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Fernando Varela
Team members
Filippo Tilaro, Jakub Guzik, Anthony Hennessey, Rafal Kulaga, Piotr Golonka, Peter Sollander, Fernando Varela, Marc Bengulescu, Filip Siroky
Collaborator liaison(s)
Thomas Hahn, Juergen Kazmeier, Alexey Fishkin, Tatiana Mangels, Mikhail Kalinkin, Elisabeth Bakany, Ewald Sperrer


Project background

The HL-LHC programme aims to increase the integrated luminosity — and hence the rate of particle collisions — by a factor of ten beyond the LHC’s design value. Monitoring and control systems will therefore become increasingly complex, with unprecedented data throughputs. Consequently, it is vital to further improve the performance of these systems, and to make use of data-analytics algorithms to detect anomalies and anticipate future behaviour. Achieving this involves a number of related lines of work. This project focuses on the development of a modular and future-proof archiving system (NextGen Archiver) that supports different SQL and NoSQL technologies to enable data analytics. It is important that this can be scaled up to meet our requirements beyond 2020.

Recent progress

Two important milestones for the NextGeneration Archiver (NGA) project were achieved in 2019: preparation of a release for all ETM customers with WinCC OA 3.17 and start of deployment at the ALICE experiment.

Significant progress has been made with all areas of the NGA project, including providing support for redundancy, for complex queries, and for handling signal metadata. In order to improve the performance and scalability of queries, and to make sure that they do not negatively affect core components of the system, direct query functionality was also developed and tested.

In order to ensure reliability of the NGA in large systems with high throughput, several tests were performed at CERN. Existing test automation tools have been significantly extended in order to allow for better synchronisation of testing efforts at CERN and ETM.

Initial results from InfluxDB performance tests performed at CERN show that the technology will most likely not be able to replace the current Oracle technology used for systems with very large numbers of signals (in the range of hundreds of thousands). However, it could successfully act as a shorter-term storage, improving the performance of certain queries and enabling users to easily create web dashboards using Grafana.
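As a purely illustrative example of this shorter-term storage role, the sketch below uses the InfluxDB 1.x Python client to write a control-signal sample and read back the last hour of values (e.g. to feed a Grafana panel). The measurement, tag, and field names are invented, and this is not the NGA’s actual backend interface.

```python
# Illustrative only: writing and querying control-signal samples in InfluxDB.
# Measurement, tag and field names are invented; this is not the interface
# used by the NextGeneration Archiver itself.
from datetime import datetime, timezone
from influxdb import InfluxDBClient   # InfluxDB 1.x Python client

client = InfluxDBClient(host="localhost", port=8086, database="scada")

# Write one sample for a signal.
client.write_points([{
    "measurement": "signal_values",
    "tags": {"signal": "cryo.temperature.sector45"},
    "time": datetime.now(timezone.utc).isoformat(),
    "fields": {"value": 1.9},
}])

# Read the last hour of samples for that signal (e.g. for a Grafana dashboard).
result = client.query(
    "SELECT value FROM signal_values "
    "WHERE signal = 'cryo.temperature.sector45' AND time > now() - 1h")
for point in result.get_points():
    print(point["time"], point["value"])
```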

Next steps

In 2020, work on the project will continue on many fronts. Increasing test coverage, especially for ‘corner cases’ and failure scenarios, remains one of the main priorities. Work on missing features will continue for all components of the NGA. Further tests of InfluxDB and Apache Kudu will help to determine their performance in large systems. The team will also provide support for ALICE as the experiment prepares to restart after the current long shutdown.

Publications

    P. Golonka, F. Varela-Rodriguez, Consolidation and Redesign of CERN Industrial Controls Frameworks. Proceedings of the 17th Biennial International Conference on Accelerator and Large Experimental Physics Control Systems, New York, 2019. cern.ch/go/8RRL

Presentations

    F. M. Tilaro, R. Kulaga, Siemens Data Analytics and SCADA evolution status report (23 January). Presented at CERN openlab Technical Workshop, Geneva, 2019. cern.ch/go/kt7K

Evaluation of Power CPU architecture for deep learning

Project goal

We are investigating the performance of distributed learning and low-latency inference of generative adversarial networks (GANs) for simulating detector response to particle-collision events. The performance of a deep neural network is being evaluated on a cluster consisting of IBM Power CPUs (with GPUs) installed at CERN.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Maria Girone and Federico Carminati
Team members
Sofia Vallecorsa
Collaborator liaison(s)
Eric Aquaronne, Lionel Clavien


Project background

GANs offer a promising way of greatly reducing the need for detailed Monte Carlo (MC) simulations when generating particle showers. Detailed MC is computationally expensive, so this could be a way to improve the overall performance of simulations in high-energy physics.

Using the large data sets obtained from MC-simulated physics events, the GAN is able to learn to generate events that mimic these simulated events. Once an acceptable accuracy range is achieved, the trained GAN can replace the classical MC simulation code, with an inference invocation of the GAN.

Recent progress

In accordance with the concept of data-parallel distributed learning, we trained a GAN model on a total of twelve GPUs, distributed over the three nodes that comprise the test Power cluster. Each GPU ingests a unique part of the physics data set for training the model.

The model we benchmarked is called ‘3DGAN’. It uses three-dimensional convolutions to simulate the energy patterns deposited by particles travelling through high-granularity calorimeters (part of the experiments’ detectors). More details can be found in the description of the fast-simulation project earlier in this report. In order to distribute the training workload across multiple nodes, 3DGAN uses an MPI-based tool called Horovod. Running on the test cluster, we achieved excellent scaling performance and improved the training time by an order of magnitude.
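The sketch below shows the general pattern of such data-parallel training with Horovod and Keras: each worker process is pinned to one GPU, reads its own shard of the data, and the optimiser is wrapped so that gradients are averaged across all workers. The model and data here are trivial placeholders, not the actual 3DGAN code.

```python
# Data-parallel training pattern with Horovod + Keras: one process per GPU,
# each reading its own data shard, gradients averaged across workers.
# The model and data below are trivial placeholders, not the 3DGAN itself.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each Horovod process to a single GPU on its node.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

def load_shard(rank, num_shards):
    # Placeholder data: random 25x25x25 "showers" with dummy labels.
    images = tf.random.uniform((256, 25, 25, 25, 1))
    labels = tf.random.uniform((256, 1))
    return (tf.data.Dataset.from_tensor_slices((images, labels))
            .shard(num_shards, rank).batch(32))

def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(25, 25, 25, 1)),
        tf.keras.layers.Conv3D(8, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling3D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

model = build_model()

# Scale the learning rate with the number of workers and wrap the optimiser
# so that gradients are averaged across all processes.
optimizer = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(1e-4 * hvd.size()))
model.compile(optimizer=optimizer, loss="binary_crossentropy")

model.fit(load_shard(hvd.rank(), hvd.size()),
          epochs=5,
          callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
          verbose=1 if hvd.rank() == 0 else 0)
```

Such a script would typically be launched with a command along the lines of `horovodrun -np 12 -H node1:4,node2:4,node3:4 python train.py`, matching the twelve GPUs spread over the three nodes of the test cluster.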

As planned, work also began in 2019 to prototype a deep-learning approach for the offline reconstruction of events at DUNE, a new neutrino experiment that will be built in the United States. Initial work focused on developing a model — based on a combination of convolutional and graph networks — to reduce the noise in the raw data produced by the detector. Preliminary results on MC-simulated data are very promising.

Next steps

We will work to further optimise our noise-reduction model for the DUNE data, testing its performance on real data collected from a prototype experiment built at CERN called ProtoDUNE. Furthermore, we will investigate the feasibility of running the model in low-latency environments for real-time applications, using FPGAs.

Our plan is to then extend this approach to perform several other steps in the data-processing chain. In the longer term, our ultimate goal is to develop a tool capable of processing the raw data from DUNE, thus making it possible to replace the entire offline reconstruction approach.


Presentations

    A. Hesam, Evaluating IBM POWER Architecture for Deep Learning in High-Energy Physics (23 January). Presented at CERN openlab Technical Workshop, Geneva, 2018. cern.ch/go/7BsK
    D. H. Cámpora Pérez, ML based RICH reconstruction (8 May). Presented at Computing Challenges meeting, Geneva, 2018. cern.ch/go/xwr7
    D. H. Cámpora Pérez, Millions of circles per second. RICH at LHCb at CERN (7 June). Presented as a seminar in the University of Seville, Seville, 2018.

Data analytics for industrial controls and monitoring 

Project goal

This project is working to render the industrial control systems used for the LHC more efficient and more intelligent. The aim is to develop a data-analytics platform that capitalises on the latest advances in artificial intelligence (AI), cloud and edge-computing technologies. The ultimate goal is to make use of analytics solutions provided by Siemens to provide non-expert end users with a turnkey data-analytics service.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Fernando Varela
Team members
Filippo Tilaro, Rafal Kulaga, Piotr Golonka, Peter Sollander, Fernando Varela, Marc Bengulescu, Filip Siroky
Collaborator liaison(s)
Thomas Hahn, Juergen Kazmeier, Alexey Fishkin, Tatiana Mangels, Elisabeth Bakany, Ewald Sperrer


Project background

The HL-LHC project aims to increase the integrated luminosity — and hence the rate of particle collisions — by a factor of ten beyond the LHC’s design value. Monitoring and control systems will therefore become increasingly complex, with unprecedented data throughputs. Consequently, it is vital to further improve the performance of these systems, and to make use of data-analytics algorithms to detect anomalies and to anticipate future behaviour. Achieving this involves a number of related lines of work. This particular project focuses on the development of a data-analytics platform that combines the benefits of cloud and edge computing.

Recent progress

In the first half of 2019, we focused on the monitoring of various LHC control systems, using two distinct analytics solutions from Siemens: Smart IIoT, a framework used to monitor a multitude of control signals in a distributed manner, and ELVis, a web-based platform for handling multiple streams of time-series data from sensors. Achieving tighter integration between ELVis and Smart IIoT was one of the main objectives for the first half of 2019. A single interface was developed to enable users to define complex event-processing rules, configure the cloud and edge infrastructure, and monitor the execution of the analyses.

In the second half of 2019, Filip Siroky, a new fellow funded by CERN openlab, joined the team. His work has focused on the following: optimising the ion-beam source for the LINAC3 accelerator at CERN; deploying Siemens’s Distributed Complex Event Processing (DCEP) technology to enable advanced data analytics and predictive maintenance for the oxygen-deficiency sensors in the LHC tunnel; and integrating an array of Siemens IoT infrared sensors for detecting room occupancy into the central room booking system at CERN.

Next steps

One of the main objectives for 2020 is to integrate the DCEP technology with the control systems of other equipment groups at CERN: cryogenics, electricity, and cooling and ventilation. The other aim is to provide a service offering a collection of generic AI algorithms that could easily be employed by people who are not data scientists, helping them to perform advanced analytics on controls data.

 

 


Presentations

    F. Tilaro, F. Varela, Model Learning Algorithms for Anomaly Detection in CERN Control Systems (25 January). Presented at BE-CO Technical Meeting, Geneva, 2018. cern.ch/go/7SGK
    F. Tilaro, F. Varela, Industrial IoT in CERN Control Systems (21 February). Presented at Siemens IoT Conference, Nuremberg, 2018.
    F. Tilaro, F. Varela, Optimising CERN control systems through Anomaly Detection & Machine Learning (29 August). Presented at AI workshop for Future Production Systems, Lund, 2018.
    F. Tilaro, F. Varela, Online Data Processing for CERN industrial systems (12 November). Presented at Siemens Analytics Workshop, Munich, 2018.