Members of CERN’s research community expend significant efforts to understand how they can get the most value out of the data produced by the LHC experiments. They seek to maximise the potential for discovery and employ new techniques to help ensure that nothing is missed. At the same time, it is important to optimise resource usage (tape, disk, and CPU), both in the online and offline environments. Modern machine-learning technologies — in particular, deep-learning solutions — offer a promising research path to achieving these goals. Deep-learning techniques offer the LHC experiments the potential to improve performance in each of the following areas: particle detection, identification of interesting events, modelling detector response in simulations, monitoring experimental apparatus during data taking, and managing computing resources.

 

Exploring accelerated machine learning for experiment data analytics

 

Project goal

The project has two threads, each investigating a unique use-case for the Micron Deep Learning Accelerator (a modular FPGA-based architecture). The first thread relates to the development of a prototype real-time streaming machine-learning inference engine for the level-1 trigger of the CMS experiment.

The second thread focuses on prototyping a particle-identification system based on deep learning for the DUNE experiment. DUNE is a leading-edge, international experiment for neutrino science and proton-decay studies. It will be built in the US and is scheduled to begin operation in the mid-2020s.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Emilio Meschi, Paola Sala, Maria Girone
Team members
Thomas Owen James, Dejan Golubovic, Maurizio Pierini, Manuel Jesus Rodriguez, Anwesha Bhattacharya, Saul Alonso-Monsalve, Debdeep Paul, Niklas Böhm, Ema Puljak
Collaborator liaison(s)
Mark Hur, Stuart Grime, Michael Glapa, Eugenio Culurciello, Andre Chang, Marko Vitez, Dustin Werran, Aliasger Zaidy, Abhishek Chaurasia, Patrick Estep, Jason Adlard, Steve Pawlowski

Collaborators

Project background

The level-1 trigger of the CMS experiment selects relevant particle-collision events for further study, while rejecting 99.75% of collisions. This decision must be made with a fixed latency of a few microseconds. Machine-learning inference in FPGAs may be used to improve the capabilities of this system.
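As a rough illustration of what this selectivity implies, the accepted event rate can be estimated from the LHC’s 40 MHz bunch-crossing rate and the 99.75% rejection figure quoted above (a back-of-the-envelope sketch, not a quoted specification):

    # Back-of-the-envelope estimate of the level-1 accept rate implied by
    # a 99.75% rejection fraction at the LHC's 40 MHz bunch-crossing rate.
    collision_rate_hz = 40e6         # 40 MHz bunch crossings
    rejection_fraction = 0.9975      # fraction of collisions rejected

    accept_rate_hz = collision_rate_hz * (1 - rejection_fraction)
    print(f"Accepted rate: {accept_rate_hz / 1e3:.0f} kHz")   # ~100 kHz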

The DUNE experiment will consist of large arrays of sensors exposed to high-intensity neutrino beams. The use of convolutional neural networks has been shown to substantially boost particle-identification performance for such detectors. For DUNE, an FPGA solution is advantageous for processing ~ 5 TB/s of data.

 

Recent progress

For the CMS experiment, we studied in detail two potential use cases for a machine-learning approach using FPGAs. Data from Run 2 of the LHC was used to train a neural network aimed at improving the analysis potential of muon tracks from the level-1 trigger, as part of a 40 MHz ‘level-1 scouting’ data path. In addition, a convolutional neural network was developed for classifying and measuring energy showers for the planned high-granularity calorimeter upgrade of the CMS experiment. These networks were tested on the Micron FPGA hardware and were optimised for latency and precision.
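As a minimal sketch of the kind of convolutional classifier this involves, the following Keras example builds a small network that labels calorimeter energy deposits; the input grid size, layer widths and particle classes are illustrative assumptions, not the network actually deployed on the Micron hardware.

    # Illustrative Keras sketch of a small shower classifier
    # (hypothetical input shape, layer sizes and class count).
    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_shower_classifier(input_shape=(16, 16, 8), n_classes=3):
        # Input: a coarse grid of energy deposits, treated as a multi-channel image.
        model = models.Sequential([
            layers.Conv2D(16, 3, activation="relu", padding="same",
                          input_shape=input_shape),
            layers.Conv2D(32, 3, activation="relu", padding="same"),
            layers.GlobalAveragePooling2D(),
            layers.Dense(64, activation="relu"),
            layers.Dense(n_classes, activation="softmax"),   # e.g. e / gamma / hadron
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    model = build_shower_classifier()
    model.summary()

In the actual system, such a network would then be optimised for latency and numerical precision and deployed on the Micron FPGA hardware, rather than run in a Python framework.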

For the DUNE part of the project, we tested the Micron inference engine and characterised its performance on existing software. Specifically, we tested it for running a neural network that can identify neutrino interactions in the DUNE detectors, based on simulated data. This enabled us to gain expertise with the board and fully understand its potential. The results of this benchmarking were presented at the 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019).

 

 

Next steps

The CMS team will focus on preparing a full scouting system for Run 3 of the LHC. This will comprise a system of around five Micron co-processors, receiving data on high-speed optical links.

The DUNE team plans to set up the inference engine as a demonstrator within the data-acquisition system of the ProtoDUNE experiment (a prototype of DUNE that has been built at CERN). This will work to find regions of interest (i.e. high activity) within the detector, decreasing the amount of data that needs to be sent to permanent storage.
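A minimal sketch of the region-of-interest idea follows: threshold a map of detector activity and keep only the regions of high activity. The array size, threshold and injected signal are purely illustrative and are not ProtoDUNE parameters.

    # Illustrative region-of-interest selection on a 2D activity map
    # (hypothetical shape, threshold and signal; not ProtoDUNE parameters).
    import numpy as np
    from scipy import ndimage

    activity = np.random.poisson(0.1, size=(512, 512)).astype(float)  # noise-like background
    activity[200:220, 300:330] += 50.0                                # injected high-activity blob

    mask = activity > 10.0                      # keep only channels above threshold
    labels, n_regions = ndimage.label(mask)     # group neighbouring active channels
    rois = ndimage.find_objects(labels)         # bounding slices of each region

    kept = sum(activity[roi].size for roi in rois)
    print(f"{n_regions} region(s) of interest, {kept / activity.size:.2%} of channels kept")

Only the data inside such regions would then need to be forwarded to permanent storage.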

 


Presentations

    M. J. R. Alonso, Fast inference using FPGAs for DUNE data reconstruction (7 November). Presented at 24th International Conference on Computing in High Energy and Nuclear Physics, Adelaide, 2019. cern.ch/go/bl7n
    M. J. R. Alonso, Prototyping of a DL-based Particle Identification System for the DUNE Neutrino Detector (22 January). Presented at CERN openlab Technical Workshop, Geneva, 2020. cern.ch/go/zH8W
    T. O. James, FPGA-based Machine Learning Inference for CMS with the Micron Deep Learning Accelerator (22 January). Presented at CERN openlab Technical Workshop, Geneva, 2020. cern.ch/go/pM7P

NextGeneration Archiver for WinCC OA

Project goal

Our aim is to make control systems used for the LHC more efficient and smarter. We are working to enhance the functionality of WinCC OA (a SCADA tool used widely at CERN) and to apply data-analytics techniques to the recorded monitoring data, in order to detect anomalies and systematic issues that may impact upon system operation and maintenance.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Fernando Varela
Team members
Filippo Tilaro, Jakub Guzik, Anthony Hennessey, Rafal Kulaga, Piotr Golonka, Peter Sollander, Fernando Varela, Marc Bengulescu, Filip Siroky
Collaborator liaison(s)
Thomas Hahn, Juergen Kazmeier, Alexey Fishkin, Tatiana Mangels, Mikhail Kalinkin, Elisabeth Bakany, Ewald Sperrer

Collaborators

Project background

The HL-LHC programme aims to increase the integrated luminosity — and hence the rate of particle collisions — by a factor of ten beyond the LHC’s design value. Monitoring and control systems will therefore become increasingly complex, with unprecedented data throughputs. Consequently, it is vital to further improve the performance of these systems, and to make use of data-analytics algorithms to detect anomalies and anticipate future behaviour. Achieving this involves a number of related lines of work. This project focuses on the development of a modular and future-proof archiving system (NextGen Archiver) that supports different SQL and NoSQL technologies to enable data analytics. It is important that this can be scaled up to meet our requirements beyond 2020.

Recent progress

Two important milestones for the NextGeneration Archiver (NGA) project were achieved in 2019: preparation of a release for all ETM customers with WinCC OA 3.17 and start of deployment at the ALICE experiment.

Significant progress has been made with all areas of the NGA project, including providing support for redundancy, for complex queries, and for handling signal metadata. In order to improve the performance and scalability of queries, and to make sure that they do not negatively affect core components of the system, direct query functionality was also developed and tested.

In order to ensure reliability of the NGA in large systems with high throughput, several tests were performed at CERN. Existing test automation tools have been significantly extended in order to allow for better synchronisation of testing efforts at CERN and ETM.

Initial results from InfluxDB performance tests performed at CERN show that the technology will most likely not be able to replace the current Oracle technology used for systems with very large numbers of signals (in the range of hundreds of thousands). However, it could successfully act as a shorter-term storage, improving the performance of certain queries and enabling users to easily create web dashboards using Grafana.
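As an indication of how such shorter-term storage might be used, the sketch below writes and reads a monitoring signal with the Python client for InfluxDB 1.x; the database, measurement and tag names are hypothetical, and a Grafana dashboard would typically issue a similar down-sampling query.

    # Illustrative write/read of a monitoring signal against InfluxDB 1.x
    # (hypothetical database, measurement and tag names).
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="localhost", port=8086, database="scada_short_term")
    client.create_database("scada_short_term")

    client.write_points([{
        "measurement": "signal_values",
        "tags": {"system": "cryogenics", "signal": "TT123"},
        "fields": {"value": 4.21},
    }])

    # The kind of down-sampled query a web dashboard might issue.
    result = client.query(
        'SELECT MEAN("value") FROM "signal_values" '
        'WHERE time > now() - 1h GROUP BY time(1m)'
    )
    print(list(result.get_points()))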

Next steps

In 2020, work on the project will continue on many fronts. Increasing test coverage, especially for ‘corner cases’ and failure scenarios, remains one of the main priorities. Work on missing features will continue for all components of the NGA. Further tests of InfluxDB and Apache Kudu will help to determine their performance in large systems. The team will also provide support for ALICE as the experiment prepares to restart after the current long shutdown.

Publications

    P. Golonka, F. Varela-Rodriguez, Consolidation and Redesign of CERN Industrial Controls Frameworks, Proc. 17th Biennial International Conference on Accelerator and Large Experimental Physics Control Systems, New York, 2019. http://cern.ch/go/8RRL

Presentations

    F. M. Tilaro, R. Kulaga, Siemens Data Analytics and SCADA evolution status report (23 January). Presented at CERN openlab Technical Workshop, Geneva, 2019. http://cern.ch/go/kt7K

Fast simulation

Project goal

We are developing fast-simulation tools based on machine learning — rather than primarily using classical Monte Carlo — to simulate particle transport in the detectors of the LHC experiments. Such tools could play a significant role in helping the research community to cope with the LHC’s increasing computing demands.

The tools we are developing should be able to simulate a large variety of particles, energies, and detectors — all in a fraction of the time needed for classical simulation of particle transport. Our final objective is to integrate our tool in the existing code. This work is being carried out in collaboration with SURFsara and Cineca, as well as with Intel.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Federico Carminati
Team members
Sofia Vallecorsa, Gulrukh Khattak, Andrea Luiselli, Hans Pabst
Collaborator liaison(s)
Claudio Bellini, Marie-Christine Sawley, Saletore Vikram

Collaborators

Project background

Over half of the WLCG’s computing workload is the result of a single activity, namely detector simulation. A single code, called Geant4, is used for this. Optimising this code has the potential to significantly reduce computing requirements, thus unlocking resources for other tasks.

Fast-simulation techniques have been developed in the past. However, the latest developments in machine learning (particularly in relation to deep neural networks) make it possible to develop fast-simulation techniques that are both more flexible and more accurate than existing ones.

Recent progress

Training time has turned out to be a major bottleneck for the meta-optimisation of our generative adversarial network. This includes not only the network weights, but also its architecture and the convergence parameters. Much of our work in 2018 concentrated on addressing this. We implemented distributed versions of our training code on both GPUs and CPUs, and we tested their performance and scalability in different environments (HPC clusters and cloud). The results are very encouraging: we observed almost linear speedup as the number of processors increased, with very limited or no degradation in results.

The other main area of work in 2018 related to the extension of the fast simulation tool to incorporate a larger set of kinematic conditions. We successfully extended the parameters related to incoming particles, integrating the angle of impact in the conditioning parameters. The tool is now mature enough to start planning its test integration with a classical Monte-Carlo code, such as Geant4.
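A minimal Keras sketch of how an additional kinematic variable, such as the angle of impact, can be fed to a generator as a conditioning input alongside the particle energy is shown below; the latent size, layer sizes and output dimensions are illustrative assumptions, not the actual fast-simulation network.

    # Illustrative conditional generator: primary energy and impact angle
    # are concatenated with the latent noise vector (hypothetical sizes).
    import tensorflow as tf
    from tensorflow.keras import layers, Model

    latent_dim = 100

    noise = layers.Input(shape=(latent_dim,), name="noise")
    energy = layers.Input(shape=(1,), name="primary_energy")
    angle = layers.Input(shape=(1,), name="impact_angle")

    x = layers.Concatenate()([noise, energy, angle])   # conditioning by concatenation
    x = layers.Dense(8 * 8 * 16, activation="relu")(x)
    x = layers.Reshape((8, 8, 16))(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    shower = layers.Conv2D(1, 3, padding="same", activation="relu", name="shower")(x)

    generator = Model([noise, energy, angle], shower, name="conditional_generator")
    generator.summary()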

Next steps

We plan to continue improving the accuracy of the simulation, with particular attention to the tails of particle showers and the single-cell energy distribution. We will also continue to investigate HPC training and explore various frameworks for hyper-parameter tuning. Finally, we will extend the simulation tool to different detectors, and collaborate on its integration into the existing simulation frameworks.

Publications

    F. Carminati et al., A Deep Learning tool for fast detector simulation. Poster presented at the 18th International Supercomputing Conference 2018, Frankfurt, 2018. First prize awarded for best research poster.
    G. Khattak, Training Generative Adversarial Models over Distributed Computing System (2018), revised selected papers. cern.ch/go/8Ssz
    D. Anderson, F. Carminati, G. Khattak, V. Loncar, T. Nguyen, F. Pantaleo, M. Pierini, S. Vallecorsa, J-R. Vlimant, A. Zlokapa, Large scale distributed training applied to Generative Adversarial Networks for calorimeter Simulation. Presented at the 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018). Proceedings in publication.
    F. Carminati, G. Khattak, S. Vallecorsa, 3D convolutional GAN for fast simulation. Presented at the 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018). Proceedings in publication.
    F. Carminati, S. Vallecorsa, G. Khattak, V. Codreanu, D. Podareanu, H. Pabst, V. Saletore, Distributed Training of Generative Adversarial Networks for Fast Detector Simulation. ISC 2018 Workshops, LNCS 11203, pp. 487–503, 2018. cern.ch/go/wLP6
    G. Khattak, S. Vallecorsa, F. Carminati, Three Dimensional Energy Parametrized Generative Adversarial Networks for Electromagnetic Shower Simulation. 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, pp. 3913-3917, 2018. cern.ch/go/7PHp
    G. Khattak, S. Vallecorsa, F. Carminati, D. Moise, Data-Parallel Training of Generative Adversarial Networks on HPC Systems for HEP Simulations. 2018 IEEE 25th International Conference on High Performance Computing (HiPC), Bengaluru, pp. 162-171, 2018. cern.ch/go/kTX9
    F. Carminati et al., Calorimetry with Deep Learning: Particle Classification, Energy Regression, and Simulation for High-Energy Physics, NIPS 2017. cern.ch/go/7vc8
    F. Carminati et al., Three dimensional Generative Adversarial Networks for fast simulation, ACAT 2017. cern.ch/go/BN6r

Presentations

    F. Carminati, S. Vallecorsa, G. Khattak, 3D convolutional GAN for fast simulation (5 March). Presented at IXPUG Spring Conference, Bologna, 2018. http://cern.ch/go/9TqS
    F. Carminati, G. Khattak, S. Vallecorsa, Three-dimensional energy parametrized adversarial networks for electromagnetic shower simulation (7 October). Presented at 2018 IEEE International Conference on Image Processing, Athens, 2018. http://cern.ch/go/lVr8
    F. Carminati, V. Codreanu, G. Khattak, H. Pabst, D. Podareanu, V. Saletore, S. Vallecorsa, Fast Simulation with Generative Adversarial Networks (12 November). Presented at The International Conference for High Performance Computing, Networking, Storage, and Analysis, Dallas, 2018. http://cern.ch/go/Z6Wg
    F. Carminati, G. Khattak, D. Moise, S. Vallecorsa, Data-parallel Training of Generative Adversarial Networks on HPC Systems for HEP Simulations (18 December). Presented at 25th IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC, Bengaluru, 2018.
    S. Vallecorsa, Machine Learning for Fast Simulation 2017 (June 24), Presented at ISC High Performance, Frankfurt, 2017. cern.ch/go/k6sV
    E. Orlova, Deep learning for fast simulation: development for distributed computing systems (15 August), Presented at CERN openlab summer students’ lightning talks, Geneva, 2017. cern.ch/go/NW9k
    A. Gheata, GeantV (Intel Code Modernisation) (21 September), Presented at CERN openlab Open Day, Geneva, 2017. cern.ch/go/gBS6
    S. Vallecorsa, GANs for simulation (May 2017), Fermilab, Talk at DS@HEP workshop, 2017. cern.ch/go/m9Bl
    S. Vallecorsa, GeantV – Adapting simulation to modern hardware (June 2017), Talk at PASC 2017 conference, Lugano, 2017. cern.ch/go/cPF8
    S. Vallecorsa, Machine Learning-based fast simulation for GeantV (June 2017), Talk at LPCC workshop, CERN, 2017. cern.ch/go/QqD7
    S. Vallecorsa, Generative models for fast simulation (August 2017), Plenary talk at ACAT conference, Seattle, 2017. cern.ch/go/gl7l
    S. Vallecorsa, Three dimensional Generative Adversarial Networks for fast simulation, ACAT 2017. cern.ch/go/jz6C
    S. Vallecorsa et al., Tutorial on "3D convolutional GAN implementation in Neon", Intel HPC Developers Conference 2017. cern.ch/go/ZtZ7

Evaluation of Power CPU architecture for deep learning

Project goal

We are investigating the performance of distributed learning and low-latency inference of generative adversarial networks (GANs) for simulating particle collision events. The performance of a deep neural network is being evaluated on a cluster consisting of IBM Power CPUs (with GPUs) installed at CERN.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Maria Girone and Federico Carminati
Team members
Sofia Vallecorsa, Daniel Hugo Cámpora Pérez, Niko Neufeld
Collaborator liaison(s)
Eric Aquaronne, Lionel Clavien

Collaborators

Project background

GANs offer a possible way of eliminating the need for classical Monte Carlo (MC) simulations when generating particle showers. Classical MC is computationally expensive, so replacing it could significantly improve the overall performance of simulations in high-energy physics.

Using the large data sets obtained from MC-simulated physics events, the GAN is able to learn to generate events that mimic these simulated events. Once an acceptable accuracy range is achieved, the trained GAN can replace the classical MC simulation code, with an inference invocation of the GAN.
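In other words, once training has converged, producing new showers reduces to a forward pass of the generator. A minimal sketch of that inference step is shown below; the stand-in model, latent size and conditioning input are placeholders rather than the actual trained network.

    # Illustrative inference step: once trained, generating showers is a
    # single forward pass of the generator (dummy stand-in model below).
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers

    latent_dim = 100

    # Stand-in for a trained generator; in practice this would be loaded
    # from the training output instead of built from scratch.
    generator = tf.keras.Sequential([
        layers.Input(shape=(latent_dim + 1,)),           # noise + energy condition
        layers.Dense(64, activation="relu"),
        layers.Dense(25 * 25, activation="relu"),
        layers.Reshape((25, 25)),                        # toy calorimeter grid
    ])

    n_events = 1000
    noise = np.random.normal(size=(n_events, latent_dim)).astype("float32")
    energy = np.random.uniform(10.0, 100.0, size=(n_events, 1)).astype("float32")

    showers = generator.predict(np.concatenate([noise, energy], axis=1), batch_size=256)
    print(showers.shape)   # one generated calorimeter response per sampled event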

Recent progress

In accordance with the concept of data-parallel distributed learning, we trained the GAN on a total of twelve GPUs, distributed over the three nodes that comprise the test Power cluster. Each GPU ingests a unique part of the physics data set for training the model. The neural network was implemented with a combination of software frameworks optimised for Power architectures: Keras, TensorFlow, and Horovod. We used MPI to distribute the workloads over the GPUs. As a result, we achieved good scaling performance, and we were able to improve the training time by an order of magnitude. With the trained model, we achieved a speedup of four orders of magnitude compared to using classical MC simulation.
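The data-parallel pattern described above can be sketched as follows with Horovod on top of Keras/TensorFlow; the model, dataset and script name are placeholders, and the script would be launched with one MPI process per GPU (for instance, twelve processes across the three nodes).

    # Illustrative Horovod data-parallel training script (placeholder model
    # and data); launched with one MPI process per GPU, e.g.
    #   mpirun -np 12 python train_dataparallel.py
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()                                            # one process per GPU
    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

    # Each worker ingests a unique shard of the (placeholder) training data.
    (x, y), _ = tf.keras.datasets.mnist.load_data()
    x = x[hvd.rank()::hvd.size()].astype("float32") / 255.0
    y = y[hvd.rank()::hvd.size()]

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    # Scale the learning rate with the number of workers and let Horovod
    # average the gradients across all GPUs.
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(1e-3 * hvd.size()))
    model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

    model.fit(x, y, batch_size=64, epochs=1,
              callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
              verbose=1 if hvd.rank() == 0 else 0)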

At the LHCb experiment, a convolutional neural network was also tested as a way of identifying particles from a certain type of electromagnetic radiation trace observed in particular sub-detectors. We fed a dataset of more than 5 million MC-generated particles through a deep neural network with 4 million parameters. We used the same cluster to accelerate our exploration of this approach, achieving promising results.

Next steps

At the LHCb experiment, work will continue to improve the particle-identification performance of our new approach by incorporating new parameters into the model. We are seeking to identify the topology of the neural network that will best suit our problem; collaborating closely with IBM is key to achieving this.

We will also prototype a deep-learning approach for the offline reconstruction of events at DUNE, a new neutrino experiment that will be built in the United States of America. We believe that IBM’s Power architecture could be well suited to handling the large amounts of raw data that will be generated by this experiment.

 

 


Presentations

    A. Hesam, Evaluating IBM POWER Architecture for Deep Learning in High-Energy Physics (23 January). Presented at CERN openlab Technical Workshop, Geneva, 2018. http://cern.ch/go/7BsK
    D. H. Cámpora Pérez, ML based RICH reconstruction (8 May). Presented at Computing Challenges meeting, Geneva, 2018. http://cern.ch/go/xwr7
    D. H. Cámpora Pérez, Millions of circles per second. RICH at LHCb at CERN (7 June). Presented as a seminar in the University of Seville, Seville, 2018.

Data analytics in the cloud

Project goal

This project is testing and prototyping solutions that combine data engineering with machine-learning and deep-learning tools. These solutions are being run using cloud resources — in particular resources and tools from Oracle Cloud Infrastructure (OCI) — and address a number of use cases of interest to CERN’s community. Notably, this activity will make it possible to compare the performance, maturity, and stability of solutions deployed on CERN’s infrastructure with the deployment on the OCI.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Eric Grancher and Eva Dafonte Perez
Team members
Luca Canali, Riccardo Castellotti
Collaborator liaison(s)
Barry Gleeson, Vincent Leocorbo, Don Mowbray, Cristobal Pedregal-Martin

Collaborators

Project background

Big-data tools — particularly related to data engineering and machine learning — are evolving rapidly. As these tools reach maturity and are adopted more broadly, new opportunities are arising for extracting value out of large data sets.

Recent years have seen growing interest from the physics community in machine learning and deep learning. One important activity in this area has been the development of pipelines for real-time classification of particle-collision events recorded by the LHC detectors. Filtering events using so-called “trigger” systems is set to become increasingly complex as upgrades to the LHC increase the rate of particle collisions.

Recent progress

The project launched at the end of 2018. We began by developing and porting data pipelines — related to data-analysis and machine-learning use cases of interest — to the cloud (Kubernetes). The workloads had originally been developed to run on CERN’s Hadoop and Spark infrastructure.

Next steps

In 2019, the project will investigate two main use cases. Firstly, the replication of data-reduction systems employed at the CMS experiment, with a view to exploiting the scalability of OCI to improve upon current performance. Secondly, the training of specific models on OCI using GPUs, in order to test the performance limits of such models on cloud-native solutions. The models in question are those detailed in the paper listed under publications.

 

 

Publications

    T. Nguyen et al., Topology classification with deep learning to improve real-time event selection at the LHC, 2018. http://cern.ch/go/8trZ

Data analytics for industrial controls and monitoring 

Project goal

This project is working to render the industrial control systems used for the LHC more efficient and more intelligent. The aim is to develop a data-analytics platform that capitalises on the latest advances in artificial intelligence (AI), cloud and edge-computing technologies. The ultimate goal is to make use of analytics solutions provided by Siemens to provide non-expert end users with a turnkey data-analytics service.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Fernando Varela
Team members
Filippo Tilaro, Rafal Kulaga, Piotr Golonka, Peter Sollander, Fernando Varela, Marc Bengulescu, Filip Siroky
Collaborator liaison(s)
Thomas Hahn, Juergen Kazmeier, Alexey Fishkin, Tatiana Mangels, Elisabeth Bakany, Ewald Sperrer

Collaborators

Project background

The HL-LHC project aims to increase the integrated luminosity — and hence the rate of particle collisions — by a factor of ten beyond the LHC’s design value. Monitoring and control systems will therefore become increasingly complex, with unprecedented data throughputs. Consequently, it is vital to further improve the performance of these systems, and to make use of data-analytics algorithms to detect anomalies and to anticipate future behaviour. Achieving this involves a number of related lines of work. This particular project focuses on the development of a data-analytics platform that combines the benefits of cloud and edge computing.

Recent progress

In the first half of 2019, we focused on the monitoring of various LHC control systems, using two distinct analytics solutions from Siemens: Smart IIoT, a framework used to monitor a multitude of control signals in a distributed manner, and ELVis, a web-based platform for handling multiple streams of time-series data from sensors. Achieving tighter integration between ELVis and Smart IIoT was one of the main objectives for the first half of 2019. A single interface was developed to enable users to define complex event-processing rules, configure the cloud and edge infrastructure, and monitor the execution of the analyses.

In the second half of 2019, Filip Siroky, a new fellow funded by CERN openlab, joined the team. His work has focused on the following: optimising the ion beam source for the LINAC3 accelerator at CERN; deploying Siemens’s Distributed Complex Event Processing (DCEP) technology to enable advanced data analytics and predictive maintenance for the oxygen-deficiency sensors in the LHC tunnel; and integrating an array of Siemens IoT infrared sensors for detecting room occupancy into the central room-booking system at CERN.

Next steps

One of the main objectives for 2020 is to integrate the DCEP technology with the control systems of other equipment groups at CERN: cryogenics, electricity, and cooling and ventilation. The other aim is to provide a service offering a collection of generic AI algorithms that can easily be employed by people who are not data scientists, helping them to perform advanced analytics on controls data.

 

 


Presentations

    F. Tilaro, F. Varela, Model Learning Algorithms for Anomaly Detection in CERN Control Systems (25 January). Presented at BE-CO Technical Meeting, Geneva, 2018. http://cern.ch/go/7SGK
    F. Tilaro, F. Varela, Industrial IoT in CERN Control Systems (21 February). Presented at Siemens IoT Conference, Nuremberg, 2018.
    F. Tilaro, F. Varela, Optimising CERN control systems through Anomaly Detection & Machine Learning (29 August). Presented at AI workshop for Future Production Systems, Lund, 2018.
    F. Tilaro, F. Varela, Online Data Processing for CERN industrial systems (12 November). Presented at Siemens Analytics Workshop, Munich, 2018.

Oracle cloud technologies for data analytics on industrial control systems

Project goal

This project is working to assess the capabilities of Oracle Autonomous Data Warehouse Cloud (ADWC) and Oracle Autonomous Analytics Cloud (AAC). These technologies are being tested for use in handling the masses of data that come from the control and monitoring systems in place for CERN’s accelerator complex. Specifically, our goal is to try to use these technologies to integrate different existing datasets, to improve the performance and efficiency for the most important and challenging data retrieval/analysis, and to unlock new possibilities for data exploration.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Eric Grancher and Eva Dafonte Perez
Team members
Manuel Martin Marquez, Sébastien Masson, Franck Pachot
Collaborator liaison(s)
Cemil Alper, Dimitry Dolgushin, David Ebert, Vincent Leocorbo, Pauline Maher, Cristobal Pedregal-Martin, Reiner Zimmermann

Collaborators

Project background

The LHC is one of the largest, most complex machines ever built. Keeping it — and the rest of the accelerator complex at CERN — running efficiently requires state-of-the-art control systems. More than 2.5 terabytes of monitoring data is generated per day, coming in from over a million signals spread across the accelerators and detectors. A complex “Industrial Internet of Things” (IIoT) system is in place to persist this data, making it possible for scientists and engineers to gain insights about temperatures, magnetic field strengths, beam intensities, and much more. This plays a vital role in ensuring the highest levels of operational efficiency.

The current system to persist, access, and analyse the controls and monitoring data is based on Oracle Database. Today, significant effort is dedicated to improving performance and coping with increasing demand — in terms of both data volume and analysis of bigger datasets.

Recent progress

We organised our work in 2018 into three phases. In the initial phase, we carried out a high-level feasibility study of ADWC and AAC, making sure the technology could manage the extreme demands of our IIoT systems and our complex analytics queries. In this phase, we also explored the flexibility of provisioning, as well as the ability of the technology to automate updates, backups, and patches.

The second phase was dedicated to the evaluation of various procedures for migrating the data from our current on-premises architectures to Oracle’s cloud services. In particular, we considered the complexity of the data format, partitioning, indexing, etc. This work made it possible for us to evaluate the initial workload and data-analysis performance on a representative subset of the data, helping us to gain insights into the advanced optimisation features of AAC. We were also able to use Oracle Hybrid Columnar Compression to reduce storage requirements to about a tenth of what they previously were, as well as reducing the requirement for full scans. Thus, the performance for data retrieval and analytics tasks was significantly improved. On top of this, the system offered transparent and automated access to Oracle’s “Exadata SmartScan” and “Exadata Storage Indexes” features. This reduced — or, in some cases, removed entirely — the dependency on indexes.

In the last phase, we also worked with AAC to offer seamless data analytics based on collaborative and interactive dashboards. Our most recent work focuses on elasticity and scalability. In particular, we are working to increase the data volume used to one terabyte and increase the complexity of the workloads and analysis.

Next steps

This will lead to a comparison between the capabilities of the Autonomous Database and those of other database platforms, including the current on-premises setup.


Presentations

    M. Martin Marquez, Boosting Complex IoT Analysis with Oracle Autonomous Data Warehouse Cloud (June). Presented at Oracle Global Leaders Meeting – EMEA, Budapest, 2018.
    E. Grancher, M. Martin Marquez, S. Masson, Boosting Complex IoT Analysis with Oracle Autonomous Data Warehouse Cloud (23 October). Presented at Oracle Openworld 2018, San Francisco, 2018. http://cern.ch/go/RBZ6
    E. Grancher, M. Martin Marquez, S. Masson, Managing one of the largest IoT Systems in the world (December). Presented at Oracle Global Leaders Meeting – EMEA, Sevilla, 2018.

Intel big-data analytics

Project goal

At CERN, researchers are always exploring the latest scalable solutions needed to tackle a range of data challenges at the laboratory, related to both physics analysis and machine analytics. This project aims to help optimise analytics solutions for data integration, data ingestion, data transformation, performance, scalability, benchmarking, resource management, data visualisation, and hardware utilisation.

R&D topic
Machine learning and data analytics
Project coordinator(s)
Luca Canali, Maria Girone, Eric Grancher
Team members
Evangelos Motesnitsalis, Viktor Kozlovszky, Viktor Khristenko, Matteo Migliorini, Vasileios Dimakopoulos, Matteo Cremonesi, Oliver Gutsche, Bian Bianny, Klaus-Dieter Oertel
Collaborator liaison(s)
Claudio Bellini, Mike Riess

Collaborators

Project background

The LHC experiments continue to produce large amounts of physics data, which offer numerous possibilities for new discoveries. Big-data technologies, such as Apache Spark, hold great potential for helping us to optimise our existing physics data-processing procedures, as well as our solutions for industrial control and online processing. Through this project, we are working to design and optimise solutions based on open-source big-data technologies. This work is being carried out in collaboration with Intel, the CMS experiment, the CERN IT department, the Fermi National Accelerator Laboratory (Fermilab) in the United States, and DIANA/HEP (a collaborative endeavour to develop state-of-the-art software tools for high-energy physics experiments).

Recent progress

In 2018, the project mostly focused on use cases related to the processing of physics data at scale. In particular, we built on two key data-engineering challenges that were tackled in the previous year: the development of a mature Hadoop-XRootD Connector library, which makes it possible to read files from CERN’s EOS storage system, and the Spark-ROOT library, which makes it possible to read ROOT files into Spark DataFrames (ROOT is an object-oriented program and library developed at CERN that provides tools for big-data processing, statistical analysis, visualisation, and storage). We were able to produce, scale up, and optimise physics data-processing workloads on Apache Spark and test them with over one petabyte of open data from the CMS experiment.
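The sketch below illustrates the kind of PySpark pipeline these two libraries enable: reading a ROOT file (served over the XRootD protocol) into a Spark DataFrame and applying a simple reduction. The file path, branch name and data-source coordinates are illustrative assumptions based on the libraries' documented usage, not a specific CMS workload.

    # Illustrative PySpark job reading a ROOT file through the spark-root
    # data source, with the file itself accessed via the Hadoop-XRootD
    # connector (hypothetical path and branch name).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder
             .appName("cms-open-data-reduction")
             .getOrCreate())   # spark-root and connector jars supplied at submission time

    events = (spark.read
              .format("org.dianahep.sparkroot")     # spark-root data source
              .load("root://eospublic.cern.ch//eos/opendata/cms/sample.root"))

    # Simple reduction: event count and the mean of a (hypothetical) branch.
    summary = events.agg(F.count("*").alias("n_events"),
                         F.avg("nMuon").alias("avg_muons_per_event"))
    summary.show()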

In addition, we worked to address challenges related to the application of machine-learning solutions on physics data, using Intel BigDL (a distributed deep-learning library for Apache Spark) alongside a combination of Keras (an open-source neural network library) and TensorFlow (an open-source machine-learning framework). This led to promising results. The compatibility of the developed workloads with popular open-source analytics and machine-learning frameworks makes them very appealing, with various analysis groups from the CMS experiment choosing to carry out further development of these solutions.

Next steps

We will repeat the workload tests on top of virtualised/containerised cloud-native infrastructure, complete with Kubernetes. This will include running at CERN and performing tests on public clouds.

Furthermore, we also have plans for extending the techniques developed in the project to tackle more workloads. For example, we will work to address more complex physics data-processing challenges, such as use cases related to machine learning for online data processing (streaming).

 

 

Publications

    O. Gutsche et al., CMS Analysis and Data Reduction with Apache Spark. Proceedings for 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (21 August), Seattle, 2017. cern.ch/go/H6Xj

Presentations

    O. Gutsche, Status of CMS Big Data Project (April 04), Presented at R&D meeting of CMS Spring Offline and Computing Week 2017, Geneva, 2017. cern.ch/go/hBC6
    O. Gutsche, Data Analytics in Physics Data Reduction (April 27), Presented at CERN openlab workshop on Machine Learning and Data Analytics, Geneva, 2017. cern.ch/go/8JNM
    M. Cremonesi, Infrastructure for Large Scale HEP data analysis (May 11), Presented at DS@HEP 2017 at Fermilab, Illinois, 2017. cern.ch/go/tL6c
    S. Sehrish, A path toward HEP data analysis using high performance computing (May 11), Presented at DS@HEP 2017 at Fermilab, Illinois, 2017. cern.ch/go/S9tD
    O. Gutsche, Status and Plans of the CMS Big Data Project (May 29), Presented at CERN Database Futures Workshop, Geneva, 2017. cern.ch/go/C7TJ
    O. Gutsche, CMS Analysis and Data Reduction with Apache Spark (August 22), Presented at 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2017), Seattle, 2017. cern.ch/go/JTm7
    E. Motesnitsalis, Intel Big Data Analytics (21 September), Presented at CERN openlab Open Day, Geneva, 2017. cern.ch/go/8MM6
    E. Motesnitsalis et al., Physics Data Analytics and Data Reduction with Apache Spark (10 October), Presented at Extremely Large Database Conference ‘XLDB’ 2017, Clermont-Ferrand, 2017. cern.ch/go/l9LJ
    V. Khristenko, HEP Data Processing with Apache Spark (December 6), Presented at CERN Hadoop User Forum, Geneva, 2017. cern.ch/go/D7x6
    E. Motesnitsalis, Hadoop and Spark Services at CERN (19 April). Presented at DataWorks Summit, Berlin, 2018. cern.ch/go/LF8r
    E. Motesnitsalis, From Collision to Discovery: Physics Analysis with Apache Spark (April). CERN Spring Campus, Riga, 2018.
    V. Khristenko, Physics Analysis with Apache Spark in the CERN Hadoop Service and DEEP-EST Environment (25 May). Presented at IT Technical Forum, Geneva, 2018.
    E. Motesnitsalis, From Collision to Discovery: Physics Analysis with Apache Spark (7 August). Presented at IT Lectures CERN openlab Summer Student Programme, Geneva, 2018. cern.ch/go/W9HF
    E. Motesnitsalis, Big Data at CERN (20 September). Presented at Second International PhD School on Open Science Cloud, Perugia, 2018. cern.ch/go/JL8Q
    M. Cremonesi et al., Using Big Data Technologies for HEP Analysis (July). Presented 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP), Sofia, 2018. cern.ch/go/D8wv