Project goal

CERN’s accelerator chain — culminating in the LHC — is extremely complex. We are working with Oracle to assess cutting-edge technologies for big-data analytics. Our goal is to use these technologies to help us gain new insights from the large amounts of monitoring data the accelerator chain’s systems generate.

R&D topic
R&D Topic 3: Machine learning and data analytics
Project coordinator(s)
Eric Grancher and Eva Dafonte Perez
Technical team members
Manuel Martin Marquez, Antonio Romero Marin
Collaborator liaison(s)
Chris Lysnkey, Ryan Stark, Mark Hornick, Davide Basilio Bartolini

Project background

Both the accelerators at CERN and the experiments’ cathedral-sized detectors produce huge amounts of systems-monitoring data. The effective exploitation of this data is vital in helping us to ensure the smooth and efficient running of this highly complex infrastructure. Technologies such as Hadoop, Apache Spark, and Apache Kafka are already playing important roles. They are helping us to improve analytics performance on bigger datasets, perform real-time and graph-based analysis, carry out ‘data discovery’ via advanced visualisation techniques, and more. Nevertheless, important challenges remain, both in integrating new technologies and in training users.

Recent progress

During 2017, we contributed to work aimed at assessing the feasibility of the Future Circular Collider, a proposed successor to the LHC. For this, we defined a technology infrastructure using Oracle Big Data Discovery; this enables the analysis of terabytes of technical engineering data produced by approximately 50,000 sensors and monitoring devices in CERN’s accelerator complex.

We also worked to develop a real-time analytics framework based on Apache Kafka. We then integrated this with Oracle Stream Explorer, which serves as an interface for interacting with the data streams and processing complex events.

Another area of work involved the definition and development of a proof-of-concept analytics system using Oracle Parallel Graph Analytics (PGX). We explored the integration of this with Apache Spark, as well as running it over the cloud, to help us extract value from the complex data generated by the accelerator complex at CERN.

In addition, we carried out significant outreach work in 2017: we shared information about big-data analytics with a range of organisations, participated in a number of important conferences (including the Oracle Global Leaders Summit in Nicosia, Cyprus, and the Kafka Summit in San Francisco, California, USA), and lectured as part of ESADE Business School’s ‘Big Data Analytics for Executives’ programme in Madrid, Spain. Finally, we also defined new analytics approaches in 2017 for the scholarly publication system used at CERN.

Next steps

Going forwards, we are keen to enhance the analytics services described above by making it possible to run them over the cloud. We will also work to help expand the use of these services across CERN, while supporting the evolution of PGX and its integration with Apache Spark and various cloud technologies. A final area ripe for investigation is the evolution of real-time analytics frameworks coupled with data-source-agnostic interfaces, such as Oracle Big Data SQL.


Presentations

    M. Martin Marquez, CERN's Journey into Big Data and Analytics: An Opportunity Full of Challenges (1 February), Presented at Oracle Modern Business Experience, London, 2017. cern.ch/go/kN7W
    M. Martin Marquez, N. Sholay, M. Connaughton, Big Data and Analytics Analyst Summit (1 February), Presented at Oracle Modern Business Experience, London, 2017. cern.ch/go/l6ml
    M. Martin Marquez, CERN IoT Systems and Predictive Maintenance (23 March), Presented at Oracle Road to Big Data, Madrid, 2017. cern.ch/go/9mfF
    M. Martin Marquez, G. Gabriel, F. Amalfi, How Your University Can Accelerate Enterprise Research at Scale: CERN's Experience with Oracle Big Data Platforms (24 May), webinar, 2017. cern.ch/go/8cCd
    A. Paragas, Zenodo Keyword Auto-Suggest using Parallel Graph Analytics (11 August), Presented at CERN openlab summer students’ lightning talks, Geneva, 2017. cern.ch/go/Q8zd
    M. Martin Marquez, Oracle Analytics-as-a-Service (21 September), Presented at CERN openlab Open Day, Geneva, 2017. cern.ch/go/6LZQ
    M. Martin Marquez, Accelerating Particles to Explore the Mysteries of the Universe and How Kafka can Help on that, Presented at Kafka Summit, San Francisco, 2017. cern.ch/go/h6nh
    M. Martin Marquez, Analytics customer reception Q&A panel (1 October), Presented at Oracle Open World, San Francisco, 2017. cern.ch/go/h96j
    M. Martin Marquez, F. Amalfi, Enterprise Research at Scale: CERN’s Experience With Oracle Big Data Platform (3 October), Presented at Oracle Open World, San Francisco, 2017. cern.ch/go/J9fJ