Designing and operating distributed data infrastructures and computing centres poses challenges in areas such as networking, architecture, storage, databases, and cloud. These challenges are amplified, and new ones arise, when operating at the extremely large scales required by major scientific endeavours. CERN is evaluating different models for increasing computing and data-storage capacity, in order to accommodate the growing needs of the LHC experiments over the next decade. Each model presents different technological challenges. In addition to increasing the on-premise capacity of the systems used for traditional types of data processing and storage, a number of complementary distributed architectures and specialised capabilities offered by cloud and HPC infrastructures are being explored. These will add heterogeneity and flexibility to the data centres, and should enable advances in resource optimisation.

 

Oracle WebLogic on Kubernetes

Project goal

This project is working to improve our deployment of the Oracle WebLogic infrastructure, in order to make it portable, repeatable, and faster. This will help us to be more efficient in our daily work.

R&D topic
Data-centre technologies and infrastructures
Project coordinator(s)
Artur Wiecek
Team members
Antonio Nappi, Lukas Gedvilas, Luis Rodriguez Fernandez, Theodoros Rizopoulos, Aimilios Tsouvelekakis
Collaborator liaison(s)
Monica Riccelli, Will Lyons, Maciej Gruzka, Cristobal Pedregal-Martin, David Ebert, Dmitrij Dolgušin

Collaborators

Project background

The Oracle WebLogic service has been active at CERN for many years, offering a very stable way to run applications that are core to the laboratory. However, we would like to reduce the amount of time we spend on maintenance tasks and on creating new environments for our users. We therefore started to explore solutions that could help us to improve the way in which we deploy Oracle WebLogic. Kubernetes has now made our deployment much faster, reducing the time spent on operational tasks and enabling us to focus more on developers’ needs.

Recent progress

Progress was made in a range of areas in 2019. We first reorganised the structure of our repositories, in order to better organise our work and to split Docker images into smaller ones — in line with the trend towards microservices. We also defined and unified the way we manage secrets in Kubernetes and in virtual machines. In addition, we improved how users interact with the new infrastructure by enabling them to deploy applications via a REST API, using standard technologies like Rundeck.
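
As an illustration of this unified secret handling, the sketch below uses the official Kubernetes Python client to create an Opaque Secret holding WebLogic administrator credentials. The namespace, secret name, and credential keys are assumptions chosen for the example, not the team's actual conventions.

    import base64
    from kubernetes import client, config

    def create_weblogic_secret(namespace="weblogic-dev",
                               name="wls-admin-credentials",
                               username="weblogic",
                               password="change-me"):
        # Assumes a local kubeconfig; inside a pod, config.load_incluster_config()
        # would be used instead.
        config.load_kube_config()
        secret = client.V1Secret(
            metadata=client.V1ObjectMeta(name=name, namespace=namespace),
            type="Opaque",
            # Kubernetes expects base64-encoded values in the 'data' field.
            data={
                "username": base64.b64encode(username.encode()).decode(),
                "password": base64.b64encode(password.encode()).decode(),
            },
        )
        client.CoreV1Api().create_namespaced_secret(namespace=namespace, body=secret)

A Secret created this way can be mounted into pods as files or environment variables, keeping credentials out of the Docker images themselves.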

In addition, we sped up and simplified the configuration of WebLogic using an open-source tool provided by Oracle. This enables us to build the WebLogic domain when a container starts, instead of storing it in Docker images.
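
The Python entrypoint sketch below illustrates the general idea of building the domain at container start-up rather than baking it into the image. The tool path, command-line flags, model file, and start-script location are assumptions made for illustration; they do not reflect the exact Oracle tooling or commands used in production.

    import os
    import subprocess

    # Assumed locations; a real image would set these to match its own layout.
    DOMAIN_HOME = os.environ.get("DOMAIN_HOME", "/u01/domains/base_domain")
    MODEL_FILE = os.environ.get("DOMAIN_MODEL", "/u01/config/domain-model.yaml")
    CREATE_DOMAIN_CMD = "/u01/tooling/bin/createDomain.sh"  # hypothetical tool path

    def main():
        # Build the domain only on first start; the image itself stays domain-free.
        if not os.path.isdir(DOMAIN_HOME):
            subprocess.run(
                [CREATE_DOMAIN_CMD, "-domain_home", DOMAIN_HOME, "-model_file", MODEL_FILE],
                check=True,
            )
        # Hand control over to the WebLogic start script (path is an assumption).
        start_script = os.path.join(DOMAIN_HOME, "startWebLogic.sh")
        os.execv(start_script, [start_script])

    if __name__ == "__main__":
        main()

Keeping the domain out of the image means the same, much smaller image can be reused across environments, with only the configuration model and secrets changing.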

Next steps

The project itself is now relatively stable, with much testing already done. We are in the migration phase, with many development-and-testing environments already moved to Kubernetes. Our main goal for 2020 is to migrate the remaining such environments, as well as the production environment. We would also like to integrate standard technologies, like Prometheus and Fluentd, into our systems for monitoring and logging.
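
As a rough sketch of the kind of Prometheus integration envisaged, the example below exposes a custom gauge from a small Python exporter using the prometheus_client library. The metric name and the way its value is obtained are assumptions; a real exporter would query the WebLogic management interface rather than generate random numbers.

    import random
    import time
    from prometheus_client import Gauge, start_http_server

    # Hypothetical metric for illustration only.
    active_sessions = Gauge("weblogic_active_sessions",
                            "Number of active sessions reported by the server")

    def scrape_value():
        return random.randint(0, 100)  # placeholder for a real management-API query

    if __name__ == "__main__":
        start_http_server(8000)  # Prometheus scrapes http://<host>:8000/metrics
        while True:
            active_sessions.set(scrape_value())
            time.sleep(15)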

Publications

    A. Nappi. HAProxy High Availability Setup. Databases at CERN blog. 2017. cern.ch/go/9vPf
    A. Nappi. HAProxy Canary Deployment. Databases at CERN blog. 2017. cern.ch/go/89ff

Presentations

    A. Nappi, WebLogic on Kubernetes at CERN (16 May). Presented at WebLogic Server Summit, Rome, 2019.
    A. Nappi, One Tool to Rule Them All: How CERN Runs Application Servers on Kubernetes (16 September). Presented at Oracle Code One 2019, San Francisco, 2019. cern.ch/go/DbG9
    D. Ebert (Oracle), M. Martin, A. Nappi, Advancing research with Oracle Cloud (18 September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/LH6Z
    E. Screven, A. Nappi, Cloud Platform and Middleware Strategy and Roadmap (17 September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/d8PC
    M. Riccelli, A. Nappi, Kubernetes: The Glue Between Oracle Cloud and CERN Private Cloud (17 September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/Bp8w
    A. Nappi, L. Rodriguez Fernández, WebLogic on Kubernetes (17 January). Presented at CERN openlab meeting with Oracle, Geneva, 2017. cern.ch/go/6Z8R
    S. A. Monsalve, Development of WebLogic 12c Management Tools (15 August). Presented at CERN openlab summer students’ lightning talks, Geneva, 2017. cern.ch/go/V8pM
    A. Nappi, L. Rodriguez Fernández, WebLogic on Kubernetes (15-17 August). Presented at Oracle Workshop, Bristol, 2017. cern.ch/go/6Z8R
    A. Nappi, WebLogic on Kubernetes (21 September). Presented at CERN openlab Open Day, Geneva, 2017. cern.ch/go/6Z8R
    A. Nappi, L. Rodriguez Fernández, Oracle WebLogic on Containers: Beyond the Frontiers of Your Data Centre (21 September). Presented at CERN openlab Open Day, Geneva, 2017. cern.ch/go/nrh8
    A. Nappi, L. Gedvilas, L. Rodríguez Fernández, A. Wiecek, B. Aparicio Cotarelo (9-13 July). Presented at 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP), Sofia, Bulgaria, 2018. cern.ch/go/dW8J
    L. Rodriguez Fernandez, A. Nappi, WebLogic on Kubernetes (11 January). Presented at CERN openlab Technical Workshop, Geneva, 2018. cern.ch/go/6Z8R
    B. Cotarelo, Oracle WebLogic on Kubernetes (July). Presented at 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP), Sofia, 2018. cern.ch/go/6MVQ
    M. Riccelli, D. Cabelus, A. Nappi, Running a Modern Java EE Server in Containers Inside Kubernetes (23 October). cern.ch/go/b6nl

Oracle Management Cloud

Project goal

We are testing Oracle Management Cloud (OMC) and providing feedback to Oracle, including proposals for the evolution of the platform. We are assessing the merits and suitability of this technology for applications related to databases at CERN, comparing it with our current on-premises infrastructure.

R&D topic
Data-centre technologies and infrastructures
Project coordinator(s)
Eva Dafonte Perez, Eric Grancher
Team members
Aimilios Tsouvelekakis
Collaborator liaison(s)
Simone Indelicato, Vincent Leocorbo, Cristobal Pedregal-Martin, David Ebert, Dmitrij Dolgušin

Collaborators

Project background

The group responsible for database services within CERN’s IT department uses and provides specialised monitoring solutions to teams across the laboratory that use database infrastructure. Since the beginning of 2018, we have had an agreement in place with Oracle to test OMC, which offers a wide variety of monitoring solutions.

At CERN, as at other large organisations, it is very important to be able to monitor at all times what is happening with systems and applications running both locally and in the cloud. We therefore conducted tests of OMC involving hosts, application servers, databases, and Java applications.

Recent progress

Improvements proposed to Oracle during the previous year were implemented within new releases of the platform in 2019. Initial investigation shows that the platform has been enhanced with features covering most of our needs.

Furthermore, we deployed the registration application for the CERN Open Days in Oracle Cloud Infrastructure (see project ‘Developing a ticket reservation system for the CERN Open Days 2019’ for further details). The application made use of the Oracle Autonomous Transaction Processing (ATP) database to store visitor information. The behaviour of the ATP database was monitored using OMC, providing meaningful insights into the stresses put on the database and the database-hosting system during the period in which registration for the event was open.

Next steps

The next step is to use OMC as the monitoring platform for all the projects that are to be deployed on Oracle Cloud Infrastructure.


Presentations

    A. Tsouvelekakis, Oracle Management Cloud: A unified monitoring platform (23 January). Presented at CERN openlab Technical Workshop, Geneva, 2019. cern.ch/go/Z7j9
    A. Tsouvelekakis, Enterprise Manager and Management Cloud CAB (April). Presented at Oracle Customer Advisory Board, Redwood Shores, 2019. cern.ch/go/tZD8
    A. Tsouvelekakis, CERN: Monitoring Infrastructure with Oracle Management Cloud (September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/mMd8

Developing a ticket reservation system for the CERN Open Days 2019

Project goal

The goal of this project was to deliver a web-based ticketing system, with zero downtime, for the CERN Open Days 2019. The system had to be capable of handling multiple ticket-release dates and high loads, as well as providing a user-friendly and responsive interface.

 

R&D topic
Data-centre technologies and infrastructures
Project coordinator(s)
Artur Wiecek
Team members
Antonio Nappi, Franck Pachot, Luis Rodriguez Fernandez, Nuno Guilherme Matos De Barros, Thomas Løkkeborg, Viktor Kozlovszky
Collaborator liaison(s)
Pauline Gillet-Mahrer, Christian Jacobsen, Eric Mitha, Cristobal Pedregal-Martin, David Ebert, Dmitrij Dolgušin

Collaborators

Project background

With the LHC stopped for upgrade and maintenance during 2019, CERN took the opportunity to organise one of its major Open Days events in September. At these events, members of the public are able to visit many areas that are normally restricted. Our team's responsibility was to design, deliver, and maintain the ticketing system for the Open Days. The event took place over a full weekend, and was attended by 75,000 people.

Recent progress

The project was delivered in 75 days and was maintained for three months from mid-June. The system was made available in English and French, and it had a multi-layer architecture. Thanks to the cutting-edge technologies used, adherence to coding standards, and the fine-tuning possibilities offered by the various layers, we were able to successfully deliver a highly available and scalable system.

We used the Angular framework, with Angular Material design, for the front-end application layer, and a REST API based on a Java Spring Boot application as the back-end layer. These were run in containers hosted in a Kubernetes environment. Each of the components was individually scalable based on load, as was the Oracle Autonomous Transaction Processing database. All the components were hosted on Oracle Cloud, using cloud-native services such as a load balancer, the container engine for Kubernetes, and Oracle Analytics Cloud. The system was stress tested and was capable of handling 20,000 parallel users over a period of six minutes.
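
As a minimal sketch of how one component, such as the back-end deployment, can be made individually scalable, the example below uses the official Kubernetes Python client to create a HorizontalPodAutoscaler. The namespace, deployment name, and scaling thresholds are illustrative assumptions rather than the values used for the actual system.

    from kubernetes import client, config

    def create_backend_autoscaler(namespace="opendays", deployment="reservation-backend"):
        # Assumes a kubeconfig pointing at the target cluster.
        config.load_kube_config()
        hpa = client.V1HorizontalPodAutoscaler(
            metadata=client.V1ObjectMeta(name=f"{deployment}-hpa", namespace=namespace),
            spec=client.V1HorizontalPodAutoscalerSpec(
                scale_target_ref=client.V1CrossVersionObjectReference(
                    api_version="apps/v1", kind="Deployment", name=deployment),
                min_replicas=2,
                max_replicas=20,                       # illustrative bounds
                target_cpu_utilization_percentage=70,  # scale out above 70% average CPU
            ),
        )
        client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
            namespace=namespace, body=hpa)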

The reservation system played an important role in the organisation of this large event.

Next steps

The project came to a successful close on 16 September 2019. As part of the project, we tested and evaluated the capabilities of the Oracle Cloud, which will serve as the foundation for a planned upcoming project on disaster recovery.

 

The project members would also like to thank the following people at CERN for their assistance:

  • Sébastien Masson, Manuel Martin Marquez, Aimilios Tsouvelekakis and Aparicio Cotarelo for their consultation services.
  • Francois Briard, the project owner.
  • Tobias Betz, for his assistance with the mail templates and the countdown timer.
  • Liviu Valsan, Sebastian Lopienski, and Jose Carlos Luna Duran for their assistance with security.
  • Bas Wallet and Ewa Lopienska for their work on the user interface and experience.
  • Ana Godinho, Odile Martin, Achintya Rao, and Corinne Pralavorio for their translation work.

Publications

    V. Kozlovszky, Open Days reservation system's high level overview – 2019. Databases at CERN blog. 2019. cern.ch/go/ff6k
    V. Kozlovszky, Internationalization of the 2019 Open Days reservation system. Databases at CERN blog. 2019. cern.ch/go/6qQB

Presentations

    E. Grancher, V. Kozlovszky, 100% Oracle Cloud: Registering 90,000 People for CERN Open Days (16 September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/6xZZ
    E. Grancher, 11 Months with Oracle Autonomous Transaction Processing (18 September). Presented at Oracle OpenWorld 2019, San Francisco, 2019. cern.ch/go/8PQ6

Heterogeneous I/O for Scale

 

Project goal

We are working to develop a proof of concept for an FPGA-based I/O intermediary. This has the potential to change the way data ingestion happens when remote storage locations are used. In view of the enormous amounts of data to be processed by future data-analytics workloads, it is crucial to manage the data flow efficiently in order to harness the computational power provided by high-performance computing (HPC) facilities.

R&D topic
Data-centre technologies and infrastructures
Project coordinator(s)
Maria Girone, Viktor Khristenko
Collaborator liaison(s)
Ulrich Bruening (University of Heidelberg), Mondrian Nuessle (Extoll GmbH)

Collaborators

Project background

One of the common aspects of all data-intensive applications is the streaming of recorded data from remote storage locations. This often imposes constraints on the network and forces a compute node to implement complex logic for aggressive caching in order to hide latency. Moreover, it substantially increases the memory footprint of the application running on the compute node. This project, abbreviated to ‘HIOS’, aims to provide a scalable solution for such data-intensive workloads by introducing heterogeneous I/O units directly on the compute clusters. This makes it possible to offload the aggressive caching functionality onto these heterogeneous units. By removing this complicated logic from compute nodes, the memory footprint of data-intensive applications decreases. Furthermore, the project will investigate the possibility of placing additional logic (coding/decoding, serialisation, and other I/O specifics) directly on such units.
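
The toy Python class below illustrates the kind of client-side read-ahead caching that data-intensive applications typically implement today, and whose memory cost HIOS aims to move off the compute node and onto a dedicated I/O unit. The interface, chunk size, and eviction policy are assumptions for illustration only.

    class ReadAheadCache:
        """Naive read-ahead cache kept in the application's own memory."""

        def __init__(self, remote_read, chunk_size=64 * 1024 * 1024, max_chunks=8):
            self.remote_read = remote_read    # callable: (offset, size) -> bytes
            self.chunk_size = chunk_size
            self.max_chunks = max_chunks      # resident memory: max_chunks * chunk_size
            self.cache = {}                   # chunk index -> bytes

        def read(self, offset, size):
            first = offset // self.chunk_size
            last = (offset + size - 1) // self.chunk_size
            data = bytearray()
            for idx in range(first, last + 1):
                if idx not in self.cache:
                    if len(self.cache) >= self.max_chunks:
                        self.cache.pop(next(iter(self.cache)))  # naive FIFO eviction
                    self.cache[idx] = self.remote_read(idx * self.chunk_size,
                                                       self.chunk_size)
                data += self.cache[idx]
            start = offset - first * self.chunk_size
            return bytes(data[start:start + size])

Offloading this kind of buffering to a heterogeneous I/O unit would free the corresponding memory on the compute node, which is the effect described above.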

An integral part of the project will be the ability to integrate the developed units directly with current HPC facilities. One of the main outcomes will be a reduction in the time required to extract insights from large quantities of acquired information, which, in turn, directly benefits scientific discovery and society.

HIOS is one of the 170 breakthrough projects receiving funding through the ATTRACT initiative. ATTRACT, which is part of the European Union’s Horizon 2020 programme, is financing breakthrough ideas in the fields of detection and imaging.

 

Recent progress

Development work began in September 2019. The first task was to provide the full specification and initial implementation that will become the proof of concept.

Next steps

In 2020, work will be carried out to complete the proof of concept and to submit the deliverables defined within the ATTRACT project.

Dynamical Exascale Entry Platform – Extreme Scale Technologies (DEEP-EST)

Project goal

The main focus of the project is to build a new kind of system that makes it possible to run a wide range of applications — with differing requirements — efficiently on new types of high-performance computing (HPC) resources. From machine-learning applications to traditional HPC workloads, the goal is to build an environment capable of accommodating workloads that pose completely different challenges for the system.

R&D topic
Data-centre technologies and infrastructures
Project coordinator(s)
Maria Girone (for DEEP-EST task 1.7)
Team members
Viktor Khristenko (for DEEP-EST task 1.7)

Collaborators

Project background

DEEP-EST is a project funded by the European Commission that was launched in 2017, following on from the successful DEEP and DEEP-ER projects. The project involves 27 partners in more than 10 countries and is coordinated by the Jülich Supercomputing Centre at Forschungszentrum Jülich in Germany.

Overall, the goal is to create a modular supercomputer that best fits the requirements of diverse, increasingly complex, and newly emerging applications. The innovative modular supercomputer architecture creates a unique HPC system by coupling various compute modules according to the building-block principle: each module is tailored to the needs of a specific group of applications, with all modules together behaving as a single machine.

Specifically, the prototype consists of three compute modules: the cluster module (CM), the extreme scale booster (ESB), and the data-analytics module (DAM).

  • Applications requiring high single-thread performance are targeted to run on the CM nodes, where Intel® Xeon® Scalable processors (Skylake) provide general-purpose performance and energy efficiency.
  • The architecture of the ESB nodes is tailored to highly scalable HPC software stacks capable of exploiting the enormous parallelism provided by Nvidia V100 GPUs.
  • Flexibility, large memory capacities using Intel® Optane™ DC persistent memory, and different acceleration capabilities (provided by an Intel Stratix 10 FPGA and an Nvidia V100 GPU on each node) are key features of the DAM; they make it an ideal platform for data-intensive and machine-learning applications.

CERN, in particular the CMS Experiment, participates by providing one of the applications that are used to evaluate this new supercomputing architecture.

Recent progress

We ported the software used at CMS for reconstructing particle-collision events in the hadronic and electromagnetic calorimeters. We then optimised these workloads to run on Nvidia V100 GPUs, comparing their performance against that of the CPU-based systems currently used in production.
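
As a simple illustration of the CPU-versus-GPU comparison approach (not the CMSSW reconstruction code itself), the Python sketch below times the same array operation with NumPy on the CPU and with CuPy on a CUDA GPU such as the V100. The data volume and the operation are arbitrary assumptions.

    import time
    import numpy as np
    import cupy as cp  # requires a CUDA-capable GPU

    def cpu_version(energies, threshold=0.2):
        return np.sum(energies[energies > threshold])

    def gpu_version(energies_gpu, threshold=0.2):
        return cp.sum(energies_gpu[energies_gpu > threshold])

    if __name__ == "__main__":
        energies = np.random.random(10_000_000).astype(np.float32)

        t0 = time.perf_counter()
        cpu_result = cpu_version(energies)
        t1 = time.perf_counter()

        energies_gpu = cp.asarray(energies)
        gpu_version(energies_gpu)              # warm-up run
        cp.cuda.Stream.null.synchronize()
        t2 = time.perf_counter()
        gpu_result = gpu_version(energies_gpu)
        cp.cuda.Stream.null.synchronize()
        t3 = time.perf_counter()

        print(f"CPU: {cpu_result:.1f} in {t1 - t0:.3f} s; "
              f"GPU: {float(gpu_result):.1f} in {t3 - t2:.3f} s")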

Next steps

We will incorporate MPI offloading into the ‘CMSSW’ software framework used at the CMS experiment, in order to be able to run different parts of the reconstruction on different hardware. We will also explore the use of FPGAs for reconstruction at CMS. Finally, we will explore the possibility of capitalising on new and faster memory interconnects between CPUs and accelerators, using Intel® oneAPI for development and debugging.

Publications

    The following deliverables were submitted to the European Commission:
    Deliverable 1.2: Application Use Cases and Traces
    Deliverable 1.3: Application Distribution Strategy
    Deliverable 1.4: Initial Application Ports

Presentations

    V. Khristenko, CMS Ecal Reconstruction with GPUs (23 October). Presented at CMS ECAL DPG Meeting, Geneva, 2019. cern.ch/go/FC6j
    V. Khristenko, CMS Hcal Reconstruction with GPUs (8 November). Presented at CMS HCAL DPG Meeting, Geneva 2019. cern.ch/go/P6Js
    V. Khristenko, Exploiting Modular HPC in the context of DEEP-EST and ATTRACT projects (22 January). Presented at CERN openlab Technical Workshop, Geneva, 2020. cern.ch/go/rjC7

Kubernetes and Google Cloud

Project goal

The aim of this project is to demonstrate the scalability and performance of Kubernetes and the Google Cloud, validating this set-up for future computing models. As an example, we are using the famous Higgs analysis that led to the 2013 Nobel Prize in Physics, thus also showing that real analysis can be done using CERN Open Data.

R&D topic
Data-centre technologies and infrastructures
Project coordinator(s)
Ricardo Manuel Brito da Rocha
Collaborator liaison(s)
Karan Bhatia, Andrea Nardone, Mark Mims, Kevin Kissell

Collaborators

Project background

As we look to improve the computing models we use in high-energy physics (HEP), this project serves to demonstrate the potential of open and well established APIs, such as Kubernetes. They open up a wide range of new possibilities in terms of how we deploy our workloads.

Based on a challenging and famous use case, we are working to demonstrate that these new tools — together with the virtually unlimited capacity offered by public cloud providers — make it possible to rethink how analysis workloads can be scheduled and distributed. This could lead to further improvements in the efficiency of our systems at CERN.
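
The sketch below illustrates this scheduling pattern: submitting one Kubernetes Job per input file, so that the analysis fans out across whatever capacity the cluster, whether on-premise or in a public cloud, can provide. The namespace, container image, command, and file list are assumptions for illustration and are not taken from the actual higgs-demo set-up.

    from kubernetes import client, config

    def submit_analysis_job(namespace, image, input_url, index):
        container = client.V1Container(
            name=f"higgs-analysis-{index}",
            image=image,
            command=["python", "analyse.py", input_url],  # hypothetical analysis script
        )
        job = client.V1Job(
            metadata=client.V1ObjectMeta(name=f"higgs-analysis-{index}"),
            spec=client.V1JobSpec(
                template=client.V1PodTemplateSpec(
                    spec=client.V1PodSpec(containers=[container],
                                          restart_policy="Never")),
                backoff_limit=2,
            ),
        )
        client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)

    if __name__ == "__main__":
        config.load_kube_config()
        input_files = ["root://eospublic.cern.ch//eos/opendata/..."]  # placeholder list
        for i, url in enumerate(input_files):
            submit_analysis_job("higgs-demo", "registry.example.org/analysis:latest", url, i)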

The project also provides an excellent opportunity to show how, given enough resources, anyone can replicate important physics analysis work using the open data published by CERN and the LHC experiments.

Recent progress

The initial goal of the project has been fulfilled: we have demonstrated that Kubernetes combined with the Google Cloud provides a viable and extremely performant set-up for running HEP analysis work. The code required, as well as the set-up itself, is fully documented and publicly available (see publications).

The outcome of the project was presented at a number of high-profile conferences, including a keynote presentation at KubeCon Europe 2019, an event attended by over 8000 people. A live demo of the whole set-up, using data from the CERN Open Data Portal, was shown on stage.

The set-up — as well as the data set used — has been prepared for publication as a Google Cloud official tutorial. This will enable anyone to trigger a similar execution using their own public cloud resources. This tutorial will be published in early 2020, once the text has been finalised.

Next steps

This project was initially self-contained, with a clear target for the presentation at KubeCon Europe 2019. However, the project has now grown beyond this initial, limited scope. Future steps should include:

  • Further investigating how using public cloud can improve physics analysis.
  • Working to provide on-demand, bursting to public cloud capabilities for our on-premise resources.
  • Seeking to understand how we can best define policies and accounting procedures for using public cloud resources in this manner.

 

 

Publications

    R. Rocha, L. Heinrich, higgs-demo. Project published on GitHub. 2019. cern.ch/go/T8QQ

Presentations

    R. Rocha, L. Heinrich, Reperforming a Nobel Prize Discovery on Kubernetes (21 May). Presented at KubeCon Europe 2019, Barcelona, 2019. cern.ch/go/PlC8
    R. Rocha, L. Heinrich, Higgs Analysis on Kubernetes using GCP (19 September). Presented at Google Cloud Summit, Munich, 2019. cern.ch/go/Dj8f
    R. Rocha, L. Heinrich, Reperforming a Nobel Prize Discovery on Kubernetes (7 November). Presented at the 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP), Adelaide, 2019. cern.ch/go/6Htg
    R. Rocha, L. Heinrich, Deep Dive into the Kubecon Higgs Analysis Demo (5 July). Presented at CERN IT Technical Forum, Geneva, 2019. cern.ch/go/6zls

EOS productisation

Project goal

This project is focused on the evolution of CERN’s EOS large-scale storage system. The goal is to simplify the usage, installation, and maintenance of the system. In addition, the project aims to add native support for new client platforms, expand documentation, and implement new features/integration with other software packages.

R&D topic
Data-centre technologies and infrastructures
Project coordinator(s)
Luca Mascetti
Team members
Elvin Sindrilaru
Collaborator liaison(s)
Gregor Molan, Branko Blagojevic, Ivan Arizanovic, Svetlana Milenkovic

Collaborators

Project background

Within the CERN IT department, a dedicated group is responsible for the operation and development of the storage infrastructure. This infrastructure is used to store the physics data generated by the experiments at CERN, as well as the files of all members of personnel.

EOS is a disk-based, low-latency storage service developed at CERN. It is tailored to handle large data rates from the experiments, while also running concurrent complex production workloads. This high-performance system now provides more than 300 petabytes of raw disk capacity.

EOS is also the key storage component behind CERNBox, CERN’s cloud-synchronisation service. This makes it possible to sync and share files on all major mobile and desktop platforms (Linux, Windows, macOS, Android, iOS), with the aim of providing offline availability to any data stored in the EOS infrastructure.

Recent progress

Comtrade’s team continued to acquire further knowledge of EOS, benefiting from their visit to CERN and from working side by side with members of the development and operations teams. This helped them to improve their work on EOS installation, documentation, and testing.

In particular, a dedicated document describing best practices for operating EOS in large-scale environments was produced, as well as a full-stack virtual environment hosted at Comtrade. The latter demonstrates the potential of the system when used as a geographically distributed storage system.

Next steps

The project will focus on improving and updating the EOS technical documentation for future administrators and operators. The next main goal is to host dedicated hardware resources at CERN to support the prototyping of an EOS-based appliance. This will enable Comtrade to create a first version of a full storage solution and to offer it to potential customers in the future.

In addition, the team will investigate the possibility of developing a native Windows client for EOS.

Publications

    X. Espinal, M. Lamanna, From Physics to industry: EOS outside HEP, Journal of Physics: Conference Series (2017), Vol. 898, https://doi.org/10.1088/1742-6596/898/5/052023. cern.ch/go/7XWH

Presentations

    L. Mascetti, Comtrade EOS productization (23 January). Presented at CERN openlab Technical Workshop, Geneva, 2019. cern.ch/go/W6SQ
    G. Molan, EOS Documentation and Tesla Data Box (4 February). Presented at CERN EOS workshop, Geneva, 2019. cern.ch/go/9QbM
    L. Mascetti, EOS Comtrade project (23 January). Presented at CERN openlab Technical Workshop, Geneva, 2020. cern.ch/go/l9gc
    L. Mascetti, CERN Disk Storage Services (3 February 2020). Presented at CERN EOS workshop, Geneva, 2020. cern.ch/go/pF97
    G. Molan, Preparing EOS for Enterprise Users (27 January 2020). Presented at Cloud Storage Services for Synchronization and Sharing (CS3), Copenhagen, 2020. cern.ch/go/tQ7d
    G. Molan, EOS Documentation for Enterprise Users (3 February 2020). Presented at CERN EOS workshop, Geneva, 2020. cern.ch/go/swX8
    G. Molan, EOS Windows Native Client (3 February 2020). Presented at CERN EOS workshop, Geneva, 2020. cern.ch/go/P7DX
    G. Molan, EOS Storage Appliance Prototype (5 February 2020). Presented at CERN EOS workshop, Geneva, 2020. cern.ch/go/q8qh