Modernising code plays a vital role in preparing for future upgrades to the LHC and the experiments. It is essential that software performance is continually improved by making use of modern coding techniques and tools, such as parallel programming languages and portability libraries. It is also important to ensure that software fully exploits the features offered by modern hardware architectures, such as many-core platforms, acceleration coprocessors, and innovative heterogeneous combinations of CPUs, GPUs, FPGAs, or dedicated deep-learning architectures. At the same time, it is of paramount importance that physics performance is not compromised in the drive for maximum efficiency.


High-performance distributed caching technologies (DAQDB)

Project goal

We are exploring the suitability of a new infrastructure for key-value storage in the data-acquisition systems of particle-physics experiments. DAQDB (Data Acquisition Database) is a scalable and distributed key-value store that provides low-latency queries. It exploits Intel® Optane™ DC persistent memory, a cutting-edge non-volatile memory technology that could make it possible to decouple real-time data acquisition from asynchronous event selection.
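The decoupling described above can be illustrated with a toy in-memory key-value buffer. This is a pure-Python sketch, not DAQDB's actual API: readout threads store raw event data under an event-ID key, while event selection runs asynchronously, retrieving events and removing them once a keep/reject decision has been made.

```python
import threading

class EventBuffer:
    """Toy in-memory key-value buffer (illustrative stand-in for DAQDB)."""

    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def put(self, event_id, payload):
        # Readout side: buffer raw event data as it arrives.
        with self._lock:
            self._store[event_id] = payload

    def get(self, event_id):
        # Selection side: inspect a buffered event (None if absent).
        with self._lock:
            return self._store.get(event_id)

    def pop(self, event_id):
        # Free the buffer slot once a decision has been taken.
        with self._lock:
            return self._store.pop(event_id, None)

buf = EventBuffer()
buf.put(1001, b"raw-hit-data-for-event-1001")
buf.put(1002, b"raw-hit-data-for-event-1002")

# Asynchronous selection: decide later, at its own pace.
selected = [eid for eid in (1001, 1002) if buf.get(eid) is not None]
buf.pop(1002)  # event rejected, slot freed
```

In the real system, the store would be distributed across nodes and backed by persistent memory rather than a Python dictionary, so the buffered window survives process restarts and scales far beyond DRAM.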

R&D topic
Computing performance and software
Project coordinator(s)
Giovanna Lehmann Miotto
Team members
Adam Abed Abud, Fabrice Le Goff, Lola Stankovic
Collaborator liaison(s)
Claudio Bellini, Grzegorz Jereczek, Jan Lisowiec, Andrea Luiselli, Maciej Maciejewski, Jakub Radtke, Malgorzata Szychowska, Norbert Szulc

Collaborators

Project background

Upgrades to the LHC mean that the data rates coming from the detectors will dramatically increase. Data will need to be buffered while waiting for systems to select interesting collision events for analysis. However, the current buffers at the readout nodes can only store a few seconds of data due to capacity constraints and the high cost of DRAM. It is therefore important to explore new, cost-effective solutions — capable of handling large amounts of data — that capitalise on emerging technologies.
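The capacity constraint can be made concrete with a back-of-the-envelope calculation. The rate and buffer size below are illustrative assumptions, not figures from the project:

```python
# Illustrative arithmetic: how long can a readout buffer hold data?
# Both numbers below are assumptions chosen for illustration only.
data_rate_tbit_s = 40      # aggregate detector readout, terabits per second
buffer_capacity_tB = 8     # DRAM buffer across readout nodes, terabytes

data_rate_tB_s = data_rate_tbit_s / 8          # convert to terabytes per second
seconds_buffered = buffer_capacity_tB / data_rate_tB_s
print(f"{seconds_buffered:.1f} s of data fit in the buffer")  # → 1.6 s
```

At these rates, even a multi-terabyte DRAM buffer holds only seconds of data, which is why denser, cheaper media such as persistent memory are attractive.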

Recent progress

The DAQDB concept proved very promising. Nevertheless, the effort required to develop a mature product of this complexity was recognised to be too large for the available resources. Work therefore continued in two parallel strands:

1) An evaluation of Distributed Asynchronous Object Storage (DAOS) was started, with the aim of assessing whether the future DAQ needs of large experiments, such as ATLAS and CMS, may be addressed through this underlying technology, combined with a custom software layer.

2) Performance measurements of Intel® Optane™ DC persistent memory devices continued, in the context of data-acquisition systems, achieving very promising results.

Next steps

The DAQDB project concluded in 2020. A possible follow-up activity is being discussed between Intel and experiment representatives; this would focus on the further development and use of DAOS.

Publications

    D. Cicalese et al., The design of a distributed key-value store for petascale hot storage in data acquisition systems. Published in EPJ Web Conf. 214, 2019. cern.ch/go/xf9H
    P. Czarnul, G. Gołaszewski, G. Jereczek, M. Maciejewski, Development and benchmarking a parallel Data AcQuisition framework using MPI with hash and hash+tree structures in a cluster environment. Published at the 19th International Symposium on Parallel and Distributed Computing, 2020. cern.ch/go/lK78
    A. A. Abud, D. Cicalese, G. Jereczek, F. L. Goff, G. L. Miotto, J. Love, M. Maciejewski, R. K. Mommsen, J. Radtke, J. Schmiegel, M. Szychowska, Let’s get our hands dirty: a comprehensive evaluation of DAQDB, key-value store for petascale hot storage. Published at the 24th International Conference on Computing in High Energy and Nuclear Physics, 2020. cern.ch/go/7JGF

Presentations

    M. Maciejewski, Persistent Memory based Key-Value Store for Data Acquisition Systems (25 September). Presented at IXPUG 2019 Annual Conference, Geneva, 2019. cern.ch/go/9cFB
    G. Jereczek, Let's get our hands dirty: a comprehensive evaluation of DAQDB, key-value store for petascale hot storage (5 November). Presented at the 4th International Conference on Computing in High-Energy and Nuclear Physics (CHEP), Adelaide, 2019. cern.ch/go/9cpL8
    J. Radtke, A Key-Value Store for Data Acquisition Systems (April). Presented at SPDK, PMDK and VTune™ Summit 04'19, Santa Clara, 2019. cern.ch/go/H6Rl
    G. Jereczek, The design of a distributed key-value store for petascale hot storage in data acquisition systems (12 July). Presented at 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP), Sofia, 2018. cern.ch/go/6hcX
    J. M. Maciejewski, A key-value store for Data Acquisition Systems (12 September). Presented at ATLAS TDAQ week, Cracow, 2018.
    G. Jereczek, M. Maciejewski, Data Acquisition Database (12 November). Presented at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, 2018.
    M. Maciejewski, J. Radtke, The Design of Key-Value Store for Data Acquisition Systems (5 December). Presented at NVMe Developer Days, San Diego, 2018.
    P. Czarnul, G. Gołaszewski, G. Jereczek, M. Maciejewski, Development and benchmarking a parallel Data AcQuisition framework using MPI with hash and hash+tree structures in a cluster environment (5-8 July). Presented at the 19th International Symposium on Parallel and Distributed Computing, Warsaw, 2020. cern.ch/go/W9zL
    A. A. Abud, Experience and performance of persistent memory for the DUNE data acquisition system (12-24 October). Presented at the 22nd virtual IEEE Realtime Conference, 2020. cern.ch/go/P8sD

Testbed for GPU-accelerated applications 

Project goal

The goal of this project is to adapt computing models and software to exploit fully the potential of GPUs. The project, which began in late 2018, consists of ten individual use cases.

The technical coordinators are as follows:

Andrea Bocci, Felice Pantaleo, Maurizio Pierini, Federico Carminati, Vincenzo Innocente, Marco Rovere, Jean-Roch Vlimant, Vladimir Gligorov, Daniel Campora, Riccardo De Maria, Adrian Oeftiger, Lotta Mether, Ian Fisk, Lorenzo Moneta, Jan Kieseler, and Sofia Vallecorsa.

R&D topic
Computing performance and software
Project coordinator(s)
Maria Girone
Team members
Mary Touranakou, Thong Nguyen, Javier Duarte, Olmo Cerri, Roel Aaij, Dorothea Vom Bruch, Blaise Raheem Delaney, Ifan Williams, Niko Neufeld, Viktor Khristenko, Florian Reiss, Guillermo Izquierdo Moreno, Luca Atzori, Miguel Fontes Medeiros
Collaborator liaison(s)
Cosimo Gianfreda (E4), Daniele Gregori (E4), Agnese Reina (E4), Piero Altoé (NVIDIA), Andreas Hehn (NVIDIA), Tom Gibbs (NVIDIA)

Collaborators

Project background

Heterogeneous computing architectures will play an important role in helping CERN address the computing demands of the HL-LHC.

Recent progress

This CERN openlab project supports several use cases at CERN. This section outlines the progress made in the two main use cases that were worked on in 2020.

Allen: a high-level trigger on GPUs for LHCb

‘Allen’ is an initiative to develop a complete high-level trigger (the first step of the data-filtering process following particle collisions) on GPUs for the LHCb experiment. It has benefitted from support through CERN openlab, including consultation from engineers at NVIDIA.

The new system processes 40 Tb/s using around 350 of the latest-generation NVIDIA GPU cards. From a physics standpoint, Allen matches the charged-particle reconstruction performance achieved on traditional CPUs. It has also been shown that Allen will not be limited by I/O or memory. Moreover, it can not only perform reconstruction, but also take decisions about whether to keep or reject events.
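The two figures above imply a substantial per-card load. A quick calculation, using only the numbers quoted in this paragraph:

```python
# Back-of-the-envelope throughput per GPU card in Allen,
# using the total rate and card count quoted above.
total_rate_tbit_s = 40   # terabits per second into the high-level trigger
n_gpus = 350             # approximate number of GPU cards

per_gpu_gbit_s = total_rate_tbit_s * 1000 / n_gpus  # gigabits per second per card
print(f"~{per_gpu_gbit_s:.0f} Gb/s per GPU")        # ≈ 114 Gb/s
```

Each card must therefore sustain on the order of 100 Gb/s, which is why demonstrating that Allen is not I/O- or memory-limited was an important result.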

A diverse range of algorithms has been implemented efficiently in Allen. This demonstrates the potential for GPUs to be used not only as accelerators, but also as complete and standalone data-processing solutions.

In May 2020, Allen was adopted by the LHCb collaboration as the new baseline first-level trigger for Run 3. The Technical Design Report for the system was approved in June. From the start, Allen has been designed to be a framework that can be used in a general manner for high-throughput GPU computing. A workshop was held with core Gaudi developers and members of the CMS and ALICE experiments to discuss how best to integrate Allen into the wider software ecosystem beyond the LHCb experiment. The LHCb team working on Allen is currently focusing on commissioning the system for data-taking in 2022 (delayed from the original 2021 start date due to the COVID-19 pandemic). A readiness review is taking place in the first half of 2021.

End-to-end multi-particle reconstruction for the HGCal based on machine learning

The CMS High-Granularity Calorimeter (HGCal) will replace the end-cap calorimeters of the CMS detector for the operation of the High-Luminosity LHC. With about 2 million sensors and high lateral and transversal granularity, it provides huge potential for new physics discoveries. We aim to exploit this using end-to-end optimisable graph neural networks.

Profiting from new machine-learning concepts developed within the group and through the CERN openlab collaboration with the Flatiron Institute in New York, US, we were able to develop and train a first prototype for directly reconstructing incident-particle properties from raw detector hits. Through our direct contact with NVIDIA, we were able to implement custom TensorFlow GPU kernels. Together with the dedicated neural-network structure, these enabled us to process the hits of an entire particle-collision event in one go on the GPU.
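The core idea of processing detector hits with a graph neural network can be sketched schematically. The snippet below is a pure-Python illustration of one message-passing round over hits, with made-up coordinates and a hand-rolled aggregation rule; the actual HGCal prototype uses trained TensorFlow models with custom GPU kernels, not this code.

```python
# Schematic message-passing step over detector hits (illustrative only).
# Each hit is (x, y, energy); edges connect spatially nearby hits.
hits = [(0.0, 0.0, 5.0), (0.1, 0.0, 3.0), (5.0, 5.0, 2.0)]

def neighbours(i, radius=1.0):
    # Hits within `radius` of hit i are its graph neighbours.
    xi, yi, _ = hits[i]
    return [j for j, (xj, yj, _) in enumerate(hits)
            if j != i and (xi - xj) ** 2 + (yi - yj) ** 2 <= radius ** 2]

def message_pass(features):
    """One round: each hit's feature becomes its own feature plus the
    mean of its neighbours' features (a stand-in for a learned update)."""
    out = []
    for i, f in enumerate(features):
        nbrs = neighbours(i)
        agg = sum(features[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        out.append(f + agg)
    return out

energies = [h[2] for h in hits]
updated = message_pass(energies)
# Hits 0 and 1 are close and exchange information; hit 2 is isolated.
```

Stacking such rounds lets information flow across the whole event, which is what allows the network to reconstruct particle properties from all hits in one pass on the GPU.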

Next steps

Work related to each of the project’s use cases will continue in 2021.


Presentations

    A. Bocci, Towards a heterogeneous High Level Trigger farm for CMS (13 March). Presented at ACAT2019, Saas Fee, 2019. cern.ch/go/D9SF
    F. Pantaleo, Patatrack: accelerated Pixel Track reconstruction in CMS (2 April). Presented at Connecting the Dots 2019, Valencia, 2019. cern.ch/go/7D8W
    R. Kansal, Deep Graph Neural Networks for Fast HGCAL Simulation (13 August). Presented at CERN openlab summer-student lightning talk session, Geneva, 2019. cern.ch/go/qh6G
    A. Bocci, Heterogeneous reconstruction: combining an ARM processor with a GPU (4 November). Presented at CHEP2019, Adelaide, 2019. cern.ch/go/7bmH
    A. Bocci, Heterogeneous online reconstruction at CMS (7 November). Presented at 24th International Conference on Computing in High-Energy and Nuclear Physics (CHEP) 2019, Adelaide, 2019. cern.ch/go/l9JN