Modernising code plays a vital role in preparing for future upgrades to the LHC and the experiments. It is essential that software performance is continually increased by making use of modern coding techniques and tools, such as parallel programming languages, portability libraries, etc. It is also important to ensure that software fully exploits the features offered by modern hardware architecture, such as many-core platforms, acceleration coprocessors, and innovative heterogeneous combinations of CPUs, GPUs, FPGAs, or dedicated deep-learning architectures. At the same time, it is of paramount importance that physics performance is not compromised in its drive to ensure maximum efficiency.


Testbed for GPU-accelerated applications 

Project goal

The goal of this project is to adapt computing models and software to exploit fully the potential of GPUs. The project, which began in late 2018, consists of ten individual use cases.

The technical coordinators are as follows:

Andrea Bocci, Felice Pantaleo, Maurizio Pierini, Federico Carminati, Vincenzo Innocente, Marco Rovere, Jean-Roch Vlimant, Vladimir Gligorov, Daniel Campora, Riccardo De Maria, Adrian Oeftiger, Lotta Mether, Ian Fisk, Lorenzo Moneta, Sofia Vallecorsa.

R&D topic
Computing performance and software
Project coordinator(s)
Maria Girone
Team members
Mary Touranakou, Thong Nguyen, Javier Duarte, Olmo Cerri, Jan Kieseler, Roel Aaij, Dorothea Vom Bruch, Blaise Raheem Delaney, Ifan Williams, Niko Neufeld, Viktor Khristenko, Florian Reiss, Guillermo Izquierdo Moreno, Luca Atzori, Miguel Fontes Medeiros
Collaborator liaison(s)
Cosimo Gianfreda (E4), Daniele Gregori (E4), Agnese Reina (E4), Piero Altoé (NVIDIA), Andreas Hehn (NVIDIA), Tom Gibbs (NVIDIA).

Collaborators

Project background

Heterogeneous computing architectures will play an important role in helping CERN address the computing demands of the HL-LHC.

Recent progress

This section outlines the progress made in the six main use cases that were worked on in 2019. These relate to computing performance and software, as well as to machine learning and data analytics.

1. Simulation of sparse datasets for realistic detector geometries

We are working to generate sparse datasets for realistic detector geometries using deep generative models, such as adversarially trained networks or variational autoencoders. To this end, we are investigating custom loss functions able to deal with the specific characteristics of LHC data. We plan to optimise model inference on GPUs, with a view to delivering a production-ready version of the model to the LHC experiments.

Work got underway in late 2019. We began designing and training the model on two benchmark datasets (the Modified National Institute of Standards and Technology dataset and the LHC Jet dataset) before moving to the actual problem: generation of detector hits in a realistic setup. Once we have converged on the full model design and the custom setup (loss function, etc.), we plan to move to a realistic dataset and scale up the problem.
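One challenge that such custom loss functions must address is sparsity: most cells in a detector image are empty, so a naive reconstruction loss rewards a model for predicting zeros everywhere. The sketch below illustrates the general idea with a weighted mean-squared error that up-weights occupied cells, applied to a toy autoencoder on random sparse data; it is a minimal example for illustration, not the project’s actual model or loss.

    import numpy as np
    import tensorflow as tf

    def sparse_weighted_mse(y_true, y_pred):
        # Up-weight occupied cells so the model is not rewarded
        # for simply predicting an empty detector.
        weights = tf.where(y_true > 0.0, 10.0, 1.0)
        return tf.reduce_mean(weights * tf.square(y_true - y_pred))

    # Tiny autoencoder on toy "detector images" (random, ~90% sparse).
    inputs = tf.keras.Input(shape=(256,))
    encoded = tf.keras.layers.Dense(16, activation="relu")(inputs)
    decoded = tf.keras.layers.Dense(256, activation="relu")(encoded)
    autoencoder = tf.keras.Model(inputs, decoded)
    autoencoder.compile(optimizer="adam", loss=sparse_weighted_mse)

    x = np.random.rand(512, 256).astype("float32")
    x[x < 0.9] = 0.0  # zero out ~90% of cells
    autoencoder.fit(x, x, epochs=1, batch_size=64)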

2. ‘Patatrack’ software R&D incubator

The Patatrack initiative is focused on exploiting new hardware and software technologies for sustainable computing at the CMS experiment. During 2019, the Patatrack team demonstrated that it is possible to run some of its particle-collision reconstruction algorithms on NVIDIA GPUs. Doing so led to a computing performance increase of an order of magnitude.

The algorithms were initially developed using NVIDIA’s CUDA platform. They were then ported, on an ad hoc basis, to run on conventional CPUs, producing identical results with close to native performance. When run on an NVIDIA Tesla T4 GPU, the algorithms achieve twice the throughput of a full dual-socket Intel Xeon Skylake Gold node.
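The underlying idea of maintaining a single kernel definition that can be compiled for either a CPU or a GPU can be illustrated outside of CUDA C++. The following sketch uses Python with Numba; it is not Patatrack code, merely the same write-once principle in a different toolchain.

    import numpy as np
    from numba import vectorize

    def make_saxpy(target):
        # One kernel definition, compiled for the requested target.
        @vectorize(["float32(float32, float32, float32)"], target=target)
        def saxpy(a, x, y):
            return a * x + y
        return saxpy

    saxpy_cpu = make_saxpy("cpu")
    # saxpy_gpu = make_saxpy("cuda")  # same source, GPU build
    #                                 # (requires an NVIDIA GPU and CUDA toolkit)

    x = np.arange(1024, dtype=np.float32)
    y = np.ones(1024, dtype=np.float32)
    print(saxpy_cpu(np.float32(2.0), x, y)[:4])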

Performance portability will be explored during 2020. A comparison of the tested solutions — in terms of supported features, performance, ease of use, integration in wider frameworks, and future prospects — will also be carried out.

3. Benchmarking and optimisation of TMVA deep learning

ROOT is an important data-analysis framework used at CERN. This use case focuses on optimising the training and evaluation performance of ROOT’s TMVA (Toolkit for Multivariate Data Analysis) on NVIDIA GPUs.

During 2019, Joanna Niermann (a student supported by CERN openlab) contributed a new implementation of the TMVA convolution operators. This makes use of NVIDIA’s cuDNN library and led to a significant boost in performance when training or evaluating deep-learning models.

We also performed a comparison study with Keras and TensorFlow. This showed better computational performance for the TMVA implementation, especially when using smaller models. Our new implementation was released in ROOT version 6.20.

In addition, we developed new GPU implementations using cuDNN for recurrent neural networks. These also showed very good computational performance and will be integrated into the upcoming ROOT version 6.22.   
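At the level of the public TMVA interface, offloading training to a GPU amounts to booking a deep-learning method with the appropriate architecture option. The sketch below is a self-contained toy in PyROOT with random signal and background trees; the option string is indicative rather than exhaustive, and ‘Architecture=GPU’ requires a ROOT build with CUDA/cuDNN support (substitute ‘Architecture=CPU’ otherwise).

    import numpy as np
    import ROOT

    # Build small toy signal/background trees so the example is self-contained.
    data_file = ROOT.TFile("toy.root", "RECREATE")
    trees = {}
    for name, shift in (("sig", 0.5), ("bkg", -0.5)):
        tree = ROOT.TTree(name, name)
        x = np.zeros(1, dtype=np.float32)
        y = np.zeros(1, dtype=np.float32)
        tree.Branch("x", x, "x/F")
        tree.Branch("y", y, "y/F")
        for _ in range(2000):
            x[0], y[0] = np.random.normal(shift, 1.0, 2)
            tree.Fill()
        tree.Write()
        trees[name] = tree

    out = ROOT.TFile("tmva_out.root", "RECREATE")
    factory = ROOT.TMVA.Factory("demo", out, "!V:!Silent:AnalysisType=Classification")
    loader = ROOT.TMVA.DataLoader("dataset")
    loader.AddVariable("x")
    loader.AddVariable("y")
    loader.AddSignalTree(trees["sig"], 1.0)
    loader.AddBackgroundTree(trees["bkg"], 1.0)
    loader.PrepareTrainingAndTestTree(ROOT.TCut(""), "SplitMode=Random:!V")

    # Indicative options for the TMVA deep-learning method (kDL).
    opts = ("!H:!V:ErrorStrategy=CROSSENTROPY:InputLayout=1|1|2:"
            "Layout=DENSE|64|RELU,DENSE|1|LINEAR:"
            "TrainingStrategy=LearningRate=1e-3,BatchSize=64,MaxEpochs=5:"
            "Architecture=CPU")  # use Architecture=GPU on a cuDNN-enabled build
    factory.BookMethod(loader, ROOT.TMVA.Types.kDL, "DL_demo", opts)
    factory.TrainAllMethods()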

4. Distributed training

There is a growing need within the high-energy physics community for an HPC-ready turnkey solution for the distributed training and optimisation of neural networks. We aim to build software with a streamlined user interface that federates the various available frameworks for distributed training. The goal is to deliver the best possible performance to the end user, possibly through a ‘training-as-a-service’ system for HPC.

In 2019, we prepared our software for neural-network training and optimisation. User-friendly and HPC-ready, it has been trialled at multiple institutions. There are, however, outstanding performance issues to be ironed out. We will also work in 2020 to build a community around our final product.
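Our own interface is not reproduced here, but the sketch below shows the data-parallel pattern that such a turnkey system federates, using Horovod with Keras as one example backend (and random toy data in place of a physics dataset): each worker is pinned to one GPU, gradients are averaged across workers, and the initial weights are broadcast from rank 0.

    import numpy as np
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()  # one process per GPU, e.g.: horovodrun -np 4 python train.py

    # Pin each worker to a single GPU, if any are available.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        tf.config.experimental.set_visible_devices(gpus[hvd.local_rank()], "GPU")

    x_train = np.random.rand(1024, 784).astype("float32")  # toy stand-in data
    y_train = np.random.randint(0, 10, size=1024)

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    # Wrap the optimiser so gradients are averaged across all workers;
    # the learning rate is scaled by the number of workers, as is customary.
    opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(1e-3 * hvd.size()))
    model.compile(optimizer=opt, loss="sparse_categorical_crossentropy")

    model.fit(x_train, y_train, batch_size=64, epochs=1,
              callbacks=[hvd.callbacks.BroadcastGlobalVariablesCallback(0)],
              verbose=1 if hvd.rank() == 0 else 0)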

5. Integration of SixTrackLib and PyHEADTAIL

For optimal performance and hardware utilisation, it is crucial that the particle state be shared in place between the codes SixTrackLib (used for single-particle tracking) and PyHEADTAIL (used for simulating collective macro-particle beam dynamics). This avoids the memory and run-time costs of maintaining two copies of the state on the GPU. The current implementation achieves this through implicit context sharing, enabling seamless hand-off of control over the shared state between the two codebases. After a first proof-of-concept implementation was created at an E4-NVIDIA hackathon in April 2019, the solution was refined and the APIs of the libraries were adapted to support this mode of operation.
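The hand-off pattern can be sketched as follows. The class and method names below are illustrative stand-ins, not the real SixTrackLib or PyHEADTAIL APIs, and plain NumPy arrays stand in for the GPU-resident buffers; the point is that both engines update the same particle state in place, so control can alternate between them every turn without any copies.

    import numpy as np

    class TrackingEngine:             # stand-in for SixTrackLib
        def track_turn(self, x, px):
            x += 0.1 * px             # in-place update: no copy is made

    class CollectiveEngine:           # stand-in for PyHEADTAIL
        def apply_kick(self, x, px):
            px -= 0.01 * np.mean(x)   # acts on the very same arrays

    x = np.random.randn(10000)        # shared particle state
    px = np.zeros_like(x)
    tracker, collective = TrackingEngine(), CollectiveEngine()
    for turn in range(100):
        tracker.track_turn(x, px)     # control passes between the engines,
        collective.apply_kick(x, px)  # while the particle state stays put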

Work carried out within this project made it possible for PyHEADTAIL to rely on SixTrackLib for high-performance tracking on GPUs, resulting in performance improvements of up to 2-3 orders of magnitude compared to state-of-the-art single-threaded CPU-based code. We also exposed SixTrackLib to new applications and use-cases for particle tracking, which led to several improvements and bug fixes.

We are now working on further optimisation, as well as extending the integration to new applications. We will also work to expand the user community in 2020.

6. Allen: a high-level trigger on GPUs for LHCb

‘Allen’ is an initiative to develop a complete high-level trigger (the first step of the data-filtering process following particle collisions) on GPUs for the LHCb experiment. It has benefitted from support through CERN openlab, including consultation from engineers at NVIDIA.

The new system processes 40 Tb/s using around 500 of the latest-generation NVIDIA GPU cards, corresponding to roughly 80 Gb/s (about 10 GB/s) per card. From a physics point of view, Allen matches the reconstruction performance for charged particles achieved on traditional CPUs. It has also been shown that Allen will not be I/O or memory limited. Moreover, it can be used not only to perform reconstruction, but also to make decisions about whether to keep or reject events.

A diverse range of algorithms have been implemented efficiently on Allen. This demonstrates the potential for GPUs not only to be used as ‘accelerators’ in high-energy physics, but also as complete and standalone data-processing solutions.

Allen is now in the final stages of an LHCb collaboration review to decide whether it will be used as the new baseline solution for the next run of the LHC.

Next steps

As outlined above, work related to each of these use cases will continue in 2020. Work will also begin on a number of new use cases.


Presentations

    A. Bocci, Towards a heterogeneous High Level Trigger farm for CMS (13 March). Presented at ACAT2019, Saas Fee, 2019. cern.ch/go/D9SF
    F. Pantaleo, Patatrack: accelerated Pixel Track reconstruction in CMS (2 April). Presented at Connecting the Dots 2019, Valencia, 2019. cern.ch/go/7D8W
    R. Kansal, Deep Graph Neural Networks for Fast HGCAL Simulation (13 August). Presented at CERN openlab summer-student lightning talk session, Geneva, 2019. cern.ch/go/qh6G
    A. Bocci, Heterogeneous reconstruction: combining an ARM processor with a GPU (4 November). Presented at the 24th International Conference on Computing in High-Energy and Nuclear Physics (CHEP 2019), Adelaide, 2019. cern.ch/go/7bmH
    A. Bocci, Heterogeneous online reconstruction at CMS (7 November). Presented at the 24th International Conference on Computing in High-Energy and Nuclear Physics (CHEP 2019), Adelaide, 2019. cern.ch/go/l9JN

High-performance cloud caching technologies

Project goal

We are exploring the suitability of a new infrastructure for key-value storage in the data-acquisition systems of particle-physics experiments. DAQDB (Data Acquisition Database) is a scalable and distributed key-value store that provides low-latency queries. It exploits Intel Optane DC Persistent Memory, a cutting-edge non-volatile memory technology that could make it possible to decouple real-time data acquisition from asynchronous event selection.

R&D topic
Computing performance and software
Project coordinator(s)
Giovanna Lehmann Miotto
Team members
Danilo Cicalese, Fabrice Le Goff, Jeremy Love, Remigius K Mommsen
Collaborator liaison(s)
Grzegorz Jereczek, Maciej Maciejewski, Jakub Radtke, Jakub Schmiegel, Malgorzata Szychowska, Aleksandra Jereczek, Adrian Pielech, Claudio Bellini

Collaborators

Project background

Upgrades to the LHC mean that the data rates coming from the detectors will increase dramatically. Data will need to be buffered while waiting for the systems that select interesting collision events for analysis. However, due to capacity constraints and the high cost of DRAM, the current buffers at the readout nodes can only store a few seconds of data. It is therefore important to explore new, cost-effective solutions that capitalise on emerging technologies and are capable of handling large amounts of data.
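Schematically, a key-value store decouples the two sides of this dataflow: the real-time readout path only has to write buffered events into the store, while event selection consumes them asynchronously at its own pace. In the sketch below, an in-process dictionary and queue stand in for DAQDB and its notification mechanism; the names are illustrative, not the DAQDB API, and the real store is distributed, network-accessible and backed by persistent memory.

    import queue

    store = {}                       # stand-in for the DAQDB key-value store
    ready = queue.Queue()            # stand-in for readout-to-filter signalling

    def readout(event_id, raw_data):
        store[event_id] = raw_data   # real-time path: just buffer the event
        ready.put(event_id)

    def select(event_id):
        data = store.pop(event_id)   # asynchronous path: fetch and filter
        return sum(data) > 100       # toy keep/reject criterion

    for i in range(5):
        readout(i, [i * 30, i * 40])
    while not ready.empty():
        eid = ready.get()
        print(eid, "keep" if select(eid) else "reject")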

Recent progress

During 2018, we worked to assess the potential of this new approach. We collaborated closely with Intel on this, coordinating with them on key design choices for the project.

Dedicated servers are needed to make use of the newest hardware, such as persistent memory and NVMe SSDs. We set up the new hardware at CERN and integrated it into the existing data-acquisition software platforms for the ATLAS experiment. We then tested DAQDB thoroughly, providing feedback to the developers.

In addition, we explored a range of alternative solutions for modifying the current ATLAS data-acquisition dataflow and integrating the key-value store. We successfully integrated the data-acquisition software and were also able to separate the readout from the storage nodes, thus making them independent.

Next steps

In the first part of 2019, we will evaluate the system’s performance in a small-scale test. We will then focus our attention on assessing and improving performance in a more realistic scenario, with a large-scale experiment. We will test Intel Optane DC persistent memory and Intel Optane DC SSDs. In parallel, we will begin work to integrate the system into other experiments, such as CMS and ProtoDUNE.

Presentations

    G. Jereczek, The design of a distributed key-value store for petascale hot storage in data acquisition systems (12 July). Presented at 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP), Sofia, 2018. cern.ch/go/6hcX
    M. Maciejewski, A key-value store for Data Acquisition Systems (12 September). Presented at ATLAS TDAQ week, Cracow, 2018.
    G. Jereczek, M. Maciejewski, Data Acquisition Database (12 November). Presented at The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), Dallas, 2018.
    M. Maciejewski, J. Radtke, The Design of Key-Value Store for Data Acquisition Systems (5 December). Presented at NVMe Developer Days, San Diego, 2018.

Code modernisation

Project goal

Through the fast-simulation project, we are working to develop the next-generation simulation software used for describing the passage of particles through matter. We aim to recast classical particle-transport simulation in a form that enhances both code and data locality, as well as its potential for vectorisation, using instruction-level parallelism to improve performance. Another important goal is to integrate generic fast-simulation techniques based on machine-learning approaches.

R&D topic
Computing performance and software
Project coordinator(s)
Federico Carminati
Team members
Andrei Gheata, Andrea Luiselli (Intel), Sofia Vallecorsa, Andrea Zanetti (Intel)
Collaborator liaison(s)
Claudio Bellini, Laurent Duhem

Collaborators

Project background

The main driving force behind this work is the LHC experiments’ vital need to increase the throughput of their simulated data samples for the HL-LHC era. Conservative projections suggest that simulation needs are likely to increase by a factor of ten compared to today. We expect that a speed-up of a factor of up to five can be achieved through code modernisation, with the additional speed-up being driven by fast-simulation approaches. Our research into new vectorisation techniques and the development of vectorised modules also has benefits in many other areas of computing for high-energy physics beyond simulation.

Recent progress

During 2017, we prototyped a new scheduling approach that splits the stepping procedure for particle tracks into stages, accumulating several particles before actually performing the actions involved at each stage. This approach makes it possible to vectorise additional components of the framework, such as magnetic-field propagation and the physics models. This new version has a much smaller memory footprint, is topology aware, and can be configured to benefit from memory locality. In addition, we carried out work to vectorise the modules for magnetic field propagation, as well as those for geometry navigation.
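The basketised, structure-of-arrays approach can be illustrated with a toy example. In the sketch below, the track state is held in arrays rather than in per-track objects, and each stage acts on a whole basket of particles at once, which is what exposes the inner loops to SIMD vectorisation; the transport and energy-loss ‘physics’ are placeholders, not GeantV code.

    import numpy as np

    n = 1024                                      # tracks per basket
    pos = np.zeros((n, 3))                        # structure-of-arrays track state
    dirs = np.random.randn(n, 3)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    energy = np.full(n, 10.0)                     # toy value, e.g. GeV

    def step_basket(pos, dirs, energy, step=0.1):
        pos += step * dirs                        # geometry stage: all tracks at once
        energy -= np.random.exponential(0.01, n)  # physics stage: vectorised
        return energy > 0.0                       # alive mask for the next stage

    alive = step_basket(pos, dirs, energy)
    print(f"{alive.sum()} of {n} tracks still alive after one step")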

Next steps

Development work on technologies to accelerate physics simulation workloads will continue; this work is being pursued as part of an international collaboration and is one of the main R&D activities of the ‘software development for experiments’ group at CERN’s Experimental Physics department. The project is now targeting possible applications in the existing LHC experiment frameworks, as well as extension to use cases in other fields of scientific research, such as image analysis or biological simulation.

Publications

    G. Amadio et al., GeantV alpha release, Proc. Advanced Computing and Analysis Techniques in Physics Research, Seattle, USA, 2017. cern.ch/go/67dM

Presentations

    F. Carminati, A. Gheata and S. Vallecorsa, The GeantV prototype on KNL (11 March). Presented at the 2017 Intel Xeon Phi User's Group (IXPUG) Annual Spring Conference, Cambridge, 2017. cern.ch/go/D7sG