Dynamical Exascale Entry Platform – Extreme Scale Technologies (DEEP-EST)

Project goal

The main focus of the project is to build a new kind of system that makes it possible to run a wide range of applications, each with differing requirements, efficiently on new types of high-performance computing (HPC) resources. From machine-learning applications to traditional HPC workloads, the goal is to build an environment capable of accommodating workloads that pose completely different challenges for the system.

R&D topic
Data-centre technologies and infrastructures
Project coordinator(s)
Maria Girone (for DEEP-EST task 1.7)
Team members
Viktor Khristenko (for DEEP-EST task 1.7)

Collaborators

Project background

DEEP-EST is a project funded by the European Commission that launched in 2017, following on from the successful DEEP and DEEP-ER projects. The project involves 27 partners in more than 10 countries and is coordinated by the Jülich Supercomputing Centre at Forschungszentrum Jülich in Germany.

Overall, the goal is to create a modular supercomputer that best fits the requirements of diverse, increasingly complex, and newly emerging applications. The innovative modular supercomputer architecture creates a unique HPC system by coupling various compute modules according to the building-block principle: each module is tailored to the needs of a specific group of applications, with all modules together behaving as a single machine.

Specifically, the prototype consists of three compute modules: the cluster module (CM), the extreme scale booster (ESB), and the data-analytics module (DAM).

  • Applications requiring high single-thread performance are targeted to run on the CM nodes, where Intel® Xeon® Scalable (Skylake) processors provide general-purpose performance and energy efficiency.
  • The architecture of the ESB nodes is tailored to highly scalable HPC software stacks capable of exploiting the enormous parallelism provided by Nvidia V100 GPUs.
  • Flexibility, large memory capacities (using Intel® Optane™ DC persistent memory), and varied acceleration capabilities (an Intel Stratix 10 FPGA and an Nvidia V100 GPU on each node) are the key features of the DAM; they make it an ideal platform for data-intensive and machine-learning applications. (A minimal node-dispatch sketch follows this list.)
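To give a flavour of how a single code base can adapt to this mix of node types, the CUDA sketch below probes for GPUs at runtime and falls back to a CPU path on accelerator-free nodes such as those of the CM. It is a minimal, hypothetical illustration, not code from the project:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical dispatch: take an accelerated path on GPU-equipped
    // ESB/DAM nodes, fall back to the CPU on the GPU-free CM nodes.
    bool node_has_gpu() {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        return err == cudaSuccess && count > 0;
    }

    int main() {
        if (node_has_gpu()) {
            std::printf("GPU present: running accelerated path\n");
            // ... launch reconstruction kernels here ...
        } else {
            std::printf("no GPU: running CPU path\n");
            // ... CPU implementation here ...
        }
        return 0;
    }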

CERN, and in particular the CMS experiment, participates by providing one of the applications used to evaluate this new supercomputing architecture.

Recent progress

We ported the software used by CMS to reconstruct particle-collision events in the hadronic and electromagnetic calorimeters. We then optimised these workloads for Nvidia V100 GPUs and compared their performance against that of the CPU-based systems currently used in production.
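As a rough illustration of the porting pattern, the sketch below maps one GPU thread to one calorimeter channel and computes a simple pedestal-subtracted, gain-calibrated energy estimate. The names, the ten-sample layout, and the algorithm itself are simplified assumptions; the actual CMS reconstruction (e.g. the ECAL 'multifit') is considerably more involved.

    #include <cuda_runtime.h>

    constexpr int kSamples = 10;  // digitised samples per channel (assumption)

    // Hypothetical per-channel energy estimate: one thread per channel.
    // This only illustrates the data-parallel structure of the port,
    // not the real algorithm.
    __global__ void reconstructEnergy(const float* samples,    // [nChannels * kSamples]
                                      const float* pedestals,  // [nChannels]
                                      const float* gains,      // [nChannels]
                                      float* energies,         // [nChannels]
                                      int nChannels) {
        int ch = blockIdx.x * blockDim.x + threadIdx.x;
        if (ch >= nChannels) return;

        float sum = 0.f;
        for (int s = 0; s < kSamples; ++s)
            sum += samples[ch * kSamples + s] - pedestals[ch];
        energies[ch] = gains[ch] * sum;
    }

    void launchReconstruction(const float* d_samples, const float* d_pedestals,
                              const float* d_gains, float* d_energies,
                              int nChannels, cudaStream_t stream) {
        const int threads = 256;
        const int blocks = (nChannels + threads - 1) / threads;
        reconstructEnergy<<<blocks, threads, 0, stream>>>(
            d_samples, d_pedestals, d_gains, d_energies, nChannels);
    }

The one-thread-per-channel mapping is what makes this workload a natural fit for the thousands of concurrent threads available on a V100.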

Next steps

We will incorporate MPI offloading into the ‘CMSSW’ software framework used at the CMS experiment, so that different parts of the reconstruction can run on different hardware. We will also explore the use of FPGAs for reconstruction at CMS. Finally, we will explore the possibility of capitalising on new, faster memory interconnects between CPUs and accelerators, using Intel® oneAPI for development and debugging.
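A minimal sketch of what such offloading could look like is given below, assuming an MPI implementation that supports dynamic process spawning (MPI_Comm_spawn is standard MPI; placing the spawned ranks on a particular module is left to the resource manager). The worker binary name and the data exchanged are hypothetical:

    #include <mpi.h>
    #include <cstdio>

    // Hypothetical host-side offload: the parent job (e.g. on the CM)
    // spawns a worker executable onto GPU nodes and exchanges data over
    // the resulting inter-communicator.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);

        MPI_Comm workers;  // inter-communicator to the spawned ranks
        MPI_Comm_spawn("gpu_reco_worker",  // hypothetical worker binary
                       MPI_ARGV_NULL, /*maxprocs=*/4, MPI_INFO_NULL,
                       /*root=*/0, MPI_COMM_SELF, &workers,
                       MPI_ERRCODES_IGNORE);

        float detectorData[1024] = {};  // placeholder event payload
        MPI_Send(detectorData, 1024, MPI_FLOAT, /*dest=*/0, /*tag=*/0, workers);

        float energies[1024];  // reconstructed quantities sent back
        MPI_Recv(energies, 1024, MPI_FLOAT, 0, 0, workers, MPI_STATUS_IGNORE);
        std::printf("received reconstructed energies from the GPU module\n");

        MPI_Finalize();
        return 0;
    }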

Publications

    The following deliverables were submitted to the European Commission:
    Deliverable 1.2: Application Use Cases and Traces
    Deliverable 1.3: Application Distribution Strategy
    Deliverable 1.4: Initial Application Ports

Presentations

    V. Khristenko, CMS Ecal Reconstruction with GPUs (23 October). Presented at CMS ECAL DPG Meeting, Geneva, 2019. cern.ch/go/FC6j
    V. Khristenko, CMS Hcal Reconstruction with GPUs (8 November). Presented at CMS HCAL DPG Meeting, Geneva, 2019. cern.ch/go/P6Js
    V. Khristenko, Exploiting Modular HPC in the context of DEEP-EST and ATTRACT projects (22 January). Presented at CERN openlab Technical Workshop, Geneva, 2020. cern.ch/go/rjC7