Dynamical Exascale Entry Platform – Extreme Scale Technologies (DEEP-EST)

Project goal

The main focus of the project is to build a new kind of system that can efficiently run a wide range of applications, each with different requirements, on new types of high-performance computing (HPC) resources. From machine-learning applications to traditional HPC applications, the goal is to build an environment capable of accommodating workloads that pose completely different challenges for the system.

Collaborators:

DEEP-EST collaboration

R&D topic: Data-centre technologies and infrastructures
Project coordinator(s): Maria Girone (for DEEP-EST task 1.7)
Team members: Viktor Khristenko (for DEEP-EST task 1.7)

Project background

DEEP-EST is an EC-funded project that launched in 2017, following on from the successful DEEP and DEEP-ER projects. The project involves 27 partners in more than 10 countries and is coordinated by the Jülich Supercomputing Centre at Forschungszentrum Jülich in Germany.

Overall, the goal is to create a modular supercomputer that best fits the requirements of diverse, increasingly complex, and newly emerging applications. The innovative modular supercomputer architecture creates a unique HPC system by coupling various compute modules according to the building-block principle: each module is tailored to the needs of a specific group of applications, with all modules together behaving as a single machine.

Specifically, the prototype consists of three compute modules: the cluster module (CM), the extreme scale booster (ESB), and the data-analytics module (DAM).

  • Applications requiring high single-thread performance are targeted to run on the CM nodes, where Intel® Xeon® Scalable processors (Skylake) provide general-purpose performance and energy efficiency.
  • The architecture of ESB nodes was tailored for highly scalable HPC software stacks capable of extracting the enormous parallelism provided by Nvidia V100 GPUs.
  • Flexibility, large memory capacities using Intel® Optane™ DC persistent memory, and different acceleration capabilities (provided by Intel Stratix 10 FPGAs and Nvidia V100 GPUs on each node) are key features of the DAM; they make it an ideal platform for data-intensive and machine-learning applications.
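
The mapping of application types to modules described above can be illustrated with a small sketch. This is only an illustration of the building-block idea: the Workload fields and the pick_module function are hypothetical names introduced here, not part of any DEEP-EST software, and real placement decisions depend on far more than three flags.

```python
from dataclasses import dataclass
from enum import Enum


class Module(Enum):
    """The three compute modules of the prototype, as named in the text."""
    CM = "cluster module"
    ESB = "extreme scale booster"
    DAM = "data-analytics module"


@dataclass(frozen=True)
class Workload:
    # Hypothetical descriptors chosen for this sketch only.
    name: str
    highly_parallel: bool = False   # scales well across many GPUs
    data_intensive: bool = False    # needs large memory or mixed accelerators


def pick_module(workload: Workload) -> Module:
    """Return the module whose description fits the workload best."""
    if workload.data_intensive:
        return Module.DAM
    if workload.highly_parallel:
        return Module.ESB
    # Default: general-purpose nodes with high single-thread performance.
    return Module.CM


if __name__ == "__main__":
    for w in (
        Workload("serial pre-processing"),
        Workload("GPU-scalable simulation", highly_parallel=True),
        Workload("model training", data_intensive=True),
    ):
        print(f"{w.name}: {pick_module(w).value}")
```

In this sketch a single application could be split into several Workload parts, each placed on a different module, which is the sense in which the modules together behave as a single machine.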

CERN, in particular the CMS Experiment, participates by providing one of the applications that are used to evaluate this new supercomputing architecture.

Recent progress

During the past year, the focus was on testing the functionality developed during the previous year using the full DEEP-EST prototype system and optimising it for the needs of the CMS Experiment. In particular, our tests used more than 1000 cores and just under 100 Nvidia GPUs simultaneously, close to 100% of the computing resources available on the prototype. Furthermore, the tests included not only traditional CMS Experiment reconstruction, but also a complex deep-learning pipeline. This workflow was used to benchmark the usage of Nvidia GPUs with the Message Passing Interface (MPI) over InfiniBand for distributed training of a model based on Graph Neural Networks.
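
One common way to distribute training over MPI is data parallelism: each rank computes gradients on its own share of the data, the gradients are averaged across ranks with an all-reduce, and every rank applies the same update. Whether the benchmark used exactly this scheme is an assumption. The sketch below simulates the averaging step in plain Python without MPI; allreduce_mean and sgd_step are hypothetical helper names, and in a real setup the averaging would be performed by an MPI all-reduce operation across processes.

```python
from typing import List


def allreduce_mean(per_rank_grads: List[List[float]]) -> List[float]:
    """Average gradients element-wise across ranks.

    Simulates what an MPI all-reduce (sum) followed by a division by the
    number of ranks would leave on every rank.
    """
    if not per_rank_grads:
        raise ValueError("need gradients from at least one rank")
    n_ranks = len(per_rank_grads)
    n_params = len(per_rank_grads[0])
    if any(len(g) != n_params for g in per_rank_grads):
        raise ValueError("all ranks must hold gradients of the same length")
    return [
        sum(g[i] for g in per_rank_grads) / n_ranks
        for i in range(n_params)
    ]


def sgd_step(weights: List[float], grad: List[float], lr: float) -> List[float]:
    """Apply one plain gradient-descent update."""
    return [w - lr * g for w, g in zip(weights, grad)]


if __name__ == "__main__":
    # Two simulated ranks, each with gradients from its own mini-batch.
    grads = [[1.0, 2.0], [3.0, 4.0]]
    mean_grad = allreduce_mean(grads)
    weights = sgd_step([1.0, 1.0], mean_grad, lr=0.5)
    print(mean_grad, weights)
```

Because every rank applies the same averaged gradient, the model replicas stay identical after each step; the interconnect (InfiniBand in the tests above) determines how costly the averaging is.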

Next steps

Work is ongoing to integrate the Message Passing Interface (MPI) so that the application can use not only the GPUs on the node performing the data processing, but also those on remotely connected nodes. This will allow more efficient utilisation of heterogeneous resources, providing an overall speed-up (i.e. higher throughput) of the CMS reconstruction application. The DEEP-EST project came to a successful conclusion during the first quarter of 2021.

Publications

    The following deliverables were submitted to the European Commission:
    Deliverable 1.2: Application Use Cases and Traces
    Deliverable 1.3: Application Distribution Strategy
    Deliverable 1.4: Initial Application Ports
    M. Girone, Common challenges for HPC integration into LHC computing. Published on Zenodo, 2020. cern.ch/go/7bt9
    The following deliverable was submitted to the European Commission:
    Deliverable 1.5: Final report on application experience

Presentations

    V. Khristenko, CMS Ecal Reconstruction with GPUs (23 October). Presented at CMS ECAL DPG Meeting, Geneva, 2019. cern.ch/go/FC6j
    V. Khristenko, CMS Hcal Reconstruction with GPUs (8 November). Presented at CMS HCAL DPG Meeting, Geneva, 2019. cern.ch/go/P6Js
    V. Khristenko, Exploiting Modular HPC in the context of DEEP-EST and ATTRACT projects (22 January). Presented at CERN openlab Technical Workshop, Geneva, 2020. cern.ch/go/rjC7