Evaluation of Power CPU Architectures for Deep Learning

Project Goal

We are investigating the performance of distributed training and inference of different deep learning models on a cluster of IBM Power8 CPUs (with NVIDIA V100 GPUs) installed at CERN. A series of deep neural networks is being developed to reproduce the initial steps in the data-processing chain of the DUNE experiment. To do so, we have investigated how to adapt computer vision techniques to detector data analysis. More specifically, a combination of convolutional neural networks and graph neural networks is being designed for various tasks in the data-processing chain of neutrino experiments: reducing noise, selecting specific portions of the data to focus on during the reconstruction step (region selector), and clustering the detector output into particle trajectories.

Background

Neutrinos are elusive particles: they have a very low probability of interacting with other matter. To maximise the likelihood of detection, neutrino detectors are built as large, sensitive volumes. Such detectors produce very large data sets. Although large in size, these data sets are usually very sparse, meaning dedicated techniques are needed to process them efficiently. Deep learning methods are being investigated by the community with great success.
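To illustrate why sparsity matters, the following minimal sketch stores only the non-zero entries of a mostly empty two-dimensional readout. The grid size, the hit values and the function names (`to_sparse`, `occupancy`) are hypothetical and chosen purely for illustration; they are not taken from the experiment software.

```python
# Toy example: a mostly empty 2D readout (e.g. channel x time tick).
# All sizes and values here are illustrative assumptions.

def to_sparse(dense, threshold=0.0):
    """Return {(row, col): value} for entries whose magnitude exceeds threshold."""
    return {
        (r, c): v
        for r, row in enumerate(dense)
        for c, v in enumerate(row)
        if abs(v) > threshold
    }


def occupancy(dense, sparse):
    """Fraction of grid cells that actually carry a signal."""
    total = sum(len(row) for row in dense)
    return len(sparse) / total


# Build a 6 x 8 grid of zeros and place three hypothetical hits in it.
dense = [[0.0] * 8 for _ in range(6)]
dense[1][2] = 3.5
dense[1][3] = 4.0
dense[4][6] = 2.25

sparse = to_sparse(dense)
fraction = occupancy(dense, sparse)
```

Only three of the 48 cells are kept, so both storage and any per-hit computation scale with the number of hits rather than with the size of the detector readout.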

Progress & Achievements

We have developed a series of deep neural network architectures based on a combination of two-dimensional convolutional layers and graphs, including a model inspired by the popular U-Net architecture. These networks can analyse both real and simulated data from protoDUNE and perform region selection, de-noising and clustering tasks, which are usually applied to the raw detector data before any other processing is run.

All of these methods improve on the classical approaches currently integrated in the experiment software stack. They are implemented using a platform-agnostic format, ONNX, and executed with ONNX Runtime, which allows the models to run on different hardware. To reduce training time and set up hyper-parameter scans, the training process for the networks is parallelised and has been benchmarked on the IBM Minsky cluster.

Following the data-parallel approach to distributed learning, we trained our models on a total of twelve GPUs, distributed over the three nodes that comprise the test Power cluster. Each GPU ingests a unique part of the physics dataset for training the model.
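The data-parallel scheme can be sketched in plain Python as follows. This is only an illustration of the general idea, not the project's training code: the even split of four GPUs per node, the strided sharding rule, the toy one-parameter linear model and all function names are assumptions made for the example.

```python
# Minimal data-parallel sketch: each worker (GPU) sees a disjoint shard of the
# data, computes a local gradient, and the gradients are averaged.
# Assumption: the twelve GPUs are split evenly, four per node.
NODES = 3
GPUS_PER_NODE = 4
WORLD_SIZE = NODES * GPUS_PER_NODE  # 12 workers in total


def shard_indices(num_samples, rank, world_size=WORLD_SIZE):
    """Strided split: worker `rank` takes samples rank, rank + world_size, ..."""
    return list(range(rank, num_samples, world_size))


def local_gradient(w, xs, ys):
    """Gradient w.r.t. w of the mean of 0.5 * (w * x - y)**2 over local samples."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def all_reduce_mean(values):
    """Stand-in for the collective operation that averages gradients across workers."""
    return sum(values) / len(values)


# Toy dataset: y = 2x, with 24 samples so every worker gets exactly two.
xs = [float(i) for i in range(24)]
ys = [2.0 * x for x in xs]
w = 0.0  # initial model parameter

shards = [shard_indices(len(xs), rank) for rank in range(WORLD_SIZE)]
local_grads = [
    local_gradient(w, [xs[i] for i in shard], [ys[i] for i in shard])
    for shard in shards
]

global_grad = all_reduce_mean(local_grads)  # what every worker applies
full_grad = local_gradient(w, xs, ys)       # single-worker reference
```

Because every shard has the same size, the averaged gradient matches the gradient computed on the full dataset, so all workers can apply the same update while each processes only one twelfth of the samples.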

This project concluded in March 2023.

Project Coordinators: Maria Girone, Sofia Vallecorsa

Technical Team: Marco Rossi

IBM Collaboration Liaisons: Eric Aquaronne, Oliver Bethmann

In partnership with: IBM