In collaboration with Intel, the CERN openlab Platform Competence Centre (PCC) continues to focus on hardware efficiency, software optimisations and performance measurements, as well as acceleration. This year again, the strong emphasis on teaching and knowledge dissemination has enabled a broader audience to enjoy the fruits of the PCC’s work. 2013 also saw the start of ICE-DIP, the Intel-CERN European Doctorate Industrial Program, which is a European industrial doctorate scheme hosted by CERN and Intel Labs Europe with two Early Stage Researchers (ESRs) joining the PCC. Overall, the last 12 months constituted yet another year of intensive studies and development with tangible effects.
Several changes in the staffing of the PCC team occurred in 2013, which was indeed a dynamic year of growth. In January, Andrzej Nowak took over the PCC leadership responsibilities. At the end of her contract, Mirela-Madalina Botezatu departed for an industrial PhD at ETH Zurich with a major IT company. Liviu Valsan, the PCC hardware manager, joined the CERN IT procurement team, and was replaced by Pawel Szostek, a fellow with strong experience in software and data mining. Out of the five researchers recruited in 2013 for the ICE-DIP project, two are part of the PCC: Aram Santogidis and Przemyslaw Karpinski. Finally, Georgios Bitzes was hired as a Technical Student.
Benchmarking and evaluations
2013 saw the release of the dual-socket Intel® Xeon® E5-2600 v2 family platform ‘Ivy Bridge’, as well as the release of the ‘Haswell’ microarchitecture from Intel. The PCC team manually upgraded around 40 openlab systems to the new Intel Xeon E5-2695 v2 processors, benchmarked the dual-socket platforms and disseminated its experience at a major physics conference. The paper was presented at the International Conference on Computing in High Energy and Nuclear Physics (CHEP) 2013, held in Amsterdam, Netherlands, in October. In the same paper, the team also included experimental results from a desktop system equipped with an Intel® Xeon® E3-1280 v3 CPU, a representative of the new ‘Haswell’ microarchitecture whose most interesting feature is the addition of the Advanced Vector Extensions 2 (AVX2). Most notably, this new generation of processors is capable of performing a multiply and an add in a single instruction (fused multiply-add), a highly valued function in many physics calculations. A separate, comprehensive report covering this feature, as well as vectorisation opportunities and other performance benefits brought by AVX2, was published on the CERN openlab website and is listed in the Education section of this annual report.
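As a purely illustrative aside, the sketch below shows one reason a single-instruction multiply-add is valued beyond raw throughput: the product is not rounded before the addition, so only one rounding occurs. It does not use the hardware instruction; it emulates the single rounding with exact rational arithmetic. The function names and input values are hypothetical and are not taken from the PCC report.

```python
from fractions import Fraction


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply, round to double, then add and round again (two roundings)."""
    return a * b + c


def emulated_fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, round once.

    Fraction(float) is exact, and converting the final Fraction back to
    float performs a single rounding to the nearest double.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Hypothetical inputs chosen so that the exact product, 1 - 2**-60,
# cannot be represented in a double and rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

two_roundings = separate_multiply_add(a, b, c)      # small term is lost
one_rounding = emulated_fused_multiply_add(a, b, c)  # small term survives
```

With these inputs the separate version returns zero, while the single-rounding version keeps the small residual term. That kind of cancellation is common in numerical code.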
Low-power computing remains high on the agenda, which is why the team took an interest in a new pre-production platform from the Intel® Atom™ family (‘Avoton’). The impressive performance-per-watt results obtained on single-blade servers were seen as an encouraging message from the semiconductor world, and the team is looking forward to reproducing them in larger, full-chassis configurations throughout 2014.
A major part of the PCC efforts is dedicated to software, in particular the evaluation of new software versions or new methodologies. In preparation for new capabilities of platforms, the standard benchmark portfolio has been completely revised and new micro-benchmarks have been developed. MLFit, the highly parallel prototype data-analysis application, underwent a major review in collaboration with an expert from the CERN Physics department. The new, leaner version sacrifices some generality for much improved scalability and is now used in addition to its predecessor.
The multi-threaded Geant4 prototype, on which the team has collaborated with the authors since 2008, saw the light of day with a major Geant4 release towards the end of 2013. It offers significant memory savings in multi-threaded mode, as well as the possibility of running on the Intel® Xeon Phi™ coprocessor. Several scientific communities are already making use of this new functionality.
The PCC also participated in some of the CERN Physics department’s work on Geant V, which is the porting of Geant paradigms and logic to a vector-compatible prototype. With the aim of achieving high throughput even at the lowest levels of the architecture, the project has already demonstrated good speed-ups. This activity was carried out as part of the PCC’s close involvement in the Concurrency Forum, which is a virtual organisation consisting of High Energy Physics stakeholders. Its aim is to ensure that the latest developments in computing architectures are exploited to the maximum extent.
Advanced development and tuning work on large software packages would not be possible without the right tools. In 2013, the pilot compiler project established by the PCC enjoyed another year of growth, as Intel tools made available CERN-wide gained more regular users. Amongst other publicly available software, the PCC also maintains its own builds of the Linux kernel. The package offers specially patched tools for cutting-edge performance tuning, to which the PCC contributes in collaboration with key industry players, as well as sophisticated satellite libraries. These custom kernel builds were also reported to deliver double-digit performance improvements on some experimental frameworks. To further support tuning activities, the PCC prepared a report covering the performance impact of various common performance-measurement methods. Along these lines, the PCC also started new work on its own performance analysis packages and disassembly interfaces, where custom measurement tools can now be scripted in Python with ease, and perform better than those commonly available.
In another software-related highlight, the PCC team evaluated Cilk Plus, a promising parallelisation technology. It is currently an extension of the Intel compiler, and is being implemented in the GNU Compiler Collection (GCC). Cilk Plus offers simple, yet powerful and much awaited parallelisation aids on two levels: thread (or task) parallelism is implemented using a spawn function and elemental functions, while data parallelism is expressed through familiar MATLAB- or Python-like syntax. The results of the evaluation are contained in a report published on the CERN openlab website and listed in the Education section of this annual report. While programmability appears to be very high, some improvements on the performance front can still be made.
Intel Xeon Phi coprocessor
The PCC activities focusing on the Intel Xeon Phi started in 2008 and bore plenty of fruit in 2013. The hardware laboratory now boasts 16 Xeon Phi systems, made available to active collaborators from CERN and its experiments. For example, the PCC was consulted by the scientists of the Alpha Magnetic Spectrometer (now launched into space) to double the performance of some test benchmarks. The multi-threaded Geant4 prototype was benchmarked in collaboration with the Geant4 team, delivering very good scalability. The outcome of these five years of the PCC’s experience with the Intel Xeon Phi was summarised at the Annual Meeting of the Concurrency Forum at Fermilab, Chicago, USA, in February, and later in a comprehensive overview paper published at CHEP 2013.
The PCC remains very active on the educational front: in 2013 alone, nine events were organised. The traditional workshops covering ‘Parallelism, Compilers and Performance’ as well as ‘Numerical Computing’ attracted well over 100 attendees and multiple expert speakers. In addition, two special workshops took place: one focused on the ‘Intel Xeon Phi Platform’, and another on ‘Advanced Performance Tuning’. The latter event lasted three days and welcomed world-class speakers from major industry players, as well as CERN experts (full list available in the Education section of this annual report).
As in previous years, the CERN School of Computing (CSC) invited the PCC to deliver a teaching programme covering Computer Architecture and Performance Tuning. Given the popularity of the topic and the expanding knowledge of the CSC audience, the PCC co-organised the first thematic CSC as a related (but new) activity. In this week-long school, which took place in Split, Croatia, in June, 18 highly qualified participants were trained on advanced technologies and practices for efficient computing – including acceleration and co-processing. At the International Symposium on Computer Architecture, a major international computer science conference in Tel Aviv, Israel, the PCC collaborated with Intel to share its experience concerning the performance tuning of large workloads. The PCC expertise is well recognised by the IT sector, but also by a larger spectrum of industries: for instance, a large organisation from the finance industry expressed strong interest in the expertise developed by the team, which resulted in the organisation of a special training session. Overall, about two dozen guests from various institutions visited the PCC over the course of the year, offering public talks and expertise, while taking back home observations and feedback.
Previous activities for the Platform Competence Centre covering 2012 are available here. Further content can be found archived on the previous phases' website here.