Several changes in the staffing of the PCC team occurred during the year. Mirela-Madalina Botezatu joined the Intel PCC team as a technical student in February. Thomas Bach, a CERN technical student, joined the PCC in October to work on his Master's thesis. Julien Leduc, who had come to the end of his three-year openlab fellowship, accepted a staff position in the Data Storage and Services (DSS) group in the IT Department. Liviu Vâlsan, recruited to replace Julien in his CERN openlab position, brought a broad portfolio of skills from the ATLAS online computing group and, thanks to his extensive experience, integrated smoothly into the PCC team. Finally, Alfio Lazzaro, who had been a CERN openlab fellow for almost three years, specialising in software parallelisation techniques, accepted a position with CRAY at the Swiss Supercomputer Centre (CSCS) in Lugano at the end of his CERN openlab contract. On the Huawei PCC side, Maitane Zotes Resines was hired to work on the joint project, supervised at CERN by Alberto Pace and Dirk Düllmann, and working day to day with Huawei engineers visiting CERN for periods of several months.
Benchmarking and evaluation
The team successfully benchmarked both the dual-socket Intel Sandy Bridge server (Intel Xeon processor E5-2600 family) and its four-socket counterpart (with E5-4600 processors). Both reports highlighted the improvements over previous, compatible processor generations and were published on the new CERN openlab website. With the emergence of the Intel® Xeon Phi™ as a mainstream accelerated compute platform, the PCC also focused on other power-efficient compute solutions.
The CERN openlab cluster was enhanced with 80 Intel Xeon E5-2600 processor-based servers (at various frequencies), allowing the PCC team to offer state-of-the-art equipment to workshop participants and to collaborators seeking to do research and run benchmarks on the most modern hardware available. These Xeon processor systems are the first generation to offer the Advanced Vector eXtensions (AVX) with 256-bit wide vectors – twice the width of previous systems. The PCC team also maintained a set of individual servers, ranging from low-power Intel® Atom™ boards to “enterprise” servers.
In November Intel announced the availability of the Xeon Phi co-processor, previously known as the “MIC” (Many Integrated Cores) accelerator. The openlab team has been an active partner in its development for many years and proposed multiple improvements, including enhanced double-precision support in the vector instruction set, Linux as the operating system running on the co-processor card, and the broadest possible use of open-source software. The team evaluated multiple versions of the co-processor (pre-alpha, alpha, and beta) and obtained interesting results for applications that vectorised well. This was true, for instance, for the MLfit application mentioned on page 37, but also for a track-fitting prototype developed for heavy-ion experiments, such as ALICE. During the year, the PCC team gradually opened up access to several physics groups wanting to gain experience with this new number-crunching platform.
A major part of the PCC's efforts is dedicated to software, in particular the evaluation of new software versions and new methodologies. The team spent considerable time evaluating the Intel® Cluster Studio XE 2013 software suite and reported multiple issues to Intel. The team maintained regular contact with the SFT (Software Development for Experiments) group in the CERN Physics (PH) Department as a complementary effort to the PCC workshops and teaching activities. In particular, “Concurrency Forums” were held every other week to review strategies for enhancing and rewriting portions of the LHC software frameworks, with the goal of increasing the exposed parallelism. The excellent contacts with key people in Intel SSG helped drive the overall effort.
A constructive dialogue was established with the Intel Exascale Lab in Paris, whose researchers assist in optimising the OpenGate medical framework built on top of Geant4, a toolkit for the simulation of the passage of particles through matter. Given that the PCC has considerable experience with multithreaded Geant4, having participated in its development, the teams met both in Versailles and at CERN to discuss how to leverage this new opportunity and, in addition, how to interact with the software developers of Geant5, a prototype that promises to exploit both parallelisation and vectorisation. A statement of work is planned for the joint activities in 2013 and 2014.
Conferences and publications
Andrzej Nowak presented a paper at CHEP2012 in New York entitled “The future of commodity computing and many-core versus the interests of HEP software”, which was well received by the High Energy Physics community and generated a surge of interest in cutting-edge computing technologies, such as the Intel Xeon Phi. A poster, “Many-core experience with HEP software at CERN openlab”, was also selected for the lightning talks during the conference. Sverre Jarp gave two presentations at the International Supercomputing Conference 2012 in Hamburg in June. The PCC was also invited to deliver a keynote at the Intel European Research and Innovation Conference 2012 in Barcelona, where Andrzej Nowak presented the work of CERN and CERN openlab to over 400 attendees. Dirk Düllmann presented the “CERN Cloud Storage Evaluation”, related to the work done with Huawei, at the HEPiX Fall 2012 conference in Beijing.
One of the PCC highlights of the year was the publication of an extensive whitepaper on software methodologies for vectorisation and parallelisation, using MLfit, a maximum-likelihood data analysis application, as the test case. The report answered a direct request from Intel’s Software and Services Group (SSG) to evaluate the multiple software technologies that now exist. The key software methods used were OpenMP, Intel Threading Building Blocks (TBB), Intel Cilk Plus, and the auto-vectorisation capability of the Intel compiler (Composer XE). Somewhat surprisingly, the Message Passing Interface (MPI) was also successfully added, although the focus of the report was on single-node rather than multi-node performance optimisation. The paper concluded that the best approach, in terms of both ease of implementation and resulting performance, was a combination of the Intel Cilk Plus array notation for vectorisation and a hybrid TBB and MPI approach for parallelisation. A data-mining study on the relationship between compiler flags and performance events was also published by the team. The idea behind the study was to see whether it is possible to quickly identify the performance bottlenecks in a given code and determine the compiler flags likely to alleviate them without compromising the accuracy and reproducibility of the results.
Workshops and teaching efforts
The openlab/PCC team has been running its educational programme of regular workshops focused on performance tuning and parallelisation since 2007. In this record year, with ten multi-day tutorials offered, the PCC decided to expand the portfolio with a two-day hands-on workshop on Numerical Computing (using IEEE-754 floating-point arithmetic). The first instance was organised in February 2012 with teachers from CERN, ENS Lyon, Intel SSG, and Intel Research. The workshop was quickly oversubscribed and was therefore repeated in September, again with high attendance. In addition, the team was asked by Intel to repeat the workshop at CASPUR in Rome. The two standard two-day workshops were run as usual in the spring, but in the autumn they were collapsed into a single three-day workshop named “Parallelism, Compilers and Performance”. This formula will be pursued in the future, as it provides the opportunity to teach a wider portfolio of topics. In addition to these courses, several special workshops were organised. The first, held in early July, focused on advanced VTune Amplifier usage together with further lectures on TBB, both taught by well-known Intel experts. Finally, a workshop was co-organised with the ATLAS experiment on the Gooda performance analyser. In addition to the presenter, David Levinthal from Google/US, the participants enjoyed the presence of several other renowned performance experts: Stéphane Eranian from Google/Grenoble, Patrick Demichel from HP/Grenoble, and Ahmad Yasin from Intel/Israel. The PCC was also invited to give a half-day tutorial at the IEEE ISPA Symposium in Madrid in July, and taught at the CERN School of Computing (CSC) in August as well as at the INFN ESC12 School in October.
The PCC’s efforts to expand its collaboration with Intel on additional fronts were particularly successful in 2012. In late 2011, CERN openlab, LHCb, Intel, and a group of Irish universities co-authored an EU FP7 project proposal focused on next-generation data acquisition capabilities. This year the ICE-DIP project application was scored very highly by the evaluators, particularly because of the cutting-edge science proposed; fewer than 10% of such proposals are actually funded. As a result, CERN and Intel will benefit from over €1.25 million of funding to hire five bright PhD candidates, who will work with the CERN experiments on world-class, cutting-edge technology, a shining example of joint industrial and public-sector collaboration.