Technical documents

Find the latest technical documents published by members of our collaboration below.

Machine Learning applications on OpenStack log data analysis

A massive amount of data is generated by the OpenStack cloud services in the form of service logs. Besides timestamps and log level fields, these logs contain additional information useful for pattern analysis.
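Pattern analysis of this kind usually starts with some preprocessing. The Python below is a minimal sketch of that step and is not the method used in the document. It splits a log line into its fields and masks variable tokens, so that messages sharing a template can be grouped. The line layout is an assumption modelled on the common oslo.log default, and the sample line is invented for illustration.

```python
import re
from datetime import datetime

# Assumed layout, modelled on the common oslo.log default (not taken from the document):
# "<date> <time.ms> <pid> <LEVEL> <logger> [<request context>] <message>"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) "
    r"(?:\[(?P<context>[^\]]*)\] )?(?P<message>.*)$"
)

UUID_RE = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b")
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
NUM_RE = re.compile(r"\b\d+(?:\.\d+)?\b")


def parse_line(line):
    """Split one log line into named fields; return None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return fields


def message_template(message):
    """Mask variable tokens so messages that differ only in IDs or numbers group together."""
    message = UUID_RE.sub("<UUID>", message)  # UUIDs first, before their digits are masked
    message = IP_RE.sub("<IP>", message)
    return NUM_RE.sub("<NUM>", message)


# Invented sample line, for illustration only.
sample = (
    "2017-05-16 00:00:00.008 25746 INFO nova.compute.manager "
    "[req-38101a0b-2096-447d-96ea-a692162415ae - - - - -] "
    "Took 0.73 seconds to build instance 3edec1e4-9678-4a3a-a21b-a145a4ee5e61"
)
```

Counting the templates produced by `message_template` over a day of logs gives a simple frequency table. Rare or suddenly frequent templates can then be used as input features for a machine learning model.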

Automation Tools for Invenio

Invenio is an open-source framework, initially developed at CERN, that now has many external users and contributors and prospects of growing even further in the future. Its nature as a digital

Graph Neural Network Inference on FPGA

Graph Neural Networks are promising for track reconstruction in the Large Hadron Collider use case because of its high-dimensional, sparse data.

Summer-student report: Automation Tools for Invenio

Invenio is an open-source framework, initially developed at CERN, that now has many external users and contributors and prospects of growing even further in the future. Its nature as a digital

Summer-student report: Neuromorphic Computing in High Energy Physics

At particle colliders, more data are produced than the experiments can store for further analysis. This is why the incoming collisions are processed in real time by a so-called trigger system. At the 


With the pervasiveness of high-speed computers and processors, computer companies are looking for new technologies to incorporate into their products and use as a competitive advantage in the market. Two modern and rapidly growing techniques are quantum computing and the use 

Summer-student report: Performance monitoring using Intel performance counters for HEP applications

The HPC service at CERN provides Linux batch infrastructure for running high-performance computing applications that require MPI clusters. The HPC cluster is therefore dedicated to running MPI programs. 

Summer-student report: EOS Winston: Expert Systems for Automated Diagnosis and Remediation

This report describes EOS Winston, an event-driven alerting and mitigation automation platform. Through the use of expert rules and online anomaly detection algorithms, it catches events which 
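Online anomaly detection can be illustrated with a very small streaming detector. The sketch below is a generic rolling z-score check, chosen here as an assumption for illustration. It is not the algorithm EOS Winston uses, and the window and threshold values are arbitrary.

```python
from collections import deque
from math import sqrt


class RollingZScore:
    """Flag a value as anomalous if it lies more than `threshold` sample standard
    deviations from the mean of the previous `window` values (illustrative only)."""

    def __init__(self, window=30, threshold=3.0):
        self.values = deque(maxlen=window)  # oldest values drop out automatically
        self.threshold = threshold

    def update(self, x):
        anomalous = False
        n = len(self.values)
        if n >= 2:  # need at least two points for a sample standard deviation
            mean = sum(self.values) / n
            std = sqrt(sum((v - mean) ** 2 for v in self.values) / (n - 1))
            if std > 0 and abs(x - mean) / std > self.threshold:
                anomalous = True
        self.values.append(x)  # the new value joins the history either way
        return anomalous
```

A stream of stable metric readings passes silently, and a sudden spike is flagged as soon as it arrives. In a real platform a flag like this would typically trigger an expert rule rather than an action on its own.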

Summer-student report: Portable Early Prediction of Sepsis from Clinical Data on Intel Myriad X

Sepsis is a life-threatening condition where microbes present in the bloodstream cause an unregulated immune response from the body which can result in tissue damage, multi-organ failure 

Summer-student-report: Deep I/O Performance Analysis of CernVM-FS using Modern Linux Tools

This report describes a performance analysis of CernVM-FS FUSE, a software distribution service used in high-energy physics research. The performance analysis was conducted in both kernel 

Summer-student report: EOS Integration into OpenStack Manila

The purpose of this report is to provide a brief overview of what OpenStack is, focusing on the advantages of integrating its Manila component at CERN. Furthermore, this document briefly 

Summer-student-report: Continuous integration for containerized scientific workflows

In this project, we decided to implement two solutions that integrate REANA and GitLab. They differ on two main points. The first one is the amount of configuration necessary to set up the integration, 

Summer-student-report: Building effective RESTful APIs with Oracle REST Data Services 19

In 2005, the first installation of Oracle HTML DB went into production. The CERN developer community soon adopted the technology, using it in all areas of the organization, from administrative applications to accelerator control systems.

Summer-student-report: Web UI development for the IoT Security Framework

The IoT security framework is a computer security platform designed to assess the risks of various heterogeneous IoT devices. The framework is currently being developed at CERN and analyses different IoT devices connected to CERN’s General Purpose Network (GPN). The GPN mostly

Calorimetry with Deep Learning: Particle Simulation and Reconstruction for Collider Physics

Using detailed simulations of calorimeter showers as training data, we investigate the use of deep learning algorithms for the simulation and reconstruction of particles produced in high-energy physics collisions.

Summer-student-report: Evaluation of Erasure Coding and other features of Hadoop 3

The Hadoop ecosystem is a distributed computing platform for Big Data solutions, comprising autonomous components such as HDFS, Spark and YARN. HDFS, the Hadoop Distributed File System, is used for data storage. Current HDFS supports 3x replication for data fault tolerance. When a 
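The trade-off between 3x replication and the erasure coding that Hadoop 3 added to HDFS comes down to simple arithmetic. The sketch below compares the raw storage needed per logical byte and the number of lost units each layout tolerates. It uses RS(6,3) because that matches the default Hadoop 3 policy (RS-6-3-1024k), but the report's own configuration may differ, and the 1 PB dataset is hypothetical.

```python
from fractions import Fraction


def replication_cost(replicas):
    """Return (raw bytes stored per logical byte, lost copies tolerated) for n-way replication."""
    return Fraction(replicas), replicas - 1


def reed_solomon_cost(data_units, parity_units):
    """Return (raw bytes per logical byte, lost units tolerated per stripe) for RS(data, parity)."""
    return Fraction(data_units + parity_units, data_units), parity_units


if __name__ == "__main__":
    logical_pb = 1  # hypothetical dataset size in petabytes
    for name, (factor, tolerated) in [
        ("3x replication", replication_cost(3)),
        ("RS(6,3)", reed_solomon_cost(6, 3)),
    ]:
        print(f"{name}: {float(factor * logical_pb)} PB raw, tolerates {tolerated} lost units")
```

With these inputs, replication needs 3.0 PB of raw disk and RS(6,3) needs 1.5 PB, while the erasure-coded layout tolerates one more failure. The price is extra CPU and network work whenever a lost unit has to be reconstructed.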

Summer-student-report: Big Data Analysis and Machine Learning at Scale with Oracle Cloud Infrastructure

This work successfully deployed two different use cases of interest for High Energy Physics using cloud resources. CMS Big Data reduction: this use case consists of running data reduction workloads for 

Summer-student-report: Function-as-a-Service on Kubernetes using Knative

The CERN Cloud Infrastructure team provides compute resources as a service to teams across CERN. Users can provision resources to process experiment data, host web applications, and accomplish other computing tasks. 


Summer-student-report: Benchmarking and optimising large scale parallel workflows

The main goal of this project is to carry out a performance analysis of the RDataFrame class within the ROOT framework. For this purpose, scalability analyses are performed on the execution 

Summer-student-report: Anomaly Detection in the Elasticsearch Service

The Elasticsearch Service is a distributed search and analytics engine widely used across CERN. Currently, issues in the service are resolved manually after being detected through internal monitoring by service 

Summer-student-report: Benchmarking tools for NextGen Archiver for WinCC OA

In this project, we focused on benchmarking InfluxDB against Oracle Database. One of the primary reasons is that ETM/Siemens were already working on an InfluxDB backend. To perform benchmarking using the Query Benchmark Tool, we needed to have the same data 

Summer-student-report: Performance study of Parquet codecs

This report describes the work carried out to study and evaluate the performance and footprint of different Parquet compression codecs in data retrieval and analytics scenarios. Parquet is a de facto standard and the data format used to persist 

Summer-student-report: Improving BioDynaMo build system

When developing new programs or scientific libraries, most of the effort is focused on providing efficient algorithms, state-of-the-art techniques and maximum flexibility. However, in order for a 

Summer-student-report: Evaluate ElastAlert for IT-DB use cases

The Database Services Group (IT-DB) is responsible for providing database and middleware services to the laboratory. For these services, it is necessary to provide proper monitoring solutions to different user 

Summer-student-report: Real-Time Server Monitoring and CNN Inference on FPGA

Neutrinos are subatomic particles, very similar to electrons, but without any electric charge and with a negligible rest mass. They are the most abundant, and perhaps the most mysterious, matter particles in the universe! 


Summer-student-report: Using deep learning for particle identification and energy estimation in CMS HGCAL L1 trigger

In Run 4 of the LHC, the extremely high luminosity is expected to generate an enormous pileup of up to 200 proton-proton collisions per bunch crossing. This has to be read out at 750 kHz with a maximum 

Summer-student-report: Apache Spark on Hadoop YARN & Kubernetes for Scalable Physics Analysis

The popularity of Big Data technologies continues to increase each year. The vast amount of data produced at the LHC experiments, which will increase further after the upgrade to HL-LHC, makes the exploration of new ways to perform physics

Summer-student-report: HGCAL Fast Simulation with Deep Learning

This project uses Wasserstein Generative Adversarial Networks (WGANs) to meet the demand for large simulation samples for the CMS Phase II Upgrade. The distributions of real
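What distinguishes a WGAN from a standard GAN is mainly its loss. A critic scores samples, and training works on the gap between the critic's mean scores on real and generated showers. The sketch below writes out the losses from the original WGAN formulation, including its weight clipping, on plain lists of numbers. It is framework-free by assumption and is not the project's training code.

```python
from statistics import fmean


def critic_loss(scores_real, scores_fake):
    """WGAN critic loss to minimise: E[D(fake)] - E[D(real)].

    Minimising it maximises the critic's estimate of the Wasserstein distance
    between the real and generated distributions.
    """
    return fmean(scores_fake) - fmean(scores_real)


def generator_loss(scores_fake):
    """WGAN generator loss to minimise: -E[D(fake)], i.e. make fakes score like reals."""
    return -fmean(scores_fake)


def clip_weights(weights, c=0.01):
    """Weight clipping from the original WGAN: keeps the critic roughly Lipschitz."""
    return [max(-c, min(c, w)) for w in weights]
```

In a full training loop, the critic is typically updated several times per generator step, with clipping applied after each critic update. Later variants replace clipping with a gradient penalty.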

Summer-student-report: Achieve a 0-downtime CERN Database infrastructure

At CERN, many systems provide critical services, and scheduling downtime for them is difficult. Live kernel patching is a technique that aims to update the system without

Summer-student-report: Introducing heterogeneous farms in the CMS framework

The High Luminosity upgrade scheduled for 2026 will greatly increase the number of collisions per bunch crossing. Moore’s law will, optimistically, deliver a factor of 4 performance gain, not enough to handle the

Summer-student-report: Java Mission Control Evaluation

This report summarises the project I worked on during my internship with the IT-DB-IMS team. It details my efforts to configure various technologies to work with Java Mission Control, the 

Summer-student-report: Efficient unpacking of required software from CernVM-FS

Recently, a tool for efficiently unpacking software workflows from the CernVM File System (CVMFS) into standalone images has become necessary. There are two types of use cases for such images: on the one hand, they can be used to deliver

Summer-student-report: Benchmarking Machine Learning in HEP

Interest in machine learning workloads in the HEP community has increased exponentially in recent years, making a thorough benchmarking of the most relevant workloads that are going to run on the experiments increasingly important. The purpose

Summer-student-report: Evaluating Ceph Deployments with Rook (31-08-18)

Summer-student-report: Scanning Containers for Vulnerabilities on Kubernetes Clusters

In this project, we chose to work with Clair, a tool developed by CoreOS that uses static analysis to find vulnerabilities in container images. To use Clair, we had to build a Python client,

Summer-student-report: Benchmarking Kudu and Oracle in typical WinCC OA historical data retrieval use cases

WinCC Open Architecture is a toolkit for creating Supervisory Control and Data Acquisition (SCADA) applications, which is widely used at CERN. Hundreds of control applications, both in the accelerator complex and in the experiments, are based on it, 

Summer-student-report: KPIs Dashboard for Invenio-Related Services

The purpose of this report is to document the project I worked on for nine weeks during the summer of 2018. As part of the CERN openlab Summer Student Program 2018, I had the opportunity to work with the Digital Repositories (IT-CDA-DR) section at CERN on developing a

Summer-student-report: Technical Network Validation Using OpenShift

Interest in using containers to package applications is constantly growing in the software development community, especially as technologies such as Kubernetes and OpenShift are adopted more frequently. This project is also based on modularising the currently 

Summer-student-report: Automated Shelter Recognition in Refugee Camps

In June 2018, more than 68.5 million people across the globe were reported to be fleeing war or persecution. Within the United Nations, UNOSAT is the body in charge of collecting demo-

Summer-student-report: Develop streaming pipelines and analytics solutions for CERN's IoT Platform

Two very popular concepts in the world of technology are Big Data and the Internet of Things. Big Data refers to data whose size, complexity and velocity are so high that it is difficult to capture, pre-process and analyze with 

Summer-student-report: Distributed BioDynaMo

Computer simulations have become a very powerful tool for scientific research. In order to facilitate research in computational biology, the BioDynaMo project aims at a general platform for

Summer-student-report: GPGPU Accelerated Beam Dynamics Interfacing PyHEADTAIL with SixTrackLib

Simulations of beam dynamics benefit greatly from parallelisation with high-performance computing techniques. The two simulation libraries SixTrackLib and PyHEADTAIL are GPGPU accelerated. The former

Summer-student-report: Optimization of Data Transfer for 100 Gb/s Ethernet

In 2019, the LHCb experiment will undergo an important upgrade that will improve performance in many areas. One of these areas is the DAQ system: it consists of a large flow of data that comes

Summer-student-report: Employing HPC for Heterogeneous HEP Data Processing

One of the most time-consuming algorithms currently employed in High Energy Physics (HEP) reconstruction workflows is local energy reconstruction. Executing this algorithm accounts for 24% of the total processing time, thus achieving substantial
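If only local energy reconstruction is accelerated, Amdahl's law bounds the overall gain. The worked illustration below uses only the 24% figure quoted above. Here $p = 0.24$ is the fraction of runtime spent in the algorithm and $s$ is the speedup applied to that fraction:

```latex
S(s) = \frac{1}{(1 - p) + p/s}, \qquad
\lim_{s \to \infty} S(s) = \frac{1}{1 - 0.24} = \frac{1}{0.76} \approx 1.32
```

Even an infinitely fast implementation of this step alone would therefore give at most about a 1.32x end-to-end speedup. A 10x faster kernel gives $1/(0.76 + 0.024) \approx 1.28$.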

Summer-student-report: POSEIDON - Analyzing the secrets of the Trident Node monitoring

Improving the performance of an application is an important objective pursued from the application's conception until its deprecation. Developers are constantly trying to improve the performance of their 

Summer-student-report: PyXRootD PyPI distribution and new declarative file access API for XRootD Client

The project described in this report is related to XRootD framework development. It was divided into two parts. The first part was about publishing the XRootD Python bindings, called PyXRootD, to the Python Package Index. This makes PyXRootD installation much easier and resolves problem

Summer-student-report: Parallel Task Execution

Puppet is a great tool for making changes on systems and ensuring that those changes happen. However, Puppet is not intended to make them happen on many systems at the same time; it is intended for eventual compliance over time. Each agent checks in over a period of time, al-
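Puppet relies on each agent checking in on its own schedule. A parallel task executor works the other way: it pushes one task to many hosts at once and collects the results. The sketch below shows that fan-out pattern with Python's standard thread pool. `run_task` is a hypothetical placeholder for whatever transport a real tool would use, and nothing here reflects the design described in the report.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_task(host, task):
    """Hypothetical placeholder: a real tool would contact the host (e.g. via SSH or an agent)."""
    return f"{task} done on {host}"


def run_everywhere(hosts, task, max_workers=32):
    """Run `task` on all hosts concurrently and return a {host: result} mapping."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every host at once; the pool caps how many run simultaneously.
        futures = {pool.submit(run_task, host, task): host for host in hosts}
        for future in as_completed(futures):  # collect in completion order
            host = futures[future]
            try:
                results[host] = future.result()
            except Exception as exc:  # one failing host must not abort the rest
                results[host] = f"failed: {exc}"
    return results
```

`max_workers` bounds the number of concurrent connections, which matters when the target is thousands of machines rather than a handful.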

Summer-student-report: Thin Element Comparison Between MAD-X and SixTrack

In this report, thin single elements were compared between MAD-X and SixTrack. A testing framework for efficient comparisons between the two tracking codes was developed. A few differences between the tracking codes were found and documented, and two bugs, one in the

Summer-student-report: OpenStack Infrastructure Optimization Service

CERN operates an OpenStack-based private cloud to provide its users with resources on demand. It is one of the largest OpenStack deployments in the world, with more than 300,000 cores over 9,000 hypervisors [1].

Summer-student-report: MPI Learn - distributed training

MPI Learn is a framework for the distributed training of neural networks. This platform is aimed at machine learning users, who can use it to train models faster, without dealing with the com-

Summer-student-report: Function as a Service

Function as a service (FaaS) is a category of cloud computing services that provides a platform allowing customers to develop, run, and manage application functionality without the complexity of building and maintaining the infrastructure 

Summer-student-report: Natural Language Processing for Scientific Research

The goal of this openlab project is to create a Smart Data Analytics Platform for Science that will host analytical tools, publish data, share resources, interact with bots, collaborate and build communities of researchers with various backgrounds in a single ecosystem. With

Summer-student-report: Deep Representation Learning for Trigger Monitoring

We propose a novel neural network architecture called Hierarchical Latent Autoencoder to exploit the underlying hierarchical nature of the CMS Trigger System for data quality monitoring.

Summer-student-report: Evaluation of Containers for HPC

Some of the main challenges in scientific computing today deal with performance-preserving portability of software and reproducibility of the final results; likewise, with the advent of modern

Summer-student-report: Information aggregation and analytics for ATLAS Frontier

The Squid-Frontier system [1] is currently used to manage access to the COOL database [2]. This system includes many widely distributed computing sites and applications. Clients presented by PanDA (Production ANd Distributed Analysis system, the ATLAS’ 

Summer-student-report: Malware analysis management

Malware Analysis Management (M.A.M.), the automated sandbox analysis of quarantined malware samples, focuses on the detailed analysis of malware samples reaching CERN through email traffic. M.A.M. is a side process of the main email pipeline 

Summer-student-report: REANA - user dashboard for reusable analysis platform

REANA is a reusable analysis platform which offers physicists the ability to structure their research data analysis and run their computational workflows in a containerized computing cloud. 

A New Platform for Large-Scale Biological Simulation

Computer simulations have become a very powerful tool for scientific research. In order to facilitate research in computational biology, the BioDynaMo project aims at a general platform for biological computer simulations, which should be executable on hybrid cloud computing systems.

From Physics to industry: EOS outside HEP

In the competitive market for large-scale storage solutions, EOS, the current main disk storage system at CERN, has been showing its excellence in the multi-petabyte, high-concurrency regime.

Exploring RapidIO Technology within a DAQ System Event Building Network

RapidIO technology is a packet-switched high-performance fabric, which has been under active development since 1997. The technology is used in all 4G/LTE base stations worldwide.

RapidIO as a multi-purpose interconnect

RapidIO technology is a packet-switched high-performance fabric, which has been under active development since 1997. Originally meant to be a front-side bus, it developed into a system-level interconnect which is today used in all 4G/LTE base stations worldwide.

A Deep Learning tool for fast detector simulation

Machine Learning techniques have been used in different applications by the HEP community: in this talk, we discuss the case of detector simulation.

An optimization approach for agent-based computational models of biological development

Current research in the field of computational biology often involves simulations on high-performance computer clusters. It is crucial that the code of such simulations is efficient and correctly reflects the model specifications.

CERN openlab: Engaging industry for innovation in the LHC Run 3-4 R&D programme

LHC Run 3 and Run 4 represent an unprecedented challenge for HEP computing in terms of both data volume and complexity. New approaches are needed for how data is collected and filtered, processed, moved, stored and analysed if these challenges are to be met with a realistic budget.

Extending an asynchronous messaging library using an RDMA-enabled interconnect

As computing power and I/O performance are increasing at an aggressive rate, several RDMA-enabled interconnect technologies have been entering the market, promising low latency and high throughput.

1000 things you always want to know about SSO but you never dare to ask (12-10-17)

CERN openlab White Paper: Future IT Challenges in Scientific Research

In this white paper, CERN openlab sets out challenges to tackle together through joint R&D projects with our industry collaborators over the coming years. This unique public-private partnership between research and leading ICT companies is ideally placed to tackle these challenges, producing r

Exploring RapidIO Technology Within a DAQ System Event Building Network

RapidIO technology is a packet-switched high-performance fabric, which has been under active development since 1997. The technology is used in all 4G/LTE base stations worldwide.