Technical documents

Find the latest technical documents published by members of our collaboration below.

Title Date File
Summer-student-report: Evaluation of Erasure Coding and other features of Hadoop 3

The Hadoop ecosystem is a distributed computing platform for Big Data solutions comprising autonomous components such as HDFS, Spark, and YARN. HDFS is the Hadoop Distributed File System for data storage. Current HDFS supports 3x replication for data fault tolerance. When a

31-08-19
Summer-student-report: Big Data Analysis and Machine Learning at Scale with Oracle Cloud Infrastructure

This work has successfully deployed two different use cases of interest for High Energy Physics using cloud resources: CMS Big Data reduction: This use case consists of running data reduction workloads for

31-08-19
Summer-student-report: Function-as-a-Service on Kubernetes using Knative

The CERN Cloud Infrastructure team provides compute resources as a service to teams across CERN. Users can provision resources to process experiment data, host web applications, and accomplish other computing tasks.

31-08-19
Summer-student-report: Benchmarking and optimising large scale parallel workflows

The main idea of this project is to carry out performance analysis on the RDataFrame class within the ROOT operational framework. For this purpose, scalability analyses are performed on the execution

31-08-19
Summer-student-report: Anomaly Detection in the Elasticsearch Service

The Elasticsearch Service is a distributed search and analytics engine widely used across CERN. Currently, issues in the service are resolved manually after being detected through internal monitoring by service

31-08-19
Summer-student-report: Benchmarking tools for NextGen Archiver for WinCC OA

In this project we focused on benchmarking Influx against the Oracle database. One of the primary reasons is that ETM/Siemens were already working on an Influx database backend. To perform benchmarking using the Query Benchmark Tool we needed to have the same data

31-08-19
Summer-student-report: Performance study of parquet codecs

This report describes the work carried out to study and evaluate the performance and footprint of different Parquet compression codecs in data retrieval and analytics scenarios. Parquet is a de facto standard and the data format used to persist

31-08-19
Summer-student-report: Improving BioDynaMo build system

When developing new programs or scientific libraries most of the efforts are focused on providing efficient algorithms, the state-of-the-art techniques and maximum flexibility. However, in order for a

31-08-19
Summer-student-report: Evaluate ElastAlert for IT-DB use cases

The Database Services Group (IT-DB) is responsible for providing database and middleware services to the laboratory. For these services, it is necessary to provide proper monitoring solutions to different user

31-08-19
Summer-student-report: Real-Time Server Monitoring and CNN Inference on FPGA

Neutrinos are subatomic particles, very similar to electrons, but without any electrical charge and with a negligible rest mass. They are the most abundant and perhaps the most mysterious matter particles in the universe!

31-08-19
Summer-student-report: Using deep learning for particle identification and energy estimation in CMS HGCAL L1 trigger

In Run 4 of the LHC, the extremely high luminosity is expected to generate an enormous pileup of up to 200 proton-proton collisions for each bunch crossing. This has to be read out at 750 kHz with a maximum

31-08-19
Summer-student-report: Apache Spark on Hadoop YARN & Kubernetes for Scalable Physics Analysis

The popularity of Big Data technologies continues to increase each year. The vast amount of data produced at the LHC experiments, which will increase further after the upgrade to HL-LHC, makes the exploration of new ways to perform physics

31-08-18
Summer-student-report: HGCAL Fast Simulation with Deep Learning

This project uses Wasserstein Generative Adversarial Networks (WGANs) to supply the demand for large simulation samples in the event of the CMS Phase II Upgrade. The distributions of real

31-08-18
Summer-student-report: Achieve a 0-downtime CERN Database infrastructure

At CERN we have many systems which provide critical services and scheduling downtime for them is quite difficult. Live kernel patching is a technique which aims to update the system without

31-08-18
Summer-student-report: Introducing heterogeneous farms in the CMS framework

The High Luminosity upgrade scheduled for 2026 will greatly increase the number of events per collision. Moore’s law will optimistically deliver a factor of 4 performance gain, not enough to handle the

31-08-18
Summer-student-report: Java Mission Control Evaluation

This report summarises the project I worked on during my internship with the IT-DB-IMS team. This report will detail my efforts to configure various technologies to work with Java Mission Control, the

31-08-18
Summer-student-report: Efficient unpacking of required software from CERNVM-FS

In recent times a tool for efficient unpacking of software work-flows from CernVM File System (CVMFS) into standalone images has become necessary. There are two types of use cases for such images: On the one hand they can be used to deliver

31-08-18
Summer-student-report: Benchmarking Machine Learning in HEP

Interest in machine learning workloads in the HEP community has increased exponentially in recent years, making a thorough benchmarking of the most relevant/significant workloads that are going to run on the experiments more and more important. The purpose

31-08-18
Summer-student-report: Evaluating Ceph Deployments with Rook

31-08-18
Summer-student-report: Scanning Containers for Vulnerabilities on Kubernetes Clusters

In this project, we chose to work with Clair, the tool developed by CoreOS, which uses static analysis to find vulnerabilities in container images. To use Clair, we had to build a Python client,

31-08-18
Summer-student-report: Benchmarking Kudu and Oracle in typical WinCC OA historical data retrieval use cases

WinCC Open Architecture is a toolkit for creating Supervisory Control and Data Acquisition (SCADA) applications, which is widely used at CERN. Hundreds of controls applications, both in the accelerator complex and the experiments, are based on it,

31-08-18
Summer-student-report: KPIs Dashboard for Invenio-Related Services

The purpose of this report is to document the project I was working on for nine weeks during the summer of 2018. As part of the CERN openlab Summer Student Program 2018 I had the opportunity to work with the Digital Repositories (IT-CDA-DR) section at CERN on developing a

31-08-18
Summer-student-report: Technical Network Validation Using Open-shift

The interest in using containers to package applications is constantly growing in the software development community, especially with new technologies such as Kubernetes and Open-shift being adopted more frequently as well. This project is also based on modularising the currently

31-08-18
Summer-student-report: Automated Shelter Recognition in Refugee Camps

In June 2018, more than 68.5 million people across the globe were reported to be fleeing war or persecution. Within the United Nations, UNOSAT is the organ in charge of collecting demo-

31-08-18
Summer-student-report: Develop streaming pipelines and analytics solutions for CERN's IoT Platform

There are two very popular concepts that we hear in the world of technology: Big Data and the Internet of Things. Big Data refers to data whose size, complexity and velocity are very high and which is difficult to capture, pre-process and analyze with

31-08-18
Summer-student-report: Distributed BioDynaMo

Computer simulations have become a very powerful tool for scientific research. In order to facilitate research in computational biology, the BioDynaMo project aims at a general platform for

31-08-18
Summer-student-report: GPGPU Accelerated Beam Dynamics Interfacing PyHEADTAIL with SixTrackLib

Simulations of beam dynamics profit greatly from parallelisation with high-performance computing techniques. The two simulation libraries SixTrackLib and PyHEADTAIL are GPGPU accelerated. The former

31-08-18
Summer-student-report: Optimization of Data Transfer for 100 Gb/s Ethernet

In 2019 the LHCb experiment will go through an important upgrade that will improve performance in many fields. One of these fields is the DAQ system: it consists of a big flow of data that comes

31-08-18
Summer-student-report: Employing HPC for Heterogeneous HEP Data Processing

One of the most time consuming algorithms that is currently employed for the reconstruction of High Energy Physics (HEP) workflows is the local energy reconstruction. The time spent to execute this algorithm constitutes 24% of the total processing time, thus achieving substantial

31-08-18
Summer-student-report: POSEIDON - Analyzing the secrets of the Trident Node monitoring

Improving the performance of an application is an important objective carried out from the application conception until its deprecation. Developers are constantly trying to improve the performance of their

31-08-18
Summer-student-report: PyXRootD PyPI distribution and new declarative file access API for XRootD Client

The project described in this report is related to XRootD framework development. It was divided into two parts. The first part was about publishing the XRootD Python bindings, called PyXRootD, to the Python Package Index. This makes PyXRootD installation much easier and resolves problem

31-08-18
Summer-student-report: Parallel Task Execution

Puppet is a great tool for making changes on systems, and ensuring that those changes happen. But Puppet is not intended to make this happen on many systems at the same time. Puppet is intended for eventual compliance over time. Each agent checks in over a period of time, al-

31-08-18
Summer-student-report: Thin Element Comparison Between MAD-X and SixTrack

In this report, thin single elements were compared between MAD-X and SixTrack. A testing framework for efficient comparisons between the two tracking codes was developed. A few differences between the tracking codes were found and documented, and two bugs, one in the

31-08-18
Summer-student-report: OpenStack Infrastructure Optimization Service

CERN operates an OpenStack-based private cloud to provide its users with resources on demand. It is one of the largest OpenStack deployments in the world, with more than 300,000 cores over 9,000 hypervisors [1].

31-08-18
Summer-student-report: MPI Learn - distributed training

MPI Learn is a framework for the distributed training of neural networks. This platform is aimed at machine learning users, who can use it to train models faster, without dealing with the com-

31-08-18
Summer-student-report: Function as a Service

Function as a service (FaaS) is a category of cloud computing services that provides a platform allowing customers to develop, run, and manage application functionalities without the complexity of building and maintaining the infrastructure

31-08-18
Summer-student-report: Natural Language Processing for Scientific Research

The goal of this Openlab project is to create a Smart Data Analytics Platform for Science that will host analytical tools, publish data, share resources, interact with bots, collaborate and build communities of researchers with various backgrounds in a single ecosystem. With

31-08-18
Summer-student-report: Deep Representation Learning for Trigger Monitoring

We propose a novel neural network architecture called Hierarchical Latent Autoencoder to exploit the underlying hierarchical nature of the CMS Trigger System for data quality monitoring.

31-08-18
Summer-student-report: Evaluation of Containers for HPC

Some of the main challenges in scientific computing today deal with performance-preserving portability of software and reproducibility of the final results; likewise, with the advent of modern

31-08-18
Summer-student-report: Information aggregation and analytics for ATLAS Frontier

The Squid-Frontier system [1] is currently used to manage access to the COOL database [2]. This system includes many widely distributed computing sites and applications. Clients presented by PanDA (Production ANd Distributed Analysis system, the ATLAS’

31-08-18
Summer-student-report: Malware analysis management

Malware Analysis Management (M.A.M.) or the automated sandbox analysis of quarantined malware samples focuses on a detailed analysis of malware samples reaching CERN through email traffic. M.A.M. is a side process of the main email pipeline

31-08-18
Summer-student-report: REANA - user dashboard for reusable analysis platform

REANA is a reusable analysis platform which offers physicists the ability to structure their research data analysis and run their computational workflows in a containerized computing cloud.

31-08-18
A New Platform for Large-Scale Biological Simulation

Computer simulations have become a very powerful tool for scientific research. In order to facilitate research in computational biology, the BioDynaMo project aims at a general platform for biological computer simulations, which should be executable on hybrid cloud computing systems.

28-11-16
From Physics to industry: EOS outside HEP

In the competitive market for large-scale storage solutions, EOS, the current main disk storage system at CERN, has been showing its excellence in the multi-Petabyte high-concurrency regime.

01-09-17
Exploring RapidIO Technology within a DAQ System Event Building Network

RapidIO (http://rapidio.org/) technology is a packet-switched high-performance fabric, which has been under active development since 1997. The technology is used in all 4G/LTE base stations worldwide.

10-05-17
RapidIO as a multi-purpose interconnect

RapidIO (http://rapidio.org/) technology is a packet-switched high-performance fabric, which has been under active development since 1997. Originally meant to be a front side bus, it developed into a system-level interconnect which is today used in all 4G/LTE base stations worldwide.

20-06-17
A Deep Learning tool for fast detector simulation

Machine Learning techniques have been used in different applications by the HEP community: in this talk, we discuss the case of detector simulation.

27-06-18
An optimization approach for agent-based computational models of biological development

Current research in the field of computational biology often involves simulations on high-performance computer clusters. It is crucial that the code of such simulations is efficient and correctly reflects the model specifications.

10-03-18
CERN openlab: Engaging industry for innovation in the LHC Run 3-4 R&D programme

LHC Run 3 and Run 4 represent an unprecedented challenge for HEP computing in terms of both data volume and complexity. New approaches are needed for how data is collected and filtered, processed, moved, stored and analysed if these challenges are to be met with a realistic budget.

04-03-17
Extending an asynchronous messaging library using an RDMA-enabled interconnect

As computing power and I/O performance are increasing at an aggressive rate, several RDMA-enabled interconnect technologies have been entering the market, promising low latency and high throughput.

20-03-17
1000 things you always want to know about SSO but you never dare to ask

12-10-17
CERN openlab White Paper: Future IT Challenges in Scientific Research

In this white paper, CERN openlab sets out challenges to tackle together through joint R&D projects with our industry collaborators over the coming years. This unique public-private partnership between research and leading ICT companies is ideally placed to tackle these challenges, producing r

21-09-17
Exploring RapidIO Technology Within a DAQ System Event Building Network

RapidIO technology is a packet-switched high-performance fabric, which has been under active development since 1997. The technology is used in all 4G/LTE base stations worldwide.

09-09-17