Project goal

The goal is to characterise and improve the performance of the Ceph block storage for OpenStack, the software platform used to manage CERN’s private computing cloud. Ceph is a free-software storage platform that provides interfaces for object-, block-, and file-level storage. Block storage is critical for the cloud, and plays a key role in supporting challenging use cases, such as those involving very large volumes of data or requiring high availability.

Our work involves the development of tools to help us understand the critical bottlenecks in our Ceph system. This will enable us to precisely target future efforts to improve performance.

R&D topic
R&D Topic 1: Data-centre technologies and infrastructures
Project coordinator(s)
Daniel van der Ster and Tim Bell
Technical team members
Julien Collet
Collaborator liaison(s)
Brian Stein, Philip Williams

Collaborators

Project background

Ceph is an open-source storage system that has become the de facto standard solution for OpenStack clouds around the world. At CERN, Ceph block storage is capacity-oriented, offering acceptable latency and IOPS for the bulk of our block-storage use cases. We are working to deliver lower-latency, higher-IOPS storage, thus bringing a range of benefits to our users, such as the ability to run databases or interactive processing applications. Gaining a better understanding of the critical bottlenecks in the performance of this system will help us to take informed decisions about hardware procurement. This is particularly important where there is a trade-off between performance and capacity.

Recent progress

First, we developed a benchmarking suite to assess cluster performance and to highlight variations in performance between the different layers of the Ceph stack. In parallel, various contributions to the Ceph project were developed and merged, in collaboration with the upstream team.
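
As a minimal sketch of this layered approach, the fragment below drives ‘rados bench’ (the RADOS-level load generator shipped with Ceph) and extracts the summary figures. The pool name is a hypothetical placeholder, and the summary-line format parsed here varies slightly between Ceph releases.

    #!/usr/bin/env python3
    """Sketch: measure small-block write performance at the RADOS layer."""
    import re
    import subprocess

    def run_rados_bench(pool, seconds=30, block_size=4096, threads=16):
        """Run 'rados bench' against a pool; return (avg_iops, avg_latency_s)."""
        out = subprocess.run(
            ["rados", "bench", "-p", pool, str(seconds), "write",
             "-b", str(block_size), "-t", str(threads), "--no-cleanup"],
            check=True, capture_output=True, text=True,
        ).stdout
        # Summary-line patterns as printed by recent Ceph releases; older
        # releases format these lines slightly differently.
        iops = float(re.search(r"Average IOPS:\s+([\d.]+)", out).group(1))
        latency = float(re.search(r"Average Latency\(s\):\s+([\d.]+)", out).group(1))
        return iops, latency

    if __name__ == "__main__":
        # 'bench-rados' is a hypothetical pre-created test pool.
        iops, lat = run_rados_bench("bench-rados")
        print(f"RADOS layer: {iops:.0f} IOPS, {lat * 1000:.2f} ms average latency")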

A Ceph “top” tool, enabling detailed evaluation of a cluster’s workload, was also implemented in 2018. This tool helps Ceph administrators to understand the behaviour of the cluster. We worked to integrate it as a Ceph core feature, resulting in a first basic implementation. Finally, a preliminary performance evaluation of a new all-flash, hyper-converged cluster was conducted. The first results show high levels of IOPS, a promising outcome that highlights the potential for deploying new use cases on top of Ceph at CERN.
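
The built-in tool itself lives in the Ceph tree; purely as an illustration of the idea, the sketch below polls ‘ceph osd pool stats’ (which reports per-pool client I/O rates) and prints the busiest pools in a top-like loop. It is not the implementation described above.

    #!/usr/bin/env python3
    """Sketch: a top-like view of per-pool client I/O."""
    import json
    import subprocess
    import time

    def pool_io_rates():
        """Return (pool, iops, bytes_per_sec) tuples from 'ceph osd pool stats'."""
        out = subprocess.run(
            ["ceph", "osd", "pool", "stats", "--format", "json"],
            check=True, capture_output=True, text=True,
        ).stdout
        rates = []
        for pool in json.loads(out):
            io = pool.get("client_io_rate", {})  # rate keys are omitted when idle
            rates.append((
                pool["pool_name"],
                io.get("read_op_per_sec", 0) + io.get("write_op_per_sec", 0),
                io.get("read_bytes_sec", 0) + io.get("write_bytes_sec", 0),
            ))
        return rates

    if __name__ == "__main__":
        while True:  # crude refresh loop; show the ten busiest pools
            print(f"{'POOL':<24}{'IOPS':>10}{'MB/s':>10}")
            for name, ops, bps in sorted(pool_io_rates(), key=lambda r: -r[1])[:10]:
                print(f"{name:<24}{ops:>10.0f}{bps / 1e6:>10.1f}")
            print()
            time.sleep(5)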

Next steps

We plan to finalise the implementation of the built-in Ceph “top” tool, and will continue evaluating real CERN use cases on upcoming all-flash architectures. A typical building block for such evaluations is a small-block random-I/O run against an RBD image, as sketched below.
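
A minimal sketch of such a run, assuming the ‘rbd’ CLI is available; ‘rbd bench’ is the image-level load generator shipped with Ceph, and the pool and image names used here are hypothetical throwaway placeholders.

    #!/usr/bin/env python3
    """Sketch: drive a random-write load against an RBD image."""
    import subprocess

    def rbd_bench(pool, image, io_size=4096, threads=16, total="1G"):
        """Run a random-write 'rbd bench' load; the tool's final summary
        line reports aggregate ops/sec and bytes/sec."""
        return subprocess.run(
            ["rbd", "bench",
             "--io-type", "write",
             "--io-size", str(io_size),
             "--io-threads", str(threads),
             "--io-total", total,
             "--io-pattern", "rand",
             f"{pool}/{image}"],
            check=True, capture_output=True, text=True,
        ).stdout

    if __name__ == "__main__":
        # 'flash-bench/bench-image' is a hypothetical test image.
        print(rbd_bench("flash-bench", "bench-image"))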


Presentations

    D. van der Ster, Mastering Ceph Operations: Upmap and the Mgr Balancer (13 November). Presented at Ceph Day Berlin, Berlin, 2018. http://cern.ch/go/6hzm
    J. Collet, Ceph @ CERN (28 November). Presented at JTech Ceph day, Paris, 2018.