Through this project, we are developing a multi-science data-analysis platform built around CERN-developed technologies such as Zenodo (a research data repository), REANA (a system for instantiating data analyses in the cloud), and CVMFS (a scalable, reliable, and low-maintenance software distribution service). When finished, the platform will support the complete data-analysis life cycle, from data discovery and access through to processing and end-user analysis. Its first use case is in genomics: ROOT, a data-processing framework created at CERN and widely used in the high-energy physics community, is being used to store and process genomic sequence data.
In many research communities today, the ways in which reproducibility is ensured, results are communicated, and data pipelines are built are far from optimal. Through the GeneROOT project, we are therefore working to create a powerful system that captures and supports researchers' working habits. Our platform will allow scientists within a given field to negotiate and share common ‘values’, and will help us to understand why certain choices are made. Rather than simply providing a toolkit, we are creating a rich system through which researchers can challenge the ‘value chains’ within their own fields and potentially improve the way they carry out research.
The GeneROOT project has been running since mid-2016; following the establishment of the project team, work gathered pace in 2017. To date, we have focused on designing and implementing initial prototypes and mock-ups, and on developing a minimal viable platform. Initial use cases were defined in collaboration with the biomedical community, with significant effort dedicated to understanding the community's requirements for such a system.
King’s College London in the UK and SIDRA Medicine in Qatar are working with us to carry out initial tests of the platform. We are also exploring opportunities for collaboration with other research institutes, including Maastricht University Hospital and Maastricht University in the Netherlands, and the European Bioinformatics Institute (EMBL-EBI) and the University of Cambridge in the UK. Initial prototype systems and minimal viable platforms are set to be shared with these communities for testing and feedback in early 2018.
In the coming year, our primary focus will shift to the architectural design and implementation of the first prototypes. The understanding we have gained through our engagement with the research communities, together with the technical requirements we have compiled, will feed into the development of these prototypes. Based on the feedback we receive, we will then work to improve the platform further and to deploy the first product to the wider research community.
- J. J. Gonzalez Ortiz, ROOT based Genomics project (11 August), Presented at CERN openlab summer students’ lightning talks, Geneva, 2017. http://cern.ch/go/8Bst
- T. Aliyev, GeneROOT (21 September), Presented at CERN openlab Open Day, Geneva, 2017. http://cern.ch/go/8HXQ
- T. Aliyev, Big Data in Healthcare (29 September), Presented at Innovation in Healthcare: Futuro Summit, Florence, 2017. http://cern.ch/go/7KKX
- T. Aliyev, F. Rademakers, A Multi-Science Data Analysis Platform and the GeneROOT Use Case (8 December), Presented at CERN IT Technical Forum, Geneva, 2017. http://cern.ch/go/8zr8