Integration of Oracle Cloud Resources into CERN IT BC&DR Project

Project Goal

The aim is to establish a disaster recovery plan for CERN’s crucial on-premises Oracle databases. The initiative focuses on enabling seamless switchover to Oracle Data Guard database replicas that are kept in sync asynchronously within Oracle Cloud Infrastructure. To secure the replication transport, the project uses private links across the GÉANT network, connecting to the Oracle FastConnect endpoint in the Frankfurt data center.

Background

If a major incident at one of the CERN data centers leaves a major part of the infrastructure unavailable for an extended period, CERN needs a strategy for where to rebuild the services deemed critical for its mission. Many of these services could potentially be rebuilt in the cloud, taking into account data location and protection as well as network bandwidth needs. In any case, a strategy and concrete actions are needed to preserve CERN's critical data stored in on-premises Oracle databases and to replicate it off-site (e.g. in Oracle Cloud Infrastructure) while keeping it fully under CERN's control.

Progress

In 2023, the work focused on enhancing the Oracle Cloud Infrastructure (OCI) tenancy, streamlining automated tests of the Oracle Database replication process, and assessing the performance of the disaster recovery procedures. The Virtual Cloud Network configuration was comprehensively overhauled to align it closely with CERN’s internal network segmentation, dividing it into distinct subnets for general use, experiments, and technical purposes. Emphasis was placed on protecting each subnet against unauthorized access, implementing robust logging, and building detection infrastructure for the CERN Computer Security team. Within this context, a Key Management System proof of concept was carried out. Based on Oracle Key Vault software, it provided a highly available store for encryption keys and third-party secrets, with replication between the on-premises infrastructure and OCI. Multiple replication tests were conducted, with data sizes ranging from a few gigabytes to tens of terabytes. Latency and throughput were measured to calibrate expectations for a full disaster recovery rebuild and the subsequent failover tests. Concurrently, the project implementation team deepened its understanding of OCI by taking part in various Oracle University trainings with the aim of obtaining formal certification.
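As an illustration of how measured throughput can be turned into a rebuild-time expectation, the following minimal Python sketch estimates how long an initial copy of a database would take over a link of a given speed. The function name, the efficiency factor, and all numbers are hypothetical and are not measurements from this project.

```python
def estimate_transfer_hours(data_size_tb: float,
                            throughput_gbit_s: float,
                            efficiency: float = 0.8) -> float:
    """Estimate the hours needed to copy `data_size_tb` terabytes.

    Assumptions (illustrative only, not project figures):
    - 1 TB = 10**12 bytes (decimal terabytes)
    - `throughput_gbit_s` is the nominal link speed in gigabits per second
    - `efficiency` is the fraction of the nominal speed actually achieved
    """
    if data_size_tb < 0:
        raise ValueError("data_size_tb must be non-negative")
    if throughput_gbit_s <= 0:
        raise ValueError("throughput_gbit_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    data_bits = data_size_tb * 1e12 * 8                     # bytes -> bits
    effective_bit_s = throughput_gbit_s * 1e9 * efficiency  # usable bits per second
    return data_bits / effective_bit_s / 3600               # seconds -> hours


if __name__ == "__main__":
    # Hypothetical example: a 20 TB database over a 10 Gbit/s link.
    hours = estimate_transfer_hours(20, 10)
    print(f"Estimated initial copy time: {hours:.1f} h")
```

An estimate of this kind gives only a lower bound on the rebuild time, since it ignores database-side work such as restore and recovery, so it would need to be checked against measured test results.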

Next Steps

A production-level switchover of all involved databases needs to be performed while capturing performance metrics, in order to document the timeline required for disaster recovery. To conclude the project, a complete tear-down and rebuild of the off-site Oracle Cloud Infrastructure resources needs to be planned to validate the configuration and documentation of all parts. In addition, security hardening and an audit of data governance, access, and lifecycle are necessary, in cooperation with the CERN Computer Security team and the CERN Data Protection Office.

 

Project Coordinator: Miroslav Potocky

Technical Team: Miroslav Potocky, Alexandros Stoumpis

Collaboration Liaisons from Oracle: Şengül Chardonnereau, Jérôme Designe, Sébastien Hurel, Stefan Jung, Cristobal Pedregal-Martin, Eva Dafonte Perez

In partnership with: Oracle