Project Goal
This project aims to establish a robust disaster recovery plan for CERN's critical on-premises Oracle databases, building on recent advances in infrastructure, security, and automation. Its goal is to ensure seamless database fail-over and recovery through Oracle Data Guard database replicas running asynchronously in Oracle Cloud Infrastructure (OCI). Enhanced configurations, including Terraform automation and multi-region networking, support this effort while keeping data governance fully under CERN's control.
Background
CERN's critical databases are foundational to its mission and require stringent protection against prolonged infrastructure outages. Significant progress has already been made in replicating data securely to off-site OCI regions over private links via the GÉANT network and Oracle FastConnect. Improvements to automation workflows have laid the groundwork for a resilient disaster recovery strategy. These efforts ensure not only data preservation but also operational flexibility for reconstruction and fail-over scenarios. The focus now shifts to optimizing costs, refining cloud integration, and finalizing the framework for production-level reliability.
Progress in 2024
A production-level switch-over of selected critical Oracle databases was planned for Q1 2025, during the year-end technical stop. Key metrics will be captured to set precise expectations for reconstruction and fail-over scenarios.
Complete tear-downs and rebuilds of the off-site OCI resources were performed several times to validate configurations and to produce comprehensive documentation for future use.
Terraform automation via OCI stacks is being incorporated into the deployment process to streamline and standardize the tear-down and rebuild of OCI resources, reducing manual intervention and improving efficiency.
Reviews of data governance, access controls, and life-cycle management processes were conducted to align with CERN's data protection policies.
CERN expanded its OCI tenancy networking into additional regions to enable provisioning of GPU-based resources. This effort supports advanced computational needs beyond the disaster recovery project and increases resource flexibility across regions.
Next Steps
FinOps practices will be implemented to ensure cost optimization and transparency in cloud resource usage. This includes monitoring and managing cloud expenses, analysing cost-performance trade-offs, and identifying opportunities for savings without compromising performance or security.
A comprehensive comparison between cloud-based and on-premises disaster recovery solutions will be conducted. This analysis will focus on factors such as cost efficiency, scalability, performance, security, and long-term sustainability. Its findings will inform strategic decisions about future infrastructure investments and operations.
In collaboration with the CERN Computer Security Team and the Data Protection Office, security hardening and a data privacy review are planned to strengthen protection against threats.
Project Coordinator: Miroslav Potocky
Technical Team: Miroslav Potocky, Alexandros Stoumpis
Collaboration Liaisons from Oracle: Şengül Chardonnereau, Sébastien Hurel, Stefan Jung, John Lathouwers, Cristobal Pedregal-Martin, Eva Dafonte Perez
In partnership with: Oracle