The organisation runs two physical data centres and a private cloud platform supporting public-sector applications, and this role exists to professionalise resilience and recovery across that infrastructure. The position combines data centre design and cybersecurity audit work to define a pragmatic disaster recovery plan and a staged move to a multi-datacentre architecture.
The mission
The immediate project is an analysis and implementation programme that starts with a business impact analysis of three already-identified critical applications and continues with a staged rollout of the disaster recovery plan (DRP) and business continuity plan (BCP). The technical landscape includes two on-premises data centres, private cloud resources, and standard networking and security stacks; the work will define recovery time objective (RTO) and recovery point objective (RPO) targets and the steps required to meet them.
Day to day, you will assess the full stack from physical site and power through to the virtualisation and application layers, produce a realistic migration and recovery trajectory, and supervise the progressive commissioning of services on the second data centre (DC2). You will also design resilient active/active or active/passive architectures, specify restart procedures that integrate cloud as a complementary recovery option, and coordinate with internal teams and external suppliers to validate assumptions and test recovery steps.
Your responsibilities
- Define the disaster recovery plan (DRP) and an accompanying business continuity plan (BCP) that meet agreed RTO and RPO outcomes for the three priority applications.
- Lead the technical analysis across physical infrastructure, virtualisation and application layers to identify gaps and reusable artefacts for recovery and restart procedures.
- Design resilient multi-datacentre architectures, documenting active/active and active/passive options with clear trade-offs for operations and costs.
- Establish and validate restart procedures that incorporate the private cloud as a fallback, then oversee progressive service enablement on DC2.
- Coordinate security and audit assessments to ensure recovery plans comply with cybersecurity requirements and internal controls.
- Document deliverables, run recovery exercises and hand over runbooks to operations with measurable acceptance criteria.
Your profile
Essential skills
- Proven capability in data centre design and operations, from physical layer to application integration.
- Experience conducting business impact analysis and translating findings into RTO/RPO-based recovery trajectories.
- Demonstrable expertise in cybersecurity assessment or audit, able to align DRP requirements with security controls.
- Practical experience implementing resilient architectures, including active/active and active/passive configurations.
- Track record of defining and running disaster recovery and restart procedures that integrate cloud as a complementary recovery environment.
- Strong stakeholder management and written documentation skills, able to produce runbooks and test reports for technical and non-technical audiences.
Education
- Degree in engineering or computer science, or equivalent professional experience.