The organisation is the IT operator for a major public-sector administration, responsible for running multi-tenant infrastructure and business applications for education and government services. This role exists to own major incident and problem processes across a portfolio of services, using ITIL v3/v4 practices and ServiceNow for incident, problem and RFC workflows, while working with Office 365 for reporting and stakeholder communication.
The mission
You will join a cross-functional operations team that ensures availability and resilience for services used by thousands of end users across schools and administration offices. The technical landscape combines on-premise data centres and public cloud tenants, monitored with tools such as Zabbix and OhDear, with incidents, problems and RFCs tracked in ServiceNow. Your work is central to keeping services running and to reducing the recurrence of major incidents.
Day to day, you will lead the coordination of major incidents from detection through service restoration to post-mortem. That includes mobilising multi-team responders, prioritising fixes under pressure, producing closure reports within the target timeframe, and turning root-cause analysis into concrete corrective actions aligned with Change Management. You will also coach support teams on process discipline and report KPIs to stakeholders at regular intervals.
Your responsibilities
- Lead the end-to-end coordination of major incidents, ensuring fast restoration and clear, time-bound communication to technical teams and business stakeholders
- Drive root-cause analysis and follow-through, converting incident outcomes into a tracked Problem Management backlog and measurable risk reduction
- Coach and enforce consistent Major Incident and Problem processes across support teams to improve operational quality and data accuracy in ServiceNow
- Produce concise, actionable post-mortems and service closure reports within the defined SLA window, and present findings to IT management
- Maintain and present KPI dashboards that surface trends, recurring issues and progress on corrective actions
Your profile
Essential skills
- Mastery of Incident Management within ITIL v3/v4, with good knowledge of Problem, Change and CMDB practices
- Practical experience using ServiceNow for Incident/Problem/RFC workflows and tracking
- Competence with Microsoft Office 365 tools for reporting, documentation and stakeholder communication
- Ability to coordinate multi-team responses and make prioritisation decisions under pressure
- Experience diagnosing issues in hybrid on-premise and cloud infrastructure and steering corrective action to reduce recurrence
Preferred skills
- Experience building dashboards and KPI reporting, for example with ServiceNow Performance Analytics or Microsoft Power BI
- Familiarity with monitoring tools such as Zabbix and OhDear
- Certifications such as ITIL Managing Professional, ISO 27001, PMP or Scrum Master are an advantage