Multiple monitoring stacks now feed the same incident pipeline, producing duplicate alerts and masking real problems. The logging and monitoring estate must be reconciled across Dynatrace, IBM Netcool Operations Insight, Sysdig and ELK to restore a reliable signal. As a Production Data Engineer Logging And Monitoring with hands-on PowerShell and API integration experience, you will design and implement the integrations and automations that cut noise and speed up detection and escalation.
The Mission
You will join the Logging and Monitoring squad inside a major Belgian financial institution. The squad is responsible for the reliability, performance and compliance of dozens of mission-critical applications. The technical landscape includes Dynatrace, IBM Netcool Operations Insight, Sysdig, ELK (Log as a Service) and IBM Cloud Logs, and the team operates within formal EVM (Event Management) and INC (Incident Management) processes. It is a small agile group that works with platform teams and integration partners to deliver measurable operational improvement.
In this role you will own end-to-end monitoring and logging solutions: from event mapping and correlation rules to automation that enriches, prioritises and escalates alerts. You will build PowerShell scripts and API-driven integrations to remove manual steps, define repeatable logging deployment patterns for cloud and IBM dMZR As Code environments, and act as the escalation point for complex incidents. The work you deliver will be judged by concrete metrics, notably reductions in mean time to detection and mean time to recovery.
Your Responsibilities
- Implement and configure monitoring and logging solutions across Dynatrace, IBM Netcool Operations Insight, Sysdig and ELK, delivering a consolidated, de-duplicated event stream.
- Integrate event flows with EVM and INC processes to ensure consistent enrichment, routing and SLA-based escalation.
- Develop automation using PowerShell and API integrations to eliminate manual triage, enrich alerts and trigger escalation playbooks.
- Define templates and infrastructure-as-code patterns for IBM Cloud Logs and IBM dMZR As Code to enable repeatable, auditable deployments.
- Troubleshoot cross-platform incidents and lead technical resolution with platform teams and third-party providers.
- Produce runbooks, dashboards and training to raise squad and stakeholder capabilities and to improve on-call outcomes.
Your Profile
Essential Skills
- Senior, hands-on experience deploying and tuning Dynatrace, IBM Netcool Operations Insight, Sysdig and ELK in production environments.
- Strong scripting and automation skills with PowerShell and REST API integrations, demonstrated by automation you have delivered to production.
- Practical experience translating EVM (Event Management) and INC (Incident Management) requirements into monitoring rules and workflows.
- Excellent analytical troubleshooting skills, able to design durable correlation and suppression logic from noisy telemetry.
- Clear communicator and collaborative team member, experienced in small agile squads and in engaging both technical and non-technical stakeholders.
- Familiarity with IBM Cloud Logs and experience working in cloud infrastructure environments.
Preferred Skills
- Experience with IT Cloud and IBM dMZR As Code.
- Knowledge of ITIL/ITSM practices and experience in regulated environments are an advantage.
Languages
- Dutch, B2: active proficiency preferred; passive understanding is acceptable if French is your active language.
- French, B2: active proficiency preferred; passive understanding is acceptable if Dutch is your active language.
- English, C1: working proficiency for documentation and cross-border collaboration.