
Senior Engineer - Application Performance Monitoring
- Abu Dhabi
- Permanent
- Full-time
- To ensure end-to-end visibility of application health, performance, and user experience across critical business services.
- To implement and manage modern APM tools that support real-time insights, transaction tracing, and deep diagnostics of application behaviour.
- To support performance engineering, incident prevention, and rapid root cause analysis of application-related issues.
- To partner with DevOps, development, and infrastructure teams in creating a performance-centric culture backed by actionable telemetry.
- Implement, configure, and manage enterprise-grade APM tools across the application landscape.
- Build dashboards, alert policies, and SLA-based thresholds for business-critical applications.
- Enable distributed tracing and service maps to visualise and diagnose performance across complex dependencies.
- Collaborate with application owners and development teams to address recurring issues and improve code-level performance.
- Ensure seamless integration of APM tools with CI/CD pipelines and incident response workflows.
- Lead performance monitoring efforts during major application releases or seasonal peak loads.
- Conduct APM health checks and ensure consistent telemetry coverage across environments (dev, test, prod).
- Maintain monitoring standards and documentation including runbooks, dashboards, and escalation guides.
- Assist in defining and tracking service-level objectives (SLOs) and service-level agreements (SLAs).
- Contribute to post-incident reviews and performance retrospectives to enhance visibility and reduce MTTR.
- Support automation of alert routing, event correlation, and ticket enrichment using observability data.
- Keep abreast of trends in application monitoring, including OpenTelemetry adoption and AI-powered diagnostics.
- Support out-of-hours performance troubleshooting during P1/P2 incidents as part of a rotating schedule.
- In-depth experience with leading APM platforms such as Dynatrace, AppDynamics, New Relic, Instana, or Datadog APM.
- Strong understanding of distributed applications, microservices, container orchestration (Kubernetes), and service mesh technologies.
- Skilled in analysing transaction traces, identifying bottlenecks, and pinpointing slow database calls or code-level inefficiencies.
- Proficient in synthetic monitoring, real user monitoring (RUM), and load testing techniques.
- Familiarity with telemetry pipelines, correlation of metrics/logs/traces, and service-level indicators (SLIs).
- Ability to use data to establish baselines, detect anomalies, and support proactive performance tuning.