Senior Manager - AI Safety and Evaluation Engineering
Abu Dhabi Commercial Bank
- Abu Dhabi
- Permanent
- Full-time
- To design, implement, and scale technical measures that ensure the reliability, safety, and security of the bank's AI solutions, including rigorous evaluation frameworks for both predictive ML and Generative AI and standardized monitoring protocols, while serving as the technical liaison to second-line risk functions to ensure AI deployments meet the bank's risk and compliance standards. Awareness of AI Governance, AI Ethics, and Responsible AI principles is required, with a primary focus on technical practitioner responsibilities.
- Methodology Design: Define standardized evaluation rubrics and scoring systems for AI system performance, reliability, safety, and security. Tailor these for different GenAI system archetypes (e.g., conversational AI, RAG and variants, agents and agentic workflows, AI-enabled data access)
- Test Cases: Design comprehensive test cases covering deterministic logic, probabilistic outputs, and edge-case scenarios, including both benign and adversarial scenarios
- Explainability & Fairness: Implement XAI techniques (e.g., Shapley values, Integrated Gradients) and bias detection metrics to ensure model transparency and regulatory compliance, e.g., in lending and fraud use cases
- Embedding into MLOps Platform: Work with MLOps platform product and engineering leaders to embed the above as automated capabilities
- Integrated Observability Architecture: Partner with MLOps/LLMOps teams to design and embed telemetry pipelines that capture inputs, outputs, and embeddings, ensuring real-time visibility into model health while maintaining strict PII and data privacy standards.
- Standardized Metric Frameworks: Establish "Golden Signals" (e.g., latency, drift, hallucination rates, and semantic similarity) and design automated "circuit-breakers" to intercept or reroute model traffic when performance or safety thresholds are breached.
- Act as a central consultant for individual AI project teams, providing training on evaluation best practices and safety protocols.
- Standardization: Establish and maintain a unified "Safety Stack" (tools for monitoring, testing, and explainability) and underlying methodology to ensure consistency across the bank's AI portfolio
- At least 7 years of experience in ML Engineering, Data Science, or AI Evals/Testing/Safety, ideally with experience in the financial sector
- Bachelor's degree in Information Technology, Computer Science, Engineering, or a related discipline
- Core Skills: Expert Python proficiency; deep knowledge of statistical testing, XAI libraries, and LLM evaluation frameworks (e.g., Ragas, Giskard, Arize).
- Systems Design: Proven ability to design observability and monitoring systems at scale
- Leadership: Experience leading cross-functional initiatives or acting as a technical mentor/consultant.