Key responsibilities include providing day-to-day operational support, including incident triage and resolution for AI APIs, model serving, pipelines, and data flows; monitoring AI health, performance, quality, and cost; coordinating with platform and middleware teams; supporting deployment and configuration of AI services and agent frameworks; performing root cause analysis; and maintaining runbooks, FAQs, and operational documentation. The position requires 8–10 years of application or platform support experience, with hands-on exposure to AI/ML environments: AI platforms (such as Vertex AI), LLM-based applications, RAG pipelines, AI APIs, Python scripting, and monitoring of AI workloads for latency, usage, errors, and model drift.