Proven experience implementing and optimizing application performance monitoring (APM) and observability solutions, including metrics collection, distributed tracing, and log aggregation, to support high availability, rapid incident response, and continuous performance improvement across production systems. The primary objective of this role is to oversee and ensure the reliability and availability of critical internal- and external-facing applications and services hosted on a combination of on-premises and cloud-based infrastructure.