Lead the design, deployment, and maintenance of highly available, multi-site infrastructure on GCP (preferred), AWS, or Azure.
Manage and optimize Kubernetes clusters, implement automated deployments, and ensure robust observability and monitoring across cloud services.
Build, maintain, and enhance infrastructure as code with Terraform, and write scripts and automation in Python, Go, JavaScript (Node.js), or similar languages.
Troubleshoot and resolve complex system incidents, ensuring reliability, scalability, and security of critical applications.
Collaborate with engineering, DevOps, and security teams to establish best practices for CI/CD pipelines, multi-cloud architecture, and operational workflows.
Drive initiatives to improve system performance, availability, and automation, while mentoring junior engineers and contributing to technical strategy.