* Chart the future of Cribl's observability and reliability systems and practices
* Conceptualize and direct the evolution of our reliability metrics, programs and processes based on the state of the art and industry best practices
* Engage with Product and Engineering teams to improve service delivery and reliability across the entire software lifecycle
* Measure and monitor all production systems with an eye towards availability, latency and overall system health
* Uncover risks and seek out the sources of errors and instability in our production systems
* Advocate for engineering-wide improvements in reliability and observability, and promote antifragility
* Identify and drive down toil with creative innovation and automation
* Participate in on-call

If You Got It - We Want It

* Extensive experience with enterprise-scale continuous delivery environments
* 10+ years of experience in a DevOps or SRE role
* Development with JavaScript/Node.js/TypeScript in a Linux/Mac environment
* Experience with IaC tools like Terraform (preferred) or similar
* Experience with sustainable incident response in a blameless environment
* Knowledge of cloud platforms (AWS preferred) and container and orchestration technologies
* Experience with APM and observability tools such as New Relic, Splunk, CloudWatch, Prometheus, Grafana/Kibana, Sentry, etc.