Long Beach, CA (posted 30+ days ago)
About You:
- 3+ years in a Site Reliability Engineering, Automation, DevOps, or similar role
- Experience with or familiarity with the Kubernetes ecosystem
- Proficiency in cloud services and architectures (GCP preferred)
- Knowledge of deployment and infrastructure-as-code frameworks (e.g., Helm, Terraform)

Nice to haves but not required:
- Advanced proficiency with Kubernetes
- Experience with deployments across cloud and on-premises environments
- Familiarity with GitLab
- Extensive experience with production monitoring, observability best practices and tools, and log aggregation/analysis (e.g., Prometheus, Grafana, Loki)
- Experience with JFrog Artifactory
- Knowledge of configuration management tools (e.g., Ansible, Puppet, others)
- Experience with MLOps or machine learning frameworks

Responsibilities:
- Support infrastructure growth across multiple sites (cloud and on-premises)
- Ensure continued uptime of our services, processes, and infrastructure throughout the company and across all our sites
- Develop roadmaps for automated software deployments across the company
- Champion automation to reduce toil and increase development velocity
- Help define and instrument Service-Level Objectives
- Leverage configuration management to build and maintain consistency across services
- Apply infrastructure-as-code methodologies across configuration, orchestration, and elsewhere