Responsibilities

Oversee the deployment and management of containerized applications using Kubernetes, ensuring optimal performance and availability
Contribute to strategic planning regarding how the infrastructure solutions evolve to match the requirements of Data Center partners
Lead the design, implementation, and maintenance of scalable and reliable systems on AWS and/or on‑premises
Utilize Terraform for infrastructure as code to automate the provisioning and management of cloud resources
Monitor system performance and uptime, ensuring systems meet established service level objectives (SLOs)
Support SOC2 security compliance requirements for data handling
Mentor and guide team members in DevOps practices, promoting a culture of reliability and excellence
Advocate for automation of operational tasks to enhance efficiency and reduce manual intervention
Collaborate with cross‑functional teams to build and maintain CI/CD pipelines
Troubleshoot and resolve complex production issues, conducting root cause analysis and implementing corrective actions
Participate in on‑call rotations and incident response teams
Assist in capacity planning, performance tuning, and technical decision‑making
Drive continuous improvement initiatives for processes and infrastructure

Minimum Qualifications

8+ years of development experience including extensive experience in platform engineering, SRE, or distributed systems, with clear senior or principal‑level impact
Experience designing and operating infrastructure across on‑premises and cloud environments
Strong proficiency in container orchestration, particularly Kubernetes
Strong proficiency with AWS services and architecture
Hands‑on experience with Terraform for infrastructure automation
Familiarity with monitoring tools (Prometheus, Grafana, or similar) and observability best practices
Excellent problem‑solving skills, leadership abilities, and attention to detail
Strong communication and collaboration skills, with experience in driving technical outcomes
Willingness to travel up to 20% of the time

Enhanced Qualifications (Nice to Have)

Bachelor's degree in Computer Science, Engineering, or a related field
Experience supporting or enabling MLOps platforms, model deployment pipelines, or ML‑adjacent infrastructure
AI workload scheduling using Kubernetes
Knowledge of Apache Spark for large‑scale data processing
Knowledge of database technologies (SQL, NoSQL)
Understanding of networking concepts and security best practices

Salary Range

$180,000 to $210,000 base compensation, depending on experience, and stock options.

Karman, the company's distributed AI platform powered by a custom NVIDIA module, is transforming the way utility companies operate the grid edge and will enable data centers to unlock more compute for the same provisioned power.