What you will be doing:

+ Design and implement runtime features that orchestrate the lifecycle of runtime components across thousands of Kubernetes clusters without manual intervention
+ Build and maintain the systems that configure, package, validate, and distribute accelerated compute components
+ Develop Kubernetes controllers, CRDs, and operators that automate runtime installation, upgrade, and rollback operations with API-driven workflows

What we need to see:

+ Bachelor's degree in Computer Science, or equivalent experience
+ 8+ years of professional experience, including at least 3 years of Kubernetes development
+ Experience building production Kubernetes systems with significant expertise in controllers, operators, and CustomResourceDefinitions
+ Strong proficiency in Go and experience building scalable Go services that manage complex distributed systems
+ Hands-on experience with Helm, Kustomize, and managing Kubernetes manifest packaging and templating
+ Demonstrated ability to design and implement automation systems that replace manual processes with reliable, self-service tooling

Ways to stand out from the crowd:

+ Experience working with NVIDIA Kubernetes components such as the GPU Operator, device plugins, or other HPC components in large-scale production environments
+ Deep familiarity with OCI registries, artifact signing, SBOM generation, and supply chain security practices
+ Experience building multi-tenant platform services with a focus on API design, versioning, and backward compatibility
+ Track record of migrating legacy systems to modern, automated platforms while maintaining zero-downtime operations, along with contributions to upstream Kubernetes/CNCF projects or experience extending Kubernetes API machinery
+ Deep understanding of Kubernetes architecture, including API machinery, admission controllers, and resource lifecycle management

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing and Visualization. The Runtime team is responsible for providing an NVIDIA-Accelerated Kubernetes runtime that can be applied to any cluster using NVIDIA accelerators, empowering engineers with automation-first, self-service tools that minimize manual effort while enhancing reliability and reproducibility.