Senior Network Engineer - Supercomputing

Institute Of Foundation Models

Sunnyvale, CA

JOB DETAILS
SALARY
$200,000–$400,000 Per Year
SKILLS
Artificial Intelligence (AI), BGP, C Programming Language, C++ Programming Language, Communication Systems, Environmental Work, GPU (Graphics Processing Unit), Go Programming Language (Golang), MPI, Network Administration/Management, Network Architecture/Engineering, Network Programming, Network Protocols, Python Programming/Scripting Language, Rust Programming Language, Supercomputing, TCP/IP (Transmission Control Protocol/Internet Protocol)
LOCATION
Sunnyvale, CA
POSTED
Today

This role involves designing, optimizing, and maintaining high-performance networking solutions for GPU supercomputing clusters supporting AI training and inference workloads.

Responsibilities include developing RDMA-based communication systems, implementing GPUDirect RDMA, automating network management with infrastructure-as-code (IaC) tools, integrating solutions with Kubernetes, and troubleshooting network issues to ensure reliability and performance.

The ideal candidate has experience with NVIDIA RDMA technologies and HPC or AI network environments, programming skills in Python, Go, Rust, and C/C++, and familiarity with networking technologies such as RDMA, InfiniBand, TCP/IP, and BGP. Knowledge of cluster management (Slurm), communication frameworks (NCCL, MPI), and container orchestration is essential.

This position offers a salary range of $200,000–$400,000 per year, visa sponsorship, and benefits including health coverage, bonuses, a 401(k), paid leave, and more.

About the Company

Institute Of Foundation Models