$142,800–$274,800 Per Year
The CoreAI GPU Infrastructure team builds the foundational accelerated compute platforms that power large-scale AI training and inference across Azure. Our mission is to deliver secure, reliable, and highly efficient GPU infrastructure that enables multi-tenant AI systems at global scale while maximizing utilization, performance, and developer productivity.
This role sits at the intersection of cloud infrastructure, systems software, virtualization, and container platforms, working closely with CoreAI, Azure Infrastructure, OS, Networking, and Hardware teams to deliver end-to-end platform capabilities.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Responsibilities
As a Principal Engineer on the team, you will:
- Design and build GPU-accelerated infrastructure for training and inference workloads, spanning bare metal, virtual machines, and containerized environments.
- Develop systems for GPU device management, scheduling, isolation, and sharing (e.g., partial GPU allocation, multi-tenant usage).
- Build and operate advanced orchestration and resource governance scenarios using platforms such as Azure Kubernetes Service (AKS), Dynamic Resource Allocation (DRA), and related Kubernetes ecosystem capabilities to enable fair sharing, isolation, and efficient utilization of accelerated resources.
- Build and evolve virtualization and container stacks to support modern AI workloads, including secure and confidential compute scenarios.
- Optimize performance, reliability, and utilization across large GPU fleets, including scale-up and scale-out configurations.
- Partner with networking and storage teams to enable high-performance interconnects (e.g., RDMA/InfiniBand-class networking) for distributed workloads.
- Drive end-to-end platform features from design through production, including observability, diagnostics, and operational excellence.
- Influence platform architecture and technical direction across teams through design reviews and technical leadership.
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science or related technical field and 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, Python or equivalent experience.
Other Requirements:
- Proven ability to design and operate large-scale production infrastructure with high reliability and performance requirements.
- Strong problem-solving skills and the ability to debug complex, cross-layer systems issues.
- Demonstrated technical leadership, including mentoring engineers and driving cross-team architectural alignment.
- Hands-on experience with virtualization and/or container platforms (e.g., VMs, Kubernetes, container runtimes).
- Strong collaboration and communication skills, with the ability to work across organizational boundaries.
Preferred Qualifications:
- Familiarity with distributed training and inference stacks (e.g., NCCL-style collectives, model/data parallelism).
- Experience in building or operating multi-tenant AI platforms in cloud environments.
- Familiarity with high-performance networking and low-latency communication stacks.
- Familiarity with GPU-accelerated computing (e.g., CUDA, GPU drivers, device plugins, or runtime integration).
- Familiarity with GPU virtualization, passthrough, or partitioning technologies.
- Knowledge of confidential computing, trusted execution environments, or hardware-backed isolation.
Impact & Growth:
- Work on mission-critical infrastructure that directly powers large-scale AI systems.
- Influence the future of cloud GPU platforms used by internal and external customers.
- Collaborate with experts across OS, hardware, networking, and AI platform teams.
- Opportunity to grow as a technical leader, shaping long-term platform strategy.
Software Engineering IC5 - The typical base pay range for this role across the U.S. is USD $142,800 - $274,800 per year. A different range applies to specific work locations within the San Francisco Bay Area and New York City metropolitan area; the base pay range for this role in those locations is USD $188,000 - $304,200 per year.
Microsoft
DO WHAT YOU LOVE
Make your mark on the world’s most used technologies. Develop the next hit mobile application. Pioneer a startup that could be the next big thing. At Microsoft, you choose your path.
Headquartered in Redmond, Washington, Microsoft is a top innovator in both the consumer and enterprise technology industries. Our products unleash creativity, connect businesses, and make learning more fun, to name just a few of the things they do. But our continued success is based on one thing: our employees. We hire amazing, talented people and give them the opportunities, and the tools, to succeed.
WHY MICROSOFT?
As a Microsoft employee, you’re surrounded by a diverse group of the smartest people in your field. This fosters new ideas and better business results, and it creates a dynamic work environment. In the office, you’re constantly challenged and supported by your colleagues. Every day holds something new and exciting.
We also offer unparalleled depth and breadth of career opportunities. As an industry leader in multiple fields, working for Microsoft means being able to do whatever you feel passionate about—and being able to make an impact in that field. From day one, we give our employees significant responsibility. This means that you’ll know that you directly contributed to something that has a positive impact on people worldwide. Whether you choose to work in management, dive deep into the newest technology, or explore multiple professions, you’ll find everything you need at Microsoft to drive your career—and to make a difference.
WE GET IT – YOU’RE MORE THAN YOUR JOB
Everyone works differently and is motivated by different things. We also understand that there’s more to you than your job. That’s why we offer competitive pay and a wide assortment of benefits to help you make the most of life at work and away from it.
GET THE BALL ROLLING