Senior Software QA Test Development Engineer - Diagnostics

NVIDIA Corporation

Santa Clara, CA

JOB DETAILS
SKILLS
Agile Programming Methodologies, Ansible, Artificial Intelligence (AI), Automation, Benchmarking, Bug Tracking/Defect Management, Buses, C Programming Language, C++ Programming Language, CPU (Central Processing Unit), CUDA (Compute Unified Device Architecture), CentOS, Computer Firmware, Continuous Deployment/Delivery, Continuous Integration, Debugging Skills, Deep Learning, DevOps, Docker, Fedora Linux, GPU (Graphics Processing Unit), Gaming, Gerrit, GitHub, High Tech Industry, Identify Issues, Interpersonal Skills, Java, JavaScript, Jenkins, K Virtual Machine (KVM), Linux Operating System, Mathematics, Memory Hardware, Microsoft Hyper-V, Natural Language Processing (NLP), Network Operations Center, Network Protocols, OEM (Original Equipment Manufacturer), OpenCL, Operating Systems, PCI Express (PCI-E), Parallel Programming, Physics, Process Improvement, Python Programming/Scripting Language, Quality Engineering, Quality Metrics, Red Hat Linux Operating System, Reliability Analysis, Reliability Testing, Root Cause Analysis, Server Programming/Applications, Software Design, Software Development, Software Testing, SuSE Linux, Telemetry, Test Automation, Test Case, Test Plan/Schedule, Test Tools, Testing, Ubuntu, Unix Shell Programming, User Interface/Experience (UI/UX), VMWare, Validation Testing, Vehicle Driving, Virtualization
LOCATION
Santa Clara, CA
POSTED
2 days ago
NVIDIA is the world leader in GPU Computing. The markets we are passionate about include gaming, automotive, vision, HPC, datacenters, and networking, in addition to our traditional OEM business. NVIDIA is also well positioned as the 'AI Computing Company', and NVIDIA GPUs are the brains powering Deep Learning software frameworks, analytics, data centers, and driving autonomous vehicles. We have some of the most experienced and dedicated people in the world working for us. If working with dedicated, forward-thinking, and hard-working technical people across countries sounds exciting, this job is for you. NVIDIA is looking for an outstanding individual who thrives in a diverse work environment, has outstanding interpersonal skills, and possesses a strong sense of engagement and continuous process improvement. To join our platform SWQA team, this candidate must have experience in enterprise server integration, reliability testing with various telemetries, scale-out clusters, test plan development, DevOps, and CI/CD, along with strong Linux experience and a track record in developing AI tools and NLP.

What you'll be doing:
  • Develop and execute NVIDIA HGX/DGX/MGX platform test plans, based on design docs, covering servers, OS, FW, and the CUDA SW stack.
  • Install and test various system OSes, server firmware, and SW stacks.
  • Drive support for root cause analysis on reliability and validation test failures to identify root cause(s) and achieve mitigation.
  • Build, develop, and debug server- and OS-level automation frameworks (front end and back end) and tests.
  • Review partner and supplier test results and prescribe additional reliability testing on components, servers, and packaging as needed.
  • Work in an agile software development team with very high production quality standards.
  • Manage the bug lifecycle and collaborate across groups to drive solutions.
What we need to see:
  • Bachelor's Degree (or equivalent experience) in a STEM (Science, Technology, Engineering, Math or Physics) field
  • 5+ years of proven experience, or a master's degree.
  • Proven experience in OS- and server-level automation, CI/CD processes, and DevOps using Python, shell, Ansible, Jenkins, C/C++, Java, and JavaScript
  • Strong server and Linux (Ubuntu, Red Hat, CentOS, SUSE, Fedora, etc.) troubleshooting and debugging experience in bare-metal and KVM/VMware/Hyper-V environments.
  • Good knowledge of and hands-on experience in model testing, AI tools/frameworks (TensorFlow, PyTorch, Cursor, etc.), NLP, and LLM benchmarking
  • Experience in using AI development tools for test plan creation, test case development, and test case automation
  • Strong experience in FW, BMC/OpenBMC, network protocols, internal/external enterprise storage devices, PCIe buses and devices, IO sub-devices, CPU and memory, ACPI, the UEFI spec, and Redfish - huge plus
  • Proven experience with GitHub/GitLab/Gerrit, PXE, SLURM, Stack/Kubernetes/Docker - huge plus
Ways to stand out from the crowd:
  • Experience with AI-related tools, LLMs, and NLP.
  • Experience working with NVIDIA GPU hardware is a strong plus.
  • A solid understanding of virtualization in Linux (KVM, Docker orchestrated with Kubernetes) is good to have.
  • A background in parallel programming, ideally CUDA/OpenCL, is a plus.
    Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 140,000 USD - 224,250 USD for Level 3, and 168,000 USD - 270,250 USD for Level 4.

    You will also be eligible for equity and benefits.

    Applications for this job will be accepted at least until April 14, 2026.

    This posting is for an existing vacancy.

    NVIDIA uses AI tools in its recruiting processes.

    NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

    About the Company

    NVIDIA Corporation

    Visualize your future . . . We Do
    NVIDIA is the world leader in graphics processing technologies, creating innovative, industry-changing products for computing, consumer electronics, and mobile devices. NVIDIA products are transforming visually-rich applications such as video games, film production, broadcasting, industrial design, space exploration, and medical imaging. We invest in our people and our technologies, support and fund industry research around the world, and consistently deliver high-quality products. NVIDIA's culture promotes and inspires a team of world-class employees to be at the top of their game. We've created an environment where talents are recognized and collaboration is valued. Our employees are shaping the world of tomorrow. . . today. We invite you to explore the opportunities available at NVIDIA to see what your future may hold.

    COMPANY SIZE
    10,000 employees or more
    INDUSTRY
    Computer Software
    FOUNDED
    1993
    WEBSITE
    http://www.nvidia.com