Technical Lead - System Validation Architect

Graphcore Ltd

Austin, TX

About us

Graphcore is one of the world's leading innovators in Artificial Intelligence compute.

It is developing hardware, software and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.

As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world's most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.

Graphcore's teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.

Job Summary

We are seeking a Technical Lead - System Validation Architect to lead the architecture and execution of Linux-based validation frameworks for Arm-based data center SoCs. This role will define validation strategy, test coverage, and methodology across CPU, memory, interconnect, and high-speed I/O subsystems. You will provide technical leadership in validation architecture, automation, benchmarking, and debug to ensure robust system quality and scalability.

The Team

The Systems Validation Architecture team is responsible for defining and enabling scalable validation methodologies for Graphcore's next-generation AI compute platforms. The team collaborates closely with hardware, firmware, and systems engineering groups to deliver comprehensive validation coverage and high-quality system enablement.

Responsibilities and Duties

  • Define the end-to-end validation strategy and coverage model:
      • Functional, stress, performance, and corner-case testing
  • Translate hardware specifications into structured, parameterized test plans
  • Guide the team in:
      • Selecting appropriate tools
      • Defining workload models and parameter configurations
  • Establish standards for:
      • Test case definition (parameters, metrics, pass/fail criteria)
      • Result validation and reporting
  • Apply multi-core and parallel programming experience, including workload scaling and CPU affinity management
  • Review Python-based automation, orchestration, and analysis
  • Collaborate with hardware, firmware, and system teams to debug issues

Candidate Profile

Essential:

  • Strong knowledge of Arm SoC architecture and Linux systems.
  • 8+ years of experience in system validation, performance engineering, or low-level systems development.
  • Deep understanding of CPU architecture, cache coherency, memory systems (DDR, HBM, NUMA), and high-speed I/O technologies such as PCIe.
  • Proven ability to define validation strategies, coverage models, and validation methodologies.
  • Hands-on experience using and tuning benchmarking tools such as stress-ng, fio, and iperf.
  • Strong Python programming skills for automation, orchestration, and data analysis.
  • Experience with performance analysis tools such as perf, and with hardware PMU counters.
  • Strong analytical and problem-solving skills, and the ability to collaborate in multi-functional environments.

Desirable:

  • Experience working with large-scale or data center systems.
  • Strong programming skills in C/C++ and Python for system-level development.
  • Previous technical leadership or mentoring experience.
  • Experience with scalable validation infrastructure and automation frameworks.
  • Knowledge of AI infrastructure or hyperscale compute systems.
