Senior Staff Engineer, Software

Celestica International LP

Richardson, TX

JOB DETAILS
SKILLS
Aerospace and Defense, Ansible, Artificial Intelligence (AI), Automation, Aviation Industry, BGP, Best Practices, Building Systems, Capital Equipment, Cloud Computing, Communication Skills, Computer Networks, Computer Programming, Concurrency, Cross-Functional, Customer Support/Service, DHCP (Dynamic Host Configuration Protocol), Debugging Skills, Diagnostics Solutions/Software, Distributed Computing, GPU (Graphics Processing Unit), Identify Issues, Integrated Circuits (ICs), Layer 2 Protocols, Layer 3 Protocols, Manufacturing, Medical Equipment, Medical Products, Mentoring, Network Architecture/Engineering, Network Operations Center, Network Support, Operational Strategy, Operational Support, Performance Analysis, Performance Tuning/Optimization, Problem Solving Skills, Product Development, Production Support, Production Systems, Python Programming/Scripting Language, Reliability Engineering, Root Cause Analysis, Scalable System Development, Software Design, Software Engineering, Supply Chain, System Architecture, Systems Engineering, Systems Scalability, Team Player, Technical Leadership, Telemetry
POSTED
Today
Req ID: 129032
Region: Americas
Country: USA
State/Province: Texas
City: Richardson

General Overview

Functional Area: Engineering
Career Stream: Design - Software Engineering
Job Code: SSE-ENG-DSE
Job Level: Level 11
IC/MGR: Individual Contributor
Direct/Indirect Indicator: Indirect

Summary

We are seeking a Senior Staff Engineer to lead the design and development of next-generation AI infrastructure platforms focused on GPU-based data centers, networking, and orchestration systems.

This is a high-impact, hands-on technical leadership role where you will architect and build systems that enable deployment, monitoring, and optimization of large-scale infrastructure supporting AI workloads across modern data center environments.

You will operate at the intersection of:
  • Data center networking (L2/L3, AI fabrics)
  • Infrastructure management, monitoring, and diagnostics


This role requires deep technical expertise along with the ability to drive end-to-end solutions from architecture through deployment and troubleshooting.

Detailed Description

  • Lead the architecture, design, and development of scalable AI infrastructure platforms supporting GPU-based data center environments
  • Build and enhance orchestration systems responsible for infrastructure deployment, provisioning, monitoring, and lifecycle management
  • Design distributed systems with a focus on scalability, resiliency, fault tolerance, concurrency, and performance optimization
  • Develop infrastructure observability and diagnostics capabilities across GPU, networking, and storage environments
  • Define telemetry, health monitoring, and performance validation strategies for large-scale AI infrastructure deployments
  • Develop and support data center networking and orchestration workflows including ZTP, DHCP, provisioning, and automated infrastructure configuration
  • Work across modern AI fabric and data center networking architectures including Clos fabrics, EVPN, and L2/L3 networking environments
  • Write high-performance backend software and infrastructure services using Python or Go within Kubernetes-based environments
  • Troubleshoot and resolve complex infrastructure, networking, orchestration, and performance issues in live production data center environments
  • Lead root cause analysis efforts and drive issues through resolution across software, networking, and infrastructure layers
  • Partner cross-functionally with engineering, hardware, platform, lab, and customer teams to support deployments and operational success
  • Drive technical direction, architecture decisions, engineering best practices, and mentorship across the organization
  • Translate real-world deployment challenges into scalable engineering solutions that improve reliability, automation, and operational efficiency
  • Operate as a hands-on technical leader capable of driving initiatives from architecture and development through deployment and production support


Knowledge/Skills/Competencies

Required

  • 12+ years of experience in software engineering focused on infrastructure, distributed systems, networking, or large-scale platform development
  • Strong expertise in data center networking fundamentals including:
    • L2/L3 networking
    • BGP and EVPN
    • Clos fabrics and AI networking architectures
  • Proven experience designing and building scalable distributed systems in production environments
  • Hands-on experience with infrastructure orchestration, provisioning, and large-scale data center deployments
  • Strong programming experience in Python or Go
  • Experience building systems within Kubernetes-based environments
  • Strong understanding of system scalability, concurrency, resiliency, and performance optimization
  • Demonstrated ability to troubleshoot and debug complex multi-layer production systems
  • Strong communication and collaboration skills with the ability to work across technical and non-technical teams


Preferred

  • Experience with AI/ML infrastructure, GPU clusters, or high-performance computing (HPC) environments
  • Experience with AI infrastructure monitoring, observability, and diagnostics platforms
  • Familiarity with AI workload orchestration and scheduling systems
  • Experience with infrastructure automation tools such as Ansible
  • Experience supporting customer deployments and external stakeholder engagements
  • Background supporting large-scale data center or cloud infrastructure platforms


Typical Experience

  • 12+ Years


Typical Education

Bachelor's degree or consideration of an equivalent combination of education and experience.

Educational requirements may vary by geography.

Notes

This job description is not intended to be an exhaustive list of all duties and responsibilities of the position. Employees are held accountable for all duties of the job. Job duties and the % of time identified for any function are subject to change at any time.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.
Celestica's policy on equal employment opportunity prohibits discrimination based on race, color, creed, religion, national origin, gender, sexual orientation, gender identity, age, marital status, veteran or disability status, or other characteristics protected by law.
This policy applies to hiring, promotion, discharge, pay, fringe benefits, job training, classification, referral and other aspects of employment and also states that retaliation against a person who files a charge of discrimination, participates in a discrimination proceeding, or otherwise opposes an unlawful employment practice will not be tolerated. All information will be kept confidential according to EEO guidelines.

COMPANY OVERVIEW:
Celestica (NYSE, TSX: CLS) enables the world's best brands. Through our recognized customer-centric approach, we partner with leading companies in Aerospace and Defense, Communications, Enterprise, HealthTech, Industrial, Capital Equipment and Energy to deliver solutions for their most complex challenges. As a leader in design, manufacturing, hardware platform and supply chain solutions, Celestica brings global expertise and insight at every stage of product development - from drawing board to full-scale production and after-market services for products from advanced medical devices, to highly engineered aviation systems, to next-generation hardware platform solutions for the Cloud. Headquartered in Toronto, with talented teams spanning 40+ locations in 13 countries across the Americas, Europe and Asia, we imagine, develop and deliver a better future with our customers.

Celestica would like to thank all applicants; however, only qualified applicants will be contacted.
Celestica does not accept unsolicited resumes from recruitment agencies or fee-based recruitment services.

This location is a US ITAR facility and these positions will involve the release of export controlled goods either directly to employees or through the employee's movement within the facility. As such, Celestica will require necessary information from all applicants upon an applicant's acceptance of employment to determine if any export control exemptions or licenses must be filed.
