Software Development Engineer - AI/LLM Network - Global Frontier Tech Research Program - 2027 Start

Beijing ByteDance Technology Co Ltd

Seattle, WA

JOB DETAILS
SKILLS
Analysis Skills, Artificial Intelligence (AI), Artificial Intelligence (AI) Agents, Broadband, C Programming Language, C++ Programming Language, Cloud Computing, Computer Engineering, Computer Networks, Computer Science, Computer Skills, Cost Control, Database Clustering, Frontier Programming Language, Hardware Design, High Availability, Intelligent Network, Localization, MPI, Network Administration/Management, Network Architect Software, Network Operations Center, Network Programming, Network Protocols, Network Software, Network Support, Network Switching, Network System Hardware, Network Systems, Onboarding, Performance Tuning/Optimization, Programming Languages, Protocol Stack, Python Programming/Scripting Language, RPC (Remote Procedure Call), Research & Development (R&D), Resource Utilization, Root Cause Analysis, Software Development, Software Engineering, Technical Research, Technical Support, Virtualization Software
LOCATION
Seattle, WA
POSTED
30+ days ago

We are looking for talented individuals to join our team in 2027. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at our Company.

Successful candidates must be able to commit to an onboarding date by end of year 2027. Please state your availability and graduation date clearly in your resume.

Team Introduction: ByteDance Networking brings together innovative ideas and technologies from network architecture, software-defined networking (SDN), network virtualization, switch software and hardware co-design, and high-speed networking to create hyperscale data-center networking solutions. These solutions power some of the world's most popular apps, such as Douyin and TikTok, which serve hundreds of millions of users around the globe.

ByteDance Networking is responsible for designing, building, and operating the global, intelligent network infrastructure to meet the requirements of high availability, scalability, and high performance. By joining this team, you will gain marketable software development and/or network operations experience in data center networking at massive scale.

Topic Content: With the large-scale adoption of LLMs and AI agents, traditional cloud-native infrastructure can no longer meet the ultra-high performance and elasticity requirements of AI workloads.

Network and Observability: Research intelligent fault localization and root cause analysis for large-scale AI clusters, combined with intelligent tuning of time-series databases to improve cluster stability.

This topic aims to build a next-generation AI-native infrastructure to support the deployment of LLMs and AI agents, improve resource utilization, reduce costs, support elastic scaling, and drive the technological evolution of AI infrastructure.

Responsibilities:

  • Design, implement, and deploy high-speed network technologies to support AI/LLM applications.
  • Design and develop platforms and systems for monitoring, analysis, and diagnosis of large-scale AI/LLM networks.
  • Research and develop high-performance AI communication frameworks and network protocol stacks, and co-design optimizations across host, network, and application to improve the scalability, reliability, and performance of AI/LLM networks.
  • Build next-generation AI network infrastructure that supports large-scale heterogeneous network hardware with innovative and deployable solutions.

Minimum Qualifications:

  • Currently completing or recently completed a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline.

Preferred Qualifications:

  • Proficiency in computer networking and network programming.
  • Proficiency in one or more mainstream programming languages, such as C/C++, Python, or Go.
  • Familiarity with the latest advances in high-speed network systems, such as RDMA, congestion control, and AI network optimization.
  • Experience in developing high-performance communication frameworks (including NCCL, MPI, and RPC libraries) is a plus.
  • Experience in developing software systems for AI network diagnosis and performance optimization is a plus.

About the Company

Beijing ByteDance Technology Co Ltd