Site Reliability Engineer, Global E-commerce

TikTok Inc

San Jose, CA

JOB DETAILS
SKILLS
Business Operations, Business Support, Capacity Management, Cloud Computing, Communication Skills, Computer Science, Computer Servers, Contingency Plans, Cross-Functional, Disaster Recovery, Distributed Computing, Go Programming Language (Golang), Identify Issues, Internet/Online Service, Java, Large-Scale Systems, Linux Operating System, Network Operations Center, Problem Solving Skills, Product Engineering, Production Systems, Programming Languages, Python Programming/Scripting Language, Reliability Engineering, Resource Management, System Architecture, Systems Administration/Management, Team Player, Time Management, eCommerce

The Global E-commerce Service Architecture team ensures the availability, scalability, and resilience of TikTok's e-commerce platform in the U.S., partnering closely with product and engineering teams to operate reliable, large-scale production systems. We are seeking a Site Reliability Engineer (SRE) to join this team. In this role, you will strengthen disaster recovery readiness, optimize infrastructure capacity, and improve service stability.

Key Responsibilities:

  • Data Center Disaster Recovery: Ensure services maintain disaster recovery capabilities during normal operations through contingency planning, drills, and capacity assurance, and respond effectively in disaster scenarios.
  • Resource Management & Capacity Planning: Manage and plan server and compute resources, including resource restructuring, overall capacity planning, and dynamic scaling, to support reliable business deployment and operations.
  • Service Stability Improvement: Establish and enhance service monitoring systems to enable timely alerting on failures and rapid issue identification and resolution. Partner with business stakeholders to conduct ongoing stability governance.

Minimum Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  • Proficiency in at least one programming language (e.g., Go, Python, or Java).
  • Strong understanding of Linux systems, networking fundamentals, and distributed systems architecture.
  • Experience operating services in cloud-native or large-scale production environments.

Preferred Qualifications:

  • Experience in Site Reliability Engineering, infrastructure, or production engineering roles.
  • Experience supporting high-traffic e-commerce or internet platforms.
  • Experience in designing, operating, and troubleshooting large-scale distributed systems.
  • Strong communication and cross-functional collaboration skills, with a high sense of ownership and accountability.
