Lead Site Reliability Engineer

Judge Group

Philadelphia, PA

JOB DETAILS
SALARY
$150,000–$180,000 Per Year
SKILLS
Amazon Web Services (AWS), Automation, Bash Scripting, Best Practices, Business Continuity Planning (BCP), Cloud Computing, Communication Skills, Continuous Deployment/Delivery, Continuous Integration, Customer Relations, DevOps, Disaster Recovery, Fortune 500 Customers, Google Cloud Platform (GCP), High Availability, Incident Management, Incident Response, Large-Scale Systems, Leadership, Mentoring, Microsoft Azure, Production Management, Production Systems, Python Programming/Scripting Language, Quality Assurance, Release Management/Engineering, Reliability Engineering, Root Cause Analysis, Scripting (Scripting Languages), Service Level Agreement (SLA), Short Messaging Service (SMS), Software as a Service (SaaS), System Operations, Systems Administration/Management, Systems Reliability, Systems Scalability, Team Lead/Manager, Technical Leadership
LOCATION
Philadelphia, PA
POSTED
30+ days ago
Description:

We are seeking a Lead Site Reliability Engineer (SRE) who combines deep technical expertise with strong leadership and client-facing capabilities. This is a high-impact role responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure and kiosk platform.

You will lead a team of engineers while remaining hands-on, owning uptime, SLAs, and incident management, and driving long-term improvements in system resilience and operational maturity. This role also requires working closely with Fortune 500 clients, translating complex technical concepts into clear, business-friendly insights.

What Makes This Role Unique

This is a rare opportunity for a hybrid leader who can:

  • Operate as a hands-on SRE expert
  • Lead and mentor a team of engineers
  • Act as a client-facing technical advisor
  • Drive both real-time operations and long-term reliability strategy

Key Responsibilities:

Reliability & Operations

  • Own platform uptime, SLAs, and overall system reliability
  • Lead incident response, root cause analysis, and postmortems
  • Develop and maintain disaster recovery and business continuity plans

Infrastructure & Automation

  • Design, build, and optimize cloud infrastructure and Kubernetes environments
  • Automate deployments and operational tasks using CI/CD and Infrastructure-as-Code (Terraform preferred)
  • Improve system scalability, performance, and resilience

Observability & Monitoring

  • Implement and enhance monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, New Relic)
  • Establish operational standards, runbooks, and best practices

Leadership & Collaboration

  • Lead, mentor, and develop a team of ~6 engineers
  • Partner with platform engineering, QA, and development teams to ensure operational readiness
  • Serve as a technical point of contact for clients, clearly communicating system health, risks, and solutions

Required Qualifications:

  • 8+ years of experience in SRE, DevOps, or Platform Engineering
  • 2+ years in a lead or managerial role
  • Strong expertise in:
    • Cloud infrastructure (AWS, Azure, or GCP)
    • Kubernetes and containerized environments
    • CI/CD pipelines and release engineering
    • Infrastructure-as-Code (Terraform preferred)
  • Proficiency in scripting/automation (Python, Bash, or Go)
  • Deep understanding of observability, monitoring, and logging systems
  • Experience with GitOps workflows (e.g., ArgoCD)
  • Proven experience managing production systems with strict uptime requirements

Preferred Experience:

  • Client-facing experience in enterprise or SaaS environments (required)
  • Experience communicating with non-technical stakeholders and Fortune 500 clients
  • Background in high-availability systems and large-scale distributed environments

What We’re Looking For:

  • A hands-on technical leader who can balance execution and strategy
  • Strong communicator with executive presence
  • Someone who thrives in high-ownership, fast-paced environments
  • A mentor who can elevate team performance and operational excellence


    Contact: arawat@judge.com
    This job and many more are available through The Judge Group. Find us on the web at www.judge.com

    About the Company

    Judge Group

    The Judge Group, Inc. is a leading professional services firm specializing in talent, technology, and learning solutions. We consult, staff, train, and solve. Through our work, we make people and organizations better. Our services are delivered through a network of more than 30 offices across the United States, Canada, and India.

    The Judge Group is proud to partner with the best and brightest companies in business today, including over 60 of the Fortune 100. We serve organizations in financial services, healthcare, life sciences, insurance, government (including aerospace and defense), manufacturing, technology, and telecommunications. To learn more about The Judge Group, visit www.judge.com or call toll-free (800) 360-4474.

    COMPANY SIZE
    5,000 to 9,999 employees
    INDUSTRY
    Computer/IT Services
    FOUNDED
    1970
    WEBSITE
    https://www.judge.com