Site Reliability Engineering (SRE) Platform Engineer (Lead)

ECLARO

Rochester, NY

JOB DETAILS
SKILLS
ADP, Agile Programming Methodologies, Analysis Skills, Application Programming Interface (API), Automation, Bar Code Scanners, Best Practices, Business Solutions, Cloud Computing, Communication Skills, Computer Programming, Configuration Management, Continuous Deployment/Delivery, Continuous Improvement, Continuous Integration, Cross-Functional, Data Analysis, Data Collection, Database Programming, Dental Insurance, DevOps, Distribution Warehousing, Diversity, Ecosystems, Embedded Systems, Engineering, Genetics, GitHub, IT Service Management (ITSM), ITIL (IT Infrastructure Library), Identify Issues, Improvement Metrics, Incident Management, Leadership, Machine Tool, Maintain Compliance, Microsoft .NET, Microsoft C# (C Sharp), Microsoft Hyper-V, Microsoft SQL Server, Microsoft System Center Operations Manager (SCOM), Microsoft Windows Azure, Mobile Devices, NetApp Storage Systems, Network Performance/Analysis, Operational Strategy, Oracle, Order Management, Performance Analysis, Performance Management, Predictive Modeling, Problem Solving Skills, Production Systems, Python Programming/Scripting Language, Regulatory Compliance, Reliability Engineering, Remedy, Reporting Dashboards, Risk, Risk Analysis, Root Cause Analysis, SQL (Structured Query Language), Scrum Project Management and Software Development, Software Engineering, Software Patches, Statistical Modeling, Team Player, Technical Leadership, Testing, VMWare, Validation Testing, Vendor/Supplier Evaluation, Vision Plan, Warehousing
POSTED
3 days ago
Job Number: 26-00672

Use your skills where innovative technology solutions begin. ECLARO is looking for a Site Reliability Engineering (SRE) Platform Engineer (Lead) for our client in Rochester, NY.

ECLARO’s client is a leading technology solutions provider, collaborating with customers to manage their needs and achieve success in their business goals. If you’re up to the challenge, then take a chance at this rewarding opportunity!

Position Overview:
  • As a Lead SRE Platform Engineer, you will drive reliability engineering strategy and execution across critical IT Business Solutions platforms. This role focuses on improving uptime, performance, and operational efficiency through software enhancements, observability, automation, and data-driven Root Cause Analysis (RCA).
  • You will serve as the technical lead for SRE practices: establishing monitoring standards, improving MELT (Metrics, Events, Logs, Traces) strategy, influencing tooling decisions, and partnering across infrastructure, development, operations, and vendor teams. This is a high-impact opportunity to build and mature reliability engineering capabilities from the ground up.

Responsibilities:
  • Reliability & Observability Leadership:
    • Define and mature SRE best practices across cloud and on-prem environments.
    • Design and implement comprehensive monitoring strategies using tools such as Dynatrace, Datadog, and Microsoft SCOM.
    • Develop dashboards, alerts, synthetic testing, and proactive monitoring capabilities.
    • Establish and evolve a MELT data strategy to improve service reliability.
    • Provide data-driven RCA investigations and implement preventative solutions.
  • Platform & Application Reliability:
    • Support and enhance reliability across:
      • Cloud & Infrastructure:
        • Microsoft Azure (Software, Storage, Azure Local)
        • Hyper-V and Legacy VMware Environments
        • NetApp and Pure Storage Platforms
        • Azure Log Analytics
        • Infrastructure as Code using Terraform
        • Migration from Azure DevOps to GitHub (strong GitHub experience required)
      • Order Management Systems:
        • Azure-based, internally developed .NET / C# applications.
        • Internal message queuing systems.
        • Logging, analytics, and synthetic testing post-patching.
        • API-based integrations.
      • Workforce & Payroll Platforms:
        • Workday (Payroll)
        • ADP Vantage (Timekeeping)
      • Warehouse & Distribution Systems:
        • Blue Yonder Warehouse Management System (WMS)
        • Collect handheld voice picking devices.
        • Network analytics for identifying dead zones and connectivity issues.
        • Barcode scanners and device connectivity troubleshooting.
  • DevSecOps & Automation:
    • Lead CI / CD reliability improvements (Azure DevOps → GitHub transition critical).
    • Enhance pipeline automation with embedded security controls.
    • Advance Infrastructure-as-Code standards (Terraform).
    • Improve configuration management and change governance.
    • Drive automation to reduce manual intervention and operational risk.
  • ITSM & Incident Management:
    • Work within BMC ecosystem including:
      • BMC Helix
      • BMC Remedy
      • BMC Server Automation
    • Optimize automated incident generation (SCOM → BMC workflows).
    • Improve triage, escalation, and impact modeling across services.
    • Monitor vendor performance and escalate appropriately.
    • Participate in off-hour escalation support when required.
  • Strategic Impact:
    • Develop predictive reliability models using statistical techniques.
    • Identify systemic risk across production systems.
    • Guide tooling decisions (e.g., Dynatrace vs. Datadog or other observability platforms).
    • Ensure regulatory and operational compliance standards are met.
    • Facilitate cross-functional collaboration and document SRE procedures and planning artifacts.

Required Skills:
  • 5-7+ years of Software Engineering and Infrastructure / Database Engineering experience.
  • Deep expertise in:
    • DevSecOps practices
    • Observability Platforms
    • API Integrations
    • Performance Management Tools
    • ITIL Principles
    • ITSM Data Analytics
    • MELT Data Collection and Analysis
  • Experience in Azure cloud environments.
  • Strong analytical and problem-solving skills.
  • Demonstrated ability to influence technical direction.
  • Excellent communication and cross-team collaboration skills.
  • Continuous improvement mindset focused on reliability engineering.

Preferred Qualifications:
  • Strong programming experience in:
    • .NET / C#
    • Python
    • SQL
  • Experience with MSSQL (primary) and Oracle (limited).
  • Experience with GitHub (critical for upcoming transition).
  • Agile / Scrum experience.
  • Knowledge of Reliability-Centered Engineering and maintenance strategies.
  • Experience with synthetic testing and proactive validation post-deployment.
  • Bachelor's Degree in a related technical field.

If hired, you will enjoy the following ECLARO Benefits:
  • 401k Retirement Savings Plan administered by Merrill Lynch
  • Commuter Check Pretax Commuter Benefits
  • Eligibility to purchase Medical, Dental & Vision Insurance through ECLARO

If interested, you may contact:
Jeanine Hastings
jeanine.hastings@eclaro.com
646-755-9303

Equal Opportunity Employer: ECLARO values diversity and does not discriminate based on Race, Color, Religion, Sex, Sexual Orientation, National Origin, Age, Genetic Information, Disability, Protected Veteran Status, or any other legally protected group status, in compliance with all applicable laws.

About the Company

ECLARO

Eclaro is a Business and Technology Consulting Firm that connects top talent with opportunities nationwide. We have direct access to Hiring Managers from leading Fortune 1000 organizations in almost every industry segment, with particular expertise in:

• Technology and Business Consulting
• Financial Services and Insurance
• Pharmaceuticals and Life Sciences
• Consumer Products, Public Sector, and Utilities

Eclaro provides fully customizable, comprehensive talent acquisition and management of seasoned professionals through a number of business models, including:

• Consulting
• Professional Hiring
• Global Integrated Delivery™
• Managed Services

Eclaro recruits and manages a staff of highly skilled individuals in an array of specialized disciplines enabling our clients to leverage new opportunities, respond to increased and changing demands, and increase their profitability.

Eclaro’s Management Team averages over 25 years of experience in partnering with clients in technical, corporate operations and human capital solutions. We hold ISO 9001:2008 certification and have achieved SOC 2 Type 2 certification in Security, Availability and Confidentiality. Eclaro’s decades of expertise and collaborative practice have proven that The Right People are The Answer.

COMPANY SIZE
500 to 999 employees
INDUSTRY
Staffing/Employment Agencies
FOUNDED
1999
WEBSITE
http://www.eclaroit.com