Site Reliability Engineer

Macpower Digital Assets Edge LLC

Raleigh, NC

JOB DETAILS

JOB TYPE

Full-time

SKILLS

Analysis Skills, Ansible, Authentication, Automation, Bash Scripting, Cloud Computing, Communication Skills, Continuous Improvement, Cross-Functional, DNS (Domain Name System), Distributed Computing, Go Programming Language (Golang), Identify Issues, Improvement Metrics, Incident Response, Kerberos, LDAP (Lightweight Directory Access Protocol), Linux Operating System, Metrics, Microsoft Windows Azure, Microsoft Windows Server, NFS (Network File System), Network Attached Storage (NAS), Operating Systems, Operational Improvement, Performance Tuning/Optimization, Process Improvement, Python Programming/Scripting Language, Red Hat Linux Operating System, Reliability Engineering, Scripting (Scripting Languages), Service Level Agreement (SLA), Software Engineering, Storage Area Network (SAN), System Integration (SI), Systems Administration/Management, Systems Scalability

LOCATION

Raleigh, NC

POSTED

30+ days ago

Job Overview:
The Site Reliability Engineer role focuses on the reliability, scalability, and performance of enterprise platforms across cloud and on-prem environments. The position requires hands-on engineering with automation, observability, and cross-functional collaboration. It emphasizes metrics-driven improvement and operational excellence under pressure.

Key Responsibilities:
Design, implement, and maintain reliable, scalable, and secure systems in cloud and on-prem environments.
Manage distributed systems on Azure, Linux (RHEL 7+), and Windows Server 2019+.
Build automation workflows using Python, Go, and Bash scripting.
Develop Infrastructure as Code with Terraform and Ansible.
Define, monitor, and refine SLIs, SLOs, and SLAs for service quality.
Reduce operational toil through automation and process improvements.
Integrate systems with observability platforms for visibility and proactive issue detection.
Troubleshoot incidents, lead response efforts, and conduct post-mortem analyses.
Collaborate with software, infrastructure, and business teams to deliver resilient services.
Take full ownership of optimizing reliability, performance, and maintainability.

Required Skills and Experience:
Proven experience as a Site Reliability Engineer, with a background in software engineering, infrastructure, or operations.
Hands-on expertise with Azure and enterprise operating systems such as Linux (RHEL 7+) and Windows Server 2019+.
Strong knowledge of networking and storage, including NFS, SAN, and NAS.
Understanding of authentication and naming services such as DNS, LDAP, Kerberos, and Centrify.
Proficiency in Python, Go, and Bash scripting, and in the IaC tools Terraform and Ansible.
Ability to design and monitor SLIs, SLOs, and SLAs to drive reliability through metrics and automation.
Experience integrating with observability platforms for logs, metrics, and tracing.
Ability to remain calm and structured during high-pressure incidents.
Strong communication and collaboration skills to influence cross-functional stakeholders.
A proactive ownership mindset for continuous improvement.

Key Skills:
Site Reliability Engineering, Cloud Platforms, Azure, Linux (RHEL 7+), Windows Server 2019+, Networking Fundamentals, NFS, SAN, NAS, DNS, LDAP, Kerberos, Centrify, Python, Go, Bash, Terraform, Ansible, Infrastructure as Code, Observability Platforms, SLIs, SLOs, SLAs, Toil Reduction, Incident Response, Post-Mortems, Automation, Metrics-Driven Engineering, System Reliability, Cross-Functional Collaboration, Communication Skills, Ownership Mindset.
