Site Reliability/Platform Engineer (Linux / Kubernetes / Python) - 180-190K

Career Developers

Reston, VA

JOB DETAILS

SALARY

$180,000–$190,000 Per Year

SKILLS

Auditing, Automation, Automation Engineering, Business Support, Cloud Computing, Communication Skills, Computer Science, Continuous Deployment/Delivery, Continuous Improvement, Continuous Integration, Debugging Tools, DevOps, Documentation, Engineering, Health Maintenance, High Availability, Identify Issues, Incident Response, Infrastructure as a Service (IaaS), Leadership, Linux Operating System, Machine Tool, Metrics, Microsoft Windows Azure, Multitasking, On Call, Operational Improvement, Operational Strategy, Operations Processes, Packet Flows, Performance Management, Production Support, Production Systems, Python Programming/Scripting Language, Reliability Engineering, Research Skills, Root Cause Analysis, Systems Administration/Management, Team Player, Technical Presentation, Technical Research, Technical Support, Technical Writing, Test Automation, Test Plan/Schedule, Testing, Trend Analysis

LOCATION

Reston, VA

POSTED

14 days ago

Site Reliability Engineer (Kubernetes / OpenShift Platform Engineering)
Location: Reston, VA
Salary: $180,000–$190,000 + 10% bonus

Must have: on-premises Kubernetes engineering, OpenShift, platform engineering, observability tools, incident response, automation, production troubleshooting, and Linux environments.

Responsibilities:

Maintain the health, stability, and reliability of core technical platforms and platform services supporting business continuity and high availability.
Improve end-to-end platform observability so that performance problems, incidents, and trends are identified and addressed proactively.
Lead incident response efforts, root-cause analysis, and postmortems to continuously improve platform reliability and reduce recurring issues.
Partner with development teams to troubleshoot deployment, routing, ingress, and configuration issues within Kubernetes/OpenShift environments.
Build and maintain automated deployment pipelines supporting engineering, development, and data teams.
Develop, test, and deploy automation solutions that reduce manual intervention and improve operational efficiency.
Lead the rollout of new platform services, features, and capabilities across hybrid infrastructure environments.
Operate and support platform services across on-premises infrastructure and Azure cloud services.
Maintain operational documentation, deployment procedures, incident response plans, and technical runbooks.
Participate in on-call rotation supporting production environments and critical infrastructure systems.
Assist with additional technical initiatives and operational responsibilities as needed.

Requirements:

Bachelor's degree in Computer Science or related field, or equivalent practical experience.
4–5+ years of experience in Kubernetes Engineering, Site Reliability Engineering, Platform Engineering, or similar infrastructure-focused roles.
Strong hands-on Kubernetes engineering experience, including workload management, operators, routing/ingress, cluster administration, and performance management.
Experience managing and supporting OpenShift environments is highly preferred.
Experience deploying and supporting platform services and observability tooling.
Strong troubleshooting skills across logs, metrics, traces, packet captures, and Kubernetes debugging tools.
Strong understanding of observability platforms and of how to connect alerts, incidents, and operational trends to actionable outcomes.
Experience working within regulated or heavily audited environments preferred.
Strong communication skills with the ability to document technical procedures and operational activities thoroughly.
Ability to manage multiple priorities in a dynamic, fast-paced environment.
Strong collaboration skills with the ability to work effectively across engineering and infrastructure teams.
Experience conducting independent technical research and presenting findings to leadership and peers.
Proof of eligibility to work in the United States required.

Site Reliability Engineer, SRE, OpenShift engineer, Kubernetes engineer, Azure cloud engineer, platform engineer, DevOps engineer, observability, Grafana, Prometheus, Datadog, HashiCorp Vault, Kafka, AMQ, Redis, CI/CD, automation, Bash scripting, Python scripting, cloud infrastructure, hybrid cloud, data center, reliability engineering, incident response, root cause analysis, container platform, cluster management, Azure infrastructure, production support, platform reliability, DevOps, monitoring tools, automation engineer, enterprise infrastructure, platform services, site reliability, cloud platform, OpenShift administrator, Kubernetes troubleshooting
