Senior AWS Cloud Site Reliability Engineer (SRE) with AWS Database experience | Peraton | Home, Texas | $104,000–$166,000 / year
Infrastructure Automation: Design, implement, and manage infrastructure as code (IaC) solutions using tools like AWS CloudFormation, Terraform, or Helm Charts to automate continuous database deployment and scaling processes. The AWS Site Reliability Engineer (SRE) will collaborate closely with cross-functional teams, including development, quality assurance, and operations, to ensure seamless software releases and continuous improvement of our release processes.
Senior Site Reliability Engineer | Parallel Domain Inc | WA | $145,000–$185,000 / year
We're growing the team (two of these roles are open), and the work is substantive: multi-region GPU scheduling, Windows workloads on Kubernetes, large-scale batch simulation, and an enterprise product direction that will require rethinking parts of how we deploy and operate. You'll work across multi-region AWS infrastructure, operate Kubernetes at scale, and contribute directly to reliability, security, and deployment systems that the rest of the engineering org depends on.
Staff Infrastructure Reliability Engineer - Database & Storage | Rocket Companies Inc | WA | Remote | $180,100–$278,700 / year
This role requires depth in design, collaboration with internal teams, a proactive approach to problem solving, and the ability to share complex ideas with senior leadership and secure their support. You have a proven history in architecting, building, scaling, and supporting cloud infrastructure technologies, specializing in database and storage services, and can communicate the direct business impact of this work.
Graduate Reliability Engineer | 29Metals Ltd. | WA
The graduate program is a structured 24-month rotational program across key fixed plant areas, combining on-the-job experience with formal training to develop core reliability engineering capability, supported by ongoing development opportunities and regular performance reviews. We operate two long-life assets: Golden Grove in Western Australia and Capricorn Copper in Queensland, along with a portfolio of exploration interests, including a strategic tenement package and project in Redhill, Chile.
Customer Reliability Engineer, Hypershield (remote) | Cisco | Olympia, WA | Remote | $158,200–$200,700 / year
The applicable full salary ranges for this position, by specific state, are listed below:
- New York City Metro Area: $158,200.00 - $241,700.00
- Non-Metro New York state & Washington state: $140,600.00 - $241,800.00
* For quota-based sales roles on Cisco's sales plan, the ranges provided in this posting include base pay and sales target incentive compensation combined.
- 1+ years of experience securing operating system (OS) instances, applications, and/or distributed systems
- 3+ years of hands-on experience operating Linux systems with experience in Kubernetes, cloud-native, or container architecture
- 2+ years of hands-on experience configuring and managing Cisco Nexus switches in production environments
Senior Reliability Engineer (Michigan) | Oracle | Olympia, WA | $120,100–$251,600 / year
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.

**Preferred Skills / Certifications**
Experience with commissioning reviews, reliability analytics, lifecycle cost analysis, or maintenance strategy design.
Senior AI Site Reliability Engineer | Oracle | Olympia, WA | $79,200–$178,100 / year
citizenship is required for this position, as the successful candidate will be required to obtain (and maintain) a U.S. government security clearance after hire.

**Required Skills**

**Infrastructure & Reliability**
- Experience building and operating high-availability, fault-tolerant systems
- Strong understanding of distributed systems, performance monitoring, and resiliency patterns
- Experience with incident response, root-cause analysis, and production troubleshooting

**AI-Native Engineering (NEW)**
- Hands-on experience applying Generative AI or Agentic AI (e.g., LangChain, AutoGPT, custom agents) to:
  - Infrastructure lifecycle management
  - Observability and anomaly detection
  - Incident response and remediation automation
- Ability to design or integrate AI-driven workflows for operational efficiency and reliability
- Familiarity with building or integrating autonomous agents for DevOps/SRE use cases

**Cloud & Multi-Cloud Ecosystems**
- Strong experience with **multi-cloud environments** (OCI, AWS/Azure)
- Deep understanding of cloud infrastructure design, deployment, and resource optimization
- Experience managing hybrid or cross-cloud architectures

**DevOps/SRE Practices**
- Advanced competency in CI/CD pipelines (Jenkins, Kubernetes)
- Infrastructure as Code (Terraform)
- Observability tools (Prometheus, Grafana)
- Strong focus on **automation-first operations**

**Data Technologies**
- Proficiency in Data Warehousing platforms (e.g., Vertica, Snowflake)
- Experience with ETL frameworks and large-scale data processing
- Understanding of columnar storage systems

**BI & Reporting**
- Experience supporting or integrating BI tools (Tableau, Power BI, Oracle Analytics)

**Programming & Tools**
- Strong proficiency in Python, Java, or Go
- Experience with Docker, Kubernetes, and shell scripting

**Problem-Solving**
- Strong troubleshooting skills with ability to perform root-cause analysis
- Experience resolving complex production issues in distributed systems

**Responsibilities**
Work with the Site Reliability Engineering (SRE) team to take shared ownership of services and platform components.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
Senior Site Reliability Engineer | Oracle | Olympia, WA | $79,200–$209,500 / year
citizenship is required for this position, as the successful candidate will be required to obtain (and maintain) a U.S. government security clearance after hire.

**Required Skills**

**Infrastructure & Reliability**
- Experience building and operating high-availability, fault-tolerant systems
- Strong understanding of distributed systems, performance monitoring, and resiliency patterns
- Experience with incident response, root-cause analysis, and production troubleshooting

**Cloud Ecosystems**
- Experience with one or more cloud environments (OCI, AWS/Azure)

**DevOps/SRE Practices**
- Advanced competency in CI/CD pipelines (Jenkins, Kubernetes)
- Infrastructure as Code (Terraform)
- Observability tools (Prometheus, Grafana)
- Strong focus on automation-first operations

**Data Technologies**
- Proficiency in Data Warehousing platforms (e.g., Vertica, Snowflake)
- Experience with ETL frameworks and large-scale data processing
- Understanding of columnar storage systems

**Programming & Tools**
- Proficiency in Python, Java, or Go
- Experience with Docker, Kubernetes, and shell scripting

**Problem-Solving**
- Strong troubleshooting skills with ability to perform root-cause analysis
- Experience resolving complex production issues in distributed systems

**Operational Excellence**
- Apply DevOps/SRE practices to automate deployments and operations
- Enhance observability using Prometheus/Grafana and AI-driven insights

**Incident Response**
- Participate in on-call rotations
- Implement preventative and automated remediation solutions

**Collaboration**
- Work closely with engineers to execute technical roadmaps
- Contribute to code reviews and infrastructure improvements

**What You Bring**
- 4+ years of software engineering, cloud infrastructure, SRE, or DevOps experience
- Proven ownership of production system reliability in cloud environments

**Core Expertise**
- Cloud infrastructure design and automation
- Distributed systems and performance optimization
- Data warehousing and ETL frameworks

**Technical Skills**
- Terraform, Docker, Kubernetes
- Observability stacks (Prometheus, Grafana)
- Python, Java, or Go

**Additional Strengths**
- Strong problem-solving mindset with a focus on automation and scalability
- Experience improving system reliability through intelligent automation

**Preferred Qualifications**
- Experience in healthcare or regulated environments (HIPAA, compliance frameworks)
- Experience working in environments requiring security clearance
- Experience building self-healing or autonomous infrastructure systems

**Responsibilities**
- Work with the Site Reliability Engineering (SRE) team to take shared ownership of services and platform components.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
Principal Site Reliability Engineer | Oracle | Olympia, WA | $99,600–$234,600 / year
citizenship is required for this position, as the successful candidate will be required to obtain (and maintain) a U.S. government security clearance after hire.

**Required Skills**

**Infrastructure & Reliability**
- Experience building and operating high-availability, fault-tolerant systems
- Strong understanding of distributed systems, performance monitoring, and resiliency patterns
- Experience with incident response, root-cause analysis, and production troubleshooting

**Cloud Ecosystems**
- Experience with one or more cloud environments (OCI, AWS/Azure)

**DevOps/SRE Practices**
- Advanced competency in CI/CD pipelines (Jenkins, Kubernetes)
- Infrastructure as Code (Terraform)
- Observability tools (Prometheus, Grafana)
- Strong focus on automation-first operations

**Data Technologies**
- Proficiency in Data Warehousing platforms (e.g., Vertica, Snowflake)
- Experience with ETL frameworks and large-scale data processing
- Understanding of columnar storage systems

**Programming & Tools**
- Proficiency in Python, Java, or Go
- Experience with Docker, Kubernetes, and shell scripting

**Problem-Solving**
- Strong troubleshooting skills with ability to perform root-cause analysis
- Experience resolving complex production issues in distributed systems

**Operational Excellence**
- Apply DevOps/SRE practices to automate deployments and operations
- Enhance observability using Prometheus/Grafana and AI-driven insights

**Incident Response**
- Participate in on-call rotations
- Implement preventative and automated remediation solutions

**Collaboration**
- Work closely with engineers to execute technical roadmaps
- Contribute to code reviews and infrastructure improvements

**What You Bring**
- 7+ years of software engineering, cloud infrastructure, SRE, or DevOps experience
- Proven ownership of production system reliability in cloud environments

**Core Expertise**
- Cloud infrastructure design and automation
- Distributed systems and performance optimization
- Data warehousing and ETL frameworks

**Technical Skills**
- Terraform, Docker, Kubernetes
- Observability stacks (Prometheus, Grafana)
- Python, Java, or Go

**Additional Strengths**
- Strong problem-solving mindset with a focus on automation and scalability
- Experience improving system reliability through intelligent automation

**Preferred Qualifications**
- Experience in healthcare or regulated environments (HIPAA, compliance frameworks)
- Experience working in environments requiring security clearance
- Experience building self-healing or autonomous infrastructure systems

**Responsibilities**
- Work with the Site Reliability Engineering (SRE) team to take shared ownership of services and platform components.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
AI Platform Reliability Engineer | Oracle | Olympia, WA | $79,200–$209,500 / year
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business. The engineer will also support data reliability use cases such as detection of stopped processing, data gaps, freshness issues, schema drift, and anomaly conditions that affect downstream analytics and reporting.
Reliability Engineer (Michigan) | Oracle | Olympia, WA | $97,500–$199,500 / year
**Why Oracle Cloud Infrastructure?**
**Global impact at scale:** Contribute directly to how mission-critical OCI data centers operate across regions and continents, influencing infrastructure reliability, security, sustainability, and long-term capacity growth.
**Technically rigorous environment:** Work alongside experienced engineers, automation specialists, and compliance teams in a rapidly scaling hyperscale cloud infrastructure, where disciplined execution and technical depth matter.
Distinguished Site Reliability Engineer - Cloud | NVIDIA Corp | WA
SRE at NVIDIA ensures that our internal and external facing GPU cloud services run with the maximum reliability and uptime promised to users, while at the same time enabling developers to make changes to the existing system through careful preparation and planning, keeping an eye on capacity, latency, and performance. This is a highly specialized discipline that demands knowledge across different systems, networking, coding, databases, capacity management, continuous delivery and deployment, and open source cloud enabling technologies like Kubernetes and OpenStack.
Site Reliability Developer 3 | Oracle | Olympia, WA | $79,100–$158,200 / year
You will have the opportunity to:
- Reach billions of people with our products & services
- Create technology that truly impacts the world
- Have an immediate impact on developing technology
- Unlimited growth potential with inspiring work
- Work with the best minds in the industry
- Enjoy working in an open, diverse, and productive environment

**About The Job**
This role provides technical leadership for the core data platforms behind Oracle Health's Data & Analytics Platform.

Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business.
[Remote] Principal Site Reliability Developer- USC Required | Oracle | Olympia, WA | Remote | $86,400–$199,500 / year
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business. This is a hands-on engineering role focused on solving hard distributed systems problems, improving platform resilience, and building intelligent operational capabilities at scale.
Reliability Engineer | Georgia-Pacific | Tacoma, WA | Full time
As a Koch company and a leading manufacturer of building products, bath tissue, paper towels, paper-based packaging, cellulose, specialty fibers, and much more, Georgia-Pacific works to meet the evolving needs of customers worldwide with quality products. Collaborate with other leaders to develop, prioritize, and execute strategies that improve the site's competitiveness in aspects of plant operating cost, volume, and yield by utilizing Agile and Lean processes.
MRO Engineer-DER-Part32-Sept23 | Keltia Design Inc | WA
Job Summary: The MRO Engineer is responsible for developing and implementing strategies to maintain, repair, and optimize industrial equipment and machinery to ensure uninterrupted production and minimize operational disruptions. A Maintenance, Repair, and Operations (MRO) Engineer, also known as a Reliability Engineer or Maintenance Engineer, plays a critical role in ensuring the smooth and efficient operation of industrial and manufacturing facilities.
Senior Manager, Data Center Reliability Engineering - Michigan | Oracle | Olympia, WA | $120,100–$251,600 / year
You will lead engineers and analysts who partner closely with site operations, design, construction, commissioning, and automation teams to identify reliability risks, improve maintenance strategies, strengthen incident learning, and ensure corrective actions are implemented and sustained.

**Ideal Candidate Profile**
- 5-7+ years of experience in reliability engineering, maintenance engineering, critical facilities, manufacturing, utilities, industrial operations, or other uptime-critical environments; data center experience preferred, but adjacent industry experience is highly relevant.
Management Consulting/Asset Management Project Manager | Kennedy/Jenks Consultants | Federal Way, WA | Full time
KJ is at the forefront of developing sustainable solutions for clients, including green infrastructure design, strategies to reduce energy use and environmental impacts, award-winning water reuse projects, and efficient construction management practices that ensure quality, safety, and on-time delivery. Project Management: Perform project management activities, including preparing and negotiating professional services contracts, managing the quality and financial performance of projects, coordinating with subconsultants, and identifying and resolving budget and schedule issues.
Senior Director, Secure DevOps Requisition | Maximus | Olympia, WA | Remote
Expertise in Linux and Windows operating systems, network administration, and networking protocols/functions (HTTP, HTTPS, SSL/TLS, SMTP, DNS). In addition, we are looking for someone with proven success leading teams of engineers to develop capabilities for applications and infrastructures along with experience working with government agencies.
Job Opportunity for Digital Services Technical Manager with our Federal Client | Artech LLC | Home, DC | Remote | $70–$80 / hour
We are looking for a Digital Services Technical Manager who will have the technical skills to drive innovation and collaboration between diverse teams of Software Developers, AWS Site Reliability Engineers, Automation Engineers, Testing Engineers, Network Engineers, Database Engineers, Cloud Operations Engineers, Tools Engineers, Application Developers, and Security experts.
- Ability to manage communication for critical incidents in an impromptu manner and ensure that detailed technical solutions are actioned and assigned with timelines, resulting in timely resolutions.
Job opportunity for Digital Services Technical Lead with our federal client | Artech LLC | Home, DC | Remote | $70–$80 / hour
The Digital Services Technical Lead will have the technical skills to drive innovation and collaboration between diverse teams of Software Developers, AWS Site Reliability Engineers, Automation Engineers, Testing Engineers, Network Engineers, Database Engineers, Cloud Operations Engineers, Tools Engineers, Application Developers, and Security experts.
- Proven track record in senior consulting or advisory roles with current hands-on experience, driving strategic change and digital adoption across multiple business units.