Principal Engineer Platform Engineering & Production Support

Mindlance

IRVING, TX

JOB DETAILS
SKILLS
Apache Kafka, Automation, Cloud Applications, Cloud Architecture, Cloud Computing, Database Technology, DevOps, Distributed Computing, Ecosystems, Gap Analysis, Identify Issues, Incident Management, Incident Response, Java, Mentoring, Microservices, Multitasking, On Call, Operational Support, Problem Solving Skills, Production Support, Production Systems, Python Programming/Scripting Language, React.js, Red Hat Linux Operating System, Relational Databases (RDBMS), Reliability Engineering, Reporting Dashboards, Risk Analysis, Sales Closing Skills, ServiceNow, Software Administration, Splunk, System Architecture, Systems Reliability, Technical Leadership
POSTED
24 days ago
Team Overview
This role supports a critical Platform Engineering team responsible for stabilizing, scaling, and operating applications as they move closer to production release. The team plays a key role post-deployment, ensuring reliability, performance, and operational excellence across a portfolio of applications.
This is not traditional infrastructure support; it is application-focused production engineering, requiring deep technical expertise, proactive issue prevention, and strong ownership of application health in cloud environments.

Role Summary
We are seeking a Principal Engineer to backfill a key contractor position within our Platform Engineering team. This individual must be Day 1 ready, capable of operating in fast-paced, production-critical environments, and able to seamlessly balance multiple priorities.
The ideal candidate is a strong DevOps and Site Reliability Engineering (SRE) professional with hands-on expertise in observability, incident management, and cloud platforms (OpenShift). They will play a leading role in supporting production systems, preventing outages, and improving system reliability through automation and intelligent monitoring.

Key Responsibilities
Lead production support efforts across a portfolio of 20+ applications, ensuring stability, performance, and rapid issue resolution
Design and build advanced monitoring, alerting, and observability dashboards using tools such as Splunk, Grafana, AppDynamics, and Prometheus
Proactively identify risks through gap analysis, anomaly detection, and predictive alerting, preventing production incidents before they occur
Troubleshoot complex production issues across distributed microservices environments, reducing MTTR through deep technical expertise
Drive adoption of modern SRE practices, including automation, AIOps, and intelligent monitoring solutions
Support applications running on OpenShift and cloud-native platforms, with a focus on reliability and scalability
Collaborate closely with development teams during release cycles, providing production-readiness guidance and operational support
Participate in a 24x7 on-call rotation, demonstrating urgency and ownership during incidents
Mentor and guide engineers, helping elevate team capabilities in SRE, DevOps, and platform engineering practices
Act as a trusted technical leader, able to quickly switch priorities and manage competing demands in a high-pressure environment

What We're Looking For
A genuine, hands-on engineer who can operate across multiple roles (SRE, DevOps, Production Support)
Strong ability to shift priorities quickly and respond with urgency in critical situations
Deep understanding of application support in cloud environments, especially OpenShift
Experience in the financial services industry strongly preferred
Prior development experience is a plus, particularly in Java-based ecosystems

Required Qualifications:
10+ years of platform and production support experience
5 years of Red Hat Linux, OpenShift, Kubernetes, Java, microservices, Spring Boot, and Python experience
5 years of observability dashboard creation experience: Grafana, Splunk, SPLOC, AppDynamics
5 years of observability alerting and incident handling experience: AIOps, ServiceNow, BigPanda, etc.
4 years of React.js, Apache Kafka, and relational database experience
4 years of experience with distributed systems, microservices architectures, and cloud-native platforms

EEO:

Mindlance is an Equal Opportunity Employer and does not discriminate in employment on the basis of Minority/Gender/Disability/Religion/LGBTQI/Age/Veterans.

About the Company

Mindlance