Title: Principal Engineer, Platform Engineering & Production Support
Location: 401 W Las Colinas Blvd, Irving, TX
Alternate Locations: Charlotte, NC or Minneapolis, MN
Duration: 12 months
Work Engagement: W2
Work Schedule: 3 days in office/2 days remote
Benefits offered for this contract position: health insurance, life insurance, 401(k), and voluntary benefits
Summary:
We are seeking a Principal Engineer within the Platform Engineering team. This individual must be Day 1 ready, comfortable operating in fast-paced, production-critical environments, and capable of balancing multiple competing priorities.
The ideal candidate is a seasoned DevOps and Site Reliability Engineering (SRE) professional with strong hands-on expertise in observability, incident management, and cloud platforms such as OpenShift. This role will play a leading part in supporting production systems, preventing outages, and improving system reliability through automation, intelligent monitoring, and modern SRE practices.
Team Overview:
This role supports a critical Platform Engineering team responsible for stabilizing, scaling, and operating applications as they move toward and beyond production release. The team plays a key role post-deployment, ensuring reliability, performance, and operational excellence across a broad application portfolio.
This is not traditional infrastructure support. It is application-focused production engineering, requiring deep technical expertise, proactive issue prevention, and strong ownership of application health in cloud-native environments.
Responsibilities:
Lead production support efforts across a portfolio of 20+ applications, ensuring stability, performance, and rapid issue resolution
Design, build, and maintain advanced monitoring, alerting, and observability dashboards using tools such as Splunk, Grafana, AppDynamics, Prometheus, and SPLOC
Proactively identify production risks through gap analysis, anomaly detection, and predictive alerting, preventing incidents before they occur
Troubleshoot complex production issues across distributed microservices environments, reducing mean time to resolution (MTTR) through deep technical expertise
Drive adoption of modern SRE practices, including automation, AIOps, and intelligent monitoring
Support applications running on OpenShift and cloud-native platforms, with a strong focus on reliability, scalability, and resiliency
Collaborate closely with development teams during release cycles, providing production-readiness guidance and operational support
Participate in a 24x7 on-call rotation, demonstrating urgency, ownership, and accountability during incidents
Mentor and guide engineers, helping elevate team capabilities in SRE, DevOps, and platform engineering
Act as a trusted technical leader, able to rapidly shift priorities and manage competing demands in high-pressure environments
Qualifications:
Applicants must be authorized to work for ANY employer in the U.S. This position is not eligible for visa sponsorship.
Strong background in platform engineering and production support
Hands-on experience with:
Red Hat Linux
OpenShift and Kubernetes
Java and Python
Microservices architectures and Spring Boot
Experience designing and maintaining observability dashboards, including:
Grafana
Splunk
SPLOC
AppDynamics
Experience with observability alerts, incident response, and on-call support, leveraging tools such as:
AIOps platforms
ServiceNow
BigPanda or similar incident management tools
Experience with:
React.js
Apache
Kafka
Relational databases
Strong understanding of distributed systems, cloud-native platforms, and microservices-based architectures
We believe in our vision and values just as strongly today as we did the first time we put them on paper more than 20 years ago. Staying true to them will guide us toward continued growth and success for decades to come. As you read more about our vision and values, you will learn about who we are, where we’re headed and how every Wells Fargo team member can help us get there.