System Performance Engineer - Lead SE : Mastercard

Shiftcode Analytics, Inc

St. Louis, MO

JOB DETAILS
SKILLS
Amazon Web Services (AWS), Analysis Skills, Autoscaling, Booting, Brokerage, Budgeting, CPU (Central Processing Unit), Caching, Cloud Computing, Communication Skills, Concurrency, DNS (Domain Name System), Debugging Skills, Distributed Computing, Docker, F5 Network Software, File Systems, Financial Systems, Gaming, Hybrid Cloud, IBM Product Family, Identify Issues, Input/Output, Java, Load Balancing, Memory Hardware, Middleware, Network Performance/Analysis, Payment Processing, Performance Analysis, Performance Engineering, Performance Modeling, Private Cloud, Problem Solving Skills, Production Systems, Productivity Model, RabbitMQ, Radiography, Redis, Replication and Remote Mirroring, Retail, Root Cause Analysis, SSL-TLS (Secure Socket Layer - Transport Layer Security), Software as a Service (SaaS), System Architecture, Systems Engineering, nginx Web Server
LOCATION
St. Louis, MO
POSTED
30+ days ago
Interview : Coding Test + Video Interview

Visa : USC, GC

This role is hybrid from day one (local candidates only).

The client is only looking for consultants coming from Amazon, IBM, or a brokerage trading services company dealing with a high volume of transactions.

From an interview perspective, the Hiring Manager provided feedback on what they are looking for in the interviews:

  • Strong communication – looking for someone who answers questions without hesitation and can speak to their resume without referring to it or to notes.
  • Java development background (need not be recent, but the candidate must still be able to understand the logic)
  • Lots of cloud experience (AWS and PCF is what they use)
  • High-transaction systems
  • Containerization
  • Experience digging in and resolving middleware problems and collaborating with downstream teams


Description:

This role is a blend of application performance engineer, system performance engineer, and cloud engineer. The client is looking for someone from a high-volume transaction background such as brokerage trading services, payment services, gaming, or large-scale online retail (Walmart, Amazon, Netflix). Candidates need to be able to troubleshoot across the entire application and technology stack in a cloud environment, going beyond diagnosing symptoms to find the root cause and determine a path to resolution. The Areas of Expertise section lists various technologies that could apply. Candidates should have knowledge in each area and be able to name the tools they have used in each, but none is specifically mandatory other than Spring Boot.


The role is responsible for identifying and resolving end-to-end performance bottlenecks across distributed systems, Spring Boot services, middleware components, and hybrid cloud environments (private cloud + AWS). It goes far beyond traditional testing by deeply analyzing container orchestration, networking paths, and system interactions under load. The position maps full system workflows, sets realistic latency budgets, and ensures each component meets its SLOs. Ideal candidates have extensive experience with high-scale, multi-region, and high-transaction platforms (e.g., financial systems, payment processing, or large enterprise SaaS) running in a cloud environment.

Key Responsibilities
  • Define service-level objectives (SLOs), performance budgets, and latency/throughput targets across services.
  • Architect and champion comprehensive distributed tracing strategies (Dynatrace, AWS X-Ray, etc.).
  • Analyze application, platform, and cloud behavior using deep-dive techniques such as heap dumps, thread dumps, flame graphs, GC logs, network traces, and storage I/O profiling.
  • Review service and system architectures for performance risks (e.g., synchronous hops, excessive dependencies, misconfigured connection pools, poor cache placement).
  • Conduct and lead root-cause analysis for performance incidents in production and pre-production environments.
  • Develop capacity models and performance baselines for services running across cloud environments.

Areas of Expertise
  1. Application Layer: Spring Boot internals, JVM tuning, thread/heap management, concurrency debugging, GC optimization
  2. Container Runtime: PCF, Docker, container resource limits, CPU throttling, memory pressure
  3. Orchestrators: PCF, Kubernetes, ECS (autoscaling, pod health, scheduling issues)
  4. Networking: Service-to-service hops, TLS overhead, DNS, routing, load balancer configs (F5, Nginx, ALB/NLB), service mesh performance
  5. Storage: Latency, IOPS constraints, distributed file system behavior
  6. Caching & Middleware: Redis, Hazelcast, NATS, Kafka, RabbitMQ configuration and throughput tuning
  7. Databases: Connection pool tuning, slow queries, indexing, replication lag
  8. Cloud Layer: AWS compute/storage/network performance, regional latency, cross-cloud traffic patterns

About the Company

Shiftcode Analytics, Inc