Lead Performance & Observability Engineer

Intercontinental Exchange Holdings, Inc.

Atlanta, Georgia

JOB DETAILS

SKILLS

Analysis Skills, Apache JMeter, Apache Kafka, Architectural Services, Artificial Intelligence (AI), Automation, Benchmarking, CPU (Central Processing Unit), Communication Skills, Computer Science, Ecosystems, Finance, GitHub, Government Organizations, Groovy Programming Language, High Throughput, Home Automation, IBM WebSphere MQ (Message Queue), Instrumentation, Java, Leadership, Leading Edge Technology, Linux Operating System, Load Testing, Machine Tool, Mentoring, Metrics, Microsoft Exchange Server, Multiplatform/Cross-Platform, Multithreaded Programming, Operating Systems, Operations Management, Performance Analysis, Performance Engineering, Performance Modeling, Performance Testing, Presentation/Verbal Skills, Project/Program Management, Python Programming/Scripting Language, Quality Assurance Methodology, Query Analysis, Reporting Dashboards, Root Cause Analysis, Scaffolding, Scripting (Scripting Languages), Software Architecture, Software Development, Stock Market, Systems Analysis, Team Building, Team Lead/Manager, Team Player, Technical Leadership, Test Automation, Test Data, Test Harness, Unix Shell Programming, Writing Skills

LOCATION

Atlanta, Georgia

POSTED

30+ days ago

Overview:

Job Purpose

Intercontinental Exchange, Inc. (ICE), the owner of the New York Stock Exchange (NYSE), is seeking a results-oriented, self-motivated individual for its Clearing Performance Engineering team in Atlanta. This individual will serve as a technical lead within a team of software architects and performance engineers, operating in a cutting-edge technology environment responsible for running critical financial sector exchanges and clearinghouses. The successful candidate will drive performance engineering strategy across multiple platforms, mentor peers, and deliver deep-dive analysis on the most complex and time-sensitive systems in the organization.

You must be technically authoritative, outcome-focused, and capable of thriving in a mission-critical environment where end-of-day processing windows are measured in minutes. This role requires close collaboration with software architects, developers, quant library owners, infrastructure teams, and project managers to ensure our platforms meet the highest standards of reliability, scalability, and throughput.

As a Lead Performance Engineer, you will own the end-to-end performance strategy for complex, event-driven Java platforms. You will define testing methodologies, build durable test harnesses, drive observability initiatives, and act as the authoritative voice on performance trade-offs across application, JVM, OS, and hardware layers.

Responsibilities

Strategy & Investigation
- Define and own the performance engineering strategy across multiple critical platforms; set standards for testing approach, tooling, and reporting across the team.
- Lead deep-dive performance investigations on CPU-bound, multi-threaded Java systems, including analysis at the JVM, OS, and hardware levels.
- Drive scalability analysis: model whether system performance scales linearly or non-linearly as data volume and instrument counts grow, and produce capacity projections to guide architecture decisions.
- Lead critical path segregation analysis: identify which operations are time-constrained, propose architectural solutions (e.g., head-start strategies, out-of-band processing), and validate their impact.
Observability & Monitoring Framework
- Design and own a comprehensive observability framework spanning metrics, distributed tracing, and continuous profiling to provide full-stack visibility across platform components.
- Build and maintain platform-wide health scoring and alerting using Prometheus and VictoriaMetrics; drive adoption of consistent instrumentation and Grafana visualization standards across application and infrastructure layers.
- Develop custom metrics exporters to surface business-layer signals — including database growth, queue depths, and pipeline latency — alongside infrastructure metrics.
- Architect and maintain Grafana dashboard ecosystems; establish standards for dashboard organization, datasource governance, and operational visibility across teams.
- Lead distributed tracing and continuous profiling initiatives across Java application estates; use trace and profile data to complement load test findings and accelerate root cause analysis.
- Build automated post-test analysis pipelines that correlate load test results with observability data to produce clear, causal performance narratives for technical and non-technical stakeholders.
Testing & Automation
- Design and build robust test harnesses that accurately measure version-to-version performance regressions for compute-intensive components.
- Tune JVM thread pools, garbage collection, and heap allocation for high-throughput, latency-sensitive processing pipelines.
- Use load generation frameworks (JMeter, Gatling, or custom harnesses) to design and run workloads representative of real production traffic patterns.
- Create and maintain automation scripts and tooling to simplify repeatable performance analysis tasks.
Collaboration & Leadership
- Act as a technical liaison between performance, development, infrastructure, and operations teams; translate performance findings into actionable recommendations with clear data backing.
- Champion AI-augmented workflows within the performance team: identify where AI tools accelerate root cause analysis, reduce boilerplate in test harnesses, and surface anomalies in benchmark data — while establishing team standards for validating AI-generated output before it reaches production.

Knowledge and Experience

Bachelor's Degree or equivalent in Computer Science, Engineering, or a related field.
8+ years of experience in performance engineering, performance testing, or Java development in high-volume, low-latency, transactional systems.
JVM & Application Performance
- Deep expertise in JVM internals: heap dump analysis, thread dump analysis, GC log interpretation, and memory profiling.
- Proficiency with Java and scripting languages such as Python, Groovy, or Linux shell for building test harness foundations and automation.
- Proven ability to perform scalability analysis — projecting how systems behave as data volumes grow and validating linearity assumptions with empirical evidence.
- Strong grasp of critical path analysis: ability to identify time-constrained operations, separate them from non-critical work, and validate architectural changes that improve throughput within a fixed time window.
Observability Stack & Monitoring Frameworks
- Hands-on experience designing and operating multi-signal observability stacks covering metrics collection, distributed tracing, and continuous profiling across Java application environments.
- Proficiency with Prometheus and VictoriaMetrics for metrics collection, storage, and querying; experience building health scoring rules, recording rules, and alerting across platform components.
- Experience building custom metrics exporters to expose business-layer and application-specific signals not covered by standard exporters.
- Proficiency with Grafana for building and managing operational and performance dashboards; experience with programmatic dashboard management, datasource governance, and multi-team dashboard standards.
- Familiarity with distributed tracing and continuous profiling tools; ability to use trace and profile data alongside load test results to accelerate root cause analysis.
- Experience building automated post-test reporting pipelines that synthesize observability data and load test results into actionable performance narratives.
Messaging, Load Testing & Event Pipelines
- Experience with event-driven, message-based architectures (Kafka, IBM MQ, or equivalent); ability to test and profile throughput and latency across event pipelines.
- Experience with load generation frameworks (JMeter, Gatling, or custom harnesses) and the ability to design workloads representative of real production traffic patterns.
Communication & AI Tooling
- Excellent verbal and written communication skills; able to present findings clearly to both technical architects and non-technical stakeholders.
- Ability to work across teams with varying levels of performance domain knowledge and build collaborative relationships with application owners.
- Demonstrated proficiency with AI coding and analysis tools (GitHub Copilot, Claude, Cursor, or equivalent) for performance-specific tasks: generating profiling scripts, writing Prometheus queries, analyzing GC logs, and building test harness scaffolding.

Intercontinental Exchange, Inc. is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to legally protected characteristics.

About the Company

Intercontinental Exchange Holdings, Inc.
