Lakehouse Performance Engineer

International Business Machines Corp

Austin, TX

JOB DETAILS
SKILLS
Artificial Intelligence (AI), Automation, Benchmarking, CPU (Central Processing Unit), Campaigns, Capacity Management, Capacity Utilization, Competitive Analysis/Strategy, Concurrency, Continuous Deployment/Delivery, Continuous Improvement, Continuous Integration, Customer Relations, Data Collection, Data Sets, GPU (Graphics Processing Unit), IBM Product Family, Industry Standards, Metrics, Performance Analysis, Performance Engineering, Performance Management, Performance Metrics, Reporting Dashboards, Root Cause Analysis, Scorecarding, Technical Recruiting, Telemetry, Testing, Total Cost of Ownership, Vehicle Fleets, Warehousing
LOCATION
Austin, TX
POSTED
4 days ago

IBM is building the next generation of watsonx.data: a GPU-accelerated, open data lakehouse engineered to deliver category-leading price‑performance for analytics and AI workloads. We are hiring a Performance Engineer to be a hands-on individual contributor focused on measuring, defending, and improving the performance and cost‑per‑performance of the platform across every release.

You will run the benchmarks, build the harnesses, and operate the backing infrastructure that the entire watsonx.data organization relies on to characterize performance. That includes the dedicated benchmark labs, GPU and CPU test fleets, dataset stores, result warehouses, and the automation that ties them together. Engineering, product, field, and competitive intelligence will all consume what you produce: regression signals in CI, executive scorecards, customer-facing dashboards, and the data behind public claims that we are the market-leading open lakehouse.

Benchmarking & Workload Engineering

  • Industry-standard benchmarks: Run, maintain, and continuously improve reproducible benchmarks across watsonx.data configurations and against competitive offerings.

  • Customer-representative workloads: Build and curate workload suites that reflect real customer query mixes, data volumes, concurrency profiles, and freshness requirements.

  • Reproducibility & rigor: Ensure every published result is reproducible end-to-end: controlled environments, pinned versions, locked datasets, documented methodology, variance analysis, and statistically defensible reporting.

  • Cost-per-performance metrics: Operationalize the canonical price‑performance KPIs ($/query, $/TB scanned, $/training‑token, queries/sec/$, and TCO for a given workload mix); instrument workloads, collect data, and produce repeatable scorecards.

Performance Observability & Analysis

  • Telemetry pipeline: Build and maintain the telemetry (metrics, traces, profiles, GPU/CPU utilization, query plans, and I/O) that flows from benchmark runs into the performance data store.

  • Dashboards & scorecards: Develop dashboards that surface trends, regressions, and competitive position to engineering, leadership, and external audiences.

  • Regression gates: Operate performance regression gates in CI/CD; triage failures, file issues and drive them to resolution with the engine, storage, and GPU teams, and verify fixes.

  • Root-cause analysis: Drill into slow queries and GPU/CPU bottlenecks using profiling tools (Nsight, perf, async-profiler, pprof, flame graphs) and query plan inspection to pinpoint regressions and improvement opportunities.

Backing Infrastructure for Performance

  • Performance environment ownership: Own the lifecycle of the dedicated performance environment(s) supporting watsonx.data: GPU and CPU clusters, networking, storage, and the orchestration that schedules workloads onto them.

  • Test fleet automation: Build and maintain infrastructure-as-code (Terraform/Ansible/Helm) for provisioning, configuring, and resetting test environments deterministically across on-prem hardware and cloud (IBM Cloud and partner clouds).

  • Benchmark harness platform: Develop and operate the benchmark harness itself: job scheduler, run orchestration, dataset provisioning, result capture, artifact storage, and the API/CLI other teams use to launch runs.

  • Dataset & result warehouse: Own the curated datasets used for benchmarking and the warehouse of historical results that powers trend analysis, regression detection, and competitive comparisons.

  • Capacity & utilization: Manage capacity and utilization of the performance lab so concurrent campaigns from different teams (query engine, storage, GPU acceleration, AI) run cleanly and without interference.

About the Company

International Business Machines Corp

At IBM, you don’t need a degree to shape the future. Just bring your skills—and your passion. To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate.

Not just to do something better, but to attempt things you've never thought possible. To lead in this new era of technology and solve some of the world's most challenging problems. Let’s get to work.

COMPANY SIZE
10,000 employees or more
INDUSTRY
Computer/IT Services
FOUNDED
1911
WEBSITE
http://www-03.ibm.com/employment/us/