ML Research Engineer, AI Evaluation Platform

Apple

Seattle, WA

JOB DETAILS
SKILLS
Apple, Artificial Intelligence (AI), Benchmarking, Calibration, Cloud Computing, Communication Skills, Computer Science, Continuous Deployment/Delivery, Continuous Integration, Debugging Skills, Distributed Computing, Docker, Economics, Ecosystems, Engineering, Equipment Maintenance/Repair, Human Interaction, JAX, Machine Learning, Modeling Languages, Open Source, Performance Tuning/Optimization, Prototyping, Psychometrics, Publications, Python Programming/Scripting Language, Scientific Research, Software Engineering, Source Code/Configuration Management (SCM), Statistics, Systems Administration/Management, Team Player, Testing, Workflow Analysis
POSTED
18 days ago
**Role Number:** 200656392-3337

**Summary**

AI systems are only as trustworthy as the methods used to evaluate them. At Apple, where AI powers experiences for billions of people, getting evaluation right is not a support function; it is a foundational science. Our team, part of Apple Services Engineering, is building that scientific foundation: rigorous, scalable evaluation methodology for LLMs, agentic systems, and human-AI interaction.

What makes this team unusual is its interdisciplinary core. You will work alongside measurement scientists (psychometrics, validity theory), ML researchers, and platform engineers, bringing together ML research, statistical rigor, and production engineering. We are looking for an ML Research Engineer who can move fluidly across this landscape: someone who loves implementing the latest techniques in AI, has the engineering instincts to make them robust and scalable, and thrives at the intersection of research and production.

**Description**

This is a combined research and engineering role, sitting with and between research/applied scientists and platform engineers. New evaluation research can be challenging to use at scale; that's where your skills in both machine learning and engineering come into play.

On the research side, you will partner with scientists to rapidly prototype their ideas, implement methods from recent papers, run large-scale experiments, and provide critical feedback grounded in your engineering experience. On the engineering side, you will work with platform engineers to bring those research prototypes into production, moving from Python packages on local machines to robust services deployed in the cloud.

While past experience in research is not required, a desire to advance the state of the art in AI evaluation is. You should be ready to jump in across the full lifecycle of bringing new research into production at scale, speaking both the language of research and the language of engineering.
**Minimum Qualifications**

+ Bachelor's degree in Computer Science, Machine Learning, Software Engineering, or a closely related field (Master's preferred)
+ 2+ years of hands-on experience in a role combining machine learning and software engineering (e.g., ML engineer, research engineer, or applied scientist with strong engineering output), or a Master's degree in Computer Science, Machine Learning, or a closely related field with relevant project experience
+ Strong proficiency in Python and the modern ML ecosystem (PyTorch, JAX, or TensorFlow), with demonstrated ability to implement complex methods from recent ML papers
+ Solid software engineering fundamentals: clean code design, version control, testing, debugging, and performance optimization
+ Experience working with large language models, whether fine-tuning, inference, prompting pipelines, or building LLM-powered applications
+ Demonstrated ability to work across the research-to-production spectrum: you have taken experimental or prototype code and made it robust, scalable, and usable by others
+ Practical experience with cloud-native development and deployment: containerization (Docker/Kubernetes), CI/CD pipelines, and distributed computing frameworks (e.g., Ray, Spark)
+ Strong communication skills and comfort working in interdisciplinary teams, with the ability to engage productively with both researchers and platform engineers
+ Comfort with ambiguity and new problem spaces: you thrive when building something that doesn't yet have a playbook

**Preferred Qualifications**

+ Master's or Ph.D. in Computer Science, Machine Learning, or a related field
+ Experience with evaluation-specific methods or frameworks: LLM-as-judge approaches, reward modeling, RLHF, calibration techniques, benchmark design, or human evaluation methodology
+ Familiarity with modern evaluation tools and frameworks (e.g., DeepEval, Ragas, TruLens, LangSmith) and an understanding of how to implement and scale model-based evaluation workflows
+ Track record of contributing to research outputs (co-authored publications, open-source contributions, or internal research reports), even if research is not your primary role
+ Experience with the engineering challenges specific to generative AI and agentic systems: managing token economics, handling non-deterministic outputs, and evaluating multi-turn agent trajectories and tool usage
+ Familiarity with statistical concepts relevant to evaluation: calibration, inter-rater reliability, scoring rules, or measurement validity
+ Experience in fast-moving, early-stage teams where you helped define technical direction and engineering culture from the ground up

About the Company

Apple

We bring amazing people together to make amazing things happen.

We’re a diverse collection of thinkers and doers, continually reimagining what’s possible to help us all do what we love in new ways. The people who work here have reinvented entire industries with the Mac, iPhone, iPad, and Apple Watch, as well as with services, including iTunes, the App Store, Apple Music, and Apple Pay. And the same passion for innovation that goes into our products also applies to our practices — strengthening our commitment to leave the world better than we found it.

About Apple

There’s a place here for every kind of brilliant. Everyone here is an innovator, or an innovator-to-be, no matter your team or role. So bring your passion, courage, and original thinking and get ready to share it, because every new product, service, or feature we invent is the result of people working together to make each other’s ideas stronger. Innovation at this level depends on people who represent the variety of the human experience and inspire us with their own fresh perspectives. Together, we’ll do amazing work that can make a difference in people’s lives. Including your own. Learn more about working at Apple.

COMPANY SIZE
10,000 employees or more
INDUSTRY
Other/Not Classified
FOUNDED
1976
WEBSITE
https://www.apple.com/jobs