Technical Program Manager, Safeguards - Infrastructure & Evals

Anthropic PBC

San Francisco, CA

About the Role

Safeguards Engineering builds and operates the infrastructure that keeps Anthropic's AI systems safe in production - the classifiers, detection pipelines, evaluation platforms, and monitoring systems that sit between our models and the real world. That infrastructure needs to be not just correct, but reliable: when a safety-critical pipeline goes down or degrades, the consequences can be serious, and they can be invisible until someone looks closely.

As a Technical Program Manager for Safeguards Infrastructure and Evals, you'll own the operational health and forward momentum of this stack. Your primary responsibility is driving reliability - owning the incident-response and post-mortem process, ensuring SLOs are defined and met in partnership with various teams, and making sure that when things go wrong, the right people know, the right actions get taken, and those actions actually get closed out. Alongside that ongoing operational rhythm, you'll coordinate the larger platform investments: migrations, eval-platform improvements, and the cross-team dependencies that connect them.

This role sits at the intersection of operations and program management. It requires genuine technical depth - you need to understand how these systems work well enough to triage effectively, judge what's actually safety-critical versus what can wait, and have informed conversations with the engineers building and maintaining them. But the core of the job is keeping the machine running well and the work moving.

What You'll Do

• Own the Safeguards Engineering ops review - Drive the recurring cadence that keeps the team informed and coordinated: surfacing recent incidents and failures, bringing visibility to reliability trends, and making sure the right people are in the room when decisions need to be made. This is the heartbeat of how Safeguards Eng stays ahead of operational risk.

• Drive incident tracking and post-mortem execution - When incidents happen - and in this space, they happen regularly - you'll make sure they get followed through properly. That means tracking incidents across the organization (including those owned by partner teams like Inference), ensuring post-mortems get written, and most critically, making sure the action items that come out of them actually get done. Closing the loop on post-mortem actions is one of the highest-leverage things this role does.

• Establish and maintain SLOs with partner teams - Work with Safeguards Engineering teams and key partners - particularly Inference and Cloud Inference - to define service-level objectives for safety-critical pipelines. Then build the tracking and reporting that makes it possible to tell whether those SLOs are being met, and surface it when they're not.

• Maintain runbook quality and incident-ownership clarity - Safety-critical systems need clear playbooks for when things go wrong. Partner with engineering leads to keep runbooks accurate, actionable, and up to date - and ensure that ownership of incidents (including for areas like account-banning false positives and CSAM detection) is unambiguous so that nothing falls through the cracks during an active incident.

• Drive platform migrations and infrastructure projects - Own the program management for the larger infrastructure work on the roadmap: migrating infrastructure from one platform to another, moving between incident-management platforms, switching cloud monitoring systems, and other migrations as they come. These are cross-team efforts with real dependencies - your job is to keep them sequenced, on track, and connected to the teams that need them.

• Coordinate evals platform improvements - Partner with the evals engineering team to drive improvements to the evaluation platform - including self-serve capabilities and the broader eval factory infrastructure. Help scope the work, track dependencies on other Safeguards systems, and make sure the evals platform is keeping pace with the team's needs.

You Might Be a Good Fit If

You have solid technical program management experience, particularly in operational or infrastructure-heavy environments - you're comfortable owning a mix of ongoing operational cadences and discrete project work simultaneously.

You understand how production ML systems work well enough to triage incidents intelligently and have substantive conversations with engineers about what's going wrong and why - you don't need to write the code, but you need to follow the technical thread.

You're energized by closing loops. Post-mortem action items that never get done, SLOs that no one checks, runbooks that go stale - these things bother you, and you know how to build the processes and follow-ups that fix them.

You can work effectively across team boundaries - comfortable coordinating with partner teams (like Inference) where you don't have direct authority, and skilled at keeping shared work moving through influence and clear communication.

You thrive in environments where the work shifts between keeping the lights on and building something new - and can context-switch between incident follow-ups and longer-horizon platform projects without dropping either.

You have experience with or strong interest in AI safety - you understand why the reliability of a safety-critical pipeline is a different kind of problem than the reliability of a product feature, and that distinction motivates you.

Strong candidates may also:

Have experience with SRE practices, incident management frameworks, or on-call operations at scale.

Have worked on or with evaluation infrastructure for ML systems - understanding how evals get designed, run, and interpreted.

Have experience driving infrastructure migrations in complex, multi-team environments - particularly where the migration touches operational systems that can't go offline.

Be familiar with monitoring and alerting tooling (PagerDuty, Datadog, or equivalents) and the operational culture around them.

Deadline to Apply: None. Applications are received on a rolling basis.

About the Company

Anthropic PBC