Applied AI Researcher – Video Diffusion
Location: On-site, San Francisco, CA
Compensation: $160,000 – $300,000 + Equity (0.5%–2%)
Employment Type: Full-time
An early-stage AI startup in San Francisco is building the next generation of video foundation models for human motion and expression. They're seeking an Applied AI Researcher to help lead the training of state-of-the-art video diffusion models from scratch, working directly with massive visual datasets and hundreds of GPUs.
This is a high-ownership role at a company that trains its own models (rather than only fine-tuning others') and is backed by well-known investors and founders from top-tier companies in AI, video, and infrastructure.
Train large-scale diffusion and transformer-based models for video generation
Curate, clean, and label internet-scale video datasets
Run targeted experiments and rapidly iterate on model improvements
Distill models for faster inference with minimal performance loss
Stay current with arXiv and GenAI research; help shape the model roadmap
Build LoRA modules to expand model capability
2+ years building ML models from scratch in Python and PyTorch
Deep experience with vision transformers, diffusion models, or related architectures
Comfortable working on Linux clusters with GPU workloads
Experience with labeling tools (e.g., face detection, speaker recognition)
Strong research mindset—PhD or top-tier publications a plus
Familiarity with video compression, codecs, and perceptual metrics
You're hands-on with training generative video models
You thrive in early-stage environments and want full-stack research-to-deployment impact
You obsess over both clean data and novel architecture design
You're active in AI research circles and keep up with the latest GenAI trends