Sr. Research Engineer/Scientist (all levels), World Models
TikTok Inc
San Jose, CA
JOB DETAILS
SKILLS
Artificial Intelligence (AI), Communication Skills, Computer Graphics, Computer Vision, Content Development, Cross-Functional, Data Management, Data Sets, Image Manipulation, Machine Learning, Multimodal Interaction, Physics, Publications, Reinforcement Learning, Sales Pipeline, Scientific Research, Simulation, Training/Teaching, Video Editing
LOCATION
San Jose, CA
POSTED
30+ days ago
About the Team
The Vision-Applied Research team focuses on applied research in Generative AI and CV/Multimodal Understanding, delivering intelligent solutions to TikTok that enable users to create and share content more easily. The team has research groups dedicated to generative models for content creation, image generation, video synthesis, intelligent image/video editing, and world models.
The team is looking for Research Engineers/Scientists who can take the initiative in building next-generation World Models. The candidate will develop methods and infrastructure to train large-scale generative models on massive simulated and real-world multimodal datasets. This role places particular emphasis on ensuring that the model exhibits long-horizon temporal consistency, realistic physics, and complex dynamics, and on enabling users and agents to interact with the model in real time.
Responsibilities
Develop large-scale, diverse, and interactive multimodal data generation pipelines.
Develop training pipelines for long-context interactive video generation models.
Advance video generation models to capture long-horizon temporal consistency, realistic physical dynamics, object interactions, and causal relationships from large-scale multimodal data.
Minimum Qualifications:
M.S. or Ph.D. in Computer Vision, Computer Graphics, Machine Learning, or equivalent experience.
3 years of research experience in GenAI broadly, multimodal foundation models, or Embodied AI.
Demonstrated ability to communicate complex technical concepts and collaborate effectively within cross-functional research teams.
Preferred Qualifications:
Proven experience in at least one of the following areas: video generation and synthesis; efficient and real-time diffusion models; 3D/physics-based simulation; or reinforcement learning for agentic environment interaction.
Proven track record of first-author publications in prestigious venues such as CVPR, ICLR, NeurIPS, SIGGRAPH, and ICML.