Senior Research Scientist (Multimodal Large Language Model) - PICO

Beijing ByteDance Technology Co Ltd

San Jose, CA

JOB DETAILS

SKILLS

Accounts Receivable, Artificial Intelligence (AI), C++ Programming Language, Cloud Computing, Communication Skills, Computer Science, Computer Vision, Conferences, Construction, Cross-Functional, Data Analysis, Data Science, Deep Learning, Electrical Engineering, Hardware Development, Human-Computer Interaction, IBM MVS Operating System, Industrial Research, Information Models, Leadership, Leading Edge Technology, Localization, Machine Learning, Memory Management, Modeling Languages, Problem Solving Skills, Product Design, Publications, Python Programming/Scripting Language, Reinforcement Learning, Research & Development (R&D), Research Skills, Scene Understanding, Scientific Research, Software Engineering, Team Player, User Interface/Experience (UI/UX)

LOCATION

San Jose, CA

POSTED

30+ days ago

About the Team

The PICO-MR team is dedicated to pioneering core technologies for intelligent human-computer interaction in MR environments, with a focus on integrating multimodal large language models (MLLM) and tool-use capabilities to redefine user experiences. Our R&D directions cover multimodal scene understanding, MLLM-based agent systems, tool-augmented MR interaction, 3D environment perception, and AIGC-driven content generation. Within MR scenarios, our work spans MLLM optimization and adaptation for MR, intelligent task execution with tool use, multimodal scene understanding (vision, point clouds, text), AIGC-based scene generation, depth estimation (Mono/Stereo/MVS), 3D environment perception, large-scale 3D scene reconstruction (3DGS, NeRF, etc.), visual localization, and lighting estimation. This work encompasses both fundamental research breakthroughs and industrial-grade solution deployment.

Responsibilities:

• Lead the R&D of multimodal large language models (MLLM) tailored for MR scenarios, integrating vision, point clouds, text, and other multimodal information, including model architecture optimization, cross-modal alignment, data construction, evaluation system enhancement, and end-to-end training/inference acceleration.
• Drive the research and implementation of MLLM tool-use capabilities in MR environments, enabling models to proficiently use professional tools for spatial interaction and spatial computing, support tool calls in both single-turn and multi-turn conversations, and solve complex user tasks through interaction.
• Address key challenges in long-horizon, multi-turn tool-augmented tasks in MR, such as context memory management, tool selection strategy, and error correction mechanisms.
• Keep abreast of cutting-edge technologies in MLLM, multimodal intelligence, and tool-use research, and lead the application and deployment of innovative technologies in PICO's MR products.
• Collaborate with cross-functional teams (including software engineering, product design, and hardware development) to translate research outcomes into practical features that enhance user experience.

Minimum Qualifications

Master's or Ph.D. degree in Computer Science, Electrical Engineering, Machine Learning, Artificial Intelligence, or a related quantitative field.
Expertise in multimodal large model pre-training, post-training, fine-tuning, or cross-modal fusion technologies, with hands-on experience in model optimization, training workflow design, and performance tuning.
Proven research experience in LLM tool use, reinforcement learning, LLM agents, or interactive learning, with a deep understanding of single-turn and multi-turn interaction mechanisms.
Proficiency in core 2D/3D computer vision tasks, including detection, segmentation, depth estimation, image matching, and 3D scene perception.
Skilled in Python and C++, with solid programming capabilities and experience in developing large-scale models using mainstream deep learning frameworks (PyTorch/TensorFlow).
Excellent problem-solving and independent research abilities, capable of addressing complex technical challenges in the integration of MR and MLLM tool use.

Preferred Qualifications

Publications in AI/ML/CV conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP) focusing on multimodal large models, LLM tool use, or agent systems.
Hands-on experience in building large-scale MLLM training pipelines, tool-use evaluation systems, or multimodal agent platforms.
Familiarity with MR/AR/VR technologies, spatial computing, or 3D scene reconstruction (3DGS, NeRF, etc.) is a strong plus.
Experience in addressing long-horizon reasoning or asynchronous agent behavior challenges is highly valued.
Awards in competitions such as ACM-ICPC, NOI/IOI, TopCoder, or AI/ML contests (e.g., Kaggle) are preferred.
Strong collaboration and communication skills, able to lead research initiatives and drive cross-team technical alignment.

About the Company

Beijing ByteDance Technology Co Ltd
