Accounts Receivable, Artificial Intelligence (AI), C++ Programming Language, Cloud Computing, Communication Skills, Computer Science, Computer Vision, Conferences, Construction, Cross-Functional, Data Analysis, Data Science, Deep Learning, Electrical Engineering, Hardware Development, Human-Computer Interaction, IBM MVS Operating System, Industrial Research, Information Models, Leadership, Leading Edge Technology, Localization, Machine Learning, Memory Management, Modeling Languages, Problem Solving Skills, Product Design, Publications, Python Programming/Scripting Language, Reinforcement Learning, Research & Development (R&D), Research Skills, Scene Understanding, Scientific Research, Software Engineering, Team Player, User Interface/Experience (UI/UX)
About the Team
PICO-MR team is dedicated to pioneering core technologies for intelligent human-computer interaction in MR environments, with a focus on integrating multimodal large language models (MLLM) and tool-use capabilities to redefine user experiences. Our R&D directions cover cutting-edge fields including multimodal scene understanding, MLLM-based agent systems, tool-augmented MR interaction, 3D environment perception, and AIGC-driven content generation. Within MR scenarios, our work spans: MLLM optimization and adaptation for MR, intelligent task execution with tool use, multimodal scene understanding (vision, point clouds, text), AIGC-based scene generation, depth estimation (Mono/Stereo/MVS), 3D environment perception, large-scale 3D scene reconstruction (3DGS, NeRF, etc.), visual localization, and lighting estimation-encompassing both fundamental research breakthroughs and industrial-grade solution deployment.
Responsibilities:
• Lead the R&D of multimodal large language models (MLLM) tailored for MR scenarios, integrating vision, point clouds, text, and other multimodal information-including model architecture optimization, cross-modal alignment, data construction, evaluation system enhancement, and end-to-end training/inference acceleration.
• Drive the research and implementation of MLLM tool-use capabilities in MR environments, enabling models to proficiently utilize spatial interaction and spatial computing-related professional tools, support tool calls for both single-turn and multi-turn conversations, and solve complex user tasks through interaction.
• Address key challenges in long-horizon, multi-turn tool-augmented tasks in MR, such as context memory management, tool selection strategy, and error correction mechanisms.
• Keep abreast of cutting-edge technologies in MLLM, multimodal intelligence, and tool-use research, and lead the application and deployment of innovative technologies in PICOs MR products.
• Collaborate with cross-functional teams (including software engineering, product design, and hardware development) to translate research outcomes into practical features that enhance user experience.
Minimum Qualifications
- Masters or Ph.D. degree in Computer Science, Electrical Engineering, Machine Learning, Artificial Intelligence, or a related quantitative field.
- Expertise in multimodal large model pre-training, post-training, fine-tuning, or cross-modal fusion technologies, with hands-on experience in model optimization, training workflow design, and performance tuning.
- Proven research experience in LLM tool use, reinforcement learning, LLM agents, or interactive learning, with a deep understanding of single-turn and multi-turn interaction mechanisms.
- Proficiency in core 2D/3D computer vision tasks, including detection, segmentation, depth estimation, image matching, and 3D scene perception.
- Skilled in Python and C++, with solid programming capabilities and experience in developing large-scale models using mainstream deep learning frameworks (PyTorch/TensorFlow).
- Excellent problem-solving and independent research abilities, capable of addressing complex technical challenges in the integration of MR and MLLM tool use.
Preferred Qualifications
- Publications in AI/ML/CV conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP) focusing on multimodal large models, LLM tool use, or agent systems.
- Hands-on experience in building large-scale MLLM training pipelines, tool-use evaluation systems, or multimodal agent platforms.
- Familiarity with MR/AR/VR technologies, spatial computing, or 3D scene reconstruction (3DGS, NeRF, etc.) is a strong plus.
- Experience in addressing long-horizon reasoning or asynchronous agent behavior challenges is highly valued.
- Award winners of competitions such as ACM-ICPC, NOI/IOI, TopCoder, or AI/ML contests (e.g., Kaggle) are preferred.
- Strong collaboration and communication skills, able to lead research initiatives and drive cross-team technical alignment.
B
Beijing ByteDance Technology Co Ltd