Meta is seeking a creative, skilled and motivated Research Scientist to advance the state-of-the-art in multi-modal understanding. You will work on developing models that reason across vision, language, and other modalities to enable richer AI experiences across Meta's family of apps and products. You will collaborate with research scientists, software engineers, and data scientists to design technical solutions in a fast-paced multidisciplinary environment.

Develop and advance multi-modal models that integrate vision, language, audio, and other modalities
Research novel architectures and training methods for cross-modal reasoning and understanding
Design and prototype interactive experiences that leverage multi-modal AI capabilities
Collaborate across teams to develop concepts that advance the entire research pipeline (hardware, software, data collection, machine learning, etc.)
Publish research findings at top-tier conferences and contribute to the broader research community

Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
Currently has, or is in the process of obtaining, a PhD degree in Computer Science, Machine Learning, or relevant technical field.
Degree must be completed prior to joining Meta
Experience in multi-modal learning, combining vision, audio, language, or related areas
Experience working with PyTorch or TensorFlow
Experience with transformer architectures and large-scale model training
Technical knowledge across machine learning, deep learning, and statistical modeling
Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment
First-authored publications at leading conferences such as NeurIPS, ICML, and CVPR, or similar
Experience with large language models (LLMs) and their integration with other modalities
Experience transferring multi-modal research into shipping products
Experience working and communicating cross-functionally in a team environment
Research experience in vision-language models, multi-modal transformers, or cross-modal representation learning

Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today: beyond the constraints of screens, the limits of distance, and even the rules of physics.

Meta is proud to be an Equal Employment Opportunity employer.
We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. You may view our Equal Employment Opportunity notice here.

Meta is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, fill out the Accommodations request form.