San Francisco
You may be a good fit for the Model Efficiency team if you have:
- 5+ years of experience writing high-performance, production-quality code
- Strong programming skills in C++ or Python (Rust/Go also welcome)
- Experience working with large language models and familiarity with the LLM inference ecosystem (e.g., vLLM, SGLang)
- The ability to diagnose and resolve performance bottlenecks across the model execution stack
- A strong bias for action: you ship fast, measure impact, and iterate

It's a big plus if you have experience with:
- GPU programming, CUDA, or low-level systems optimization
- Language modeling with transformers (MoE, speculative decoding, KV-cache optimizations)
- Scaling performance-critical distributed systems (e.g., computation, search, storage)

If some of the above doesn't line up perfectly with your experience, we still encourage you to apply!

Full-time employees at Cohere enjoy these perks:
- An open and inclusive culture and work environment
- Work closely with a team on the cutting edge of AI research
- Weekly lunch stipend, in-office lunches and snacks
- Full health and dental benefits, including a separate budget to take care of your mental health
- 100% parental leave top-up for up to 6 months
- Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
- Remote-flexible, with offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
- 6 weeks of vacation (30 working days!)