Mountain View, CA (30+ days ago)

Responsibilities
As a Principal Software Engineer on the team, the common tasks of the job would include, but not be limited to:
Identify and drive improvements to end-to-end inference performance of OpenAI and other state-of-the-art LLMs
Measure and benchmark performance on Nvidia/AMD GPUs and first-party Microsoft silicon
Optimize and monitor performance of LLMs and build SW tooling to enable insights into performance opportunities ranging from the model level to the systems and silicon level, help reduce the footprint of the computing fleet, and achieve Azure AI capex goals
Enable fast time to market of LLMs/models and their deployments at scale by building SW tools that afford velocity in porting models on new Nvidia, AMD GPUs and Maia silicon
Design, implement, and test functions or components for our AI/DNN/LLM frameworks and tools
Speed up and reduce the complexity of key components/pipelines to improve performance and/or efficiency of our systems
Communicate and collaborate with our partners both internal and external
Embody Microsoft's Culture and Values

Qualifications
Required Qualifications:
Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Preferred Qualifications:
Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.

Mountain View, CA (30+ days ago)

Responsibilities
As a Senior Software Engineer on the team, the common tasks of the job would include, but not be limited to:
Identify and drive improvements to end-to-end inference performance of OpenAI and other state-of-the-art LLMs
Measure and benchmark performance on Nvidia/AMD GPUs and first-party Microsoft silicon
Optimize and monitor performance of LLMs and build SW tooling to enable insights into performance opportunities ranging from the model level to the systems and silicon level to improve customer experience and reduce the footprint of the computing fleet
Enable fast time to market of LLMs/models and their deployments at scale by building SW tools that afford velocity in porting models on new Nvidia and AMD GPUs
Design, implement, and test functions or components for our AI/DNN/LLM frameworks and tools
Speed up and reduce the complexity of key components/pipelines to improve performance and/or efficiency of our systems
Communicate and collaborate with our partners both internal and external
Embody Microsoft's Culture and Values

Qualifications
Required Qualifications:
Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Preferred Qualifications:
Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
OR equivalent experience.

San Francisco, CA (30+ days ago)
You will work closely with our pretraining and inference teams to identify bottlenecks, design and implement highly optimized kernels, and push the limits of throughput, latency, and hardware utilization across a range of accelerator platforms.
Strong understanding of distributed training systems and parallelism schemes, including data parallelism, tensor/model parallelism, pipeline parallelism, sharding, and communication/computation overlap.

South San Francisco, CA (30+ days ago)
This role operates the prioritization and funding model governing the full lifecycle, from demand intake and investment approval through to delivery oversight and value realization for GDDA initiatives, ensuring that digital, data, and AI investments are aligned to strategic priorities and deliver measurable business value.
Working in close collaboration with the Head of GDDA, Finance, and senior business stakeholders, this leader ensures that strategic priorities translate into clear portfolio decisions, sequencing of initiatives, and disciplined oversight across the GDDA organization.

Steers and conducts cross-functional meetings with maintenance schedulers, E&T teams, and CAPEX project managers to ensure routine and non-routine planned events do not disrupt the operations schedule and to minimize the impact of equipment downtime and maintenance work on productivity.
This principal/Sr Specialist/specialist level role is a subject matter expert and will serve as the key liaison between production planning in Manufacturing and Supply Chain - combining the supply demands and information of the manufacturing planning team and other departments to facilitate maximum productivity of the facility.

San Mateo, California (30+ days ago)
The Skydio team combines deep expertise in artificial intelligence, best-in-class hardware and software product development, operational excellence, and customer obsession to empower a broader, more diverse audience of drone users, from utility inspectors to first responders, soldiers in battlefield scenarios, and beyond.
By combining best-in-class autonomy and cloud-connected real-time visibility, drones will become the critical and ubiquitous infrastructure that can be deployed and monitored in real-time to solve various problems faster, cheaper, and safer than ever before.

San Francisco, CA (30+ days ago)
One of the highest impact roles at one of the fastest growing companies (revenue is growing 40% MoM, we are 60x+ RR compared to last year, raised Series A/B/C within the last 12 months) with a world changing vision: hyperscaling human creativity.
Expert-level familiarity with advanced inference techniques: quantization, kernel authoring, compilation, model parallelism (TP, context/sequence parallel, expert parallel), distributed serving and profiling.

Mountain View, CA (30+ days ago)
Strategy & Consulting: We work with C-suite executives, leaders and boards of the world's leading organizations, helping them reinvent every part of their enterprise to drive greater growth, enhance competitiveness, implement operational improvements, reduce cost, deliver sustainable 360° stakeholder value, and set a new performance frontier for themselves and the industry in which they operate.
Accenture is a leading global professional services company that helps the world's leading businesses, governments and other organizations build their digital core, optimize their operations, accelerate revenue growth and enhance citizen services, creating tangible value at speed and scale.

Burlingame, CA (30+ days ago)
Quadric's co-optimized software and hardware is targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems.
This is a hands-on role: beyond analysis, you will prototype solutions yourself, whether that means writing optimized code, modifying compiler passes, or building proof-of-concept implementations to validate proposed fixes before handing off to the appropriate team for productization.
In collaboration with multiple teams across Arm's engineering organization, you will diagnose and resolve performance challenges, and use these insights to influence Arm's IP and tooling roadmaps.
Job Description: In this role, you will work closely with customers to optimize AI workloads targeting Arm technology, focusing on achieving best-in-class performance and power efficiency.

San Francisco, California (3 days ago)
You'll build and optimize AMD GPU backends, kernels, runtime paths, and benchmarking infrastructure using ROCm, HIP, Triton, CK, AITER, and related tooling so vLLM can deliver frontier inference performance on AMD GPUs.
Bonus points if you have: Contributed to vLLM, ROCm, HIP, Triton, CK, AITER, PyTorch, compiler projects, or other open-source ML infrastructure.

San Francisco, California (12 days ago)
Bonus points if you have: Contributed to vLLM, JAX/XLA, Pallas, PyTorch/XLA, compiler projects, or other open-source ML infrastructure.
You'll build and optimize TPU backends, compiler integrations, runtime paths, and benchmarking infrastructure using JAX, XLA, Pallas, and related tooling so vLLM can deliver frontier inference performance on TPU hardware.

San Francisco, CA (24 days ago)
If education verification is required, information on how to verify education requirements, including verifying foreign education credits or degree equivalency, can be found at https://careers.sf.gov/knowledge/experience-education/.
Note: City Performance staff members provide consulting services to multiple departments every year to streamline and coordinate City processes, conduct analyses to answer key policy questions, promote meaningful collaboration between departments, and support a culture of transparency and accountability.