Bala Cynwyd (Philadelphia Area), Pennsylvania
Posted 30+ days ago
This role focuses on workloads where off-the-shelf runtimes and vendor libraries do not fully exploit the structure of the model, and where custom kernels, memory layouts, and execution strategies can deliver meaningful gains. Relevant background includes exposure to neural networks, tree-based models (e.g., LightGBM), and state space models (e.g., Mamba architectures), as well as experience with kernel fusion, custom operators, model compilation, or graph-level optimization.