Los Angeles, CA (posted 30+ days ago)
Hands-on experience deploying and operating ML models at scale, including GPU-based inference services, concurrency handling and request batching, and latency and throughput optimization.
Experience with cloud platforms and ML deployment stacks, such as AWS (SageMaker, EC2, EKS), GCP, or similar, along with Docker, containers, and CI/CD pipelines.
Solid understanding of systems performance, debugging, and reliability engineering.
Hands-on experience with model optimization and acceleration, such as quantization, pruning, and distillation, and with tools such as ONNX Runtime, TensorRT, FSDP, and DeepSpeed.
Experience with distributed systems or scalable inference frameworks (Ray, Triton, TorchServe).