Senior AI Platform / LLM Infrastructure Engineer

PeopleNTech LLC

Alexandria, VA

JOB DETAILS
SALARY
$75–$77 Per Hour
SKILLS
Artificial Intelligence (AI), Benchmarking, CUDA (Compute Unified Device Architecture), Caching, Distributed Computing, GPU (Graphics Processing Unit), Incident Response, Load Testing, Performance Analysis, Performance Modeling, Performance Testing, Performance Tuning/Optimization, Python Programming/Scripting Language
LOCATION
Alexandria, VA
POSTED
12 days ago
Indent: SF_OP_204606-1-1
Role: Senior AI Platform / LLM Infrastructure Engineer
Location: Charlotte, NC (Hybrid)
Rate: $75/hr - $77/hr

We are hiring a Senior AI Platform Engineer to build and optimize on-prem LLM inference platforms. The role focuses on high-performance model serving, GPU workloads, and scalable ML infrastructure using modern inference frameworks and Kubernetes.

Must-Have Skills
• LLM Inference Frameworks: vLLM, TensorRT-LLM, Triton Inference Server, SGLang
• Model Optimization: Continuous Batching, Speculative Decoding, KV Cache / Prefix Caching, FP8 / AWQ / GPTQ
• Distributed/Parallel Systems: Tensor Parallelism
• Platform & Orchestration: Kubernetes, KServe, OpenShift AI, Helm / Operators
• GPU & Performance: CUDA, NCCL, MIG, GPU Orchestration (Run:AI)
• Monitoring: Prometheus, Grafana, ML Observability
• Programming: Python
• GenAI Tools: Arize AI, Claude (CoWork)
• Load / Performance Testing: GuideLLM, Locust

Key Responsibilities
• Build and manage LLM inference platforms on on-prem GPU infrastructure
• Optimize model performance using advanced inference techniques (batching, caching, quantization)
• Deploy and operate ML workloads on Kubernetes (KServe/OpenShift AI)
• Enable GPU scheduling and orchestration for large-scale workloads
• Implement monitoring and performance benchmarking frameworks
• Drive SRE practices for platform reliability and scalability (observability, incident handling)
• Collaborate with AI/ML teams to enable production-grade GenAI deployments


About the Company

PeopleNTech LLC