Senior AI Engineer (Production Agentic & RAG Systems)

EPAM Systems Inc

Atlanta, GA

JOB DETAILS
SKILLS
Amazon Web Services (AWS), Artificial Intelligence (AI), Automation, Caching, Continuous Deployment/Delivery, Continuous Integration, Cost Engineering, Cost Modeling, Docker, English Language, Error Handling, GitHub, Jenkins, Model Context Protocol (MCP), Agent Memory, Microsoft Azure, Natural Language Processing (NLP), Model Routing, On Call, Production Systems, Python Programming/Scripting Language, Service Level Agreement (SLA), Chunking, Software Engineering, System Operations, Token Accounting, Writing Skills
LOCATION
Atlanta, GA
POSTED
10 days ago

Remote in Georgia & 4 others

AI Engineering

We are seeking a hands-on Senior AI Engineer who designs, builds, and operates production GenAI systems - agentic workflows, RAG pipelines, and LLM-backed services with real users and real SLAs. This is an engineering role, not a research role. The bar is reliability, latency, cost, observability, and safe deployment at scale, with end-to-end ownership from architecture through on-call. Typical workloads include enterprise knowledge platforms, conversational analytics, agentic automation, and LLM-augmented data products.

Responsibilities

  • Design agent orchestration (graph/state, conditional routing, tool calling, memory, checkpointing) in LangGraph / LangChain or equivalent

  • Build production RAG end-to-end: chunking, embeddings, vector stores, hybrid retrieval, reranking, caching, and grounded synthesis

  • Own Python / FastAPI services - async, SSE streaming, session handling, and structured error contracts

  • Instrument with tracing and evaluation harnesses (MLflow, OpenTelemetry, or equivalent) for accuracy, cost, and regression

  • Ship on Docker + Kubernetes (EKS/AKS/GKE) via CI/CD with test, eval, and canary gates

  • Drive LLM cost engineering - model routing, prompt optimization, caching, token accounting, and build-vs-buy decisions

  • Apply GenAI safety & governance: hallucination control, prompt-injection defense, PII handling, and HITL where required

  • Partner with data engineering on semantic layers and pipelines (PySpark / SQL where applicable)

Requirements

  • 5+ years in software engineering, with 2+ years shipping production LLM / agentic systems (not POCs or research)

  • Proficiency in Python and FastAPI (async, REST, SSE)

  • Production expertise in LangChain and LangGraph (or equivalent serious production experience with LlamaIndex, AutoGen, or MCP stacks)

  • Background in production RAG: embeddings, chunking, and hybrid retrieval with reranking and caching

  • Skills in vector databases such as Pinecone, Weaviate, pgvector, OpenSearch, or Databricks Vector Search

  • Knowledge of at least one major LLM provider in production - AWS Bedrock (preferred), OpenAI / Azure OpenAI, or Anthropic - with model selection and routing trade-offs

  • Competency in Kubernetes and Docker in real production environments (EKS/AKS/GKE)

  • Expertise in cloud engineering on AWS

  • Familiarity with observability and tracing tools (MLflow, LangSmith, OpenTelemetry), evaluation harnesses, and latency/cost ownership

  • Capability to build CI/CD for AI systems (GitHub Actions, Jenkins, or equivalent) with test/eval gates

  • Strong written and spoken English (B2 level); able to own design discussions with engineering and business stakeholders independently

Nice to have

  • Databricks depth - MLflow (tracking & serving), Vector Search, Unity Catalog / Metric Views, PySpark / SQL

  • Experience with LLM fine-tuning - PEFT, LoRA, QLoRA

  • Understanding of MCP servers and tool integration

  • Qualifications in GenAI governance & FinOps - auditability, prompt-injection hardening, PII, and token cost in regulated environments

  • Background in classical ML / DL - NLP, BERT-family, time-series, and CV
