Senior AI Engineer (Production Agentic & RAG Systems)

EPAM Systems Inc

Atlanta, GA

JOB DETAILS

POSTED

10 days ago

Remote in Georgia & 4 others

AI Engineering

We are seeking a hands-on Senior AI Engineer who designs, builds, and operates production GenAI systems: agentic workflows, RAG pipelines, and LLM-backed services with real users and real SLAs. This is an engineering role, not a research role. The bar is reliability, latency, cost, observability, and safe deployment at scale, with end-to-end ownership from architecture through on-call. Typical workloads include enterprise knowledge platforms, conversational analytics, agentic automation, and LLM-augmented data products.

Responsibilities

Design agent orchestration (graph/state, conditional routing, tool calling, memory, checkpointing) in LangGraph / LangChain or equivalent
Build production RAG end-to-end: chunking, embeddings, vector stores, hybrid retrieval, reranking, caching, and grounded synthesis
Own Python / FastAPI services: async, SSE streaming, session handling, and structured error contracts
Instrument with tracing and evaluation harnesses (MLflow, OpenTelemetry, or equivalent) for accuracy, cost, and regression
Ship on Docker + Kubernetes (EKS/AKS/GKE) via CI/CD with test, eval, and canary gates
Drive LLM cost engineering: model routing, prompt optimization, caching, token accounting, and build-vs-buy decisions
Apply GenAI safety & governance: hallucination control, prompt-injection defense, PII handling, and human-in-the-loop (HITL) review where required
Partner with data engineering on semantic layers and pipelines (PySpark / SQL where applicable)

Requirements

5+ years in software engineering, with 2+ years shipping production LLM / agentic systems (not POCs or research)
Proficiency in Python and FastAPI (async, REST, SSE)
Production expertise in LangChain and LangGraph (or equivalent production experience with LlamaIndex, AutoGen, or Model Context Protocol (MCP) stacks)
Background in production RAG: embeddings, chunking, and hybrid retrieval with reranking and caching
Skills in vector databases such as Pinecone, Weaviate, pgvector, OpenSearch, or Databricks Vector Search
Knowledge of at least one major LLM provider in production (AWS Bedrock preferred; OpenAI / Azure OpenAI or Anthropic also accepted), including model selection and routing trade-offs
Competency in Kubernetes and Docker in real production environments (EKS/AKS/GKE)
Expertise in cloud engineering on AWS
Familiarity with observability and tracing tools (MLflow, LangSmith, OpenTelemetry), evaluation harnesses, and latency/cost ownership
Capability to build CI/CD for AI systems (GitHub Actions, Jenkins, or equivalent) with test/eval gates
Strong written and spoken English (B2 level); able to own design discussions with engineering and business stakeholders independently

Nice to have

Databricks depth: MLflow (tracking & serving), Vector Search, Unity Catalog / Metric Views, PySpark / SQL
Experience with LLM fine-tuning: PEFT methods such as LoRA and QLoRA
Understanding of MCP servers and tool integration
Qualifications in GenAI governance & FinOps: auditability, prompt-injection hardening, PII handling, and token cost control in regulated environments
Background in classical ML / DL: NLP, BERT-family models, time-series, and computer vision (CV)
