TO-695 – Senior AI Operations Engineer
Diverse Agile Solutions (DAS)
Location: Washington, DC (Hybrid, or as required by the customer)
Clearance: Ability to obtain and maintain a Public Trust or applicable Federal clearance
Citizenship: U.S. Citizenship Required
Employment Type: Full-Time, W2
Performance Period: Through the end of the year, with the possibility of extension
About Diverse Agile Solutions
Diverse Agile Solutions (DAS) is a certified Minority Business Enterprise (MBE) delivering innovative IT solutions to Federal, State, and Commercial customers. Our expertise spans Cloud Engineering, DevSecOps, Artificial Intelligence, Cybersecurity, Enterprise Modernization, and IT Staff Augmentation. We help organizations modernize mission-critical systems while leveraging Agile methodologies and emerging technologies.
We are seeking a Senior AI Operations (AIOps) Engineer to lead the deployment, automation, monitoring, governance, and operational excellence of enterprise Artificial Intelligence and Machine Learning platforms supporting mission-critical federal systems.
This position is ideal for someone who combines experience in DevOps, MLOps, Cloud Engineering, Site Reliability Engineering (SRE), and AI platform operations to build and run scalable, secure production environments.
Position Overview
The Senior AI Operations Engineer will design, implement, automate, and support enterprise AI infrastructure and operational workflows. This individual will be responsible for deploying and maintaining production AI services, optimizing model performance, managing infrastructure automation, implementing monitoring solutions, and ensuring compliance with federal security requirements.
The engineer will work closely with Data Scientists, Machine Learning Engineers, DevSecOps teams, Cloud Architects, Cybersecurity Engineers, and software developers to operationalize AI solutions across secure cloud environments.
Responsibilities
- Deploy, operate, and support enterprise AI/ML production environments
- Design scalable MLOps pipelines for continuous model deployment
- Automate AI infrastructure using Infrastructure as Code (IaC)
- Build CI/CD pipelines supporting machine learning workflows
- Implement automated model validation and deployment strategies
- Monitor model health, drift, performance, and availability
- Optimize GPU and compute resource utilization
- Configure logging, observability, and operational dashboards
- Manage AI model lifecycle from development through production
- Support containerized AI workloads using Kubernetes
- Build automated rollback and disaster recovery capabilities
- Secure AI infrastructure following Zero Trust principles
- Implement AI governance and model version management
- Integrate AI platforms with enterprise applications
- Maintain operational documentation and runbooks
- Participate in incident response and root cause analysis
- Collaborate with DevSecOps teams to automate security controls
- Optimize cloud costs for AI workloads
- Ensure compliance with NIST, FedRAMP, and federal security standards
Required Qualifications
- Bachelor's degree in Computer Science, Engineering, Information Systems, or related field
- 8+ years of IT engineering experience
- 5+ years supporting cloud infrastructure
- 4+ years supporting AI/ML production environments
- Experience deploying enterprise AI solutions
- Strong knowledge of MLOps methodologies
- Experience with CI/CD automation
- Experience managing production Kubernetes clusters
- Experience supporting containerized workloads
- Experience with infrastructure automation
- Strong Linux administration experience
- Experience with scripting and automation
- Excellent troubleshooting and analytical skills
- Experience working in Agile environments
- Strong communication and documentation skills
Required Technical Skills
Cloud Platforms
- AWS
- Azure
- Google Cloud Platform (GCP)
AI & Machine Learning
- MLOps
- Model deployment
- Model monitoring
- Model versioning
- Model registry
- Feature stores
- Prompt management
- Generative AI operations
- AI inference optimization
DevOps & Automation
- GitLab CI/CD
- GitHub Actions
- Jenkins
- Terraform
- Ansible
- Helm
- Docker
- Kubernetes
- OpenShift
Programming
- Python
- Bash
- PowerShell
- SQL
- REST APIs
AI Frameworks
- TensorFlow
- PyTorch
- Hugging Face Transformers
- LangChain
- MLflow
- Kubeflow
Monitoring & Observability
- Prometheus
- Grafana
- ELK Stack
- Splunk
- Datadog
- CloudWatch
- Azure Monitor
Data Technologies
- PostgreSQL
- MongoDB
- Redis
- Kafka
- Snowflake
- Vector Databases
Security
- IAM
- Secrets Management
- Encryption
- NIST 800-53
- FedRAMP
- Zero Trust Architecture
Preferred Qualifications
- Experience supporting Federal Government customers
- Experience operating AI workloads in AWS GovCloud
- Experience with Azure AI Foundry
- Experience with Azure OpenAI
- Experience with Amazon Bedrock
- Experience with Vertex AI
- Experience implementing Responsible AI governance
- Experience supporting Retrieval Augmented Generation (RAG) systems
- Experience deploying LLM applications
- Experience with GPU clusters
- Experience with NVIDIA AI Enterprise
- Experience with ServiceNow integrations
Preferred Certifications
One or more of the following:
- AWS Certified DevOps Engineer
- AWS Certified Machine Learning Engineer
- Microsoft Azure AI Engineer Associate
- Microsoft Azure Administrator
- Certified Kubernetes Administrator (CKA)
- HashiCorp Terraform Associate
- Certified Kubernetes Security Specialist (CKS)
- Google Professional Machine Learning Engineer
- CompTIA Security+
- CISSP
What You'll Do
- Operationalize enterprise AI platforms
- Improve reliability of production AI services
- Build automated AI deployment pipelines
- Reduce operational overhead through automation
- Improve model performance and reliability
- Enhance observability of AI systems
- Implement secure AI operations
- Enable scalable AI infrastructure across multiple cloud environments
Why Join Diverse Agile Solutions?
At DAS, you'll work alongside highly skilled cloud architects, DevSecOps engineers, cybersecurity professionals, and AI specialists supporting mission-critical federal initiatives. We embrace innovation, continuous learning, Agile delivery, and cutting-edge technologies that make a real-world impact.
What We Offer
- Competitive salary
- Comprehensive benefits package
- 401(k)
- Paid Time Off (PTO)
- Paid Federal Holidays
- Professional development and certification reimbursement
- Career advancement opportunities
- Collaborative, innovation-driven culture
Equal Opportunity Employer
Diverse Agile Solutions is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive workplace for all employees regardless of race, color, religion, sex, national origin, age, disability, veteran status, or any other protected characteristic.