AWS Lambda, Amazon Web Services (AWS), Application Programming Interface (API), Artificial Intelligence (AI), Cloud Computing, Continuous Deployment/Delivery, Continuous Integration, Cost Control, Data Processing, Data Science, Database Administration, Database Architecture, DevOps, Develop and Maintain Customers, Documentation, Embedded Systems, Environmental Sciences, Financial Control, Identify Issues, Image Management, Incident Response, Knowledge Transfer, Microsoft Windows Azure, MySQL, Network Administration/Management, On Call, Oracle, PostgreSQL, Prototyping, Resource Management, Root Cause Analysis, Security Architecture, Security Patches, Service Level Agreement (SLA), Software Engineering, Statistical Analysis System (SAS), Vulnerability Scanners
Position: Senior AWS Infrastructure Engineer
Location: Dallas, Texas; Columbus, Ohio; or Minneapolis, Minnesota
Onsite: 4 days a week
Interview process: Spark Hire, then video interview, then face-to-face (the face-to-face interview is mandatory and can take place in any of those markets, since the team is spread across them)
Contract: 6-month contract-to-perm (unfortunately, the client does not sponsor)
Must have: AWS, Kubernetes, Lambda, EC2, CI/CD experience, Terraform, Athena, Glue
What we need:
• An engineer who has spent the majority of their career building, operating, and maintaining cloud infrastructure on AWS — not just using cloud services for data processing.
• Hands-on Kubernetes administration: deploying clusters, managing nodes, networking (ingress, CNI), RBAC, persistent storage, and troubleshooting production issues.
• Experience with infrastructure-as-code - primarily Terraform - to provision and manage AWS resources programmatically.
• CI/CD pipeline ownership: building and maintaining pipelines using Azure DevOps or equivalent tools.
• Security-first mindset: IAM policies, security groups, VPCs, audit logging, vulnerability remediation within SLAs
• Ability to support Data Science and AI/ML platform infrastructure (e.g., Shakudo on EKS, SAS Viya on Kubernetes): not building the models, but running the platform they sit on.
• Experience with AWS services: EKS, ECR, SageMaker (infra layer), Lambda, Athena, Glue - specifically managing and operating them, not just calling APIs from notebooks
• Database infrastructure support: Athena, Oracle, MySQL, Postgres - connection management, performance, and security, but not DBA-level tuning.
What we DON'T need:
• A Data Scientist or ML Engineer who has 'used AWS' - we need infrastructure operators, not model builders.
• A Data Engineer who knows Spark, Glue jobs, or ETL pipelines — that is not this role
• A Cloud Developer who writes Lambda functions or application code — we need platform engineers.
• Anyone whose primary Kubernetes experience is running kubectl commands in a managed service without understanding the underlying cluster architecture.
• Candidates who list 'Kubernetes' on their resume but cannot explain what a DaemonSet, Ingress controller, or PersistentVolumeClaim is.
• Candidates with AWS certifications but no hands-on production infrastructure experience.
What this platform engineer will do on my team:
This is a Platform Engineering role embedded in the Data Science division. The team runs critical data and AI/ML infrastructure on AWS, including Shakudo (a data science platform) and SAS Viya, both running on Kubernetes (EKS). The engineer's job is to keep that infrastructure running, secure, scalable, and automated.
Day-to-Day Responsibilities Include:
• Manage and operate AWS EKS clusters that host Shakudo and SAS Viya — currently supporting 25+ active prototypes, growing to 60+ by end of 2026.
• Build and maintain CI/CD pipelines using Azure DevOps for deploying data science environments and platform updates
• Provision and manage AWS infrastructure using Terraform - VPCs, EKS node groups, IAM roles, ECR, RDS instances.
• Manage container image lifecycle via Amazon ECR - building, versioning, scanning for vulnerabilities.
• Set up and maintain AWS accounts for the Data Science platform, including IAM, cost controls, and security guardrails.
• Respond to infrastructure incidents within SLA - on-call rotation, root cause analysis, post-mortems.
• Perform Kubernetes cluster upgrades, node patching, and security remediation.
• Support Data Scientists by unblocking infrastructure issues - not writing their code, but ensuring their compute and storage work.
• Conduct knowledge transfer sessions within the platform team - documentation, runbooks, workshops.
• Collaborate with Network, Database, Architecture, and Security teams.