Platform & DevOps Engineer

Avtal

Austin, TX

MTS DevOps Engineer 

Location: Austin, TX
Job Type: Full-Time
Department: Engineering / DevOps

About Avtal, Inc.

We are a VC-backed company that grew revenue 35x in the past year. We help third-party debt collection agencies deliver a digital, end-to-end self-service experience for their consumers.

About the Role

We are looking for a skilled and motivated MTS DevOps Engineer with strong experience in AWS, Linux, infrastructure automation, and CI/CD, along with practical experience supporting AI-enabled systems in production. In this role, you will be instrumental in building, maintaining, and scaling our cloud-native infrastructure, improving deployment workflows, and ensuring the reliability, security, performance, and auditability of our systems in a highly regulated environment. You will also help support the infrastructure and operational foundations needed for AI-powered applications, including secure runtime environments, observability, scalable service orchestration, and cost-conscious operations.

Responsibilities

  • Build and maintain infrastructure automation tools using Ansible, Terraform, Python, Go, and shell scripting
  • Develop and operate secure, scalable infrastructure on AWS (e.g., EC2, S3, RDS, IAM, CloudWatch)
  • Maintain and optimize Linux-based systems across development and production environments
  • Implement and manage CI/CD pipelines and automated deployment workflows
  • Support infrastructure for AI-powered services, including runtime reliability, operational visibility, and secure service configuration
  • Help enable LLM API integrations, AI service orchestration, secrets management, and secure runtime environments for AI-enabled applications
  • Monitor system health, performance, reliability, security, and AI service observability using modern tooling
  • Troubleshoot production issues, perform root cause analysis, and implement durable improvements
  • Collaborate with engineering teams to improve infrastructure reliability, scalability, developer productivity, and operational resilience
  • Document infrastructure processes, runbooks, and best practices to support knowledge sharing and onboarding

Requirements

  • 4+ years of experience in DevOps, SRE, or Infrastructure Engineering
  • Proficiency in infrastructure automation and tooling using Ansible, Terraform, Python, Go, and shell scripting
  • Deep understanding of Linux system administration, shell scripting, and process management
  • Proven experience with AWS services such as EC2, S3, RDS, IAM, and CloudWatch
  • Hands-on experience with CI/CD systems and version control (Git)
  • Familiarity with infrastructure needs for AI-enabled systems, such as model API integrations, service orchestration, observability, cost monitoring, or secure data handling
  • Strong debugging, troubleshooting, and problem-solving skills
  • Ability to build and operate systems with attention to reliability, security, and auditability in a highly regulated environment

Nice to Have

  • Experience supporting production systems that include LLM-based or other AI-powered capabilities
  • Familiarity with AI observability, evaluation support tooling, guardrails, and cost/performance monitoring
  • Experience with vector databases, embeddings pipelines, or retrieval infrastructure
  • Hands-on experience with infrastructure as code, including Terraform or CloudFormation
  • Background in Site Reliability Engineering (SRE) practices
  • Familiarity with monitoring and observability tools such as Prometheus, Grafana, and Kibana
  • Understanding of secure infrastructure design and cloud compliance best practices
  • Experience supporting high-availability production systems in regulated or security-conscious environments