AI DevOps Systems Administrator

Column Technical Services

Scottsdale, AZ

JOB DETAILS
SKILLS
Access Control, Ansible, Artificial Intelligence (AI), Computer Maintenance, Computer Science, Configuration Management, Continuous Deployment/Delivery, Continuous Improvement, Continuous Integration, Customer Relations, Data Science, DevOps, Docker, Ecosystems, Emerging Technology, GPU (Graphics Processing Unit), Hardware Virtualization, Identify Issues, Information/Data Security (InfoSec), Linux Operating System, Machine Learning, Mentoring, Multiplatform/Cross-Platform, Operating Systems, Operational Strategy, Operations, Operations Management, People Management, Performance Management, Problem Solving Skills, Scripting (Scripting Languages), Security Infrastructure, Software Administration, Software Configuration Management, System Architecture, Systems Administration/Management, Systems Maintenance, Team Player, Training/Teaching, Willing to Travel
POSTED
Today
Column Technical Services is seeking a highly skilled AI DevOps Systems Administrator to architect, support, and evolve the infrastructure powering our cutting-edge Artificial Intelligence and Machine Learning initiatives in a secure, classified environment in Scottsdale, AZ. In this role, you'll be at the forefront of innovation, driving reliable model development and deployment by optimizing pipelines, maximizing compute performance, and ensuring robust scalability and security across platforms. This is a unique opportunity to work with advanced technologies while making a direct impact on mission-critical systems. If you're passionate about AI infrastructure, thrive in high-performance environments, and are ready to take on meaningful, complex challenges, we encourage you to apply.
You will work closely with data scientists and machine learning engineers to move models smoothly from experimentation to production.

Sponsorship is not available for this role. Candidates must currently reside in or near Scottsdale, Arizona.

Core Responsibilities
  • Architect, deploy, and support scalable environments for AI/ML training and inference workloads
  • Build and maintain automated CI/CD workflows for machine learning models and AI-driven applications
  • Administer and fine-tune Linux-based systems across physical and virtual infrastructures
  • Implement and manage containerized environments using tools such as Docker and Kubernetes to support scalable ML services
  • Utilize Infrastructure as Code (IaC) solutions (e.g., Terraform, Ansible) to automate provisioning, configuration, and system management
  • Optimize allocation and usage of GPU resources for compute-intensive workloads
  • Establish monitoring, logging, and alerting frameworks to ensure system health, availability, and performance
  • Partner with engineering teams to troubleshoot issues, improve workflows, and meet infrastructure requirements
Additional Responsibilities
As a senior-level contributor, you will serve as a key technical point of contact, supporting users and participating in system design and evolution efforts to align with emerging technologies. You will:
  • Install, configure, and maintain software and system components
  • Diagnose and resolve technical issues, including access control and permissions
  • Provide guidance and training to users on system functionality
  • Manage daily operations of server environments across both physical and virtual platforms
  • Configure, maintain, and troubleshoot hardware, operating systems, and network interfaces
  • Investigate and resolve system alerts, ensuring continuity of services
  • Develop scripts to streamline and automate repetitive operational tasks
  • Collaborate directly with stakeholders to identify, isolate, and resolve system-related issues impacting broader services
What Sets You Apart
  • A collaborative mindset with a strong commitment to team success and shared outcomes
  • Solid understanding of how systems, servers, and services interconnect within a broader IT ecosystem
  • Advanced expertise in supporting both physical and virtual server environments
  • Deep knowledge of access controls, permissions, and security practices to ensure appropriate and secure data access
  • A proactive approach to identifying opportunities to leverage AI for operational efficiency, continuous improvement, and innovation
What You'll Experience
  • Work with advanced and often highly classified technologies
  • Be part of a forward-thinking team focused on innovation and exploration
  • Pursue continuous learning opportunities aligned with emerging advancements


Qualifications
  • Minimum of 8 years of relevant experience OR a Master's degree with 6+ years of experience
  • Bachelor's degree in Computer Science, a related discipline, or equivalent experience
  • Deep expertise in server-based operating systems
  • Strong proficiency in Linux environments, containerization, and AI/ML infrastructure
  • Proven ability to serve as a subject matter expert and mentor team members
  • Advanced troubleshooting skills across operating systems, networking, and storage technologies
  • Hands-on experience building, deploying, and maintaining enterprise-scale server environments
  • Exposure to or experience working with AI/ML workloads is highly desirable
  • Willingness to travel occasionally
