HPC Linux System Administrator

MRI Technologies

Houston, TX

JOB DETAILS
SKILLS
Administrative Skills, Analysis Skills, Ansible, Artificial Intelligence (AI), Automation, Broadband, Communication Skills, Computer Workstations, Configuration Management, Continuous Deployment/Delivery, Continuous Integration, File Systems, Git, Intel Product Family, Linux Administration, Linux Distributions, MPI, Machine Tool, Microservices, Operating Systems, Problem Solving Skills, Red Hat Linux Operating System, Risk Analysis, Schedule Development, Systems Administration/Management, Team Player, United States Citizen

MRI Technologies has an exciting opportunity for an HPC Linux System Administrator on the JETS II contract at NASA Johnson Space Center (JSC). You will support the Flight Sciences Laboratory (FSL), one of JSC's primary computing facilities, with over 700 machines, 26,000 cores, and 10+ petabytes of storage serving more than 1,000 users. The analyses running on FSL infrastructure support nearly every major NASA program, including the International Space Station (ISS), Orion, the Space Launch System (SLS), Commercial Crew, Lunar Gateway, and the Human Landing System.

Your responsibilities will include working with a team of System Administrators to build and maintain all FSL services and performing High Performance Computing (HPC) and high-end Linux workstation administration. You will administer high-speed parallel filesystems and the job scheduler, investigate problems, and proactively monitor system health. A core part of the role is supporting containerized HPC workflows: FSL uses containers (not in a traditional microservices sense) so that users can bring their own environments into the cluster when they need an older OS or older package versions. You will also help build out CI/CD workflows on the cluster and run nodes, including the team's adoption of Jacamar (https://gitlab.com/ecp-ci/jacamar-ci) for HPC CI/CD. You will work closely with FSL users to make sure they can support the NASA human spaceflight mission.

What We Are Looking For

Requirements:

  • Typically requires a bachelor's degree or equivalent certification in a related field, with a minimum of 5 years of experience
  • Linux system administration experience
  • HPC job scheduler administration experience
  • Experience using containers in an HPC context: packaging and running user environments (including older OS or older package versions) on a shared cluster rather than for traditional microservices
  • Experience building and supporting CI/CD workflows, ideally tied to HPC clusters and run nodes
  • System configuration management experience
  • High-speed parallel file storage administration experience
  • Experience with monitoring and alerting systems
  • Demonstrated problem-solving, planning, and communication skills
  • Ability to work effectively in a team environment

Preferences:

  • Strong skills administering parallel filesystems such as Lustre or GPFS
  • Strong skills administering the SLURM job scheduling system
  • Experience with Red Hat-based Linux distributions
  • Familiarity with InfiniBand high-speed networking
  • Experience with provisioning tools (xCAT, Warewulf)
  • Experience with Ansible and/or Foreman for configuration management
  • Familiarity with the Spack software package manager
  • Experience with log consolidation, monitoring, and Git/GitLab (including CI/CD pipelines)
  • Familiarity with Jacamar (https://gitlab.com/ecp-ci/jacamar-ci) or comparable tooling for running CI/CD jobs against HPC clusters and run nodes
  • Experience applying AI in a sysadmin role: integrating AI tooling into operational workflows, automation, and user-facing HPC jobs
  • MPI workflow and administration experience (e.g., Open MPI, MPICH, Intel MPI)
  • Package management and environment modules experience (e.g., Lmod/Environment Modules, Spack, EasyBuild)
  • Knowledge of NASA security mechanisms (security plans, POA&Ms, ATOs, Risk Assessments)

This position has been posted at multiple levels. Depending on your experience and business needs, we may consider candidates at any level for which the position is advertised.

Benefits and Perks

We offer a comprehensive benefits package including medical, dental, vision, company paid life and disability insurance, paid time off, and 401(k). You'll also enjoy a 9/80 work schedule (every other Friday off, when applicable), and the chance to work in one of JSC's most critical computing environments supporting human spaceflight.

Proof of U.S. Citizenship or U.S. Permanent Residency is a requirement for this position.

MRI Technologies is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status.



About the Company

MRI Technologies