Senior HPC Specialist

Macpower Digital Assets Edge Private Limited

Denver, CO

JOB DETAILS
SALARY
$45–$49 Per Hour
SKILLS
Bash Scripting, Benchmarking, Best Practices, CentOS, Customer Support/Service, Data Management, Data Recovery, Data Storage, Disaster Recovery, Emerging Technology, File Systems, High Availability, High Throughput, Identify Issues, Large-Scale Systems, Linux Administration, Linux Operating System, Performance Analysis, Performance Tuning/Optimization, Perl Programming Language, Problem Solving Skills, Python Programming/Scripting Language, Red Hat Linux Operating System, Regulatory Compliance, Scripting (Scripting Languages), Systems Administration/Management, Systems Maintenance, Systems Scalability, Team Player, Technical Support, Technical Training, Technical Writing, Ubuntu
LOCATION
Denver, CO
POSTED
12 days ago
Job Summary:
We are seeking a highly skilled and experienced Senior HPC Specialist to design, implement, and maintain high-performance computing (HPC) systems and solutions. The ideal candidate will play a critical role in optimizing computational performance, ensuring the reliability of the infrastructure, and supporting advanced computational workloads in a dynamic and innovative environment.
Key Responsibilities:
HPC System Design & Implementation:
  • Design and deploy HPC clusters, including compute, storage, and networking components.
  • Evaluate and integrate new HPC technologies to enhance system performance and scalability.
System Administration & Maintenance:
  • Manage Linux-based HPC systems and their job schedulers (e.g., Slurm, PBS, Grid Engine).
  • Monitor system health, troubleshoot issues, and resolve performance bottlenecks.
  • Ensure optimal configuration and high availability of HPC resources.
Performance Optimization:
  • Profile and fine-tune applications and workloads for peak performance on HPC systems.
  • Analyze job performance and provide recommendations to users for enhancements.
Storage & Data Management:
  • Administer large-scale parallel file systems (e.g., Lustre, GPFS, BeeGFS).
  • Implement data transfer and storage strategies for high-throughput workloads.
User Support & Collaboration:
  • Provide technical support and training to researchers and end users.
  • Work with interdisciplinary teams to understand and meet computational requirements.
Security & Compliance:
  • Adhere to security best practices and compliance standards for HPC systems.
  • Develop and manage backup and disaster recovery solutions.
Required Qualifications:
  • HPC Expertise: Proven experience in HPC cluster design, deployment, and management (compute, storage, networking).
  • Linux Administration: Proficiency in administering Linux systems (Red Hat, CentOS, Ubuntu).
  • Job Scheduling: Hands-on experience with job schedulers like Slurm, PBS, or Grid Engine.
  • Performance Tuning: Strong skills in profiling, benchmarking, and optimization techniques.
  • Parallel File Systems: Expertise in managing Lustre, GPFS, or BeeGFS.
  • Scripting & Automation: Advanced knowledge of scripting languages (Bash, Python, Perl).
  • User Support: Ability to write technical documentation, deliver training, and collaborate with researchers and end users.
  • Security: Knowledge of system hardening, backups, and disaster recovery processes.
Preferred Qualifications:
  • Collaboration Skills: Experience working in interdisciplinary teams to meet diverse computational needs.
  • Innovation: Passion for exploring emerging HPC technologies to enhance capabilities.
