Infrastructure Engineer

Advanced Tech Placement

Roseland, NJ

JOB DETAILS

SKILLS

Amazon Web Services (AWS), Analysis Skills, Artificial Intelligence (AI), Automation, Business Operations, Cloud Computing, Communication Skills, Continuous Improvement, Cost Control, Cross-Functional, Data Management, Data Modeling, Data Processing, Database Design, DevOps, Emerging Technology, Establish Priorities, GPU (Graphics Processing Unit), GitHub, High Availability, Identify Issues, Incident Response, Infrastructure Software, Leadership, Machine Tool, Mine Production, Operational Improvement, Operational Strategy, Operational Support, Operations Planning, Organizational Skills, Performance Management, Problem Solving Skills, Production Systems, Python Programming/Scripting Language, Reliability Engineering, Risk, Risk Analysis, Risk Management, Software Development, Standards Development, Team Player

LOCATION

Roseland, NJ

POSTED

19 days ago

We are looking for an Infrastructure Engineer

We are seeking a highly skilled Infrastructure Engineer to help design, build, automate, and operate scalable, high-availability production infrastructure in a fast-paced enterprise technology environment. This individual will play a key role in driving reliability, automation, cloud infrastructure strategy, operational excellence, and AI-enabled engineering practices across mission-critical systems.

Responsibilities:

Design, build, automate, and support large-scale, highly available cloud infrastructure environments
Manage and optimize containerized production platforms and orchestration environments
Develop and maintain Infrastructure as Code (IaC) solutions using tools such as Terraform or Pulumi
Build automation tooling, operational utilities, and platform enhancements using Python or Go
Drive infrastructure reliability, scalability, observability, and resiliency initiatives
Partner closely with engineering, product, security, AI/ML, and platform teams to support enterprise-wide initiatives
Implement and maintain monitoring, logging, alerting, and performance management solutions
Troubleshoot complex production issues and proactively identify systemic risks or operational weaknesses
Lead infrastructure improvements with a focus on reversibility, risk mitigation, and minimizing production blast radius
Create operational standards, automation frameworks, and deployment strategies that improve engineering velocity and reliability
Support AI-driven infrastructure operations, intelligent automation initiatives, and AI-assisted engineering workflows
Evaluate and implement emerging AI-enabled operational tooling to improve efficiency, incident response, automation, and developer productivity
Collaborate with engineering teams supporting AI/ML workloads, data platforms, and model deployment pipelines
Own infrastructure initiatives end-to-end, including architecture, implementation, rollout, rollback planning, and operational support

Requirements:

5 years of experience in Infrastructure Engineering, DevOps, Site Reliability Engineering, or similar roles supporting large-scale production environments
Hands-on experience operating containerized production systems and orchestration platforms in enterprise or high-growth settings
Strong experience with Kubernetes, Helm, and Infrastructure as Code tools such as Terraform or Pulumi
Experience supporting cloud infrastructure environments, preferably AWS
Proficiency in Python or Go for automation, tooling, and infrastructure development
Strong experience with monitoring, observability, and logging platforms such as Prometheus, Grafana, ELK, or equivalent technologies
Experience implementing resilient infrastructure designs focused on scalability, reliability, rollback strategies, and operational safety
Strong understanding of infrastructure tradeoffs involving reliability, cost optimization, deployment velocity, and operational risk
Demonstrated experience leveraging AI-assisted engineering tools and agentic AI workflows within day-to-day development and operational practices
Experience utilizing AI-enabled platforms such as Claude Code, Codex, GitHub Copilot, or similar tools to improve automation, troubleshooting, deployment efficiency, and operational workflows
Familiarity with infrastructure requirements for AI/ML environments, including compute scalability, data processing pipelines, model deployment, or GPU-enabled workloads, is highly desirable

Required Skills:

Excellent communication and cross-functional collaboration skills
Strong analytical and problem-solving capabilities
Ability to challenge assumptions, identify operational gaps, and recommend innovative infrastructure solutions
Proven ownership mindset with experience leading infrastructure initiatives from concept through production deployment
Strong organizational skills with the ability to prioritize and execute in fast-paced environments
Passion for continuous improvement, emerging technologies, and modern AI-enabled operational practices

Preferred Skills:

Software engineering background with experience building and maintaining production-grade applications, services, libraries, or internal frameworks
Ability to read, troubleshoot, and modify application codebases supporting infrastructure platforms
Experience bridging infrastructure engineering and software development practices
Experience building reusable platform tooling, developer enablement frameworks, or internal infrastructure products
Experience supporting enterprise-scale cloud transformation or modernization initiatives
Exposure to MLOps, AI infrastructure, vector databases, model serving frameworks, or intelligent automation platforms
Experience supporting AI/ML engineering teams through scalable infrastructure and deployment automation

About the Company

Advanced Tech Placement