Site Reliability Engineer - New York (Remote)

Georgia Tek Systems

NY, NY (Remote)

JOB DETAILS
SKILLS
Agile Programming Methodologies, Amazon Web Services (AWS), Ansible, Bash Scripting, Cloud Applications, Cloud Computing, Communication Skills, Continuous Deployment/Delivery, Continuous Integration, DevOps, Distributed Computing, Docker, Enterprise Applications, Google Cloud Platform (GCP), High Availability, Identify Issues, Incident Management, Incident Response, Jenkins, Linux Administration, Microservices, Microsoft Azure, On Call, Operations Processes, Performance Analysis, Performance Tuning/Optimization, Problem Solving Skills, Production Support, Python Programming/Scripting Language, Reliability Engineering, Reporting Dashboards, Root Cause Analysis, Scripting (Scripting Languages), Scrum Project Management and Software Development, Splunk, Systems Reliability, Technical Writing, Unix Shell Programming, Unix System Administration
LOCATION
NY, NY
POSTED
4 days ago
Job Title: Site Reliability Engineer (SRE) – Dynatrace
Location: New York (Remote)
Experience: 6–10 Years
Rate: DOE (depends on experience)
Job Description
We are looking for a highly skilled Site Reliability Engineer (SRE) with strong expertise in Dynatrace monitoring and observability solutions. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of enterprise applications and infrastructure across cloud and on-prem environments.
The candidate should have hands-on experience with monitoring, automation, troubleshooting, cloud platforms, and modern DevOps practices.
Key Responsibilities
  • Design, implement, and maintain end-to-end monitoring solutions using Dynatrace.
  • Configure dashboards, alerts, problem detection rules, and observability frameworks.
  • Monitor application performance, infrastructure health, and distributed systems.
  • Troubleshoot production issues and perform root cause analysis to improve system reliability.
  • Work closely with DevOps, Cloud, and Application teams to optimize system performance.
  • Automate operational tasks using scripting languages such as Python or Bash/shell.
  • Support and manage containerized environments using Docker and Kubernetes.
  • Implement and maintain CI/CD pipelines using tools like Jenkins, GitLab CI/CD, or Azure DevOps.
  • Ensure high availability, scalability, and resiliency of systems and services.
  • Participate in incident response, on-call rotations, and performance tuning activities.
  • Create and maintain technical documentation, runbooks, and operational procedures.
Required Skills & Qualifications
  • 6–10 years of experience in Site Reliability Engineering, DevOps, or Production Support roles.
  • Strong hands-on expertise with Dynatrace including monitoring, alerting, dashboards, and problem analysis.
  • Solid understanding of observability, logging, monitoring frameworks, and APM tools.
  • Experience working with cloud platforms such as AWS, Azure, or GCP.
  • Strong knowledge of Linux/Unix administration and troubleshooting.
  • Experience with Docker, Kubernetes, and container orchestration.
  • Hands-on experience with CI/CD tools including Jenkins, GitLab, or Azure DevOps.
  • Strong scripting and automation skills using Python or Bash/shell scripting.
  • Good understanding of microservices architecture and distributed systems.
  • Experience with incident management, root cause analysis, and system performance optimization.
  • Excellent communication and problem-solving skills.
Preferred Qualifications
  • Experience with Infrastructure as Code tools such as Terraform or Ansible.
  • Exposure to logging and visualization tools such as Splunk, the ELK Stack, or Grafana.
  • Knowledge of Agile/Scrum methodologies.
  • Relevant certifications in Cloud, Kubernetes, or Dynatrace are a plus.

About the Company

Georgia Tek Systems