Senior Azure Platform Engineer
Role Overview
We are seeking a Senior Azure Platform & Resiliency Engineer to design, build, and operate our cloud foundation in Microsoft Azure. This role combines cloud architecture, infrastructure engineering, and site reliability principles to ensure our platform is secure, scalable, and resilient by design.
You will lead the development of Azure landing zones, infrastructure as code (IaC), and reliability patterns, enabling engineering teams to deploy and operate workloads consistently and safely.
This is a hands-on engineering role, not a pure operations or oversight position.
Key Responsibilities
Azure Platform Architecture
- Design and implement Azure Landing Zones aligned with Microsoft Cloud Adoption Framework (CAF)
- Define and manage:
  - Management group and subscription strategy
  - RBAC and identity models
  - Azure Policy and governance controls
- Build and maintain shared services (networking, logging, identity)
Infrastructure as Code (IaC)
- Develop and maintain infrastructure using Terraform (preferred), Bicep, or ARM templates
- Create reusable modules for:
  - Networking (hub/spoke, private endpoints)
  - Compute (VMs, VMSS, AKS)
  - Storage and platform services
- Integrate IaC into CI/CD pipelines (GitHub Actions or Azure DevOps)
Resiliency & Reliability Engineering
- Design high availability and multi-region architectures
- Define and implement disaster recovery (DR) strategies with explicit recovery time and recovery point objectives (RTO/RPO)
- Conduct failover testing and resilience validation
- Establish and track SLIs/SLOs and reliability metrics
Serverless & Platform Services
- Design and support event-driven and serverless architectures:
  - Azure Functions
  - Logic Apps
  - Event Grid / Service Bus
- Ensure scalability, fault tolerance, and observability of distributed systems
Observability & Operations
- Implement monitoring and logging using:
  - Azure Monitor
  - Log Analytics
  - Application Insights
- Define alerting strategies and reduce noise through meaningful thresholds
- Support incident response and lead post-incident reviews
Security & Compliance
- Partner with security teams to implement:
  - Identity-first architecture (Entra ID, Managed Identities)
  - Network security (NSGs, Private Endpoints, Zero Trust patterns)
  - Secrets management (Azure Key Vault)
- Ensure infrastructure meets compliance and audit requirements
Required Qualifications
- 6+ years of experience in cloud infrastructure, DevOps, or SRE roles
- Strong hands-on experience with Microsoft Azure architecture and services
- Proven experience designing and implementing Azure landing zones
- Deep expertise in Terraform (or equivalent IaC tools)
- Strong understanding of Azure networking (VNet, peering, DNS, private access)
- Experience with CI/CD pipelines (GitHub Actions, Azure DevOps)
- Proficiency in scripting (Python, PowerShell, or similar)
Preferred Qualifications
- Experience with Kubernetes / AKS
- Familiarity with Azure Front Door, Traffic Manager, or global routing
- Experience implementing Azure Policy at scale
- Exposure to chaos engineering or resilience testing tools
- Understanding of FinOps / cloud cost optimization
Certifications (Preferred)
- Microsoft Certified: Azure Solutions Architect Expert (AZ-305)
- Microsoft Certified: Azure Administrator Associate (AZ-104)
- HashiCorp Certified: Terraform Associate
What Success Looks Like
- A well-architected, governed Azure platform that supports multiple teams and environments
- Infrastructure fully defined and deployed via code
- Systems designed for high availability and rapid recovery
- Clear observability and actionable alerting across all services
- Reduced operational toil through automation
Interview Focus Areas
Candidates should be prepared to discuss:
- Designing an Azure landing zone for a multi-team organization
- Structuring Terraform for scalability and governance
- Multi-region architecture and failover strategies
- Real-world incident response and system reliability improvements
Georgia Systems Operations