Full Stack Cloud Engineer

Apex Informatics

Chicago, IL

JOB DETAILS
SKILLS
Best Practices, Capacity Management, Cloud Computing, DNS (Domain Name System), Identify Issues, Incident Management, Incident Response, Network Administration/Management, On Call, Operational Support, Operations Processes, Problem Solving Skills, Root Cause Analysis, Software Configuration Management, Software Engineering, Software Patches
LOCATION
Chicago, IL
POSTED
1 day ago
Title: Full Stack Cloud Engineer

Location: Chicago, IL

Job Description:

Day-to-day Responsibilities:
  • Support day-to-day operations of an enterprise Kubernetes platform (100+ clusters, ~50% production)
  • Perform routine operational tasks including cluster maintenance, upgrades, patching, health checks, and capacity management
  • Troubleshoot and resolve Kubernetes platform issues impacting cluster or application availability
  • Participate in incident response, root-cause analysis, and post-incident reviews
  • Act as a backup platform engineer to enable on-call rotation and reduce key-person dependency
  • Provide after-hours support as part of a shared on-call rotation
  • Serve as a secondary escalation point for critical production issues
  • Assist internal application teams with Kubernetes-related questions and issues
  • Support common Kubernetes constructs such as Pods, Deployments, Services, Ingress, ConfigMaps, and Secrets
  • Help teams troubleshoot networking, DNS, ingress, certificate, and resource-related issues
  • Review application configurations for Kubernetes best practices and platform alignment
  • Work with integrated enterprise tools such as:
    o Ingress controllers (e.g., Contour / Envoy)
    o Logging platforms (e.g., Fluent Bit, centralized log aggregation)
    o Monitoring/observability tools (e.g., Dynatrace or similar)
    o Container registries (e.g., Harbor, JFrog)
  • Help document operational procedures, runbooks, and troubleshooting guides
  • Share Kubernetes knowledge and best practices with internal teams
  • Assist in improving platform resiliency, operational maturity, and supportability

About the Company

Apex Informatics