Required Skills & Experience:
3-5 years of Service Reliability/Operations experience managing large-scale, high-performance applications.
3-5 years of experience in writing automation scripts and building dashboards for application performance monitoring.
2-4 years of experience with programming languages such as Go, Python, Java, or Rust.
Strong experience with databases such as Oracle (including PL/SQL), SQL Server, Redis, ClickHouse, Postgres, MongoDB, or time-series databases.
At least 2 years of experience with cloud platform transitions and containerization (GCP, AWS, Azure, Rancher, OpenShift).
Practical experience maintaining containerized applications in GKE, RKE, or AKS environments.
Hands-on experience implementing OpenTelemetry (OTEL) for observability.
Working knowledge of GraphQL frameworks and modern API gateway solutions.
Experience troubleshooting networking protocols in high-pressure production environments.
Preferred Skills:
Proven experience managing application availability and building scalable solutions for high-availability platforms.
Experience with in-memory caching solutions (Redis preferred).
Strong knowledge of monitoring tools such as Splunk, AppDynamics, Grafana/Prometheus, and Dynatrace.
Hands-on experience with GCS, Cloud SQL, PL/SQL, and Spanner.
Familiarity with AI/ML technologies: Vertex AI, Gen AI, BigQuery.
Ability to monitor and troubleshoot HashiCorp Vault environments.
Exposure to Rally, Confluence, CI/CD extenders, and collaboration tools.
Technical Skills:
Digital: Kubernetes, Site Reliability Engineering (SRE), PL/SQL
Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.