Senior RabbitMQ Engineer / SME

PamTen Inc

(remote)

JOB DETAILS
POSTED
30+ days ago
Role Summary
We need a Senior RabbitMQ Engineer (SME), located in the USA (eastern side preferred), to support one of our customers. This is a hands-on, staff-augmentation role that combines architecture, technical leadership, and execution. The resource will act as the single-threaded RabbitMQ authority: assessing the current platform, stabilizing it quickly, and guiding the customer toward a supported, resilient RabbitMQ posture in Azure (VMs and/or AKS). The engagement would be for 6 months.

Primary outcome: Stabilize RabbitMQ by April / early May and deliver a clear modernization recommendation (AKS vs VM-based vs SaaS) with a practical execution path.

Engagement Context
  • Customer is targeting Azure (directionally aligned, not fully confirmed)
  • RabbitMQ supports multiple product teams; each team owns its vhosts/tenants
  • Platform has tech debt (notably DR process not functioning as expected)
  • Current deployment includes 3-node clusters across Dev / Non-Prod / Prod
  • Configuration is Ansible playbook-driven, but is bespoke per product team
  • Peak business load window: November

Responsibilities
1) Assessment & Stabilization (Immediate)
  • Perform current-state review: topology, broker configuration, policies, queue types, client connection patterns, resource thresholds.
  • Identify reliability/performance risks and execute prioritized remediation.
  • Establish "good" operational standards: monitoring, alerting, runbooks, on-call readiness.
2) Architecture & Technical Direction
  • Define target-state options and tradeoffs: Azure VMs vs AKS vs SaaS.
  • Provide an upgrade strategy to a supported RabbitMQ version (sequencing, rollout, rollback).
  • Recommend best practices for multi-tenant RabbitMQ (vhosts, permissions, policy boundaries).
3) DR / Resiliency Improvements
  • Diagnose why DR isn't working; propose and implement a pragmatic recovery posture aligned to business requirements.
  • Validate failover/recovery procedures through testing and documentation.
4) Platform Enablement & Standardization
  • Improve maintainability of Ansible-based configuration and reduce bespoke patterns.
  • Create/tune reusable "gold standard" patterns for vhost provisioning, policies, and operational controls.
  • Coach customer engineers; transfer knowledge and operational ownership.

Required Skills & Experience (Must-Have)
  • 7–10+ years in distributed systems / messaging platforms; expert-level RabbitMQ in production.
  • Strong experience with:
    • clustering and HA patterns (quorum queues / mirrored strategies where applicable)
    • performance tuning (memory watermarks, disk alarms, flow control, channel/connection behaviors)
    • upgrades and lifecycle management (zero/minimal downtime approaches, rollback planning)
    • incident triage and root cause analysis in high-throughput environments
  • Azure operational experience (networking, VM patterns; AKS familiarity strongly preferred)
  • Hands-on automation experience (Ansible or similar IaC/config management)
  • Ability to operate as a technical lead: clear decision-making, documentation, stakeholder comms.

Preferred / Nice-to-Have
  • Experience designing DR for messaging in the cloud (active/passive and/or multi-region approaches)
  • Experience integrating messaging with enterprise integration stacks (e.g., BizTalk patterns)

Deliverables
  • Current-state assessment + prioritized stabilization plan
  • Implemented stability improvements (config/tuning/operational guardrails)
  • Supported version upgrade plan (and execution, if in-scope)
  • DR gap analysis + implemented/tested recovery procedures
  • AKS vs VM vs SaaS recommendation with risk/effort tradeoffs
  • Standardized configuration approach for vhosts/policies + documentation/runbooks

Candidate Profile
  • "Player/Coach": can architect and still get hands dirty fast.
  • Strong executive communication: can explain tradeoffs and risk in plain English.
  • Bias for practical outcomes: stabilize first, modernize second, document always.
