About PayCargo:
Millions of shipments of goods and materials move around the world daily by land, sea, or air. PayCargo is the world's leading online payment solution that is revolutionizing the shipping and cargo world. With a fast and efficient way to reduce costs associated with payment processing, we help improve the speed and profitability of our customers' businesses.
PayCargo's platform connects payers and vendors across the cargo and logistics ecosystem, supporting payments, remittance data, integrations, vendor release workflows, and customer-facing digital experiences.
About the Role:
The Senior Engineer, DevOps/Platform Reliability is responsible for building and operating the infrastructure, pipelines, and platform standards that keep PayCargo's global payments platform reliable, observable, and supportable. The role spans the full platform: EC2-based services, scheduled jobs, and file processing, alongside containerized (ECS/Fargate) and serverless (Lambda) workloads. The environment is multi-account AWS managed with Terraform, with a GitHub and ZenHub workflow that ships through GitHub Actions authenticated via GitHub OIDC. The focus is on modernizing how PayCargo builds, deploys, and runs software. As one example, PayCargo's SFTP runs on AWS Transfer Family with a Lambda identity provider.
This is a hands-on individual contributor role. The Senior Engineer, DevOps/Platform Reliability modernizes legacy scheduled jobs and file processes into containerized, observable services, codifies infrastructure as repeatable Terraform patterns, and creates standards that other developers can follow without depending on a single person for every implementation. The role requires strong judgment, strong follow-through, and a focus on reducing reactive fire drills and single points of failure.
Working within PayCargo's DevSecOps model, the Senior Engineer, DevOps/Platform Reliability partners closely with Security, Engineering, Architecture, Product, Support, and executive stakeholders to deliver scalable, secure, and repeatable platform execution.
This position has no direct reports. The role leads indirectly by defining infrastructure and deployment standards, guiding engineers toward repeatable patterns, and reducing single points of failure across the platform.
As the Senior Engineer, DevOps/Platform Reliability, you will:
Infrastructure & Platform Modernization
- Modernize legacy scheduled jobs, cron scripts, and file processes into containerized (ECS/Fargate), observable, supportable services
- Build and maintain infrastructure patterns in Terraform, with reusable modules, remote state, and plan/apply through CI
- Standardize environment configuration, secrets management (Secrets Manager and SSM Parameter Store), and repeatable deployment paths across environments and accounts
- Create platform standards that other developers can follow without depending on DevOps for every implementation
CI/CD & Release Engineering
- Build, maintain, and harden CI/CD pipelines integrated with GitHub and ZenHub, with deployments authenticated through GitHub OIDC to eliminate static cloud credentials
- Improve build, test, and deployment automation to make releases faster, safer, and more repeatable
- Establish rollback and environment-promotion practices that reduce release risk
- Embed security and quality gates into pipelines in partnership with Security and Engineering
Observability & Reliability
- Implement and maintain monitoring, logging, and alerting using CloudWatch, SNS, and Sentry, with log analytics through Athena and Glue
- Improve telemetry, dashboards, and on-call workflows (PagerDuty) so issues are detected and resolved quickly
- Support disaster recovery, backup, and failover patterns across regions and accounts
- Lead incident response and root cause analysis with clear, durable follow-up
Secure AI Platform Support
- Support the infrastructure for a contained AI platform, including whitelisted egress and approved deployment paths
- Help operationalize controls such as stateless model access and bounded environments in partnership with Security and Architecture
- Build deployment and monitoring patterns for AI-assisted applications so they are observable and supportable
Cross-Functional Partnership
- Partner with Security to embed controls into pipelines, environments, and infrastructure-as-code, including OIDC roles, least privilege, mTLS, and Tailscale-based access
- Work with Engineering and Architecture to translate designs into runnable, supportable infrastructure
- Advise Product and Support on operational realities, trade-offs, and delivery risk
- Implement and operate the infrastructure, pipelines, and environments according to the standards and architecture owned by the VP, Infrastructure & Security
- Provide clear status, escalate risks early, and document infrastructure, pipelines, and runbooks
Required Qualifications:
- 5+ years of hands-on DevOps, platform, or infrastructure engineering experience
- Strong experience with AWS (ECS/Fargate, Lambda, VPC, IAM), and working knowledge of Azure or Entra ID
- Hands-on experience with infrastructure-as-code using Terraform, including reusable modules, remote state, and plan/apply in CI
- Strong experience with Docker and container orchestration such as ECS/Fargate and ECR
- Experience building and maintaining CI/CD pipelines, preferably with GitHub Actions, including OIDC-based cloud authentication
- Experience with monitoring and observability tooling such as CloudWatch, SNS, Sentry, and Athena/Glue
- Strong understanding of secrets management (Secrets Manager, SSM Parameter Store), environment configuration, and secure deployment
- Strong troubleshooting, incident response, and root cause analysis skills
- Ability to create repeatable standards and documentation that reduce single points of failure
Experience and Education:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience
- 5+ years of hands-on DevOps, platform, or infrastructure engineering experience preferred
- Demonstrated experience operating production infrastructure and CI/CD in cloud environments
- Experience with containerization, infrastructure-as-code, and observability tooling
- Payments, fintech, SaaS, or logistics experience is a plus
Preferred Qualifications:
- Experience modernizing legacy scheduled jobs and file-processing workloads into containerized services
- Experience operating both EC2-based services and containerized or serverless workloads
- Experience with disaster recovery, multi-region (us-east-1 / ap-east-1) redundancy, and failover design
- Familiarity with secure AI/LLM platform patterns, whitelisted egress, and bounded environments
- Experience with on-call workflows and tooling such as PagerDuty
- Familiarity with zero-trust network access (Tailscale) and SSM Session Manager in place of bastion hosts
- Experience in payments, fintech, SaaS, or other high-volume transactional environments
- Familiarity with SOC and PCI control requirements as they relate to infrastructure
You Will Likely Succeed If You:
- Have a winning attitude
- Are naturally curious with an always-learning mentality
- Love to automate, standardize, and remove toil
- Love to solve difficult problems
- Are assertive, confident, but also humble
- Speak with clarity and listen with intention
- Are disciplined with your processes, documentation, and follow-up
- Can own a problem end to end without constant direction
- Take ownership of both the technical outcome and the business result
What Success Looks Like:
- Legacy jobs and file processes are modernized into containerized, observable, supportable services
- Infrastructure is codified, repeatable, and consistent across environments
- CI/CD pipelines make releases faster, safer, and lower-risk
- Monitoring, alerting, and disaster recovery reduce reactive fire drills
- Other developers can follow platform standards without depending on a single person
- The Senior Engineer, DevOps/Platform Reliability becomes a trusted owner of one or more critical platform domains within 90 to 180 days
What We Offer:
Our compensation package includes a competitive salary and bonus plan.
We care about your wellbeing and personal life. We offer vacation, sick, and personal time off policies, a generous 401(k) match, and strong healthcare benefits.
Your success at PayCargo is determined by the impact you make and how well you collaborate with the teams you work with. Everyone at PayCargo is empowered to take ownership, learn, improve, and master their skills in an environment focused on efficiency, collaboration, and purpose.
We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, marital status, disability, gender, gender identity or expression, or veteran status. We are proud to be an equal opportunity employer.