Senior Software Engineer, Cloud Infrastructure

Altruist Corp

Los Angeles, CA

JOB DETAILS
SKILLS
Adverse Events, Amazon CloudFront, Amazon Simple Storage Service (S3), Amazon Web Services (AWS), Apache Kafka, Architectural Services, Artificial Intelligence (AI), Automation, Autoscaling, Bash Scripting, Brokerage, Business Strategy, Capacity Management, Cloud Architecture, Cloud Computing, Code Reviews, Communication Skills, Computer Security, Content Delivery Network (CDN), Continuous Deployment/Delivery, Continuous Integration, Cost Control, Cross-Functional, Cryptography, DNS (Domain Name System), Data Management, Database Technology, Design Patterns Programming Methodologies, DevOps, Disaster Recovery, Engineering, Failover, Financial Services, Financial Transactions, GPU (Graphics Processing Unit), GitHub, High Availability, Incident Response, Information/Data Security (InfoSec), Jenkins, Leadership, Linux Operating System, Load Balancing, Machine Learning, Machine Tool, Management Strategy, Mentoring, Multiplatform/Cross-Platform, Needs Assessment, Network Architecture/Engineering, On Call, Onboarding, Open Source, Operations Processes, PostgreSQL, Process Improvement, Production Support, Programming Tools, Python Programming/Scripting Language, Redis, Regulatory Compliance, Regulatory Requirements, Root Cause Analysis, Scripting (Scripting Languages), Securities and Exchange Commission (SEC), Security Architecture, Security Policy, Simulation, Software Engineering, Standards Development, Strategic Analysis, Systems Administration/Management, Systems Engineering, Systems Scalability, Team Lead/Manager, Technical Leadership, Technical Presentation, Technical Publications, Technical Strategy, Technical Writing, Topology, Usage Analysis, VPN (Virtual Private Network), Vendor/Supplier Evaluation, Vendor/Supplier Planning, Writing Skills
POSTED
30+ days ago

This role follows a hybrid schedule, with three days per week onsite in our Culver City office.

The opportunity

We're hiring a Senior Cloud Infrastructure Engineer to join the Cloud Infrastructure & Platform (CIN) team at Altruist. This is a high-impact, senior individual contributor role responsible for architecting, building, and operating the AWS-based infrastructure that powers our broker-dealer and clearing platform. You will own critical infrastructure domains end-to-end and drive technical decisions that affect the reliability, security, and scalability of systems handling real financial transactions.

As the industry evolves rapidly with generative AI and agentic workflows, we need engineers who combine deep AWS and Kubernetes expertise with the vision and initiative to define how AI/ML tools and patterns can transform infrastructure operations, developer productivity, and platform resilience. This role carries significant technical influence - you will shape infrastructure strategy, lead complex cross-functional initiatives, and raise the bar for engineering practices across the organization.

What sets this role apart

This is not a ticket-driven infrastructure role. At the Senior-to-Staff level, we expect you to:

• Own domains, not just tasks. You will be the technical authority for critical infrastructure areas (e.g., EKS platform, observability strategy, DR architecture, or AI infrastructure) and drive their roadmap.
• Define technical direction. Author architectural decision records (ADRs), propose infrastructure standards, and influence engineering-wide technology choices through design reviews and RFCs.
• Lead without a title. Drive cross-team initiatives, align stakeholders, unblock other engineers, and represent infrastructure perspectives in leadership discussions and planning.
• Multiply the team. Elevate the entire CIN team through mentorship, code reviews, knowledge sharing, and building reusable frameworks and golden paths that scale beyond your individual output.
• Bridge infrastructure and business. Translate complex technical trade-offs into clear recommendations for engineering leadership, product teams, and compliance stakeholders.

Your impact

Cloud Infrastructure Architecture & Platform Engineering

• Architect, deploy, and operate production AWS infrastructure supporting high-availability financial services workloads (EKS, MSK, RDS/Aurora PostgreSQL, OpenSearch, ElastiCache, S3, CloudFront, and more).
• Own and evolve the Infrastructure as Code (IaC) strategy using Terraform - define module standards, enforce code review practices, and drive adoption of reusable patterns across teams.
• Lead Kubernetes (EKS) platform strategy, including cluster upgrades, node group architecture, Helm chart governance, service mesh evolution, and workload autoscaling policies.
• Design and drive CI/CD platform improvements (GitHub Actions, ArgoCD, or similar) to enable safe, fast, and self-service deployments for application engineering teams.
• Architect and validate disaster recovery (DR) strategies, including cross-region failover designs, backup automation, and leading DR simulation exercises.
• Lead infrastructure design reviews and architectural discussions; ensure solutions meet scalability, security, and compliance requirements before implementation.

Reliability, Observability & Operational Excellence

• Define and drive the observability strategy across the platform (Datadog, Prometheus, Grafana, CloudWatch, OpenSearch) - including SLO/SLI frameworks, alerting standards, and distributed tracing.
• Serve as a senior on-call escalation point; lead root-cause analysis on critical production incidents and drive systemic improvements through blameless post-mortems.
• Own monthly resource saturation reviews and capacity planning processes; proactively identify scaling needs and present findings to engineering leadership.
• Drive cloud cost optimization strategy: FinOps practices, Reserved Instances/Savings Plans analysis, vendor spend governance, and accountability frameworks across teams.

Security, Compliance & Networking

• Define and enforce security architecture standards across AWS environments: IAM policy governance, VPC design patterns, encryption strategies, secrets management (Vault, AWS Secrets Manager), and vulnerability remediation workflows.
• Partner with Security, Compliance, and Audit teams to ensure infrastructure meets FINRA, SEC, SOC 2, and other regulatory requirements - and proactively identify gaps before they become findings.
• Own networking architecture decisions including VPC topology, Transit Gateway strategy, VPN configurations, load balancer patterns (ALB/NLB), CDN optimization (Fastly/CloudFront), and DNS management.

AI/ML Infrastructure & Developer Productivity

• Define the strategy for evaluating, integrating, and governing AI-powered developer tools (e.g., Cursor AI, GitHub Copilot, CodeRabbit) across the engineering organization - including usage analytics, cost optimization, security review, and policy frameworks.
• Architect infrastructure for AI/ML workloads: GPU-enabled compute, SageMaker endpoints, Bedrock integration, vector databases, and data pipeline orchestration.
• Lead the adoption of AI-driven automation for infrastructure operations - intelligent alerting, anomaly detection, auto-remediation, and AIOps patterns - moving from exploration to production integration.
• Build and champion internal platforms and golden paths that leverage generative AI to improve developer experience, reduce operational toil, and accelerate delivery velocity.

Technical Leadership & Collaboration

• Act as a trusted technical advisor to the Director of Engineering and engineering leadership on infrastructure strategy, trade-offs, and investment priorities.
• Lead cross-functional initiatives spanning application engineering, data, security, and DevSecOps teams - driving alignment on complex multi-team infrastructure projects.
• Represent the CIN team in architecture review boards, incident response leadership, and engineering-wide planning sessions.
• Author and maintain comprehensive technical documentation, runbooks, ADRs, and operational procedures that serve as organizational knowledge assets.
• Mentor senior and mid-level engineers; conduct thorough design and code reviews; actively contribute to hiring and onboarding processes for the CIN team.

What you bring

• 5+ years of hands-on experience in cloud infrastructure engineering, with deep, production-proven expertise in AWS.
• Track record of owning and driving infrastructure initiatives end-to-end - from design and architecture through implementation, rollout, and operational excellence.
• Expert-level proficiency with Terraform, including module design, state management strategies, and establishing IaC standards for engineering teams.
• Extensive production experience operating Kubernetes (EKS strongly preferred) at scale - cluster lifecycle management, multi-tenancy patterns, Helm governance, and GitOps workflows.
• Strong Linux systems engineering skills and advanced scripting proficiency (Python, Bash, or Go).
• Deep expertise in CI/CD platforms (GitHub Actions, ArgoCD, Jenkins) with experience designing deployment strategies for multi-service architectures.
• Expert-level understanding of cloud networking (VPC architecture, Transit Gateway, peering, DNS, load balancing) and security (IAM, KMS, WAF, GuardDuty, Secrets Manager).
• Proven experience designing and operating observability platforms (Datadog, Prometheus/Grafana, CloudWatch, OpenSearch/ELK) at organizational scale.
• Strong experience with database infrastructure and data layer architecture (Aurora PostgreSQL, RDS, ElastiCache/Redis, OpenSearch, DynamoDB).
• Demonstrated ability to lead disaster recovery planning, execute DR simulations, and design HA architecture patterns for mission-critical systems.
• Excellent technical communication skills - ability to write clear ADRs, present to leadership, and translate infrastructure complexity for non-technical stakeholders.
• Proven track record of mentoring engineers and elevating team capabilities through knowledge sharing, design reviews, and tooling improvements.

Bonus points

• 7+ years of infrastructure or platform engineering experience, including 3+ years operating at a senior or staff level.
• Experience in financial services, fintech, broker-dealer, or other heavily regulated industries with FINRA/SEC compliance requirements.
• AWS certifications at the Professional or Specialty level: Solutions Architect Professional, DevOps Engineer Professional, Security Specialty, or Machine Learning Specialty.
• Experience with event streaming platforms at scale (Amazon MSK / Apache Kafka) including cluster operations, partition strategy, and consumer group management.
• Hands-on experience with API gateway management and platform design (Kong, AWS API Gateway).
• Demonstrated experience with AI/ML infrastructure: provisioning GPU compute, SageMaker/Bedrock integration, vector databases, MLOps pipelines, or AIOps automation in production.
• Experience defining and executing organizational rollout strategies for AI developer tools, including governance frameworks, usage analytics, and cost management.
• Proficiency with policy-as-code frameworks (OPA/Rego, Sentinel, Kyverno) for infrastructure governance and compliance automation.
• FinOps certification or demonstrated experience leading cloud cost optimization programs at scale.
• Experience authoring RFCs, ADRs, or technical strategy documents that influenced engineering-wide decisions.
• Contributions to open-source projects, conference talks, or published technical writing.

About the Company

Altruist Corp