AI Systems Engineer

1872 Consulting

Boston, MA

JOB DETAILS
SKILLS
Agile Programming Methodologies, Amazon Web Services (AWS), Application Programming Interface (API), Artificial Intelligence (AI), Artificial Intelligence (AI) Agents, Authentication, Buses, Circuit Breakers, Communication Skills, Concurrency, Construction, Data Science, Distributed Computing, Engineering, GraphQL, Java, Kotlin, Manufacturing Operations, NoSQL, Node.js, Performance Modeling, Product Engineering, Production Systems, Python Programming/Scripting Language, REST (Representational State Transfer), Service-Oriented Architecture (fka Distributed Object Architecture), Simple Queue Service (SQS), Software Engineering, Software as a Service (SaaS), System Architecture, Systems Administration/Management, Systems Engineering, Team Player, Technical Leadership
POSTED
30+ days ago
AI Systems Engineer
Boston, MA (onsite 4 days per week)

Role Summary
Join the AI Studio of an innovative construction industry client in Boston as an AI Systems Engineer, a hybrid role responsible for architecting and building both:
  1. The distributed systems backbone that powers enterprise-scale AI, and
  2. The agentic and LLM-driven capabilities transforming construction workflows.
This role sits at the intersection of platform engineering and applied AI. You will design scalable APIs, event-driven services, and reliable infrastructure while also implementing multi-model AI agents, retrieval pipelines, and AI orchestration frameworks that operate in real-world production environments.

You will help define how AI is built, deployed, observed, and scaled across the client's national operations.

Responsibilities

AI & Agentic Systems Product Engineering & Deployment
  • Design and implement production-grade RAG architectures
  • Build and deploy multi-model AI agents leveraging AWS Bedrock and LLM providers (Claude, GPT, Llama, Titan, etc.)
  • Implement dynamic model routing strategies based on task complexity, cost, and latency
  • Develop multi-agent orchestration frameworks enabling collaborative workflows (planner, retriever, executor, summarizer)
  • Design safe tool invocation patterns and guardrails for enterprise AI agents
  • Optimize inference pipelines for cost, performance, and reliability
  • Implement evaluation frameworks to measure model performance, hallucination rates, and response quality
  • Design fallback and degradation strategies for model outages or latency spikes
Distributed Systems & Platform Architecture
  • Architect and evolve service-oriented and event-driven systems supporting AI workloads
  • Design REST/GraphQL APIs with clear versioning, authentication, and backward compatibility strategies
  • Implement asynchronous processing pipelines using queues, event buses, and workflow orchestration
  • Ensure reliability through idempotent consumers, retry strategies, circuit breakers, and dead-letter queues
  • Make informed tradeoffs between relational, NoSQL, and vector storage systems
  • Build services that are observable, traceable, and production-ready
  • Define and document architectural standards for AI platform services
  • Implement LLMOps: cost monitoring, latency optimization, usage analytics, and model versioning
  • Enforce security, governance, and access standards in line with enterprise policies
Collaboration & Technical Leadership
  • Work closely with product managers, site AI engineers, and data scientists to iterate rapidly in Agile sprints
  • Communicate technical progress clearly to non-technical stakeholders
  • Contribute to internal AI playbooks and templates

Qualifications
  • 6+ years of professional software engineering experience (not including vibe coding)
  • Demonstrated experience designing distributed or service-oriented systems in production
  • Strong backend engineering skills in Python and at least one of Java, Node.js, Rust, or Kotlin
  • Experience building and deploying event-driven architectures (SNS/SQS, Kafka, EventBridge, etc.)
  • Experience integrating LLMs into production systems (Bedrock, OpenAI, Anthropic, etc.)
  • Hands-on experience with RAG pipelines and vector databases, and with building multi-agent AI systems
  • Deep understanding of:
    • Distributed system failure modes
    • API lifecycle management
    • Concurrency and consistency tradeoffs
    • LLM cost, latency, and reliability constraints
    • Tuning AI agents for accuracy and performance
Preferred
  • Experience building internal AI platforms or shared infrastructure
  • Exposure to large-scale SaaS or mission-critical systems
  • Experience designing multi-agent or orchestration frameworks
  • Experience with Databricks Lakehouse architecture
  • Prior experience in construction, manufacturing, or operational industries
