Software Engineer

The Judge Group

Concord, CA

JOB DETAILS
SKILLS
Amazon Simple Storage Service (S3), Analysis Skills, Artificial Intelligence (AI), Best Practices, Communication Skills, Consulting, Distributed Computing, Maintain Compliance, Military, MySQL, Network Attached Storage (NAS), Network Performance/Analysis, PostgreSQL, Regulatory Requirements, Relational Databases (RDBMS), Reliability Engineering, Reporting Dashboards, Requirements Management, Software Development, Software Engineering, Splunk, System Architecture, System Operations, Systems Analysis, Systems Reliability, Team Player
LOCATION
Concord, CA
POSTED
1 day ago
Software Engineer / Site Reliability Engineer (SRE)

Location: Concord, CA (1755 Grant St)
Work Model: Hybrid - 3 days onsite (Monday & Tuesday preferred)
Schedule: Start at 7:00 AM PT to coordinate with India-based teams
Employment Type: 12-month contract (with potential extension or conversion)
Line of Business: TCOO
Positions Available: 1

About the Role

In this contingent assignment, you will serve as a senior-level Software Engineer / Site Reliability Engineer responsible for designing and implementing end-to-end monitoring and observability for a high-visibility enterprise platform. This role focuses on strategy, systems reliability, and operational excellence rather than application development.

You will collaborate closely with engineering, networking, and infrastructure teams to identify system dependencies, define reliability thresholds, and build dashboards and alerts that provide insight into platform health, performance, and data flow. This role requires strong analytical skills, enterprise-scale experience, and the ability to work across teams to drive complex initiatives forward.

Responsibilities
  • Consult on complex, large-scale Software Engineering and SRE initiatives with broad organizational impact.
  • Design and implement end-to-end observability, monitoring, and alerting strategies across distributed systems.
  • Analyze and resolve multi-faceted system reliability and performance challenges, including unprecedented or ambiguous scenarios.
  • Build dashboards and alerts using enterprise monitoring tools to track system health, latency, and data flow.
  • Identify system choke points, latency thresholds, and failure scenarios across application, storage, and network layers.
  • Partner with internal networking, infrastructure, and operations teams to define monitoring requirements and ensure visibility into network traffic.
  • Support and monitor third-party platforms hosted within the enterprise environment, ensuring alignment with internal reliability standards.
  • Ensure compliance with internal policies, procedures, and regulatory requirements while meeting operational deliverables.
  • Strategically collaborate with client stakeholders and provide consultative guidance on system reliability and observability best practices.
Minimum Qualifications
  • 5+ years of experience in Software Engineering, Site Reliability Engineering, or a related technical field, or equivalent practical experience demonstrated through work, consulting, training, military service, or education.
  • Experience with observability and monitoring tools such as Grafana, Splunk, AppDynamics, or ThousandEyes.
  • Hands-on experience analyzing system and network performance in enterprise environments.
  • Experience working with relational databases such as PostgreSQL or MySQL.
  • Experience with storage solutions such as Amazon S3 object storage or network-attached storage (NAS).
  • Ability to collaborate across teams and clearly articulate technical requirements and solutions.
Preferred Qualifications
  • Experience with OpenShift (OCP) and Kubernetes containerized platforms.
  • Experience with enterprise-grade monitoring implementations.
  • Familiarity with Skan.AI or similar third-party enterprise platforms.
  • Understanding of distributed system architecture and data flow monitoring.
  • Experience defining observability strategies for newly deployed or rapidly evolving platforms.
Additional Information
  • This is a net-new role intended to add end-to-end SRE expertise to the team.
  • The platform supported includes a third-party system hosted internally, with storage managed via S3 and integrations with Splunk and Grafana.
  • The role will involve monitoring data movement between compute, storage, and network layers, including latency and throughput analysis.
  • Occasional early or overnight meetings may be required to align with global teams; start times will not be earlier than 6:00 AM PT.
Supplier Expectations
  • All resumes must be submitted through Beeline to be considered.
  • No direct solicitation of resumes or communication with the hiring manager while the role is active.

About the Company

The Judge Group

The Judge Group, Inc. is a leading professional services firm specializing in talent, technology, and learning solutions. We consult, staff, train, and solve. Through our work, we make people and organizations better. Our services are delivered through a network of more than 30 offices across the United States, Canada, and India.

The Judge Group is proud to partner with the best and brightest companies in business today, including over 60 of the Fortune 100. We serve organizations in financial services, healthcare, life sciences, insurance, government (including aerospace and defense), manufacturing, technology, and telecommunications. To learn more about The Judge Group, visit www.judge.com or call toll-free (800) 360-4474.

COMPANY SIZE
5,000 to 9,999 employees
INDUSTRY
Computer/IT Services
FOUNDED
1970
WEBSITE
https://www.judge.com