Senior Java / Platform Reliability Engineer

Interon IT Solutions

Malvern, PA

JOB DETAILS
SALARY
$55–$60
SKILLS
Amazon Web Services (AWS), Application Programming Interface (API), Automation, Best Practices, Cloud Applications, Cloud Architecture, Cloud Computing, Cross-Functional, Docker, Engineering, Identify Issues, Java, Machine Tool, Messaging Technology, Metrics, Microservices, Problem Solving Skills, Production Support, Production Systems, Python Programming/Scripting Language, Reliability Engineering, Reporting Dashboards, Root Cause Analysis, Scalable System Development, Scripting (Scripting Languages), Software Administration, Software Engineering, Splunk, System Architecture, Systems Reliability, Systems Scalability, Telemetry
LOCATION
Malvern, PA
POSTED
1 day ago
#W2 Role
Job Title: Senior Java / Platform Reliability Engineer
Location: Malvern, PA
Duration: Long-term Contract
Job Description
We are seeking a strong Senior Java / Platform Reliability Engineer with 10+ years of solid software engineering experience and a strong understanding of production reliability. This role is not a traditional operations-focused SRE position. The ideal candidate is a hands-on backend engineer who can design, build, and support resilient, scalable, and fault-tolerant applications in a cloud environment.
The role will focus on backend platform engineering, reliability improvements, cloud integration, observability, automation, and supporting production systems. The candidate should have strong experience in Java development, along with working knowledge of Python, AWS, APIs, microservices, and cloud-native application support.
Key Responsibilities
Design, develop, and enhance backend services using Java and related backend technologies.
Build reliable, scalable, and fault-tolerant applications that can operate in production at scale.
Work closely with platform, application, and infrastructure teams to improve system reliability and performance.
Support production systems by identifying reliability gaps, performance issues, and areas for automation.
Develop and maintain microservices, APIs, and backend integrations.
Work with AWS cloud services to support application deployment, monitoring, and platform improvements.
Use Python or other scripting languages where needed for automation, tooling, and reliability engineering tasks.
Contribute to incident analysis, root cause reviews, and long-term preventive solutions.
Improve observability through logging, metrics, tracing, dashboards, and alerting.
Collaborate with engineering teams to implement best practices around resiliency, scalability, and availability.
Participate in design discussions for backend systems, cloud architecture, and platform reliability.
Help modernize and improve existing applications with better monitoring, automation, and fault tolerance.
Required Skills
10+ years of overall software engineering experience.
Strong hands-on experience with Java development.
Experience building and supporting backend services, APIs, and microservices.
Solid experience with AWS and cloud technologies.
Working knowledge of Python for scripting, automation, or backend support.
Strong understanding of production reliability, scalability, and system performance.
Experience operating applications in production environments.
Knowledge of resilient application design, fault tolerance, retries, timeouts, failover, and recovery patterns.
Experience with CI/CD pipelines and modern software delivery practices.
Strong troubleshooting and problem-solving skills.
Ability to work with cross-functional teams including development, platform, cloud, and infrastructure teams.
Preferred Skills
Experience with telemetry and observability tools such as OpenTelemetry, Splunk, Datadog, CloudWatch, Grafana, Prometheus, or similar tools.
Experience with distributed tracing, logging, metrics, and alerting.
Knowledge of container platforms such as Docker and Kubernetes.
Experience with infrastructure automation or platform engineering tools.
Exposure to event-driven architecture or messaging systems.
Prior experience in a Site Reliability Engineering, Platform Engineering, or Production Engineering role.

About the Company

Interon IT Solutions