USA_Developer

Varite, Inc

O Fallon, MO

JOB DETAILS

SALARY

$46.42–$47.79 Per Hour

SKILLS

Apache, Apache Kafka, Apache Spark, Best Practices, Big Data, CPU (Central Processing Unit), Capacity Management, Code Reviews, Coding Standards, Computer Science, Continuous Deployment/Delivery, Continuous Integration, Data Formats, Data Management, Data Processing, Data Quality, Database Extract Transform and Load (ETL), Distributed Computing, Distributed Objects, Git, Input/Output, Java, Linux Operating System, Machine Tool, Memory Hardware, Metadata, Operating Systems, Performance Tuning/Optimization, Production Support, Production Systems, Python Programming/Scripting Language, Reconciliation, Root Cause Analysis, SQL (Structured Query Language), Scala Programming Language, Service Level Agreement (SLA), Source Code/Configuration Management (SCM), Structured Data, Technical Writing, Test Automation, Test Data, Unix Shell Programming

LOCATION

O Fallon, MO

POSTED

28 days ago

Pay Rate Range: $46.42–$47.79/hr.
GBaMS ReqID: 10733699

Job Description:
Big Data Developer with Spark Scala
•Languages: Scala, Python (PySpark), SQL
•Big Data: Apache Spark (Core, SQL, Structured Streaming)
•Streaming: Kafka
•Ingestion / Orchestration: Apache NiFi
•Storage: Apache Ozone, Ceph, object storage concepts
•OS & Tooling: Linux, Git, CI/CD, monitoring and logging tools

We are looking for a highly skilled Senior Data Engineer with deep expertise in Apache Spark, Scala, and PySpark to build and operate large-scale batch and streaming data processing systems. The role places a strong emphasis on real-time streaming architectures built on Kafka and Spark Structured Streaming, alongside ingestion and orchestration with Apache NiFi and scalable storage on Apache Ozone and Ceph. This position is ideal for engineers who enjoy solving complex performance, scalability, latency, and reliability challenges in production data platforms.

Key Responsibilities
•Design, develop, and maintain large-scale Spark applications using Scala and PySpark
•Build and operate streaming-heavy data pipelines using Kafka and Spark Structured Streaming
•Implement stateful streaming patterns, including windowing, watermarking, late-data handling, and checkpointing
•Develop robust event replay and reprocessing workflows using Kafka offsets and partitions
•Build ingestion and routing flows using Apache NiFi, including Kafka-based ingestion patterns
•Implement end-to-end ETL/ELT pipelines with a strong emphasis on low latency, fault tolerance, and scalability
•Optimize Spark jobs through partitioning strategies, memory tuning, shuffle optimization, and efficient data formats
•Integrate Spark workloads with distributed object storage systems such as Apache Ozone and Ceph
•Ensure data quality, consistency, and auditability through validation, reconciliation, and metadata capture
•Collaborate with platform, infrastructure, and operations teams on production readiness and capacity planning
•Support production systems, including monitoring, incident analysis, and root cause resolution
•Contribute to reusable frameworks, coding standards, and engineering best practices
•Participate in architecture reviews, code reviews, and technical documentation

Required Qualifications
•Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
•Strong hands-on experience with Apache Spark in production environments
•Advanced proficiency in Scala and PySpark
•Solid understanding of distributed systems and data processing at scale
•Strong experience with Kafka-based streaming architectures
•Hands-on experience with Spark Structured Streaming
•Experience building batch and real-time pipelines
•Hands-on experience with Apache NiFi for data ingestion and flow management
•Strong SQL skills and experience working with structured and semi-structured data
•Experience working with object storage or distributed storage platforms
•Proficiency with Linux, shell scripting, and Git-based version control

Preferred Qualifications
•Experience with Apache Ozone and/or Ceph as storage backends for analytics workloads
•Experience implementing exactly-once and at-least-once streaming semantics
•Strong background in Spark performance tuning (CPU, memory, I/O, shuffle)
•Experience supporting mission-critical production systems with strict SLAs
•Familiarity with CI/CD pipelines and automated testing for data applications
•Experience designing observability for streaming systems (lag, throughput, backpressure)

•Orchestration: Apache Airflow, Apache NiFi
•Programming: Java (Core), Python (for Airflow), Unix Shell Scripting
•Big Data/Storage: Apache Spark

Essential Skills: Big Data Developer

Keyword: Big Data Developer

Skills: Digital: BigData and Hadoop Ecosystems; AWS DevOps and Automation

Experience Required: 6-8 years

Skills:

Category	Name	Required	Importance	Experience
SkillCategoryTest1_MN	Digital : BigData and Hadoop Ecosystems	Yes	1	>7 years

About the Company

Varite, Inc
