Role: Senior Data Engineer
Remote (US) - Anywhere in the US (EST and CST preferred)
Job Type: W2 Contract
Note: Visa-independent candidates are strongly preferred (third-party and C2C arrangements are not accepted)
Required Skills (Primary):
Hadoop, Apache Spark, Apache Airflow, CI/CD, Kafka
Short Overview of the Role:
Data Engineer with 6-10 years of experience in building and maintaining scalable, high-performance data pipelines and processing frameworks.
The ideal candidate will have strong hands-on expertise in orchestration using Apache Airflow and distributed data processing with Apache Spark (EMR).
This role requires a solid understanding of big data architecture, data engineering best practices, and a commitment to delivering efficient, reliable, and maintainable data solutions that align with business and technical requirements.
Qualifications Required:
Bachelor's degree in Computer Science, Information Systems, Engineering, or a related field.
6-10 years of experience in data engineering or related roles.
Strong experience with Apache Airflow for data orchestration and workflow management.
Proven expertise in building and tuning distributed data processing applications using Apache Spark (PySpark), for both Structured Streaming and batch workloads.
Experience with AWS cloud-based data ecosystems, particularly Athena and Redshift.
Proficient in SQL (Athena dialect) and experienced in working with large datasets from various sources (structured and unstructured).
Experience with data lakes, data warehouses, and batch/streaming data architectures.
Familiarity with CI/CD pipelines, version control, and DevOps practices in a data engineering context.
Preferred Qualifications:
Experience working in Agile/Scrum environments.
Knowledge of data quality frameworks and validation engines.
Experience with data catalog tools.