Agile Programming Methodologies, Apache Spark, Change Data Capture (CDC), Communication Skills, Continuous Deployment/Delivery, Continuous Improvement, Continuous Integration, Data Collection, Data Management, Data Modeling, Data Processing, Data Quality, Data Sets, Database Extract Transform and Load (ETL), Ecosystems, GitHub, Microsoft Azure, Performance Tuning/Optimization, Problem Solving Skills, Python Programming/Scripting Language, Retail, SQL (Structured Query Language), Scalable System Development, Scrum Project Management and Software Development, Source Code/Configuration Management (SCM), Streaming Technology, Structured Data, Technical Writing, Unstructured Data
Overview
We are seeking a highly skilled Databricks Data Engineer to support the Retail Spine initiative, working with a wide range of retail datasets to design, build, and optimize scalable data solutions. This role requires deep hands-on expertise in Databricks and the ability to contribute immediately by developing new pipelines while enhancing existing data workflows.
The ideal candidate brings extensive experience building end-to-end data engineering pipelines with CDC (Change Data Capture) capabilities, and thrives in a fast-paced Azure-based environment leveraging Databricks, PySpark, and Kafka streaming.
Key Responsibilities
Design, build, and optimize end-to-end ETL/ELT pipelines using Databricks and Apache Spark (PySpark).
Develop and maintain scalable data solutions on Microsoft Azure, leveraging Databricks, Delta Lake, and Kafka streaming.
Build robust pipelines with Change Data Capture (CDC) to support incremental data processing and real-time or near-real-time data needs.
Work across both new pipeline development and enhancement of existing pipelines, quickly understanding current codebases to deliver improvements.
Integrate and process data from diverse sources, including flat files, XML, and JSON; SQL Server and JDBC systems; Kafka topics, APIs, and MQ; and Unity Catalog volumes and other enterprise platforms.
Implement data quality frameworks, including writing technical data quality rules and quarantining error records.
Apply canonical data modeling techniques, including SCD Type 1 and Type 2 implementations.
Ensure high levels of data reliability, performance, scalability, and monitoring across all workflows.
Automate workflows using Databricks Workflows and CI/CD pipelines (GitHub), focusing on reuse, efficiency, and standardization.
Adhere to ADUSA architecture, governance, and security standards.
Collaborate with developers, architects, and product owners, and engage with stakeholders, source teams, and downstream consumers as needed.
Participate actively in Agile/Scrum ceremonies and deliver high-quality solutions within defined timelines.
Required Qualifications
Extensive hands-on experience with Databricks, with the ability to contribute from day one.
Proven experience building end-to-end data engineering pipelines with CDC capabilities.
Strong proficiency in PySpark and SQL (required); broader Python experience beyond PySpark is a plus.
Experience working in Microsoft Azure cloud environments.
Knowledge of Delta Lake, distributed data processing frameworks, streaming technologies (Kafka), and data modeling (SCD Type 1 and Type 2).
Experience integrating data from multiple structured and unstructured data sources.
Familiarity with GitHub and CI/CD pipelines for version control and deployment.
Strong understanding of data architecture, performance optimization, and pipeline scalability.
Success Criteria (First 90 Days)
Demonstrate strong ownership, accountability, and technical expertise.
Quickly ramp up on the Retail Spine ecosystem and contribute to both new and ongoing initiatives.
Deliver high-quality, scalable, and efficient data pipelines within agreed timelines.
Actively participate in Agile ceremonies and team collaboration.
Identify opportunities to enhance data quality, standardization, and pipeline efficiency.
Key Competencies
Proactive and self-driven with strong problem-solving skills
Ability to work across complex, evolving data environments
Strong collaboration and communication with technical and business teams
Focus on quality, scalability, and continuous improvement