Databricks Architect
Must Have Technical/Functional Skills
Experience: 5+ years of hands-on data engineering experience, with at least 3 years focused on the Databricks/Spark
ecosystem.
Databricks Expertise: Deep, hands-on expertise with the Databricks Lakehouse Platform, including Delta Lake,
Structured Streaming, Delta Live Tables, and cluster configuration/optimization.
Programming Mastery: Expert-level proficiency in Python and PySpark. Advanced SQL skills are essential.
Data Warehousing Concepts: Strong understanding of data modeling principles, including dimensional modeling
(Kimball), data warehousing concepts, and ETL/ELT design patterns.
Cloud Proficiency: Proven experience working with a major cloud provider (Azure, AWS, or GCP), particularly with
cloud object storage (e.g., S3) and related services.
Software Engineering Mindset: Experience with software engineering best practices, including version control (Git),
code reviews, testing, and CI/CD.
Roles and Responsibilities
Data Pipeline Development: Design, code, and deploy robust and scalable batch and streaming data pipelines
using PySpark, Spark SQL, and Delta Live Tables to ingest data from sources such as Point-of-Sale (POS), e-commerce
platforms, loyalty systems, and marketing clouds.
Data Modeling and Transformation: Implement complex data transformations and business logic within the Medallion
architecture (Bronze, Silver, Gold layers). Build and optimize the final "Gold" customer-dimension tables that will
serve as the single source of truth.
Data Quality: Implement data quality frameworks and cleansing routines to ensure the accuracy and trustworthiness
of the Customer 360 data.
Performance Optimization: Proactively monitor, debug, and tune Databricks jobs and Spark clusters for performance
and cost-efficiency. Implement best practices for partitioning, caching, and data layout in Delta Lake.
Infrastructure as Code (IaC) & CI/CD: Work with DevOps teams to manage Databricks environments, clusters, and
job deployments using tools such as Terraform and Azure DevOps or GitHub Actions. Champion and implement CI/CD best
practices for data pipelines.
Data Governance and Security: Implement data governance features within Databricks Unity Catalog, including
data lineage tracking, access controls, and data masking to ensure compliance and security.
Collaboration: Partner closely with Functional Consultants, Data Scientists, and Analytics Engineers to understand
their data requirements and deliver well-structured, consumption-ready datasets.
Education
Bachelor's degree
Salary Range: $120,000 - $150,000 a year