San Francisco, CA | 30+ days ago
Proficient in PySpark-based distributed data processing and well-versed in Delta Lake, Auto Loader, Structured Streaming, and Delta Live Tables (DLT) for building reliable, high-throughput data pipelines. Experience using the Databricks SDK and REST APIs for workflow automation, job orchestration, and operational monitoring. Strong working knowledge of Spark internals and PySpark constructs, including DataFrame APIs, UDFs, window functions, complex joins, and performance profiling, along with best practices for optimization, partitioning, schema evolution, and ACID-compliant Delta Lake writes.