Design, develop, and maintain ETL/ELT pipelines to collect and transform large datasets from various sources.
Build and optimize data architectures and data models to support analytics and reporting needs.
Work closely with data scientists, analysts, and business teams to ensure reliable data availability.
Develop and maintain data lake and data warehouse solutions.
Implement data quality, governance, and security best practices.
Monitor and troubleshoot data pipeline performance and failures.
Integrate APIs and third-party data sources as needed.
Strong programming skills in Python, Scala, or Java.
Hands-on experience with SQL and NoSQL databases (e.g., PostgreSQL, MongoDB).
Experience with big data and streaming frameworks such as Apache Spark, Hadoop, or Kafka.
Proficiency with ETL and workflow orchestration tools (e.g., Apache Airflow, Talend, Informatica, AWS Glue).
Strong experience with at least one major cloud platform: AWS (S3, Redshift, Glue), Azure (Data Factory, Synapse), or GCP (BigQuery, Dataflow).
Familiarity with containerization and orchestration (Docker, Kubernetes) and CI/CD pipelines.
Strong problem-solving and analytical skills.