Amazon Web Services (AWS), Analysis Skills, Apache Hive, Big Data, Cloud Computing, Computer Science, Cross-Functional, Data Management, Data Processing, Data Sets, DevOps, GCP (Google Cloud Platform), Jenkins, Machine Learning, Mathematics, Microsoft Azure, Oracle SQLPlus, Performance Analysis, Problem Solving Skills, Python Programming/Scripting Language, Requirements Management, SQL (Structured Query Language), Scala Programming Language
Job Title
Mandatory Skills: Machine learning, Spark/Hive/SQL, Python, Scala, PySpark, Kafka, scheduling tools, and DevOps using Jenkins
Key Responsibilities
- Develop and implement data pipelines and Client Pipelines to facilitate model inference (both real-time and batch)
- Analyze large, complex data sets to identify the most performant way to process high-volume data using Spark, Hive, and SQL
- Collaborate with cross-functional teams to gather requirements and design scalable solutions
- Work on deployment of machine learning models
- Monitor the performance of data pipelines and make improvements as necessary
- Stay up to date with the latest advances in big data processing
- Productionize real-time time-series and regression models
Qualifications
Bachelor's degree in Computer Science, Mathematics, or a related field
Strong experience in Spark/Hive/SQL, including hands-on experience building and deploying high-volume data pipelines
Proficiency in Python, Scala, SQL, PySpark, and Kafka, plus experience with scheduling tools and DevOps using Jenkins
Excellent problem-solving and critical thinking skills
Experience with cloud platforms (AWS, GCP, or Azure) is a plus