Hiring: W2 Candidates Only
Visa: Open to all visa types with valid work authorization in the USA
Key Responsibilities:
- Collaborate with the customer team to thoroughly understand the logic, structure, and parameters of the existing Java-based XGBoost models.
- Interpret the data transformation logic in the existing Java implementations and validate the corresponding feature pipelines.
- Execute the Python-converted models on historical datasets and validate output metrics against Java model benchmarks.
- Work closely with model validation teams to review performance and consistency, and to explain any metric deviations.
- Design and implement unit tests and validation scenarios to confirm that each migrated model is ready for sign-off.
- Ingest model input data from Parquet files using PySpark and pandas to accurately reproduce training and scoring workflows.
- Conduct Exploratory Data Analysis (EDA) and spot-check row-level predictions where necessary.
Essential Skills:
- 10+ years of hands-on experience with Python for machine learning, specifically with libraries such as XGBoost, scikit-learn, NumPy, and pandas.
- Proficiency in using PySpark for reading, transforming, and analyzing large datasets stored in Parquet files.
- Extensive experience validating or reverse-engineering ML models from complex business logic or legacy implementations.
- Exposure to Java-based ML libraries (such as DL4J), or a strong understanding of how internal model components map across programming languages.
Desirable Skills:
- Hands-on experience with Python meta-modeling frameworks or libraries.
- Prior experience in a financial or regulated environment.
- Keywords in your resume: Data Scientist, Machine Learning Engineer, XGBoost, Python, PySpark, Java, Model Migration, DL4J, Data Validation, scikit-learn, pandas, NumPy, Metric Parity, Parquet, EDA