About the Role
Language Translation is one of Sanas's most exciting and fastest-growing product lines. We're looking for a Research Engineer who can both set technical direction and get deep in the modeling work — someone who owns translation quality end-to-end across language pairs and drives the fundamental research challenges unique to real-time simultaneous interpretation.
Job Description
Translation quality & modeling
- Own and drive improvements to translation accuracy across Sanas's supported language pairs, with a focus on conversational, spoken-language domains.
- Design, train, and evaluate neural MT models — from fine-tuning large multilingual models to building targeted components for low-resource or high-priority language pairs.
- Develop and maintain rigorous evaluation pipelines using both automated metrics (BLEU, COMET, chrF) and human evaluation frameworks calibrated to real-world enterprise use cases.
- Identify the highest-leverage research bets — data augmentation, domain adaptation, quality estimation, terminology consistency — and execute on them with measurable quality gains.
Simultaneous interpretation & delimiter modeling
- Lead research and development of Sanas's delimiter model — the component that determines optimal segmentation points in streaming speech for real-time translation output.
- Develop methods to handle speech disfluencies, sentence fragments, and incomplete utterances gracefully in a streaming translation pipeline.
- Collaborate closely with the speech and inference engineering teams to ensure translation components meet strict real-time latency budgets in production.
Research direction & technical leadership
- Define and maintain a research roadmap for MT and simultaneous interpretation, prioritizing work that moves production quality metrics.
- Stay at the frontier of MT research: track and evaluate new work, and translate (haha) the relevant advances into practical improvements at Sanas.
- Mentor and technically guide other engineers working on translation-adjacent problems across the ML org.
Data & infrastructure
- Identify, source, and curate training data for MT and delimiter modeling — including parallel corpora, synthetic data generation, and speech-aware augmentation strategies.
- Instrument model quality monitoring in production to detect degradation across language pairs and trigger targeted retraining cycles.
Qualifications
- 3+ years of experience in machine translation, NLP, or multilingual modeling research — with a track record of measurable quality improvements in production systems.
- Deep familiarity with neural MT architectures: sequence-to-sequence models, Transformer variants, and large multilingual models.
- Hands-on experience with simultaneous or streaming translation, including segmentation and low-latency decoding strategies.
- Strong command of MT evaluation methodology — automated metrics, human evaluation design, and error analysis.
- Proficiency in Python and deep learning frameworks (PyTorch preferred).
- Demonstrated ability to set a research agenda, execute independently, and communicate findings clearly to technical and non-technical stakeholders.
- Fluency in English; working proficiency in at least one non-English language is a strong plus.
Bonus
- Experience with speech translation (end-to-end or cascaded) and speech-aware MT pipelines.
- Familiarity with on-device or edge-optimized model deployment for low-latency inference.
- Prior work on low-resource language pairs, domain adaptation, or terminology-constrained translation.
- Published research at ACL, EMNLP, NAACL, INTERSPEECH, or equivalent venues.