Research Engineer (Machine Translation)

Sanas

Palo Alto, California

About the Role

Language translation is one of Sanas's most exciting and fastest-growing product lines. We're looking for a Research Engineer who can both set technical direction and go deep on the modeling work: someone who owns translation quality end-to-end across language pairs and drives research on the fundamental challenges unique to real-time simultaneous interpretation.

Job Description

Translation quality & modeling

  • Own and drive improvements to translation accuracy across Sanas's supported language pairs, with a focus on conversational, spoken-language domains.
  • Design, train, and evaluate neural MT models — from fine-tuning large multilingual models to building targeted components for low-resource or high-priority language pairs.
  • Develop and maintain rigorous evaluation pipelines using both automated metrics (BLEU, COMET, chrF) and human evaluation frameworks calibrated to real-world enterprise use cases.
  • Identify the highest-leverage research bets (data augmentation, domain adaptation, quality estimation, terminology consistency) and execute on them to deliver measurable quality gains.

Simultaneous interpretation & delimiter modeling

  • Lead research and development of Sanas's delimiter model — the component that determines optimal segmentation points in streaming speech for real-time translation output.
  • Develop methods to handle speech disfluencies, sentence fragments, and incomplete utterances gracefully in a streaming translation pipeline.
  • Collaborate closely with the speech and inference engineering teams to ensure translation components meet strict real-time latency budgets in production.

Research direction & technical leadership

  • Define and maintain a research roadmap for MT and simultaneous interpretation, prioritizing work that moves production quality metrics.
  • Stay at the frontier of MT research by tracking and evaluating new work, and translate (pun intended) promising advances into practical improvements at Sanas.
  • Mentor and technically guide other engineers working on translation-adjacent problems across the ML org.

Data & infrastructure

  • Identify, source, and curate training data for MT and delimiter modeling — including parallel corpora, synthetic data generation, and speech-aware augmentation strategies.
  • Instrument model quality monitoring in production to detect degradation across language pairs and trigger targeted retraining cycles.

Qualifications

  • 3+ years of experience in machine translation, NLP, or multilingual modeling research — with a track record of measurable quality improvements in production systems.
  • Deep familiarity with neural MT architectures: sequence-to-sequence models, Transformer variants, and large multilingual models.
  • Hands-on experience with simultaneous or streaming translation, including segmentation and low-latency decoding strategies.
  • Strong command of MT evaluation methodology — automated metrics, human evaluation design, and error analysis.
  • Proficiency in Python and deep learning frameworks (PyTorch preferred).
  • Demonstrated ability to set a research agenda, execute independently, and communicate findings clearly to technical and non-technical stakeholders.
  • Fluency in English, together with working proficiency in at least one other language, is a strong plus.

Bonus

  • Experience with speech translation (end-to-end or cascaded) and speech-aware MT pipelines.
  • Familiarity with on-device or edge-optimized model deployment for low-latency inference.
  • Prior work on low-resource language pairs, domain adaptation, or terminology-constrained translation.
  • Published research at ACL, EMNLP, NAACL, INTERSPEECH, or equivalent venues.

About the Company

Sanas