Senior Data Engineer


Gaithersburg, MD


The ATCC Digital Biology Group is seeking a Senior Data Engineer to join its team of genomics laboratory scientists and bioinformaticians who are building the ATCC Genome Portal. The successful candidate is a subject matter expert who manages the Linux (CentOS) High Performance Computing (HPC) cluster and provides Information Technology support to the genomic data analytics, bioinformatics, and sequencing laboratories. They will be expected to maintain the HPC backend infrastructure, including data management, system administration, virtual environments, pipeline control, and job scheduling.

Responsibilities include working with both laboratory scientists and bioinformaticians to develop and deploy high-performance genomic analysis pipelines that are processor-, memory-, and storage-efficient. They will help with complex data analysis, solve problems with other team members, and integrate structured and unstructured textual data, which requires an understanding of database structure and schema. They will have a lead role in ATCC’s new laboratory information management system (LIMS), an integrated data lake and structured data warehouse.


  1. Administer a Linux (CentOS 7) HPC cluster to maintain performance, create usage and security policies, and lead data management activities.
  2. Support IT on both backend systems and front-end deployment of various scientific, data science, and analytics tools, including downloading, configuring, compiling, and deploying open-source tools for bioinformatics and other data science needs.
  3. Determine, design, evaluate, and test complex data infrastructures, and put into practice computational and workflow solutions for biotechnology research, cell line production support, and other data science needs.
  4. Implement algorithms, genomic pipelines, and natural language processing software, and manage data in cloud-based computing environments and high-performance clusters.
  5. Provide technical leadership in the deployment of newly developed data analytics and bioinformatics tools and pipelines. Anticipate technical risks, inform the team, and provide mitigation strategies.
  6. Collaborate with ATCC scientists to define and implement algorithms and pipelines executed with full scientific rigor to produce robust and reproducible analyses. Translate business requirements into technical specifications and coded data pipelines.
  7. Leverage existing data infrastructure to fulfill all data-related requests, perform necessary data housekeeping, data cleansing, normalization, hashing and implementation of required data model changes.
  8. Manage, plan, and budget computing resources and infrastructure to effectively meet program timelines, performance requirements, and priorities.
  9. Understand database structure and schema, with knowledge of relational databases and SQL or a similar query language.
  10. Assess needs, then build, evolve, and scale out infrastructure to ingest, process, and extract meaning from data. Compile data to present to Senior Management for decision making.

Education and Experience:

  • Bachelor's degree and 5 or more years of experience, or equivalent experience.

Other Duties:

  • Perform other duties as assigned.

Founded in 1925, ATCC is a non-profit organization with a mission to acquire, authenticate, preserve, develop, standardize, and distribute biological materials and information for the advancement and application of scientific knowledge.


ATCC is an Equal Opportunity Employer and does not discriminate against any employee or applicant for employment because of race, color, sex, age, national origin, religion, sexual orientation, gender identity, status as a veteran, disability, or any other federal, state, or local protected class.

About the Company