Lead Data Architect

Karsun Solutions, LLC

Herndon, Virginia

Overview:

Summary

The Lead Data Architect will design, build, and operate enterprise data platforms that power GenAI and AI/ML use cases. This is a highly technical, hands-on role responsible for data platform architecture, end-to-end data engineering, ML/LLM pipeline design, production model onboarding, and delivery of scalable Databricks-centric solutions across cloud environments.

Responsibilities:

What You'll Be Doing:

  • Architect and implement enterprise data platforms (batch + streaming) optimized for ML, LLMs, and GenAI workloads.
  • Lead design and hands-on implementation of Databricks workspaces, Unity Catalog, Delta Lake design patterns, cluster policies, and performance tuning.
  • Build and own end-to-end data pipelines (ingest, transform, feature engineering, serving) using PySpark, Databricks Jobs, Spark SQL, Delta Lake, and orchestration tools.
  • Design and operationalize model training, LLM fine-tuning, evaluation, deployment, and monitoring pipelines (MLOps/RAG/CAG), integrating Databricks MLflow, CI/CD, and infra-as-code.
  • Implement vectorless and vectorization/embedding pipelines, vector store integrations, and retrieval layers for RAG (FAISS, Pinecone, Weaviate, Milvus).
  • Define data schemas, governance, lineage, access controls, and data product APIs; implement Unity Catalog or equivalent for centralized governance.
  • Drive cost/performance optimization for storage, compute (spot/preemptible), and query patterns.
  • Collaborate with engineers, data scientists, product owners, and security to translate business needs into production GenAI solutions. 
  • Mentor and lead engineering teams; conduct architecture and code reviews; run technical deep dives.
  • Implement observability for data and ML pipelines (metrics, logging, data quality tests, alerting). 
  • Create reproducible experiment tracking, model registry, and rollout strategies (canary, shadow testing, rollback). 
  • Stay current on GenAI/LLM architectures and evaluate/introduce new tooling and frameworks.
Qualifications and Education:

Required Qualifications:

  • BA or BS degree in Computer Science, Computer Engineering, Information Technology, or a related field.
  • 8+ years of hands-on experience in data engineering/platform architecture; 3+ years in an architect or lead role.
  • Candidate must hold an active AWS Certified Machine Learning – Specialty certification.
  • Proven, hands-on Databricks experience (designing workspaces, Delta Lake, performance tuning, productionizing Spark jobs).
  • Deep Spark + PySpark expertise and experience with Databricks Runtime.
  • Strong experience building ML/LLM pipelines and operationalizing models (training, fine-tuning, serving).
  • Practical experience with vector embeddings, semantic search, and RAG architectures.
  • Solid expertise in Python, common ML libraries (PyTorch, TensorFlow, Hugging Face Transformers), and MLflow.
  • Cloud platform experience (AWS strongly preferred).
  • Experience with containerization and orchestration, and with open-source libraries for structured and unstructured data processing and for serving/inference.
  • Strong SQL skills; experience with distributed query/warehouse systems and Parquet/Avro/Delta formats.
  • CI/CD and infra-as-code experience (Terraform, GitOps, Jenkins/GitHub Actions/GitLab CI).
  • Data governance, security, and IAM experience; experience implementing row/column-level access controls and data lineage.
  • Demonstrated ability to design for scalability, reliability, and cost efficiency. 
Compensation:

The proposed salary range for this role is $****** to $******* USD. The salary range provided is a good-faith estimate representative of all experience levels. Karsun considers several factors when extending an offer, including, but not limited to, the role, function, and associated responsibilities, as well as a candidate’s work experience, location, education/training, and key skills.

About the Company

Karsun Solutions, LLC