Indent: PSL216927_1-26-1
Role: Google Cloud Data Architect - IAM Data Modernization
Location: Dallas, TX / Charlotte, NC (Hybrid, 4 days in office)
Rate: $80/hr to $85/hr
Highly Preferred: OCP experience
Project/Program
Identity & Access Management (IAM) Data Modernization: migration of an on-premises SQL data warehouse to a target-state Data Lake on Google Cloud (GCP), enabling metrics and reporting, advanced analytics, and GenAI use cases (natural language querying, accelerated summarization, cross-domain trend analysis), leveraging PySpark-based processing, cloud-native DevOps CI/CD pipelines, and containerized deployments on OpenShift (OCP) to deliver scalable, secure, and high-performance data solutions.
About Program/Project
The IAM Data Modernization project involves migrating an on-premises SQL data warehouse to a target-state Data Lake in the GCP cloud environment. Key highlights include:
- Integration Scope: 30+ source system data ingestions and multiple downstream integrations
- Capabilities: Metrics, reporting, and Gen AI use cases with natural language querying, advanced pattern/trend analysis, faster summarizations, and cross-domain metric monitoring
- Benefits:
  - Scalability and access to advanced cloud functionality
  - Highly available and performant semantic layer with historical data support
  - Unified data strategy for executive reporting, analytics, and Gen AI across cyber domains
This modernization establishes a single source of truth for enterprise-wide data-driven decision-making.
Required Skills
DevOps / CI/CD
- Experience implementing CI/CD pipelines for data and analytics workloads
- Familiarity with Git-based source control, build automation, and deployment strategies
Containers & Platform
- Experience with OpenShift Container Platform (OCP) for deploying data workloads and services
- Understanding of containerized architecture, scaling, and environment management
- Proven ability to build CI/CD pipelines for data and infrastructure workloads
- Experience managing secrets securely using GCP Secret Manager
- Ownership of observability, SLOs, dashboards, alerts, and runbooks
- Proficiency in logging, monitoring, and alerting for data pipelines and platform reliability
Big Data & Processing
- Hands-on experience with PySpark for ETL/ELT, data transformation, and performance optimization
- Solid understanding of distributed data processing concepts
Data & Cloud Architecture
- Strong experience designing data platforms on Google Cloud Platform (GCP)
- Experience with Data Lakes, data warehousing, and large-scale migration programs
Data Lake Architecture & Storage
- Proven experience designing and implementing data lake architectures (e.g., Bronze/Silver/Gold or layered models)
- Strong knowledge of Cloud Storage (GCS) design, including bucket layout, naming conventions, lifecycle policies, and access controls
- Experience with Hadoop/HDFS architecture, distributed file systems, and data locality principles
- Hands-on experience with columnar data formats (Parquet, Avro, ORC) and compression techniques
- Expertise in partitioning strategies, backfills, and large-scale data organization
- Ability to design data models optimized for analytics and BI consumption
Data Ingestion & Orchestration
- Experience building batch and streaming ingestion pipelines using GCP-native services
- Knowledge of Pub/Sub-based streaming architectures, event schema design, and versioning
- Strong understanding of incremental ingestion and CDC patterns, including idempotency and deduplication
- Hands-on experience with workflow orchestration tools (Cloud Composer / Airflow)
- Ability to design robust error handling, replay, and backfill mechanisms
Data Processing & Transformation
- Experience developing scalable batch and streaming pipelines using Dataflow (Apache Beam) and/or Spark (Dataproc)
- Strong proficiency in BigQuery SQL, including query optimization, partitioning, clustering, and cost control
- Hands-on experience with Hadoop MapReduce and ecosystem tools (Hive, Pig, Sqoop)
- Advanced Python programming skills for data engineering, including testing and maintainable code design
- Experience managing schema evolution while minimizing downstream impact
Analytics & Data Serving
- Expertise in BigQuery performance optimization and data serving patterns
- Experience building semantic layers and governed metrics for consistent analytics
- Familiarity with BI integration, access controls, and dashboard standards
- Understanding of data exposure patterns via views, APIs, or curated datasets
Data Governance, Quality & Metadata
- Experience implementing data catalogs, metadata management, and ownership models
- Understanding of data lineage for auditability and troubleshooting
- Strong focus on data quality frameworks, including validation, freshness checks, and alerting
- Experience defining and enforcing data contracts, schemas, and SLAs
Good to have
Security, Privacy & Compliance
- Hands-on experience implementing fine-grained access controls for BigQuery and GCS
- Experience with sprint planning and providing technical guidance to the team
- Strong stakeholder communication and solution architecture skills
Qualifications
- Experience: 10–14+ years in DevOps and Data Architecture, including 5+ years designing on PySpark/GCP/OCP at scale; prior on-premises to cloud migration experience is a must.
- Education: Bachelor's/Master's in Computer Science, Information Systems, or equivalent experience.
- Certifications: Google Cloud Professional Cloud Architect/DevOps/OCP (required, or to be obtained within 3 months).
- Plus: Professional Data Engineer, Security Engineer