Mainframe Operations Technical Leader

ClifyX, INC

Plano, TX

JOB DETAILS
SKILLS
Ansible, Automation, Budgeting, CPU (Central Processing Unit), Capacity Management, Change Management, Change Requests/Orders, COBOL Programming Language, Communication Skills, Continuous Deployment/Delivery, Continuous Integration, Control Objectives for Information and related Technology (COBIT), Corrective Action, Data Quality, Data Recovery, Data Sets, Database Recovery, DevOps, High Availability, IBM CICS (Customer Information Control System), IBM CLIST Programming Language, IBM DB2, IBM IMS Database, IBM Job Entry Subsystem, IBM Product Family, IBM Rexx Programming Language, IBM System/Spool Display and Search Facility (SDSF), IBM TSO/ISPF, IBM z/OS Operating System, IP Multimedia System (IMS), ISO (International Organization for Standardization), ITIL (IT Infrastructure Library), Incident Management, JCL (Job Control Language), Leadership, Machine Tool, Mainframe Computer, Maintenance Services, Memory Hardware, Mentoring, Messaging Middleware, Metrics, Operating Systems, PCI, Performance Metrics, Performance Testing, Performance Tuning/Optimization, Process Improvement, Python Programming/Scripting Language, RACF (Resource Access Control Facility), Regulations, Reporting Dashboards, Risk, Risk Management, Sarbanes-Oxley Act (SOX), Schedule Development, Scripting (Scripting Languages), Security Compliance, Service Level Agreement (SLA), Software Administration, Symmetric MultiProcessing (SMP) Computing, System Operations, System Validation, Systems Administration/Management, Systems/Internals Programming, Technical Leadership, Technical Operations, Trend Analysis, Vendor/Supplier Planning, Virtual Tape Library, Workload Automation
Job Description
Must Have Technical/Functional Skills
The Mainframe Operations Technical Leader is responsible for the run-and-maintain stability, resilience, and compliance of enterprise mainframe platforms (z/OS, DB2, IMS, CICS, MQ). This role leads the technical operations team, orchestrates incident/problem/change workflows, ensures product currency with vendor-supported levels, and drives operational excellence via automation, observability, and proactive maintenance. The leader partners with Delivery, Application, Infrastructure, and Security teams to minimize risk, optimize performance, and meet regulatory/audit expectations.

Required Qualifications
10+ years in Mainframe Operations/System Programming (z/OS, DB2, IMS, CICS, MQ).
Proven leadership of major incidents, platform upgrades, and audit-compliant change management.
Expertise in IBM tooling: SMP/E, PARMLIB/PROCLIB, ISPF, RACF, SDSF, JES2/3, OMEGAMON/IBM Z Observability, IWS/TWS/IWA, ChangeMan.
Solid understanding of performance tuning, workload management (WLM), CAP/DR, and security hardening.
Strong scripting/automation skills (Rexx, JCL; optional: Python on z/OS, Ansible for z/OS).
Excellent communication skills; able to produce executive-level summaries and detailed runbooks/RCAs.

Preferred Qualifications
Experience with IBM z16 deployments and hardware migrations.
DB2 V12/V13 and IMS upgrade experience with fallback/ERLY code handling.
Knowledge of MQ clustering, message routing, DLQ handling, and transmission queue backlog resolution.
Exposure to DevOps for mainframe, CI/CD pipelines (ChangeMan integrations), and observability platforms.
Certifications: IBM z/OS, DB2, CICS, MQ; ITIL, COBIT; ISO/PCI/SOX compliance exposure.

Success Metrics (KPIs/OKRs)
Availability & SLA: ≥99.9% for critical regions; batch SLA ≥98%.
Incident Performance: MTTR reduced by 20-30%; fewer repeat incidents.
Change Quality: <1% failed changes; 100% of artifacts captured for audit.
Currency & Risk: 0 unmitigated HIPER exposure beyond defined window; timely RSU/PTF adoption.
Efficiency: 25% reduction in manual toil via automation; measurable batch window improvements.
Compliance: 100% control evidence traceable; zero audit findings.

Tools & Environment
OS & Subsystems: z/OS, DB2, IMS, CICS, MQ.
Scheduling: IBM Workload Scheduler / Workload Automation.
Source/Change: ChangeMan, SMP/E.
Security: RACF (or ACF2/Top Secret), SIEM integration.
Monitoring: OMEGAMON, RMF/SMF, IBM Z Observability, custom dashboards.
Automation: Rexx/CLIST, JCL, Ansible for z/OS (if applicable).

Roles & Responsibilities
Operational Leadership
Own 24x7 availability and SLA adherence for z/OS, DB2, IMS, CICS, MQ, JES2/3, RACF, IWS/TWS scheduling, and related tooling.
Lead major incident management (bridge, triage, comms), direct technical recovery steps, and oversee post-incident RCAs with corrective actions and prevention plans.
Govern problem management (trend analysis, chronic issues, defect elimination, countermeasures, KEDBs).
Chair/drive Change Advisory Board (CAB) readiness for mainframe changes; enforce runbooks, backout plans, and artifact capture for audit.

Platform Currency & Upgrades
Plan and execute vendor-supported upgrades (e.g., z/OS, DB2, IMS, COBOL compiler, MQ, IBM Workload Scheduler/Workload Automation) including fallback strategies and regression validation.
Maintain software currency and security compliance (PTFs, RSU, HIPER/APARs), coordinate with vendors, and validate interdependencies across subsystems.

Reliability, Performance & Capacity
Establish SRE-aligned practices (SLIs/SLOs, error budgets, resiliency testing) and performance tuning for DB2 buffer pools, IMS PSBs/DBDs, CICS regions, and MQ channels/queues.
Drive capacity planning (CPU, memory, DASD, logs/journals, queue depths, batch windows); optimize WLM.
Reduce toil via automation (Rexx/CLIST, JCL, Ansible for z/OS if applicable, IWS/TWS), observability, and auto-remediation.

Governance, Risk & Compliance
Ensure audit readiness: define the artifacts to capture (change evidence, logs, command transcripts, approvals, test results).
Enforce access controls (RACF/ACF2/Top Secret), segregation of duties, and policy conformance (SOX/ISO).
Maintain configuration baselines, DR runbooks, and conduct BCP/DR exercises (full and component-level recovery, DB2/IMS recovery drills).

Delivery & Stakeholder Management
Translate platform strategy into quarterly OKRs, operational roadmaps, and executive-ready summaries (status, risk, business impact, investment asks).
Partner with Application teams for batch optimization, cutover planning, and data integrity (log/archive handling, reprocessing).
Mentor System Programmers and Operations engineers; standardize procedures and promote continuous improvement.

Incident/Request/Change Execution (Day-to-Day)
Triage incidents (e.g., MQ transmission queue backlogs, DB2 locking, IMS region abends, CICS transaction spikes) and execute WTOR-confirmed commands when required.
Review and approve service requests (dataset allocations, RACF changes, scheduler modifications).
Perform routine health checks (channel states, buffer pools, log utilization, queue depths, JES backlog, batch SLAs).
Validate backup/restore success (DFSMSdss, tape/VTL, HSM) and ensure recovery points and retention policies are met.
