Go Digital Technology Consulting LLP • Michigan, United States
Role & seniority: ETL QA Analyst (seniority not specified)
Location: Michigan, United States; work type not specified (collaborative Agile environment implied)

Core skills
- AWS data services (S3, Redshift, Glue, EMR, etc.)
- SQL for data validation
- PySpark for transformation validation
- Python for automation
- Agile/Scrum environment; modern data engineering toolchains (version control, CI/CD, orchestration)

Key responsibilities (summary)
- Perform ETL testing and data validation across source, staging, transformation, and reporting layers
- Validate PySpark-based transformations and ensure data accuracy, completeness, and consistency
- Write complex SQL queries for transformations, reconciliations, and full/incremental loads
- Log and track defects; collaborate with data engineers and stakeholders

Required
- Solid hands-on ETL testing experience
- Advanced SQL for data validation and reconciliation
- Experience validating PySpark-based transformations
- AWS-based data environment experience (S3, Redshift, Glue, EMR, etc.)
- Data warehousing concepts; Python automation; Agile delivery experience

Nice to have
- Experience designing or contributing to automated data validation frameworks on AWS
- Exposure to modern data toolchains and cloud-native workflows
- Basic data modeling knowledge
Role Overview
We are seeking an ETL QA Analyst with strong hands-on experience in validating data pipelines within AWS-based cloud environments. The ideal candidate will have expertise in SQL-based data validation, exposure to PySpark transformations, and experience ensuring data integrity across source, staging, and reporting layers.
This role focuses on maintaining data accuracy, completeness, and consistency across enterprise data platforms and collaborating closely with data engineering teams in an Agile environment.
Key Responsibilities
ETL & Data Validation
Perform ETL testing for AWS-based data pipelines. Validate data movement across source, staging, transformation, and reporting layers.
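The posting doesn't include example checks, but layer-to-layer validation of this kind typically reconciles row counts, aggregates, and key coverage between adjacent layers. A minimal sketch, using an in-memory sqlite3 database as a stand-in for the warehouse (table and column names are illustrative, not from the posting):

```python
import sqlite3

# In-memory database standing in for the staging and reporting layers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders   (order_id INTEGER, amount REAL);
    CREATE TABLE reporting_orders (order_id INTEGER, amount REAL);
    INSERT INTO staging_orders   VALUES (1, 10.0), (2, 25.5), (3, 40.0);
    INSERT INTO reporting_orders VALUES (1, 10.0), (2, 25.5), (3, 40.0);
""")

def validate_layer(conn, source, target, key, measure):
    """Row-count and sum reconciliation between two pipeline layers."""
    src_count, src_sum = conn.execute(
        f"SELECT COUNT(*), SUM({measure}) FROM {source}").fetchone()
    tgt_count, tgt_sum = conn.execute(
        f"SELECT COUNT(*), SUM({measure}) FROM {target}").fetchone()
    # Completeness: rows present in source but missing from target.
    missing = conn.execute(
        f"SELECT COUNT(*) FROM {source} s LEFT JOIN {target} t "
        f"ON s.{key} = t.{key} WHERE t.{key} IS NULL").fetchone()[0]
    return {"count_match": src_count == tgt_count,
            "sum_match": src_sum == tgt_sum,
            "missing_rows": missing}

result = validate_layer(conn, "staging_orders", "reporting_orders",
                        key="order_id", measure="amount")
print(result)
```

Against Redshift or Glue-cataloged tables the same queries would run through the warehouse's own driver; the reconciliation logic is unchanged.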
Cloud & PySpark Validation
Validate data transformations executed using PySpark. Support testing of distributed data processing workflows in AWS environments. Ensure correctness and consistency of large datasets processed in cloud platforms.
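PySpark itself isn't shown here, so the sketch below expresses the same accuracy/completeness checks on plain Python rows; against a real pipeline the equivalent assertions would run on DataFrames (row counts plus a keyed comparison of expected versus actual output). The transformation and all field names are illustrative assumptions:

```python
def transform(row):
    # Reference implementation of the transformation under test:
    # uppercase the region code and derive a line total.
    return {"id": row["id"],
            "region": row["region"].upper(),
            "total": row["qty"] * row["price"]}

source = [
    {"id": 1, "region": "us", "qty": 2, "price": 5.0},
    {"id": 2, "region": "eu", "qty": 1, "price": 9.5},
]
# Rows read back from the pipeline's output layer (illustrative).
pipeline_output = [
    {"id": 1, "region": "US", "total": 10.0},
    {"id": 2, "region": "EU", "total": 9.5},
]

expected = [transform(r) for r in source]
# Completeness: every source row made it through.
assert len(pipeline_output) == len(expected)
# Accuracy/consistency: values match row by row, keyed by id.
by_id = {r["id"]: r for r in pipeline_output}
mismatches = [e for e in expected if by_id.get(e["id"]) != e]
print(f"mismatches: {len(mismatches)}")
```

On large cloud datasets one would sample or aggregate rather than compare row by row, but the shape of the check is the same.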
Automation & Toolchain Exposure
Use Python to automate repetitive data validation and reconciliation tasks. Maintain regression test cases as pipelines evolve. Work within modern data engineering toolchains (e.g., version control, CI/CD, orchestration tools) to support automated validation workflows.
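The posting doesn't prescribe a framework, so one lightweight pattern for the automation described above is a registry of named validation checks that a CI job runs on every pipeline change, exiting non-zero on failure. The check names and rules below are illustrative:

```python
# Minimal regression-check runner: each check is a named callable over
# the output rows, returning True (pass) or False (fail).
def check_no_null_keys(rows):
    return all(r.get("order_id") is not None for r in rows)

def check_positive_amounts(rows):
    return all(r["amount"] > 0 for r in rows)

CHECKS = {
    "no_null_keys": check_no_null_keys,
    "positive_amounts": check_positive_amounts,
}

def run_regression(rows):
    """Run every registered check and report pass/fail per check."""
    return {name: fn(rows) for name, fn in CHECKS.items()}

rows = [{"order_id": 1, "amount": 10.0},
        {"order_id": 2, "amount": 25.5}]
results = run_regression(rows)
failed = [name for name, ok in results.items() if not ok]
print("FAILED:" if failed else "PASSED", failed)
```

New checks are added by registering another function, which keeps the regression suite growing alongside the pipelines it guards.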
Collaboration & Reporting
Work closely with data engineers and business stakeholders to review transformation logic and requirements. Log, track, and verify resolution of data defects. Participate in Agile/Scrum ceremonies and provide QA updates.
Required Skills & Experience
- Strong hands-on ETL testing experience
- Advanced SQL skills for data validation and reconciliation
- Experience validating PySpark-based transformations
- Working experience in AWS-based data environments (S3, Redshift, Glue, EMR, etc.)
- Solid understanding of ETL concepts and data warehousing fundamentals
- Proficiency in Python for automation
- Experience working in Agile delivery environments
Preferred (Nice To Have)
- Experience designing or contributing to automated data validation frameworks on AWS
- Exposure to modern data toolchains and cloud-native workflows
- Basic understanding of data modeling concepts