Role & seniority: Senior Data Quality Engineer

Stack/tools

Programming: Python; Java, Scala, or advanced Bash scripting (nice-to-have)
Data platforms: Hadoop (HDFS, Hive, Spark); Kafka/Flume/Kinesis
Databases: NoSQL (Cassandra, MongoDB, HBase); relational (PostgreSQL, MSSQL, MySQL, Oracle)
ETL & automation: Talend, Informatica; CI/CD (Jenkins, GitHub Actions); Git
Validation & testing: automated validation pipelines; TDD/DDT/BDT; JMeter
Visualization/analytics: Tableau, Power BI, Tibco Spotfire
Cloud & architecture: AWS, Azure, GCP; multi-cloud
Data management: MDM; data generation/synthetic data (nice-to-have)
Additional: XPath (nice-to-have)

Top 3 responsibilities

Develop and implement data quality strategies and testing frameworks to ensure accuracy across data systems
Lead cross-team data workflow improvements, governance, and prioritization aligned with compliance and business needs
Build and optimize automated validation pipelines for production environments; mentor engineers and document processes

Must-have skills

3+ years in Data Quality Engineering or related roles
Advanced Python for data validation and automation
Strong experience with Hadoop stack, Kafka/streaming tech, NoSQL and SQL ecosystems
Proficiency in ETL tooling and CI/CD pipelines; version control
Experience designing testing frameworks (TDD/DDT/BDT) and data-focused validation
Able to analyze complex datasets and communicate findings

Full Description

We are looking for an experienced Senior Data Quality Engineer to join our team and take responsibility for ensuring the accuracy, reliability, and performance of our data systems and workflows. In this role, you will lead key initiatives to improve data quality, leveraging advanced technologies to deliver impactful results. If you are passionate about refining data processes and enjoy working with cutting-edge solutions, this position offers the chance to shape the future of our data infrastructure.

Responsibilities

Create and implement data quality strategies to ensure consistent accuracy across data systems and products
Lead initiatives to improve data workflows by incorporating best practices across teams and projects
Develop and apply advanced testing frameworks and methodologies to meet enterprise data quality standards
Efficiently manage complex data quality tasks, ensuring prioritization and delivery under tight deadlines
Design testing strategies tailored to evolving system architectures and data pipeline requirements
Provide recommendations on resource allocation and testing priorities that align with compliance and business needs
Establish and refine governance frameworks to ensure alignment with industry standards
Build and optimize automated validation pipelines to support production environments
Collaborate with cross-functional teams to resolve infrastructure challenges and improve system performance
Mentor junior engineers and maintain comprehensive documentation of testing processes and strategies

Requirements

At least 3 years of professional experience in Data Quality Engineering or related roles
Advanced proficiency in Python for data validation and workflow automation
Expertise in Big Data platforms such as Hadoop tools (HDFS, Hive, Spark) and modern streaming technologies like Kafka, Flume, or Kinesis
Hands-on experience with NoSQL databases like Cassandra, MongoDB, or HBase for managing large-scale datasets
Proficiency in data visualization tools such as Tableau, Power BI, or Tibco Spotfire for analytics and reporting
Extensive experience with cloud platforms like AWS, Azure, or GCP, with knowledge of multi-cloud architectures
Advanced knowledge of relational databases and SQL technologies like PostgreSQL, MSSQL, MySQL, and Oracle in high-volume environments
Proven ability to implement and scale ETL processes using tools like Talend, Informatica, or similar platforms
Familiarity with MDM tools and performance testing applications like JMeter
Strong experience with version control systems like Git, GitLab, or SVN, and automation for large-scale systems
Deep understanding of testing frameworks such as TDD, DDT, and BDT for data-focused environments
Experience implementing CI/CD pipelines using tools like Jenkins or GitHub Actions
Strong analytical and problem-solving abilities, with the capability to derive actionable insights from complex datasets
Excellent verbal and written communication skills in English (B2 level or higher), with experience engaging stakeholders

Nice to have

Experience with additional programming languages like Java, Scala, or advanced Bash scripting for production-level solutions
Advanced knowledge of XPath for data validation and transformation workflows
Proficiency in designing custom data generation tools and synthetic data techniques for testing scenarios

We offer

International projects with top brands
Work with global teams of highly skilled, diverse peers
Healthcare benefits
Employee financial programs
Paid time off and sick leave
Upskilling, reskilling and certification courses
Unlimited access to the LinkedIn Learning library and 22,000+ courses
Global career opportunities
Volunteer and community involvement opportunities
EPAM Employee Groups
Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn