
EPAM Systems • Chile
Role & seniority: Senior Data Quality Engineer (job title is Senior; the listing's seniority field is marked Intermediate)
Stack/tools: Python (data validation/automation), Hadoop ecosystem (HDFS, Hive, Spark), Kafka/Flume/Kinesis, NoSQL (Cassandra, MongoDB, HBase), SQL (PostgreSQL, MSSQL, MySQL, Oracle), ETL tools (Talend, Informatica), BI/visualization (Tableau, Power BI, Spotfire), cloud (AWS, Azure, GCP), CI/CD (Jenkins, GitHub Actions), version control (Git/GitLab/SVN), testing frameworks (TDD/DDT/BDT), MDM tools, JMeter; data governance; automated validation pipelines
Develop, implement, and scale data quality strategies and automated validation pipelines across production data systems
Lead testing efforts, embed best practices, design advanced testing frameworks, govern standards, and allocate resources to meet compliance and business goals
Collaborate with cross-functional teams to troubleshoot infrastructure, optimize performance, mentor juniors, and maintain testing documentation
3+ years in Data Quality Engineering or related field
Python expertise for validation/automation
Experience with Hadoop ecosystem, Kafka/Flume/Kinesis, NoSQL (Cassandra/MongoDB/HBase)
SQL proficiency (PostgreSQL, MSSQL, MySQL, Oracle) and ETL tooling (Talend/Informatica)
Cloud experience (AWS/Azure/GCP), CI/CD (Jenkins/GitHub Actions), version control (Git/GitLab/SVN)
Data governance, testing frameworks (TDD/DDT/BDT), MDM tools, JMeter
We are seeking a knowledgeable Senior Data Quality Engineer to join our team and ensure the accuracy, reliability, and efficiency of our data systems and workflows. In this role, you will lead impactful data quality initiatives, utilizing advanced technologies to drive meaningful outcomes. If you are passionate about enhancing data processes and enjoy working with innovative solutions, this is an opportunity to shape the future of our data operations.

Responsibilities
Develop and implement data quality strategies to maintain accuracy and reliability across data systems and products
Lead efforts to enhance data quality by embedding best practices into team workflows and processes
Design and execute advanced testing methodologies and frameworks to ensure enterprise-level data quality standards are met
Manage complex data quality tasks efficiently, prioritizing under tight deadlines and competing requirements
Create tailored testing strategies aligned with evolving system architectures and data pipeline needs
Provide guidance on resource allocation and prioritize testing efforts to meet compliance and business objectives
Establish and continuously improve governance frameworks to ensure adherence to industry standards
Develop and scale automated validation pipelines to support production environments
Work collaboratively with cross-functional teams to troubleshoot infrastructure challenges and optimize system performance
Mentor junior team members and maintain detailed documentation of testing methodologies and strategies

Requirements
At least 3 years of professional experience in Data Quality Engineering or related fields
Advanced skills in Python for data validation and automation workflows
Expertise in Big Data platforms such as Hadoop tools (HDFS, Hive, Spark) and modern streaming technologies like Kafka, Flume, or Kinesis
Practical experience with NoSQL databases such as Cassandra, MongoDB, or HBase for managing large datasets
Proficiency in data visualization tools like Tableau, Power BI, or Tibco Spotfire for analytics and decision-making support
Extensive experience with cloud services such as AWS, Azure, or GCP, with an understanding of multi-cloud architectures
Advanced knowledge of relational databases and SQL technologies like PostgreSQL, MSSQL, MySQL, and Oracle in high-volume environments
Proven ability to implement and scale ETL processes using tools such as Talend, Informatica, or similar platforms
Familiarity with MDM tools and performance testing applications like JMeter
Strong experience with version control systems like Git, GitLab, or SVN, and automation for large-scale systems
Comprehensive understanding of testing frameworks such as TDD, DDT, and BDT for data-focused systems
Experience with CI/CD pipeline implementation using tools like Jenkins or GitHub Actions
Strong analytical and problem-solving skills, with the ability to extract actionable insights from complex datasets
Excellent verbal and written English communication skills (B2 level or higher), with experience engaging stakeholders

Nice to have
Experience with additional programming languages like Java, Scala, or advanced Bash scripting for production-level solutions
Advanced understanding of XPath for data validation and transformation processes
Expertise in creating custom data generation tools and synthetic data techniques for testing scenarios

We offer
International projects with top brands
Work with global teams of highly skilled, diverse peers
Healthcare benefits
Employee financial programs
Paid time off and sick leave
Upskilling, reskilling and certification courses
Unlimited access to the LinkedIn Learning library and 22,000+ courses
Global career opportunities
Volunteer and community involvement opportunities
EPAM Employee Groups
Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn
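For candidates unsure what "Python for data validation and automation workflows" looks like in practice, here is a minimal, hypothetical sketch of a rule-based validation check of the kind such pipelines automate. The function and rule names are illustrative assumptions, not anything specified by EPAM; production pipelines would typically use a framework such as Great Expectations instead.

```python
# Hypothetical sketch of an automated data quality check: apply named
# predicate rules to each row and collect the failures for reporting.
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    total: int = 0
    failures: list = field(default_factory=list)  # (row_index, rule_name)


def validate_rows(rows, rules):
    """Run every (name, predicate) rule against every row."""
    result = ValidationResult(total=len(rows))
    for i, row in enumerate(rows):
        for name, predicate in rules:
            if not predicate(row):
                result.failures.append((i, name))
    return result


# Illustrative rules: non-null key and a non-negative numeric field.
rules = [
    ("id_not_null", lambda r: r.get("id") is not None),
    ("amount_non_negative", lambda r: r.get("amount", 0) >= 0),
]

rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},   # violates id_not_null
    {"id": 3, "amount": -2.0},     # violates amount_non_negative
]

report = validate_rows(rows, rules)
# report.failures -> [(1, "id_not_null"), (2, "amount_non_negative")]
```

In a real deployment the same idea runs inside a scheduled pipeline (e.g. triggered from CI/CD) against production tables, with failures surfaced as alerts or dashboard metrics rather than an in-memory list.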
Seniority level: Intermediate
Employment type: Full-time
Job function: Information Technology, Engineering, and Quality Assurance
Industries: Software Development; IT Services and Consulting; Technology, Information and Internet