Role & seniority

Senior Data Quality Engineer

Stack/tools

Python; Big Data: Hadoop (HDFS, Hive, Spark); streaming: Kafka, Flume, Kinesis

NoSQL: Cassandra, MongoDB, HBase; RDBMS: PostgreSQL, MSSQL, MySQL, Oracle

Data viz: Tableau, Power BI, Tibco Spotfire

Cloud: AWS, Azure, GCP; multi-cloud awareness

ETL/data quality: Talend, Informatica; performance testing: JMeter; version control: Git, GitLab, SVN

CI/CD: Jenkins, GitHub Actions; testing frameworks: TDD, DDT, BDT

Data governance, MDM familiarity; automation for large-scale systems

Top 3 responsibilities

Create and implement data quality strategies to ensure accuracy across data systems/products
Lead initiatives to improve data workflows, incorporating best practices across teams
Build/optimize automated validation pipelines and design testing strategies for evolving architectures; mentor junior engineers; maintain testing documentation

Must-have skills

3+ years in Data Quality Engineering or related roles
Python proficiency for validation/automation
Expertise with Hadoop/Spark, Kafka/Flume/Kinesis, and NoSQL (Cassandra, MongoDB, HBase)
Strong SQL and RDBMS experience; multi-cloud knowledge
ETL tooling (Talend/Informatica); CI/CD (Jenkins/GitHub Actions)
Testing frameworks (TDD/DDT/BDT), data governance/MDM familiarity
Excellent communication in English (B2+)

Nice-to-have

Java/Scala or advanced Bash scripting
XPath experience; synthetic data generation tools

Location & work type

Location: Not specified

Full Description

We are looking for an experienced Senior Data Quality Engineer to join our team and take responsibility for ensuring the accuracy, reliability, and performance of our data systems and workflows. In this role, you will lead key initiatives to improve data quality, leveraging advanced technologies to deliver impactful results. If you are passionate about refining data processes and enjoy working with cutting-edge solutions, this position offers the chance to shape the future of our data infrastructure.

Responsibilities

Create and implement data quality strategies to ensure consistent accuracy across data systems and products
Lead initiatives to improve data workflows by incorporating best practices across teams and projects
Develop and apply advanced testing frameworks and methodologies to meet enterprise data quality standards
Efficiently manage complex data quality tasks, ensuring prioritization and delivery under tight deadlines
Design testing strategies tailored to evolving system architectures and data pipeline requirements
Provide recommendations on resource allocation and testing priorities that align with compliance and business needs
Establish and refine governance frameworks to ensure alignment with industry standards
Build and optimize automated validation pipelines to support production environments
Collaborate with cross-functional teams to resolve infrastructure challenges and improve system performance
Mentor junior engineers and maintain comprehensive documentation of testing processes and strategies

Requirements

At least 3 years of professional experience in Data Quality Engineering or related roles
Advanced proficiency in Python for data validation and workflow automation
Expertise in Big Data platforms such as Hadoop tools (HDFS, Hive, Spark) and modern streaming technologies like Kafka, Flume, or Kinesis
Hands-on experience with NoSQL databases like Cassandra, MongoDB, or HBase for managing large-scale datasets
Proficiency in data visualization tools such as Tableau, Power BI, or Tibco Spotfire for analytics and reporting
Extensive experience with cloud platforms like AWS, Azure, or GCP, with knowledge of multi-cloud architectures
Advanced knowledge of relational databases and SQL technologies like PostgreSQL, MSSQL, MySQL, and Oracle in high-volume environments
Proven ability to implement and scale ETL processes using tools like Talend, Informatica, or similar platforms
Familiarity with MDM tools and performance testing applications like JMeter
Strong experience with version control systems like Git, GitLab, or SVN, and automation for large-scale systems
Deep understanding of testing frameworks such as TDD, DDT, and BDT for data-focused environments
Experience implementing CI/CD pipelines using tools like Jenkins or GitHub Actions
Strong analytical and problem-solving abilities, with the capability to derive actionable insights from complex datasets
Excellent verbal and written communication skills in English (B2 level or higher), with experience engaging stakeholders

Nice to have

Experience with additional programming languages like Java, Scala, or advanced Bash scripting for production-level solutions
Advanced knowledge of XPath for data validation and transformation workflows
Proficiency in designing custom data generation tools and synthetic data techniques for testing scenarios

We offer

International projects with top brands
Work with global teams of highly skilled, diverse peers
Healthcare benefits
Employee financial programs
Paid time off and sick leave
Upskilling, reskilling and certification courses
Unlimited access to the LinkedIn Learning library and 22,000+ courses
Global career opportunities
Volunteer and community involvement opportunities
EPAM Employee Groups
Award-winning culture recognized by Glassdoor, Newsweek and LinkedIn