Role & seniority: Mid-level AI/ML QA Engineer (2+ years of relevant experience)

Stack/tools: Python, SQL, test automation frameworks; ML/AI validation; model evaluation; dashboards/monitoring; CI/CD; Agile

Top 3 responsibilities

Develop and execute test strategies for ML and generative AI-powered applications
Design and maintain evaluation frameworks for LLMs (automated scoring, LLM-as-a-judge)
Build/maintain dashboards and monitoring to detect drift, degraded scores, and safety risks; implement proactive AI-driven alerts

Must-have skills

2+ years validating ML or generative AI applications (model evaluation, data quality)
Proficiency in Python and SQL; experience with test automation
Experience evaluating LLMs, prompt regression testing, and human-in-the-loop methodologies
Knowledge of RAG concepts (retrieval quality, relevance, faithfulness, safety)
Experience designing AI evaluation metrics (ranking, calibration, reliability); production health reporting
Strong analytical, documentation, and communication skills; familiarity with Agile and CI/CD; self-starter in fast-paced environments

Nice-to-haves

Experience with model monitoring in production; familiarity with proactive alerting systems
Ability to partner across data science, engineering, product, and security to define quality gates

Location & work type: Location not specified; work type not disclosed (full-time role with benefits and incentive eligibility)

Full Description

When you’re the best, we’re the best. We foster an environment where employees feel engaged, satisfied, and able to contribute their unique skills and talents while living and working as their authentic selves. We provide extensive opportunities for personal and professional development, building both employee competence and organizational capability to fuel exceptional performance through an inclusive environment, both now and in the future.

Summary

In this role, you will validate AI and ML-powered healthcare solutions across the full development lifecycle to ensure data quality, model performance, reliability, and safe deployment in production environments. You will design and execute data-driven and automated test strategies, including model evaluation, prompt regression testing, dataset profiling, and end-to-end pipeline validation. You will partner with data science, engineering, product, and security teams to define measurable quality gates and deliver compliant, explainable, and dependable AI experiences that drive client value.

Responsibilities

Develop and execute test strategies for ML and generative AI-powered applications.
Design and maintain evaluation frameworks for Large Language Models (LLMs), including automated scoring and LLM-as-a-judge methodologies.
Develop prompt regression test suites to detect performance degradation across model and prompt versions.
Evaluate generative AI systems for hallucination risk, factual consistency, grounding accuracy, and safety compliance.
Conduct model evaluation, regression testing, and drift monitoring in development and production environments.
Build dashboards and monitoring tools to detect degraded evaluation scores, drift, or safety risks and support proactive triage.
Design and implement proactive AI-driven alerting and recommendation systems embedded within dashboards and user workflows.
Automate dashboard metric generation and refresh pipelines using Python and data workflows.
Partner with cross-functional teams to define AI quality standards, acceptance criteria, and release gates.
Investigate defects, analyze root causes, and recommend corrective actions to improve reliability and performance.

Qualifications

Relevant degree preferred.
2 or more years of relevant experience required.
Experience validating ML or generative AI-based applications, including model evaluation and data quality assessment, required.
Proficiency in Python, SQL, and test automation frameworks.
Experience evaluating LLM systems, including prompt regression testing and automated or human-in-the-loop judging methodologies.

Software Quality Engineer (AI/ML Applications)