Photon Career Site • United States
Salary: $38,000 - $133,000 / year
Role & seniority: QA Automation Engineer, senior-level (10+ years in QA Automation with AI/ML/LLM focus)
Stack/tools: Python (expert), Pytest; API/UI testing with Playwright/Selenium/Cypress; AI evaluation frameworks (LangSmith, DeepEval, RAGAS, Promptfoo); SQL/JSON; CI/CD integration
Design and build automation frameworks to evaluate autonomous AI agents, focusing on non-deterministic outputs and model-based evaluation
Create and maintain Eval pipelines and Golden Datasets to benchmark performance across prompts/models; implement latency/cost monitoring in CI/CD
Develop automated checks for tool/API calls, multi-step workflows, prompt regression testing, hallucination/bias detection, and jailbreak detection; collaborate with AI Engineers to translate requirements into test cases
10+ years in QA Automation, recent AI/ML/LLM testing experience
Expert Python and Pytest; strong API/UI testing with Playwright/Selenium/Cypress
Experience with AI evaluation tools (LangSmith, DeepEval, RAGAS, Promptfoo)
Data validation skills (SQL, JSON) for RAG workflows
Statistical mindset for scoring-based evaluation (e.g., 85% accuracy)
Experience testing Multi-Agent Systems
Knowledge of Prompt Engineering
Background in Investment Banking/Fintech for high-stakes data accuracy
Location: not specified
Work type: Full-time; not available for independent contractors
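As a hedged illustration of the "automated checks for tool/API calls" mentioned above, here is a minimal Pytest-style sketch. The agent stub, tool names, and expected call trace are all invented for illustration; a real harness would capture the live calls an agent makes rather than return a canned trace.

```python
# Hypothetical sketch: validating that an agent called the right tools,
# in order, with the right parameters. The "agent" here is a stub that
# returns a recorded tool-call trace.

def run_agent(task: str) -> list[dict]:
    """Stub standing in for a real agent run; returns its tool-call trace."""
    return [
        {"tool": "search_flights", "args": {"dest": "NYC"}},
        {"tool": "book_flight", "args": {"flight_id": "F123"}},
    ]

def test_agent_calls_tools_in_order():
    trace = run_agent("Book me a flight to NYC")
    called = [step["tool"] for step in trace]
    # The agent must search before it books -- order matters in multi-step flows.
    assert called == ["search_flights", "book_flight"]
    # And the search must carry the user's destination through.
    assert trace[0]["args"]["dest"] == "NYC"
```

The point of the check is that it asserts on the agent's *behavior* (which functions it called, with which parameters), not only on its final answer.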
We are seeking a QA Automation Engineer who is ready to move beyond traditional "Pass/Fail" testing. In this role, you will design and build automation frameworks specifically for Agentic AI products. You will focus on evaluating the performance of autonomous agents, ensuring they follow logical reasoning paths, call the correct tools, and provide accurate, safe outputs. Your mission is to build the "evaluations" (Evals) that define what high-quality AI behavior looks like, moving the needle from unpredictable experiments to production-grade software.

Key Responsibilities
Non-Deterministic Testing: Develop automation strategies for probabilistic outputs, using model-based evaluation to "test the tester."
Building "Eval" Pipelines: Create and maintain "Golden Datasets" to benchmark agent performance across different versions of prompts and models.
Tool-Use Validation: Build automated tests to verify that agents call the correct functions/APIs with the right parameters in complex multi-step workflows.
Regression Testing for Prompts: Monitor how subtle changes in prompt engineering or model updates (e.g., moving from GPT-4 to Claude 3.5) affect the product’s reliability.
Latency & Token Monitoring: Integrate performance testing into the CI/CD pipeline to track agent reasoning time and cost-efficiency.
Hallucination Detection: Develop automated checks to identify and report AI hallucinations, bias, or "jailbreak" attempts.
Collaboration: Work closely with AI Engineers to translate "vague" business requirements into measurable, automated test cases.

Required Skills & Qualifications
Experience: 10+ years in QA Automation, with a recent focus on AI/ML or LLM-based applications.
Python Proficiency: Expert-level Python skills (the industry standard for AI testing) and experience with testing frameworks like Pytest.
AI Testing Tools: Familiarity with AI evaluation frameworks such as LangSmith, DeepEval, RAGAS, or Promptfoo.
API & Backend Testing: Deep experience with Playwright, Selenium, or Cypress for UI testing, with a heavy focus on API-level testing and database validation.
Statistical Mindset: Understanding that AI testing often requires "scoring" (e.g., 85% accuracy) rather than a simple binary pass/fail.
Data Skills: Ability to work with SQL and JSON to validate data retrieved by agents during RAG (Retrieval-Augmented Generation) processes.

Preferred Qualifications
Experience testing Multi-Agent Systems (where one agent tests another).
Knowledge of Prompt Engineering and how it influences software behavior.
Background in Investment Banking or Fintech, where high-stakes data accuracy is critical.

Compensation, Benefits and Duration
Minimum Compensation: USD 38,000
Maximum Compensation: USD 133,000

Compensation is based on the actual experience and qualifications of the candidate. The above is a reasonable, good-faith estimate for the role. Medical, vision, and dental benefits, a 401(k) retirement plan, variable pay/incentives, paid time off, and paid holidays are available for full-time employees. This position is not available to independent contractors. No applications will be considered if received more than 120 days after the date of this post.
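To illustrate the scoring-based evaluation style this role calls for (the posting's "e.g., 85% accuracy" rather than a binary pass/fail), here is a minimal Pytest-style sketch. Every dataset entry, the agent stub, and the scoring function are invented for illustration, not part of this posting.

```python
# Hypothetical sketch: scoring an agent across a tiny "golden dataset"
# and gating on an aggregate accuracy threshold instead of per-case pass/fail.

GOLDEN_DATASET = [  # invented examples for illustration
    {"question": "2 + 2", "expected": "4"},
    {"question": "Capital of France", "expected": "Paris"},
    {"question": "3 * 3", "expected": "9"},
]

def agent_answer(question: str) -> str:
    """Stub standing in for a real (non-deterministic) agent call."""
    canned = {"2 + 2": "4", "Capital of France": "Paris", "3 * 3": "9"}
    return canned[question]

def exact_match(pred: str, expected: str) -> float:
    """Simplest possible scorer; real evals often use model-based judges."""
    return 1.0 if pred.strip() == expected else 0.0

def dataset_accuracy() -> float:
    scores = [exact_match(agent_answer(c["question"]), c["expected"])
              for c in GOLDEN_DATASET]
    return sum(scores) / len(scores)

def test_accuracy_meets_threshold():
    # Gate the suite on an aggregate score, mirroring the 85% example above.
    assert dataset_accuracy() >= 0.85
```

In a real pipeline the same dataset would be re-scored on every prompt or model change, turning regressions into a single number that can fail CI.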