MindTickle Interactive Media Pvt Ltd. • Pune, Maharashtra, India
Role & seniority: Senior SDET / QA Automation Engineer focused on AI/LLM testing and evaluation
Stack / tools: LLM testing/evaluation tools (MaximAI, OpenAI Evals, TruLens, Promptfoo, LangSmith); CI/CD for AI tests; Go, Java, or Python; Git, Docker; cloud platforms (AWS, GCP, Azure); prompt engineering, embeddings, RAG; experience with OpenAI/Anthropic/Hugging Face APIs
Own end-to-end qualification lifecycle for AI/LLM systems from ideation to CI/CD integration; design scalable automated test suites across unit, integration, regression, and system levels
Lead design/automation of LLM-powered features (prompt pipelines, RAG workflows, AI-assisted developer tools) and develop evaluation pipelines (factual accuracy, hallucination, bias, robustness)
Define metrics-driven quality gates, experiment tracking, monitoring/alerting for real-time LLM production quality, and mentor junior engineers; collaborate in design reviews to shift-left defects
4+ years in software development, SDET, or QA automation
Proficiency in Go, Java, or Python; proven test automation framework design; CI/CD with automated regression/evaluation testing
Hands-on with LLMs/GenAI; 2+ years with LLM APIs/frameworks (OpenAI, Anthropic, Hugging Face); strong prompt engineering, embeddings, RAG; bias/hallucination detection and AI safety testing
Strong analytical, leadership, teamwork, and cross-functional collaboration
Description
Who We Are
Mindtickle is the market-leading revenue productivity platform that combines on-the-job learning and deal execution to get more revenue per rep.
Mindtickle is recognized as a market leader by top industry analysts and is ranked by G2 as the #1 sales onboarding and training product.
We're honoured to be recognized as a Leader in the first-ever Forrester Wave: Revenue Enablement Platforms, Q3 2024!
What's in it for you?
Own the end-to-end qualification lifecycle for AI/LLM systems from ideation and implementation to CI/CD integration.
Design and implement scalable automated test suites across unit, integration, regression, and system levels.
Build and enhance frameworks to test, evaluate, and continuously improve complex AI and LLM workflows.
Lead the design and automation of LLM-powered features, including prompt pipelines, RAG workflows, and AI-assisted developer tools.
Develop evaluation pipelines to measure factual accuracy, hallucination rates, bias, robustness, and overall model reliability.
Define and enforce metrics-driven quality gates and experiment tracking workflows to ensure consistent, data-informed releases.
Collaborate with agile engineering teams, participating in design discussions, code reviews, and architecture decisions to drive testability and prevent defects early (shift left).
Develop monitoring and alerting systems to track LLM production quality, safety, and performance in real time.
Conduct robustness, safety, and adversarial testing to validate AI behavior under edge cases and stress scenarios.
Continuously improve frameworks, tools, and processes for LLM reliability, safety, and reproducibility.
Mentor junior engineers in AI testing, automation, and quality best practices.
Measure and improve Developer Experience (DevEx) through tools, feedback loops, and automation.
Champion quality engineering practices across the organization, ensuring delivery meets business goals, user experience, and cost of operations.
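To illustrate the "metrics-driven quality gate" responsibility above, here is a minimal Python sketch: a tiny evaluation harness that scores model outputs against expected substrings and fails the gate below a threshold. The model call is a stub (`fake_llm` is hypothetical, standing in for a real OpenAI/Anthropic API call), and the pass criterion is deliberately simplistic; real pipelines would use richer metrics (semantic similarity, hallucination classifiers, etc.).

```python
# Minimal sketch of a metrics-driven quality gate for an LLM feature.
# fake_llm is a stand-in for a real model API call; EVAL_CASES and the
# substring check are illustrative, not a production evaluation method.

def fake_llm(prompt: str) -> str:
    """Stubbed model: returns a canned answer if the prompt matches."""
    canned = {
        "capital of France": "The capital of France is Paris.",
        "boiling point of water": "Water boils at 100 degrees Celsius at sea level.",
    }
    for key, answer in canned.items():
        if key in prompt:
            return answer
    return "I don't know."

# Each case pairs a prompt with substrings a faithful answer must contain.
EVAL_CASES = [
    ("What is the capital of France?", ["Paris"]),
    ("What is the boiling point of water?", ["100"]),
]

def run_eval(model, cases, threshold=0.9):
    """Score the model on the cases and apply the quality gate."""
    passed = 0
    for prompt, must_contain in cases:
        output = model(prompt)
        if all(s.lower() in output.lower() for s in must_contain):
            passed += 1
    score = passed / len(cases)
    # Gate: a CI job would fail here if accuracy drops below the threshold.
    return score, score >= threshold

if __name__ == "__main__":
    score, gate_ok = run_eval(fake_llm, EVAL_CASES)
    print(f"accuracy={score:.2f} gate_ok={gate_ok}")
```

Wired into CI, the boolean gate result would decide whether a prompt or model change is allowed to ship, which is what "consistent, data-informed releases" amounts to in practice.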
We'd Love To Hear From You If You Have Experience With
LLM testing & evaluation tools: MaximAI, OpenAI Evals, TruLens, Promptfoo, LangSmith.
Building LLM-powered apps: prompt pipelines, embeddings, RAG, AI workflows.
CI/CD design for application + LLM testing.
API, performance, and system testing.
Git, Docker, and cloud platforms (AWS / GCP / Azure).
Bias, fairness, hallucination detection & AI safety testing.
Mentorship and cross-functional leadership.
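Of the evaluation tools listed above, Promptfoo is a representative example of declarative LLM testing: test cases live in a YAML config that CI can run. A minimal sketch (the provider name and question are illustrative, not taken from this posting):

```yaml
# promptfooconfig.yaml -- minimal illustrative example
prompts:
  - "Answer concisely: {{question}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      question: What is the capital of France?
    assert:
      - type: contains
        value: Paris
```

Running the tool against this config executes each prompt/variable combination and checks the assertions, which is the declarative counterpart of a hand-rolled evaluation harness.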
Preferred Qualifications
Bachelor's or Master's in Computer Science, Engineering, or equivalent.
4+ years in software development, SDET, or QA automation.
Proficiency in Go, Java, or Python.
Proven experience building test automation frameworks.
Proven ability to design CI/CD pipelines with automated regression and evaluation testing.
Hands-on exposure to LLMs and GenAI applications.
2+ years of hands-on experience with LLM APIs and frameworks (OpenAI, Anthropic, Hugging Face).
Proficient in prompt engineering, embeddings, RAG, and LLM evaluation metrics.
Strong analytical, leadership, and teamwork skills.
Excellent communication and collaboration across teams.
(ref: hirist.tech)