
bebo Technologies • Chandigarh, Chandigarh, India
Role & seniority: Mid-Senior level, Full-time
Stack/tools: AI QA, LLMs, RAG, AI agents; Python or Java (both preferred); Web/API automation (Playwright, Selenium, REST Assured); CI/CD integration; vector databases, embeddings; evaluation frameworks, benchmarking models
Test and validate LLM outputs and AI agent/autonomous workflows for accuracy, reliability, and hallucination patterns
Design and execute AI-specific test strategies (datasets, edge cases, adversarial and regression testing); develop evaluation frameworks and scoring rubrics
Analyze large volumes of AI-generated responses to identify systemic issues, collaborating with AI/ML engineers, QA, and product managers to drive improvements
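The LLM-output validation described above can be sketched as a tiny evaluation harness. This is a minimal illustration, not a framework the posting names: `EvalCase` and `score_answer` are hypothetical, and the check uses simple substring matching for required facts and hallucination markers.

```python
# Minimal sketch of an LLM-output check, assuming answers are plain strings
# and each test case lists facts the answer must contain and claims it must
# not invent. Names (EvalCase, score_answer) are illustrative only.
from dataclasses import dataclass, field

@dataclass
class EvalCase:
    prompt: str
    required_facts: list                                   # substrings the answer must mention
    forbidden_claims: list = field(default_factory=list)   # hallucination markers

def score_answer(case: EvalCase, answer: str) -> dict:
    """Return per-case scores: completeness (facts covered) and hallucination hits."""
    text = answer.lower()
    covered = [f for f in case.required_facts if f.lower() in text]
    hallucinated = [c for c in case.forbidden_claims if c.lower() in text]
    return {
        "completeness": len(covered) / len(case.required_facts),
        "hallucinations": len(hallucinated),
        "passed": len(covered) == len(case.required_facts) and not hallucinated,
    }

case = EvalCase(
    prompt="When was Python 3 first released?",
    required_facts=["2008"],
    forbidden_claims=["2010"],
)
print(score_answer(case, "Python 3.0 was released in December 2008."))
```

A real suite would replace the substring checks with semantic matching or an LLM judge, but the shape (dataset of cases, rubric per case, aggregate scores) is the same.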
3–5 years of QA experience, ≥2 years in AI/ML testing (LLMs, RAG, AI agents)
Strong QA methodologies, test design, defect lifecycle management
In-depth knowledge of LLM evaluation, hallucination types, prompt behavior, quality metrics
RAG concepts: vector databases, embeddings, retrieval relevance
Coding: Python or Java (both preferred)
Web/API automation: Playwright, Selenium, or REST Assured
Automation scripting, test framework maintenance, CI/CD integration
Analytical reasoning, problem-solving, and communication
Experience with AI QA practices, automation tooling, and evaluation datasets
Exposure to pattern-based/adversarial testing at scale
Familiarity
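The RAG retrieval-relevance concept in the requirements above can be illustrated with a toy ranking check: rank stored documents by cosine similarity of their embeddings to a query embedding. The vectors here are hand-made stand-ins, not real model embeddings.

```python
# Toy sketch of retrieval-relevance testing for RAG: rank "documents" by
# cosine similarity of their embedding vectors to a query vector.
# Vectors are illustrative 2-d stand-ins for real embedding-model output.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_documents(query_vec, docs):
    """docs: list of (doc_id, vector); returns ids sorted by relevance."""
    scored = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in scored]

docs = [("faq", [0.9, 0.1]), ("changelog", [0.1, 0.9]), ("guide", [0.7, 0.3])]
print(rank_documents([1.0, 0.0], docs))  # "faq" ranks first for this query
```

A retrieval-relevance test would assert that, for a labeled query set, the expected document appears in the top-k of the ranking produced by the real vector store.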
Job Responsibilities
Test and validate LLM outputs, ensuring accuracy, correctness, completeness, consistency, usability, and hallucination analysis.
Evaluate RAG systems, including retrieval accuracy, document relevance, context construction, and full response-generation flows.
Test AI agents and autonomous workflows, validating decision-making, task execution, and error handling.
Design and execute AI-specific test strategies: dataset creation, edge-case testing, adversarial testing, pattern-based testing, and regression validation.
Develop evaluation frameworks, scoring rubrics, and benchmarking models for AI quality assessment.
Analyze large volumes of AI-generated responses to identify patterns, root causes, and issue clusters rather than isolated defects.
Validate fixes using new examples from the same pattern category, ensuring true model improvement.
Collaborate closely with AI/ML engineers, QA teams, and product managers to improve AI accuracy and performance.
Contribute to continuous improvement of AI QA practices, automation, tools, and evaluation datasets.

Job Requirement
Knowledge of testing conversational AI, workflows, or agent-based systems.
Exposure to vector search tools and embedding quality analysis.
B. Tech or equivalent degree in Computer Science (or related field).
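The pattern-based regression idea in the responsibilities (validating fixes with new examples from the same pattern category) can be sketched as a per-category verdict: a fix counts only when every fresh example in its category passes. Category names and the result shape here are made up for illustration.

```python
# Hedged sketch of pattern-category regression validation: a fix for one
# failing example is validated only when fresh examples from the same
# pattern category also pass. Categories below are hypothetical.
from collections import defaultdict

def validate_by_category(results):
    """results: list of (category, passed_bool); returns per-category verdicts."""
    by_cat = defaultdict(list)
    for category, passed in results:
        by_cat[category].append(passed)
    return {cat: all(outcomes) for cat, outcomes in by_cat.items()}

results = [
    ("date-math", True), ("date-math", True),               # fix holds on new examples
    ("unit-conversion", True), ("unit-conversion", False),  # regression remains
]
print(validate_by_category(results))  # {'date-math': True, 'unit-conversion': False}
```

Grouping outcomes by category is what turns isolated pass/fail records into the systemic-issue view the role calls for.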
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Software Development