bebo Technologies • Bhubaneshwar, Odisha, India
Role & seniority: QA Engineer (3–5 years total experience; at least 2 years in AI/ML testing)
Stack/tools: Python or Java (both preferred); Web/API automation (Playwright, Selenium, REST Assured); CI/CD integration; QA methodologies; LLM evaluation and RAG concepts; vector databases, embeddings
Test and validate LLM outputs for accuracy, completeness, consistency, and usability, including hallucination analysis
Evaluate RAG systems (retrieval accuracy, document relevance, context construction, full response flows) and test AI agents/autonomous workflows
Design AI-specific test strategies, develop evaluation frameworks and benchmarks, analyze large AI-generated outputs to identify patterns and root causes
3–5 years QA experience with 2+ years in AI/ML testing (LLMs, RAG, AI agents)
Proficiency in AI/ML concepts; LLM evaluation, hallucination types, prompt behavior, response scoring, quality metrics
Strong QA methodologies, test design, functional/non-functional testing, defect life cycle
Knowledge of RAG concepts, vector databases, embeddings, retrieval relevance
Coding in Python or Java; automation scripting; test framework maintenance; CI/CD integration
Ability to analyze large AI output datasets; strong analytical reasoning and communication
Experience testing conversational AI, agent-based systems, or autonomous workflows
Exposure to vector search tools and embedding quality analysis
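The hallucination analysis mentioned above often starts with a groundedness check: does the content of the model's answer actually appear in the retrieved context? A minimal sketch in Python, where token overlap stands in for real claim verification and all strings are hypothetical:

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def grounded_ratio(answer: str, context: str) -> float:
    """Fraction of answer tokens also present in the retrieved context.

    A crude proxy for groundedness: a low ratio flags answers that may
    contain unsupported (hallucinated) content and deserve closer review.
    """
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokens(context)) / len(answer_tokens)

# Hypothetical example: the second answer invents facts not in the context.
context = "The invoice API returns a JSON list of line items with totals."
supported = "The API returns a JSON list of line items."
unsupported = "The API emails a PDF summary every Friday."
assert grounded_ratio(supported, context) > grounded_ratio(unsupported, context)
```

Production evaluation frameworks replace the token overlap with semantic similarity or LLM-as-judge scoring, but the pattern of comparing answer content against retrieved evidence is the same.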
Description
BE, B.Tech, M.Tech, MCA, or equivalent degree in Computer Science or a related field.
3–5 years of QA experience, with at least 2 years focused on AI/ML testing (LLMs, RAG, AI agents).
Proficiency in Artificial Intelligence, Machine Learning, AI Agents, and Large Language Models (LLMs).
Strong knowledge of QA methodologies, test design, functional/non-functional testing, and defect lifecycle management.
Solid understanding of LLM evaluation, hallucination types, prompt behavior, response scoring, and quality metrics.
Good understanding of RAG concepts, including vector databases, embeddings, and retrieval relevance.
Coding proficiency in Python or Java (both preferred).
Hands-on experience in Web/API automation using frameworks such as Playwright, Selenium, or REST Assured.
Experience writing automation scripts, maintaining test frameworks, and integrating test suites into CI/CD pipelines.
Ability to analyze large sets of AI outputs for patterns and systemic issues.
Excellent analytical reasoning, problem-solving, and communication skills.
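The "response scoring and quality metrics" requirement can be pictured as a rubric that scores each response on several criteria and aggregates them into one quality score. A minimal sketch, where the criteria names and weights are hypothetical illustrations, not a prescribed rubric:

```python
# Hypothetical rubric: criterion -> weight (weights sum to 1.0).
RUBRIC = {
    "accuracy": 0.4,
    "completeness": 0.3,
    "consistency": 0.2,
    "usability": 0.1,
}

def quality_score(scores: dict[str, float]) -> float:
    """Weighted average of per-criterion scores, each on a 0-1 scale."""
    missing = RUBRIC.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(RUBRIC[c] * scores[c] for c in RUBRIC)

# A response that is accurate but incomplete scores below a fully good one.
partial = quality_score({"accuracy": 1.0, "completeness": 0.5,
                         "consistency": 1.0, "usability": 1.0})
full = quality_score({"accuracy": 1.0, "completeness": 1.0,
                      "consistency": 1.0, "usability": 1.0})
assert partial < full
```

Fixed weights make scores comparable across regression runs, which is what lets pattern analysis over large output sets (required below) detect systematic quality drift rather than one-off defects.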
Job Responsibilities
Test and validate LLM outputs, ensuring accuracy, correctness, completeness, consistency, and usability, including hallucination analysis.
Evaluate RAG systems, including retrieval accuracy, document relevance, context construction, and full response generation flows.
Test AI agents and autonomous workflows, validating decision-making, task execution, and error handling.
Design and execute AI-specific test strategies: dataset creation, edge-case testing, adversarial testing, pattern-based testing, and regression validation.
Develop evaluation frameworks, scoring rubrics, and benchmarking models for AI quality assessment.
Analyze large volumes of AI-generated responses to identify patterns, root causes, and issue clusters rather than isolated defects.
Knowledge of testing conversational AI, workflows, or agent-based systems.
Exposure to vector search tools and embedding quality analysis.
Validate fixes using new examples from the same pattern category to confirm genuine model improvement.
Collaborate closely with AI/ML engineers, QA teams, and product managers to improve AI accuracy and performance.
Contribute to continuous improvement of AI QA practices, automation, tools, and evaluation datasets.
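The retrieval-accuracy side of the RAG evaluation responsibilities above is commonly measured with metrics such as recall@k: of the documents known to be relevant for a query, how many appear in the top-k retrieved results. A minimal sketch, with hypothetical document IDs standing in for a real retriever's output:

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of relevant documents found in the top-k retrieved results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

# Hypothetical retriever output for one query in an evaluation set:
# two of the three relevant documents surface in the top four results.
retrieved = ["doc7", "doc2", "doc9", "doc4"]
relevant = {"doc2", "doc4", "doc5"}
assert recall_at_k(retrieved, relevant, k=4) == 2 / 3
```

Averaging this over a labeled query set gives a regression metric for the retrieval stage, so retrieval failures can be separated from generation failures when a full RAG response is wrong.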
(ref: hirist.tech)