
BharatGen • Pune, Maharashtra, India
Role & seniority: Senior AI Evaluation & Test Engineer
Stack/tools: Python; scripting; test automation (Pytest, Selenium, Robot Framework); observability/tracing (logs, spans, session tracking); AI concepts (RAG, prompt engineering, explainability, guard rails); CI/CD release gates; familiarity with AI evaluation frameworks (e.g., Arize, Braintrust, DeepEval, LangSmith, Ragas) is a plus
Build and maintain AI evaluation pipelines to test, measure, and evaluate AI system behavior and performance
Define AI quality metrics/KPIs (factuality, faithfulness, toxicity, grounding precision/recall, latency, cost) with clear acceptance bars; implement release gates in CI/CD (see the Pytest sketch after this list)
Implement automated evaluation/testing (end-to-end and regression) and assist with root-cause analysis; collaborate cross-functionally to shape user-facing AI behavior
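A minimal sketch of what such a release gate could look like, assuming a Pytest-based pipeline: the metric names follow the posting, while `score_response`, the golden prompts, and the acceptance bars are illustrative placeholders rather than the team's actual setup.

```python
# Hypothetical CI/CD release gate expressed as a Pytest suite.
# A failed assertion fails the CI job and blocks the release.
from dataclasses import dataclass

import pytest


@dataclass
class EvalResult:
    faithfulness: float         # 0..1, higher is better
    grounding_precision: float  # 0..1, higher is better
    latency_ms: float           # end-to-end latency in milliseconds


def score_response(prompt: str) -> EvalResult:
    """Hypothetical scorer: in practice this would call the AI system and an
    evaluation backend, then aggregate the metrics for this prompt."""
    return EvalResult(faithfulness=0.92, grounding_precision=0.88, latency_ms=850.0)


# Acceptance bars (illustrative values, not real thresholds).
BARS = {"faithfulness": 0.85, "grounding_precision": 0.80, "latency_ms": 2000.0}

GOLDEN_PROMPTS = [
    "What does the warranty cover?",
    "Summarize the attached incident report.",
]


@pytest.mark.parametrize("prompt", GOLDEN_PROMPTS)
def test_release_gate(prompt):
    result = score_response(prompt)
    assert result.faithfulness >= BARS["faithfulness"]
    assert result.grounding_precision >= BARS["grounding_precision"]
    assert result.latency_ms <= BARS["latency_ms"]
```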
BS/MS in CS/CE/IT/EE or related field; 5+ years in software testing with at least 2 years evaluating AI/ML products
Strong testing fundamentals: test plans, test cases, reports/dashboards; analytical debugging; attention to detail
Proficiency in Python and automation frameworks (Pytest, Selenium, Robot Framework)
Working knowledge of generative AI models and related concepts; understanding of the differences between traditional software testing and AI evaluation
Team player, good communication, able to work in fast-paced/startup environments
Strong software testing fundamentals and expertise in writing test plans, executing test cases, and generating detailed reports and dashboards.
Strong analytical and debugging skills, and attention to detail.
Proficiency in Python, scripting, and software testing automation frameworks and tools such as Pytest, Selenium, Robot Framework, etc.
Working knowledge of generative AI models, AI agents, and related concepts such as retrieval augmented generation (RAG), prompt engineering, context engineering, explainability, traceability, observability, guard rails, reasoning, specificity, etc.
Sound understanding of the fundamental differences between testing conventional software and evaluating generative AI systems (illustrated in the sketch after this list).
Team player with excellent interpersonal skills and the ability to collaborate effectively with remote and cross-functional team members.
Go-getter attitude and ability to flourish in a fast-paced, startup environment.
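To make that distinction concrete, here is a minimal, hypothetical sketch: a conventional test asserts exact equality, while a generative-AI check compares output to a reference with a tolerance threshold (the word-overlap similarity is only a stand-in for an embedding- or judge-based metric).

```python
def similarity(a: str, b: str) -> float:
    """Hypothetical semantic-similarity proxy (word-overlap Jaccard);
    a real pipeline would use embeddings or an LLM judge instead."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def test_conventional_software():
    # Deterministic code: an exact-match assertion is appropriate.
    assert sorted([3, 1, 2]) == [1, 2, 3]


def test_generative_ai_output():
    # Non-deterministic model output: compare against a reference answer
    # with a tolerance threshold instead of exact string equality.
    reference = "The warranty covers manufacturing defects for two years."
    candidate = "Manufacturing defects are covered by the warranty for two years."
    assert similarity(candidate, reference) >= 0.4
```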
Experience in any of the following would be a big plus:
AI evaluation frameworks such as Arize, Braintrust, DeepEval, LangSmith, Ragas
AI safety and red teaming experience, e.g., prompt injection, jailbreak, adversarial and stress testing.
Different types of AI evaluation methods, e.g., Human-in-the-loop, LLM-as-a-Judge (see the sketch below).
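As a rough illustration of the LLM-as-a-Judge pattern, the sketch below scores an answer for faithfulness against a context using a rubric prompt; `call_judge_model` is a hypothetical stand-in for a real LLM API call, and the JSON verdict schema is an assumption, not any specific framework's interface.

```python
# Minimal LLM-as-a-Judge sketch: prompt a judge model with a rubric and
# parse a structured verdict.
import json

RUBRIC = (
    "Score the ANSWER for faithfulness to the CONTEXT on a 1-5 scale. "
    'Respond with JSON: {"score": <int>, "reason": "<short justification>"}'
)


def call_judge_model(prompt: str) -> str:
    """Hypothetical judge call; replace with a real LLM client."""
    return json.dumps({"score": 5, "reason": "Answer is fully supported by the context."})


def judge_faithfulness(context: str, answer: str) -> dict:
    prompt = f"{RUBRIC}\n\nCONTEXT:\n{context}\n\nANSWER:\n{answer}"
    return json.loads(call_judge_model(prompt))


if __name__ == "__main__":
    verdict = judge_faithfulness(
        context="The warranty covers manufacturing defects for two years.",
        answer="Defects from manufacturing are covered for two years.",
    )
    print(verdict)  # e.g. {"score": 5, "reason": "..."}
```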