Role & seniority: AI Test Engineer with mid–senior level; 5+ years software testing, at least 1 year focused on genAI

Stack/tools: Python, pytest, Allure; AI testing tools/frameworks; GitHub Actions, GitLab CI/CD; AWS; Grafana; Jira/GitLab/GitHub; familiarity with LLMs, RAG systems; prompt-security concepts

Top 3 responsibilities

Design and implement Generative AI/LLM testing strategy, including model response evaluation and RAG pipeline accuracy
Develop and maintain QA frameworks and automated testing for AI systems (integration, performance, AI-specific tests)
Identify edge cases, bias/fairness issues, security risks, and conduct comprehensive test documentation and risk-based testing

Must-have skills

5+ years software testing; 1+ year genAI focus
Proficient in Python; strong testing framework experience (pytest, Allure)
Experience with AI testing tools, LLM/RAG knowledge, test automation, CI/CD pipelines
Cloud testing (AWS), monitoring/observability (Grafana), and ALM tools (Jira, GitHub, GitLab)
Security testing basics for AI applications; strong requirements analysis

Nice-to-haves

Prior experience with Roche testing frameworks or similar AI QA frameworks
Deep understanding of AI failure modes, prompt injection risks, data leakage mitigation
Location & work type: Location not specified; work type not stated; global company with diverse, international team

Full Description

Billennium is a global technology company with over 20 years of experience, committed to innovation and empowering businesses. As an employer, we offer a supportive, growth-focused environment where collaboration and creativity thrive. Join us to shape the future of technology together!

Job Description

We are seeking a highly skilled AI Test Engineer specializing in testing Generative (and non-Generative) AI applications to join our team. The ideal candidate will have expertise in both traditional software testing and specialized testing methodologies for Large Language Models (LLMs)-based and AI-based systems. This role requires deep understanding of LLM behaviors, and testing frameworks to ensure the reliability, fairness, and effectiveness of our AI-powered solutions.

Key Responsibilities

Generative AI Testing Strategy: Design and implement comprehensive testing strategies specifically tailored for LLM-based applications, including evaluation of model responses, RAG pipelines accuracy, and overall system reliability

Quality Assurance Framework Development: Utilize Roche's testing frameworks that address both traditional software quality aspects and AI-specific concerns such as output consistency, contextual accuracy, and ethical compliance; co-create and maintain such frameworks when required

Test Automation Development: Design and implement automated testing solutions for continuous evaluation of LLM applications, including integration tests, performance tests, and specialized AI behavior tests

Edge Case Analysis: Identify and develop test scenarios for edge cases in LLM behavior, including handling of ambiguous inputs, potential biases, and unexpected response patterns

Bias and Fairness Testing: Design and execute tests to identify potential biases in model outputs and ensure fair treatment across different user groups and use cases

Security Testing: Collaborate within development teams to test for potential vulnerabilities specific to LLM applications, including prompt injection, data leakage, and other AI-specific security concerns

Test Documentation: Create and maintain comprehensive test documentation testing strategy, test cases, and testing guidelines specific to AI applications and compliant with Roche practices; document the analysis of requirements and risks and develop tests accordingly

Performance Testing: Collaborate with other engineers to conduct thorough performance testing of GenAI applications, including response time analysis, load testing, and resource utilization monitoring

AI Test Engineer