Role & seniority: AI SDET (Senior/lead-quality engineering focus), 7+ years QE/SDET in cloud-native SaaS; 2+ years hands-on AI/ML/LLM experience.

Stack/tools: cloud-native SaaS; CI/CD (Jenkins, GitHub Actions); JS/TS (Python/Java basics); AI evaluation frameworks (Ragas, DeepEval, LangChain/LangSmith/LangFuse); test automation; observability (New Relic); performance/load tools (K6, JMeter) optional.

Top 3 responsibilities

Define and scale an AI-first quality strategy; integrate AI enhancements into CI/CD; establish scalable testing for hyper-growth and AI data pipelines.
Design deterministic/statistical tests for non-deterministic LLM/agentic systems; build automated evaluation pipelines for correctness, faithfulness, retrieval, and multi-agent flows; implement red-teaming, bias/fairness checks, and guardrails.
Lead end-to-end testing across prompts, datasets, embeddings, model versions, RAG pipelines; partner with product/data science/AI eng; drive metrics, dashboards, and continuous improvement; mentor others.
Must-have skills: strong automated testing infra, CI/CD, full-stack testing (frontend/backend/API); proven LLM/AI agent/RAG testing experience; proficiency in JS/TS (and Python/Java); experience with AI evaluation tools; excellent communication; BS/MS in CS or related.
Nice-to-haves: performance/stress testing experience; observability familiarity (New Relic); red-teaming and ethical AI practices; certifications (ISTQB AI Testing); open-source c

Full Description

Caseware is one of Canada's original Fintech companies, having led the global audit and accounting software industry for over 30 years, with more than 500,000 users across 130 countries and availability in 16 languages. While you might not have heard of us (yet), over 36,000 accounting and audit professionals list Caseware as a skill on their LinkedIn profiles!

We are at the forefront of AI adoption in our cloud-native SaaS platform, building intelligent, agentic features that transform how users interact with our product. As an AI SDET, you'll pioneer and scale AI-driven testing practices from the ground up, fast-tracking reliable, safe, and high-performing AI capabilities across the organization. You will help reduce deployment risks, minimize hallucinations and drift, ensure ethical AI, and drive faster releases (targeting 20-40% velocity gains through automated validations). This is a high-impact, foundational role in Platform Engineering's Quality function, where your work will directly influence product trust, compliance, and innovation for our end users.

📍 Location: This is a fully remote position located in Colombia.

You will be reporting to

Jai Joshi

Contact

Maira Russo - Senior Talent Acquisition Partner

What You’ll Be Doing

Quality & AI-First Mindset

Evolve a modern, AI-first quality strategy for our fast-scaling SaaS architecture, including foundational infrastructure and emerging agentic/intelligent systems.
Integrate AI enhancements into CI/CD pipelines (e.g., predictive flakiness detection, automated test generation, self-healing scripts) to improve isolation, data setup, and execution reliability, using existing tools or suggesting new ones.
Establish scalable testing practices that support hyper-growth and petabyte-scale AI data pipelines.

AI-Focused Test Strategy, Automation & Evaluation

Design deterministic and statistical testing approaches for non-deterministic LLM-based and agentic systems, addressing hallucinations, prompt injection, bias, drift, and safety risks.
Build automated evaluation pipelines and harnesses for correctness, faithfulness, retrieval quality, generation accuracy, tool-calling, planning sequences, and multi-agent flows.
Develop and execute test frameworks for the full AI lifecycle: prompts, datasets, embeddings, model versions, RAG pipelines (end-to-end validation), and guardrails.
Implement red-teaming, bias/fairness checks, and compliance mechanisms; leverage current frameworks for metrics and observability.
Integrate AI-specific quality signals into CI/CD for automated gating and continuous monitoring.

Cross-Functional & End-to-End Testing

Partner closely with product, data science, AI engineering, and dev teams to test AI features, conduct multi-agent simulations, and ensure high-quality roadmap delivery.
Facilitate knowledge sharing and upskilling on AI testing best practices across the Quality Function.

Metrics, Observability & Continuous Improvement

Drive core metrics (DORA, test coverage/effectiveness) plus AI-specific indicators (e.g., hallucination rate, context precision, drift detection).
Build real-time dashboards and support A/B testing of models with post-deployment monitoring.

Culture, Mentorship & Innovation

Champion a quality-first, ethical AI mindset organization-wide.
Mentor SDETs, lead workshops on AI risks/validation, and influence design/deploy/incident processes.
As a foundational hire, define roadmaps and best practices for sustainable AI quality assurance.

Challenges You'll Tackle

Ensuring reliability in agentic systems amid data drift and non-deterministic behavior.
Scaling tests for global SaaS while maintaining low hallucination rates and strong safety guardrails.
Building evaluation from scratch in a rapidly evolving landscape (e.g., multi-modal, agentic flows).

Success in the First 6 Months

Launch foundational AI test frameworks and pipelines, achieving 80-90% coverage for key AI components.
Reduce AI-related defect escapes by 30-40% and integrate automated safety/compliance checks into all releases.
Establish metrics dashboards and evaluation loops that enable data-driven iteration on intelligent features.

What You Will Bring

7+ years in Quality Engineering/SDET roles within cloud-native SaaS environments, including 2+ years hands-on with AI/ML/LLM systems.
Expertise in automated testing infrastructure, CI/CD (Jenkins/GitHub Actions), and test pyramid strategies (unit → E2E).
Strong full-stack testing experience (frontend/backend/API) and collaboration with dev teams.
Proven experience testing LLMs, AI agents, RAG pipelines, and related risks (hallucinations, prompt injection, bias, drift).
Proficiency in JS/TS and working knowledge of Python or Java; experience with AI evaluation frameworks (e.g., Ragas, DeepEval, LangChain/LangSmith/LangFuse) and other tools you may be proficient in.
Knowledge of performance, stress, and load testing tools such as K6, JMeter, or BlazeMeter is nice to have.
Knowledge of observability (New Relic), statistical testing methods, red-teaming, and ethical AI practices.
Excellent communication and coaching skills; ability to thrive in ambiguity and drive innovation.
Bachelor's/Master's in Computer Science, AI, or related; certifications (e.g., ISTQB AI Testing) a plus.
Strong English language communication and collaboration skills.

We value adaptability in this fast-moving field—equivalent experience and a strong portfolio (e.g., open-source contributions, case studies) are highly regarded.

What's in it for you

▪️Innovation is at our core. We work with cutting-edge technology in accounting and financial reporting, constantly pushing the boundaries to create impactful software solutions.
▪️We are committed to a collaborative culture, where your ideas are valued, and knowledge sharing is encouraged within a supportive, inclusive team.
▪️Work-life balance is important to us. We offer flexible work options, remote opportunities, and generous time-off policies to ensure a healthy work-life balance.
▪️We offer competitive compensation, including a competitive salary and comprehensive benefits such as health insurance and retirement plans.
▪️We are driven by impactful work. Your contributions directly affect how our clients manage financial processes and drive their success.
▪️Recognition and rewards matter to us. We celebrate hard work through recognition programs, performance bonuses, and opportunities for career growth.
▪️We embrace global opportunities. Work on international projects and collaborate with a diverse, global team.

About Caseware

Caseware's cutting-edge software products are meticulously designed for accounting firms, corporations, and governments. Our teams are continually collaborating, innovating, and building upon our existing suite of products. With a customer-focused mindset, we are building technology that is shaping what the future of audits, financial reporting, and financial data analytics will look like.

With a recent strategic investment from Hg Capital in 2020, Caseware is now in its next major growth phase as we double down on the people and products that have made Caseware so successful to date.

One of Caseware's core values is Many Voices, One Team and with that in mind, we're dedicated to building teams as diverse as our customers in an equitable and inclusive way. We welcome and encourage candidates of all backgrounds to apply. Should you require accommodations or have any questions at any point during the application or interview process, please e-mail our People Operations team at talent@caseware.com.

Background Check

Any candidate who receives an offer for a position will need to successfully complete a background check through Certn.co, which typically includes an Identity Verification and Criminal Record Check. Executives and Senior Managers will undergo a Soft Credit Check as well. Candidates residing in the Netherlands and Germany are excluded from undergoing background checks via Certn.co.

Security and Fraud

Caseware takes the security of candidates seriously. All legitimate communication from us will come from email addresses ending in @caseware.com and our open positions are always listed on reputable job boards and on our website https://jobs.lever.co/caseware. We will NEVER ask for payment or financial information from you. If you receive an unsolicited job offer, proceed with extreme caution.

Senior AI Software Developer in Test