Role & seniority: AI Test Architect (senior level) at Caseware; fully remote, Colombia-based; reports to Jai Joshi.

Stack/tools

Cloud/infra: AWS (serverless, microservices), IaC (Terraform/CloudFormation)
CI/CD & automation: GitHub CI/CD; Playwright/Cypress; AI-generated tests; self-healing automation
AI/LLM tooling & evaluation: LangChain/LangGraph, LangSmith, LangFuse, DeepEval, RAGAS, Arize Phoenix
Testing & evaluation: LLM evaluation tools, red-teaming concepts; tool-calling and multi-agent workflows
Data & governance: synthetic data generation, data masking, governance for ethical AI testing

Top 3 responsibilities

Design and implement the Quality Intelligence platform using generative AI for defect prediction, test generation, self-healing automation, and SDLC integration.
Develop LLM/agent evaluation frameworks with benchmarks, red-teaming, adversarial testing, and observability; establish metrics (faithfulness, safety, bias) and governance.
Architect AI-enabled testing in CI/CD, build self-healing test frameworks, secure data/privacy, and drive cross-functional adoption of AI quality practices.

Must-have skills

8+ years in Quality Engineering/Test Architecture for cloud-native SaaS; 2+ years in AI/ML/LLM testing
AWS (serverless/microservices) and Terraform/CloudFormation; GitHub CI/CD
Proficiency in JavaScript/TypeScript and/or Python
Experience designing/testing LLM-based apps and frameworks

Full Description

Caseware is one of Canada's original Fintech companies, having led the global audit and accounting software industry for over 30 years, with more than 500,000 users across 130 countries and software available in 16 languages. While you might not have heard of us (yet), over 36,000 accounting and audit professionals list Caseware as a skill on their LinkedIn profiles!

Why This Role Matters

As a leader in cloud-native SaaS, we are accelerating our shift to an AI-first future, embedding generative AI and autonomous agents across our platform to deliver smarter, faster user experiences. We are looking for a visionary AI Test Architect to build the next-generation "Quality Intelligence" platform: one that leverages generative AI for automated test creation, self-healing execution, predictive defect analytics, and rigorous validation of the AI features we build in-house for our global audience.

As our foundational AI Test Architect, you'll design scalable, ethical frameworks that ensure reliability, safety, and compliance while accelerating release velocity (targeting 30-50% faster cycles through AI-augmented testing). Your work will reduce risk in production AI agents, minimize hallucinations, bias, and security exposures, and empower the entire engineering organization to adopt AI-augmented quality practices that supplement our existing, mature traditional frameworks. This high-impact role sits at the intersection of Platform Engineering, AI, and Quality, shaping how we build trustworthy intelligence at scale.

📍 Location: This is a fully remote position located in Colombia.

You will be reporting to

Jai Joshi

Contact

Maira Russo - Senior Talent Acquisition Partner

What You’ll Be Doing

AI-Driven Quality Strategy & Architecture

Architect a comprehensive "Quality Intelligence" platform using generative AI to predict defect hotspots, intelligently optimize regression suites, auto-generate tests, and enable self-healing automation.
Define enterprise-wide AI-first testing strategy, including non-deterministic evaluation paradigms, continuous monitoring for drift/hallucination, and integration across the full SDLC.
Establish governance for ethical AI testing, aligning with emerging standards.

LLM & Agent Evaluation Frameworks

Design and implement advanced benchmarks, red teaming protocols, and adversarial testing for internal AI agents and generative features—focusing on hallucination rates, bias/fairness, prompt injection, jailbreaks, and goal misalignment.
Build evaluation pipelines with statistical rigor (e.g., multi-trial runs, LLM-as-judge, human-in-the-loop) using tools like LangFuse, LangSmith, DeepEval, RAGAS, or Arize Phoenix for metrics such as faithfulness, context precision, and safety compliance.
