NCS • Singapore, Singapore
Role & seniority: LLM / AI Quality Engineer, mid-senior (3+ years in software testing/QA)
Stack/tools: Python/TypeScript; API and performance testing; CI/CD; cloud basics (AWS/Azure/GCP); microservices; observability/evaluation tools (e.g., LangSmith, Weights & Biases, TruLens, Guardrails); data pipelines for RAG
Lead end-to-end evaluation of AI applications (LLM features, RAG, multi-agent workflows) across offline, pre-prod, and prod with test design, execution, and reporting
Design and validate non-functional aspects: performance, latency, cost, safety, security; implement CI/CD integrated tests and canary/A-B testing
Ensure quality and compliance: conduct rubric-based reviews, guardrails validation, data residency/PII controls, and produce risk-aware decision reports
3+ years in software testing/QA with API and performance testing; strong test methodology and tooling
Programming familiarity (Python/TypeScript); CI/CD and version control; cloud basics; microservices
Experience with test design for AI/ML systems and evaluation/observability tooling
ML/MLOps concepts, production model validation and monitoring; AI security testing
Experience with Azure OpenAI/Bedrock/Vertex; token accounting; RAG evaluation tools (e.g., LangSmith, Weights & Biases, Promptfoo)
Automation frameworks (Playwright/Cypress/Selenium); API tools; k6/JMeter
Location & work type: Asia Pacific region
NCS is a leading technology services firm that operates across the Asia Pacific region in over 20 cities, providing consulting, digital services, technology solutions, and more. We believe in harnessing the power of technology to achieve extraordinary things, creating lasting value and impact for our communities, partners, and people. Our diverse workforce of 14,000 has delivered large-scale, mission-critical, and multi-platform projects for governments and enterprises in Singapore and the APAC region.
Job Description
As an LLM / AI Quality Engineer, you will lead the end-to-end evaluation of AI applications—LLM features, RAG systems, and multi-agent workflows—to ensure they meet business outcomes, safety requirements, and platform standards. You will own test design, execution, and reporting across offline, pre-prod, and in-prod stages, integrating with CI/CD and working closely with product, data, and platform teams.
Define evaluation strategies (golden sets, adversarial suites, regressions), pass/fail gates, and SLOs for quality, safety, latency, and cost. Establish rubric-based human reviews (usefulness, faithfulness, safety, clarity) and calibrate annotators. Instrument LLM-as-judge where appropriate with calibration and spot checks.
Measure retrieval precision/recall, MRR/nDCG, and answer faithfulness to sources; detect hallucination and citation errors. Test chunking, prompt templates, filters, and policy chains; monitor stale/poisoned content.
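Retrieval metrics like MRR and nDCG are simple to compute once per-query relevance judgments exist. The sketch below is illustrative only (the helper names `mrr` and `ndcg_at_k` are assumptions, not part of any NCS toolchain) and assumes relevance labels listed in ranked order:

```python
import math

def mrr(ranked_relevance: list[list[int]]) -> float:
    """Mean Reciprocal Rank over queries; each inner list marks
    relevant (1) / not relevant (0) at each rank position."""
    total = 0.0
    for rels in ranked_relevance:
        for rank, rel in enumerate(rels, start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)

def ndcg_at_k(rels: list[int], k: int) -> float:
    """nDCG@k for one query's graded relevance labels in ranked order."""
    dcg = sum(r / math.log2(i + 1) for i, r in enumerate(rels[:k], start=1))
    ideal = sorted(rels, reverse=True)
    idcg = sum(r / math.log2(i + 1) for i, r in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg else 0.0
```

In practice these numbers would be tracked per golden-set query and gated in CI rather than computed ad hoc.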
Validate multi-step plans, tool selection, error recovery, retries, and idempotency for functions with side effects. Contract-test JSON schemas and structured outputs across services.
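A contract test for structured outputs can be as small as a key-and-type check. The sketch below uses only the standard library, with a hypothetical `CONTRACT` for a tool-call payload; a production suite would more likely use `jsonschema` or Pydantic models shared across services:

```python
import json

# Hypothetical contract for a tool-call payload the model must emit.
CONTRACT = {"tool": str, "arguments": dict, "confidence": float}

def violations(raw: str, contract: dict) -> list[str]:
    """Return contract violations for one model output (empty list = pass)."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    errs = [f"missing key: {k}" for k in contract if k not in obj]
    errs += [
        f"wrong type for {k}: expected {t.__name__}, got {type(obj[k]).__name__}"
        for k, t in contract.items()
        if k in obj and not isinstance(obj[k], t)
    ]
    return errs
```

Running this against every structured response in a regression suite catches schema drift before a downstream service does.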
Run token-aware load/soak tests (context length, temperature, batching); track p50/p95/p99, throughput, timeouts, cache hit rate, and cost per successful task. Recommend optimizations (prompt/policy changes, retrieval tweaks, caching).
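Latency percentiles and cost-per-success fall out directly from raw run data. This sketch uses nearest-rank percentiles; the `summarize_run` helper and its field names are illustrative assumptions, not a prescribed reporting format:

```python
import math

def summarize_run(latencies_ms: list[float], costs_usd: list[float],
                  succeeded: list[bool]) -> dict:
    """Summarize one load-test run: latency percentiles and cost per successful task."""
    def pct(p: float) -> float:
        # Nearest-rank percentile over the sorted samples.
        ordered = sorted(latencies_ms)
        return ordered[max(0, math.ceil(p / 100 * len(ordered)) - 1)]
    successes = sum(succeeded)
    return {
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
        "cost_per_success_usd": sum(costs_usd) / successes if successes else float("inf"),
    }
```

Dividing total spend by successful tasks only (rather than all requests) is what makes retries and failures show up in the cost metric.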
Red-team for prompt injection, data exfiltration, indirect injections via retrieved content; validate guardrails pre/post inference. Enforce PII controls, data-residency, and compliance checks; align with organizational security testing practices.
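A pre-inference screen for known injection phrasings is one small piece of this kind of red-teaming. The patterns below are illustrative only; a real suite would pair curated adversarial corpora with model-based classifiers rather than rely on regexes:

```python
import re

# Illustrative-only patterns for common injection phrasings; not exhaustive.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"reveal (your )?(system prompt|hidden instructions)",
    r"disregard the above",
]

def flags_injection(text: str) -> bool:
    """Crude pre-inference screen for known prompt-injection phrasings."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)
```

The same check applied to retrieved documents (not just user input) is what catches the indirect-injection case mentioned above.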
Implement prompt/dataset/version lineage and trace-based evals; automate in CI (pre-merge golden tests, nightly adversarials) with canary/A-B in prod and rollback criteria. Produce clear, decision-ready reports with risk assessments and release recommendations.
Analyze requirements, enhance test plans with additional cases, prepare environments (including cloud), execute tests per plan, and drive defect resolution. Provide regular status updates; manage test activities to schedule; support SIT/UAT and production readiness.
Execute API, performance, and load testing for microservices/web services that underpin AI features; integrate automated testing into CI/CD.
Adopt and improve test standards/methodology; share practices, train teams, participate in peer reviews, and pursue self-directed learning.
Qualifications
3+ years in software testing/QA with strong test methodology and tooling; hands-on API testing and performance testing. Programming familiarity (e.g., Python/TypeScript) and experience with CI/CD and version control. Cloud basics (AWS/Azure/GCP) and microservices fundamentals. Degree/Diploma in CS/IT or equivalent.
Preferred (AI/ML Focus)
Understanding of ML concepts and MLOps; experience with model validation and monitoring in production. Experience with AI-specific security testing and vulnerability assessment.
Familiarity with evaluation/observability tools (any of): LangSmith, Weights & Biases, RAGAS, TruLens, Promptfoo, DeepEval, Guardrails/LlamaGuard, Presidio; plus OpenTelemetry-style LLM traces. Practical exposure to Azure OpenAI/Bedrock/Vertex and model gateways; quota & token accounting know-how.
Tooling & Automation
Modern automation frameworks (e.g., Playwright, Cypress, Selenium), API test tools (Postman/REST Assured), performance tools (k6/JMeter), and CI/CD integration. Data evaluation pipelines for RAG (embedding validation, filtering, drift detection).
Traits
Outcome-oriented, high standards; strong communication and collaboration; customer-focused; proficient in written and spoken English.
Telco Context (Nice-to-Have)
Experience testing copilots/agents for BSS/OSS, NOC analytics, and enterprise care; ability to tie eval KPIs to CSAT, AHT, FCR, MTTR.
Additional Information
Why Join NCS
Lead high-impact Data & AI advisory programs for major enterprises and public sector clients.
Shape enterprise strategies and governance frameworks that drive real transformation.
Work with a talented, multidisciplinary team in a collaborative environment.
Competitive compensation and strong professional development support.
We are driven by our AEIOU beliefs—Adventure, Excellence, Integrity, Ownership, and Unity—and we seek individuals who embody these values in both their professional and personal lives. We are committed to our Impact: Valuing our clients, Growing our people, and Creating our future.
Together, we make the extraordinary happen.
Learn more about us at ncs.co and visit our LinkedIn career site.
Scam Alert
We are aware of fraudulent job offers and impersonations of NCS recruiters. Phishing emails from convincing-looking but fake addresses are also commonly used to make you believe they come from official NCS sources.
Please note that all official communications from NCS Group will only be sent from verified corporate email addresses. Always check that the sender’s email address ends with the genuine NCS domain, @ncs.com.sg, and beware of extra letters, symbols, or misspellings. When in doubt, verify the sender’s identity by contacting us at reachus@ncs.com.sg.