
QA Tester

Hypatos GmbH New Territories, Hong Kong, China

Onsite, full-time
Posted Feb 20, 2026

Role & seniority: Senior Automated QA Engineer (SDET) leading testing efforts for AI features and pipelines.

Stack/tools: Playwright (TypeScript/Python); CI/CD integration (GitHub Actions, GitLab CI); LLM evaluation tools (e.g., LangSmith, DeepEval); experience across frontend, backend, and AI/LLM pipelines; containerization with Docker/Kubernetes is a plus.

Top 3 responsibilities

  1. Own, build, maintain, and scale the E2E automated testing framework using Playwright.

  2. Design strategies to test non-deterministic AI outputs, multi-step AI agents, RAG pipelines, prompt drift, hallucinations, latency, and context window limits; create edge-case scenarios.

  3. Integrate tests into CI/CD, and collaborate with AI researchers, backend engineers, and product managers to define quality for AI agents.

Must-have skills

  • 5+ years in QA Automation/SDET.

  • Deep Playwright expertise; manage flaky tests, parallel execution, complex DOMs.

  • Strong coding skills in TypeScript, JavaScript, or Python.

  • AI/LLM testing experience; understanding of non-determinism, evaluation metrics, and related tooling.

  • Systems thinking across frontend, APIs, vector databases, and LLM endpoints; solid communication with technical and non-technical stakeholders.

Nice-to-haves

  • Experience in fintech or document automation.

  • Familiarity with Docker/Kubernetes and advanced CI/CD setups.

  • Experience testing API performance and LLM endpoint latency.

  • Location & work type: Not spe

Full Description

Your mission

We are looking for a Senior Automated QA Engineer to lead our testing efforts. You won't just be testing standard web interfaces; you'll be figuring out how to reliably automate testing for non-deterministic AI features, multi-step AI agents, and complex LLM pipelines. If you know Playwright inside and out and have scars from trying to test LLM hallucinations in production, we want to talk to you.

What you’ll be doing

Own the E2E framework: Build, maintain, and scale our automated testing framework using Playwright (TypeScript/Python).

Test the unpredictable: Design strategies to test non-deterministic LLM outputs, AI agents, and RAG pipelines where standard assertions don't always work.

Tackle LLM-specific challenges: Build guardrails and automated checks for prompt drift, hallucinations, latency, and context window limits.

Evaluate Agent behavior: Create scenarios to test how our AI agents handle edge cases, multi-step reasoning, and error recovery in real-world document processing workflows.

Integrate and collaborate: Wire your tests into our CI/CD pipelines to ensure we can ship quickly without breaking the core AI logic. Work closely with AI researchers, backend engineers, and product managers to define what "quality" means for an AI agent.

Your profile

What we’re looking for

Experience: 5+ years in QA Automation or as a Software Development Engineer in Test (SDET).

Playwright expertise: You have deep, hands-on experience building reliable, scalable test suites in Playwright. You know how to handle flaky tests, parallel execution, and complex DOM structures.

Coding chops: Strong programming skills in TypeScript, JavaScript, or Python.

AI/LLM testing experience: You understand how LLMs work under the hood. You know the challenges of testing them (non-determinism, evaluating accuracy vs. exact match, security/injection risks) and have used techniques or tools such as LLM-as-a-judge, LangSmith, or DeepEval to evaluate them.

Systems thinking: You can look at a complex architecture involving a frontend, backend APIs, vector databases, and LLM endpoints, and know exactly where things are likely to break.

Communication: You can clearly explain complex QA issues to both highly technical machine learning engineers and non-technical stakeholders.

Bonus points if you have

  • Experience in the financial tech or document automation space.
  • Familiarity with containerization (Docker, Kubernetes) and advanced CI/CD setups (GitHub Actions, GitLab CI).
  • Experience testing API performance and LLM endpoint latency.

Our Promise

  • We trust amazing people to do amazing things and make a long-term impact - we give you freedom and ownership of meaningful work that directly impacts the business.
  • We're building a positive organizational culture where personal and professional growth are just as important as business growth.
  • We believe different perspectives make Hypatos a better community - that is why we're committed to building a diverse and inclusive environment where you feel you belong.
  • Beyond a top market compensation package including company shares, you will enjoy a personal development budget, meal allowance, sporting activities and free beers :)

Playwright, TypeScript, Python, LLM Testing, AI Agents, RAG Pipelines, CI/CD Integration, E2E Framework, Prompt Drift Guardrails, Hallucination Testing, Latency Evaluation, Context Window Management, Edge Case Testing, API Performance Testing, Docker, Kubernetes, multi-location
