Role & seniority: Software Engineer (test), mid-level (BS + 3+ years)

Stack/tools: Python; production-grade automation; ML evaluation workflows; agent frameworks (LangGraph, AutoGen, CrewAI); CI/CD for ML; SQL/DBs; synthetic data generation

Top 3 responsibilities

Design and implement end-to-end automated evaluation pipelines for LLMs
Orchestrate multiple LLMs to generate test data, stress models, and identify failure modes
Enable safe, scalable model deployment through automated testing and evaluation processes; collaborate with ML engineers and data scientists

Must-have skills

BS degree and 3+ years relevant experience
Strong Python and production-grade automation skills
Understanding of SDLC, testing methodologies, QA concepts
Experience designing or implementing agentic/multi-step LLM workflows
Experience generating and validating synthetic data

Nice-to-haves

2+ years in test automation or QA/testing engineering
Experience with agent frameworks (LangGraph, AutoGen, CrewAI) and human-in-the-loop evaluation
CI/CD knowledge for ML/evaluation workflows
Database experience (SQL); ability to multi-task and lead tasks with varying priorities
Strong written and verbal communication for documentation

Location & work type: Not specified; the posting gives no remote or on-site designation.

Full Description

We live in a mobile and device-driven world where Deep Learning technology enables a new class of applications. We are looking for a software development engineer to design and build agentic systems for Large Language Model (LLM) evaluation and synthetic data generation. Imagine the countless possibilities powered by Artificial Intelligence! Are you passionate about enabling unique user experiences on Apple products, such as Apple Vision Pro, iPhone, iPad, Apple Watch, and the Mac? On the Video Engineering team, we are dedicated to providing hardware and software solutions and to executing Deep Learning workloads. Our success is the result of very dynamic people working in an environment that cultivates creativity, partnership, and cross-functional collaboration. These elements come together to make Apple an amazing environment for motivated people to do the greatest work of their lives!

DESCRIPTION

As a Software Engineer in Test, you will collaborate with world-class machine learning engineers and data scientists to understand the features you will support. In this role, you will create end-to-end automated evaluation pipelines that orchestrate multiple LLMs to generate test data, stress models, identify failure modes, and enable safe, scalable model deployment. This is a highly technical, hands-on role at the intersection of AI systems engineering, evaluation science, and automation.

MINIMUM QUALIFICATIONS

BS and a minimum of 3 years of relevant industry experience
Strong Python skills with experience building production-grade automation
Strong knowledge of the software development lifecycle, testing methodologies, and QA terminology and processes
Experience designing or implementing agentic or multi-step LLM workflows
Experience generating and validating synthetic data

PREFERRED QUALIFICATIONS

2+ years of experience in test automation or related areas; background in QA or test engineering
Experience with agent frameworks such as LangGraph, AutoGen, CrewAI, or similar
Experience building human-in-the-loop evaluation systems
Knowledge of CI/CD pipelines for ML or evaluation workflows
Ability to multi-task and lead tasks with varying priorities
Experience with popular database management software (e.g., SQL)
Excellent written and verbal interpersonal skills; able to describe and document work clearly

Software Development Engineer - Test

Full Description