**Role & seniority: ** AI Automation Engineer (contract/assessment-based), remote; level not explicitly stated (evaluation/training-focused engineering contributor).

**Stack/tools: ** LLM-based agent evaluation; Python/JavaScript/Go/Java; SQL; modular backend architecture; multi-agent/tool workflows. Nice-to-haves: Supabase, Gmail, other APIs; persistent state/session tracking; security testing for privacy leaks/prompt injection.

**Top 3 responsibilities: **

Evaluate autonomous AI agents by writing objective evaluation rubrics (pass/fail) and debugging agent traces.
Stress test agents for edge cases, prompt injection, and tool misuse.
Provide high-density technical feedback on modular software architecture and multi-turn system behavior to support LLM training.

Must-have skills:
- Backend engineering / AI automation / complex systems integration experience
- Building/maintaining production-grade, modular systems (separation like parsing/logic/reporting)
- Strong command of at least two major languages (Python/JS/Go/Java)
- SQL database experience
- Ability to work in live/non-mocked environments; handle multi-turn interactions
Nice-to-haves:
- Integrations with live tools/APIs (e.g., Supabase, Gmail)
- Persistent state & session tracking patterns
- Identifying privacy leaks, authority escalation, and indirect prompt injection vulnerabilit

Full Description

Job Title: AI Automation Engineer (Remote)

Location: Remote (LATAM, Puerto Rico, Argentina, Peru, Colombia, Brazil, Mexico, Chile, Bolivia, Costa Rica, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Paraguay, Trinidad and Tobago, Uruguay, Venezuela))

Work Mode: Fully Remote

Role Overview Help design and evaluate autonomous AI agents across multiple LLMs, spanning health, education, daily life, and other real-world domains (all coding work). Shape the future of agentic AI systems by providing expert human feedback to leading AI organisations. Help train Large Language Models (LLMs) for complex, multi-step architectural workflows.

Key Responsibilities AI Agent Evaluation Write evaluation rubrics with objective pass/fail criteria Debug agent traces to identify failure patterns Stress test agents against edge cases, prompt injection, and tool misuse Technical Assessment Assess production-grade modular software architecture Analyse multi-turn system interactions and behaviours Provide high-density technical feedback for LLM training Project Workflow Create an account and upload a resume/ID Complete the onboarding assessment Start earning through flexible task assignments

Qualifications Experience in backend engineering, AI automation, or complex systems integration Proven ability to build and maintain production-grade software with modular separation (e.g., distinct services for data parsing, logic processing, and reporting) Strong command of at least two major languages (e.g., Python, JavaScript, Go, or Java) and experience working with SQL databases Practical experience building for live, non-mocked environments and handling multi-turn system interactions

Preferred (Nice to Have) Experience integrating agents with live tools such as Supabase, Gmail, and other APIs Familiarity with persistent state and session-tracking patterns Experience identifying privacy leaks, authority escalation, or indirect prompt injection vulnerabilities

Compensation Hourly compensation ranges from USD $30–$50, depending on experience and task complexity Payments are issued weekly via supported payout platforms (e.g., PayPal or AirTM) Full compensation details are provided prior to task acceptance

Equal Opportunity Statement Selection decisions are based solely on skills, qualifications, and project requirements. We are committed to inclusive and fair engagement practices and consider all qualified applicants without regard to legally protected characteristics.

Apply Now!