
Quality Assurance Analyst | $11/hr Remote
Crossing Hurdles • Spain
Role & seniority: LLM – AI Quality Analyst (Personalization) | Short-Term Contract, entry–mid seniority
Stack/tools: Gemini personalization context; multi-turn prompt design; evaluation of past conversations; use of personal Google account and data sources; debugging/data-source verification; data hygiene (delete evaluation conversations)
Top 3 responsibilities
- Evaluate a personalization feature for Gemini; design and run multi-turn prompts leveraging user data
- Assess use of past conversations/activity, grounding, and personalization quality; stack-rank responses for helpfulness and naturalness
- Write clear rationales with references to specific turns; extract/verify debug information; ensure data hygiene
Must-have skills
- Experience in data annotation, AI quality evaluation, or content moderation
- Strong Spanish proficiency (reading/writing)
- Analytical thinking and attention to detail
- Experience with creative prompt engineering and personalization concepts
- Ability to provide structured, written feedback; work independently remotely
Nice-to-haves
- Comfort with using and managing personal data sources for assessment
- Familiarity with evaluating grounding, inferences, or hallucinations in model outputs
Location & work type: Remote (Global); Short-Term Contract (2 months); Part-time availability 30–40 hrs/week; $11/hour
Full Description
Position: LLM – AI Quality Analyst (Personalization) – Spanish
Type: Short-Term Contract (2 months)
Compensation: $11 per hour
Location: Remote (Global)
Commitment: Part-time availability required (30–40 hrs/week)
Role Responsibilities
- Evaluate a personalization feature for Gemini
- Design and execute multi-turn conversational prompts that require the AI to utilize personal information and experiences
- Assess how effectively the model uses past conversations and activity to generate relevant and helpful responses
- Evaluate model responses based on intent and appropriate personalization
- Analyze responses for grounding issues, including flawed inferences or hallucinations
- Assess integration quality to ensure personal data is incorporated naturally into responses
- Perform side-by-side evaluations and stack-rank model responses based on helpfulness and naturalness
- Write clear rationales referencing specific conversation turns
- Extract and verify debug information to confirm correct use of summaries and data sources
- Maintain data hygiene by deleting evaluation conversations after completion
Requirements
- Experience in data annotation, AI quality evaluation, content moderation, or related roles
- Strong Spanish proficiency (reading and writing)
- Willingness to use a primary personal Google account and enable personal data sources for assessment
- Strong analytical thinking and attention to detail
- Experience with creative prompt engineering and personalization concepts
- Ability to provide structured feedback and clear written explanations
- Ability to work independently in a remote environment
Application Process (Takes 20 Mins)
- Fill out the application form
- Complete the ICF
- Complete the assessment