
Quality Assurance Specialist | $11/hr Remote
Crossing Hurdles • United States
Role & seniority: LLM – AI Quality Analyst (Personalization) – Japanese; short-term contract (2 months), part-time (30–40 hrs/week)
Stack/tools: data annotation / AI quality evaluation / content moderation; strong Japanese reading/writing; willingness to use a primary personal Google account with personal data sources enabled; independent remote work setup
Top 3 responsibilities
- Evaluate a personalization feature for Gemini; design and execute multi-turn prompts that draw on personal information and experiences
- Assess use of past conversations/activity, grounding, and avoidance of hallucinations; perform side-by-side evaluations and rank model responses
- Write clear rationales tied to specific turns; extract and verify debug information; maintain data hygiene by deleting evaluation conversations
Must-have skills
- Strong Japanese proficiency (reading/writing)
- Experience in data annotation, AI quality evaluation, or content moderation
- High attention to detail, strong analytical thinking; ability to provide structured feedback and clear written explanations
- Experience with creative prompt engineering and personalization concepts; independent remote work capability
Nice-to-haves
- Familiarity with grounding, data sources, and debugging data provenance
- Experience with turn-level AI model evaluation; ability to organize and present findings clearly
Location & work type: Remote (Global); short-term, contract-based; part-time commitment 30–40 hrs/week
Full Description
Position: LLM – AI Quality Analyst (Personalization) – Japanese
Type: Short-Term Contract (2 months)
Compensation: $11 per hour
Location: Remote (Global)
Commitment: Part-time availability required (30–40 hrs/week)
Role Responsibilities
- Evaluate a personalization feature for Gemini
- Design and execute multi-turn conversational prompts that require the AI to utilize personal information and experiences
- Assess how effectively the model uses past conversations and activity to generate relevant and helpful responses
- Evaluate model responses based on intent and appropriate personalization
- Analyze responses for grounding issues, including flawed inferences or hallucinations
- Assess integration quality to ensure personal data is incorporated naturally into responses
- Perform side-by-side evaluations and stack-rank model responses based on helpfulness and naturalness
- Write clear rationales referencing specific conversation turns
- Extract and verify debug information to confirm correct use of summaries and data sources
- Maintain data hygiene by deleting evaluation conversations after completion
Requirements
- Experience in data annotation, AI quality evaluation, content moderation, or related roles
- Strong Japanese proficiency (reading and writing)
- Willingness to use a primary personal Google account and enable personal data sources for assessment
- Strong analytical thinking and attention to detail
- Experience with creative prompt engineering and personalization concepts
- Ability to provide structured feedback and clear written explanations
- Ability to work independently in a remote environment
- Desktop or laptop with a stable internet connection
Application Process
- Upload resume
- Interview
- Submit form