Evaluation Scenario Writer - AI Agent Testing Specialist
$32/hour
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.
At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.
What We Do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.
About The Role
We're looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold‑standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well‑scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.
Responsibilities
- Design structured test scenarios based on real‑world tasks
- Define the golden path and acceptable agent behavior
- Annotate task steps, expected outputs, and edge cases
- Work with devs to test your scenarios and improve clarity
- Review agent outputs and adapt tests accordingly
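To illustrate what "defining a golden path" can look like in practice, here is a minimal sketch in Python. It assumes a hypothetical JSON scenario format with the fields `id`, `task`, `golden_path`, `forbidden_actions` and `edge_cases`, invented for illustration and not Mindrift's actual schema. It also assumes that an agent run is logged as a flat list of action names. A run passes if every golden-path step appears in order and no forbidden action occurs.

```python
import json

# Hypothetical scenario format: field names are illustrative only,
# not an actual Mindrift or client schema.
SCENARIO_JSON = """
{
  "id": "book-meeting-001",
  "task": "Schedule a 30-minute meeting with Alice next Tuesday",
  "golden_path": ["open_calendar", "find_free_slot", "create_event", "send_invite"],
  "forbidden_actions": ["delete_event"],
  "edge_cases": ["Alice has no free slot on Tuesday"]
}
"""


def follows_golden_path(actions, golden_path):
    """Return True if every golden-path step appears in `actions` in order.

    Extra actions between steps are tolerated: this is an in-order
    subsequence check, since `step in it` advances the shared iterator.
    """
    it = iter(actions)
    return all(step in it for step in golden_path)


def evaluate(scenario, actions):
    """Score one agent run: golden-path compliance plus forbidden-action check."""
    forbidden = set(scenario["forbidden_actions"])
    violations = [a for a in actions if a in forbidden]
    on_path = follows_golden_path(actions, scenario["golden_path"])
    return {
        "scenario": scenario["id"],
        "golden_path_followed": on_path,
        "violations": violations,
        "passed": on_path and not violations,
    }


scenario = json.loads(SCENARIO_JSON)
# Hypothetical agent log: one extra step, but the golden path is respected.
run = ["open_calendar", "check_timezone", "find_free_slot", "create_event", "send_invite"]
print(evaluate(scenario, run))
```

Tolerating extra steps between golden-path actions is a design choice. A stricter scenario could instead require an exact match, or cap the number of extra steps.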
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.
Requirements
- Bachelor's and/or Master's Degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems or other related fields.
- Background in QA, software testing, data analysis, or NLP annotation
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases)
- Strong written communication skills in English
- Comfortable with structured formats like JSON/YAML for scenario description
- Can define expected agent behaviors (gold paths) and scoring logic
- Basic experience with Python and JS
- Curious and open to working with AI‑generated content, agent logs, and prompt‑based behavior
- Ready to learn new methods, able to switch between tasks and topics quickly and sometimes work with challenging, complex guidelines
- This freelance role is fully remote; you just need a laptop, an internet connection, available time, and enthusiasm to take on a challenge
- Experience in writing manual or automated test cases
- Familiarity with LLM capabilities and typical failure modes
- Understanding of scoring metrics (precision, recall, coverage, reward functions)
- Get paid for your expertise, with rates that can go up to $32/hour depending on your skills, experience, and project needs
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio
- Influence how future AI models understand and communicate in your field of expertise
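The scoring metrics mentioned above (precision, recall, coverage) can be applied to agent behavior in several ways. The Python sketch below shows one simple interpretation, offered as an assumption rather than the project's actual scoring method. It treats the agent's actions and the gold actions as sets and ignores order and repetition. It also measures coverage as the share of required edge cases that a test suite exercises. The action and edge-case names are hypothetical.

```python
def action_precision_recall(agent_actions, gold_actions):
    """Compare the set of actions an agent took with the gold set.

    precision = |agent & gold| / |agent|  (share of the agent's actions that were expected)
    recall    = |agent & gold| / |gold|   (share of expected actions the agent performed)
    Order and repeated actions are ignored in this set-based sketch.
    """
    agent, gold = set(agent_actions), set(gold_actions)
    hits = len(agent & gold)
    precision = hits / len(agent) if agent else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def coverage(tested_cases, required_cases):
    """Fraction of required edge cases exercised by at least one scenario."""
    required = set(required_cases)
    if not required:
        return 1.0  # nothing required, so trivially covered
    return len(required & set(tested_cases)) / len(required)


# Hypothetical example: the agent skipped "send_invite" and added an unexpected step.
gold = ["open_calendar", "find_free_slot", "create_event", "send_invite"]
agent = ["open_calendar", "check_timezone", "find_free_slot", "create_event"]
p, r = action_precision_recall(agent, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
print(coverage(["no_free_slot"], ["no_free_slot", "invitee_declines"]))  # 0.5
```

A set-based comparison is easy to explain but blind to ordering mistakes. In practice it would usually be paired with an in-order golden-path check.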
Entry level
Employment type: Part‑time
Job function: Other
Industries: IT Services and IT Consulting
