Freelance Agent Evaluation Engineer
$30/hour - Mindrift
Please submit your CV in English and indicate your level of English proficiency.
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.
What this opportunity involves
We're building a dataset to evaluate AI coding agents - how well a model handles real-world developer tasks.
You'll create challenging tasks and evaluation criteria within realistic simulated environments:
- Build realistic developer environments - a virtual company with codebase, infrastructure, and context (tickets, docs, conversations) that forms a believable development history
- Design tasks from intermediate states of these environments - craft the prompt, define what "solved" means, and ensure the task is solvable by an AI agent
- Write tests that verify agent solutions - accept all valid approaches and reject incorrect ones, neither too strict nor too lenient
- Iterate on tasks and tests based on QA feedback - review agent solutions, analyze failures, and refine until the evaluation is fair and robust
What this is NOT
- Not data labeling
- Not prompt engineering
- Not writing code from scratch - the agent writes most of the code; you guide and evaluate
What we look for
- 5+ years in software development
- Core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, Redis
- Experience writing tests (functional, integration)
- English proficiency - B2+
Why this is hard
Frontier models are already good at coding. Creating a task that genuinely challenges the best models is non-trivial. You need to deeply understand where models fail and what scenarios reveal the difference between a good and a bad solution. Tasks have many valid solutions - writing tests that accept all correct solutions and reject incorrect ones is harder than it sounds.
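As a rough illustration of that last point, here is a minimal Python sketch. It is not taken from the project: the task (case-insensitive de-duplication of records by email), the `check_dedupe` helper, and the three sample solutions are all hypothetical. The check asserts on observable behavior only, so it accepts any output order and either choice of which duplicate to keep, while still rejecting a solution that compares emails case-sensitively.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def check_dedupe(solution: Callable[[List[Record]], Iterable[Record]]) -> bool:
    """Return True if `solution` removes duplicate emails, ignoring case.

    Only behavior is checked: output order and which duplicate
    survives are deliberately left unconstrained.
    """
    records = [
        {"email": "a@example.com"},
        {"email": "A@Example.com"},
        {"email": "b@example.com"},
    ]
    # Pass copies so a solution that mutates its input cannot affect the check.
    result = list(solution([dict(r) for r in records]))
    emails = sorted(r["email"].lower() for r in result)
    return emails == ["a@example.com", "b@example.com"]


def keep_first(records: List[Record]) -> List[Record]:
    """Valid approach: keep the first record seen for each email."""
    seen, out = set(), []
    for r in records:
        key = r["email"].lower()
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def keep_last(records: List[Record]) -> List[Record]:
    """Valid approach: keep the last record seen for each email."""
    return list({r["email"].lower(): r for r in records}.values())


def case_sensitive(records: List[Record]) -> List[Record]:
    """Incorrect approach: treats differently-cased emails as distinct."""
    seen, out = set(), []
    for r in records:
        if r["email"] not in seen:
            seen.add(r["email"])
            out.append(r)
    return out


if __name__ == "__main__":
    for candidate in (keep_first, keep_last, case_sensitive):
        print(candidate.__name__, check_dedupe(candidate))
```

A stricter test that compared the output to one exact list would wrongly fail `keep_last`; a looser one that only counted records could pass a solution that drops the wrong entries.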
How it works
Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid
Effort estimate
Each task in this project is estimated to take about 20 hours, depending on complexity. This is an estimate, not a schedule requirement; you choose when and how to work. To be accepted, a task must be submitted by its deadline and meet the listed acceptance criteria.
Compensation
Up to $30/hr equivalent, depending on level and pace. Tasks are estimated at ~20 hours each; you set your own schedule.


