Machine Learning Evaluation Specialist

Posted 1 day ago by Alignerr

Negotiable
Undetermined
Remote
London, England, United Kingdom

Summary: The Machine Learning Evaluation Specialist role focuses on designing complex evaluation challenges for AI systems, drawing on deep machine learning knowledge and domain expertise. Candidates will create original problems that challenge state-of-the-art models, shaping how AI is measured and improved. The position offers full autonomy over your schedule and the opportunity to collaborate asynchronously with a global team of researchers and engineers. It is a fully remote, hourly contract position with flexible hours.

Key Responsibilities:

  • Design complex, original machine learning problems rooted in specific domain expertise
  • Craft evaluation tasks requiring advanced domain knowledge beyond standard ML pipelines
  • Draw from research experience to create challenging problems for state-of-the-art AI
  • Define rigorous problem statements, evaluation criteria, and gold-standard solutions
  • Assess AI-generated ML solutions for correctness, creativity, and methodological soundness
  • Document problem difficulty levels, required domain knowledge, and expected AI failure modes
  • Collaborate asynchronously with a global team of researchers and engineers

Key Skills:

  • Graduate-level expertise (MS or PhD preferred) in a scientific or technical domain intersecting with machine learning
  • Strong working knowledge of ML methods — model selection, feature engineering, evaluation metrics, and pipeline design
  • Deep familiarity with active research problems in the field
  • Able to identify where general ML knowledge falls short and specialized domain insight is critical
  • Experience publishing or conducting original research is highly valued
  • Excellent written communication skills
  • Self-motivated and comfortable working independently on intellectually demanding tasks

Salary (Rate): £200.00/hr

City: London

Country: United Kingdom

Working Arrangements: Remote

IR35 Status: Undetermined

Seniority Level: Undetermined

Industry: IT

Detailed Description From Employer:

About The Role

The quality of AI depends entirely on the quality of the problems used to test it. We're looking for researchers and domain experts with deep machine learning knowledge to design the evaluation challenges that define — and push — the limits of today's most capable AI systems. This isn't routine review work. You'll apply your hard-earned research expertise to craft problems that state-of-the-art models genuinely struggle to solve. Your contributions directly shape how the next generation of AI is measured, benchmarked, and improved.

Organization: Alignerr

Type: Hourly Contract

Location: Fully Remote

Commitment: 10–40 hours/week

What You'll Do

  • Design complex, original machine learning problems rooted in your specific domain of expertise
  • Craft evaluation tasks that require advanced domain knowledge well beyond standard ML pipelines
  • Draw from your own research experience to create problems that genuinely challenge state-of-the-art AI
  • Define rigorous problem statements, evaluation criteria, and gold-standard solutions
  • Assess AI-generated ML solutions for correctness, creativity, and methodological soundness
  • Document problem difficulty levels, required domain knowledge, and expected AI failure modes
  • Collaborate asynchronously with a global team of researchers and engineers

Who You Are

  • Graduate-level expertise (MS or PhD preferred) in a scientific or technical domain that intersects with machine learning
  • Strong working knowledge of ML methods — model selection, feature engineering, evaluation metrics, and pipeline design
  • Deep familiarity with active research problems in your field
  • Able to identify precisely where general ML knowledge falls short and specialized domain insight becomes critical
  • Experience publishing or conducting original research is highly valued
  • Excellent written communication — you can articulate complex problems clearly and precisely
  • Self-motivated and comfortable working independently on intellectually demanding tasks

Example Domains (Not Exhaustive)

  • Computational biology, genomics, or bioinformatics
  • Climate science and environmental modeling
  • Medical imaging and healthcare ML
  • Materials science and computational chemistry
  • Astrophysics and signal processing
  • Natural language processing for low-resource or specialized corpora
  • Robotics, control theory, or reinforcement learning in complex environments
  • Financial modeling and quantitative analysis

Why Join Us

  • Work at the true frontier of AI evaluation and safety research
  • Collaborate with top research labs pushing the boundaries of what AI can do
  • Finally put your specialized domain expertise to use in a high-impact, meaningful way
  • Full autonomy over your schedule — work when and how you do your best thinking
  • Flexible, fully remote contract with potential for ongoing work and deeper research involvement
  • Build your profile as a recognized contributor to cutting-edge AI development
  • Join a global community of researchers and engineers who take this work seriously