Title:
AI Safety and Evaluations Engineer
Job Type:
Contract
Contract Length:
12 Months
Pay Range:
$50/hr – $175/hr
Start Date:
ASAP
Location:
Remote
About the Opportunity:
Our client, a leader in AI testing and Generative AI solutions, is looking for a skilled AI Safety and Evaluations Engineer
to join their team for a 12-month engagement. The engagement centers on building rigorous evaluation frameworks that measure model bias, hallucinations, and toxicity, ensuring models are safe and compliant before deployment. This is a high-impact role for a self-motivated professional who can hit the ground running and deliver results quickly.
Key Responsibilities & Deliverables:
This role is focused on the successful completion of specific tasks and deliverables. Your responsibilities will include:
- Designing and building rigorous evaluation frameworks to measure model bias, hallucinations, and toxicity.
- Creating automated evaluation ("eval") datasets to benchmark new models before they are promoted to production.
- Developing grounding and faithfulness metrics for retrieval-augmented generation (RAG) systems.
- Building monitoring tools that flag harmful or non-compliant AI outputs in real time.
- Partnering with legal and ethics teams to translate policy into technical safety constraints.
Required Qualifications:
We are looking for someone with a proven track record of successful contract engagements. The ideal candidate will have:
- 3+ years of experience in AI Research or Quality Engineering.
- Deep expertise in model evaluation techniques and NLP metrics such as ROUGE, BLEU, and BERTScore. This is not a learning role; you must be a subject matter expert.
- Demonstrated ability to work autonomously and manage your own time effectively to meet project goals.
- Experience with Python, data analysis tools, and LLM-as-a-Judge frameworks.
- Strong communication skills to provide clear and concise status updates to the project team.
#LI-JN1