[Remote] AI Evals Technical Lead

Note: This is a remote position open to candidates in the USA. P-1 AI is building an engineering AGI intended to have a significant impact on the built world. The AI Evals Technical Lead will develop and validate evaluation tests to ensure that the AI, Archie, meets industry skill expectations and performs engineering tasks effectively.

Responsibilities

Implement the system for organizing, transforming, running, grading, and reporting on eval benchmarks
Design and execute the process by which we develop and QA our evals, incorporating contributions from our own engineering team, industrial partners, and subject-matter experts
Ensure that evals run effectively within our CI/CD system, continuously benchmarking our evolving AI platform and the experiments we’re performing around it
Create methods for detecting and testing for common quality challenges of AI, including hallucinations, undesirable stochasticity, and regressions
Be a technical leader in the consistent implementation and organization of automated tests across other areas of our technology stacks

Skills

You feel an unshakeable pull to work on agentic AI
You can usually break an AI or a piece of software in under a minute (when you want to)
You are a skilled developer yourself
You always develop an interest in the subject matter you're building tests for, and are eager to do the same for the industrial products that run the world
You believe in manifesting the future of physical engineering
Experience in constructing comprehensive test suites for software and/or AI systems, including coordinating the contributions of others
Experience designing metrics to evaluate systems and visualize their performance, including differences across successive generations
Good communication skills with a variety of stakeholders (AI researchers, domain experts, application developers)
Proficiency in Python, including complex modules, and in modern software development tools and practices (Git, CI/CD, etc.)
Ability to thrive in a fast-paced, dynamic startup environment
Experience in developing, managing, and running evals against LLM-based systems is a strong plus

Company Overview

P-1 AI is a technology company focused on developing an artificial general engineering intelligence (AGEI). It was founded in 2024 and is headquartered in Henderson, Nevada, USA, with a workforce of 2-10 employees. Its website is https://p-1.ai.
