[Remote] AI Evals Technical Lead
P-1 AI
Note: This is a remote position open to candidates in the USA. P-1 AI is focused on building an engineering AGI to significantly impact the built world. The AI Evals Technical Lead will be responsible for developing and validating evaluation tests to ensure that the company's AI, Archie, meets industry skill expectations and performs engineering tasks effectively.
Responsibilities
- Implement the system for organizing, transforming, running, grading, and reporting on eval benchmarks
- Design and execute the process by which we develop and QA our evals, incorporating contributions from our own engineering team, industrial partners, and subject-matter experts
- Ensure that evals run effectively within our CI/CD system, continuously benchmarking our evolving AI platform and the experiments we’re performing around it
- Create methods for detecting and testing for common quality challenges of AI, including hallucinations, undesirable stochasticity, and regressions
- Be a technical leader in the consistent implementation and organization of automated tests across other areas of our technology stack
Skills
- You feel an unshakeable pull to work on agentic AI
- You can usually break an AI or a piece of software in under a minute (if you want to)
- You are a skilled developer yourself
- You always develop an interest in the subject matter you're building tests for, and you are eager to do the same for the industrial products that run the world
- You believe in manifesting the future of physical engineering
- Experience in constructing comprehensive test suites for software and/or AI systems, including coordinating the contributions of others
- Experience designing metrics to evaluate systems and visualize their performance, including differences across successive generations
- Good communication skills with a variety of stakeholders (AI researchers, domain experts, application developers)
- Proficiency in Python programming, including complex modules, and in modern software development tools and practices (Git, CI/CD, etc.)
- Ability to thrive in a fast-paced, dynamic startup environment
- Experience in developing, managing, and running evals against LLM-based systems is a strong plus
Company Overview
- P-1 AI is a technology company focused on developing an artificial general engineering intelligence (AGEI). It was founded in 2024 and is headquartered in Henderson, Nevada, USA, with a workforce of 2-10 employees. Its website is https://p-1.ai.
Job Type
- Job Type: Full Time
- Location: United States
