Evals.sh

Feature Guides

Everything you need to know to harness the power of Evals.sh autonomous agents. Master frontend auditing, automated code reviews, and deep project evaluation.

Project Evaluation

Our flagship agent. The Project Evaluator actively spawns a headless browser instance, logs into your application, and attempts to accomplish a natural language goal exactly like a human QA tester would.

Writing a Good Prompt

The agent relies on clear instructions. A good prompt defines the end goal, not just the steps:

Bad: Click the blue button, then type text, then submit.
Good: Navigate to the store, add a pair of running shoes to the cart, and proceed to the checkout page, ensuring the total updates correctly.

Protected Routes

  • If your app requires login (like a SaaS dashboard), turn on the Require Authentication toggle.
  • Provide dummy/test credentials. Never provide real production credentials or passwords you use elsewhere.

Frontend Audit

The Frontend Audit agent analyzes your live web application for critical user-facing issues, ensuring your app is fast, accessible, and secure.

How to use it

  • Enter a publicly accessible URL (e.g., https://example.com). Localhost URLs cannot be scanned.
  • Toggle the specific metrics you care about: Performance, Accessibility, or Security.
  • Provide optional instructions (e.g., "Focus on optimizing the hero image and reducing initial load time").
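
Because localhost URLs cannot be scanned, it can help to sanity-check a URL before submitting it. The following is a minimal sketch of such a pre-check, not the validation Evals.sh itself performs (that is not documented here). The function names `looksPubliclyReachable`, `parseUrl`, and `isPrivateIPv4` are hypothetical, and the check only covers obvious local hostnames and private IPv4 ranges.

```typescript
// Hostnames that only resolve on the local machine.
const LOCAL_HOSTNAMES = new Set(["localhost", "0.0.0.0", "[::1]"]);

// Parses an absolute URL, returning null instead of throwing on bad input.
function parseUrl(input: string): URL | null {
  try {
    return new URL(input);
  } catch {
    return null;
  }
}

// Returns true for loopback (127.x) and private (10.x, 172.16-31.x, 192.168.x) IPv4 hosts.
function isPrivateIPv4(hostname: string): boolean {
  const match = /^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}$/.exec(hostname);
  if (!match) return false;
  const a = Number(match[1]);
  const b = Number(match[2]);
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Hypothetical pre-check: an http(s) URL whose host is not obviously local or private.
function looksPubliclyReachable(input: string): boolean {
  const url = parseUrl(input);
  if (url === null) return false;
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (LOCAL_HOSTNAMES.has(host) || host.endsWith(".localhost")) return false;
  return !isPrivateIPv4(host);
}

console.log(looksPubliclyReachable("https://example.com")); // true
console.log(looksPubliclyReachable("http://localhost:3000")); // false
```

A check like this cannot confirm that a site is actually reachable from the internet; it only filters out URLs that clearly are not.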

What to expect

The agent loads your page in a simulated Chromium browser, extracts the DOM, runs Lighthouse-style heuristics, and returns a report with performance metrics and detailed AI observations explaining what is wrong and how to improve it.

Code Review

Acting as a staff-level engineer, the Code Review agent analyzes your source code to catch logic bugs, memory leaks, and anti-patterns before they hit production.

example-snippet.ts
const processData = (data: any) => {
  // AI will flag this 'any' type & suggest strict typing!
  return data.map(item => item.value * 2);
}
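
For comparison, here is a sketch of what a strictly typed version of the snippet above might look like. The `DataItem` interface and the name `processDataTyped` are assumptions made for illustration; the original snippet only implies that each item carries a numeric `value`, and the agent's actual suggestion may differ.

```typescript
// Hypothetical item shape: the original snippet only implies a numeric `value`.
interface DataItem {
  value: number;
}

// Same doubling logic as the original, with explicit input and output types
// in place of `any`.
const processDataTyped = (data: readonly DataItem[]): number[] => {
  return data.map((item) => item.value * 2);
};

const doubled = processDataTyped([{ value: 1 }, { value: 2 }, { value: 3 }]);
console.log(doubled); // [2, 4, 6]
```

With the types in place, passing something other than an array of `{ value: number }` objects fails at compile time rather than at runtime.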

How to use it

  • Paste raw code directly into the editor playground.
  • Select the language so the AST parser has the right context.
  • Turn on the "Detailed Explanations" toggle if you are a junior developer who wants to learn *why* the changes are suggested.

Supported Languages

TypeScript, JavaScript, Python, Go, Rust, and standard web technologies (HTML/CSS).

Groups & Bulk Jobs

When you need to evaluate ten, fifty, or a hundred different URLs at once, the Groups feature lets you batch tasks together and run them in parallel.

How to use it

  • Click Create Group on the Groups dashboard.
  • Select the evaluation type (Frontend Audit, Code Review, or Project Evaluation) that you want to run for everything in this batch.
  • Enter a list of URLs manually, or upload a CSV file containing URLs to process dozens of Frontend Audits or Project Evaluations in a single batch.
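
If your URLs come from a script or a spreadsheet export, cleaning them up before upload avoids wasted jobs. The sketch below trims, validates, and de-duplicates a list, then emits CSV text. The expected CSV layout is not specified in this guide, so the single `url` header column is an assumption to verify against the upload dialog; `buildUrlCsv` and `normalizeUrl` are hypothetical helper names.

```typescript
// Returns the normalized form of an absolute URL, or null if it cannot be parsed.
function normalizeUrl(raw: string): string | null {
  try {
    return new URL(raw.trim()).href;
  } catch {
    return null;
  }
}

// Builds CSV text with one URL per row.
// ASSUMPTION: a single column with a "url" header row; check the real format.
function buildUrlCsv(rawUrls: string[]): string {
  const seen = new Set<string>();
  const rows: string[] = ["url"];
  for (const raw of rawUrls) {
    const normalized = normalizeUrl(raw);
    if (normalized === null || seen.has(normalized)) continue; // skip invalid and duplicate entries
    seen.add(normalized);
    // Quote every field so commas inside a URL do not split the row;
    // embedded quotes are doubled, as in RFC 4180.
    rows.push(`"${normalized.replace(/"/g, '""')}"`);
  }
  return rows.join("\n");
}

const csv = buildUrlCsv([
  " https://example.com ",
  "https://example.com/", // duplicate after normalization
  "not a url",            // dropped
  "https://example.org/pricing",
]);
console.log(csv);
```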

Email Notifications & Scale

Bulk groups run asynchronously on our distributed worker clusters, so you don't need to stay on the page. We highly recommend turning on the "Send me an email when this batch completes" switch so you can walk away while the evaluations run.