An autouser is a UX-research persona implemented as a Computer Use agent. It receives a stimulus (URL, screenshot, or video), navigates or inspects it the way a real first-time user would, and emits a Rating in the same shape a human rater produces.Documentation Index
Fetch the complete documentation index at: https://docs.autousers.ai/llms.txt
Use this file to discover all available pages before exploring further.
Anatomy
| Field | Type | Notes |
|---|---|---|
id | string | auto_<cuid> for system; auto_<cuid> for team. |
name | string | ”First-time buyer”, “Power user”, “Skeptical evaluator”. |
role | string | One-line persona summary surfaced to the model. |
systemPrompt | string | Full instructions. The model’s context. |
isSystem | boolean | true for built-ins; false for team-created. |
visibility | enum | private (team) or public (team-publishable). |
calibrationStatus | enum | uncalibrated, calibrating, calibrated, frozen. |
activeRubricId | string? | The frozen rubric in use, if any. |
Listing
?source=system|team.
Creating a custom autouser
uncalibrated — it can rate, but its scores have
no inter-rater reliability data yet.
Calibration
Calibration is how an autouser learns to agree with itself across runs. We feed it a small panel of comparisons, run it N times, measure the consistency of its ratings, and either freeze the rubric (locking in stable behaviour) or iterate the system prompt. Lifecycle:| State | Meaning |
|---|---|
uncalibrated | Never run a calibration pass. |
calibrating | A CalibrationRun is in flight. |
calibrated | Stable Krippendorff α ≥ 0.6 across recent self-runs. |
frozen | The rubric is locked. New evaluations use this rubric forever or until thawed. |
calibration.frozen webhook. Downstream pipelines
should listen for it before promoting an autouser to production.
Runs
When you call/v1/evaluations/{id}/run-autousers, every entry in
selectedAutousers is expanded by agentCount into individual
AutouserRun rows. Each row tracks:
status:pending→running→completedorfailed.currentStep,currentAction,currentNarration: live worker progress.inputTokens,outputTokens,estimatedCostUsd: cost telemetry.artifactsPath: GCS prefix for video, screenshots, transcripts.
agentCount × comparisonCount.
Cost
Autouser runs price againstgemini-3-flash-preview (the only Gemini
SKU with native Computer Use). Typical run: 0.12 depending on
page complexity, navigation depth, and dimension count. Use dryRun
on the parent evaluation to forecast before queueing.