An autonomous coding agent for papercuts

Yak.

The small stuff — flaky tests, Sentry bugs, copy tweaks, broken CSV exports — handled while you work on what matters.

Yak watches Slack, Linear, Sentry, and your CI. When something small breaks, it opens a branch, writes the fix, verifies it in CI, and hands you a pull request with screenshots and a video walkthrough. Got a question about the codebase instead? You'll get an answer back, not a PR.

Meets you where you are

Slack · Linear · Sentry · GitHub + flaky tests, research mode, PR reviews

Input channels

Mention it. Assign it. Forget it.

Yak plugs into the tools the team already lives in. Tag a Slack thread, assign a Linear issue, let a Sentry alert fire — the routing layer takes it from there.

Slack @yak

#engineering · 10:42

maria @yak the CSV export is dropping the trailing zip digit again 🙃

yak On it. I see three ways this could read — picking the right one first…

yak ✓ PR #284 opened — full regression test + screenshot attached.

Ambiguity-aware. If the ask could be read two ways, Yak replies with grounded options before writing a single line of code.

Linear assign · @yak

GEO-1284 · Dark mode contrast on pricing cards

assigned · Yak in‑review

yak → acknowledged in the agent session

yak → In Review · PR ready for human review

Runs via Linear's Agents API · no seat required

Assign-driven. Assign any issue to the Yak agent — fix or question, Yak classifies it upfront. A research label is optional, just a shortcut past the classifier.

Sentry auto-triage

TypeError: Cannot read 'accuracy' of undefined

events: 43 · users: 12 · first seen 2h ago

yak filter: actionable ✓ · not CSP · not transient infra

yak pulling breadcrumbs via Sentry MCP…

yak ✓ fix + regression test pushed

Filters the noise. CSP violations, Redis blips, and one-off user errors get dropped; only real, actionable issues with enough signal get a task.

Goes looking for trouble

Picks up Sentry issues and flaky tests on its own.

Yak watches Sentry and CI in the background. It wakes up when an actionable issue crosses the threshold or when a test starts failing on main. Then it reads the stacktrace, forms a hypothesis, and writes a fix plus a regression test.

Aggressive pre-filters: CSP, transient infra, low-signal events dropped before they cost a dime.
Flaky test triage: real bug or racy test? Yak reports back honestly instead of papering over it.
Per-repo budget and daily cost caps keep alert storms from wreaking havoc.
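
The pre-filter can be pictured as a cheap predicate that runs before any model call. Here is a minimal Python sketch, assuming hypothetical field names (`kind`, `events`, `users`) and made-up thresholds. Yak's actual rules and cutoffs are not stated on this page.

```python
from dataclasses import dataclass

# Hypothetical noise categories and thresholds, chosen only for illustration.
NOISE_KINDS = {"csp_violation", "transient_infra"}
MIN_EVENTS = 10
MIN_USERS = 3


@dataclass
class SentryIssue:
    title: str
    kind: str    # assumed classification label, e.g. "error" or "csp_violation"
    events: int  # how many times the issue fired
    users: int   # how many distinct users were affected


def is_actionable(issue: SentryIssue) -> bool:
    """Return True only for issues worth spending a task on."""
    if issue.kind in NOISE_KINDS:
        return False  # known noise is dropped outright
    # Low-signal issues (few events or few affected users) are dropped too.
    return issue.events >= MIN_EVENTS and issue.users >= MIN_USERS


issues = [
    SentryIssue("TypeError in GeocodeController@show", "error", 43, 12),
    SentryIssue("CSP: blocked inline script", "csp_violation", 900, 400),
    SentryIssue("One-off validation error", "error", 1, 1),
]
actionable = [issue.title for issue in issues if is_actionable(issue)]
```

Because the predicate touches only fields already present on the alert, rejected issues cost nothing beyond the comparison itself.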

Recent auto-pickups

last 24 h

Sentry

TypeError in GeocodeController@show

events 43 · users 12 · sentry-priority: medium

PR opened

Flaky

BatchUploadTest::it_handles_merged_headers

fails 3/10 builds · racy fixture timestamp

fixed

Linear

Dark-mode contrast on pricing cards

GEO-1284 · assigned to Yak

in review

Sentry

Null dereference in AccuracyScorer

events 18 · seer: actionable ✓

PR opened

.yak-artifacts/research.html

Research findings

Deprecated accuracy_type usage across the API.

7 endpoints still read accuracy_type directly. Two customer-facing, five internal. Estimated migration: 2 PRs, ~180 LOC. No blocking callers.

GeocodeCtrl

BatchCtrl

ReverseCtrl

LookupSvc

ExportSvc

→ file refs w/ line numbers → risk: low → effort: 2 PRs

Ask a question

Not every ask is a fix. Sometimes you just want an answer.

"When is the welcome email triggered?" "How bad would this refactor be?" Just ask. A lightweight classifier decides up front whether your request is a fix or a research question, so there are no magic prefixes to remember.

Quick factual questions come back as a conversational answer in the same thread — no branch, no CI, no PR.
Bigger investigations produce a self-contained HTML findings page with file references, charts, risks, and an effort estimate.
Explicit overrides still work: a research label on Linear or a research: prefix in Slack skips the classifier.
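
The routing order described above can be sketched in a few lines of Python. The `research` label and `research:` prefix come from the text. The function names and the toy classifier are assumptions made for illustration, since the real classifier is not described here.

```python
from typing import Callable, Iterable

RESEARCH_PREFIX = "research:"  # Slack prefix named in the text
RESEARCH_LABEL = "research"    # Linear label named in the text


def route(text: str, labels: Iterable[str], classify: Callable[[str], str]) -> str:
    """Return "research" or "fix" for an incoming request."""
    if RESEARCH_LABEL in {label.lower() for label in labels}:
        return "research"  # explicit Linear label skips the classifier
    if text.lstrip().lower().startswith(RESEARCH_PREFIX):
        return "research"  # explicit Slack prefix skips the classifier
    return classify(text)  # otherwise the classifier decides


def toy_classifier(text: str) -> str:
    """Placeholder heuristic: treat questions as research, the rest as fixes."""
    return "research" if text.rstrip().endswith("?") else "fix"


decision = route("When is the welcome email triggered?", [], toy_classifier)
```

Checking the overrides first means an explicit label or prefix never pays for a classifier call.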

Reviews PRs, too

Every open PR gets a rubric-driven review — with suggestion blocks.

Flip a switch on a repo and Yak reviews every non-draft PR. Findings land as GitHub line-level comments with category and severity labels; one-to-ten-line fixes come through as native suggestion blocks you can commit with a click.

Incremental on synchronize: only new commits get re-reviewed. Force-pushes safely fall back to a full review.
Runs in the same sandbox as fixes, so it can actually run the tests and linters for the files the PR touches.
Reaction counts (👍 / 👎) are tracked — the dashboard shows which kinds of feedback your team finds useful.
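
One way to picture the incremental-versus-full decision is shown below. It assumes the webhook supplies the previous and new head SHAs, and that an ancestry check is available (in a real checkout, `git merge-base --is-ancestor` could back it). The function name and the toy commit graph are hypothetical.

```python
from typing import Callable, Optional, Tuple

# Toy commit graph (child -> parent) standing in for a real repository.
PARENTS = {"c3": "c2", "c2": "c1", "c1": "base", "x2": "base"}


def toy_is_ancestor(old: str, new: str) -> bool:
    """Walk parent links from `new` and report whether `old` is reached."""
    node: Optional[str] = new
    while node is not None:
        if node == old:
            return True
        node = PARENTS.get(node)
    return False


def review_range(
    before: Optional[str],
    after: str,
    base: str,
    is_ancestor: Callable[[str, str], bool],
) -> Tuple[str, str, str]:
    """Pick the commit range to review as (mode, from_sha, to_sha)."""
    if before and is_ancestor(before, after):
        # Normal push: the old head is still in history, so review only new commits.
        return ("incremental", before, after)
    # First review or force-push: the old head is gone, so review the whole PR.
    return ("full", base, after)


normal_push = review_range("c2", "c3", "base", toy_is_ancestor)
force_push = review_range("c3", "x2", "base", toy_is_ancestor)
```

Falling back to a full review whenever ancestry cannot be confirmed is the conservative choice: it may repeat work but never skips rewritten commits.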

Yak · geocodio/api #412

app/Services/GeocodeClient.php:87

Performance should_fix

Retry loop re-creates the Http::withHeaders call on each attempt, losing connection reuse.

```suggestion
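// Build the client once so every attempt reuses the same connection.
$client = Http::withHeaders($headers);
foreach (range(1, 3) as $attempt) {
    $response = $client->get($url);
    if ($response->successful()) {
        break; // stop retrying once a request succeeds
    }
}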
```

The safety model

Yak opens pull requests.
Humans merge them.
No auto-merge. No exceptions.

1 Yak

Writes the fix

Branch, commit, local tests — every run in a fresh sandbox. Never pushes to main.

2 CI

Proves it

Full test suite runs on real CI. If it fails, Yak retries once. If that fails, a human takes over.

3 Yak

Opens the PR

Screenshots, video, summary, cost, session id. Large diffs get a yak-large-change label.

4 You

Reviews & iterates

Push follow-up commits, request changes, or take over entirely. Yak's branch is just a starting point.

5 You

Merges

Always. By design, the Yak GitHub App has no merge authority and is never on your bypass list.
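
The CI step's "retry once, then hand off" rule can be sketched as a small loop. The callables `run_ci` and `revise_fix` are hypothetical stand-ins for the real CI trigger and the agent's second attempt.

```python
from typing import Callable

MAX_CI_ATTEMPTS = 2  # the first run plus exactly one retry


def run_with_bounded_retry(
    run_ci: Callable[[int], bool],
    revise_fix: Callable[[], None],
) -> str:
    """Return "open_pr" when CI passes, or "needs_human" after two failures."""
    for attempt in range(1, MAX_CI_ATTEMPTS + 1):
        if run_ci(attempt):
            return "open_pr"
        if attempt < MAX_CI_ATTEMPTS:
            revise_fix()  # one chance to amend the fix before the retry
    return "needs_human"


# Example: CI fails on the first attempt and passes on the second.
outcomes = iter([False, True])
result = run_with_bounded_retry(lambda attempt: next(outcomes), lambda: None)
```

Keeping the attempt limit as a constant makes the "two strikes" bound easy to audit.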

No merge authority

The Yak GitHub App can push branches and open PRs — nothing more. Branch protection stays on.

One sandbox per task

Every run lives in its own Incus container, cloned from a ZFS snapshot. Firewalled off from Yak itself, its database, and every other task.

Bounded retries

At most one retry on CI failure. Two strikes and a human picks up — no endless flailing.

Per-task + daily budgets

Every run is capped. The daily routing budget is enforced by job middleware before a single token is spent.
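
A budget check that runs before the job body might look roughly like this. The class and function names are assumptions, and the sketch covers only the daily cap. A per-task cap would have to be enforced inside the job itself.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(Exception):
    """Raised when a job is refused because the daily cap is already used up."""


@dataclass
class DailyBudget:
    cap_usd: float
    spent_usd: float = 0.0

    def check(self) -> None:
        if self.spent_usd >= self.cap_usd:
            raise BudgetExceeded("daily budget exhausted")

    def record(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd


def with_budget(budget: DailyBudget, job: Callable[[], float]) -> Callable[[], float]:
    """Middleware: refuse to start the job when the day's cap is already hit."""
    def guarded() -> float:
        budget.check()   # runs before the job does any work
        cost = job()     # the job reports what it spent, in USD
        budget.record(cost)
        return cost
    return guarded


budget = DailyBudget(cap_usd=1.00)
task = with_budget(budget, lambda: 0.60)
task()  # allowed: nothing spent yet
task()  # allowed: 0.60 spent, still under the cap
# A third call would raise BudgetExceeded, because the cap has been crossed.
```

Note that this simple version lets the last admitted job overshoot the cap. A stricter variant would reserve an estimated cost up front.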

Visual capture: done

REC · 00:12

Proof of work

Every UI change comes with a screenshot and a video walkthrough.

When Yak touches the frontend, it spins up the dev server, logs in as a seeded test user, drives the affected page through the exact flow it just changed, and attaches the recordings to the PR.

Real Chromium navigation, authenticated session, video recorded end-to-end.
Partial captures when something blocks the full flow — never a silent skip.
Every PR clearly states whether the capture was complete, partial, or skipped.

Add a repo

Dev environment, frozen on a snapshot.

Paste the clone URL. Yak dispatches a one-time setup task inside a fresh Incus container — docker-compose up, dependencies, migrations, test suite — then freezes the result as a ZFS copy-on-write snapshot. Every future task on this repo clones that snapshot in about two seconds.

Setup runs once. After that, tasks start from a warm, verified template.
Sandboxes are destroyed at the end of every task — nothing leaks between runs.
Up to four tasks run in parallel, each in its own container with its own network.
New deps broke the environment? Re-run setup and Yak builds a fresh snapshot.
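
The four-at-a-time limit can be illustrated with a semaphore. This sketch models only the concurrency cap. `fake_work` is a placeholder for the clone, run, and destroy cycle, and none of the names reflect Yak's actual scheduler.

```python
import asyncio

MAX_PARALLEL_TASKS = 4  # the limit stated in the text


async def run_all(task_ids, work):
    """Run every task, never more than MAX_PARALLEL_TASKS at once."""
    slots = asyncio.Semaphore(MAX_PARALLEL_TASKS)
    running = 0
    peak = 0

    async def run_one(task_id):
        nonlocal running, peak
        async with slots:  # wait here until one of the four slots is free
            running += 1
            peak = max(peak, running)
            try:
                return await work(task_id)
            finally:
                running -= 1

    results = await asyncio.gather(*(run_one(t) for t in task_ids))
    return results, peak


async def fake_work(task_id):
    await asyncio.sleep(0.01)  # placeholder for the real sandboxed run
    return f"task-{task_id} done"


results, peak = asyncio.run(run_all(range(10), fake_work))
```

With ten queued tasks, `peak` records the highest number that ran at the same time, which the semaphore holds at four.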

yak · setup task — api

# One-time setup inside a fresh Incus container
› incus launch yak-base yak-setup-api                     ok
› reading README.md, CLAUDE.md, docker-compose.yml
› docker-compose up -d · mysql · redis · meilisearch      ok
› composer install && npm ci && npm run build             ok
› php artisan migrate --seed                              ok
› pest --compact                                          432 passed

# Promote to a ZFS snapshot
› incus snapshot create yak-setup-api ready               ok
› status: setup → ready

# Future tasks clone from snapshot → live in ~2s
› incus copy yak-tpl-api/ready task-42                    1.8s