Site Interaction
Some workflows can only be done by a human clicking through a website. There's no CLI, no public API, no way to script it. But the website's frontend talks to a backend API — and if you can discover that API, you can drive it directly.
Problem: You Don't Know What API the Site Uses
Most sites have no documented API. You need to figure out what endpoints exist, what they accept, and what they return.
Solution: scout the site by intercepting network traffic. Launch the browser, click through the workflow manually, and watch what the frontend calls:
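```javascript
// Log every API call the frontend makes while you click through the workflow.
page.on("response", async (res) => {
  const url = res.url();
  if (!url.includes("/api/")) return; // adjust the filter to the site's API prefix
  const body = await res.text().catch(() => "");
  console.log(`[${res.request().method()} ${res.status()}] ${url}`);
  // Request payload (null for requests without a body) shows the shape the endpoint accepts.
  const sent = res.request().postData();
  if (sent) console.log(`  > ${sent}`);
  // Print only small response bodies to keep the log readable.
  if (body.length < 1000) console.log(`  < ${body}`);
});
```

This reveals endpoints, request shapes, and response formats. These endpoints are usually more stable than the DOM because the frontend depends on them, but they are undocumented and can change without notice.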
Trial each step independently before combining into an end-to-end script.
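A scouting session can produce hundreds of log lines, so it may help to collapse them into a short endpoint catalog before writing any automation. The sketch below is one way to do that; `toTemplate` and `summarizeEndpoints` are hypothetical helper names, and the rule that all-digit or UUID-shaped path segments are IDs is an assumption that will not fit every site.

```javascript
// Assumption: a path segment that is all digits or UUID-shaped is a record ID.
const ID_SEGMENT =
  /^(\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$/i;

// Turn a concrete URL into a path template, e.g. /api/items/17 -> /api/items/:id.
function toTemplate(rawUrl) {
  const { pathname } = new URL(rawUrl);
  return pathname
    .split("/")
    .map((segment) => (ID_SEGMENT.test(segment) ? ":id" : segment))
    .join("/");
}

// Group captured entries ({ method, url, status }) by method and path template.
function summarizeEndpoints(entries) {
  const catalog = new Map();
  for (const { method, url, status } of entries) {
    const key = `${method} ${toTemplate(url)}`;
    const seen = catalog.get(key) ?? { count: 0, statuses: new Set() };
    seen.count += 1;
    seen.statuses.add(status);
    catalog.set(key, seen);
  }
  // Map preserves insertion order, so endpoints appear in first-seen order.
  return [...catalog.entries()].map(([endpoint, { count, statuses }]) => ({
    endpoint,
    count,
    statuses: [...statuses].sort((a, b) => a - b),
  }));
}
```

To feed it, push `{ method: res.request().method(), url: res.url(), status: res.status() }` into an array from the response handler and call `summarizeEndpoints` once the manual walkthrough is done.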
Problem: You Need to Automate Actions on the Site
Once you know the API (or if there isn't one), you need to drive the site programmatically.
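Solution A: call the API via page.evaluate(() => fetch(...)). This is the preferred approach: it is faster and more resilient to UI redesigns, and because fetch runs in the page context, same-origin requests carry the session cookies automatically. Headers that the frontend adds itself, such as a bearer token or CSRF token, still have to be set explicitly; copy them from the requests you observed while scouting: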
```javascript
// GET
const items = await page.evaluate(async () => {
  const res = await fetch("/api/items");
  return res.json();
});

// POST with JSON; the payload is passed into the page as the second argument
const result = await page.evaluate(async (payload) => {
  const res = await fetch("/api/items", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  return { status: res.status, body: await res.json() };
}, { name: "new item", value: 42 });
```

Solution B: drive the UI directly. Some actions have no API equivalent: CAPTCHAs, OAuth flows, drag-and-drop, and file inputs that trigger client-side processing:
```javascript
await page.locator('button[aria-label="Submit"]').click();
await page.locator('input[name="email"]').fill("a@b.com");
await page.locator('input[type="file"]').setInputFiles("f.zip");
```

When to use which:
| API | UI |
|---|---|
| Fast — no rendering or animations | Slow — waits for DOM updates |
| Resilient to UI redesigns | Breaks when selectors change |
| Can't solve CAPTCHAs or OAuth | Can handle any visual interaction |
Prefer API. Fall back to UI when necessary.
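One way to encode this preference in a script is a small wrapper that tries the API path first and only runs the UI path if the API call throws. This is a sketch, not an established pattern from any library: `apiFirst` is a hypothetical name, and both actions are functions you supply (for example a `page.evaluate` fetch and a sequence of locator clicks).

```javascript
// Try the API route; if it throws, log why and fall back to the UI route.
// Returns which route succeeded so the caller can track how often fallback happens.
async function apiFirst(apiAction, uiAction, log = console.warn) {
  try {
    return { via: "api", value: await apiAction() };
  } catch (err) {
    log(`API path failed (${err.message}); falling back to UI`);
    return { via: "ui", value: await uiAction() };
  }
}
```

A runtime fallback like this is only safe when the action is idempotent or when the API call is known to have failed before changing anything; otherwise the UI path could apply the same change twice.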
Problem: The Page Isn't Ready Yet
SPAs render asynchronously. goto() resolves on the page's load event by default, which often fires before the app has fetched its data and become usable. waitForSelector() is fragile because selectors change across deploys.
Solution: poll a lightweight API endpoint for readiness instead of inspecting the DOM:
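```javascript
// Poll every 2 s for up to 120 attempts (about 4 minutes).
// "/api/me" stands in for any cheap authenticated endpoint found while scouting.
let ready = false;
for (let attempt = 0; attempt < 120 && !ready; attempt++) {
  await page.waitForTimeout(2000);
  ready = await page
    .evaluate(async () => {
      try {
        const res = await fetch("/api/me");
        return res.ok;
      } catch {
        return false;
      }
    })
    .catch(() => false); // evaluate() itself throws if the page navigates mid-call
}
// Fail loudly instead of continuing with a session that never came up.
if (!ready) throw new Error("Session not ready after 120 attempts");
```

This works regardless of how the UI renders. If the endpoint returns a 2xx response, the session is live.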
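If several scripts need the same wait, the loop can be pulled into a reusable helper that takes any readiness check and enforces a time limit. The version below is a minimal sketch; `pollUntil`, `sleep`, and the option names are hypothetical, and it uses plain `setTimeout` so it does not depend on a page object.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call check() repeatedly until it returns a truthy value.
// Resolves with the number of attempts used; rejects once the time budget is spent.
async function pollUntil(check, { intervalMs = 2000, timeoutMs = 240000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  while (true) {
    attempts += 1;
    // A check that throws or rejects is treated as "not ready yet".
    const ok = await Promise.resolve().then(check).catch(() => false);
    if (ok) return attempts;
    // Stop if another sleep would run past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Not ready after ${attempts} attempts (${timeoutMs} ms)`);
    }
    await sleep(intervalMs);
  }
}
```

With a page in hand, a call might look like `await pollUntil(() => page.evaluate(async () => (await fetch("/api/me")).ok))`. Unlike the inline loop, this checks immediately rather than sleeping first.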
Problem: The Site Redirected Somewhere Unexpected
You navigated to /app/settings but landed on /login, /welcome, or /customize instead.
Solution: check the URL after navigation and take corrective action:
```javascript
await page.goto("https://example.com/app/settings");
const url = page.url();
if (url.includes("/login")) {
  console.log("Redirected to login. Waiting for sign-in...");
  // Wait for auth (see authentication.md)
} else if (url.includes("/welcome")) {
  // Click through to the right page
  await page.getByText("Go to settings").click();
}
```

Problem: You Can't Tell Why the Script Failed
The automation broke somewhere. Without seeing what the browser was showing, debugging is guesswork.
Solution: screenshot the page on every unhandled error:
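```javascript
try {
  await doWork(page);
} catch (err) {
  // Capture what the browser was showing; ignore a failed capture so the original error still gets logged.
  await page.screenshot({ path: "error.png" }).catch(() => {});
  console.error("Failed:", err.message);
  console.error("Screenshot: error.png");
  // Set the exit code rather than calling process.exit(), which would end
  // the process immediately and skip the finally block.
  process.exitCode = 1;
} finally {
  await browser.close();
}
```

Save to a known path and log the location so the human (or agent) can inspect the browser state at the moment of failure.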
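When a script has several stages, a fixed error.png gets overwritten by each new failure. A possible variation, sketched below, wraps any unit of work and writes a timestamped screenshot before rethrowing. `withFailureScreenshot` and the `screenshotPath` / `screenshotError` properties are hypothetical names; the only page method it relies on is `page.screenshot({ path, fullPage })`.

```javascript
// Run work(page); on failure, save a timestamped screenshot and rethrow the original error.
async function withFailureScreenshot(page, work, dir = ".") {
  try {
    return await work(page);
  } catch (err) {
    // Colons and dots are replaced so the name is valid on every filesystem.
    const stamp = new Date().toISOString().replace(/[:.]/g, "-");
    const path = `${dir}/error-${stamp}.png`;
    try {
      await page.screenshot({ path, fullPage: true });
      err.screenshotPath = path; // lets the caller log where to look
    } catch (shotErr) {
      // The page may already be closed; the original error stays the one that surfaces.
      err.screenshotError = shotErr.message;
    }
    throw err;
  }
}
```

The caller can then log `err.screenshotPath` in its own catch block, and the directory passed as `dir` is assumed to exist already.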