I Set 2 Playwright Goals at 4 AM and Both Beat Fajr
This Saturday morning I woke up at 4 AM with the kind of half-baked idea that only survives a brain that hasn’t fully booted. I wanted to see if I could give an LLM a real browser, a goal written in two sentences, and walk away. Not a toy. Not a sandbox. LinkedIn. Tokopedia. The real web, with logins and JavaScript and the kind of DOM that makes scrapers cry.
By the time I finished praying Fajr and closed my Quran, both jobs were done. One returned 23 LinkedIn profiles as structured JSON. The other returned 14 G-Shock listings under Rp 1.000.000 from Jakarta sellers offering instant payment. I had not touched the keyboard.
This is the prompt pattern that made it work — and the three things I almost got wrong.
The pattern: goal, not steps
The first version of my Playwright harness was built like a Selenium script. Click here, type this, wait for selector X, extract Y. It worked. It also broke the moment a website changed its layout, which is to say it worked for about forty minutes. Every adjustment was a code change, every code change was a deploy, and every deploy turned the LLM into an expensive wrapper around a brittle macro recorder.
The shift that changed everything was a small one: stop telling the model how to do the job, tell it what success looks like.
For the LinkedIn task, the prompt was 48 words:
> Log in to LinkedIn with these credentials. Search for people whose titles include “CTO” and “Series B” and who are based in Indonesia. Open each profile, extract name, current title, company, location, and the first three lines of their About section. Return the results as a JSON array.
For the Tokopedia task, it was 34 words:
> Open Tokopedia. Search for “G-Shock”. Filter results to location Jakarta, payment method Instant, and price under Rp 1.000.000. For each listing extract product name, price, seller name, seller location, and rating. Return as JSON.
That is the entire spec. No DOM selectors. No XPath. No “wait for the cookie banner to disappear.” I gave the model the same instructions I would give a junior intern on their first day: here is the goal, here is the output format, go.
The LLM did the rest. It located the login form by reading labels. It handled the “We use cookies” modal by reading the button text. It recovered from a captcha prompt by waiting and retrying. None of this was in the prompt. All of it was in the model’s prior.
What almost broke it
Three things nearly killed both jobs.
1. Session cookies and headless mode. My first LinkedIn attempt ran in headless Chromium and got blocked at the login screen. LinkedIn’s bot detection in 2026 is not subtle — it fingerprints headless browsers, flags datacenter IPs, and serves a different login page to anything that looks automated. I switched to headless=False running inside a virtual display (xvfb-run), warmed up the session with a real human-style delay, and used the same browser profile that I had manually logged into once. The second attempt sailed through.
2. Stale element references after navigation. Playwright locators re-resolve on every action, but an element handle captured before a page load is detached afterwards, and acting on it fails with an “element is not attached to the DOM” style error (the counterpart of Selenium’s stale element exception). The model handled this gracefully on Tokopedia (it retried with a fresh page.locator(...)), but on LinkedIn it sometimes clicked a button that had moved. The fix was teaching the model to always re-query the DOM after any goto or click that triggers a redirect. I baked that into a system-prompt rule: “After every page transition, re-acquire your locators.”
3. Output schema drift. On the first run, the model returned JSON with price as a string (“Rp 950.000”), then as a number (950000), then as a string with commas (“950,000”). The pattern that fixed it was simple but mandatory: include a JSON schema in the prompt, and tell the model to validate its output against the schema before returning. One sentence — “Before sending, validate your output against this schema: {…}” — eliminated 100% of the drift.
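The prompt-side fix can be paired with a check in the harness, so drift gets caught even if the model skips its own validation. A minimal sketch, assuming the Tokopedia fields named in the prompt; the schema, the types, and the function names are hypothetical, and prices are assumed to be whole rupiah with no decimal part:

```python
import json
import re

# Hypothetical schema for one listing. Field names follow the prompt;
# the expected types are an assumption, not the schema used in the post.
LISTING_SCHEMA = {
    "product_name": str,
    "price": int,  # rupiah as a plain integer, e.g. 950000
    "seller_name": str,
    "seller_location": str,
    "rating": float,
}


def normalize_price(value):
    """Coerce 'Rp 950.000', '950,000' or 950000 to the integer 950000."""
    if isinstance(value, bool):
        raise ValueError(f"price is not a number: {value!r}")
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        return int(value)
    # Strip currency prefix and thousands separators; assumes no decimal part.
    digits = re.sub(r"\D", "", str(value))
    if not digits:
        raise ValueError(f"no digits in price: {value!r}")
    return int(digits)


def validate_listings(raw_json):
    """Parse the model's final answer and return normalized rows, or raise ValueError."""
    rows = json.loads(raw_json)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of listings")
    cleaned = []
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            raise ValueError(f"row {i} is not an object")
        missing = sorted(set(LISTING_SCHEMA) - set(row))
        if missing:
            raise ValueError(f"row {i} is missing {missing}")
        fixed = dict(row)
        fixed["price"] = normalize_price(row["price"])
        try:
            fixed["rating"] = float(row["rating"])
        except (TypeError, ValueError):
            raise ValueError(f"row {i} has a non-numeric rating") from None
        for key, expected in LISTING_SCHEMA.items():
            if not isinstance(fixed[key], expected):
                raise ValueError(f"row {i}: {key} should be {expected.__name__}")
        cleaned.append(fixed)
    return cleaned
```

If validation raises, the error message can be handed back to the model as the next observation so it can correct its answer instead of the run failing outright.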
The setup, in case you want to try this
Both jobs ran on the same hardware: an M720q homelab with 32 GB RAM and an old i5-6500T, running Ubuntu 22.04, Python 3.11, Playwright 1.49, and a Gemini 2.5 Flash model accessed via the OpenAI-compatible endpoint. The whole stack — model API, browser, control loop — fits in under 1 GB of RAM. The browser profile is the only stateful part, and it is just a Chromium user-data directory.
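Reusing a logged-in Chromium user-data directory with a headed browser could look roughly like the sketch below. It uses Playwright's `launch_persistent_context`; the profile path and the helper names are hypothetical, and this is not the harness from this post:

```python
from pathlib import Path


def launch_options(profile_dir, headless=False):
    """Build keyword arguments for chromium.launch_persistent_context."""
    return {
        "user_data_dir": str(Path(profile_dir).expanduser()),
        "headless": headless,  # headed by default, as described above
    }


def open_profile(playwright, profile_dir):
    """Open Chromium on an existing profile and return (context, page)."""
    context = playwright.chromium.launch_persistent_context(
        **launch_options(profile_dir)
    )
    # A persistent context normally starts with one page already open.
    page = context.pages[0] if context.pages else context.new_page()
    return context, page


def main():
    # Imported here so the helpers above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Hypothetical path: point this at the profile you logged in with once.
        context, page = open_profile(p, "~/.config/agent-profile")
        page.goto("https://www.linkedin.com/feed/")
        print(page.title())
        context.close()
```

On a server without a display, a script that calls `main()` would be started under `xvfb-run`, which gives the headed browser a virtual screen to draw on.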
The control loop is roughly 200 lines of Python. It boots a Playwright browser, hands the model a set of browser_-prefixed tools (browser_navigate, browser_click, browser_type, browser_extract_text, browser_screenshot, browser_evaluate), and lets the model drive a ReAct-style loop until it returns a final_answer tool call. Each tool call logs its inputs and outputs to a local SQLite database, so I can replay any session step by step.
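The shape of such a loop, stripped to its core, might look like this. It is a sketch under assumptions, not the 200-line harness: `model_step`, the call format, and the `steps` table are hypothetical stand-ins for the real model client and log schema.

```python
import json
import sqlite3


def run_agent(model_step, tools, db, max_steps=50):
    """Drive a tool-calling loop until the model calls final_answer.

    model_step(history) -> {"tool": str, "args": dict}  (hypothetical contract)
    tools: mapping of tool name -> callable(**args)
    db: an open sqlite3 connection used as the replay log
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS steps "
        "(step INTEGER, tool TEXT, args TEXT, output TEXT)"
    )
    history = []
    for step in range(max_steps):
        call = model_step(history)
        name, args = call["tool"], call.get("args", {})
        if name == "final_answer":
            output = args.get("result")
        elif name in tools:
            try:
                output = tools[name](**args)
            except Exception as exc:
                # Feed the error back so the model can retry with a new action.
                output = f"error: {exc}"
        else:
            output = f"error: unknown tool {name!r}"
        # Log every call so a session can be replayed step by step.
        db.execute(
            "INSERT INTO steps VALUES (?, ?, ?, ?)",
            (step, name, json.dumps(args), json.dumps(output, default=str)),
        )
        db.commit()
        history.append({"tool": name, "args": args, "output": output})
        if name == "final_answer":
            return output
    raise RuntimeError(f"no final_answer after {max_steps} steps")
```

Passing errors back as observations, rather than raising, is what lets the model recover from a failed click on its own. The `max_steps` cap keeps a confused run from burning API budget indefinitely.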
The model sees the live DOM — document.body.innerText plus a structural summary — on every step. It does not see screenshots by default (that is too slow), but the browser_screenshot tool is available when the model decides it needs visual confirmation. In practice, the model asks for a screenshot maybe once per 30 steps, usually when a layout looks unfamiliar.
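One way to build that per-step observation is sketched below. What the "structural summary" contains is not specified above, so the list of interactive elements, the character cap, and all names here are assumptions for illustration:

```python
# JavaScript run in the page via page.evaluate(); returns the visible text plus
# a flat list of interactive elements. The selector list is an assumption.
COLLECT_JS = """
() => ({
  text: document.body.innerText,
  elements: Array.from(
    document.querySelectorAll('a, button, input, select, textarea')
  ).map(el => ({
    tag: el.tagName.toLowerCase(),
    label: el.innerText || el.getAttribute('aria-label')
           || el.getAttribute('placeholder') || ''
  }))
})
"""


def build_observation(inner_text, elements, max_chars=4000):
    """Combine page text with a one-line-per-element summary of interactive nodes."""
    text = inner_text.strip()
    if len(text) > max_chars:
        # Cap the text so one long page cannot fill the model's context.
        text = text[:max_chars] + "\n[truncated]"
    summary = [
        f"[{i}] <{el['tag']}> {el.get('label', '').strip()}"
        for i, el in enumerate(elements)
    ]
    return "PAGE TEXT:\n" + text + "\n\nINTERACTIVE ELEMENTS:\n" + "\n".join(summary)


def observe(page):
    """Collect a fresh observation from a Playwright page (sync API)."""
    data = page.evaluate(COLLECT_JS)
    return build_observation(data["text"], data["elements"])
```

Because `observe` re-reads the page on every step, the model always acts on the current DOM instead of a cached view from before the last navigation.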
Total cost for the two jobs: about Rp 4.500 in Gemini API calls. Total wall time: 11 minutes. Total lines of code I had to write: zero — I had already built the harness for an earlier project and just handed it the two prompts.
Why this surprised me
I have been writing Selenium scrapers since 2014. I have built Playwright pipelines that can extract 50.000 product listings overnight. I know how to wait for selectors and handle infinite scroll and rotate proxies. None of that prepared me for the experience of watching a model open LinkedIn, read a profile, and decide for itself that the About section is worth three lines of extraction.
The thing I keep coming back to is not the technical achievement. It is the gap between “I told it what success looks like” and “I told it how to achieve success.” The first version of any automation I write is always a step-by-step script, because that is how I think about a new problem. The model does not think that way. The model thinks in goals.
I do not think this replaces every scraper I have ever written. For high-volume, high-stability extraction (price monitoring across 200 e-commerce sites, say), the deterministic script is still cheaper and faster. But for the long tail of one-off jobs — extract this profile, fill this form, check this dashboard — the goal-prompt pattern is now the lowest-friction path. Lower than writing the script. Lower than hiring a VA. Lower than doing it myself at 6 AM.
What I am changing about how I work
The jobs I am giving this pattern next: a daily 7 AM check of my bank statements (CSV export, parse, alert on anything over Rp 500.000), a weekly competitor-pricing sweep for the SH blog’s affiliate partners, and a recurring job to book my passport-renewal appointment 60 days before expiry, because the government portal opens exactly 47 slots per day and they vanish in four minutes.
None of these are exciting. All of them used to be the kind of task I procrastinated on for months, then did badly in a hurry. The model does them at 4 AM while I am at Fajr.
The morning after the LinkedIn and Tokopedia runs, I caught myself adding a third prompt to the cron before I had finished my Quran recital. That is the moment I knew this was a real shift, not a one-off trick. The cost of automation dropped below the cost of the task. The next bottleneck is no longer engineering. It is noticing what to automate.
That, I suspect, is the actual unlock. Not the 200-line harness, not the prompt pattern, not the model. The unlock is the part where you stop thinking “this would be faster to just do myself” and start thinking “what would I tell a model to do if I had one in front of me.” I had one in front of me. I just needed to get out of its way.
—
Playwright is the unsexy hero of this story. It is the layer that turns “the model wants to click a button” into “the button got clicked.” If you have not tried it yet, the install is one pip install playwright followed by playwright install chromium, and you can have a working browser in under two minutes. The hard part is not the tool. The hard part is the prompt.