One Playwright Selector Trick Nobody Talks About: getByRole

Everyone reaches for page.locator(".some-class") first. They shouldn't. getByRole is the most stable selector in Playwright and almost nobody uses it for scraping. They think it's a testing-library thing. It's not. It's a way of asking the page "what is this element semantically" instead of "what class name does the design system happen to use this week." That distinction is what kept our Facebook video transcript actor running through three Facebook redesigns this past year.

When does getByRole work? When the site is built by people who care about accessibility. That is more sites than you think, especially big ones with legal requirements (US government, EU compliance, large e-commerce).

Check before you skip it:

- Open the accessibility tree in Chrome DevTools (Elements → Accessibility tab). If your target element shows a role and an accessible name, getByRole will find it.
- Buttons and headings are nearly always tagged correctly. Even sloppy sites tend to give you a button role (a native <button> or role="button") and proper heading levels because the design system enforced it.
- Forms expose a label even when the visual design hides it. getByLabel("Email") works on inputs that don't visibly show "Email" anywhere, as long as the label is present in the markup (for example, a visually hidden <label> or an aria-label).

Compare:

    // Class-name selector: brittle
    const followBtnByClass = page.locator('._a9-_._a9-_2._a9-_8._a9-_z');

    // getByRole: survives layout changes
    const followBtnByRole = page.getByRole('button', { name: /follow/i });

The first one breaks the day Facebook tweaks their CSS-in-JS hash. The second one keeps working until they remove the button entirely (or change its accessible name).

Same for headings:

    // "Get the post title"
    const title = page.getByRole('heading', { level: 1 });

That works on every site that uses <h1> correctly. Which is most of them, since a single descriptive <h1> is standard SEO advice.

The Facebook transcript actor extracts video metadata from public posts. Facebook ships A/B tests constantly, so class names change every couple of weeks. Selectors built on _a9-_8 chains broke regularly.
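To gauge how exposed a scraper is to this kind of breakage, a rough scan for class-chain selectors can help. What follows is a minimal sketch, not part of Playwright: the function name findClassChainLocators and the "two or more class names" threshold are illustrative assumptions, and it only sees selectors passed to page.locator( as plain string literals without escaped quotes.

```typescript
// Heuristic (an assumption, not a Playwright feature): a selector is treated
// as a "class chain" when it contains two or more class names.
function countClassNames(selector: string): number {
  const tokens = selector.match(/\.[\w-]+/g) || [];
  return tokens.length;
}

// Returns the selectors of page.locator('<selector>') calls that look like
// class chains. Only plain string literals are recognised.
function findClassChainLocators(source: string): string[] {
  // Created per call so the global regex's lastIndex never leaks between runs.
  const locatorCall = /page\.locator\(\s*(['"`])(.*?)\1/g;
  const hits: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = locatorCall.exec(source)) !== null) {
    const selector = match[2];
    if (countClassNames(selector) >= 2) {
      hits.push(selector);
    }
  }
  return hits;
}

// Usage on a small inline sample (hypothetical scraper code).
const sampleSource = `
const a = page.locator('._a9-_._a9-_2._a9-_8._a9-_z');
const b = page.locator('#login');
const c = page.getByRole('button', { name: /follow/i });
const d = page.locator("div.card.featured");
`;
const brittle = findClassChainLocators(sampleSource);
console.log(brittle.length, brittle); // 2 hits: the _a9-_ chain and div.card.featured
```

Each hit is only a candidate for replacement: whether getByRole can take over still depends on the element having a role and an accessible name in the accessibility tree.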
I rewrote the extractor to use getByRole for everything that had a meaningful role:

- Author name → getByRole('link', { name: /^[\w. ]+$/ }) near the post header.
- Post text → no role, but [data-ad-comet-preview="message"] (a data- attribute, also stable).
- Video player → getByRole('article') containing a <video> element.

Before: ~8 selector breakages per quarter. After: 1 in the last 6 months, and that one was a real structural change (Facebook moved to a new post type), not a class rename.

getByRole is now the first thing every new actor we write tries, including the rebuild of the Facebook AI Transcript Extractor. CSS-class selectors are reserved for the cases where the site's accessibility story is genuinely broken (rare in 2026, since most sites have been audited at least once).

So: open your scraper. Run a search for page.locator( with a CSS class chain. How many can you replace with getByRole? Drop the count in the comments. I'll bet it's more than half.

Agree, disagree, or have a site where getByRole falls apart? Reply.

Written by Nova Chen, Automation Dev Advocate at SIÁN Agency. Find more from Nova on dev.to. For custom scraping or automation work, hire SIÁN Agency.
