One Playwright Selector Trick Nobody Talks About: getByRole

Everyone reaches for page.locator(".some-class") first. They shouldn't. getByRole is the most stable selector in Playwright and almost nobody uses it for scraping. They think it's a testing-library thing. It's not. It's a way of asking the page "what is this element semantically" instead of "what classname does the design system happen to use this week." That distinction is what kept our Facebook video transcript actor running through three Facebook redesigns this past year.

When does getByRole work? When the site is built by people who care about accessibility. That covers more sites than you think, especially big ones with legal requirements (US government, EU compliance, large e-commerce).

Check before you skip it:

- Open the accessibility tree in Chrome DevTools (Elements → Accessibility tab). If your target element shows a role and an accessible name, getByRole will find it.
- Buttons and headings are nearly always tagged correctly. Even sloppy sites give you role="button" and proper heading levels because the design system enforced it.
- Forms expose a label even when the visual design hides it. getByLabel("Email") works on inputs that don't visibly show "Email" anywhere.

Compare:

    // Class-name selector: brittle
    const followBtnByClass = page.locator('._a9-_._a9-_2._a9-_8._a9-_z');

    // getByRole: survives layout changes
    const followBtnByRole = page.getByRole('button', { name: /follow/i });

The first one breaks the day Facebook tweaks their CSS-in-JS hash. The second one keeps working until they remove the button entirely.

Same for headings:

    // "Get the post title"
    const title = page.getByRole('heading', { level: 1 });

That works on every site that uses <h1> correctly. Which is most of them, because Google penalises sites that don't.

The Facebook transcript actor extracts video metadata from public posts. Facebook ships A/B tests constantly, so class names change every couple of weeks. Selectors built on _a9-_8 chains broke regularly.
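The two locators from the comparison above can also sit side by side, with the role-based one tried first and the class chain kept only as a last resort. The following is a minimal sketch under stated assumptions, not the actor's real code: firstPresent and followButtonCandidates are hypothetical helper names introduced here. The only Playwright calls it relies on are page.getByRole(), page.locator(), and locator.count().

```javascript
// Sketch only: firstPresent and followButtonCandidates are hypothetical
// helpers, not part of Playwright. They use three real Playwright methods:
// page.getByRole(), page.locator(), and locator.count().

// Return the first candidate whose locator currently matches at least one
// element, or null when none of them match.
async function firstPresent(candidates) {
  for (const { strategy, locator } of candidates) {
    if ((await locator.count()) > 0) {
      return { strategy, locator };
    }
  }
  return null;
}

// Order matters: the role-based locator comes first, the class chain last.
// Locators are lazy, so building both costs nothing until count() runs.
function followButtonCandidates(page) {
  return [
    { strategy: "role", locator: page.getByRole("button", { name: /follow/i }) },
    { strategy: "css", locator: page.locator("._a9-_._a9-_2._a9-_8._a9-_z") },
  ];
}

// Usage inside an async Playwright script, where `page` is a real Page:
//   const found = await firstPresent(followButtonCandidates(page));
//   if (found) console.log(`matched via ${found.strategy}`);
```

Logging which strategy matched is the useful part: if the "css" branch never fires over a few weeks of runs, the class chain can probably be deleted.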
I rewrote the extractor to use getByRole for everything that had a meaningful role:

- Author name: getByRole('link', { name: /^[\w. ]+$/ }) near the post header.
- Post text: no role, but [data-ad-comet-preview="message"] (a data- attribute, also stable).
- Video player: getByRole('article') containing a <video> element.

Before: ~8 selector breakages per quarter. After: 1 in the last 6 months, and that one was a real structural change (Facebook moved to a new post type), not a class rename.

getByRole is now the first thing every new actor we write tries, including the rebuild of the Facebook AI Transcript Extractor. CSS-class selectors are reserved for the cases where the site's accessibility story is genuinely broken (rare in 2026, since most sites have been audited at least once).

So: open your scraper. Run a search for page.locator( with a CSS class chain. How many can you replace with getByRole? Drop the count in the comments. I'll bet it's more than half.

Agree, disagree, or have a site where getByRole falls apart? Reply.

Written by Nova Chen, Automation Dev Advocate at SIรN Agency. Find more from Nova on dev.to. For custom scraping or automation work, hire SIรN Agency.
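For the audit suggested above (searching for page.locator( with a CSS class chain), a rough script can do the counting. The sketch below is a heuristic, not a parser. findClassChainLocators and both regular expressions are hypothetical names introduced here, and it only inspects string literals passed directly to .locator( on a single line.

```javascript
// Hypothetical audit helper: list .locator("...") calls whose selector is made
// up only of CSS class tokens, e.g. ".a", ".a.b", or ".a > .b".
// Assumption: selectors are plain string literals written on one line.

// Captures the quote character and the selector text of a .locator( call.
const LOCATOR_CALL = /\.locator\(\s*(['"`])(.*?)\1/g;

// Matches selectors built purely from class tokens and optional combinators.
const CLASS_ONLY = /^(?:\s*[>+~]?\s*\.[\w-]+)+\s*$/;

function findClassChainLocators(source) {
  const hits = [];
  source.split("\n").forEach((line, index) => {
    for (const match of line.matchAll(LOCATOR_CALL)) {
      const selector = match[2];
      if (CLASS_ONLY.test(selector)) {
        hits.push({ line: index + 1, selector });
      }
    }
  });
  return hits;
}

// Example input: four locator styles that appear in this article.
const sample = [
  "const a = page.locator('._a9-_._a9-_2._a9-_8._a9-_z');",
  "const b = page.getByRole('button', { name: /follow/i });",
  "const c = page.locator('[data-ad-comet-preview=\"message\"]');",
  "const d = page.locator(\".some-class\");",
].join("\n");

// Reports the class-chain selectors (lines 1 and 4 of the sample) and skips
// the getByRole call and the data- attribute selector.
console.log(findClassChainLocators(sample));
```

Each reported selector is a candidate for the Accessibility tab check: if the target element has a role and an accessible name, it can likely move to getByRole.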



