Stories from the Web Crawling trenches in Playwright

The Complete Playwright Cheatsheet

Author: Mohan Ganesan

Date: Dec 21, 2023

Playwright is a Node.js library for cross-browser end-to-end testing across Chromium, Firefox, and WebKit.

How to Use Proxy in Playwright in 2024

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to use proxies in Playwright for web scraping to avoid IP blocks, authenticate proxies, configure proxy protocols, intercept network traffic, and more.

Playwright vs Puppeteer: A Side-by-Side Comparison for Test Automation

Author: Mohan Ganesan

Date: Jan 9, 2024

Playwright and Puppeteer are popular browser testing tools that offer speed, capabilities, and reliability. Playwright has an advantage in terms of speed, browser support, and API design. Both tools are suitable for web app testing, but Puppeteer is recommended for web scraping tasks.

Playwright vs Puppeteer for Web Scraping: How To Choose For Robust Data Extraction

Author: Mohan Ganesan

Date: Jan 9, 2024

Playwright and Puppeteer are both powerful tools for web scraping, but Puppeteer has an edge in speed and stealth capabilities, while Playwright excels at handling complex page state changes and offers more flexible data extraction. Both libraries can serve most scraping needs, but Puppeteer is the top choice for advanced scenarios.

What are the limitations of BeautifulSoup?

Author: Mohan Ganesan

Date: Feb 5, 2024

BeautifulSoup is a Python library for parsing and extracting data from HTML and XML documents. It struggles with modern JavaScript sites and cannot bypass most bot protections. CSS selectors and navigation logic can get complex. Consider alternatives like Scrapy, Puppeteer, or Playwright for professional web scraping.
