Stories from the Web Crawling trenches in speed

Speed Up HTTP Requests: When to Use http.client over requests

Author: Mohan Ganesan

Date: Feb 3, 2024

Python offers options for HTTP requests with http.client and requests. http.client is faster for simple requests, while requests is more feature-rich. Use http.client for speed and requests for complex applications.

Playwright vs Puppeteer for Web Scraping: How To Choose For Robust Data Extraction

Author: Mohan Ganesan

Date: Jan 9, 2024

Playwright and Puppeteer are both powerful tools for web scraping, but Puppeteer has an edge in speed and stealth capabilities, while Playwright excels in handling complex page state changes and offers a more flexible data extraction. Both libraries can serve most scraping needs, but Puppeteer is the top choice for advanced scenarios.

urllib Connection Pool in Python

Author: Mohan Ganesan

Date: Feb 8, 2024

Using a connection pool in Python's urllib module is a best practice for making multiple requests, boosting efficiency and speed.

Leveraging Sockets for Network Communication in Python

Author: Mohan Ganesan

Date: Feb 8, 2024

Sockets in Python provide a low-level networking interface for sending and receiving data across networks and the internet.

Tired of getting blocked while scraping the web?

ProxiesAPI handles headless browsers and rotates proxies for you.
Get access to 1,000 free API credits, no credit card required!