Stories from the Web Crawling trenches in metadata

Controlling HTTP Requests with urllib Headers

Author: Mohan Ganesan

Date: Feb 6, 2024

The Python urllib module provides a powerful way to make HTTP requests in your code. Headers allow you to specify important metadata about the request, like the user agent, authentication credentials, caching settings, and more.

Scraping Reddit Posts in Node.js

Author: Mohan Ganesan

Date: Jan 9, 2024

Guide to scraping image URLs from a Reddit page using Node.js, focusing on identifying and extracting post blocks with images and metadata.

Tired of getting blocked while scraping the web?

ProxiesAPI handles headless browsers and rotates proxies for you.
Get access to 1,000 free API credits, no credit card required!