Stories from the Web Crawling trenches in HTTP request

Sending Multipart Form Data with Python Requests

Author: Mohan Ganesan

Date: Feb 3, 2024

When building web applications in Python, you may need to send multipart form data in an HTTP request. Here are some troubleshooting tips for sending multipart form data with Requests.

Customizing the User Agent for urllib in Python

Author: Mohan Ganesan

Date: Feb 6, 2024

Customize the user agent string in Python's urllib library to mimic a web browser, identify your application, or adhere to site requirements.

Scraping All Images from a Website with R

Author: Mohan Ganesan

Date: Dec 13, 2023

Scrape web pages using R libraries, send HTTP requests, parse HTML, extract data, download images, and overcome IP blocking with a rotating proxy server.

Debugging Empty Responses from HTTP Requests in Python

Author: Mohan Ganesan

Date: Feb 3, 2024

Frustrated with empty response bodies in Python HTTP requests? Check response body format, content encoding, decode response bytes, log full response details, test in Postman.

Scraping All the Images from a Website with Go

Author: Mohan Ganesan

Date: Dec 13, 2023

This Go program scrapes dog breed images from a Wikipedia page using web scraping and goquery package.

Scraping Reddit Posts in Node.js

Author: Mohan Ganesan

Date: Jan 9, 2024

Guide to scraping image URLs from a Reddit page using Node.js, focusing on identifying and extracting post blocks with images and metadata.

Web Scraping Wikipedia in Scala

Author: Mohan Ganesan

Date: Dec 6, 2023

Wikipedia scraping using Scala and Jsoup to extract structured data from tables. Simplified steps include importing libraries, defining URL, setting user agent, sending HTTP request, parsing HTML, extracting data, and printing scraped data.

Understanding the Aiohttp Request Object in Python

Author: Mohan Ganesan

Date: Mar 3, 2024

The aiohttp request object provides valuable information about incoming HTTP requests in Python web applications.

Scraping all the Images from a Website with Ruby

Author: Mohan Ganesan

Date: Dec 13, 2023

Scraping dog breed information and images from Wikipedia using Ruby and Nokogiri library. Save locally with breed name, group, and local name.

Scraping Wikipedia With Ruby

Author: Mohan Ganesan

Date: Dec 6, 2023

Wikipedia web scraping using Ruby's Nokogiri library to extract structured data from HTML tables.

Scraping Real Estate Listings From Realtor with Objective C

Author: Mohan Ganesan

Date: Jan 9, 2024

Web scraping code in Objective-C using NSXMLParser to extract real estate listing data from Realtor.com.

Scraping All Images from a Website with Scala

Author: Mohan Ganesan

Date: Dec 13, 2023

Learn how to use Scala and Jsoup to scrape images from a website. Make HTTP requests, extract data from HTML, and download images.

Scraping Real Estate Listings From Realtor Using Rust

Author: Mohan Ganesan

Date: Jan 9, 2024

Web scraping article using Rust programming language to extract real estate listing data from Realtor.com using HTML parsing and HTTP requests.

Tired of getting blocked while scraping the web?

ProxiesAPI handles headless browsers and rotates proxies for you.
Get access to 1,000 free API credits, no credit card required!