Stories from the Web Crawling Trenches in Gumbo

Scraping Data from Wikipedia in C++

Author: Mohan Ganesan

Date: Dec 6, 2023

Scraping Wikipedia using cURL and Gumbo to extract details on US presidents from a table.

Scraping Yelp Business Listings with C++

Author: Mohan Ganesan

Date: Dec 6, 2023

How to extract business listing data from Yelp in C++ with the libcurl and Gumbo libraries.

Scraping New York Times News Headlines in C++

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping is a technique for extracting data from websites. This article explains how to scrape article titles and links from The New York Times in C++. It covers HTTP requests, HTML structure, libcurl, and Gumbo, notes the challenge of IP blocking, and suggests a rotating proxy service such as Proxies API.
