Stories from the Web Crawling trenches in libxml2

The Complete Libxml2 C++ Cheatsheet

Author: Mohan Ganesan

Date: Oct 31, 2023

Libxml2 is an XML processing library written in C for use in C/C++ applications. It provides DOM, SAX, XMLReader, XPath, and XPointer support.

How to Scrape All the Images from a Website with C++

Author: Mohan Ganesan

Date: Dec 13, 2023

Scrape and download images from a website using C++ libraries such as libcurl and libxml2. Requires knowledge of HTML, CSS, and programming.

Scraping Real Estate Listings From Realtor with C++

Author: Mohan Ganesan

Date: Jan 9, 2024

Web scraping tutorial in C++ using libcurl and libxml2 to extract data from Realtor.com listings.

Scraping eBay Listings with C++ and libcurl in 2023

Author: Mohan Ganesan

Date: Oct 5, 2023

Scrape and extract key data from eBay listings using C++ and the libcurl library.

The Ultimate HTMLParser Cheatsheet

Author: Mohan Ganesan

Date: Oct 31, 2023

HTMLParser is an Objective-C wrapper around libxml2 for parsing HTML documents. It provides an event-driven interface similar to NSXMLParser.
