Stories from the Web Crawling trenches in DOM

The Complete Libxml2 C++ Cheatsheet

Author: Mohan Ganesan

Date: Oct 31, 2023

Libxml2 is an XML processing library written in C for use in C/C++ applications. It provides DOM, SAX, XMLReader, XPath, and XPointer support.

The Ultimate Gumbo C++ Cheatsheet

Author: Mohan Ganesan

Date: Oct 31, 2023

Gumbo is an HTML5 parsing library written in C that can be used from C++ to extract content from HTML. It parses a document into a tree of nodes that you can traverse and inspect.

Scraping Booking.com Property Listings with JavaScript in 2023

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to scrape property listings from Booking.com using JavaScript. Use Axios to fetch the HTML content and Cheerio to extract details such as property name, location, and ratings.
