Stories from the Web Crawling Trenches: Manipulation

The Ultimate Loofah Cheatsheet for Ruby

Author: Mohan Ganesan

Date: Nov 4, 2023

Loofah is a Ruby library for parsing and manipulating HTML/XML documents. It provides a simple API for traversing, manipulating, and extracting data from markup. It also offers XSS sanitization and integrates with Rails. Loofah is built on top of Nokogiri, which gives it fast parsing and an idiomatic Ruby interface.

The Ultimate HTML::Parser Perl Cheat Sheet

Author: Mohan Ganesan

Date: Oct 31, 2023

HTML::Parser is a Perl module for parsing HTML/XML documents and extracting/manipulating their content.

The Ultimate KSoup Cheatsheet for Kotlin

Author: Mohan Ganesan

Date: Oct 31, 2023

KSoup is an HTML parser for Kotlin that provides a convenient DSL for extracting and manipulating data from HTML documents.

The Ultimate Gumbo C++ Cheatsheet

Author: Mohan Ganesan

Date: Oct 31, 2023

Gumbo is an HTML5 parsing library written in C and usable from C++. It parses HTML into a tree of nodes that can be traversed to select and extract content.

The Complete HTML Agility Pack Cheat Sheet in VB

Author: Mohan Ganesan

Date: Oct 31, 2023

HTML Agility Pack is an HTML parser for .NET that allows easy manipulation and data extraction from HTML documents.
