Stories from the Web Crawling trenches

The Ultimate HTML::TreeBuilder Cheatsheet in Perl

Author: Mohan Ganesan

Date: Oct 31, 2023

HTML::TreeBuilder is a Perl module that parses HTML documents into a tree of elements, which you can then traverse and manipulate.

The Ultimate DOMDocument Cheat Sheet for PHP

Author: Mohan Ganesan

Date: Oct 31, 2023

PHP's DOMDocument class lets you load, traverse, and modify HTML and XML documents. This cheat sheet is a comprehensive reference for working with it.

URL Parsing in Python with urllib.parse

Author: Mohan Ganesan

Date: Feb 6, 2024

Understanding and manipulating URLs is crucial for Python web programming. The urllib.parse module provides functions for parsing, composing, and manipulating URLs in Python.
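As a quick taste of what the module offers, here is a minimal sketch of a parse/modify/rebuild round trip using standard urllib.parse functions (urlparse, parse_qs, urlencode, urlunparse, urljoin). The example URL is hypothetical, chosen only to show each component.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse, urljoin

# Hypothetical URL used only for illustration
url = "https://example.com/search?q=web+scraping&page=2#results"

# Parsing: split the URL into named components
parts = urlparse(url)
print(parts.scheme)    # https
print(parts.netloc)    # example.com
print(parts.path)      # /search
print(parts.fragment)  # results

# Query strings: decode into a dict mapping each key to a list of values
params = parse_qs(parts.query)
print(params)          # {'q': ['web scraping'], 'page': ['2']}

# Composing: change one parameter, re-encode, and rebuild without the fragment
params["page"] = ["3"]
new_query = urlencode(params, doseq=True)
new_url = urlunparse(parts._replace(query=new_query, fragment=""))
print(new_url)         # https://example.com/search?q=web+scraping&page=3

# Resolving a relative link against the original URL
joined = urljoin(url, "/about")
print(joined)          # https://example.com/about
```

Note that parse_qs always returns lists of values, so urlencode needs doseq=True to turn them back into plain key=value pairs.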
