Stories from the Web Crawling trenches in HTML::TreeBuilder

The Ultimate HTML::TreeBuilder Cheatsheet in Perl

Author: Mohan Ganesan

Date: Oct 31, 2023

HTML::TreeBuilder is a Perl module for parsing and manipulating HTML and XML documents into a tree structure.

Web Scraping with Perl & ChatGPT

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping in Perl with ChatGPT assistance. Use HTML::TreeBuilder and WWW::Mechanize for data extraction. Generate code snippets and explanations with ChatGPT.

Scraping All Images from a Website with Perl

Author: Mohan Ganesan

Date: Dec 13, 2023

Guide to scraping image URLs and data from a Wikipedia page using Perl script. Extracts names, groups, local names, and image URLs for dog breeds.

Scraping Multiple Pages in Perl with LWP::UserAgent and HTML::TreeBuilder

Author: Mohan Ganesan

Date: Oct 15, 2023

Web scraping in Perl using LWP::UserAgent and HTML::TreeBuilder modules to extract data from multiple pages. Use XPath queries and proxies for efficient data extraction.

Scraping Yelp Business Listings Using Perl

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping is the process of extracting data from websites through automated scripts. This article provides a beginner tutorial on scraping business listings from Yelp using modules like HTML::TreeBuilder and LWP::UserAgent.

Scraping Craigslist Listings with Perl

Author: Mohan Ganesan

Date: Oct 1, 2023

Learn how to scrape Craigslist apartment listings using Perl and modules LWP::UserAgent and HTML::TreeBuilder. Avoid IP blocking with a rotating proxy server.

Tired of getting blocked while scraping the web?

ProxiesAPI handles headless browsers and rotates proxies for you.
Get access to 1,000 free API credits, no credit card required!