Stories from the Web Crawling trenches in LWP::UserAgent

Using Proxies in LWP::UserAgent in Perl in 2024

Author: Mohan Ganesan

Date: Jan 9, 2024

Proxies help web scrapers avoid IP blocks, and LWP::UserAgent makes them easy to configure for large-scale scraping. Learn how to use proxies, handle proxy authentication, make SSL/HTTPS requests, and debug common issues.

Downloading Images from a Website with Perl and Mojo::DOM

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to use Perl and modules like LWP::UserAgent and Mojo::DOM to download images of dog breeds from a Wikipedia page.

Scraping All Images from a Website with Perl

Author: Mohan Ganesan

Date: Dec 13, 2023

A guide to scraping image URLs and data from a Wikipedia page using a Perl script. The script extracts names, groups, local names, and image URLs for dog breeds.

Building a Simple Proxy Rotator with Perl and Mojo

Author: Mohan Ganesan

Date: Oct 2, 2023

Use Mojo::UserAgent to fetch and parse proxy lists, extract proxies, refresh periodically, select a random proxy, and make proxied requests with LWP::UserAgent. Consider using a rotating proxy service like Proxies API to solve IP blocking problems.

Scraping Craigslist Listings with Perl

Author: Mohan Ganesan

Date: Oct 1, 2023

Learn how to scrape Craigslist apartment listings using Perl with the LWP::UserAgent and HTML::TreeBuilder modules. Avoid IP blocking with a rotating proxy server.

Scraping Multiple Pages in Perl with LWP::UserAgent and HTML::TreeBuilder

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to extract data from multiple pages in Perl using the LWP::UserAgent and HTML::TreeBuilder modules, with XPath queries and proxies for efficient data extraction.

Scraping New York Times News Headlines in Perl

Author: Mohan Ganesan

Date: Dec 6, 2023

Learn how to scrape the NYT website using Perl, LWP::UserAgent, and Mojo::DOM. Extract headlines and links programmatically.

Scraping Yelp Business Listings Using Perl

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping is the process of extracting data from websites with automated scripts. This article is a beginner's tutorial on scraping business listings from Yelp using modules like HTML::TreeBuilder and LWP::UserAgent.
