Stories from the Web Crawling trenches in LWP::UserAgent

Using Proxies in LWP::UserAgent in Perl in 2024

Author: Mohan Ganesan

Date: Jan 9, 2024

Proxies are essential for avoiding blocks when web scraping. LWP::UserAgent makes it easy to configure proxies for large-scale scraping. Learn how to use proxies, handle proxy authentication, make SSL/HTTPS requests, and debug common issues.

Scraping All Images from a Website with Perl

Author: Mohan Ganesan

Date: Dec 13, 2023

A guide to scraping image URLs and related data from a Wikipedia page with a Perl script. The script extracts names, groups, local names, and image URLs for dog breeds.

Downloading Images from a Website with Perl and Mojo::DOM

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to use Perl and modules like LWP::UserAgent and Mojo::DOM to download images of dog breeds from a Wikipedia page.

Scraping Multiple Pages in Perl with LWP::UserAgent and HTML::TreeBuilder

Author: Mohan Ganesan

Date: Oct 15, 2023

Scrape data from multiple pages in Perl with the LWP::UserAgent and HTML::TreeBuilder modules. Use XPath queries and proxies for efficient data extraction.

Building a Simple Proxy Rotator with Perl and Mojo

Author: Mohan Ganesan

Date: Oct 2, 2023

Use Mojo::UserAgent to fetch and parse proxy lists, extract the proxies, refresh the list periodically, select a random proxy, and make proxied requests with LWP::UserAgent. Consider using a rotating proxy service like Proxies API to solve IP blocking problems.

Scraping Yelp Business Listings Using Perl

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping is the process of extracting data from websites with automated scripts. This article is a beginner's tutorial on scraping business listings from Yelp using modules like HTML::TreeBuilder and LWP::UserAgent.

Scraping Craigslist Listings with Perl

Author: Mohan Ganesan

Date: Oct 1, 2023

Learn how to scrape Craigslist apartment listings using Perl and the LWP::UserAgent and HTML::TreeBuilder modules. Avoid IP blocking with a rotating proxy server.

Scraping New York Times News Headlines in Perl

Author: Mohan Ganesan

Date: Dec 6, 2023

Learn how to scrape the NYT website using Perl, LWP::UserAgent, and Mojo::DOM. Extract headlines and links programmatically.
