Stories from the web crawling trenches

How to install urllib in Python?

Author: Mohan Ganesan

Date: Feb 20, 2024

The urllib package ships with Python's standard library, so there is nothing to install. It lets you fetch data over the internet and parse URLs, and it is worth knowing for every Python programmer.
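
As a quick illustration of the URL-parsing side, here is a minimal sketch that uses only urllib.parse. The URL and variable names are made up for the example and do not come from the article.

```python
from urllib.parse import urlparse, parse_qs, urlencode

# Hypothetical URL, used only for illustration.
url = "https://example.com/search?q=web+crawling&page=2"

# Split the URL into its components.
parts = urlparse(url)

# Decode the query string into a dict of lists.
query = parse_qs(parts.query)

print(parts.scheme, parts.netloc, parts.path)
print(query)

# Build a query string for the next page from a plain dict.
next_page = urlencode({"q": "web crawling", "page": 3})
print(next_page)
```

Fetching a page works along the same lines with urllib.request.urlopen, which is left out here so the sketch runs without network access.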

Building a Simple Proxy Rotator with PHP and SimpleHTMLDOM

Author: Mohan Ganesan

Date: Oct 2, 2023

Implement a proxy rotator in PHP using free proxies from sslproxies.org. Use SimpleHTMLDOM and cURL to fetch and parse the proxy list, then rotate IPs and User-Agent strings to avoid IP blocking, or hand the rotation to a service such as Proxies API.
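
The rotation idea itself is language-independent. The sketch below shows it in Python rather than PHP, and it is not the article's code: the proxy addresses, the User-Agent strings, and the helper name build_rotated_opener are all placeholders chosen for illustration.

```python
import random
import urllib.request

# Placeholder values (assumptions): in practice these would come from a scraped proxy list.
PROXIES = ["203.0.113.10:8080", "203.0.113.11:3128", "203.0.113.12:80"]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]


def build_rotated_opener(proxies=PROXIES, user_agents=USER_AGENTS):
    """Return (opener, proxy, user_agent) using a randomly chosen proxy and User-Agent."""
    proxy = random.choice(proxies)
    user_agent = random.choice(user_agents)
    # Route both HTTP and HTTPS traffic through the chosen proxy.
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    )
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, proxy, user_agent


opener, proxy, user_agent = build_rotated_opener()
print(f"Next request would go through {proxy} as {user_agent!r}")
# opener.open("https://example.com", timeout=10) would send the request via `proxy`.
```

Calling build_rotated_opener() before each request gives a fresh proxy and User-Agent pair. Free proxies fail often, so a real rotator would also retry with another proxy on error.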

Building a Simple Proxy Rotator with Ruby and Nokogiri

Author: Mohan Ganesan

Date: Oct 2, 2023

Fetch and use public proxies in Ruby projects using Nokogiri and free proxy lists. Scale to thousands of links with a rotating proxy service like Proxies API.

Building a Simple Proxy Rotator with JavaScript and Puppeteer

Author: Mohan Ganesan

Date: Oct 2, 2023

Fetch and parse proxies using Puppeteer and cheerio, and select a random proxy for JavaScript projects.

Building a Simple Proxy Rotator with C++ and libcurl

Author: Mohan Ganesan

Date: Oct 2, 2023

A simple proxy rotator in C++ using libcurl and RapidXML to fetch and parse proxies from sslproxies.org. Consider using a rotating proxy service for production use.

Building a Simple Proxy Rotator with Scala and Scraping

Author: Mohan Ganesan

Date: Oct 2, 2023

A simple Scala proxy rotator for web scraping, built with ScalaJS, that periodically fetches and parses proxies from a proxy site.

Building a Simple Proxy Rotator with Perl and Mojo

Author: Mohan Ganesan

Date: Oct 2, 2023

Use Mojo::UserAgent to fetch and parse proxy lists, extract proxies, refresh periodically, select a random proxy, and make proxied requests with LWP::UserAgent. Consider using a rotating proxy service like Proxies API to solve IP blocking problems.
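
The "refresh periodically, select a random proxy" steps can be sketched as a small cache with a time-to-live. This is a Python illustration under stated assumptions, not the article's Perl code: the ProxyPool class, the ttl parameter, and the fake_fetch stand-in are hypothetical names.

```python
import random
import time


class ProxyPool:
    """Caches a proxy list and refetches it once it is older than `ttl` seconds."""

    def __init__(self, fetch, ttl=600.0, clock=time.monotonic):
        self._fetch = fetch        # callable returning a list of "host:port" strings
        self._ttl = ttl
        self._clock = clock        # injectable so the refresh logic can be tested
        self._proxies = []
        self._fetched_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._proxies = list(self._fetch())
            self._fetched_at = now

    def random_proxy(self):
        self._refresh_if_stale()
        if not self._proxies:
            raise RuntimeError("proxy list is empty")
        return random.choice(self._proxies)


# Stand-in fetcher (assumption): a real one would download and parse a proxy list.
def fake_fetch():
    return ["203.0.113.10:8080", "203.0.113.11:3128"]


pool = ProxyPool(fake_fetch, ttl=600)
print(pool.random_proxy())
```

Refreshing lazily on access avoids a background thread, at the cost of one slower request whenever the cached list has expired.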

Building a Simple Proxy Rotator with Objective-C

Author: Mohan Ganesan

Date: Oct 2, 2023

Fetch and parse proxies from free proxy pools and rotate them in Objective-C projects. For persistent IP blocking problems, consider a rotating proxy service.

Is Scrapy free?

Author: Mohan Ganesan

Date: Feb 5, 2024

Scrapy is a free, open-source web crawling and web scraping framework written in Python and released under a BSD license. It provides versatile crawling capabilities and has an active community.

Web Crawling vs Web Scraping: What's the Difference?

Author: Mohan Ganesan

Date: Jan 9, 2024

Web crawling and web scraping are related but distinct automated processes: crawling discovers new web pages by following links, while scraping extracts specific data from those pages for analysis.
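
One way to see the distinction is to run both over the same page. In the minimal sketch below, the collected links are what a crawler would queue for its next visits, and the extracted title is the kind of field a scraper would keep. The sample HTML and the PageParser class are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects outgoing links (crawling) and the page title (scraping)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Made-up page content, so the sketch runs without network access.
html = (
    "<html><head><title>Demo page</title></head><body>"
    '<a href="/about">About</a> <a href="https://example.org/">External</a>'
    "</body></html>"
)

parser = PageParser("https://example.com/")
parser.feed(html)

print("To crawl next:", parser.links)
print("Scraped title:", parser.title)
```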
