Stories from the Web Crawling trenches in Nokogiri

The Ultimate Nokogiri Cheat Sheet for Ruby

Author: Mohan Ganesan

Date: Oct 31, 2023

Nokogiri is a powerful HTML/XML parsing and scraping library for Ruby. This cheat sheet covers its extensive capabilities.

Downloading Images from a Website with Ruby and Nokogiri

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to use Ruby and Nokogiri to scrape data and images from HTML tables, download and save images, and overcome challenges like CAPTCHAs and IP blocks with Proxies API.

Building a Simple Proxy Rotator with Ruby and Nokogiri

Author: Mohan Ganesan

Date: Oct 2, 2023

Fetch and use public proxies in Ruby projects using Nokogiri and free proxy lists. Scale to thousands of links with a rotating proxy service like Proxies API.
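As a rough illustration of the rotation idea in this teaser, here is a minimal round-robin sketch. The `ProxyRotator` class name and the proxy addresses are hypothetical placeholders (documentation-range IPs), not taken from the article, and how the list gets populated from a free proxy page is left out.

```ruby
# Minimal round-robin proxy rotator (illustrative sketch only).
# The class name and the addresses below are assumptions for demonstration.
class ProxyRotator
  def initialize(proxies)
    raise ArgumentError, "proxy list is empty" if proxies.empty?

    @proxies = proxies.dup
    @index = 0
  end

  # Returns the next proxy and wraps around at the end of the list.
  def next_proxy
    proxy = @proxies[@index]
    @index = (@index + 1) % @proxies.size
    proxy
  end
end

rotator = ProxyRotator.new(["203.0.113.10:8080", "203.0.113.11:3128"])
picked = 3.times.map { rotator.next_proxy }
puts picked
```

Each returned `host:port` string could then be split and passed as the proxy address and port arguments of `Net::HTTP.new`; a production rotator would also need to drop proxies that stop responding.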

Web Scraping with Ruby & ChatGPT

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping in Ruby with Nokogiri, Mechanize, and ChatGPT. Get code snippets and explanations for scraping tasks.

Scraping Wikipedia With Ruby

Author: Mohan Ganesan

Date: Dec 6, 2023

Scrape Wikipedia with Ruby's Nokogiri library to extract structured data from HTML tables.

Scraping all the Images from a Website with Ruby

Author: Mohan Ganesan

Date: Dec 13, 2023

Scrape dog breed information and images from Wikipedia using Ruby and the Nokogiri library, then save each image locally along with the breed name, group, and local name.

Scraping Multiple Pages in Ruby with Nokogiri

Author: Mohan Ganesan

Date: Oct 15, 2023

Web scraping in Ruby using Nokogiri to extract data from multiple pages. Use a base URL pattern, loop through the pages, parse each page's HTML, and extract the data.

Web Scraping New York Times News Headlines in Ruby

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping allows automatic data extraction from websites. This article demonstrates web scraping using Ruby, Nokogiri, and Net::HTTP. It covers CSS selectors, handling errors, and overcoming IP blocks.

Scraping Reddit Posts with Ruby

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to scrape data from Reddit using Ruby, Nokogiri, and open-uri. Collect public data, analyze posting trends, and build Reddit bots or apps.

Scraping Craigslist Listings with Ruby

Author: Mohan Ganesan

Date: Oct 1, 2023

Learn how to scrape Craigslist apartment listings using Ruby and Nokogiri. Avoid IP blocking with a rotating proxy server.

Scraping eBay Listings with Ruby and Nokogiri in 2023

Author: Mohan Ganesan

Date: Oct 5, 2023

eBay is a large online marketplace. This tutorial explains how to scrape and extract data from eBay listings using Ruby and Nokogiri.

Scraping Booking.com Property Listings in Ruby in 2023

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to scrape property listings from Booking.com using Ruby with the Nokogiri and OpenURI libraries. Use proxies to scale your scraping.

Scraping Yelp Business Listings using Ruby - A step by step guide

Author: Mohan Ganesan

Date: Dec 6, 2023

Learn how to scrape Yelp business listings using Ruby and Nokogiri, bypassing anti-bot mechanisms with premium proxies.

Scraping Real Estate Listings From Realtor with Ruby

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to use Ruby with the Nokogiri and HTTParty gems to extract real estate listing data from Realtor.com.
