Stories from the Web Crawling trenches in Ruby

The Ultimate Loofah Cheatsheet for Ruby

Author: Mohan Ganesan

Date: Nov 4, 2023

Loofah is a Ruby library for parsing and manipulating HTML/XML documents. It provides a simple API for traversing, manipulating, and extracting data from markup. It also offers XSS sanitization and integrates with Rails. Because Loofah is built on top of Nokogiri, it inherits Nokogiri's speed while exposing an idiomatic Ruby interface.

The Ultimate Nokogiri Cheat Sheet for Ruby

Author: Mohan Ganesan

Date: Oct 31, 2023

Nokogiri is a powerful HTML/XML parsing and scraping library for Ruby. This cheat sheet covers its extensive capabilities.

Web Scraping with Ruby & ChatGPT

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping in Ruby with Nokogiri, Mechanize, and ChatGPT, with code snippets and explanations for common scraping tasks.

Building a Simple Proxy Rotator with Ruby and Nokogiri

Author: Mohan Ganesan

Date: Oct 2, 2023

Fetch and use public proxies in Ruby projects using Nokogiri and free proxy lists. Scale to thousands of links with a rotating proxy service like Proxies API.
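The rotation step can be sketched as a simple round-robin over a proxy list. This is a minimal sketch, not the article's code. The `ProxyRotator` class is hypothetical, and the addresses are placeholders from a documentation-only IP range. Substitute proxies you have actually fetched.

```ruby
# Hypothetical round-robin proxy rotator (illustrative sketch).
class ProxyRotator
  def initialize(proxies)
    raise ArgumentError, "need at least one proxy" if proxies.empty?

    # Array#cycle returns an endless Enumerator over the list.
    @pool = proxies.cycle
  end

  # Returns the next "host:port" string in round-robin order.
  def next_proxy
    @pool.next
  end

  # Splits the next "host:port" into the address and integer port Net::HTTP expects.
  def next_proxy_parts
    host, port = next_proxy.split(":", 2)
    [host, port.to_i]
  end
end

# Placeholder addresses (203.0.113.0/24 is reserved for documentation).
rotator = ProxyRotator.new(["203.0.113.10:8080", "203.0.113.11:3128", "203.0.113.12:80"])
4.times { puts rotator.next_proxy }
```

You can pass the pair returned by `next_proxy_parts` as the third and fourth arguments of `Net::HTTP.new(host, port, proxy_addr, proxy_port)`. Each request then goes out through the next proxy in the list.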

Building a Super Simple HTTP Proxy in Ruby in just 9 lines of code

Author: Mohan Ganesan

Date: Oct 1, 2023

Learn how to create a basic HTTP proxy using Ruby's socket library and net/http, and why a rotating proxy service is important for avoiding IP blocks.

Downloading Images from a Website with Ruby and Nokogiri

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to use Ruby and Nokogiri to scrape data and images from HTML tables, download and save images, and overcome challenges like CAPTCHAs and IP blocks with Proxies API.

Scrape Any Website with OpenAI Function Calling in Ruby

Author: Mohan Ganesan

Date: Sep 25, 2023

Web scraping with OpenAI in Ruby allows for resilient data extraction from HTML using function calling.
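Function calling works by describing the data you want as a JSON Schema "tool". The model then returns arguments that match that schema instead of free text. The sketch below assumes OpenAI's Chat Completions `tools` format. The `extract_listings` function and its fields are hypothetical, and the HTTP request itself is omitted.

```ruby
require "json"

# Hypothetical tool definition, assuming OpenAI's Chat Completions "tools" format.
# The function name and fields are illustrative, not taken from the article.
EXTRACT_LISTINGS_TOOL = {
  type: "function",
  function: {
    name: "extract_listings",
    description: "Extract every listing's title and price from the supplied HTML",
    parameters: {
      type: "object",
      properties: {
        listings: {
          type: "array",
          items: {
            type: "object",
            properties: {
              title: { type: "string" },
              price: { type: "string" }
            },
            required: %w[title price]
          }
        }
      },
      required: ["listings"]
    }
  }
}.freeze

# The model's tool call carries its arguments as a JSON string;
# parse it back into an array of Ruby hashes.
def parse_listings(arguments_json)
  JSON.parse(arguments_json).fetch("listings")
end

# Serialized form you would send in the request's "tools" array.
puts JSON.pretty_generate([EXTRACT_LISTINGS_TOOL])
```

Parsing the returned arguments with `JSON.parse` gives plain Ruby hashes. No CSS selectors have to be kept in sync with the page layout.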

Making HTTP Requests in Ruby with the httpx Gem

Author: Mohan Ganesan

Date: Feb 5, 2024

The httpx gem provides a simple and flexible way to make HTTP requests in Ruby, with features like persistent connections and timeouts. It is well suited to consuming APIs, web scraping, and other HTTP-heavy tasks.

Scraping Multiple Pages in Ruby with Nokogiri

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to use Ruby and Nokogiri to extract data from multiple pages. The approach is to build each page's URL from a base URL pattern, loop through the pages, parse the HTML, and extract the data.
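The URL-pattern step can be sketched in plain Ruby. The `?page=%d` pattern and the example.com host are assumptions, so check how the target site actually numbers its pages. Fetching and parsing are left as comments because they depend on Nokogiri and on the page structure.

```ruby
# Hypothetical base URL pattern; %d is replaced by the page number.
BASE_URL = "https://example.com/listings?page=%d"

# Build the list of page URLs for pages first_page..last_page (inclusive).
def page_urls(pattern, first_page, last_page)
  (first_page..last_page).map { |n| format(pattern, n) }
end

page_urls(BASE_URL, 1, 3).each do |url|
  puts url
  # For each page you would then fetch and parse the HTML, e.g.
  #   doc = Nokogiri::HTML(URI.open(url))
  # and extract data with doc.css(...). A short sleep between
  # requests keeps the crawl polite.
end
```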

Scraping Booking.com Property Listings in Ruby in 2023

Author: Mohan Ganesan

Date: Oct 15, 2023

Learn how to scrape property listings from Booking.com using Ruby, Nokogiri, and OpenURI libraries. Use proxies for scaling web scraping.

Scraping all the Images from a Website with Ruby

Author: Mohan Ganesan

Date: Dec 13, 2023

Learn how to scrape dog breed information and images from Wikipedia using Ruby and the Nokogiri library, then save each image locally along with the breed name, group, and local name.

Scraping Reddit Posts with Ruby

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to scrape data from Reddit using Ruby, Nokogiri, and open-uri. Collect public data, analyze posting trends, and build Reddit bots or apps.

Scraping Craigslist Listings with Ruby

Author: Mohan Ganesan

Date: Oct 1, 2023

Learn how to scrape Craigslist apartment listings using Ruby and Nokogiri. Avoid IP blocking with a rotating proxy server.

Scraping Yelp Business Listings using Ruby - A step by step guide

Author: Mohan Ganesan

Date: Dec 6, 2023

Learn how to scrape Yelp business listings using Ruby and Nokogiri, bypassing anti-bot mechanisms with premium proxies.

Web Scraping New York Times News Headlines in Ruby

Author: Mohan Ganesan

Date: Dec 6, 2023

Web scraping allows automatic data extraction from websites. This article demonstrates web scraping using Ruby, Nokogiri, and Net::HTTP. It covers CSS selectors, handling errors, and overcoming IP blocks.
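One common way to handle transient network errors is to retry with exponential backoff. The `with_retries` helper below is a hypothetical sketch that uses only the standard library. It is not necessarily how the article handles errors, and the retry count and delays are arbitrary defaults.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: run the block, retrying on the listed errors with
# exponential backoff (base_delay, 2x base_delay, 4x base_delay, ...).
def with_retries(max_attempts: 3, base_delay: 0.5, errors: [StandardError])
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *errors
    raise if attempt >= max_attempts # give up and re-raise the last error

    sleep(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Example usage (performs a real request, so it is left commented out):
# html = with_retries(errors: [Net::OpenTimeout, Net::ReadTimeout, SocketError]) do
#   Net::HTTP.get(URI("https://www.nytimes.com/"))
# end
```

Restricting `errors` to timeout and socket errors keeps genuine bugs, such as a typo in a selector, from being silently retried.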

Scraping eBay Listings with Ruby and Nokogiri in 2023

Author: Mohan Ganesan

Date: Oct 5, 2023

eBay is a large online marketplace. This tutorial explains how to scrape and extract data from eBay listings using Ruby and Nokogiri.

Scraping Real Estate Listings From Realtor with Ruby

Author: Mohan Ganesan

Date: Jan 9, 2024

Learn how to use Ruby and the Nokogiri and HTTParty gems for web scraping, specifically for extracting real estate listing data from Realtor.com.

Downloading Images from URLs in Ruby

Author: Mohan Ganesan

Date: May 5, 2024

Learn how to download images from URLs using Ruby with OpenURI, Net::HTTP, HTTParty, Down, and Faraday.
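For the Net::HTTP route, here is a minimal standard-library sketch. The `filename_for` and `download_image` helpers are hypothetical names. The code does not follow redirects, and it names each file after the last segment of the URL path.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: use the last path segment of the URL as the file name,
# ignoring any query string; fall back to a default when there is none.
def filename_for(url, fallback: "image")
  name = File.basename(URI.parse(url).path)
  (name.empty? || name == "/") ? fallback : name
end

# Hypothetical helper: download one image and write it to dir in binary mode.
# Net::HTTP.get_response enables TLS for https URLs but does not follow redirects.
def download_image(url, dir = ".")
  uri = URI.parse(url)
  response = Net::HTTP.get_response(uri)
  raise "HTTP #{response.code} for #{url}" unless response.is_a?(Net::HTTPSuccess)

  path = File.join(dir, filename_for(url))
  File.binwrite(path, response.body)
  path
end

# download_image("https://example.com/images/dog.jpg", "downloads")
```

Writing with `File.binwrite` matters for images: it avoids any newline or encoding conversion that text mode could apply to the raw bytes.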
