Scraping Real Estate Listings From Realtor with Ruby

Web scraping can be a valuable skill for extracting publicly available data from websites. This article will explain how to use Ruby and the Nokogiri and HTTParty gems to scrape real estate listing data from Realtor.com.

Installation

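Before running the code, you'll need Ruby plus the Nokogiri and HTTParty gems. The commands below assume a Debian- or Ubuntu-based system (apt); Bundler is optional for a single-file script like this one: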

# Install Ruby
$ sudo apt install ruby-full

# Install bundler
$ gem install bundler

# Install Nokogiri
$ gem install nokogiri

# Install HTTParty
$ gem install httparty

If everything installed correctly, the following checks print version numbers (yours may differ):

$ ruby -v
ruby 2.7.1p83

$ gem list | grep nokogiri
nokogiri (1.13.10)

$ gem list | grep httparty
httparty (0.20.0)

Now we're ready to run the scraper!

Understanding the Script Flow

Let's break down what the script is doing at a high level:

1. Require the nokogiri and httparty libraries we installed
2. Define the target URL to scrape (https://www.realtor.com/...)
3. Set a user agent header to look like a real web browser
4. Make an HTTP GET request to fetch the page content
5. Check if the request succeeded (status code 200)
6. If successful, parse the HTML content using Nokogiri
7. Find all listing blocks by their CSS class name
8. Loop through each listing block
9. Extract data like price, beds, broker name etc. by CSS selector
10. Print out the extracted data

The key thing to understand is that first we fetch the page HTML, then we use CSS selectors to pinpoint specific data elements in that HTML and extract their text.

Next let's see exactly how the data extraction works.

Extracting Data from Listing Blocks

Inspecting the element

Inspecting a listing in Chrome's developer tools shows that each listing block is wrapped in a div whose class is BasePropertyCard_propertyCardWrap__J0xUj, the selector the full script uses to find the blocks.

The most complex part of this scraper is extracting multiple data fields from each listing block.

Here is the loop that processes each listing:

listing_blocks.each do |listing_block|

  # Extract the broker information
  broker_info = listing_block.css(".BrokerTitle_brokerTitle__ZkbBW")
  broker_name = broker_info.css(".BrokerTitle_titleText__20u1P").text.strip()

  # Extract price
  price = listing_block.css(".card-price").text.strip()

  # Extract beds, baths
  beds_element = listing_block.css("[data-testid='property-meta-beds']")
  baths_element = listing_block.css("[data-testid='property-meta-baths']")

  # Extract address
  address = listing_block.css(".card-address").text.strip()
end

Let's look at how the broker name field is extracted:

broker_info = listing_block.css(".BrokerTitle_brokerTitle__ZkbBW")
broker_name = broker_info.css(".BrokerTitle_titleText__20u1P").text.strip()

First, we use the CSS selector .BrokerTitle_brokerTitle__ZkbBW to find the block containing broker info. Then we drill down one level further with .BrokerTitle_titleText__20u1P to pinpoint just the broker name text element. Finally, we call .text to extract the text content and .strip() to clean surrounding whitespace.

The price extraction follows the same approach:

price = listing_block.css(".card-price").text.strip()

We use the .card-price CSS class to target the price text specifically.
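The extracted price is still display text. If a number is needed later, a conversion along these lines could work. This is a sketch under the assumption that prices are written as whole amounts with plain digits, and price_to_integer is a hypothetical helper name.

```ruby
# Hypothetical helper: turn scraped price text into an Integer.
# Assumes whole amounts written with plain digits, such as "$1,250,000";
# abbreviated forms like "$1.2M" and amounts with cents are not handled.
def price_to_integer(price_text)
  digits = price_text.delete("^0-9")  # drop every character that is not a digit
  digits.empty? ? nil : digits.to_i
end

p price_to_integer("$1,250,000")     # => 1250000
p price_to_integer("Contact agent")  # => nil
```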

One tricky part is handling fields like beds and baths that may be missing:

beds_element = listing_block.css("[data-testid='property-meta-beds']")

beds = beds_element.text.strip() unless beds_element.empty?

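First we select the beds element. Then, before extracting its text, we check whether the selection came back empty by calling .empty?. Calling .text on an empty result would not raise an error, it would simply return an empty string; the check leaves beds as nil instead, so the script can print 'N/A' rather than a blank value.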

Full Code Example

Below is the full runnable script to scrape Realtor listings:

require 'nokogiri'
require 'httparty'

# Define the URL of the Realtor.com search page
url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA"

# Define a User-Agent header
headers = {
  "User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"  # Replace with your User-Agent string
}

# Send a GET request to the URL with the User-Agent header
response = HTTParty.get(url, headers: headers)

# Check if the request was successful (status code 200)
if response.code == 200
  # Parse the HTML content of the page using Nokogiri
  doc = Nokogiri::HTML(response.body)

  # Find all the listing blocks using the provided class name
  listing_blocks = doc.css(".BasePropertyCard_propertyCardWrap__J0xUj")

  # Loop through each listing block and extract information
  listing_blocks.each do |listing_block|
    # Extract the broker information
    broker_info = listing_block.css(".BrokerTitle_brokerTitle__ZkbBW")
    broker_name = broker_info.css(".BrokerTitle_titleText__20u1P").text.strip()

    # Extract the status (e.g., For Sale)
    status = listing_block.css(".message").text.strip()

    # Extract the price
    price = listing_block.css(".card-price").text.strip()

    # Extract other details like beds, baths, sqft, and lot size
    beds_element = listing_block.css("[data-testid='property-meta-beds']")
    baths_element = listing_block.css("[data-testid='property-meta-baths']")
    sqft_element = listing_block.css("[data-testid='property-meta-sqft']")
    lot_size_element = listing_block.css("[data-testid='property-meta-lot-size']")

    # Check if the elements exist before extracting their text
    beds = beds_element.text.strip() unless beds_element.empty?
    baths = baths_element.text.strip() unless baths_element.empty?
    sqft = sqft_element.text.strip() unless sqft_element.empty?
    lot_size = lot_size_element.text.strip() unless lot_size_element.empty?

    # Extract the address
    address = listing_block.css(".card-address").text.strip()

    # Print the extracted information
    puts "Broker: #{broker_name}"
    puts "Status: #{status}"
    puts "Price: #{price}"
    puts "Beds: #{beds || 'N/A'}"
    puts "Baths: #{baths || 'N/A'}"
    puts "Sqft: #{sqft || 'N/A'}"
    puts "Lot Size: #{lot_size || 'N/A'}"
    puts "Address: #{address}"
    puts "-" * 50  # Separating listings
  end
else
  puts "Failed to retrieve the page. Status code: #{response.code}"
end
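The script above prints each listing as it goes. If the data needs to be reused, one possible adaptation is to collect a hash per listing and serialize the collection with Ruby's standard json library. The sketch below uses made-up values in place of scraped ones, and listing_record is a hypothetical helper name.

```ruby
require 'json'

# Hypothetical helper: build one record per listing, using nil for
# fields that were missing on the page.
def listing_record(broker:, status:, price:, address:, beds: nil, baths: nil)
  {
    broker: broker,
    status: status,
    price: price,
    beds: beds,
    baths: baths,
    address: address
  }
end

# Made-up values standing in for what the loop would extract.
listings = [
  listing_record(broker: "Example Realty", status: "For Sale",
                 price: "$1,250,000", beds: "3 bed", baths: "2 bath",
                 address: "123 Example St, San Francisco, CA"),
  listing_record(broker: "Example Realty", status: "For Sale",
                 price: "$899,000",
                 address: "456 Sample Ave, San Francisco, CA")
]

puts JSON.pretty_generate(listings)
```

Missing fields come out as null in the JSON, which keeps "no value" distinct from an empty string.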

The key things to remember are:

Use CSS selectors to target specific text elements

Handle missing data gracefully with checks

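Copy the class names and data-testid values exactly as they appear in the live page markup. Generated class names such as BasePropertyCard_propertyCardWrap__J0xUj can change when the site is updated, so re-check them in the browser if the scraper stops finding listings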

With some debugging and tweaking, you can adapt this approach to build scrapers extracting all sorts of data!
