Getting Started

To follow along, you'll need R installed along with the rvest, stringr, and httr packages (httr supplies user_agent() and the status-check functions the script calls). Install them with:

install.packages("rvest")
install.packages("stringr")
install.packages("httr")

Now let's jump right into the code...

Overview

At a high level, the script performs these steps:

Defines the URL to scrape (Realtor.com for San Francisco)
Sends a request to that URL
Checks if the request succeeded
Finds all real estate listing blocks on the page
Loops through each listing block to extract key details
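
The first three steps can be sketched roughly as follows. This is a minimal sketch, assuming the httr package handles the request; fetch_page and its ua argument are hypothetical names introduced here for illustration, not part of the original script.

```r
library(httr)
library(rvest)

# Hypothetical helper: request a URL with a User-Agent header and
# return the parsed HTML, or NULL if the status code is not 200.
fetch_page <- function(url, ua) {
  response <- GET(url, user_agent(ua))
  if (status_code(response) != 200) {
    message("Request failed with status ", status_code(response))
    return(NULL)
  }
  read_html(content(response, as = "text", encoding = "UTF-8"))
}

# Usage (needs network access, so it is left commented out;
# "my-scraper/0.1" is a placeholder User-Agent string):
# page <- fetch_page(
#   "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA",
#   "my-scraper/0.1"
# )
```

Returning NULL on failure lets the caller decide whether to retry or stop, rather than failing deep inside the parsing code.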

Next we'll break down that last step of extracting data from each listing...


Extracting Listing Data

The most complex part of this web scraping script is finding and extracting the specific data points about each real estate listing from the HTML.

Inspecting the element

Inspecting the page in Chrome shows that each listing block is wrapped in a div with the class BasePropertyCard_propertyCardWrap__J0xUj, which is the class the full script at the end of this article selects on.
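
To see how that class selector picks out listing blocks, here is a small sketch that runs against a made-up HTML snippet instead of the live page. The sample markup and prices are invented for illustration. Only the class name comes from the script.

```r
library(rvest)

# Made-up HTML standing in for the real listings page
sample_html <- '
<div class="BasePropertyCard_propertyCardWrap__J0xUj"><div class="card-price">$1,200,000</div></div>
<div class="BasePropertyCard_propertyCardWrap__J0xUj"><div class="card-price">$950,000</div></div>
<div class="unrelated">Not a listing</div>'

page <- read_html(sample_html)

# html_nodes() returns every element matching the CSS selector
listing_blocks <- html_nodes(page, "div.BasePropertyCard_propertyCardWrap__J0xUj")
n_blocks <- length(listing_blocks)
cat("Found", n_blocks, "listing blocks\n")
```

The unrelated div is skipped because its class does not match, so only the two listing wrappers are returned.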

Within the loop through each listing block, different HTML selectors are used to locate specific elements and extract their text. Let's break this down selector-by-selector:

Broker Name

The broker name is extracted with this selector:

div.BrokerTitle_brokerTitle__ZkbBW

This finds the outer <div> tag with a class name that includes BrokerTitle_brokerTitle. Within that is a <span> tag containing the actual text of the broker's name.

The R code uses this selector and extracts the name text like:

broker_info <- html_node(listing_block, "div.BrokerTitle_brokerTitle__ZkbBW")
broker_name <- str_trim(html_text(html_node(broker_info, "span.BrokerTitle_titleText__20u1P")))

It first finds that broker <div>, then within that finds the name <span>, and extracts the text after trimming whitespace.
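
Class names with a generated suffix like __ZkbBW may change when the site is rebuilt. One possible way to reduce that fragility is to match only the stable part of the class name with a substring attribute selector. This is a sketch on made-up HTML. The hash suffixes and broker name below are invented, and whether the stable prefix stays the same on the real site is an assumption.

```r
library(rvest)
library(stringr)

# Made-up listing block; the hash suffixes are deliberately not the real ones
listing_block <- read_html('
<div class="BrokerTitle_brokerTitle__AbC12">
  <span class="BrokerTitle_titleText__XyZ34">  Example Realty  </span>
</div>')

# [class*='...'] matches any class attribute containing the given substring
broker_info <- html_node(listing_block, "div[class*='BrokerTitle_brokerTitle']")
broker_name <- str_trim(html_text(html_node(broker_info, "span[class*='BrokerTitle_titleText']")))
print(broker_name)
```

The trade-off is that a substring match is looser, so it could pick up other elements whose class happens to contain the same text.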

Status

The status (e.g. "For Sale") is extracted using:

div.message

This selector finds the <div> with class message that contains the status text.

The R code extracts it simply:

status <- str_trim(html_text(html_node(listing_block, "div.message")))

This finds that status <div> and gets its text content.

Price

The price selector is:

div.card-price

This locates the <div> with class card-price containing the price text.

Extracted via:

price <- str_trim(html_text(html_node(listing_block, "div.card-price")))
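
The extracted price is still a string. If a number is needed later, one way to convert it is sketched below, assuming the text looks like "$1,295,000". That format is an assumption about the page, and text such as a price range or "Contact for price" would not convert cleanly.

```r
library(stringr)

# Example price text in the assumed format
price <- "$1,295,000"

# Drop everything except digits and the decimal point, then convert
price_value <- as.numeric(str_replace_all(price, "[^0-9.]", ""))
print(price_value)
```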

Beds, Baths etc.

Details like beds and baths are located with attribute selectors rather than class names:

li[data-testid='property-meta-beds']

This finds <li> tags with a data-testid attribute matching property-meta-beds. This attribute identifies which piece of metadata the element holds.

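The code checks whether the element was found before using its text. Note that html_node() returns a missing node rather than NULL when nothing matches, and html_text() on a missing node gives NA, so the check tests the text for NA instead of testing the node with is.null():

beds_element <- html_node(listing_block, "li[data-testid='property-meta-beds']")
beds_text <- html_text(beds_element)
beds <- if (is.na(beds_text)) "N/A" else str_trim(beds_text)

It first finds that beds <li>, then checks whether any text came back, and if so trims and keeps it, otherwise sets beds to "N/A".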

This pattern is followed for baths, square footage, lot size, and any other metadata elements.
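
Since the same pattern repeats for every metadata field, it can be wrapped in a small helper. The sketch below is one way to do that. meta_text is a hypothetical name introduced here, and the sample HTML is made up. The data-testid values are the ones used in the script.

```r
library(rvest)
library(stringr)

# Hypothetical helper: trimmed text of the first matching <li>,
# or a fallback value when no element matches.
meta_text <- function(node, test_id, fallback = "N/A") {
  selector <- sprintf("li[data-testid='%s']", test_id)
  text <- html_text(html_node(node, selector))
  if (is.na(text)) fallback else str_trim(text)
}

# Made-up listing block that has beds and baths but no sqft or lot size
listing_block <- read_html('
<ul>
  <li data-testid="property-meta-beds"> 3 bed </li>
  <li data-testid="property-meta-baths"> 2 bath </li>
</ul>')

fields <- c(beds = "property-meta-beds", baths = "property-meta-baths",
            sqft = "property-meta-sqft", lot_size = "property-meta-lot-size")

# Apply the helper to every field; missing ones fall back to "N/A"
details <- vapply(fields, function(id) meta_text(listing_block, id), character(1))
print(details)
```

Keeping the field names in one vector means adding another metadata field is a one-line change.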

Address

Finally, the address selector is simple:

div.card-address

This finds the address <div> to extract the text from:

address <- str_trim(html_text(html_node(listing_block, "div.card-address")))

And that covers all the key fields extracted for each listing! As you can see it relies heavily on using selectors to pinpoint very specific DOM elements and extract just the needed text.

Full Code

For reference, here is the full runnable scraper code:

# Load necessary libraries
library(httr)     # GET(), user_agent(), status_code(), content()
library(rvest)
library(stringr)

# Define the URL of the Realtor.com search page
url <- "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA"

# Define a User-Agent string (named ua so it is not confused with httr::user_agent())
ua <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"

# Send a GET request to the URL with the User-Agent header
response <- GET(url, user_agent(ua))

# Check if the request was successful (status code 200)
if (status_code(response) == 200) {
  # Parse the response body as HTML
  page <- read_html(content(response, as = "text", encoding = "UTF-8"))

  # Find all the listing blocks using the provided class name
  listing_blocks <- html_nodes(page, "div.BasePropertyCard_propertyCardWrap__J0xUj")

  # Loop through each listing block and extract information
  for (i in seq_along(listing_blocks)) {
    listing_block <- listing_blocks[[i]]

    # Extract the broker information
    broker_info <- html_node(listing_block, "div.BrokerTitle_brokerTitle__ZkbBW")
    broker_name <- str_trim(html_text(html_node(broker_info, "span.BrokerTitle_titleText__20u1P")))

    # Extract the status (e.g., For Sale)
    status <- str_trim(html_text(html_node(listing_block, "div.message")))

    # Extract the price
    price <- str_trim(html_text(html_node(listing_block, "div.card-price")))

    # Extract other details like beds, baths, sqft, and lot size
    beds_element <- html_node(listing_block, "li[data-testid='property-meta-beds']")
    baths_element <- html_node(listing_block, "li[data-testid='property-meta-baths']")
    sqft_element <- html_node(listing_block, "li[data-testid='property-meta-sqft']")
    lot_size_element <- html_node(listing_block, "li[data-testid='property-meta-lot-size']")

    # html_text() returns NA for an element that was not found
    beds <- str_trim(html_text(beds_element))
    baths <- str_trim(html_text(baths_element))
    sqft <- str_trim(html_text(sqft_element))
    lot_size <- str_trim(html_text(lot_size_element))

    # Replace missing values with "N/A"
    if (is.na(beds)) beds <- "N/A"
    if (is.na(baths)) baths <- "N/A"
    if (is.na(sqft)) sqft <- "N/A"
    if (is.na(lot_size)) lot_size <- "N/A"

    # Extract the address
    address <- str_trim(html_text(html_node(listing_block, "div.card-address")))

    # Print the extracted information
    cat("Broker:", broker_name, "\n")
    cat("Status:", status, "\n")
    cat("Price:", price, "\n")
    cat("Beds:", beds, "\n")
    cat("Baths:", baths, "\n")
    cat("Sqft:", sqft, "\n")
    cat("Lot Size:", lot_size, "\n")
    cat("Address:", address, "\n")
    cat(strrep("-", 50), "\n")  # Separating listings
  }
} else {
  cat("Failed to retrieve the page. Status code:", status_code(response), "\n")
}
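
The script above prints each listing as it goes. If the results are needed for further analysis, one alternative is to collect them into a data frame. The sketch below is not part of the original script. It relies on html_node() returning one result per listing block (with NA text where an element is absent) when given a whole node set, and it runs against made-up HTML with invented addresses and prices.

```r
library(rvest)
library(stringr)

# Made-up page with two listings; the second has no status element
sample_html <- '
<div class="BasePropertyCard_propertyCardWrap__J0xUj">
  <div class="message">For Sale</div>
  <div class="card-price">$1,200,000</div>
  <div class="card-address">1 Example St, San Francisco, CA</div>
</div>
<div class="BasePropertyCard_propertyCardWrap__J0xUj">
  <div class="card-price">$950,000</div>
  <div class="card-address">2 Sample Ave, San Francisco, CA</div>
</div>'

listing_blocks <- html_nodes(read_html(sample_html),
                             "div.BasePropertyCard_propertyCardWrap__J0xUj")

# One row per listing block; missing elements become NA
listings <- data.frame(
  status  = str_trim(html_text(html_node(listing_blocks, "div.message"))),
  price   = str_trim(html_text(html_node(listing_blocks, "div.card-price"))),
  address = str_trim(html_text(html_node(listing_blocks, "div.card-address"))),
  stringsAsFactors = FALSE
)
print(listings)
```

Because every column has exactly one entry per listing block, the rows stay aligned even when a field is missing from some listings.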
