Scraping Real Estate Listings From Realtor in Node.js

Jan 9, 2024 · 6 min read

Overview

We will write code to extract real estate listing data from Realtor.com for properties in San Francisco. The code makes a request to the Realtor.com website, loads the HTML content of the search results page, and then uses Cheerio to target specific elements in that HTML and extract useful information about each listed property.

This is the listings page we are talking about…

Some key things it will extract for every listing:

  • Broker name
  • Status (For Sale, etc.)
  • Price
  • Beds
  • Baths
  • Square footage
  • Lot size
  • Address

Let's dive into the details and see how it works!

    Importing Libraries

    const axios = require('axios');
    const cheerio = require('cheerio');
    

    First, we import Axios and Cheerio using require.

    Axios allows us to make HTTP requests to load content from websites.

    Cheerio lets us parse the loaded HTML and target parts of it using jQuery-style selectors.

    Defining the URL

    const url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";
    

    This is the Realtor.com search results page we will scrape. It targets San Francisco real estate listings specifically.
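    The search URL appears to encode the city and state as City-Name_ST. As a sketch, assuming other cities follow the same pattern (only the San Francisco URL is confirmed here), a hypothetical helper searchUrlFor() could build the URL for any city:

```javascript
// Hypothetical helper: builds a Realtor.com search URL for a city.
// ASSUMPTION: other cities follow the same "City-Name_ST" pattern as the
// San Francisco URL; verify the result in a browser before relying on it.
function searchUrlFor(city, stateCode) {
  // Replace runs of whitespace with hyphens, e.g. "San Francisco" -> "San-Francisco"
  const citySlug = city.trim().split(/\s+/).join("-");
  return `https://www.realtor.com/realestateandhomes-search/${citySlug}_${stateCode.toUpperCase()}`;
}

console.log(searchUrlFor("San Francisco", "CA"));
// https://www.realtor.com/realestateandhomes-search/San-Francisco_CA
```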

    Setting User Agent

    const headers = {
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
    };
    

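    We set a User-Agent header that identifies the client making the requests as a Chrome browser on Windows. Without it, Axios in Node.js sends its own User-Agent of the form axios/<version>, which sites can easily recognize and block. A browser-like value makes blocked requests less likely, though it does not guarantee success.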

    Making a GET Request with Axios

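    axios.get(url, { headers })
      .then(response => {
        // Process response
      })
      .catch(error => {
        // Network errors and non-2xx responses end up here
        console.error("Request failed:", error.message);
      });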
    

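    axios.get() sends a GET request to the Realtor.com URL defined above and returns a Promise that resolves with the response.

    The headers object is passed in the request config so that the User-Agent string is sent along with the request.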
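    The same request can also be written with async/await. The sketch below wraps it in a hypothetical fetchHtml() helper. To keep it runnable offline, the request function is passed in as a parameter (httpGet), assumed to behave like axios.get(url, config):

```javascript
// Hypothetical helper: the GET request rewritten with async/await.
// `httpGet` is any function with axios.get's (url, config) signature that
// resolves to an object with a `data` property; pass axios.get in real use.
async function fetchHtml(url, headers, httpGet) {
  try {
    const response = await httpGet(url, { headers });
    return response.data; // the raw HTML string
  } catch (error) {
    // By default axios rejects on network errors and non-2xx status codes
    console.error("Request failed:", error.message);
    return null;
  }
}

// Usage with a stubbed request function so the example runs offline
const fakeGet = async (url, config) => ({ status: 200, data: `<p>${config.headers["User-Agent"]}</p>` });
fetchHtml("https://example.com", { "User-Agent": "test-agent" }, fakeGet)
  .then(html => console.log(html)); // <p>test-agent</p>
```

    In the real script you would pass (u, config) => axios.get(u, config) as httpGet and hand the returned HTML to cheerio.load(), skipping the parsing step when the helper returns null.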

    Loading HTML and Using Cheerio

    const $ = cheerio.load(response.data);
    

    On a successful response, we load the HTML content from response.data into Cheerio using the load() method.

    This allows us to then use Cheerio to target elements on the page using CSS selectors.

    Finding Listing Blocks

    Inspecting the element

    When we inspect the page in Chrome DevTools, we can see that each listing block is wrapped in a div with the class shown below:

    const listing_blocks = $("div.BasePropertyCard_propertyCardWrap__J0xUj");
    

    Here we select all the div elements with the class BasePropertyCard_propertyCardWrap__J0xUj, which wraps each listing block in the HTML document.

    This allows us to isolate listing content from the rest of the page.

    Extracting Listing Data

    Next, we loop through each listing block and use selectors to extract details:

    listing_blocks.each((index, element) => {
    
      // Extract broker name
      const broker_name = $(element).find("span.BrokerTitle_titleText__20u1P").text().trim();
    
      // Extract status (e.g., For Sale)
      const status = $(element).find("div.message").text().trim();
    
      // Extract price
      const price = $(element).find("div.card-price").text().trim();
    
      // Extract beds, baths, sqft, and lot size, falling back to "N/A" when missing
      const beds = $(element).find("li[data-testid='property-meta-beds']").text().trim() || "N/A";
      const baths = $(element).find("li[data-testid='property-meta-baths']").text().trim() || "N/A";
      const sqft = $(element).find("li[data-testid='property-meta-sqft']").text().trim() || "N/A";
      const lot_size = $(element).find("li[data-testid='property-meta-lot-size']").text().trim() || "N/A";
    
      // Extract address
      const address = $(element).find("div.card-address").text().trim();
    
    });
    

    Breaking down one example, the beds extraction:

    • $(element) wraps the current listing block from the loop in a Cheerio object.
    • .find() searches only within this block for matching descendant elements.
    • li[data-testid='property-meta-beds'] selects the li element whose data-testid attribute equals property-meta-beds.
    • .text().trim() extracts the element's text content, with leading and trailing whitespace removed.
    • || "N/A" supplies a fallback value when the text is empty, i.e. when the listing has no beds data.
    • const beds stores the result for later use.

    Every other field is extracted the same way, which lets us pull all the relevant data for each individual property listing on the page.

    Console Output

    At the end of each loop iteration, the code logs the extracted values to the console, listing by listing:

    console.log("Broker:", broker_name);
    console.log("Beds:", beds);
    console.log("Address:", address);
    console.log("-".repeat(50)); // Separator between listings
    

    The full script prints every extracted field (broker, status, price, beds, baths, square footage, lot size and address) for each listing on the page, followed by a line of dashes to separate listings.
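    Logging is fine for a quick look, but if you want to reuse the data, one option is to collect each listing into an object and write the whole array to a JSON file. A minimal sketch using only Node's built-in fs, os and path modules; the records and the output file name are placeholders, not scraped data:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Placeholder records shaped like the fields the script extracts;
// in the real scraper you would push one object per listing block.
const listings = [
  { broker: "Example Realty", status: "For Sale", price: "$999,000", beds: "N/A", baths: "N/A", sqft: "N/A", lot_size: "N/A", address: "1 Example St" },
];

// Serialize the whole array once, instead of logging field by field
const outFile = path.join(os.tmpdir(), "realtor-listings.json");
fs.writeFileSync(outFile, JSON.stringify(listings, null, 2));
console.log(`Saved ${listings.length} listing(s) to ${outFile}`);
```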

    Conclusion

    That covers the key functionality: the code uses Axios to load the page content from Realtor.com and Cheerio to parse the HTML and target the specific elements that hold each listing's details.

    The core pattern is to first isolate each listing block, then extract the individual data points from within that block.

    Full code:

    const axios = require('axios');
    const cheerio = require('cheerio');
    
    // Define the URL of the Realtor.com search page
    const url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";
    
    // Define a User-Agent header
    const headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36" // Replace with your User-Agent string
    };
    
    // Send a GET request to the URL with the User-Agent header using axios
    axios.get(url, { headers })
        .then(response => {
            // Check if the request was successful (status code 200)
            if (response.status === 200) {
                // Load the HTML content of the page using Cheerio
                const $ = cheerio.load(response.data);
                
                // Find all the listing blocks using the provided class name
                const listing_blocks = $("div.BasePropertyCard_propertyCardWrap__J0xUj");
    
                // Loop through each listing block and extract information
                listing_blocks.each((index, element) => {
                    // Extract the broker information
                    const broker_info = $(element).find("div.BrokerTitle_brokerTitle__ZkbBW");
                    const broker_name = broker_info.find("span.BrokerTitle_titleText__20u1P").text().trim();
    
                    // Extract the status (e.g., For Sale)
                    const status = $(element).find("div.message").text().trim();
    
                    // Extract the price
                    const price = $(element).find("div.card-price").text().trim();
    
                    // Extract other details like beds, baths, sqft, and lot size
                    const beds_element = $(element).find("li[data-testid='property-meta-beds']");
                    const baths_element = $(element).find("li[data-testid='property-meta-baths']");
                    const sqft_element = $(element).find("li[data-testid='property-meta-sqft']");
                    const lot_size_element = $(element).find("li[data-testid='property-meta-lot-size']");
    
                    // Check if the elements exist before extracting their text
                    const beds = beds_element.text().trim() || "N/A";
                    const baths = baths_element.text().trim() || "N/A";
                    const sqft = sqft_element.text().trim() || "N/A";
                    const lot_size = lot_size_element.text().trim() || "N/A";
    
                    // Extract the address
                    const address = $(element).find("div.card-address").text().trim();
    
                    // Print the extracted information
                    console.log("Broker:", broker_name);
                    console.log("Status:", status);
                    console.log("Price:", price);
                    console.log("Beds:", beds);
                    console.log("Baths:", baths);
                    console.log("Sqft:", sqft);
                    console.log("Lot Size:", lot_size);
                    console.log("Address:", address);
                    console.log("-".repeat(50)); // Separating listings
                });
            } else {
                console.log("Failed to retrieve the page. Status code:", response.status);
            }
        })
        .catch(error => {
            console.error("An error occurred:", error);
        });

