Scraping Real Estate Listings From Realtor in Node.js

Overview

We will write code to extract real estate listing data from Realtor.com for properties in San Francisco. Our code will make a request to the Realtor.com website, loads the HTML content of the search result page, then uses Cheerio to target specific elements in the HTML to extract useful information about each listed property.

This is the listings page we are talking about…

Some key things it will extract for every listing:

Broker name

Status (For Sale, etc.)

Price

Beds

Baths

Square footage

Lot size

Address

Let's dive into the details and see how it works!

Importing Libraries

const axios = require('axios');
const cheerio = require('cheerio');

First, we import Axios and Cheerio using require.

Axios allows us to make HTTP requests to load content from websites.

Cheerio allows us to parse and target parts of the loaded HTML using jQuery style selectors.

Defining the URL

const url = "<https://www.realtor.com/realestateandhomes-search/San-Francisco_CA>";

The URL is defined for the Realtor.com page that will be scraped. We are targeting the San Francisco real estate listings specifically.

Setting User Agent

const headers = {
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
};

We set a User-Agent header that identifies the client making the requests as a Chrome browser. This helps avoid blocked requests.

Making GET Request with Axios

axios.get(url, {headers})
.then(response => {
  // Process response
});

This axios GET request fetches the content of the defined Realtor.com URL.

The headers are passed in to include the User-Agent string.

Loading HTML and Using Cheerio

const $ = cheerio.load(response.data);

On a successful response, we load the HTML content from response.data into Cheerio using the load() method.

This allows us to then use Cheerio to target elements on the page using CSS selectors.

Finding Listing Blocks

Inspecting the element

When we inspect element in Chrome we can see that each of the listing blocks is wrapped in a div with a class value as shown below…

const listing_blocks = $("div.BasePropertyCard_propertyCardWrap__J0xUj");

Here we target all the

elements with class BasePropertyCard_propertyCardWrap__J0xUj, which identifies the wrapping element for each listing block in the HTML document.

This allows us to isolate listing content from the rest of the page.

Extracting Listing Data

Next, we loop through each listing block and use selectors to extract details:

listing_blocks.each((index, element) => {

  // Extract broker name
  const broker_name = $(element).find("span.BrokerTitle_titleText__20u1P").text().trim();

  // Extract status
  const status = $(element).find("div.message").text().trim();

  // Extract price
  const price = $(element).find("div.card-price").text().trim();

  // Extract beds, baths, sqft, lot size
  const beds = $(element).find("li[data-testid='property-meta-beds']").text().trim() || "N/A";

  const baths = $(element).find("li[data-testid='property-meta-baths']").text().trim() || "N/A";

  // Extract address
  const address = $(element).find("div.card-address").text().trim();

});

Breaking down one example for beds:

$(element)

Refers to the current listing block element in the loop

.find() lets us search within this block for our target elements

li[data-testid='property-meta-beds'] selects the

element with matching attribute value

Finally .text().trim() extracts the text inside this element, trimmed of whitespace

The || "N/A" sets a fallback value if no beds data exists for that listing

And the const beds saves this value for usage

The process is similar for every other bit of data extracted.

This allows us to pull all the relevant data for each individual property listing on the page.

Console Output

At the end, the code logs everything extracted to the console, listing by listing:

Console.log("Broker:", broker_name);
Console.log("Beds:", beds);
Console.log("Address:", address);

So the final output will show broker name, beds, baths, pricing, address etc. for every property listing extracted from the page.

Conclusion

That covers the key functionality of how this code is able to scrape multiple real estate listing details from Realtor.com leveraging Axios to load page content and Cheerio to parse and target specific HTML elements on the page.

The key patterns allow it to structurally isolate listing blocks and then individually extract different data points for each listing.

Full code:

const axios = require('axios');
const cheerio = require('cheerio');

// Define the URL of the Realtor.com search page
const url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";

// Define a User-Agent header
const headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36" // Replace with your User-Agent string
};

// Send a GET request to the URL with the User-Agent header using axios
axios.get(url, { headers })
    .then(response => {
        // Check if the request was successful (status code 200)
        if (response.status === 200) {
            // Load the HTML content of the page using Cheerio
            const $ = cheerio.load(response.data);
            
            // Find all the listing blocks using the provided class name
            const listing_blocks = $("div.BasePropertyCard_propertyCardWrap__J0xUj");

            // Loop through each listing block and extract information
            listing_blocks.each((index, element) => {
                // Extract the broker information
                const broker_info = $(element).find("div.BrokerTitle_brokerTitle__ZkbBW");
                const broker_name = broker_info.find("span.BrokerTitle_titleText__20u1P").text().trim();

                // Extract the status (e.g., For Sale)
                const status = $(element).find("div.message").text().trim();

                // Extract the price
                const price = $(element).find("div.card-price").text().trim();

                // Extract other details like beds, baths, sqft, and lot size
                const beds_element = $(element).find("li[data-testid='property-meta-beds']");
                const baths_element = $(element).find("li[data-testid='property-meta-baths']");
                const sqft_element = $(element).find("li[data-testid='property-meta-sqft']");
                const lot_size_element = $(element).find("li[data-testid='property-meta-lot-size']");

                // Check if the elements exist before extracting their text
                const beds = beds_element.text().trim() || "N/A";
                const baths = baths_element.text().trim() || "N/A";
                const sqft = sqft_element.text().trim() || "N/A";
                const lot_size = lot_size_element.text().trim() || "N/A";

                // Extract the address
                const address = $(element).find("div.card-address").text().trim();

                // Print the extracted information
                console.log("Broker:", broker_name);
                console.log("Status:", status);
                console.log("Price:", price);
                console.log("Beds:", beds);
                console.log("Baths:", baths);
                console.log("Sqft:", sqft);
                console.log("Lot Size:", lot_size);
                console.log("Address:", address);
                console.log("-".repeat(50)); // Separating listings
            });
        } else {
            console.log("Failed to retrieve the page. Status code:", response.status);
        }
    })
    .catch(error => {
        console.error("An error occurred:", error);
    });

Scraping Real Estate Listings From Realtor in Node.js

Overview

Importing Libraries

Defining the URL

Setting User Agent

Making GET Request with Axios

Loading HTML and Using Cheerio

Finding Listing Blocks

Extracting Listing Data

Console Output

Conclusion

Browse by tags:

Browse by language:

The easiest way to do Web Scraping

Scraping Real Estate Listings From Realtor in Node.js

Overview

Importing Libraries

Defining the URL

Setting User Agent

Making GET Request with Axios

Loading HTML and Using Cheerio

Finding Listing Blocks

Extracting Listing Data

Console Output

Conclusion

The easiest way to do Web Scraping

Don't leave just yet!