In this article, we will learn how to scrape real estate listings from Realtor.com using PHP and cURL. The full code is provided at the end for reference as we go through each section in detail.

We will cover:

Installing requirements like PHP and cURL

Defining the target URL

Setting headers like the user agent

Initializing and configuring cURL

Sending the request and handling the response

Using DOMDocument and XPath to extract data

Outputting and formatting the scraped listing details

This is the listings page we are talking about…

Let's get started!

Installation

Before running the scraping script, some prerequisites need to be installed:

Installing/Enabling cURL

This script also requires the cURL extension for PHP which allows making HTTP requests in PHP.

cURL may already be enabled depending on your PHP installation. You can check if it is enabled by running:

php -m | grep curl

If you see "curl" in the output, then cURL is installed and you should be good to go.

If cURL is not enabled, follow these instructions to install and enable it for PHP.

In short, on Ubuntu/Debian:

sudo apt-get install php-curl
sudo service apache2 restart

And the cURL extension should now be enabled.

Okay, with the prerequisites out of the way, let's look at the code!

Defining the Target URL

First, we define the root URL of the Realtor.com page we want to scrape:

$url = "<https://www.realtor.com/realestateandhomes-search/San-Francisco\\_CA>";

This URL targets real estate listings in San Francisco.

When web scraping, always double check you have the right homepage/search URL to scrape from! This saves trouble later.

Setting Request Headers with User-Agent

Many websites block default browser user agents these days to prevent scraping. So it's important to spoof a browser:

$headers = [
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
];

We define a heades array containing a typical Chrome desktop user agent string. This mimics a real browser visit to avoid blocks.

There are services where you can generate browser user agents for free. Or grab them from your browsers dev tools Network tab on a site.

Initializing and Configuring cURL

With target prepared, we can initialize a cURL session which will handle connecting to the page and transferring data:

$ch = curl_init();

Next we set a number of cURL options to configure our request:

curl_setopt($ch, CURLOPT_URL, $url);

curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

Let's understand them:

CURLOPT_URL: The target request URL we defined earlier.

CURLOPT_HTTPHEADER: Sets custom headers, here our user agent string.

CURLOPT_RETURNTRANSFER: Returns output directly instead of printing, to allow catching in a variable.

These are the main options needed. There are many more available to configure proxies, authentication, cookies, etc.

Sending the Request and Processing Output

With cURL initialized, headers set, and options configured - we send the GET request:

$response = curl_exec($ch);

This executes the cURL request defined by our handle $ch and returns output in $response.

We can check if the request succeeded:

if ($response && curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200) {
   // Request succeeded!
} else {
   // Request failed
}

Here we:

Check if $response contains a value
Use curl_getinfo() to get the HTTP status code and compare to 200 OK

This protects against failed requests down the line.

Parsing Listings with DOMDocument

With HTML response stored, we can parse it to extract data. We'll use PHP's DOMDocument and DOMXPath to query elements:

$dom = new DOMDocument();
$dom->loadHTML($response);

$xpath = new DOMXPath($dom);

This:

Creates a new DOMDocument
Loads our HTML response to parse
Gets an XPath object to query DOM elements

XPath allows querying elements by CSS selector like syntax.

Extracting Listing Data

Now the fun part - using XPath to extract key real estate listing details!

Inspecting the element

When we inspect element in Chrome we can see that each of the listing blocks is wrapped in a div with a class value as shown below…

First we locate all listing container elements on the page:

$listing_blocks = $xpath->query('//div[contains(@class, "BasePropertyCard_propertyCardWrap__J0xUj")]');

This queries DIVs with the class BasePropertyCard_propertyCardWrap__J0xUj which contain data for individual properties.

Note: We check for the class name containing that string since it can be minified or changed compared to inspecting the element in browser dev tools.

With listing containers selected, we loop through them:

foreach ($listing_blocks as $listing_block) {

  // Extract data from $listing_block

}

And use XPath queries within each listing block to extract details, e.g.:

Get broker name:

$broker_info = $xpath->query('.//div[contains(@class, "BrokerTitle_brokerTitle__ZkbBW")]', $listing_block);

$broker_name = $broker_info->textContent;

Get number of beds:

$beds = $xpath->query('.//li[@data-testid="property-meta-beds"]', $listing_block)->textContent;

And similarly for other fields like status, price, baths, etc. targeted by class/attribute values.

Note again how we match partial class names and check attributes to create resilient selectors.

The key is properly crafting XPath queries to extract intended listing data from the DOM.

Output and Formatting

Finally, we output and format the scraped details:

echo "Broker: " . $broker_name . "\\n";

echo "Beds: " . $beds . "\\n";

// And so on...

echo str_repeat("-", 50) . "\\n"; // Separator

Here we:

Echo the data fields, concatenating labels

Add newlines between fields for readability

Print a dashed line to separate individual listings

The output can then be processed further based on the use case - maybe store to CSV or JSON, insert to a database, etc.

And that covers a typical web scraping workflow!

Full Code

For reference, here is the full code:

<?php
// Define the URL of the Realtor.com search page
$url = "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA";

// Define a User-Agent header
$headers = [
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"  // Replace with your User-Agent string
];

// Initialize cURL session
$ch = curl_init();

// Set cURL options
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Send a GET request to the URL with the User-Agent header
$response = curl_exec($ch);

// Check if the request was successful (HTTP status code 200)
if ($response && curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200) {
    // Parse the HTML content of the page using DOMDocument and DOMXPath
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // Suppress parsing errors
    $dom->loadHTML($response);
    libxml_clear_errors();

    // Create an XPath object
    $xpath = new DOMXPath($dom);

    // Find all the listing blocks using the provided class name
    $listing_blocks = $xpath->query('//div[contains(@class, "BasePropertyCard_propertyCardWrap__J0xUj")]');

    // Loop through each listing block and extract information
    foreach ($listing_blocks as $listing_block) {
        // Extract the broker information
        $broker_info = $xpath->query('.//div[contains(@class, "BrokerTitle_brokerTitle__ZkbBW")]', $listing_block)->item(0);
        $broker_name = $xpath->query('.//span[contains(@class, "BrokerTitle_titleText__20u1P")]', $broker_info)->item(0)->textContent;

        // Extract the status (e.g., For Sale)
        $status = $xpath->query('.//div[contains(@class, "message")]', $listing_block)->item(0)->textContent;

        // Extract the price
        $price = $xpath->query('.//div[contains(@class, "card-price")]', $listing_block)->item(0)->textContent;

        // Extract other details like beds, baths, sqft, and lot size
        $beds_element = $xpath->query('.//li[@data-testid="property-meta-beds"]', $listing_block)->item(0);
        $baths_element = $xpath->query('.//li[@data-testid="property-meta-baths"]', $listing_block)->item(0);
        $sqft_element = $xpath->query('.//li[@data-testid="property-meta-sqft"]', $listing_block)->item(0);
        $lot_size_element = $xpath->query('.//li[@data-testid="property-meta-lot-size"]', $listing_block)->item(0);

        // Check if the elements exist before extracting their text
        $beds = $beds_element ? $beds_element->textContent : "N/A";
        $baths = $baths_element ? $baths_element->textContent : "N/A";
        $sqft = $sqft_element ? $sqft_element->textContent : "N/A";
        $lot_size = $lot_size_element ? $lot_size_element->textContent : "N/A";

        // Extract the address
        $address = $xpath->query('.//div[contains(@class, "card-address")]', $listing_block)->item(0)->textContent;

        // Print the extracted information
        echo "Broker: " . trim($broker_name) . "\n";
        echo "Status: " . trim($status) . "\n";
        echo "Price: " . trim($price) . "\n";
        echo "Beds: " . trim($beds) . "\n";
        echo "Baths: " . trim($baths) . "\n";
        echo "Sqft: " . trim($sqft) . "\n";
        echo "Lot Size: " . trim($lot_size) . "\n";
        echo "Address: " . trim($address) . "\n";
        echo str_repeat("-", 50) . "\n";  // Separating listings
    }

} else {
    echo "Failed to retrieve the page. Status code: " . curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";
}

// Close the cURL session
curl_close($ch);
?>

Scraping Real Estate Listings From Realtor with PHP

Installation

Installing/Enabling cURL

Defining the Target URL

Setting Request Headers with User-Agent

Initializing and Configuring cURL

Sending the Request and Processing Output

Parsing Listings with DOMDocument

Extracting Listing Data

Output and Formatting

Full Code

Browse by tags:

Browse by language:

The easiest way to do Web Scraping

Scraping Real Estate Listings From Realtor with PHP

Installation

Installing/Enabling cURL

Defining the Target URL

Setting Request Headers with User-Agent

Initializing and Configuring cURL

Sending the Request and Processing Output

Parsing Listings with DOMDocument

Extracting Listing Data

Output and Formatting

Full Code

The easiest way to do Web Scraping

Don't leave just yet!