Scraping Real Estate Listings From Realtor with Go

Jan 9, 2024 · 5 min read

Web scraping real estate listings can be a useful way to collect and analyze housing data. This article will teach you how to scrape real estate listing data from Realtor.com using Go and the goquery library.

Prerequisites

To follow along, you'll need:

  • Go installed on your machine
  • The following packages imported:

    import (
        "fmt"
        "log"
        "net/http"
        "strings"

        "github.com/PuerkitoBio/goquery"
    )
    

    To install goquery, run:

    go get github.com/PuerkitoBio/goquery
    

    Making the Initial Request

    First we'll define the URL of the Realtor.com search page we want to scrape:

    url := "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA"
    

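    Next, we build the request and set a User-Agent header to mimic a real browser:

    userAgent := "Mozilla/5..."

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        log.Fatal("Failed to create a GET request:", err)
    }
    req.Header.Set("User-Agent", userAgent)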
    

    We then send the GET request with Go's http client and check that the response status code is 200 OK:

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        log.Fatal("Failed to send GET request:", err)
    }
    defer resp.Body.Close()

    if resp.StatusCode == 200 {
        // Parsing logic here
    }
    

    So far so good! We've made a request to Realtor.com and verified we can access the page. Next we'll extract the data...

    Parsing the Page with goquery

    We'll use the goquery library to parse the HTML content of the page into a document:

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        log.Fatal("Failed to parse HTML:", err)
    }
    

    goquery lets you query elements with CSS selectors, much as you would with jQuery.

    Extracting Listing Data

    Now the real work begins - extracting actual listing data from the HTML document.

    Inspecting the element

    Inspecting a listing in Chrome's developer tools shows that Realtor.com wraps each listing card in a div with the class BasePropertyCard_propertyCardWrap__J0xUj.

    We can select all of these using goquery's .Find():

    doc.Find(".BasePropertyCard_propertyCardWrap__J0xUj").Each(func(i int, listingBlock *goquery.Selection) {
    
        // Extract data from listingBlock here
    
    })
    

    This loops over each listing block so we can extract data from it. Let's go through the fields one by one:

    Broker Name

    The broker name sits in a span nested inside a div:

    <div class="BrokerTitle_brokerTitle__ZkbBW">
      <span class="BrokerTitle_titleText__20u1P">Compass</span>
    </div>
    

    We can use a nested selector to get to the broker name text:

    brokerInfo := listingBlock.Find(".BrokerTitle_brokerTitle__ZkbBW")
    brokerName := brokerInfo.Find("span.BrokerTitle_titleText__20u1P").Text()
    

    Status

    The status (e.g. 'For Sale') is under a div with class message:

    status := listingBlock.Find(".message").Text()
    

    Price

    The price is conveniently in a div with class card-price:

    price := listingBlock.Find(".card-price").Text()
    

    Other fields, such as beds and baths, have explicit data-testid attributes that we can select against:

    beds := listingBlock.Find("li[data-testid=property-meta-beds]").Text()
    
    baths := listingBlock.Find("li[data-testid=property-meta-baths]").Text()
    

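    Finally, we extract the address from the element with class card-address and print everything out:

    address := listingBlock.Find(".card-address").Text()

    fmt.Println("Broker:", strings.TrimSpace(brokerName))
    fmt.Println("Status:", strings.TrimSpace(status))
    fmt.Println("Price:", strings.TrimSpace(price))
    fmt.Println("Beds:", strings.TrimSpace(beds))
    fmt.Println("Baths:", strings.TrimSpace(baths))
    fmt.Println("Address:", strings.TrimSpace(address))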
    

    And that's it! The key things to understand are:

  • Use goquery to loop through each listing block
  • Use CSS selectors to extract specific data fields
  • Nest selectors when data is buried (like broker name)
  • Trim extra whitespace from strings before printing

    With just a few lines of goquery selectors, we were able to extract useful fields from complicated HTML.
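
On the whitespace point: `strings.TrimSpace` only removes leading and trailing whitespace, while text pulled from nested elements can also contain newlines and indentation in the middle. A small sketch of one way to collapse those as well, using a hypothetical `cleanText` helper and a made-up input string:

```go
package main

import (
	"fmt"
	"strings"
)

// cleanText (hypothetical helper) trims both ends and collapses every inner
// run of whitespace (spaces, tabs, newlines) into a single space.
func cleanText(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	// Made-up value resembling text extracted from indented, nested HTML.
	raw := "\n   123 Example St,\n      San Francisco, CA  \n"

	fmt.Printf("%q\n", strings.TrimSpace(raw)) // inner newline and indentation remain
	fmt.Printf("%q\n", cleanText(raw))         // single spaces throughout
}
```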

    The full code can be seen below for reference:

    package main
    
    import (
        "fmt"
        "log"
        "net/http"
        "strings"
    
        "github.com/PuerkitoBio/goquery"
    )
    
    func main() {
        // Define the URL of the Realtor.com search page
        url := "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA"
    
        // Define a User-Agent header
        userAgent := "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
    
        // Send a GET request to the URL with the User-Agent header
        client := &http.Client{}
        req, err := http.NewRequest("GET", url, nil)
        if err != nil {
            log.Fatal("Failed to create a GET request:", err)
        }
        req.Header.Set("User-Agent", userAgent)
        resp, err := client.Do(req)
        if err != nil {
            log.Fatal("Failed to send GET request:", err)
        }
        defer resp.Body.Close()
    
        // Check if the request was successful (status code 200)
        if resp.StatusCode == 200 {
            // Parse the HTML content of the page using goquery
            doc, err := goquery.NewDocumentFromReader(resp.Body)
            if err != nil {
                log.Fatal("Failed to parse HTML:", err)
            }
    
            // Find all the listing blocks using the provided class name
            doc.Find(".BasePropertyCard_propertyCardWrap__J0xUj").Each(func(i int, listingBlock *goquery.Selection) {
                // Extract the broker information
                brokerInfo := listingBlock.Find(".BrokerTitle_brokerTitle__ZkbBW")
                brokerName := brokerInfo.Find("span.BrokerTitle_titleText__20u1P").Text()
    
                // Extract the status (e.g., For Sale)
                status := listingBlock.Find(".message").Text()
    
                // Extract the price
                price := listingBlock.Find(".card-price").Text()
    
                // Extract other details like beds, baths, sqft, and lot size
                beds := listingBlock.Find("li[data-testid=property-meta-beds]").Text()
                baths := listingBlock.Find("li[data-testid=property-meta-baths]").Text()
                sqft := listingBlock.Find("li[data-testid=property-meta-sqft]").Text()
                lotSize := listingBlock.Find("li[data-testid=property-meta-lot-size]").Text()
    
                // Extract the address
                address := listingBlock.Find(".card-address").Text()
    
                // Print the extracted information
                fmt.Println("Broker:", strings.TrimSpace(brokerName))
                fmt.Println("Status:", strings.TrimSpace(status))
                fmt.Println("Price:", strings.TrimSpace(price))
                fmt.Println("Beds:", strings.TrimSpace(beds))
                fmt.Println("Baths:", strings.TrimSpace(baths))
                fmt.Println("Sqft:", strings.TrimSpace(sqft))
                fmt.Println("Lot Size:", strings.TrimSpace(lotSize))
                fmt.Println("Address:", strings.TrimSpace(address))
                fmt.Println(strings.Repeat("-", 50)) // Separating listings
            })
        } else {
            log.Println("Failed to retrieve the page. Status code:", resp.StatusCode)
        }
    }
    
