Scraping Real Estate Listings From Realtor in Elixir

In this article, we'll go through Elixir code that scrapes real estate listings from Realtor.com for properties in San Francisco.

This is the listings page we are talking about…

Getting Set Up

First, we'll need to add two dependencies to our mix.exs file:

def deps do
  [
    {:httpoison, "~> 1.8"},
    {:floki, "~> 0.31.0"}
  ]
end

HTTPoison allows us to make HTTP requests to the Realtor website. Floki helps parse and select elements from the HTML we get back.

The Realtor Scraper Module

Let's walk through the module step-by-step:

defmodule RealtorScraper do

This declares a module called RealtorScraper where we'll define functions for scraping.

Configuration

First we set up some configuration:

@url "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA"

@headers %{
  "User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"
}

The @url points to the Realtor search page for San Francisco listings.

Some sites block scrapers, so we pass a browser-like User-Agent via @headers to make the request look like it comes from a regular browser.

Making the Initial Request

The scrape/0 function will make the HTTP request and kick off processing:

def scrape do

  case HTTPoison.get(@url, headers()) do

    {:ok, %{status_code: 200, body: body}} ->
      process_html(body)

    {:ok, %{status_code: status_code}} ->
      IO.puts("Failed, status code: #{status_code}")

    {:error, reason} ->
      IO.puts("Request failed: #{inspect(reason)}")

  end

end

We use HTTPoison.get/2 to fetch the page. There are a few different code paths:

On a 200 OK with body, we pass to a process_html/1 function.

Other status codes print an error.

Network failures go to the catch-all error handler.
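The three branches can be tried out without touching the network. This is a minimal sketch, assuming a hypothetical ResponseDemo module and plain maps standing in for HTTPoison's response struct (the map patterns used in scrape/0 match both). It returns tagged tuples instead of printing, which is a design choice of the sketch and not something the scraper does.

```elixir
defmodule ResponseDemo do
  # Hypothetical helper: classifies a result tuple shaped like the ones
  # HTTPoison.get returns.
  def classify({:ok, %{status_code: 200, body: body}}), do: {:ok, body}
  def classify({:ok, %{status_code: code}}), do: {:error, {:http_status, code}}
  def classify({:error, reason}), do: {:error, {:request_failed, reason}}
end

# Simulated results, one per code path.
IO.inspect(ResponseDemo.classify({:ok, %{status_code: 200, body: "<html></html>"}}))
IO.inspect(ResponseDemo.classify({:ok, %{status_code: 403, body: ""}}))
IO.inspect(ResponseDemo.classify({:error, :timeout}))
```

The order of the clauses matters: the 200 clause has to come before the general status clause, otherwise it would never be reached.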

Headers Configuration

The headers/0 function formats the headers map into a list:

defp headers do
  Enum.into(@headers, [])
end

This transforms the map into a list of {name, value} tuples, a format HTTPoison accepts for request headers.
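To see what that conversion produces, here is a small sketch that runs without HTTPoison. HeadersDemo is a hypothetical module, and the "Accept" header is only there for illustration; the scraper itself sends just the User-Agent.

```elixir
defmodule HeadersDemo do
  # Hypothetical wrapper around the same Enum.into/2 call used in headers/0.
  def to_list(headers_map), do: Enum.into(headers_map, [])
end

# "Accept" is an illustrative extra header, not one the scraper sends.
headers = HeadersDemo.to_list(%{"User-Agent" => "Mozilla/5.0", "Accept" => "text/html"})

# Each entry is a {name, value} tuple; List.keyfind/3 looks one up by name.
IO.inspect(List.keyfind(headers, "User-Agent", 0))
```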

Processing Listing Data

Inspecting the element

When we inspect the page in Chrome, we can see that each listing block is wrapped in a div with the class BasePropertyCard_propertyCardWrap__J0xUj.

Now let's look at parsing the HTML:

defp process_html(html) do

  {:ok, document} = Floki.parse_document(html)

  listing_blocks = Floki.find(document, "div.BasePropertyCard_propertyCardWrap__J0xUj")

  Enum.each(listing_blocks, &extract_listing_info/1)

end

Floki lets us parse the HTML into a nested structure. We search for DOM elements with class BasePropertyCard_propertyCardWrap__J0xUj which contain listing info.

Then we pass each listing block into extract_listing_info/1 to pull out details.
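The nested structure Floki works with is a list of {tag, attributes, children} tuples, with text nodes as plain strings. The sketch below uses a hand-written tree in that shape rather than a real parse result, and a hypothetical TreeDemo module that walks it with the standard library only. In the scraper itself, Floki.find and Floki.text do this work.

```elixir
defmodule TreeDemo do
  # Collects text from a tree shaped like Floki's output:
  # {tag, attributes, children} tuples with strings as text nodes.
  def text(nodes) when is_list(nodes), do: Enum.map_join(nodes, "", &text/1)
  def text({_tag, _attrs, children}), do: text(children)
  def text(str) when is_binary(str), do: str
  # Any other node type (such as comments) is skipped in this sketch.
  def text(_other), do: ""
end

# Hand-written stand-in for a parsed document (illustrative markup only).
tree = [
  {"div", [{"class", "card"}],
   ["Price: ", {"span", [], ["$1,795,000"]}]}
]

IO.puts(TreeDemo.text(tree))
```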

Extracting Listing Details

The key part is mapping Realtor's DOM structure into data we want:

defp extract_listing_info(listing_block) do

  broker_info = Floki.find(listing_block, "div.BrokerTitle_brokerTitle__ZkbBW")

  broker_name = Floki.find(broker_info, "span.BrokerTitle_titleText__20u1P") |> Floki.text() |> String.trim()

  status = Floki.find(listing_block, "div.message") |> Floki.text() |> String.trim()

  price = Floki.find(listing_block, "div.card-price") |> Floki.text() |> String.trim()

  beds_element = Floki.find(listing_block, "li[data-testid='property-meta-beds']")

  # And so on for other fields...

end

The key ideas here:

Use CSS selectors to extract elements

Floki.find returns a list of every matching node (empty if nothing matches)

Floki.text returns inner text

String.trim() cleans whitespace

Let's go through each extracted field one-by-one:

Broker Name

To get the broker name, we first select their info block:

broker_info = Floki.find(listing_block, "div.BrokerTitle_brokerTitle__ZkbBW")

Then grab the name text itself:

broker_name = Floki.find(broker_info, "span.BrokerTitle_titleText__20u1P") |> Floki.text() |> String.trim()

Searching inside broker_info rather than the whole listing block limits the second selector to the broker section of the card.

Listing Status

The status div has class message:

status = Floki.find(listing_block, "div.message") |> Floki.text() |> String.trim()

We use the same pattern: select element, extract text, trim whitespace.

Price

Similar idea for pricing data:

price = Floki.find(listing_block, "div.card-price") |> Floki.text() |> String.trim()

The key thing is identifying the correct class or attribute that targets each data field.

Beds & Baths

Beds and baths use a data-testid attribute:

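beds_element = Floki.find(listing_block, "li[data-testid='property-meta-beds']")

beds_text = beds_element |> Floki.text() |> String.trim()

beds = if beds_text == "", do: "N/A", else: beds_text

When the element is missing, Floki.find returns an empty list and the text comes back as an empty string, so we check for that and fall back to "N/A".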
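The fallback is worth checking in isolation, because String.trim always returns a string and an empty string is truthy in Elixir, so `||` on its own never reaches "N/A". A minimal sketch, assuming a hypothetical FieldDemo.text_or_na/1 helper that is not part of the scraper:

```elixir
defmodule FieldDemo do
  # Hypothetical helper: trims the text and substitutes "N/A" when nothing is left.
  def text_or_na(text) when is_binary(text) do
    case String.trim(text) do
      "" -> "N/A"
      trimmed -> trimmed
    end
  end
end

empty = ""
# An empty string is truthy, so || keeps it instead of using the fallback.
IO.inspect(empty || "N/A")

# The explicit check handles both the missing and the present case.
IO.inspect(FieldDemo.text_or_na("   "))
IO.inspect(FieldDemo.text_or_na(" 3 bed "))
```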

Square Feet & Lot Size

The pattern continues:

sqft_element = Floki.find(listing_block, "li[data-testid='property-meta-sqft']")

sqft_text = sqft_element |> Floki.text() |> String.trim()

sqft = if sqft_text == "", do: "N/A", else: sqft_text

lot_size_element = Floki.find(listing_block, "li[data-testid='property-meta-lot-size']")

lot_size_text = lot_size_element |> Floki.text() |> String.trim()

lot_size = if lot_size_text == "", do: "N/A", else: lot_size_text

Each field uses a different test ID. We handle missing data consistently.

Address

Finally, address data resides in a card-address div:

address = Floki.find(listing_block, "div.card-address") |> Floki.text() |> String.trim()

And that covers extracting all the listing details!

The full code below handles a single page of results; covering more properties would mean repeating the same steps for each further page of the search.

Printing Output

Finally, we can call RealtorScraper.scrape() and see the scraper in action:

Broker: Cliff Gamble - eXp Realty of California I
Status: Sale Contingent
Price: $1,795,000
Beds: 3
Baths: 3
Sqft: 2,882
Lot Size: 5,662
Address: 2 Santa Clara Ave

It extracts and prints all key fields for each listing.
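The printed values are all strings. If the goal is a data set rather than console output, the numeric fields could be converted along these lines. This is a sketch under assumptions: ListingParse and to_integer/1 are hypothetical names, the sample values are taken from the output above, and decimal values (a half bath, for instance) are not handled.

```elixir
defmodule ListingParse do
  # Hypothetical helper: drops every non-digit character and converts the rest,
  # e.g. "$1,795,000" -> 1795000. Returns nil when no digits remain ("N/A").
  def to_integer(text) when is_binary(text) do
    case String.replace(text, ~r/\D/, "") do
      "" -> nil
      digits -> String.to_integer(digits)
    end
  end
end

# Sample values copied from the printed listing above.
listing = %{
  price: ListingParse.to_integer("$1,795,000"),
  beds: ListingParse.to_integer("3"),
  sqft: ListingParse.to_integer("2,882"),
  lot_size: ListingParse.to_integer("N/A")
}

IO.inspect(listing)
```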

Here is the complete code, which processes every listing card found on the search page:

# Make sure to add :httpoison and :floki as dependencies in your mix.exs file

defmodule RealtorScraper do
  @url "https://www.realtor.com/realestateandhomes-search/San-Francisco_CA"
  @headers %{"User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"}

  def scrape do
    case HTTPoison.get(@url, headers()) do
      {:ok, %{status_code: 200, body: body}} ->
        process_html(body)
      {:ok, %{status_code: status_code}} ->
        IO.puts("Failed to retrieve the page. Status code: #{status_code}")
      {:error, reason} ->
        # reason is an HTTPoison.Error struct, so it needs inspect/1 to be printed
        IO.puts("Failed to make the request: #{inspect(reason)}")
    end
  end

  defp headers do
    Enum.into(@headers, [])
  end

  defp process_html(html) do
    {:ok, document} = Floki.parse_document(html)
    listing_blocks = Floki.find(document, "div.BasePropertyCard_propertyCardWrap__J0xUj")

    Enum.each(listing_blocks, &extract_listing_info/1)
  end

  defp extract_listing_info(listing_block) do
    # Floki.find/2 returns a list of matching nodes; Floki.text/1 joins their text
    broker_info = Floki.find(listing_block, "div.BrokerTitle_brokerTitle__ZkbBW")
    broker_name = Floki.find(broker_info, "span.BrokerTitle_titleText__20u1P") |> Floki.text() |> String.trim()

    status = Floki.find(listing_block, "div.message") |> Floki.text() |> String.trim()

    price = Floki.find(listing_block, "div.card-price") |> Floki.text() |> String.trim()

    # For the optional fields, an empty string means the element was missing
    beds_element = Floki.find(listing_block, "li[data-testid='property-meta-beds']")
    beds_text = beds_element |> Floki.text() |> String.trim()
    beds = if beds_text == "", do: "N/A", else: beds_text

    baths_element = Floki.find(listing_block, "li[data-testid='property-meta-baths']")
    baths_text = baths_element |> Floki.text() |> String.trim()
    baths = if baths_text == "", do: "N/A", else: baths_text

    sqft_element = Floki.find(listing_block, "li[data-testid='property-meta-sqft']")
    sqft_text = sqft_element |> Floki.text() |> String.trim()
    sqft = if sqft_text == "", do: "N/A", else: sqft_text

    lot_size_element = Floki.find(listing_block, "li[data-testid='property-meta-lot-size']")
    lot_size_text = lot_size_element |> Floki.text() |> String.trim()
    lot_size = if lot_size_text == "", do: "N/A", else: lot_size_text

    address = Floki.find(listing_block, "div.card-address") |> Floki.text() |> String.trim()

    IO.puts("Broker: #{broker_name}")
    IO.puts("Status: #{status}")
    IO.puts("Price: #{price}")
    IO.puts("Beds: #{beds}")
    IO.puts("Baths: #{baths}")
    IO.puts("Sqft: #{sqft}")
    IO.puts("Lot Size: #{lot_size}")
    IO.puts("Address: #{address}")
    IO.puts(String.duplicate("-", 50))
  end
end

# To run the scraper
RealtorScraper.scrape()
