Scraping Booking.com Property Listings in Elixir in 2023

Oct 15, 2023 · 4 min read

In this article, we will learn how to scrape property listings from Booking.com using Elixir. We will use HTTPoison to fetch the HTML and Floki to parse it and extract details such as the property name, location, rating, review count, and description.

Prerequisites

To follow along, you will need:

  • Elixir 1.9+ installed
  • Basic Elixir and HTML knowledge

Adding Dependencies

We will use HTTPoison for sending HTTP requests and Floki for parsing HTML.

Add both to the deps function in mix.exs:

    def deps do
      [
        {:httpoison, "~> 1.8"},
        {:floki, "~> 0.30.0"}
      ]
    end

Run mix deps.get to install them.

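Importing Libraries

Importing is optional in Elixir: once the dependencies are installed, you can call any function by its full name, such as HTTPoison.get/2 or Floki.find/2. An import only lets you drop the module prefix, and the only: option must list every function and arity you call that way. For example:

    # Optional: lets you write get(url, headers) and find(page, selector)
    # instead of HTTPoison.get(url, headers) and Floki.find(page, selector)
    import HTTPoison, only: [get: 2]
    import Floki, only: [find: 2]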
    

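Defining URL

Define the search results URL. The query string sets the destination (ss), the check-in and check-out dates, and the number of adults (group_adults). Use dates in the future; searches for past dates are unlikely to return any available properties:

    url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2"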
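
Rather than editing the query string by hand, you can build it from parameters. The following is a minimal sketch using only Elixir's standard library (URI.encode_query/1 and Date); the module and function name BookingSearch.url/4 are hypothetical, and the parameter names are simply copied from the URL above.

```elixir
defmodule BookingSearch do
  @base_url "https://www.booking.com/searchresults.en-gb.html"

  # Hypothetical helper: builds a search-results URL with the same query
  # parameters as the hand-written URL (ss, checkin, checkout, group_adults).
  def url(city, %Date{} = checkin, %Date{} = checkout, adults)
      when is_binary(city) and is_integer(adults) and adults > 0 do
    # Reject stays where checkout is not strictly after checkin.
    if Date.compare(checkout, checkin) != :gt do
      raise ArgumentError, "checkout must be after checkin"
    end

    # encode_query/1 percent-encodes the values and turns spaces into "+".
    query =
      URI.encode_query([
        ss: city,
        checkin: Date.to_iso8601(checkin),
        checkout: Date.to_iso8601(checkout),
        group_adults: adults
      ])

    @base_url <> "?" <> query
  end
end
```

For example, BookingSearch.url("New York", ~D[2023-03-01], ~D[2023-03-05], 2) reproduces the URL above; pass upcoming dates for a live search.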
    

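Setting User Agent

Define a browser-like User-Agent string. Many sites block or serve different content to requests that carry an HTTP client's default User-Agent, so we send one that looks like desktop Chrome:

    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"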
    

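Fetching the Page

Make the GET request, passing the User-Agent as a request header. HTTPoison.get/3 takes the URL, a list of headers, and a list of options, and returns {:ok, %HTTPoison.Response{}} on success or {:error, %HTTPoison.Error{}} on failure:

    headers = [{"User-Agent", user_agent}]

    # follow_redirect: true lets HTTPoison follow a 3xx redirect to the final page
    {:ok, %HTTPoison.Response{status_code: 200, body: html}} =
      HTTPoison.get(url, headers, follow_redirect: true)

The pattern match raises a MatchError if the request fails or the server answers with any status other than 200, so the script stops instead of parsing an error page.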
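
If you would rather handle a failed request than crash the script, one option is to convert the result into {:ok, html} or a tagged error. The module below is a hypothetical sketch, and treating 403 and 429 as signs of blocking or rate limiting is an assumption, not documented Booking.com behavior. Because map patterns also match structs, it accepts the tuples returned by HTTPoison.get/3 as well as plain maps, which keeps it testable without a network call.

```elixir
defmodule FetchResult do
  # Converts an {:ok, response} / {:error, reason} tuple into {:ok, html}
  # or a tagged error. Map patterns also match %HTTPoison.Response{} structs.
  def to_html({:ok, %{status_code: 200, body: body}}), do: {:ok, body}

  # Assumption: 403 and 429 usually indicate blocking or rate limiting.
  def to_html({:ok, %{status_code: status}}) when status in [403, 429],
    do: {:error, {:blocked, status}}

  def to_html({:ok, %{status_code: status}}),
    do: {:error, {:unexpected_status, status}}

  # Transport-level failures (for example an %HTTPoison.Error{}) end up here.
  def to_html({:error, reason}), do: {:error, {:request_failed, reason}}
end
```

In the script you would call FetchResult.to_html(HTTPoison.get(url, [{"User-Agent", user_agent}])) and continue only on {:ok, html}.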

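Parsing the HTML

Parse the HTML with Floki. Floki.parse_document/1 returns {:ok, document} (or {:error, reason}), so match on the tuple to get the parsed tree:

    {:ok, page} = Floki.parse_document(html)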
    

Extracting Cards

Select every div whose data-testid attribute is property-card:

    cards = page |> Floki.find("div[data-testid='property-card']")

Floki.find/2 returns a list of matching elements, one per property card on the results page.

Processing Each Card

Loop through the cards:

    cards |> Enum.each(fn card ->
      # Extract data from card
    end)

The extraction steps below run inside this function, once for each card.

Extracting Title

Get the text of the card's h3 heading, which holds the property name:

    title = card |> Floki.find("h3") |> Floki.text()
    

Extracting Location

Get the text of the address span:

    location = card |> Floki.find("span[data-testid='address']") |> Floki.text()
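
Floki.text/1 joins all the text inside the matched elements, so the result can include newlines and runs of spaces taken from the page markup. A small helper can normalize it; TextClean.squish/1 below is a hypothetical name, and it uses only String and Enum functions from the standard library.

```elixir
defmodule TextClean do
  # Collapses runs of whitespace (spaces, tabs, newlines) into single spaces
  # and trims both ends. String.split/1 splits on Unicode whitespace and
  # ignores leading/trailing whitespace, so no separate trim is needed.
  def squish(text) when is_binary(text) do
    text
    |> String.split()
    |> Enum.join(" ")
  end
end
```

For example: title = card |> Floki.find("h3") |> Floki.text() |> TextClean.squish().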
    

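Extracting Rating

Get the value of the aria-label attribute on the rating element. Floki.attribute/2 returns a list of values, so take the first one:

    rating = card |> Floki.find("div.e4755bbd60") |> Floki.attribute("aria-label") |> List.first()

This selects the rating element by its class name. Booking.com's class names, such as e4755bbd60, look auto-generated and can change without notice, so expect to update these selectors when the page layout changes.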
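
The aria-label is a human-readable string rather than a number. The sketch below assumes the score appears in it as a decimal such as 8.5 (for example a label like "Scored 8.5"); the exact wording on Booking.com is not confirmed here, and Rating.parse/1 is a hypothetical helper. It accepts either the list returned by Floki.attribute/2 or a single label string.

```elixir
defmodule Rating do
  # A list (as returned by Floki.attribute/2): parse its first element.
  def parse([label | _]), do: parse(label)

  # A label string: return the first decimal number in it as a float, or nil.
  def parse(label) when is_binary(label) do
    case Regex.run(~r/\d+(?:\.\d+)?/, label) do
      [number] ->
        # Float.parse/1 also accepts integers such as "9" and returns 9.0
        {score, ""} = Float.parse(number)
        score

      nil ->
        nil
    end
  end

  # Empty list, nil, or anything else: no rating.
  def parse(_), do: nil
end
```

For example: rating = card |> Floki.find("div.e4755bbd60") |> Floki.attribute("aria-label") |> Rating.parse().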

Extracting Review Count

Get the text of the div that shows the number of reviews:

    review_count = card |> Floki.find("div.abf093bdfe") |> Floki.text()
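
The review count also arrives as text. Assuming it reads like "1,234 reviews", with commas as thousands separators (an assumption about the en-gb page, not something verified here), a hypothetical helper can turn it into an integer:

```elixir
defmodule ReviewCount do
  # Finds the first run of digits (allowing thousands-separator commas),
  # strips the commas, and returns an integer. Returns nil if no digits.
  def parse(text) when is_binary(text) do
    case Regex.run(~r/\d[\d,]*/, text) do
      [digits] -> digits |> String.replace(",", "") |> String.to_integer()
      nil -> nil
    end
  end
end
```

For example: review_count = card |> Floki.find("div.abf093bdfe") |> Floki.text() |> ReviewCount.parse().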
    

Extracting Description

Get the text of the description div:

    description = card |> Floki.find("div.d7449d770c") |> Floki.text()
    

Printing the Data

Print out the extracted details:

    IO.puts("Name: #{title}")
    IO.puts("Location: #{location}")
    IO.puts("Rating: #{rating}")
    IO.puts("Review Count: #{review_count}")
    IO.puts("Description: #{description}")
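
When an element is missing, Floki.text/1 returns an empty string and Floki.attribute/2 an empty list, so the corresponding line prints with a blank value. A hypothetical formatter that prints N/A instead, assuming you collect each card's fields into a map with the keys shown, could look like this:

```elixir
defmodule Printer do
  # Field order and labels match the IO.puts lines above.
  @fields [
    name: "Name",
    location: "Location",
    rating: "Rating",
    review_count: "Review Count",
    description: "Description"
  ]

  # Formats one property map as labelled lines, one per field,
  # using "N/A" for missing (nil), empty-string, or empty-list values.
  def format(property) when is_map(property) do
    @fields
    |> Enum.map(fn {key, label} -> "#{label}: #{display(Map.get(property, key))}" end)
    |> Enum.join("\n")
  end

  defp display(nil), do: "N/A"
  defp display(""), do: "N/A"
  defp display([]), do: "N/A"
  defp display(value), do: to_string(value)
end
```

Inside the loop, IO.puts(Printer.format(%{name: title, location: location, rating: rating, review_count: review_count, description: description})) would replace the five IO.puts calls.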
    

Full Script

Here is the complete scraping script. Save it inside the Mix project (for example as scrape.exs) and run it with mix run scrape.exs so that HTTPoison and Floki are available:

    # Search results for New York; use check-in/check-out dates in the future
    url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2"

    user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36"

    # Fetch the page; the match raises if the request fails or the status is not 200
    {:ok, %HTTPoison.Response{status_code: 200, body: html}} =
      HTTPoison.get(url, [{"User-Agent", user_agent}], follow_redirect: true)

    # Parse the HTML into a Floki document tree
    {:ok, page} = Floki.parse_document(html)

    # Each listing is a div with data-testid="property-card"
    cards = page |> Floki.find("div[data-testid='property-card']")

    cards |> Enum.each(fn card ->
      title = card |> Floki.find("h3") |> Floki.text()
      location = card |> Floki.find("span[data-testid='address']") |> Floki.text()
      # attribute/2 returns a list of values; keep the first one (or nil)
      rating = card |> Floki.find("div.e4755bbd60") |> Floki.attribute("aria-label") |> List.first()
      review_count = card |> Floki.find("div.abf093bdfe") |> Floki.text()
      description = card |> Floki.find("div.d7449d770c") |> Floki.text()

      IO.puts("Name: #{title}")
      IO.puts("Location: #{location}")
      IO.puts("Rating: #{rating}")
      IO.puts("Review Count: #{review_count}")
      IO.puts("Description: #{description}")
    end)
    

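This script fetches a Booking.com search results page and prints the key details of each listed property. The same approach (fetch the page, parse it with Floki, select elements with CSS selectors) can be adapted to other websites, although each site needs its own selectors.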

While these examples are useful for learning, scraping production sites at scale brings challenges such as CAPTCHAs, IP blocks, and bot detection. Rotating requests across multiple proxies and automated CAPTCHA solving can help.
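
Proxy rotation usually means spreading requests across a pool of proxies so that no single IP address sends all of them. A minimal round-robin sketch follows; ProxyRotation.assign/2 is a hypothetical helper, the proxy addresses in the example are placeholders, and it relies on HTTPoison's :proxy option, which accepts a proxy URL. It only builds {url, options} pairs; you would still pass each one to HTTPoison.get/3 yourself.

```elixir
defmodule ProxyRotation do
  # Pairs each URL with the next proxy from the pool in round-robin order
  # and returns {url, options} tuples whose options use HTTPoison's :proxy key.
  # Enum.zip/2 stops at the shorter input, so the infinite cycle is safe.
  def assign(urls, proxies) when is_list(proxies) and proxies != [] do
    urls
    |> Enum.zip(Stream.cycle(proxies))
    |> Enum.map(fn {url, proxy} -> {url, [proxy: proxy]} end)
  end
end
```

Each pair can then be fetched with HTTPoison.get(url, [{"User-Agent", user_agent}], options).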

Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.

This allows scraping at scale without the headache of IP blocks. Proxies API has a free tier to get started; check out the API and sign up for an API key to supercharge your web scraping.

With the power of Proxies API combined with Elixir libraries like HTTPoison and Floki, you can scrape data at scale without getting blocked.
