Scraping Booking.com Property Listings in Ruby in 2023

Oct 15, 2023 · 4 min read

In this article, we will learn how to scrape property listings from Booking.com using Ruby. We will use the Nokogiri and OpenURI libraries to fetch the HTML content and then extract key information like property name, location, ratings, etc.

Prerequisites

To follow along, you will need:

  • Ruby installed on your system
  • Basic knowledge of Ruby programming
  • Bundler for installing dependencies

    Installing Dependencies

    We need to install Nokogiri. OpenURI is part of Ruby's standard library, so it needs no separate installation:

    gem install nokogiri
    

    This will download and install the latest version.

    Requiring Libraries

    At the top of your Ruby script, require the libraries:

    require 'nokogiri'
    require 'open-uri'
    

    Nokogiri will help parse the HTML content.

    OpenURI allows making HTTP requests to fetch web pages.

    Defining the Target URL

    Let's define the URL we want to scrape:

    url = "https://www.booking.com/searchresults.html?ss=New+York&..."
    

    We won't paste the full URL here.
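
The query string can also be assembled from a hash instead of being typed by hand. The following is a minimal sketch, assuming the parameter names (ss, checkin, checkout, group_adults) used by the URL in the full script; build_search_url is a hypothetical helper, not part of any library.

```ruby
require 'uri'

# Hypothetical helper: builds a search URL from a hash of query parameters.
# The base path and parameter names are assumptions taken from the full script.
def build_search_url(params)
  base = 'https://www.booking.com/searchresults.en-gb.html'
  "#{base}?#{URI.encode_www_form(params)}"
end

url = build_search_url(
  ss: 'New York',
  checkin: '2023-03-01',
  checkout: '2023-03-05',
  group_adults: 2
)
puts url
```

URI.encode_www_form takes care of escaping, so a destination containing spaces or punctuation does not need manual encoding.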

    Setting User Agent

    We need to set a valid user agent header:

    headers = { 'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...' }
    

    This will make the request appear to come from a real browser.

    Fetching the HTML Page

    We can use URI.open to fetch the page and Nokogiri::HTML to parse it. (Since Ruby 3.0, a bare open no longer accepts URLs.)

    html = Nokogiri::HTML(URI.open(url, headers))
    

    This makes the request and parses the response into a Nokogiri document.
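
Requests to a live site can fail, so it may help to wrap the fetch in a small method that handles errors. This is a sketch under stated assumptions: fetch_html is a hypothetical helper, and the 10-second timeouts are arbitrary values chosen for illustration.

```ruby
require 'open-uri'

# Hypothetical helper: returns the response body as a String, or nil when
# the request fails. In the options hash, Symbol keys are open-uri options
# and String keys are sent as HTTP headers.
def fetch_html(url, headers = {})
  options = headers.merge(open_timeout: 10, read_timeout: 10)
  URI.open(url, options, &:read)
rescue OpenURI::HTTPError => e
  # Raised for error responses such as 403 or 429.
  warn "HTTP error for #{url}: #{e.message}"
  nil
rescue SocketError, SystemCallError, Timeout::Error => e
  # DNS failures, refused connections, and timeouts end up here.
  warn "Network error for #{url}: #{e.class}"
  nil
end
```

The returned string can be passed to Nokogiri::HTML once it has been checked for nil.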

    Extracting Property Cards

    The property cards have a data-testid attribute we can search for:

    property_cards = html.search('div[data-testid="property-card"]')
    

    This finds all matching div elements.

    Looping Through Cards

    We can iterate through the cards:

    property_cards.each do |card|
    
      # Extract data from card
    
    end
    

    Inside this loop we will extract information from each card node.

    Extracting Title

    To get the title, we search for the data-testid="title" element:

    title = card.at('div[data-testid="title"]')&.text
    

    We grab the text contents if the element is found.

    Extracting Location

    Similarly, the address is under a data-testid="address" element:

    location = card.at('span[data-testid="address"]')&.text
    

    The pattern is the same for other fields.

    Extracting Rating

    The star rating aria-label contains the score:

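    rating = card.at('div.e4755bbd60')&.[]('aria-label')
    

    Here we read the aria-label attribute from the div. The safe navigation operator makes a card without a rating yield nil instead of raising an error.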
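
The aria-label is a text label, so a numeric score has to be pulled out of it. A minimal sketch, assuming a label shaped like "Scored 8.5" (the real wording may differ); parse_score is a hypothetical helper.

```ruby
# Hypothetical helper: returns the first number in the label as a Float,
# or nil when the label is missing or contains no number.
# A decimal comma ("8,5") is treated the same as a decimal point.
def parse_score(label)
  match = label.to_s[/\d+(?:[.,]\d+)?/]
  match&.tr(',', '.')&.to_f
end

puts parse_score('Scored 8.5')
puts parse_score(nil).inspect
```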

    Extracting Review Count

    The review count text is inside a class="abf093bdfe" element:

    review_count = card.at('div.abf093bdfe')&.text
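
The extracted value is text rather than a number. A minimal sketch for converting it, assuming text shaped like "1,234 reviews" (the actual format is not confirmed here); parse_review_count is a hypothetical helper.

```ruby
# Hypothetical helper: returns the first run of digits as an Integer,
# ignoring thousands separators, or nil when no digits are present.
def parse_review_count(text)
  digits = text.to_s[/\d[\d,.]*/]
  digits&.delete(',.')&.to_i
end

puts parse_review_count('1,234 reviews')
puts parse_review_count(nil).inspect
```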
    

    Extracting Description

    The description is in a class="d7449d770c" element:

    description = card.at('div.d7449d770c')&.text
    

    Printing the Data

    Finally, we can print out the extracted data:

    puts "Name: #{title}"
    puts "Location: #{location}"
    puts "Rating: #{rating}"
    # etc...
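
Printing is fine for a quick check, but structured output is easier to reuse. The sketch below collects records into an array and serializes them as JSON; the values are placeholders, not scraped data, and the listings name is an assumption.

```ruby
require 'json'

# In the real script this array would be filled inside the card loop.
listings = []
listings << {
  name: 'Example Hotel',            # placeholder values, not scraped data
  location: 'Manhattan, New York',
  rating: 'Scored 8.5'
}

json = JSON.pretty_generate(listings)
puts json
# File.write('listings.json', json) would save the same output to disk.
```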
    

    That covers scraping Booking.com property listings in Ruby.

    Full Code

    Here is the complete Ruby script:

    require 'nokogiri'
    require 'open-uri'
    
    url = "https://www.booking.com/searchresults.en-gb.html?ss=New+York&checkin=2023-03-01&checkout=2023-03-05&group_adults=2"
    
    headers = { 'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)...' }
    
    html = Nokogiri::HTML(URI.open(url, headers))
    
    property_cards = html.search('div[data-testid="property-card"]')
    
    property_cards.each do |card|
    
      title = card.at('div[data-testid="title"]')&.text
    
      location = card.at('span[data-testid="address"]')&.text
    
      rating = card.at('div.e4755bbd60')&.[]('aria-label')
    
      review_count = card.at('div.abf093bdfe')&.text
    
      description = card.at('div.d7449d770c')&.text
    
      puts "Name: #{title}"
      puts "Location: #{location}"
      puts "Rating: #{rating}"
      puts "Review Count: #{review_count}"
      puts "Description: #{description}"
    
    end
    

    While these examples are great for learning, scraping production-level sites can pose challenges like CAPTCHAs, IP blocks, and bot detection. Rotating proxies and automated CAPTCHA solving can help.
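
Retrying with a growing delay is a separate and much simpler mitigation than proxies or CAPTCHA solving. It only helps with transient failures and does not get around blocks. A sketch, assuming a hypothetical with_retries helper and arbitrary delay values:

```ruby
# Hypothetical helper: runs the block, retrying with exponential backoff
# when it raises. Delays are base_delay, then 2x, 4x, and so on.
def with_retries(max_attempts: 3, base_delay: 1.0)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError => e
    # Give up and re-raise once the attempt budget is spent.
    raise if attempt >= max_attempts

    delay = base_delay * (2**(attempt - 1))
    warn "Attempt #{attempt} failed (#{e.class}); retrying in #{delay}s"
    sleep delay
    retry
  end
end

# Usage with a stand-in block that fails twice before succeeding.
calls = 0
result = with_retries(max_attempts: 3, base_delay: 0.01) do
  calls += 1
  raise IOError, 'simulated failure' if calls < 3

  'page body'
end
puts result
```

In a real script the block would contain the page request in place of the simulated failure.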

    Proxies API offers a simple API for rendering pages with built-in proxy rotation, CAPTCHA solving, and evasion of IP blocks. You can fetch rendered pages in any language without configuring browsers or proxies yourself.

    This allows scraping at scale without the headache of IP blocks. Proxies API has a free tier to get started. Check out the API and sign up for an API key to supercharge your web scraping.

    With Proxies API combined with Ruby libraries like Nokogiri, you can scrape data at scale without getting blocked.
